Optimal Classification Policy and Comparisons for Highly Reliable Products

In the current competitive marketplace, manufacturers need to classify products in a short period of time, according to market demand. Hence, it is a ...

0 downloads 50 Views 537KB Size

Download PDF

Sankhy¯ a : The Indian Journal of Statistics 2015, Volume 77-B, Part 2, pp. 321-358 c 2015, Indian Statistical Institute

Optimal Classiﬁcation Policy and Comparisons for Highly Reliable Products Chien-Yu Peng Institute of Statistical Science, Academia Sinica, Taipei, Taiwan Abstract In the current competitive marketplace, manufacturers need to classify products in a short period of time, according to market demand. Hence, it is a challenge for manufacturers to implement a classiﬁcation test that can distinguish the diﬀerent grades of a product quickly and eﬃciently. For highly reliable products, if quality characteristics exist whose degradation over time can be related to the lifetime of the product, the degradation model can then be constructed based on the degradation data. In this study, we propose a general degradation model using a Gaussian mixture process that simultaneously considers unit-to-unit variability, within-unit variability and measurement error. Then, by adopting the concept of linear discriminant analysis, we propose a three-step classiﬁcation policy to determine the optimal coeﬃcients, the optimal cutoﬀ points and the optimal test stopping time. In addition, we use an analytic approach to compare the eﬃciency of our proposed procedure with the methods that are reported in the previous literature under small sample size cases. Analytical comparisons provide functional equations under diﬀerent assumptions. The solutions are found to elucidate the foundation between diﬀerent methods proposed in recent studies. Finally, several data sets are used to illustrate the proposed classiﬁcation procedure. AMS (2000) subject classiﬁcation. Primary 62N05; Secondary 62H30. Keywords and phrases. Degradation model, Functional equation, Highly reliable products, Linear discriminant analysis, Mahalanobis distance, Gaussian mixture process

1 Introduction For traditional classiﬁcations, the products can be classiﬁed by a simple procedure. For example, due to some degree of manufacturing, microprocessors are routinely classiﬁed by their safe clock speed (some devices of the same model can be run safely at higher speeds). Certain types of resistors (capacitors) are classiﬁed for an on-line test (Phase II) according to their Electronic supplementary material The online version of this article (doi:10.1007/s13571-015-0097-z) contains supplementary material, which is available to authorized users.

322

C.-Y. Peng

resistance (capacitance) as 1 %, 5 % or 10 % devices, depending on how close they are to their nominal values. The nominal value is usually chosen based on the lifetime of resistors. In reliability, the lifetime of products is a simple measurement and is needed to be estimated by an oﬀ-line experiment or a pilot study (Phase I). However, for highly reliable products, the major obstacle is the diﬃculty in obtaining suﬃcient failure data to eﬃciently estimate the lifetime of products. In such a situation, an alternative approach assumes that there exists a quality characteristic (QC) whose degradation over time (also called degradation paths) can be related to reliability. The product’s lifetime can be accurately estimated by collecting such degradation data. Further discussions of degradation models are included in Nelson (1990), Singpurwalla (1995), Meeker and Escobar (1998), Chao (1999), Lai and Xie (2006), van Noortwijk (2009), Zhou, Serban and Gebraeel (2011), Peng (2012, 2015) and the references given therein. Other practical applications of degradation models are in areas such as the contact image scanner of a copy/fax machine, optical ﬁber in high-speed computer networks or communication systems, power supply, metal-oxide-semiconductor ﬁeld effect transistor devices, and other dependable systems. Due to market demand, it is critical for manufacturers to design an optimal classiﬁcation policy to classify products being shipped to a marketplace. A burn-in test, which is a type of classiﬁcation, is often used to classify products and assumes that the population of a product is composed of two subpopulations (e.g., the weak and typical subpopulations). Various works on ﬁnding an optimal burn-in policy based on a degradation model have been considered. For instance, Tseng and Tang (2001) proposed an innovative approach to determine the optimal burn-in policy through a Wiener process formulation of a degradation model. Tseng, Tang and Ku (2003) proposed a burn-in procedure with a general window size for better prediction of the misclassiﬁcation probabilities. Tseng and Peng (2004) provided an eﬃcient burn-in method using an integrated Wiener process for the cumulative degradation of the product’s QC. Wu and Xie (2007) used the technique of receiver operating characteristic for classifying weak and strong components from the production. Tsai, Tseng and Balakrishnan (2011) changed the underlying process from a Wiener process to a gamma process in the burn-in problem. Xiang, Coit and Feng (2013) proposed geometric Brownian motion and gamma processes as degradation models that simultaneously determine burn-in and age-based preventive replacement policies for populations composed of distinct subpopulations. However, their assumption is restricted to the case that the mean degradation path follows a ﬁxed eﬀect model and the within-unit variability in degradation data are mainly described by

Optimal classification policy and comparisons...

323

time-dependent stochastic processes. In practical applications, however, the unit-to-unit variability (also called random eﬀects), within-unit variability and measurement error may occur simultaneously in the degradation paths; see, for example, Whitmore (1995), Lawless and Crowder (2004), and Bian and Gebraeel (2012). In this study, the issue of two subpopulations in a burn-in test is extended to the problem of several subpopulations in a classiﬁcation test. This classiﬁcation, which is generally used for manufactured products with variations, is typically generated by the manufacturing equipment and/or producers. Manufacturers can develop a pricing mechanism and make a decision of marketing strategy according to diﬀerent grades of the product. For instance, for a Wiener degradation-based process, Yu (2003) proposed the optimal test plan of choosing sample size, inspection frequency, and termination time so that the total experimental cost can be minimized. Yang (2012) provided optimal test plans that choose sample size and test time to minimize the variance of an estimate related to the overlap probability, and simultaneously satisfy constraints on test budgets and available sample size. For a linear mixed model, 2006 developed the optimal design of classifying more and less reliable products among several competing designs. Feng, Peng and Coit (2010) presented a linear mixed model to jointly determine the optimal burn-in, inspection, and maintenance decisions, based on degradation analysis and an integrated quality and reliability cost model. Using an optimal classiﬁcation policy to distinguish products can save both manufacturing cost, testing time, and substantially increase sales and the competitiveness of the products. Hence, how to develop an optimal classiﬁcation policy is a great challenge for highly reliable products. Furthermore, analytical comparisons of classiﬁcation policies have not been studied very much in the literature because there are too many conﬁgurations (e.g., Tseng and Peng 2004; Wu and Xie 2007). As a result of this lack of information, researchers and engineers encounter a challenging issue in choosing a better strategy to classify products. In this paper, an eﬀective classiﬁcation policy is proposed based on degradation reliability models in Phase I, both empirically and analytically. Speciﬁcally, we ﬁrst propose a general degradation model based on a Gaussian mixture process, which can simultaneously consider the unit-tounit variability, within-unit variability and measurement error. We then provide an eﬃcient three-step classiﬁcation policy inspired by the concept of linear discriminant analysis (LDA). By constructing a cost model that contains a trade-oﬀ between the mis-classiﬁcation and operating/measurement costs, the optimal cutoﬀ points and optimal test stopping time can be easily

324

C.-Y. Peng

obtained. In addition, several data sets are used to demonstrate the proposed classiﬁcation procedure. Finally, we use an analytic method to compare the eﬃciency of our proposed procedure with the competing approaches that were previously reported for the burn-in issue in the literature under small sample size cases. The LDA method is found to outperform the existing methods in many cases, both analytically and empirically. By solving the functional equations under diﬀerent assumptions, the solutions are found to elucidate the foundation between diﬀerent methods. Several examples are presented to show the applicability and eﬀectiveness of the classiﬁcation policy proposed in this work. The remainder of this paper is organized as follows. Section 2 describes the motivation and its problem formulation. Section 3 derives the optimal classiﬁcation policy. Section 4 uses the EM-type algorithm to compute the maximum likelihood (ML) estimations of unknown parameters for the proposed degradation model. Section 5 applies the proposed method to several illustrative examples. Section 6 provides an analytic approach to choosing the best policy in a classiﬁcation test for highly reliable products. Section 7 contains concluding remarks.

2 Motivation and Problem Formulation For highly reliable products, the successful performance of a degradation test strongly depends on the suitability of the model of a product’s degradation path. Peng and Tseng (2009) proposed the following model to describe the degradation data. Let Y (t), t ≥ 0, denote the observed values of a product’s QC at time t. Assume that Y (t) = Θt + σB B(t) + σ ,

(2.1)

where the degradation rate Θ denotes the random eﬀect with N (η, ση2 ), representing the unit-to-unit variability; σB is a diﬀusion coeﬃcient; B(t) denotes standard Brownian motion (denoted by N (0, t)), representing the withinunit variability; and is the measurement error with N (0, 1). The random eﬀect Θ, the standard Brownian motion B(t) and the measurement error are assumed to be mutually independent. Similar to the assumption in Lu and Meeker (1993, page 164), we further make an assumption of that Θ ∼ N (η, ση2 ) with ση η, so that Pr{Θ ≤ 0} is negligible. The advantage of the degradation model in (2.1) is that it allows us to simultaneously consider the unit-to-unit variability, within-unit variability and measurement

325

Optimal classification policy and comparisons...

error. More discussions and applications of this degradation model can be found in Cheng and Peng (2012), Peng and Tseng (2013), and the references therein. The linearity of a degradation path may not be achieved in practical applications. The degradation model in (2.1) can be extended by allowing a shape function Λ(t) to adapt more scientiﬁc and engineering work as follows: (2.2) Y (t) = ΘΛ(t) + σB B(t) + σ , where Λ(t) is a given, increasing and continuously diﬀerentiable on (0, ∞) and Λ(0) = 0. In addition, assume that there are g subpopulations of the products in a classiﬁcation test. The unknown number of subpopulations, g, can be determined using a model selection criteria as illustrated by the subsequent case applications. The following degradation model is used to describe the degradation path of a product’s QC for g levels of the products: ⎧ Θ1 Λ(t) + σB B(t) + σ with probability p1 for the 1st subpopulation; ⎪ ⎪ ⎨ Θ2 Λ(t) + σB B(t) + σ with probability p2 for the 2nd subpopulation; Y (t) = .. ⎪ ⎪ . ⎩ Θg Λ(t) + σB B(t) + σ with probability pg for the gth subpopulation,

(2.3)

where Θk ∼ N (ηk , ση2 ), the diﬀerent groups, having a diﬀerent mean of the random eﬀect; pk is the unknown proportion of subpopulation k, pk ∈ [0, 1] for k = 1, . . . , g, and gk=1 pk = 1. To make representation (2.3) unique (or identiﬁable), we assume the order restriction η1 < · · · < ηg (see Boldea and Magnus 2009). A more general model setting will be investigated elsewhere. For an oﬀ-line experiment in Phase I, assume that n units are tested and each unit i has a diﬀerent set of measuring times ti,1 , . . . , ti,mi . Let Yi (ti,j ) denote the degradation path of the ith unit at time ti,j with ti,0 = 0 and Yi (0) = 0, where i = 1, . . . , n and j = 1, . . . , mi . This means that every degradation path can be measured at diﬀerent time points. For conve i nience of notation, let t = ∪ni=1 ∪m j=1 {ti,j }, tmi = (ti,1 , . . . , ti,mi ) , Λ(tmi ) = (Λ(ti,1 ), . . . , Λ(ti,mi )) and Y i = (Yi (ti,1 ), . . . , Yi (ti,mi )) , where the superscript “ ” denotes the transpose. Then Y i is a Gaussian mixture process and follows a mi -dimensional normal mixture distribution, i.e., i.i.d.

Y i ∼ p1 Nmi (η1 Λ(tmi ), Σmi ) + · · · + pg Nmi (ηg Λ(tmi ), Σmi ), i = 1, . . . , n, (2.4) where ⎡ ⎤

ti,1 ti,1 ⎢ 2 Σmi = ση2 Λ(tmi )Λ(tmi ) + Ωmi , Ωmi = σB Qmi + σ2 I mi , Qmi = ⎢ ⎣ .. . ti,1

ti,1 ti,2 .. . ti,2

··· ··· .. . ···

ti,1 ti,2 ⎥ .. ⎥ ⎦, . ti,mi (2.5)

326

C.-Y. Peng

and I mi stands for an mi × mi identity matrix. In this study, we consider a linear combination of Y i , ai Y i , to represent the integrated information of the ith unit over time tmi , where ai ∈ Rmi for i = 1, . . . , n. Thus, we have i.i.d.

ai Y i ∼ p1 N (η1 ai Λ(tmi ), ai Σmi ai ) + · · · + pg N (ηg ai Λ(tmi), ai Σmi ai ), i = 1, . . . , n. (2.6)

A rule of classiﬁcation is given by

a unit Y i over time tmi

⎧ 1st subpopulation ⇔ ai Y i ≤ ξi,1 , ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ 2nd subpopulation ⇔ ξi,1 < ai Y i ≤ ξi,2 , . .. is classiﬁed as ⎪ ⎪ ⎪ ⎪ (g−1)th subpopulation ⇔ ξi,g−2 < ai Y i ≤ ξi,g−1 , ⎪ ⎪ ⎩ gth subpopulation ⇔ ai Y i > ξi,g−1 , (2.7)

where ξ i = (ξi,1 , . . . , ξi,g−1 ) and ξ = (ξ1 , . . . , ξ n ) denotes the cutoﬀ points given the set of measuring times tmi , for g levels of the products. Note that ai Λ(tmi ) > 0 for i = 1, . . . , n is also assumed to preserve the order of the transformed values with regard to the subpopulations. Furthermore, the burn-in policy proposed by Tseng and Tang (2001) uses information only on current observations (i.e., ai = emi ≡ (0, . . . , 0, 1) , where emi de mi −1

notes the mi th element vector). Tseng and Peng (2004) proposed an integrated Wiener process to model the cumulative degradation path of the product’s QC. It can be seen as a speciﬁc linear combination of the entire sequence of observations (e.g., trapezoidal rule or Simpson’s rule). In these references, the model formulations in the burn-in policy are expressed as special cases of (2.6). Therefore, it is of immediate interest to investigate how to determine the optimal coeﬃcients (i.e., a∗ = (a∗1 , . . . , a∗n ) ) and implement an optimal classiﬁcation policy for highly reliable products. We propose three steps for an optimal classiﬁcation procedure, as follows: (i) determine the optimal coeﬃcients (a∗ ) by using linear discriminant analysis; (ii) minimize the mis-classiﬁcation cost to determine the optimal cutoﬀ points (ξ ∗ ); and (iii) minimize the total cost to determine the optimal test stopping time (t∗ ).

327

Optimal classification policy and comparisons...

3 Optimal Classiﬁcation Policy In linear discriminant analysis, the Mahalanobis distance is widely used to measure dissimilarity between several groups. The discriminating function (D) in the classiﬁcation policy using (2.6) can be deﬁned as D(a) =

g−1 n

Δ2i,k ,

(3.1)

i=1 k=1

where the Mahalanobis squared distance (MSD) is given by Δi,k =

ηk+1 ai Λ(tmi ) − ηk ai Λ(tmi ) , i = 1, . . . , n, k = 1, . . . , g − 1. a i Σ mi a i

The magnitude of the MSD can be suitably used to measure the separation of the mean degradation path between two adjacent subpopulations. Intuitively speaking, a large MSD will accelerate the diﬀerence between two subpopulations. As a result, given the set of measuring times t, the optimal coeﬃcients a∗ can be determined by maximizing the sum of the Mahalanobis distance between groups, i.e., a∗ = arg max D(a). a=0

We have the following results and all proofs are given in the Appendix. Theorem 3.1. Under the assumption of the degradation model (2.3), the optimal coeﬃcients is a∗i = Ki Σ−1 mi Λ(tmi ) for any constant Ki = 0 and i = 1, . . . , n, which maximizes thediscriminating functionof (3.1). Moreover, g−1 2 the optimal value is D(a∗ ) = ni=1 Λ(tmi ) Σ−1 mi Λ(tmi ) k=1 (ηk+1 − ηk ) . −1 Without loss of generality, take Ki = 1/ Λ(tmi ) Σ−1 mi Σmi Λ(tmi ) in Theorem 3.1 to remove the eﬀect of the scale of a∗i for its identiﬁability. According to the classiﬁcation rule in (2.7), given the set of measuring times tmi and a∗i (i = 1, . . . , n), the probability (Ei,k ) of misclassifying the unit from the kth subpopulation as the other subpopulations is ⎧ for k = 1, ⎨ αi,1 βi,k−1 + αi,k for k = 2, . . . , g − 1, Ei,k = ⎩ for k = g, βi,g−1

328 where

C.-Y. Peng

αi,k = Pr a∗i Y i > ξi,k kth subpopulation ξi,k − ηk a∗i Λ(tmi ) , k = 1, . . . , g − 1, = 1−Φ a∗i Σmi a∗i

βi,k−1 = Pr a∗i Y i ≤ ξi,k−1 kth subpopulation ξi,k−1 − ηk a∗i Λ(tmi ) , k = 2, . . . , g, = Φ a∗i Σmi a∗i

and Φ denotes the cumulative distribution function of the standard normal distribution. Hence, the corresponding mis-classiﬁcation cost (MC) function and total cost (TC) function can be deﬁned as follows: g n n ∗ ∗ M C(ξ|a , t) = M Ci (ξ i |ai , tmi ) := pk Ci,k Ei,k , (3.2) i=1

T C(t|a∗ , ξ) = :=

n

i=1 k=1

T Ci (tmi |a∗i , ξ i )

i=1 g n

pk Ci,k Ei,k + Cope

i=1 k=1

n i=1

tmi + Cmea

n (mi + 1), i=1

(3.3) where Ci,k denotes the unit cost of mistaking the ith unit from the kth subpopulation for another subpopulation; Cope and Cmea denote per unit cost of operating the degradation test per unit of time and the cost of measuring data on a unit, respectively. The optimal cutoﬀ points, ξ ∗ , is determined by minimizing the MC function in (3.2), given in the following theorem. Theorem 3.2. Given the set of measuring times tmi and a∗i , the optimal ∗ , . . . , ξ∗ cutoﬀ points are ξ ∗i = (ξi,1 i,g−1 ) , where ∗ = ξi,k

(ηk + ηk+1 )a∗i Λ(tmi ) 2 ∗ a Σmi a∗i ln(Ci,k pk /[Ci,k+1 pk+1 ]) + i , i = 1, . . . , n, k = 1, . . . , g − 1. (ηk+1 − ηk )a∗i Λ(tmi ) (3.4)

Finally, substituting a∗ and ξ ∗ into (3.3), the optimal test stopping time, can be directly obtained by enumerating t ∈ t to minimize the TC function, i.e., t∗ = arg min T C(t|a∗ , ξ ∗ ). t∗ ,

t∈t

Optimal classification policy and comparisons...

329

When the measuring times are ﬁxed and the same for all tested units (i.e., m1 = · · · = mn = m and t1,1 = · · · = tn,1 , . . . , t1,m = · · · = tn,m ), only m coeﬃcients in a and g − 1 cutoﬀ points in ξ need to be determined, i.e., a∗1 = · · · = a∗n and ξ ∗1 = · · · = ξ∗n . This means that all tested units share the same linear combination with the vector of coeﬃcients, cutoﬀ points and misclassiﬁcation probabilities at the same measuring time. For practical applications in Phase II, however, the parameters in the degradation model (2.3) are unknown. Thus, in order to implement the above classiﬁcation procedure, we need to conduct a pilot study to estimate these parameters in Phase I. In the following, we adopt the ML method to estimate these unknown parameters. 4

Parameter Estimation be the collection of all the parameters Let θ = to be estimated, where η = (η1 , . . . , ηg ) , p = (p1 , . . . , pg−1 ) and pg = 1 − p1 − · · · − pg−1 . From (2.4), the density function of a single degradation path Y i is 2 , σ 2 , p ) (η , Λ, ση2 , σB

f (y i |θ) =

g−1

pk fk (y i |θ) + (1 − p1 − · · · − pg−1 )fg (y i |θ),

k=1

where fk and y i = (yi (ti,1 ), . . . , yi (ti,mi )) denotes the density function of Nmi (ηk Λ(tmi ), Σmi ) and the sample path of the ith unit at the set of measuring times tmi , respectively. The log-likelihood function can be expressed as L(θ) = ni=1 ln f (y i |θ). Since the log-likelihood function, f (y i |θ), is in the form of a logarithm of a sum, it is not easy to have an analytical expression for ML estimates of unknown parameters. The EM-type algorithm oﬀers an alternative, simpler framework for computation of the ML estimates of the unknown parameters; see, for example, Dempster, Laird and Rubin (1977) and McLachlan and Krishnan (2008). Deﬁne a random vector Z i = (Zi,1 , Zi,2 , . . . , Zi,g ) for i = 1, . . . , n as i.i.d.

a classiﬁcation index of Y i and Z i ∼ M ultinomial(1, p1 , . . . , pg ) for i = 1, . . . , n. If Zi,k = 1 denotes the kth class with the probability pk of the ith product for k = 1, . . . , g, i.e., Pr (Zi,k = 1) = pk . Let z i = (zi,1 , . . . , zi,g ) and y = (y 1 , . . . , y n ) be the observation of classiﬁcation indexes and n observations of test units, respectively. Thus, the complete data is denoted by yc = (y , z ) , where z = (z 1 , . . . , z n ) . Since z is unavailable in practice, we

330

C.-Y. Peng

can treat z as unobserved latent data. Hence, the joint density function of (Y i , Z i ) can be expressed as g z pk fk (y i |θ) i,k . f (y i , z i |θ) = k=1

Given the complete data, the log-likelihood function, Lc , can be written as Lc (θ|yc )

n

=

ln f (y i , z i |θ)

i=1 n n ln(2π) 1 mi − ln |Σmi | 2 2 i=1 i=1

g n 1 zi,k ln pk − (y i − ηk Λ(tmi )) Σ−1 + mi (y i − ηk Λ(tmi )) , 2 i=1 k=1

−

=

where zi,g = 1 − zi,1 − · · · − zi,g−1 and pg = 1 − p1 − · · · − pg−1 . Since Z i |y, θ ∼ Multinomial (1, pi,1 , . . . , pi,g ) for i = 1, . . . , n, given the lth esti2(l) 2(l) 2(l) (l) ˆ (l) = (ˆ ˆ (l) , σ ˆ ), we have mate θ η (l) , Λ ˆη , σ ˆ ,σ ˆ , p B

(l) ˆ (l) ) pˆk fk (y i |θ (l) (l) ˆ = E Zi,k |y, θ := pˆi,k (l) (l) g ˆ ) ˆk fk (y i |θ k=1 p

for k = 1, . . . , g.

The conditional expectation Q-function can be expressed as follows: (l)

ˆ ) Q(θ|θ

ˆ (l) E Lc (θ|yc )|y, θ n n ln(2π) 1 − mi − ln |Σmi | 2 2 i=1 i=1

g n 1 (l) pˆi,k ln pk − (y i − ηk Λ(tmi )) Σ−1 (y − η Λ(t )) , + m k mi i i 2 i=1 k=1

= =

(4.1) (l) where pˆi,g (l)

= 1−

(l) pˆi,1

− ··· −

(l) pˆi,g−1 .

Taking the ﬁrst partial derivatives of

ˆ ) in (4.1) with respect to p, η and σ 2 , setting these derivatives to Q(θ|θ η zero and solving these equations, we obtain n (l+1)

pˆk

2(l+1)

σ ˆη

=

=

1 n

1 n

n

(l)

i=1

n i=1

−

(l+1)

pˆi,k , ηˆk

g

(l)

k=1 1

pˆi,k

=

i=1

n

−1(l+1) (l) ˆm pˆi,k Λ(tmi ) Σ Λ(tmi ) i

, k = 1, . . . , g − 1,

i=1 ˆ −1(l+1) (y −η (l+1) Λ(tm )) 2 ) Ω i m k i i

Λ(tm

−1(l+1) Λ(tmi ) i

ˆ Λ(tmi ) Ω m

−1(l+1) (l) ˆm pˆi,k Λ(tmi ) Σ yi i

i

−1(l+1) Λ(tmi ) i

ˆ Λ(tmi ) Ω m

.

(4.2)

Optimal classification policy and comparisons... 2(l+1)

ˆ (l+1) and σ ˆ (l+1) , η ˆη Substituting p 2 ˆ Q(Λ, σB , σ2 |θ

where

(l)

)

331

into (4.1), we have

n n ln(2π) 1 mi − ln |Σmi | 2 2 i=1 i=1 g n 1 (l) (l+1) (l+1) ˆ −1 (y − ηˆ(l+1) Λ(tm )) , pˆi,k ln pˆk − (y i − ηˆk Λ(tmi )) Σ + mi i k i 2 i=1 k=1

−

=

2 ˆm = σ ˆη2(l+1) Λ(tmi )Λ(tmi ) + σB Qmi + σ2 I mi . Σ i

2(l+1) 2(l+1) 2 , σ 2 |θ ˆ (l) ) to obtain Λ ˆ (l+1) , σ Then, maximize Q(Λ, σB ˆB and σ ˆ . That is 2(l+1)

ˆ (l+1) , σ ˆB (Λ

2 ˆ (l) ). ,σ ˆ2(l+1) ) = arg max Q(Λ, σB , σ2 |θ 2 ,σ 2 ) (Λ,σB

2(l+1)

(4.3)

2(l+1)

ˆ (l+1) , σ 2 = σ ˆB , σ2 = σ ˆ into (4.2), we have the Substituting Λ = Λ B 2(l+1) (l+1) (l+1) ˆ ˆ ,η and σ ˆη . Hence, the EM-type algorithm updated estimates p then proceeds iteratively in the following two steps: 2(l+1) ˆ (l) , calculate pˆ(l+1) , ηˆ(l+1) and σ E-step: Given the lth estimate θ ˆη k k for k = 1, . . . , g − 1 by using (4.2). 2(l+1) 2(l+1) ˆ (l+1) , σ ˆB and σ ˆ by using (4.3). M-step: Update Λ

We iterate this EM-type procedure until all the parameters converge. Note that when the parametric functional form of Λ(t) is unknown or not clear, the shape function Λ(t) can be estimated non-parametrically using the above EM-type procedure. To show the applicability and eﬀectiveness of this classiﬁcation policy, three examples (concave, linear and convex degradation paths) are presented in the following section. 5 Applications We use the degradation model in (2.3) and Λ(t) = tδ proposed by Tseng et al. (2003) to describe the degradation path, where δ ∈ [0, ∞). Since the variation sources contain unit-to-unit variability, within-unit variability and measurement error, there are six combinations of variations for a degradation model (refer to Cheng and Peng 2012) as follows: M0g : Y (t) = Θtδ + σB B(t) + σ , M3g : Y (t) = Θtδ + σB B(t),

M1g : Y (t) = ηtδ + σB B(t) + σ , M4g : Y (t) = ηtδ + σB B(t),

M2g : Y (t) = Θtδ + σ , M5g : Y (t) = ηtδ + σ .

332

C.-Y. Peng

Note that the model M5g is a traditional regression, and is used as a benchmark for comparing the other degradation models. In each degradation model, the ML estimates for unknown parameters of the Gaussian mixture process have been obtained by an iterative EM procedure until all the estimated parameters converge. Furthermore, we adopt Akaike information ˆ + 2r) for the degradation model selection, criterion (i.e., AIC = −2L(θ) where r represents the number of unknown parameters in θ. The degradation model with the smallest AIC, has been chosen as a good model to ﬁt the degradation data. If it is not certain how many groups should be identiﬁed, the minimum AIC can be used to guide the number of groups. In the following examples, we also compare the minimum total cost, the optimal test stopping time and the sum of misclassiﬁcation probabilities among the non-cumulative degradation (NCD) method proposed by Tseng and Tang (2001), the cumulative degradation (CD) method proposed by Tseng and Peng (2004), and our proposed LDA method. All values are based on estimated parameter values at the test stopping time.

Example 1. The following fatigue data is taken from Wu and Ni (2003). There are 30 tested samples (i.e., n = 30) for fatigue crack development and the inspections were taken at the same measurement times. The time unit is in ten thousand cycles from 1 to 4, in increments of 0.5. Figure 1(a) is a plot of the degradation path of 40000 cycles for the fatigue data. Table 1 summarizes the results of ML estimates, sample log-likelihoods and AICs of fatigue data. A comparison of the AIC shows that in this case, the degradation model M43 (divided into three subpopulations) is adequate and parametric variations of ση and σ (i.e., random eﬀect and measurement error) are not evident in the fatigue data. Note that some of the AIC values in the Table 1 are diﬀerent from the others because the corresponding degradation models are not suitable for modeling the fatigue data. Now, we apply the NCD, CD and LDA methods to the degradation model M43 as follows. For the LDA method, since m1 = · · · = m30 = 7 and t1,1 = · · · = t30,1 , . . . , t1,7 = · · · = t30,7 , by using Theorem 3.1, we have

a∗i

=

ˆ −1 tδˆ Σ 7 7

δ Q−1 7 t7 = ˆ ˆ −1 ˆ −1 δˆ ˆ −1 δˆ tδ7 Q−1 tδ7 Σ 7 Σ 7 t7 7 Q7 t 7 ˆ

= (−0.2193, −0.1157, −0.1053, −0.0979, −0.0922, −0.0877, 0.9495) (5.1)

333

6

Y(t)

6

0

0

2

2

4

4

Y(t)

8

8

10

10

12

Optimal classification policy and comparisons...

0

1

2

3

4

0

1000

2000

t

3000

4000

0.2 0.0

0.1

Y(t)

0.3

t

0

2000

4000

6000

8000

10000

t

Figure 1: Degradation paths

ˆ

ˆ

ˆ

ˆ

ˆ

ˆ

ˆ

for i = 1, . . . , 30, where tδ7 = (1, 1.5δ , 2δ , 2.5δ , 3δ , 3.5δ , 4δ ) . For illustration, assume that the cost conﬁguration

(Ci,1 , Ci,2 , Ci,3 , Cmea , Cope ) = (25, 35, 55, 2, 1)

is used in this classiﬁcation policy. From Theorem 3.2, given the set of measuring times t7 and a∗i in (5.1), the optimal cutoﬀ points are ξ∗1 = · · · = ∗ , ξ ∗ ) = (3.6413, 5.8129) for i = 1, . . . , 30. Given the ξ ∗30 , where ξ ∗i = (ξi,1 i,2

Model M01 M11 M21 M31 M41 M51 M02 M12 M22 M32 M42 M52 M03 M13 M23 M33 M43 M53 M04 M14 M24 M34 M44 M54

ηˆ1 0.58838 0.67787 0.74076 0.58844 0.67791 0.77531 0.50207 0.49559 0.58812 0.50208 0.49560 0.55527 0.37885 0.37865 0.52419 0.37887 0.37866 0.52269 0.37527 0.37528 0.52120 0.37528 0.37526 0.52026

ηˆ2 – – – – – – 0.88006 0.88626 0.92532 0.88013 0.88631 0.92732 0.64796 0.64771 0.80401 0.64799 0.64772 0.79187 0.62071 0.62073 0.78564 0.62072 0.62070 0.76299

ηˆ3 – – – – – – – – – – – – 0.96181 0.96074 1.05884 0.96187 0.96075 1.04706 0.76043 0.76042 0.94573 0.76041 0.76041 0.91006

ηˆ4 – – – – – – – – – – – – – – – – – – 0.98152 0.98155 1.09828 0.98152 0.98150 1.10549

δˆ 1.67761 1.58713 1.51455 1.67759 1.58707 1.47641 1.67876 1.65093 1.51450 1.67875 1.65091 1.49862 1.67880 1.67712 1.51463 1.67878 1.67712 1.51121 1.67941 1.67939 1.51473 1.67940 1.67942 1.51500 σ ˆη 0.20607 – 0.21340 0.20607 – – 0.13178 – 0.13179 0.13176 – – 0.03426 – 0.07100 0.03427 – – 5.3141×10−6 – 0.06011 5.3642×10−6 – – σ ˆB 0.3281229 0.5424013 – 0.3281263 0.5423996 – 0.3281135 0.4047504 – 0.3281180 0.4047559 – 0.3281125 0.3345344 – 0.3281146 0.3345414 – 0.32584 0.32584 – 0.32584 0.32585 –

σ ˆ 9.8905×10−6 1.1344×10−5 0.25961 – – 1.07536 1.2117×10−6 6.5173×10−6 0.25962 – – 0.60300 1.4274×10−6 2.9062×10−6 0.25961 – – 0.40391 5.9492×10−8 4.8866×10−6 0.25961 – – 0.35900

pˆ1 – – – – – – 0.77447 0.69806 0.54720 0.77451 0.69805 0.46305 0.40706 0.40518 0.40521 0.40711 0.40518 0.40003 0.39892 0.39894 0.39893 0.39890 0.39893 0.39979

pˆ2 – – – – – – – – – – – – 0.43716 0.43461 0.39826 0.43719 0.43463 0.36667 0.35704 0.35699 0.35017 0.35705 0.35708 0.27134

pˆ3 – – – – – – – – – – – – – – – – – – 0.10904 0.10903 0.11779 0.10903 0.10900 0.19549

Table 1: ML estimates, sample log-likelihoods and AICs of fatigue data ˆ L(θ) −40.330 −107.128 −85.851 −40.330 −107.128 −313.245 −38.834 −61.794 −84.581 −38.834 −61.794 −212.047 −34.259 −34.739 −81.468 −34.259 −34.739 −139.817 −33.208 −33.208 −81.080 −33.208 −33.208 −121.520

AIC 90.660 222.256 179.703 88.660 220.256 632.490 91.668 135.589 181.162 89.668 133.589 434.094 86.519 85.477 178.936 84.519 83.477 293.613 88.416 86.416 182.160 86.416 84.416 261.040

334 C.-Y. Peng

Optimal classification policy and comparisons...

335

set of measuring times t7 and a∗i in (5.1), the estimated misclassiﬁcation ˆi,k , are computed as probabilities, E

ˆi,k E

⎧ ⎛ ⎞ ˆ ⎪ ∗ ∗ δ ⎪ ξ − η ˆ a t 1 ⎪ 7 i,1 i ⎪ ⎠ = 0.0159 1 − Φ⎝ ⎪ ⎪ ⎪ ∗ ˆ ∗ ⎪ ⎪ a a Σ 7 i ⎪ ⎛ ⎞i ⎛ ⎞ ⎪ ⎪ ⎪ ˆ ˆ ∗ ∗ δ ∗ ∗ δ ⎨ ξ − η ˆ a t ξ − η ˆ a t 2 2 i,1 i 7⎠ i,2 i 7⎠ + 1 − Φ⎝ = 0.0133 Φ⎝ = ⎪ ∗ ˆ ∗ ∗ ˆ ∗ ⎪ a a a a Σ Σ ⎪ 7 i 7 i i i ⎪ ⎛ ⎞ ⎪ ⎪ ˆ ⎪ ∗ ∗ δ ⎪ ξ − η ˆ ⎪ 3 ai t7 i,2 ⎪ ⎠ = 0.0062 ⎪ Φ⎝ ⎪ ⎪ ⎩ ˆ 7 a∗ a∗i Σ i

for k = 1,

for k = 2,

for k = 3,

where i = 1, . . . , 30. From the deﬁnition of the total cost in (3.3), we have ∗

∗

T C(t7 |a , ξ ) = n

3

ˆi,k + 4nCope + (7 + 1)nCmea = 492.52. pˆk Ci,k E

k=1

Following the same procedure, we computed the optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost at each measuring time by replacing t7 by tm , where m = 1, . . . , 6. If we use the NCD method, we only set a∗i = (0, 0, 0, 0, 0, 0, 1) in the previous procedure. In addition, applying the CD method to the degradation model M23 and using Lemma 1 in Tseng and Peng (2004), we have δ+1 2 3 t t t σ t ηt δ , , B Y (x)dx = η x dx + σB B(x)dx ∼ N δ+1 3 0 0 0 which can be seen as a speciﬁc linear combination of the entire sequence of observations. The remaining evaluations of the optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost at each measuring time can be obtained by using the procedure in Tseng and Peng (2004). Hence, by applying the NCD, CD and LDA methods to the degradation model M43 at each measuring time, the optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost are shown in Table 2. From Table 2, the optimal test stopping time of NCD, CD and LDA methods are 3, 3.5 and 2.5 in ten thousand cycles, respectively. The corresponding optimal total costs are 443.19, 578.07 and 421.35, respectively. Numerical comparisons in terms of the sum of misclassiﬁcation probabilities and the total cost demonstrate that the LDA method can outperform the other methods for each measuring time. Thus, the LDA method is more economical for distinguishing these three subpopulations. Furthermore, for an on-line test in Phase II, use of the LDA method can shorten the test stopping time by 37.5 %(= 1 − 2.5/4).

336

C.-Y. Peng

Table 2: Optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost at each measuring time of fatigue data t (×104 cycles) 1 1.5 2 2.5 3 3.5 4 NCD method ∗ ξi,1 0.3441 0.8845 1.5353 2.2951 3.1590 4.1227 5.1820 ∗ ξi,2 0.9995 1.7359 2.6940 3.8442 5.1694 6.6579 8.3009 Ei,1 0.5412 0.3690 0.2464 0.1561 0.0923 0.0504 0.0252 Ei,2 0.3285 0.3003 0.2227 0.1455 0.0854 0.0453 0.0217 Ei,3 0.5461 0.3476 0.2119 0.1196 0.0612 0.0280 0.0114 TC 578.74 521.03 472.53 445.45 443.19 463.36 500.54 Δi,1 0.8043 1.2963 1.8187 2.3650 2.9312 3.5143 4.1125 Δi,2 0.9357 1.5080 2.1158 2.7514 3.4101 4.0885 4.7844 CD method ∗ ξi,1 ∗ ξi,2 Ei,1 Ei,2 Ei,3 TC Δi,1 Δi,2

0.0408 0.4746 0.6989 0.2629 0.7255 644.16 0.5204 0.6054

0.3955 1.0881 0.5261 0.3304 0.5286 630.38 0.8387 0.9757

1.0372 2.1393 0.4041 0.3145 0.3876 608.78 1.1767 1.3689

2.0252 3.7260 0.3088 0.2670 0.2798 589.65 1.5301 1.7801

3.4149 5.9373 0.2315 0.2109 0.1961 578.44 1.8964 2.2063

5.2584 8.8562 0.1690 0.1573 0.1322 578.07 2.2737 2.6452

7.6053 12.5612 0.1195 0.1114 0.0851 589.63 2.6607 3.0954

LDA method ∗ ξi,1 ∗ ξi,2 Ei,1 Ei,2 Ei,3 TC Δi,1 Δi,2

0.3441 0.9995 0.5412 0.3285 0.5461 578.73 0.8043 0.9357

0.6082 1.1657 0.3492 0.2905 0.3251 504.63 1.3689 1.5926

1.0685 1.8467 0.2212 0.2025 0.1853 448.62 1.9525 2.2715

1.6042 2.6612 0.1315 0.1227 0.0962 421.35 2.5569 2.9747

2.2135 3.5985 0.0718 0.0658 0.0443 423.55 3.1805 3.7002

2.8934 4.6511 0.0356 0.0313 0.0179 449.84 3.8213 4.4457

3.6413 5.8129 0.0159 0.0133 0.0062 492.52 4.4777 5.2092

Example 2. We use laser data (linear degradation paths, i.e., δ = 1) from Meeker and Escobar (1998, Table C.17, page 642) to illustrate the classiﬁcation procedure. The QC of a laser device is its operating current. To

Optimal classification policy and comparisons...

337

maintain nearly constant light output, the laser device contains a feedback mechanism for increasing operating current when its light intensity degrades. Figure 1(b) shows a plot of the operating current for 15 tested subjects. The measured frequency of the currents is 250 hours and the experiment was terminated at 4000 hours. The results of the ML estimates, sample loglikelihoods and AICs of laser data are summarized in the supplemental ﬁle. In this case the degradation model M42 (divided into two subpopulations) is ˆ under the degradation model adequate. Hence, we use the ML estimated θ M42 and the assumed cost conﬁguration (Ci,1 , Ci,2 , Cmea , Cope ) = (65, 90, 0.0004, 0.1) in this classiﬁcation policy. Following the same procedure, the optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost at each measuring time are available in the supplemental ﬁle. The optimal test stopping time of NCD, CD and LDA methods are 2500, 3000 and 2500 hours, respectively. The corresponding optimal total costs are 41.31, 51.15 and 41.31, respectively. Our proposed LDA method and the NCD method outperform the CD method with regard to the total cost and the sum of misclassiﬁcation probabilities. However, the LDA method and the NCD method have the same total cost and misclassiﬁcation probabilities for each measuring time. This leads to an interesting question: is the LDA method equivalent to the NCD method under the conditions δ = 1 or (ση , σ ) = (0, 0) in the degradation model? Note that by using the LDA/NCD method, we can shorten the test stopping time by 37.5 %(= 1 − 2500/4000) for an on-line test in Phase II. Example 3. Light emitting diodes (LEDs) have been applied in many ﬁelds (e.g., traﬃc signals and full-color displays etc.) and they are desirable because of their brightness, low power consumption and high reliability. We use an LED data set (T25) in Hamada et al. (2008, Table 8.6, page 292) to illustrate the procedure of classiﬁcation policy. Figure 1(c) shows a plot of the luminous ﬂux (in lumens) of 9744 hours for 25 tested subjects. The measured frequency of lumens is 336 hours for 29 inspection times. The results of ML estimates, sample log-likelihoods and AICs of T25 data are shown in the supplemental ﬁle. The degradation model M02 (divided into two subpopulations) is adequate. Following the same procedure and the assumed cost setting (Ci,1 , Ci,2 , Cmea , Cope ) = (65, 90, 0.0004, 0.1), the optimal cutoﬀ points, estimated misclassiﬁcation probabilities and total cost at each measuring time are found, as shown in the supplemental ﬁle. The optimal test stopping time of NCD, CD and LDA methods are 3024, 4368 and 2352 hours, respectively. The corresponding optimal total costs are 210.18, 245.58 and 148.94, respectively. Again, our proposed LDA method is superior to

338

C.-Y. Peng

NCD and CD methods in terms of the sum of misclassiﬁcation probabilities and the total cost for each measuring time, except for the ﬁrst measuring time. The corresponding reduction of the test stopping time is a dramatic 75.86 %(= 1 − 2352/9744) for an on-line test in Phase II. Observations and concerns raised from inspecting these examples are as follows: (i) the LDA method is useful to discriminate these degradation data sets no matter which degradation model is chosen, how many groups are identiﬁed or how the cost conﬁguration is given; (ii) it seems that the value of θ in a degradation model plays an important role in judging whether the LDA method is equivalent to the NCD method; (iii) does the given function Λ(t) = tδ provide any support in degradation modeling under these three classiﬁcation policies? Based on the proposed degradation model in (2.3) it is of interest to investigate the following problems: (i) What is a suﬃcient condition under which the LDA (or NCD, or CD) method is a better classiﬁcation policy? (ii) Which parameter in θ plays an important role in a classiﬁcation policy? (iii) What is the eﬀect of the shape function Λ for these three classiﬁcation policies? In the following section, we discuss and analyze these problems. 6 Analytical Comparisons In this section, based on the degradation model (2.3), we compare these three classiﬁcation policies in terms of the two criteria: (i) the sum of misclassiﬁcation probabilities; (ii) the total cost of conducting a classiﬁcation test. Since the measurements are available for all degradation paths at diﬀerent measuring times, with diﬀerent numbers of measurements for each subject, the comparisons of three methods will be aﬀected by this assumption. Hence, we use single subject to investigate these three classiﬁcation methods. Some i,k denote the MSD of two additional notations are needed. Let Δi,k and Δ diﬀerent linear combinations with the vector of coeﬃcients ai and bi , respectively, i.e., Δi,k =

ηk+1 ai Λ(tmi ) − ηk ai Λ(tmi ) , ai Σ m i ai

m ) − ηk bi Λ(tmi ) i,k = ηk+1 bi Λ(t i Δ b i Σm i b i

for k = 1, . . . , g − 1.

Optimal classification policy and comparisons...

339

The symbol “∼ ” denotes researchers/engineers using another classiﬁcation policy. In addition, the relationships between mis-classiﬁcation probabilities αi,k (βi,k ) and α i,k (βi,k ) under two classiﬁcation policies and cost settings are given by

Ni,k =

Ti,k =

i,k , βi,k < βi,k αi,k < α

for (−1)Ii,k Ri,k > 1,

i,k , (−1)Ii,k βi,k > (−1)Ii,k βi,k (−1)Ii,k αi,k < (−1)Ii,k α

for 0 < (−1)Ii,k Ri,k < 1,

i,k , βi,k > βi,k αi,k > α

for (−1)Ii,k Ri,k > 1,

i,k , (−1)Ii,k βi,k < (−1)Ii,k βi,k (−1)Ii,k αi,k > (−1)Ii,k α

for 0 < (−1)Ii,k Ri,k < 1,

where

Ii,k = and Ri,k =

0, if Ci,k pk < Ci,k+1 pk+1 , 1, otherwise,

i,k Δi,k Δ , k = 1, . . . , g − 1. −2 ln(Ci,k pk /[Ci,k+1 pk+1 ])

We will investigate the conditions for choosing a better strategy from the two vectors of coeﬃcients ai and bi . Lemma 6.1. For two diﬀerent linear combinations in a classiﬁcation test, we have i,k < Δi,k , 1 ≤ k = k1 ≤ g − 1. i,k < Δi,k ⇔ Δ Δ 1 1 From Lemma 6.1, it is clear that we need to understand only one relationship between two MSDs, which is equivalent to understanding g − 1 relationships among pairwise MSDs for two diﬀerent linear combinations in a classiﬁcation test. Hence, under the order restriction, we use the single re i,1 < Δi,1 to represent Δ i,k < Δi,k for k = 1, . . . , g − 1 and omit lationship Δ the index g in the degradation models M0g -M5g in the following inferences. Theorem 6.1. For two diﬀerent linear combinations in a classiﬁcation test, we have the following results: i,1 < Δi,1 and a given relationship of cost setting (I1 , I2 , . . . , Ig−1 ), (i) If Δ then (Ni,1 , . . . , Ni,g−1 ). i,1 > Δi,1 and a given relationship of cost setting (I1 , I2 , . . . , Ig−1 ), (ii) If Δ then (Ti,1 , . . . , Ti,g−1 ).

340

C.-Y. Peng

i,1 < Δi,1 From Theorem 6.1 (i), it can be seen that under the condition Δ and diﬀerent cost settings, we have the relationships between αi,k and βi,k for k = 1, . . . , g − 1. The following theorem is used to distinguish which method is superior in terms of the sum of misclassiﬁcation probabilities and the total cost. i,1 < Δi,1 , then Theorem 6.2. Given the set of measuring times tmi , if Δ (i)

g

Ei,k <

k=1

g

i,k , E

k=1

Ci . (ii) T Ci < T Note that the results of Lemma 6.1 and Theorem 6.2 are also true if the symbol (<) is replaced by the symbol (>). When m1 = · · · = mn = m and t1,1 = · · · = tn,1 , . . . , t1,m = · · · = tn,m , Theorems 2 and 3 in Tseng and Peng (2004) are special cases (i.e., g = 2) of Theorems 6.1 and 6.2, respectively. From Theorems 6.1 and 6.2, the suﬃcient condition for choosing a better strategy is to compare the MSD of the two diﬀerent methods without the eﬀect of assumptions on the cost conﬁgurations. The large MSD not only accelerates the diﬀerence between two subpopulations, which implies having a small sum of mis-classiﬁcation probabilities, but also provides a more economical classiﬁcation policy. Consequently, we only need to compare the MSD of NCD, CD and LDA methods. CD LDA denote the MSD of NCD, CD and LDA Let ΔNCD i,1 , Δi,1 and Δi,1 methods based on the degradation model in (2.3). Since the NCD method uses information only on current observations (i.e., ai = emi ) and the LDA method uses all information from the observations, we have ΔNCD = i,1

(η2 − η1 )Λ(tmi ) 2 + σB tmi + σ2

and

ση2 Λ(tmi )2

ΔLDA = (η2 − η1 ) i,1

Λ(tmi ) Σ−1 mi Λ(tmi ).

Note that when only one measurement is observed (i.e., tmi = t1 ), it is easily seen that ΔNCD = ΔLDA i,1 i,1 . It means that there is no diﬀerence between the two methods in terms of the sum of misclassiﬁcation probabilities and total cost, no matter the given conditions. Applying the CD method to the degradation model in (2.2) and using Lemma 1 in Tseng and Peng (2004), we obtain t t t t Y (x)dx = Θ Λ(x)dx + σB B(x)dx + σ dx 0

0

0

0

341

Optimal classification policy and comparisons... 2 t t 2 t3 σB 2 2 2 Λ(x)dx + ∼ N η Λ(x)dx, ση + t σ . 3 0 0 Hence, the MSD of the CD method, ΔCD i,1 , can be expressed as tm i (η2 − η1 ) Λ(x)dx CD 0 Δi,1 = . 2 tm 3 i t m 2 + t2 σ 2 ση2 Λ(x)dx + i σB mi 3 0

The conditions under which the LDA, NCD or CD method is a better classiﬁcation policy is given in the following theorem. Theorem 6.3. Under the degradation model M0 , i.e., Y (t) = ΘΛ(t) + σB B(t) + σ , let

A

=

B

=

2 2 (Λ, σB , σ )(σB tmi + σ2 )Λ(tmi ) Ω−1 mi Λ(tmi ) − Λ(tmi ) = 0 , "

! tm 2 2 i (Λ, σB , σ )(σB tmi + 3σ2 )t2mi Λ(tmi ) Ω−1 Λ(t ) − 3 Λ(x)dx ≥ 0 , mi mi

C

=

0

2 (Λ, σB , σ )(σB tmi + 3σ2 )t2mi Λ(tmi )2 − 3

!

t mi

"

2

Λ(x)dx

2 (σB t mi

+

σ2 )

≥0 .

0

Then, we have NCD and the equality holds as (Λ, σ , σ ) ∈ A; (i) ΔLDA B i,1 ≥ Δi,1 CD (ii) ΔLDA i,1 ≥ Δi,1 ⇔ (Λ, σB , σ ) ∈ B;

(iii) ΔNCD ≥ ΔCD i,1 i,1 ⇔ (Λ, σB , σ ) ∈ C. Moreover, the set C is included in the set B, i.e., C ⊂ B. For the degradation model M0 , there are 2g + 2 parameters and one shape function: η, ση , σB , σ , p and Λ. From Lemma 6.1 and Theorem 6.3, the choice of a better classiﬁcation policy is aﬀected by only two parameters (σB , σ ) and the shape function Λ. The LDA method is equivalent or superior to the NCD method, no matter the given conditions. The LDA method is equivalent to the NCD method if and only if (Λ, σB , σ ) ∈ A. If the parameters (Λ, σB , σ ) are in the set B, then the LDA method is better than the CD method. When (Λ, σB , σ ) is in the set C, we can conclude that the CD method is not the best choice because C ⊂ B. More precisely, the functional inequality in the set C is solved for the shape function Λ in the following result.

342

C.-Y. Peng

Corollary 6.1. From Theorem 6.3 (iii), we have #

ΔNCD ΔCD i,1 i,1 ⇔ Λ(t) K

2 (σB t

2 σ2 )(σB t

3σ2 )

2 σB t

2σ2

√3

+ + + + 2 3(σB t + σ2 ) 2 2 2 2 2 2 σB t + 3σ2 3(σB t + σ )(σB t + 3σ ) + 2σB t + 3σ2

for any constant K = 0. Without loss of generality, the constant K in Corollary 6.1 can be chosen to adjust the coeﬃcient to be 1 in the shape function Λ(t) in order to avoid the identiﬁable problem. Note that for the sets A and B, the explicit solution of Λ is not easy to obtain because the complex expression of Λ(tmi ) Ω−1 mi Λ(tmi ) is in the functional inequality. Theorem 6.4. If the degradation model is without measurement error, i.e., Y (t) = ΘΛ(t) + σB B(t), then we have NCD and ΔLDA = ΔNCD ⇔ Λ is linear, i.e., Λ(t) = Λ(1)t; (i) ΔLDA i,1 ≥ Δi,1 i,1 i,1

(ii) ΔLDA ≥ ΔCD ⇔ Λ satisﬁes the functional inequality t3mi Λ(tmi ) Q−1 mi i,1 i,1 t 2 mi Λ(tmi ) ≥ 3 0 Λ(x)dx ; = ΔCD (iii) ΔNCD i,1 i,1 ⇔ Λ(t) = Kt

√

3−1

for any constant K = 0.

Without loss of generality, take Λ(1) = 1 in the linear function Λ(t) and use K = 1 in Theorem 6.4 (iii). From Theorem 6.4 (i), it is of interest to note that the LDA method can outperform the NCD method when the shape function is non-linear. The NCD method is equivalent to the CD method √ if and only if the shape function is the speciﬁc power of time t (i.e., Λ(t) ∝ t 3−1 ). Theorem 6.5. Assume that the degradation model is linear, i.e., Y (t) = Θt + σB B(t) + σ . We have NCD > ΔCD ; (i) ΔLDA i,1 ≥ Δi,1 i,1 NCD ⇔ σ = 0. (ii) ΔLDA i,1 = Δi,1

Optimal classification policy and comparisons...

343

From Theorem 6.5, under the linearity assumption (i.e., Λ(t) = t), the LDA method is superior to the NCD method and the CD method is the least eﬀective. Furthermore, the LDA method is equivalent to the NCD method, when based on the linear degradation models M3 and M4 . = ΔNCD can be obtained from Remark 1. Although the result ΔLDA i,1 i,1 Theorem 6.4 (i) or Theorem 6.5 (ii), it should be questioned whether the two theoretical results could be combined into one brief statement: NCD ⇔ Λ(t) = t and σ = 0. ΔLDA i,1 = Δi,1

Here a truth table can be used to identify whether the two statements are logically equivalent or not. Unfortunately, both compound statements are not logically equivalent as demonstrated in the following. Deﬁne the simple statements: P : Λ(t) = t,

NCD S : ΔLDA i,1 = Δi,1 ,

R : σ = 0.

The statement in Theorem 6.4 (i) and Theorem 6.5 (ii) can be represented as R ⇒ (S ⇔ P) and P ⇒ (S ⇔ R), respectively. Here we claim that (P ⇒ (S ⇔ R)) ∧ (R ⇒ (S ⇔ P)) is equivalent to S ⇔ (P ∧ R), where ∧ denotes a logical conjunction. The corresponding truth table is shown in Table 3 (details are available in the supplemental ﬁle). From Table 3, it can be seen that seven cases in eight have identical truth values. However, when the simple statement (P, S, R) is (F, T, F), then the compound statement P ⇒ (D ⇔ R) ∧ R ⇒ (S ⇔ P) is true, but the compound statement S ⇔ (P ∧ R) is false.

P

S

T T T T F F F F

T T F F T T F F

Table 3: Truth table (T: true, F: false) R (P ⇒ (S ⇔ R)) ∧(R ⇒ (S ⇔ P)) T T F F T F F T T F F T T T F T

S ⇔ (P ∧ R) T F F T F F T T

344

C.-Y. Peng The following are immediate consequences of Theorem 6.3.

Corollary 6.2. For the random-eﬀects degradation model, i.e., Y (t) = ΘΛ(t) + σ . we have NCD ≥ ΔCD ; (i) ΔLDA i,1 > Δi,1 i,1

= ΔCD (ii) ΔNCD i,1 i,1 ⇔ Λ(t) = K for any constant K = 0. Without loss of generality, use Λ(t) = 1 as the constant function. From Corollary 6.2, the LDA method is superior to the other two approaches based on the degradation models M2 and M5 . Corollary 6.3. Assume the degradation model has a random intercept, i.e., Y (t) = Θ + σB B(t) + σ . We have

(i) ΔLDA ≥ ΔNCD and the equality holds as (σB , σ ) ∈ (σB , σ ) i,1 i,1 −1 2t 2 (σB mi + σ )1mi Ωmi 1mi = 1 , 2

−1 CD 2 (ii) ΔLDA i,1 ≥ Δi,1 ⇔ (σB , σ ) ∈ (σB , σ ) (σB tmi +3σ )1mi Ωmi 1mi −3 ≥ 0 ; NCD and ΔCD = ΔNCD ⇔ σ = 0. (iii) ΔCD B i,1 ≥ Δi,1 i,1 i,1

The NCD method is equivalent to the CD method by Corollary 6.2 (ii) or Corollary 6.3 (iii). Similar to the previous remark, it is not necessary to combine the results in Corollary 6.2 (ii) and Corollary 6.3 (iii) into one brief statement. In addition, under the assumption of the degradation model with Λ(t) = 1, the equally-spaced sampling scheme (i.e., ti,j −ti,j−1 = Δti for j = 1, . . . , mi ), and mi ≥ 3, the following theorem proves the LDA method is superior to the others. Theorem 6.6. Assume the degradation model with Λ(t) = 1, i.e., Y (t) = Θ + σB B(t) + σ , > ΔCD the equally-spaced sampling scheme and mi ≥ 3. We have ΔLDA i,1 i,1 ≥ NCD Δi,1 .

Optimal classification policy and comparisons...

345

These theoretical results indicate that the eﬀectiveness of the LDA method makes it more applicable in the classiﬁcation tests. Refer to the given shape function Λ(t) = tδ using in Examples 1-3. From 2 and σ 2 : Theorem 6.3 (iii), we have the following relationships among δ, σB 2t 2 3(σB mi +σ ) NCD ⇔ δ ∈ 0, ΔCD −1 , i,1 > Δi,1 σ 2 t +3σ 2 B mi

ΔNCD = ΔCD i,1 i,1 ⇔ δ = NCD CD Δi,1 > Δi,1 ⇔ δ ∈

2t 2 3(σB mi +σ ) 2t 2 σB mi +3σ 2t 2 3(σB mi +σ ) 2 σB tmi +3σ2

− 1 and − 1, ∞ .

More precisely, if σ = 0, we then have NCD ⇔ δ = 1, ΔLDA i,1 = Δi,1 √ NCD ⇔ δ ∈ [0, 3 − 1), ΔCD i,1 > Δi,1 √ = ΔCD 3 − 1 and ΔNCD i,1 i,1 ⇔ δ = √ ΔNCD > ΔCD i,1 i,1 ⇔ δ ∈ ( 3 − 1, ∞).

= ΔCD Under the case σB = 0, we have ΔNCD i,1 i,1 √⇔ δ = 0. In our Example 1, ˆ since the estimated parameter δ = 1.67712 ∈ ( 3 − 1, ∞), from Table 2 and Theorem 6.4 (i) and (iii), both numerical and theoretical results show the same conclusion ΔLDA > ΔNCD > ΔCD i,1 i,1 i,1 for each measuring time. Furthermore, the numerical results obtained in Example 2, ΔLDA = ΔNCD > ΔCD i,1 i,1 i,1 , coincide with Theorems 6.4 and 6.5. Consequently, the rationality of using Λ(t) = tδ can be justiﬁed and it makes the degradation model widely applicable. 7 Conclusions For highly reliable products, we propose a non-linear degradation model using a Gaussian mixture process that simultaneously considers unit-to-unit variability, within-unit variability and measurement error. By using an EMtype algorithm and model selection criterion, the estimated parameters and the determination of the number of groups can be obtained. We provide a three-step classiﬁcation policy to easily determine the optimal coeﬃcients, optimal cutoﬀ points and optimal test stopping time, with the last two being subject to cost. Numerical comparisons in Examples 1-3 clearly show the advantages of the LDA approach over other methods in reducing production cost. Analytic comparisons under small sample size circumstances in Section

346

C.-Y. Peng

6 demonstrate that the LDA approach can outperform the other methods for most cases. The superior performance of the LDA method in a classiﬁcation test can be attributed to the beneﬁt of information from eﬀectively using all observations in the degradation data. Although the precise results are given based on the degradation model in (2.3), and the methods used in the comparisons of the NCD, CD and LDA procedures for a Gaussian mixture process, the user must verify these assumptions before applying the proposed method. Approaches to extending the work studied in this paper to a more general model setting (such as unequal covariance matrices for each subpopulation, non-parametric degradation models, multivariate degradation measures, more general stochastic processes, covariate information, window size consideration, etc.) may be investigated in future research. Acknowledgments. This work was supported by the National Science Council (Grant No: NSC-101-2118-M-001-004) of Taiwan, Republic of China. The author thanks the Editor and the referees for their valuable comments. The author would also like to thank S. C. Hsu and K. H. Lin for their discussions and assistances in numerical evaluations.

References bian, l. and gebraeel, n. (2012). Computing and Updating First-Passage Time Distributions for Randomly Evolving Degradation Signals. IIE Transactions 44, 974– 987. boldea, o. and magnus, j. r. (2009). Maximum Likelihood Estimation of the Multivariate Normal Mixture Model. Journal of the American Statistical Association 104, 1539– 1549. chao, m. t. (1999). Degradation Analysis and Related Topics: Some Thoughts and a Review. The Proceedings of the National Science Council, Part A 23, 555–566. cheng, y. s. and peng, c. y. (2012). Integrated Degradation Models in R Using iDEMO. Journal of Statistical Software 49, 2, 1–22. dempster, a. p., laird, n. m. and rubin, d. b. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. B 39, 1–22. feng, q., peng, h. and coit, d. w. (2010). A Degradation-Based Model for Joint Optimization of Burn-in, Quality Inspection, and Maintenance: A Light Display Device Application. The International Journal of Advanced Manufacturing Technology 50, 801–808. hamada, m. s., wilson, a. g., reese, c. s. and martz, h. f. (2008). Bayesian Reliability. Springer, New York. johnson, r. a. and wichern, d. w. (2007). Applied Multivariate Statistical Analysis, 6th ed. Prentice Hall, New Jersey. lai, c. d. and xie, m. (2006). Stochastic Ageing and Dependence for Reliability. Springer, New York.

Optimal classification policy and comparisons...

347

lawless, j. and crowder, m. (2004). Covariates and Random Eﬀects in a Gamma Process Model with Application to Degradation and Failure. Lifetime Data Analysis 10, 213– 227. lu, c. j. and meeker, w. q. (1993). Using Degradation Measures to Estimate a Time-toFailure Distribution. Technometrics 35, 161–174. mclachlan, g. j. and krishnan, t. (2008). The EM Algorithm and Extensions, 2nd ed. Wiley, New York. meeker, w. q. and escobar, l. a. (1998). Statistical Methods for Reliability Data. Wiley, New York. nelson, w. (1990). Accelerated Testing: Statistical Models, Test Plans, and Data Analysis. Wiley, New York. peng, c. y. (2012). A Note on Optimal Allocations for the Second Elementary Symmetric Function with Applications for Optimal Reliability Design. Naval Research Logistics 59, 278–284. peng, c. y. (2015). Inverse Gaussian processes with random eﬀects and explanatory variables for degradation data. Technometrics 57, 100–111. peng, c. y. and hsu, s. c. (2012). A Note on a Wiener Process with Measurement Error. Applied Mathematics Letters 25, 729–732. peng, c. y. and tseng, s. t. (2009). Mis-speciﬁcation Analysis of Linear Degradation Models. IEEE Transactions on Reliability 58, 444–455. peng, c. y. and tseng, s. t. (2013). Statistical Lifetime Inference with Skew-Wiener Linear Degradation Models. IEEE Transactions on Reliability 62, 338–350. schott, j. r. (2005). Matrix analysis for statistics, 2nd ed. John Wiley & Sons, New York. singpurwalla, n. d. (1995). Survival in Dynamic Environments. Statistical Science 10, 86–103. tseng, s. t. and peng, c. y. (2004). Optimal Burn-in Policy by Using an Integrated Wiener Process. IIE Transactions 36, 1161–1170. tseng, s. t. and tang, j. (2001). Optimal Burn-in Time for Highly Reliable Products. International Journal of Industrial Engineering 8, 329–338. tseng, s. t., tang, j. and ku, i. h. (2003). Determination of Optimal Burn-in Parameter and Residual Life for Highly Reliable Products. Naval Research Logistics 50, 1–14. tsai, c. c., tseng, s. t. and balakrishnan, n. (2011). Optimal Burn-in Policy for Highly Reliable Products Using Gamma Degradation Process. IEEE Transactions on Reliability 60, 234–245. van noortwijk, j. m. (2009). A Survey of the Application of Gamma Processes in Maintenance. Reliability Engineering and System Safety 94, 2–21. whitmore, g. a. (1995). Estimating Degradation by a Wiener Diﬀusion Process Subject to Measurement Error. Lifetime Data Analysis 1, 307–319. wu, w. f. and ni, c. c. (2003). A Study of Stochastic Fatigue Crack Growth Modeling Through Experimental Data. Probabilistic Engineering Mechanics 18, 107–118. wu, s. and xie, m. (2007). Classifying Weak, and Strong Components Using ROC Analysis with Application to Burn-in. IEEE Transactions on Reliability 56, 552–561. xiang, y., coit, d. w. and feng, q. (2013). n Subpopulations Experiencing Stochastic Degradation: Reliability Modeling, Burn-in, and Preventive Replacement Optimization. IIE Transactions 45, 391–408. yang, g. b. (2012). Optimum Degradation Tests for Comparison of Products. IEEE Transactions on Reliability 61, 220–226. yu, h. f. (2003). Optimal Classiﬁcation of Highly Reliable Products Whose Degradation Paths Satisfy Wiener Processes. Engineering Optimization 35, 313–324.

348

C.-Y. Peng

yu, h. f. and yu, w. c. (2006). Optimal Classiﬁcation of Highly Reliable Products with a Linearized Degradation Model. Journal of Industrial and Production Engineering 23, 382–392. zhou, r. r., serban, n. and gebraeel, n. (2011). Degradation Modeling and Lifetime Monitoring Using Functional Data Analysis. Annals of Applied Statistics 5, 1586– 1610.

Appendix Proof of Theorem 3.1. The goal is to ﬁnd an optimal and nonzero vector a∗ such that n (ai Λ(tmi ))2 a ∝ arg max , a=0 ai Σmi ai ∗

i=1

which is equivalent to (ai Λ(tmi ))2 ai =0 ai Σmi ai

a∗i ∝ arg max

for i = 1, . . . , n.

By Maximization Lemma in Johnson and Wichern (2007), we have a∗i = Ki Σ−1 mi Λ(tmi ) for any constant Ki = 0 and the corresponding opti(a∗ Λ(tmi ))2 mal value is i∗ = Λ(tmi ) Σ−1 mi Λ(tmi ). The result thus follows. ai Σmi a∗i Proof of Theorem 3.2. The optimization problem is given by ξ ∗ = arg min ξ

n

M Ci (ξ i |a∗i , tmi ) =

i=1

n

arg min M Ci (ξ i |a∗i , tmi ). ξi

i=1

Consequently, given a∗i and tmi and taking the ﬁrst partial derivatives of M Ci (ξ i |a∗i , tmi ) with respect to ξi,k for k = 1, . . . , g − 1, we obtain ∂M Ci (ξ i |a∗i , tmi ) ∂ξi,k

ξi,k − ηk a∗i Λ(tmi ) = ∗ φ ai Σmi a∗i a∗i Σmi a∗i Ci,k+1 npk+1 ξi,k − ηk+1 a∗i Λ(tmi ) , φ − ∗ ai Σmi a∗i a∗i Σmi a∗i Ci,k npk

where φ denotes the density function of the standard normal distribution. Setting these g − 1 derivatives to zero and solving the g − 1 equations,

Optimal classification policy and comparisons...

349

∗ as (3.4). Since Σ we obtain ξi,k mi is positive deﬁnite and the constraints ηk+1 > ηk for k = 1, . . . , g − 1, we have

∂ 2 M Ci (ξi |a∗i , tmi ) ∂ξi,k1 ∂ξi,k2 ∗ ξi,k =ξi,k ∂ 2 M Ci (ξi |a∗i , tmi ) 2 ∂ξi,k ξi,k =ξ∗

=

0, 1 ≤ k1 = k2 ≤ g − 1,

=

i,k

$ % ∗ (ξi,k − ηk a∗i Λ(tmi ))2 Ci,k pk n exp − 3 2a∗i Σmi a∗i 2π a∗i Σmi a∗i

(ηk+1 − ηk )Λ(tmi ) Σ−1 mi Λ(tmi ) >

0, k = 1, . . . , g − 1.

Hence, the desired result follows. Proof of Lemma 6.1. Since the order restriction is ηk+1 > ηk for i,k is equivalent to k = 1, . . . , g − 1, it is easy to see that Δi,k1 > Δ 1 a Λ(t ) b Λ(t ) i mi > i mi ai Σmi ai b i Σ mi b i

⇔

(ηk+1 −ηk )ai Λ(tmi )

√

ai Σmi ai

>

(ηk+1 −ηk )bi Λ(tmi )

√

bi Σmi bi

i,k , 1 ≤ k = k1 ≤ g − 1. ⇔ Δi,k > Δ The result follows. Proof of Theorem 6.1. Without loss of generality, we only show the i,k and we have the case of either Δi,k > Δ i,k case of (i). Compare Δi,k with Δ i,k for k = 1, . . . , g − 1. The remaining proofs directly follow the or Δi,k < Δ proof of Theorem 2 in Tseng and Peng (2004). Proof of Theorem 6.2. Similar to the proof of Theorem 3 in Tseng and Peng (2004), the proof is omitted here since the steps are routine. Proof of Theorem 6.3. In order to compare the three classiﬁcation CD LDA policies (NCD, CD and LDA) in terms of ΔNCD i,1 , Δi,1 and Δi,1 , additional notation is needed. Let Am be an m × m matrix and |Am |j be the jth leading principal minors determinant of Am , i.e., a11 . . . a1j . . .. , j = 1, . . . , m. |Am |j = ... . . aj1 . . . ajj In addition, we ﬁrst prove a main lemma that will be used in the proof of Theorem 6.3.

350

C.-Y. Peng

Lemma 7.1. Σ−1 mi −

emi emi is positive semideﬁnite. emi Σmi emi

Proof of Lemma 7.1. ⎛ 0 ⎜ .. ⎜ . emi emi ⎜ =⎜ 0 ⎜ emi Σmi emi ⎝ 0

Since ... 0 . . .. . . ... 0 ... 0

⎞

0 .. . 0 1 2t 2 2 2 ση Λ(tmi ) + σB mi + σ

⎟ ⎟ ⎟ ⎟, ⎟ ⎠

−1 emi emi > 0 for j = 1, . . . , mi − 1. It is only to show we have Σmi − emi Σmi emi j −1 emi emi that Σmi − ≥ 0. By using Theorem 7.5 in Schott (2005), e mi Σ mi e mi m i we have −1 −1 e e e e Σ m m m m mi i i i i Σm − = Σmi m I mi − i i emi Σmi emi m e mi Σ mi e mi m i i Σ e −1 e m m m i i = 0. = Σmi m 1 − i i emi Σmi emi 1 Now, we return to the proof of Theorem 6.3. (i) By the deﬁnitions of and ΔLDA ΔNCD i,1 i,1 , we have Λ(tmi ) emi emi Λ(tmi ) NCD −1 ΔLDA ≥ Δ ⇔ Λ(t ) Σ Λ(t ) ≥ mi mi i,1 i,1 mi emi Σmi emi e e m m i i Λ(tmi ) ≥ 0. ⇔ Λ(tmi ) Σ−1 mi − e mi Σ mi e mi NCD By Lemma 7.1, we obtain ΔLDA i,1 ≥ Δi,1 . (ii) By (2.5) and Theorem 1.7 of Schott (2005), we have −1 Σ−1 mi = Ωmi −

ση2 1 + ση2 Λ(tmi ) Ω−1 mi Λ(tmi )

−1 Ω−1 mi Λ(tmi )Λ(tmi ) Ωmi .

Thus, one obtains Λ(tmi ) Σ−1 mi Λ(tmi ) =

Λ(tmi ) Ω−1 mi Λ(tmi )

1+ση2 Λ(tmi ) Ω−1 mi Λ(tmi )

.

(7.1)

351

Optimal classification policy and comparisons... Consequently, by using (7.1), we get 2 (ΔLDA i,1 )

≥

2 (ΔCD i,1 )

& t 2 m 3 0 i Λ(x)dx Λ(tmi ) Ω−1 mi Λ(tmi ) ≥ ⇔ & t 2 m 1 + ση2 Λ(tmi ) Ω−1 2 mi Λ(tmi ) 3ση2 0 i Λ(x)dx +t3mi σB + 3t2mi σ2

! tm 2 i 2 ⇔ (σB tmi + 3σ2 )t2mi Λ(tmi ) Ω−1 Λ(x)dx ≥ 0. mi Λ(tmi ) − 3 0

NCD 2 (iii) By the deﬁnitions of ΔNCD and ΔCD i,1 i,1 , it is easily shown that (Δi,1 ) ≥ t 2 mi 2 2 2 2 2 2t + (ΔCD Λ(x)dx (σB mi i,1 ) is equivalent to (σB tmi +3σ )tmi Λ(tmi ) −3 0

σ2 ) ≥ 0. In addition, similar to the proof of Lemma 7.1, it can be seen that 2 tm i Λ(x)dx e e 3 mi mi 0 −1 Ωm i − 2 2 2 (σB tmi + 3σ )tmi Λ(tmi )2

for j = 1, . . . , mi − 1,

>

0

≥

2 0 ⇔ (σB tmi + 3σ2 )t2mi Λ(tmi )2

j

and 2 tm i 3 Λ(x)dx e e m m i 0 i −1 Ωm i − 2 2 2 2 (σB tmi + 3σ )tmi Λ(tmi ) mi

2

tm i

−3

Λ(x)dx

2 (σB tmi + σ2 ) ≥ 0.

0

Hence, we have 2 (σB tmi

+

3σ2 )t2mi Λ(tmi )2 ⎛

⇒ Λ(tmi )

⇔

⎜

2 tmi (σB

−1 ⎝Ωmi

+

−

3

−3

t mi 0

2t (σB mi

t mi

2 Λ(x)dx

0

2

2 (σB tmi + σ2 ) ≥ 0 ⎞

emi emi ⎟ ⎠ Λ(tmi ) ≥ 0 + 3σ2 )t2mi Λ(tmi )2 Λ(x)dx

3σ2 )t2mi Λ(tmi ) Ω−1 mi Λ(tmi )

−3

t mi

2 Λ(x)dx

≥ 0.

0

That is C ⊂ B. The desired result follows. Proof of Corollary 6.1. Without loss of generality, we only show the case of the functional equation. From the set C, we have 2 t + σ2) 1 3(σB Λ(t) = . (7.2) t 2 t σB t + 3σ2 Λ(x)dx 0

352

C.-Y. Peng

By integrating equation (7.2), we immediately get t 2 x + σ2) 1 3(σB Λ(x)dx = exp 2 x + 3σ 2 dx . x σB 0 Use the fact that

1 x

⎧ ⎫ √3 ⎪ ⎪ ⎪ ⎪ ⎨x ⎬ (ax + b)(ax + 3b) + ax + 2b 3(ax + b) dx = ln + K ∗ for a, b > 0, ⎪ ⎪ ax + 3b 3(ax + b)(ax + 3b) + 2ax + 3b ⎪ ⎪ ⎩ ⎭

where K ∗ is a constant. After some simple manipulations, we obtain the result. Proof of Theorem 6.4. The following lemma is used in the proof of Theorem 6.4. Lemma 7.2. (i) All continuously diﬀerentiable functions Λ : (0, ∞) → (0, ∞) satisfying the functional equation uΛ(v) = vΛ(u) are in the form of Λ(v) = Λ(1)v. (ii) All continuously diﬀerentiable functions √ v Λ : (0, ∞) → (0, ∞) satisfying the functional equation vΛ(v) = 3 0 Λ(x)dx are in the form of Λ(v) = √ Kv 3−1 for any constant K = 0. Proof of Lemma 7.2. (i) The diﬀerence between uΛ(v) = vΛ(u) and (u + 1)Λ(v) = vΛ(u + 1) gives Λ(v) = v(Λ(u) − Λ(u + 1)).

(7.3)

Substituting u = 0 into (7.3), this concludes the proof. (ii) After taking the ﬁrst partial derivative of both sides of this functional equation, we immediately get √ dΛ(v)/dv 3−1 = . (7.4) Λ(v) v By integrating equation (7.4), we ﬁnd the desired function. Now we are ready to prove our main result. (i)-(iii) The result follows immediately after substituting σ = 0 into Theorem 6.3.√ By using 3−1 . Here, Lemma 7.2 (ii), it is easy to see that ΔNCD = ΔCD i,1 i,1 ⇔ Λ(t) = Kt we claim that ΔLDA = ΔNCD ⇔ Λ(t) = Λ(1)t. From σ = 0 and (7.1), we i,1 i,1 have

353

Optimal classification policy and comparisons...

−1 2 Λ(tmi ) ση2 Λ(tmi )Λ(tmi ) + σB Q mi Λ(tmi ) =

Λ(tmi ) Q−1 mi Λ(tmi )

2 σB

+ ση2 Λ(tmi ) Q−1 mi Λ(tmi )

.

(7.5)

Hence, by (7.5), it is easily seen that 2 NCD 2 2 ⇔ tmi Λ(tmi ) Q−1 (ΔLDA i,1 ) = (Δi,1 ) mi Λ(tmi ) − Λ(tmi ) = 0. 2 It is only to prove tmi Λ(tmi ) Q−1 mi Λ(tmi ) − Λ(tmi ) = 0 ⇔ Λ(t) = Λ(1)t. The necessary part is obtained by using the result, tmi Q−1 mi tmi = tmi , in Peng and Tseng (2009, page 454). For the suﬃcient part, the proof is by mathematical 2 induction. If mi = 2, then t2 Λ(t2 ) Q−1 2 Λ(t2 )−Λ(t2 ) = 0, which is equivalent 2 to (Λ(t1 )t2 − t1 Λ(t2 )) = 0. By Lemma 7.2 (i), the shape function is linear, i.e., Λ(t) = Λ(1)t. Assume that the result holds for mi = l. That is 2 tl Λ(tl ) Q−1 l Λ(tl ) − Λ(tl ) = 0 ⇒ Λ(t) = Λ(1)t.

For the case mi = l + 1, let Ql+1 =

Ql t l tl tl+1

, then by Theorem 7.1 in

Schott (2005), we have Q−1 l+1

=

1 tl+1 −tl

(tl+1 − tl )Q−1 l + el el −el −el 1

.

Consequently, one has Λ(tl+1 ) Q−1 l+1 Λ(tl+1 ) =

1 2 2 (tl+1 − tl )Λ(tl ) Q−1 . l Λ(tl ) + Λ(tl ) − 2Λ(tl )Λ(tl+1 ) + Λ(tl+1 ) tl+1 − tl

2 Thus, let tl+1 Λ(tl+1 ) Q−1 l+1 Λ(tl+1 ) − Λ(tl+1 ) = 0, then we have

tl+1 tl+1 −tl

2 2 (tl+1 −tl )Λ(tl ) Q−1 l Λ(tl ) + Λ(tl ) −2Λ(tl )Λ(tl+1 ) + Λ(tl+1 ) −

(tl+1 − tl ) Λ(tl+1 )2 = 0 tl+1

tl (Λ(tl+1 )tl − Λ(tl )tl+1 )2 = 0. This means that Λ(t) = tl+1 − tl Λ(1)t. The proof is complete.

implying

354

C.-Y. Peng

Proof of Theorem 6.5. (i) The result follows immediately after substituting Λ(t) = t into Theorem 6.3. (ii) From (7.1) and Λ(t) = t, we have tmi Σ−1 mi tmi = Let Ωmi = we have

2t σB Ωmi −1 mi −1 2 t 2 2 σB mi −1 σB tmi + σ

tmi Ω−1 mi t mi

1 + ση2 tmi Ω−1 mi tmi

.

. By Theorem 7.4(b) in Schott (2005),

2 4 |Ωmi | = |Ωmi −1 | σB tmi + σ2 − σB tmi −1 Ω−1 (7.6) mi −1 tmi −1 . 2 −1 2 4 Since Ω−1 mi −1 and Ωmi are positive deﬁnite, we have σB tmi + σ −σB tmi −1 Ω−1 mi −1 tmi −1 > 0. Using (7.6) and Theorem 7.1 in Schott (2005), we obtain 2 −1 −1 2 4 Ω−1 mi = σB tmi + σ − σB tmi −1 Ωmi −1 tmi −1

A11

2 −σB tmi −1 Ω−1 mi −1

2 −σB Ω−1 mi −1 tmi −1 , 1

−1 −1 −1 −1 2t 2 4 where A11 = (σB mi + σ )Ωmi −1 + σB (Ωmi −1 tmi −1 tmi −1 Ωmi −1 − tmi −1 Ωmi −1 tmi −1 Ω−1 mi −1). Hence, we have the identity

tmi Ω−1 mi t mi

=

−1 2t 2 2 (−σB mi +σ )tm −1 Ωm −1 tmi −1 +tmi i i . 2 −1 4 t σB tmi +σ2 −σB m −1 Ωm −1 tmi −1 i

(7.7)

i

After some algebraic manipulations by using (7.7), we get 2 2 = ΔNCD ⇔ (σB tmi + σ2 )tmi Ω−1 ΔLDA 1 1 mi tmi − tmi = 0

tmi −1 Ω−1 mi −1 tmi −1 = 0 ⇔ σ = 0. ⇔ σ4 2 −1 σ tm + σ2 − σ 4 t i B B mi −1 Ωmi −1 tmi −1 This concludes the proof of the theorem. > ΔNCD can be obProof of Corollary 6.2. (i) The part ΔLDA i,1 i,1 tained easily by substituting σB = 0 into Theorem 6.3. When σB = 0 in Theorem 6.3, it is suﬃcient to show that tm i NCD CD Λ(x)dx. Δi,1 ≥ Δi,1 ⇔ tmi Λ(tmi ) ≥ v

0

Let ϕ : (0, ∞) → (0, ∞) : ϕ(v) = vΛ(v) − 0 Λ(x)dx. We have dϕ(v)/dv = vdΛ(v)/dv ≥ 0 by deﬁnition. Thus, we obtain ϕ(v) ≥ ϕ(0) = 0 for v > 0,

Optimal classification policy and comparisons...

355

which gives the asserted form. (ii) We claim that all continuously diﬀerentiable functions Λ : (0, ∞) → (0, ∞) satisfying the functional equation v vΛ(v) = 0 Λ(x)dx are in the form of Λ(v) = K for any constant K = 0. v After taking the ﬁrst partial derivative of vΛ(v)− 0 Λ(x)dx = 0 with respect to v, we immediately obtain dΛ(v)/dv = 0, which implies Λ(v) = K. Proof of Corollary 6.3. Substituting Λ(t) = 1 into Theorem 6.3, the results follow directly. Proof of Theorem 6.6. We need + the following auxiliary results to 2 c = 1 for n2 < n1 to avoid prove Theorem 6.6. In addition, we deﬁne ni=n 1 i reverse product in the following formula. Theorem 7.1. Let Ω# mi be the adjoint of Ωmi , then we have (i) 1mi Ω# mi tmi =

mi

2 r−1 2 mi −r (σB ) (σ )

ti,i1

1≤i1 <···
r=1

r−1

(ti,ij+1 − ti,ij ),

j=1

(ii) 1mi Ω# m i 1m i =

mi

2 r−1 2 mi −r (σB ) (σ )

r−1

(ti,ij+1 − ti,ij ).

1≤i1 <···
r=1

Moreover, under the equally-spaced sampling scheme, i.e., ti,j − ti,j−1 = Δti for j = 1, . . . , mi , we have (iii)

ti,i1

1≤i1 <···
(ti,ij+1 − ti,ij ) =

j=1

(iv)

r−1

1≤i1 <···

r−1

(ti,ij+1 − ti,ij ) =

mi + r (Δti )r , mi − r

mi + r − 1 (Δti )r−1 . mi − r

Proof of Theorem 7.1. Similar to the proof of Theorem 1 in Peng and Hsu (2012), the proofs of (i) and (ii) are omitted here since the steps are routine. (iii) The method of generating functions can be used here for

356

C.-Y. Peng

proving combinatorial identities. Under the assumption (i.e., ti,j − ti,j−1 = Δti for j = 1, . . . , mi ), let

F (mi , r) =

i1

1≤i1 <···
r−1

(ij+1 − ij ).

j=1

First, it is easy to verify the following recurrence relation for F : F (mi , r + 1) + F (mi + 2, r + 1) = 2F (mi + 1, r + 1) + F (mi + 1, r) (7.8) and F (mi , mi ) = together with the initial conditions F (mi , 1) = mi (m i +1)/2 r i F (m 1. If we deﬁne the generating functions Ami (x) = m i , r)x , then we r=1 have A1 (x) = x and A2 (x) = 3x + x2 . After multiplying (7.8) by xr and summing over 1 ≤ r ≤ mi , we obtain 1 1 mi +2 ) x (Ami (x)−F (mi , 1)x)+ x (Ami +2 (x)−F (mi +2, 1)x−F(mi +2, mi +2)x = x2 (Ami +1 (x) − F (mi + 1, 1)x) + (Ami +1 (x) − F (mi + 1, mi + 1)xmi +1 ),

which is equivalent to Ami (x) − (2 + x)Ami +1 (x) + Ami +2 (x) − x = 0.

(7.9)

To solve (7.9), we introduce the generating function ψ(x, y) = mi ≥1 Ami (x)y mi . Then, if we multiply (7.9) by y mi and sum over mi ≥ 1, we get ψ(x, y) −

xy (2 + x)(ψ(x, y) − A1 (x)y) ψ(x, y) − A1 (x)y − A2 (x)y 2 + . = y y2 1−y

If we use the initial conditions and solve for ψ, the generating function for F is given by ψ(x, y) =

(1 −

y)(y 2

xy . − (2 + x)y + 1)

(7.10)

The unknown numbers F (mi , r) is the coeﬃcient of xr y mi in the series expansion of the above ψ(x, y). Finally, we ﬁnd that k + 2r (xy)r y k (xy)r = ψ(x, y) = k (1 − y)2r+1 r≥1

r≥1 k≥0

mi mi + r mi + r r mi mi r y x = x y = mi − r mi − r r≥1 mi ≥r

mi ≥1 r=1

Optimal classification policy and comparisons...

357

as an explicit formula for the F . (iv) Similar to the proof of (iii), let r−1

G(mi , r) =

(ij+1 − ij ).

1≤i1 <···
The recurrence relation for G can be found that

G(mi , r + 1) + G(mi + 2, r + 1) = 2G(mi + 1, r + 1) + G(mi + 1, r)

with the initial conditions G(mi , 1) = mi and G(mi , mi ) = 1. The remaining proof is similar and is omitted to save space. The theorem can be established. Now, we return to the proof of Theorem 6.6. We only show that ΔLDA i,1 > CD 2 2 Δi,1 . By Corollary 6.3 (ii), it is suﬃcient to prove (σB mi Δti + 3σ )1mi Ω# mi 1mi − 3|Ωmi | > 0. By using Theorem 7.1 and Theorem 1 in Peng and Hsu (2012), we have 2 (σB mi Δti + 3σ2 )1mi Ω# mi 1mi − 3|Ωmi |

2 mi Δti + 3σ2 ) = (σB

mi

2 (σB Δti )r−1 (σ2 )mi −r

r=1

−3

mi

2 (σB Δti )r (σ2 )mi −r

r=0

=

m i −1

2 (σB Δti )r (σ2 )mi −r

r=1

mi + r mi − r

mi + r − 1 mi − r

, mi + r − 1 mi + r mi + r mi −3 +3 mi − r mi − r mi − r − 1

2 Δti )mi (mi − 3) + 3(σ2 )mi (mi − 1) + (σB

=

m i −1 r=1

2 (σB Δti )r (σ2 )mi −r

(mi + r − 1) · · · (2r + 3)(2r + 2) h(r) (mi − r)!

2 + (σB Δti )mi (mi − 3) + 3(σ2 )mi (mi − 1),

358

C.-Y. Peng

where

4mi + 3 h(r) = (4mi − 9) r − 8mi − 18

2 +

h1 (mi ) , 4(4mi − 9)

h1 (mi ) = 48m3i − 172m2i + 84mi − 9. We claim that h(r) ≥ 0 with mi ≥ 3 for r = 1, . . . , mi − 1. For mi = 3, it can be seen that h(1) > 0 and h(2) = 0. In addition, we have h1 (mi ) ≥ h1 (4) > 0 because of dh1 (mi )/dmi > 0 for mi ≥ 4. Consequently, the theorem can be established by using Corollary 6.3 (iii).

Chien-Yu Peng Institute of Statistical Science, Academia Sinica Taipei, 11529, Taiwan E-mail: [email protected]

Paper received: 13 August 2014.

Optimal Classification Policy and Comparisons for Highly Reliable Products

Recommend Documents