Sankhy¯ a : The Indian Journal of Statistics 2015, Volume 77-B, Part 2, pp. 321-358 c 2015, Indian Statistical Institute
Optimal Classification Policy and Comparisons for Highly Reliable Products Chien-Yu Peng Institute of Statistical Science, Academia Sinica, Taipei, Taiwan Abstract In the current competitive marketplace, manufacturers need to classify products in a short period of time, according to market demand. Hence, it is a challenge for manufacturers to implement a classification test that can distinguish the different grades of a product quickly and efficiently. For highly reliable products, if quality characteristics exist whose degradation over time can be related to the lifetime of the product, the degradation model can then be constructed based on the degradation data. In this study, we propose a general degradation model using a Gaussian mixture process that simultaneously considers unit-to-unit variability, within-unit variability and measurement error. Then, by adopting the concept of linear discriminant analysis, we propose a three-step classification policy to determine the optimal coefficients, the optimal cutoff points and the optimal test stopping time. In addition, we use an analytic approach to compare the efficiency of our proposed procedure with the methods that are reported in the previous literature under small sample size cases. Analytical comparisons provide functional equations under different assumptions. The solutions are found to elucidate the foundation between different methods proposed in recent studies. Finally, several data sets are used to illustrate the proposed classification procedure. AMS (2000) subject classification. Primary 62N05; Secondary 62H30. Keywords and phrases. Degradation model, Functional equation, Highly reliable products, Linear discriminant analysis, Mahalanobis distance, Gaussian mixture process
1 Introduction For traditional classifications, the products can be classified by a simple procedure. For example, due to some degree of manufacturing, microprocessors are routinely classified by their safe clock speed (some devices of the same model can be run safely at higher speeds). Certain types of resistors (capacitors) are classified for an on-line test (Phase II) according to their Electronic supplementary material The online version of this article (doi:10.1007/s13571-015-0097-z) contains supplementary material, which is available to authorized users.
322
C.-Y. Peng
resistance (capacitance) as 1 %, 5 % or 10 % devices, depending on how close they are to their nominal values. The nominal value is usually chosen based on the lifetime of resistors. In reliability, the lifetime of products is a simple measurement and is needed to be estimated by an off-line experiment or a pilot study (Phase I). However, for highly reliable products, the major obstacle is the difficulty in obtaining sufficient failure data to efficiently estimate the lifetime of products. In such a situation, an alternative approach assumes that there exists a quality characteristic (QC) whose degradation over time (also called degradation paths) can be related to reliability. The product’s lifetime can be accurately estimated by collecting such degradation data. Further discussions of degradation models are included in Nelson (1990), Singpurwalla (1995), Meeker and Escobar (1998), Chao (1999), Lai and Xie (2006), van Noortwijk (2009), Zhou, Serban and Gebraeel (2011), Peng (2012, 2015) and the references given therein. Other practical applications of degradation models are in areas such as the contact image scanner of a copy/fax machine, optical fiber in high-speed computer networks or communication systems, power supply, metal-oxide-semiconductor field effect transistor devices, and other dependable systems. Due to market demand, it is critical for manufacturers to design an optimal classification policy to classify products being shipped to a marketplace. A burn-in test, which is a type of classification, is often used to classify products and assumes that the population of a product is composed of two subpopulations (e.g., the weak and typical subpopulations). Various works on finding an optimal burn-in policy based on a degradation model have been considered. For instance, Tseng and Tang (2001) proposed an innovative approach to determine the optimal burn-in policy through a Wiener process formulation of a degradation model. Tseng, Tang and Ku (2003) proposed a burn-in procedure with a general window size for better prediction of the misclassification probabilities. Tseng and Peng (2004) provided an efficient burn-in method using an integrated Wiener process for the cumulative degradation of the product’s QC. Wu and Xie (2007) used the technique of receiver operating characteristic for classifying weak and strong components from the production. Tsai, Tseng and Balakrishnan (2011) changed the underlying process from a Wiener process to a gamma process in the burn-in problem. Xiang, Coit and Feng (2013) proposed geometric Brownian motion and gamma processes as degradation models that simultaneously determine burn-in and age-based preventive replacement policies for populations composed of distinct subpopulations. However, their assumption is restricted to the case that the mean degradation path follows a fixed effect model and the within-unit variability in degradation data are mainly described by
Optimal classification policy and comparisons...
323
time-dependent stochastic processes. In practical applications, however, the unit-to-unit variability (also called random effects), within-unit variability and measurement error may occur simultaneously in the degradation paths; see, for example, Whitmore (1995), Lawless and Crowder (2004), and Bian and Gebraeel (2012). In this study, the issue of two subpopulations in a burn-in test is extended to the problem of several subpopulations in a classification test. This classification, which is generally used for manufactured products with variations, is typically generated by the manufacturing equipment and/or producers. Manufacturers can develop a pricing mechanism and make a decision of marketing strategy according to different grades of the product. For instance, for a Wiener degradation-based process, Yu (2003) proposed the optimal test plan of choosing sample size, inspection frequency, and termination time so that the total experimental cost can be minimized. Yang (2012) provided optimal test plans that choose sample size and test time to minimize the variance of an estimate related to the overlap probability, and simultaneously satisfy constraints on test budgets and available sample size. For a linear mixed model, 2006 developed the optimal design of classifying more and less reliable products among several competing designs. Feng, Peng and Coit (2010) presented a linear mixed model to jointly determine the optimal burn-in, inspection, and maintenance decisions, based on degradation analysis and an integrated quality and reliability cost model. Using an optimal classification policy to distinguish products can save both manufacturing cost, testing time, and substantially increase sales and the competitiveness of the products. Hence, how to develop an optimal classification policy is a great challenge for highly reliable products. Furthermore, analytical comparisons of classification policies have not been studied very much in the literature because there are too many configurations (e.g., Tseng and Peng 2004; Wu and Xie 2007). As a result of this lack of information, researchers and engineers encounter a challenging issue in choosing a better strategy to classify products. In this paper, an effective classification policy is proposed based on degradation reliability models in Phase I, both empirically and analytically. Specifically, we first propose a general degradation model based on a Gaussian mixture process, which can simultaneously consider the unit-tounit variability, within-unit variability and measurement error. We then provide an efficient three-step classification policy inspired by the concept of linear discriminant analysis (LDA). By constructing a cost model that contains a trade-off between the mis-classification and operating/measurement costs, the optimal cutoff points and optimal test stopping time can be easily
324
C.-Y. Peng
obtained. In addition, several data sets are used to demonstrate the proposed classification procedure. Finally, we use an analytic method to compare the efficiency of our proposed procedure with the competing approaches that were previously reported for the burn-in issue in the literature under small sample size cases. The LDA method is found to outperform the existing methods in many cases, both analytically and empirically. By solving the functional equations under different assumptions, the solutions are found to elucidate the foundation between different methods. Several examples are presented to show the applicability and effectiveness of the classification policy proposed in this work. The remainder of this paper is organized as follows. Section 2 describes the motivation and its problem formulation. Section 3 derives the optimal classification policy. Section 4 uses the EM-type algorithm to compute the maximum likelihood (ML) estimations of unknown parameters for the proposed degradation model. Section 5 applies the proposed method to several illustrative examples. Section 6 provides an analytic approach to choosing the best policy in a classification test for highly reliable products. Section 7 contains concluding remarks.
2 Motivation and Problem Formulation For highly reliable products, the successful performance of a degradation test strongly depends on the suitability of the model of a product’s degradation path. Peng and Tseng (2009) proposed the following model to describe the degradation data. Let Y (t), t ≥ 0, denote the observed values of a product’s QC at time t. Assume that Y (t) = Θt + σB B(t) + σ ,
(2.1)
where the degradation rate Θ denotes the random effect with N (η, ση2 ), representing the unit-to-unit variability; σB is a diffusion coefficient; B(t) denotes standard Brownian motion (denoted by N (0, t)), representing the withinunit variability; and is the measurement error with N (0, 1). The random effect Θ, the standard Brownian motion B(t) and the measurement error are assumed to be mutually independent. Similar to the assumption in Lu and Meeker (1993, page 164), we further make an assumption of that Θ ∼ N (η, ση2 ) with ση η, so that Pr{Θ ≤ 0} is negligible. The advantage of the degradation model in (2.1) is that it allows us to simultaneously consider the unit-to-unit variability, within-unit variability and measurement
325
Optimal classification policy and comparisons...
error. More discussions and applications of this degradation model can be found in Cheng and Peng (2012), Peng and Tseng (2013), and the references therein. The linearity of a degradation path may not be achieved in practical applications. The degradation model in (2.1) can be extended by allowing a shape function Λ(t) to adapt more scientific and engineering work as follows: (2.2) Y (t) = ΘΛ(t) + σB B(t) + σ , where Λ(t) is a given, increasing and continuously differentiable on (0, ∞) and Λ(0) = 0. In addition, assume that there are g subpopulations of the products in a classification test. The unknown number of subpopulations, g, can be determined using a model selection criteria as illustrated by the subsequent case applications. The following degradation model is used to describe the degradation path of a product’s QC for g levels of the products: ⎧ Θ1 Λ(t) + σB B(t) + σ with probability p1 for the 1st subpopulation; ⎪ ⎪ ⎨ Θ2 Λ(t) + σB B(t) + σ with probability p2 for the 2nd subpopulation; Y (t) = .. ⎪ ⎪ . ⎩ Θg Λ(t) + σB B(t) + σ with probability pg for the gth subpopulation,
(2.3)
where Θk ∼ N (ηk , ση2 ), the different groups, having a different mean of the random effect; pk is the unknown proportion of subpopulation k, pk ∈ [0, 1] for k = 1, . . . , g, and gk=1 pk = 1. To make representation (2.3) unique (or identifiable), we assume the order restriction η1 < · · · < ηg (see Boldea and Magnus 2009). A more general model setting will be investigated elsewhere. For an off-line experiment in Phase I, assume that n units are tested and each unit i has a different set of measuring times ti,1 , . . . , ti,mi . Let Yi (ti,j ) denote the degradation path of the ith unit at time ti,j with ti,0 = 0 and Yi (0) = 0, where i = 1, . . . , n and j = 1, . . . , mi . This means that every degradation path can be measured at different time points. For conve i nience of notation, let t = ∪ni=1 ∪m j=1 {ti,j }, tmi = (ti,1 , . . . , ti,mi ) , Λ(tmi ) = (Λ(ti,1 ), . . . , Λ(ti,mi )) and Y i = (Yi (ti,1 ), . . . , Yi (ti,mi )) , where the superscript “ ” denotes the transpose. Then Y i is a Gaussian mixture process and follows a mi -dimensional normal mixture distribution, i.e., i.i.d.
Y i ∼ p1 Nmi (η1 Λ(tmi ), Σmi ) + · · · + pg Nmi (ηg Λ(tmi ), Σmi ), i = 1, . . . , n, (2.4) where ⎡ ⎤
ti,1 ti,1 ⎢ 2 Σmi = ση2 Λ(tmi )Λ(tmi ) + Ωmi , Ωmi = σB Qmi + σ2 I mi , Qmi = ⎢ ⎣ .. . ti,1
ti,1 ti,2 .. . ti,2
··· ··· .. . ···
ti,1 ti,2 ⎥ .. ⎥ ⎦, . ti,mi (2.5)
326
C.-Y. Peng
and I mi stands for an mi × mi identity matrix. In this study, we consider a linear combination of Y i , ai Y i , to represent the integrated information of the ith unit over time tmi , where ai ∈ Rmi for i = 1, . . . , n. Thus, we have i.i.d.
ai Y i ∼ p1 N (η1 ai Λ(tmi ), ai Σmi ai ) + · · · + pg N (ηg ai Λ(tmi), ai Σmi ai ), i = 1, . . . , n. (2.6)
A rule of classification is given by
a unit Y i over time tmi
⎧ 1st subpopulation ⇔ ai Y i ≤ ξi,1 , ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ 2nd subpopulation ⇔ ξi,1 < ai Y i ≤ ξi,2 , . .. is classified as ⎪ ⎪ ⎪ ⎪ (g−1)th subpopulation ⇔ ξi,g−2 < ai Y i ≤ ξi,g−1 , ⎪ ⎪ ⎩ gth subpopulation ⇔ ai Y i > ξi,g−1 , (2.7)
where ξ i = (ξi,1 , . . . , ξi,g−1 ) and ξ = (ξ1 , . . . , ξ n ) denotes the cutoff points given the set of measuring times tmi , for g levels of the products. Note that ai Λ(tmi ) > 0 for i = 1, . . . , n is also assumed to preserve the order of the transformed values with regard to the subpopulations. Furthermore, the burn-in policy proposed by Tseng and Tang (2001) uses information only on current observations (i.e., ai = emi ≡ (0, . . . , 0, 1) , where emi de mi −1
notes the mi th element vector). Tseng and Peng (2004) proposed an integrated Wiener process to model the cumulative degradation path of the product’s QC. It can be seen as a specific linear combination of the entire sequence of observations (e.g., trapezoidal rule or Simpson’s rule). In these references, the model formulations in the burn-in policy are expressed as special cases of (2.6). Therefore, it is of immediate interest to investigate how to determine the optimal coefficients (i.e., a∗ = (a∗1 , . . . , a∗n ) ) and implement an optimal classification policy for highly reliable products. We propose three steps for an optimal classification procedure, as follows: (i) determine the optimal coefficients (a∗ ) by using linear discriminant analysis; (ii) minimize the mis-classification cost to determine the optimal cutoff points (ξ ∗ ); and (iii) minimize the total cost to determine the optimal test stopping time (t∗ ).
327
Optimal classification policy and comparisons...
3 Optimal Classification Policy In linear discriminant analysis, the Mahalanobis distance is widely used to measure dissimilarity between several groups. The discriminating function (D) in the classification policy using (2.6) can be defined as D(a) =
g−1 n
Δ2i,k ,
(3.1)
i=1 k=1
where the Mahalanobis squared distance (MSD) is given by Δi,k =
ηk+1 ai Λ(tmi ) − ηk ai Λ(tmi ) , i = 1, . . . , n, k = 1, . . . , g − 1. a i Σ mi a i
The magnitude of the MSD can be suitably used to measure the separation of the mean degradation path between two adjacent subpopulations. Intuitively speaking, a large MSD will accelerate the difference between two subpopulations. As a result, given the set of measuring times t, the optimal coefficients a∗ can be determined by maximizing the sum of the Mahalanobis distance between groups, i.e., a∗ = arg max D(a). a=0
We have the following results and all proofs are given in the Appendix. Theorem 3.1. Under the assumption of the degradation model (2.3), the optimal coefficients is a∗i = Ki Σ−1 mi Λ(tmi ) for any constant Ki = 0 and i = 1, . . . , n, which maximizes thediscriminating functionof (3.1). Moreover, g−1 2 the optimal value is D(a∗ ) = ni=1 Λ(tmi ) Σ−1 mi Λ(tmi ) k=1 (ηk+1 − ηk ) . −1 Without loss of generality, take Ki = 1/ Λ(tmi ) Σ−1 mi Σmi Λ(tmi ) in Theorem 3.1 to remove the effect of the scale of a∗i for its identifiability. According to the classification rule in (2.7), given the set of measuring times tmi and a∗i (i = 1, . . . , n), the probability (Ei,k ) of misclassifying the unit from the kth subpopulation as the other subpopulations is ⎧ for k = 1, ⎨ αi,1 βi,k−1 + αi,k for k = 2, . . . , g − 1, Ei,k = ⎩ for k = g, βi,g−1
328 where
C.-Y. Peng
αi,k = Pr a∗i Y i > ξi,k kth subpopulation ξi,k − ηk a∗i Λ(tmi ) , k = 1, . . . , g − 1, = 1−Φ a∗i Σmi a∗i
βi,k−1 = Pr a∗i Y i ≤ ξi,k−1 kth subpopulation ξi,k−1 − ηk a∗i Λ(tmi ) , k = 2, . . . , g, = Φ a∗i Σmi a∗i
and Φ denotes the cumulative distribution function of the standard normal distribution. Hence, the corresponding mis-classification cost (MC) function and total cost (TC) function can be defined as follows: g n n ∗ ∗ M C(ξ|a , t) = M Ci (ξ i |ai , tmi ) := pk Ci,k Ei,k , (3.2) i=1
T C(t|a∗ , ξ) = :=
n
i=1 k=1
T Ci (tmi |a∗i , ξ i )
i=1 g n
pk Ci,k Ei,k + Cope
i=1 k=1
n i=1
tmi + Cmea
n (mi + 1), i=1
(3.3) where Ci,k denotes the unit cost of mistaking the ith unit from the kth subpopulation for another subpopulation; Cope and Cmea denote per unit cost of operating the degradation test per unit of time and the cost of measuring data on a unit, respectively. The optimal cutoff points, ξ ∗ , is determined by minimizing the MC function in (3.2), given in the following theorem. Theorem 3.2. Given the set of measuring times tmi and a∗i , the optimal ∗ , . . . , ξ∗ cutoff points are ξ ∗i = (ξi,1 i,g−1 ) , where ∗ = ξi,k
(ηk + ηk+1 )a∗i Λ(tmi ) 2 ∗ a Σmi a∗i ln(Ci,k pk /[Ci,k+1 pk+1 ]) + i , i = 1, . . . , n, k = 1, . . . , g − 1. (ηk+1 − ηk )a∗i Λ(tmi ) (3.4)
Finally, substituting a∗ and ξ ∗ into (3.3), the optimal test stopping time, can be directly obtained by enumerating t ∈ t to minimize the TC function, i.e., t∗ = arg min T C(t|a∗ , ξ ∗ ). t∗ ,
t∈t
Optimal classification policy and comparisons...
329
When the measuring times are fixed and the same for all tested units (i.e., m1 = · · · = mn = m and t1,1 = · · · = tn,1 , . . . , t1,m = · · · = tn,m ), only m coefficients in a and g − 1 cutoff points in ξ need to be determined, i.e., a∗1 = · · · = a∗n and ξ ∗1 = · · · = ξ∗n . This means that all tested units share the same linear combination with the vector of coefficients, cutoff points and misclassification probabilities at the same measuring time. For practical applications in Phase II, however, the parameters in the degradation model (2.3) are unknown. Thus, in order to implement the above classification procedure, we need to conduct a pilot study to estimate these parameters in Phase I. In the following, we adopt the ML method to estimate these unknown parameters. 4
Parameter Estimation be the collection of all the parameters Let θ = to be estimated, where η = (η1 , . . . , ηg ) , p = (p1 , . . . , pg−1 ) and pg = 1 − p1 − · · · − pg−1 . From (2.4), the density function of a single degradation path Y i is 2 , σ 2 , p ) (η , Λ, ση2 , σB
f (y i |θ) =
g−1
pk fk (y i |θ) + (1 − p1 − · · · − pg−1 )fg (y i |θ),
k=1
where fk and y i = (yi (ti,1 ), . . . , yi (ti,mi )) denotes the density function of Nmi (ηk Λ(tmi ), Σmi ) and the sample path of the ith unit at the set of measuring times tmi , respectively. The log-likelihood function can be expressed as L(θ) = ni=1 ln f (y i |θ). Since the log-likelihood function, f (y i |θ), is in the form of a logarithm of a sum, it is not easy to have an analytical expression for ML estimates of unknown parameters. The EM-type algorithm offers an alternative, simpler framework for computation of the ML estimates of the unknown parameters; see, for example, Dempster, Laird and Rubin (1977) and McLachlan and Krishnan (2008). Define a random vector Z i = (Zi,1 , Zi,2 , . . . , Zi,g ) for i = 1, . . . , n as i.i.d.
a classification index of Y i and Z i ∼ M ultinomial(1, p1 , . . . , pg ) for i = 1, . . . , n. If Zi,k = 1 denotes the kth class with the probability pk of the ith product for k = 1, . . . , g, i.e., Pr (Zi,k = 1) = pk . Let z i = (zi,1 , . . . , zi,g ) and y = (y 1 , . . . , y n ) be the observation of classification indexes and n observations of test units, respectively. Thus, the complete data is denoted by yc = (y , z ) , where z = (z 1 , . . . , z n ) . Since z is unavailable in practice, we
330
C.-Y. Peng
can treat z as unobserved latent data. Hence, the joint density function of (Y i , Z i ) can be expressed as g z pk fk (y i |θ) i,k . f (y i , z i |θ) = k=1
Given the complete data, the log-likelihood function, Lc , can be written as Lc (θ|yc )
n
=
ln f (y i , z i |θ)
i=1 n n ln(2π) 1 mi − ln |Σmi | 2 2 i=1 i=1
g n 1 zi,k ln pk − (y i − ηk Λ(tmi )) Σ−1 + mi (y i − ηk Λ(tmi )) , 2 i=1 k=1
−
=
where zi,g = 1 − zi,1 − · · · − zi,g−1 and pg = 1 − p1 − · · · − pg−1 . Since Z i |y, θ ∼ Multinomial (1, pi,1 , . . . , pi,g ) for i = 1, . . . , n, given the lth esti2(l) 2(l) 2(l) (l) ˆ (l) = (ˆ ˆ (l) , σ ˆ ), we have mate θ η (l) , Λ ˆη , σ ˆ ,σ ˆ , p B
(l) ˆ (l) ) pˆk fk (y i |θ (l) (l) ˆ = E Zi,k |y, θ := pˆi,k (l) (l) g ˆ ) ˆk fk (y i |θ k=1 p
for k = 1, . . . , g.
The conditional expectation Q-function can be expressed as follows: (l)
ˆ ) Q(θ|θ
ˆ (l) E Lc (θ|yc )|y, θ n n ln(2π) 1 − mi − ln |Σmi | 2 2 i=1 i=1
g n 1 (l) pˆi,k ln pk − (y i − ηk Λ(tmi )) Σ−1 (y − η Λ(t )) , + m k mi i i 2 i=1 k=1
= =
(4.1) (l) where pˆi,g (l)
= 1−
(l) pˆi,1
− ··· −
(l) pˆi,g−1 .
Taking the first partial derivatives of
ˆ ) in (4.1) with respect to p, η and σ 2 , setting these derivatives to Q(θ|θ η zero and solving these equations, we obtain n (l+1)
pˆk
2(l+1)
σ ˆη
=
=
1 n
1 n
n
(l)
i=1
n i=1
−
(l+1)
pˆi,k , ηˆk
g
(l)
k=1 1
pˆi,k
=
i=1
n
−1(l+1) (l) ˆm pˆi,k Λ(tmi ) Σ Λ(tmi ) i
, k = 1, . . . , g − 1,
i=1 ˆ −1(l+1) (y −η (l+1) Λ(tm )) 2 ) Ω i m k i i
Λ(tm
−1(l+1) Λ(tmi ) i
ˆ Λ(tmi ) Ω m
−1(l+1) (l) ˆm pˆi,k Λ(tmi ) Σ yi i
i
−1(l+1) Λ(tmi ) i
ˆ Λ(tmi ) Ω m
.
(4.2)
Optimal classification policy and comparisons... 2(l+1)
ˆ (l+1) and σ ˆ (l+1) , η ˆη Substituting p 2 ˆ Q(Λ, σB , σ2 |θ
where
(l)
)
331
into (4.1), we have
n n ln(2π) 1 mi − ln |Σmi | 2 2 i=1 i=1 g n 1 (l) (l+1) (l+1) ˆ −1 (y − ηˆ(l+1) Λ(tm )) , pˆi,k ln pˆk − (y i − ηˆk Λ(tmi )) Σ + mi i k i 2 i=1 k=1
−
=
2 ˆm = σ ˆη2(l+1) Λ(tmi )Λ(tmi ) + σB Qmi + σ2 I mi . Σ i
2(l+1) 2(l+1) 2 , σ 2 |θ ˆ (l) ) to obtain Λ ˆ (l+1) , σ Then, maximize Q(Λ, σB ˆB and σ ˆ . That is 2(l+1)
ˆ (l+1) , σ ˆB (Λ
2 ˆ (l) ). ,σ ˆ2(l+1) ) = arg max Q(Λ, σB , σ2 |θ 2 ,σ 2 ) (Λ,σB
2(l+1)
(4.3)
2(l+1)
ˆ (l+1) , σ 2 = σ ˆB , σ2 = σ ˆ into (4.2), we have the Substituting Λ = Λ B 2(l+1) (l+1) (l+1) ˆ ˆ ,η and σ ˆη . Hence, the EM-type algorithm updated estimates p then proceeds iteratively in the following two steps: 2(l+1) ˆ (l) , calculate pˆ(l+1) , ηˆ(l+1) and σ E-step: Given the lth estimate θ ˆη k k for k = 1, . . . , g − 1 by using (4.2). 2(l+1) 2(l+1) ˆ (l+1) , σ ˆB and σ ˆ by using (4.3). M-step: Update Λ
We iterate this EM-type procedure until all the parameters converge. Note that when the parametric functional form of Λ(t) is unknown or not clear, the shape function Λ(t) can be estimated non-parametrically using the above EM-type procedure. To show the applicability and effectiveness of this classification policy, three examples (concave, linear and convex degradation paths) are presented in the following section. 5 Applications We use the degradation model in (2.3) and Λ(t) = tδ proposed by Tseng et al. (2003) to describe the degradation path, where δ ∈ [0, ∞). Since the variation sources contain unit-to-unit variability, within-unit variability and measurement error, there are six combinations of variations for a degradation model (refer to Cheng and Peng 2012) as follows: M0g : Y (t) = Θtδ + σB B(t) + σ , M3g : Y (t) = Θtδ + σB B(t),
M1g : Y (t) = ηtδ + σB B(t) + σ , M4g : Y (t) = ηtδ + σB B(t),
M2g : Y (t) = Θtδ + σ , M5g : Y (t) = ηtδ + σ .
332
C.-Y. Peng
Note that the model M5g is a traditional regression, and is used as a benchmark for comparing the other degradation models. In each degradation model, the ML estimates for unknown parameters of the Gaussian mixture process have been obtained by an iterative EM procedure until all the estimated parameters converge. Furthermore, we adopt Akaike information ˆ + 2r) for the degradation model selection, criterion (i.e., AIC = −2L(θ) where r represents the number of unknown parameters in θ. The degradation model with the smallest AIC, has been chosen as a good model to fit the degradation data. If it is not certain how many groups should be identified, the minimum AIC can be used to guide the number of groups. In the following examples, we also compare the minimum total cost, the optimal test stopping time and the sum of misclassification probabilities among the non-cumulative degradation (NCD) method proposed by Tseng and Tang (2001), the cumulative degradation (CD) method proposed by Tseng and Peng (2004), and our proposed LDA method. All values are based on estimated parameter values at the test stopping time.
Example 1. The following fatigue data is taken from Wu and Ni (2003). There are 30 tested samples (i.e., n = 30) for fatigue crack development and the inspections were taken at the same measurement times. The time unit is in ten thousand cycles from 1 to 4, in increments of 0.5. Figure 1(a) is a plot of the degradation path of 40000 cycles for the fatigue data. Table 1 summarizes the results of ML estimates, sample log-likelihoods and AICs of fatigue data. A comparison of the AIC shows that in this case, the degradation model M43 (divided into three subpopulations) is adequate and parametric variations of ση and σ (i.e., random effect and measurement error) are not evident in the fatigue data. Note that some of the AIC values in the Table 1 are different from the others because the corresponding degradation models are not suitable for modeling the fatigue data. Now, we apply the NCD, CD and LDA methods to the degradation model M43 as follows. For the LDA method, since m1 = · · · = m30 = 7 and t1,1 = · · · = t30,1 , . . . , t1,7 = · · · = t30,7 , by using Theorem 3.1, we have
a∗i
=
ˆ −1 tδˆ Σ 7 7
δ Q−1 7 t7 = ˆ ˆ −1 ˆ −1 δˆ ˆ −1 δˆ tδ7 Q−1 tδ7 Σ 7 Σ 7 t7 7 Q7 t 7 ˆ
= (−0.2193, −0.1157, −0.1053, −0.0979, −0.0922, −0.0877, 0.9495) (5.1)
333
6
Y(t)
6
0
0
2
2
4
4
Y(t)
8
8
10
10
12
Optimal classification policy and comparisons...
0
1
2
3
4
0
1000
2000
t
3000
4000
0.2 0.0
0.1
Y(t)
0.3
t
0
2000
4000
6000
8000
10000
t
Figure 1: Degradation paths
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
for i = 1, . . . , 30, where tδ7 = (1, 1.5δ , 2δ , 2.5δ , 3δ , 3.5δ , 4δ ) . For illustration, assume that the cost configuration
(Ci,1 , Ci,2 , Ci,3 , Cmea , Cope ) = (25, 35, 55, 2, 1)
is used in this classification policy. From Theorem 3.2, given the set of measuring times t7 and a∗i in (5.1), the optimal cutoff points are ξ∗1 = · · · = ∗ , ξ ∗ ) = (3.6413, 5.8129) for i = 1, . . . , 30. Given the ξ ∗30 , where ξ ∗i = (ξi,1 i,2
Model M01 M11 M21 M31 M41 M51 M02 M12 M22 M32 M42 M52 M03 M13 M23 M33 M43 M53 M04 M14 M24 M34 M44 M54
ηˆ1 0.58838 0.67787 0.74076 0.58844 0.67791 0.77531 0.50207 0.49559 0.58812 0.50208 0.49560 0.55527 0.37885 0.37865 0.52419 0.37887 0.37866 0.52269 0.37527 0.37528 0.52120 0.37528 0.37526 0.52026
ηˆ2 – – – – – – 0.88006 0.88626 0.92532 0.88013 0.88631 0.92732 0.64796 0.64771 0.80401 0.64799 0.64772 0.79187 0.62071 0.62073 0.78564 0.62072 0.62070 0.76299
ηˆ3 – – – – – – – – – – – – 0.96181 0.96074 1.05884 0.96187 0.96075 1.04706 0.76043 0.76042 0.94573 0.76041 0.76041 0.91006
ηˆ4 – – – – – – – – – – – – – – – – – – 0.98152 0.98155 1.09828 0.98152 0.98150 1.10549
δˆ 1.67761 1.58713 1.51455 1.67759 1.58707 1.47641 1.67876 1.65093 1.51450 1.67875 1.65091 1.49862 1.67880 1.67712 1.51463 1.67878 1.67712 1.51121 1.67941 1.67939 1.51473 1.67940 1.67942 1.51500 σ ˆη 0.20607 – 0.21340 0.20607 – – 0.13178 – 0.13179 0.13176 – – 0.03426 – 0.07100 0.03427 – – 5.3141×10−6 – 0.06011 5.3642×10−6 – – σ ˆB 0.3281229 0.5424013 – 0.3281263 0.5423996 – 0.3281135 0.4047504 – 0.3281180 0.4047559 – 0.3281125 0.3345344 – 0.3281146 0.3345414 – 0.32584 0.32584 – 0.32584 0.32585 –
σ ˆ 9.8905×10−6 1.1344×10−5 0.25961 – – 1.07536 1.2117×10−6 6.5173×10−6 0.25962 – – 0.60300 1.4274×10−6 2.9062×10−6 0.25961 – – 0.40391 5.9492×10−8 4.8866×10−6 0.25961 – – 0.35900
pˆ1 – – – – – – 0.77447 0.69806 0.54720 0.77451 0.69805 0.46305 0.40706 0.40518 0.40521 0.40711 0.40518 0.40003 0.39892 0.39894 0.39893 0.39890 0.39893 0.39979
pˆ2 – – – – – – – – – – – – 0.43716 0.43461 0.39826 0.43719 0.43463 0.36667 0.35704 0.35699 0.35017 0.35705 0.35708 0.27134
pˆ3 – – – – – – – – – – – – – – – – – – 0.10904 0.10903 0.11779 0.10903 0.10900 0.19549
Table 1: ML estimates, sample log-likelihoods and AICs of fatigue data ˆ L(θ) −40.330 −107.128 −85.851 −40.330 −107.128 −313.245 −38.834 −61.794 −84.581 −38.834 −61.794 −212.047 −34.259 −34.739 −81.468 −34.259 −34.739 −139.817 −33.208 −33.208 −81.080 −33.208 −33.208 −121.520
AIC 90.660 222.256 179.703 88.660 220.256 632.490 91.668 135.589 181.162 89.668 133.589 434.094 86.519 85.477 178.936 84.519 83.477 293.613 88.416 86.416 182.160 86.416 84.416 261.040
334 C.-Y. Peng
Optimal classification policy and comparisons...
335
set of measuring times t7 and a∗i in (5.1), the estimated misclassification ˆi,k , are computed as probabilities, E
ˆi,k E
⎧ ⎛ ⎞ ˆ ⎪ ∗ ∗ δ ⎪ ξ − η ˆ a t 1 ⎪ 7 i,1 i ⎪ ⎠ = 0.0159 1 − Φ⎝ ⎪ ⎪ ⎪ ∗ ˆ ∗ ⎪ ⎪ a a Σ 7 i ⎪ ⎛ ⎞i ⎛ ⎞ ⎪ ⎪ ⎪ ˆ ˆ ∗ ∗ δ ∗ ∗ δ ⎨ ξ − η ˆ a t ξ − η ˆ a t 2 2 i,1 i 7⎠ i,2 i 7⎠ + 1 − Φ⎝ = 0.0133 Φ⎝ = ⎪ ∗ ˆ ∗ ∗ ˆ ∗ ⎪ a a a a Σ Σ ⎪ 7 i 7 i i i ⎪ ⎛ ⎞ ⎪ ⎪ ˆ ⎪ ∗ ∗ δ ⎪ ξ − η ˆ ⎪ 3 ai t7 i,2 ⎪ ⎠ = 0.0062 ⎪ Φ⎝ ⎪ ⎪ ⎩ ˆ 7 a∗ a∗i Σ i
for k = 1,
for k = 2,
for k = 3,
where i = 1, . . . , 30. From the definition of the total cost in (3.3), we have ∗
∗
T C(t7 |a , ξ ) = n
3
ˆi,k + 4nCope + (7 + 1)nCmea = 492.52. pˆk Ci,k E
k=1
Following the same procedure, we computed the optimal cutoff points, estimated misclassification probabilities and total cost at each measuring time by replacing t7 by tm , where m = 1, . . . , 6. If we use the NCD method, we only set a∗i = (0, 0, 0, 0, 0, 0, 1) in the previous procedure. In addition, applying the CD method to the degradation model M23 and using Lemma 1 in Tseng and Peng (2004), we have δ+1 2 3 t t t σ t ηt δ , , B Y (x)dx = η x dx + σB B(x)dx ∼ N δ+1 3 0 0 0 which can be seen as a specific linear combination of the entire sequence of observations. The remaining evaluations of the optimal cutoff points, estimated misclassification probabilities and total cost at each measuring time can be obtained by using the procedure in Tseng and Peng (2004). Hence, by applying the NCD, CD and LDA methods to the degradation model M43 at each measuring time, the optimal cutoff points, estimated misclassification probabilities and total cost are shown in Table 2. From Table 2, the optimal test stopping time of NCD, CD and LDA methods are 3, 3.5 and 2.5 in ten thousand cycles, respectively. The corresponding optimal total costs are 443.19, 578.07 and 421.35, respectively. Numerical comparisons in terms of the sum of misclassification probabilities and the total cost demonstrate that the LDA method can outperform the other methods for each measuring time. Thus, the LDA method is more economical for distinguishing these three subpopulations. Furthermore, for an on-line test in Phase II, use of the LDA method can shorten the test stopping time by 37.5 %(= 1 − 2.5/4).
336
C.-Y. Peng
Table 2: Optimal cutoff points, estimated misclassification probabilities and total cost at each measuring time of fatigue data t (×104 cycles) 1 1.5 2 2.5 3 3.5 4 NCD method ∗ ξi,1 0.3441 0.8845 1.5353 2.2951 3.1590 4.1227 5.1820 ∗ ξi,2 0.9995 1.7359 2.6940 3.8442 5.1694 6.6579 8.3009 Ei,1 0.5412 0.3690 0.2464 0.1561 0.0923 0.0504 0.0252 Ei,2 0.3285 0.3003 0.2227 0.1455 0.0854 0.0453 0.0217 Ei,3 0.5461 0.3476 0.2119 0.1196 0.0612 0.0280 0.0114 TC 578.74 521.03 472.53 445.45 443.19 463.36 500.54 Δi,1 0.8043 1.2963 1.8187 2.3650 2.9312 3.5143 4.1125 Δi,2 0.9357 1.5080 2.1158 2.7514 3.4101 4.0885 4.7844 CD method ∗ ξi,1 ∗ ξi,2 Ei,1 Ei,2 Ei,3 TC Δi,1 Δi,2
0.0408 0.4746 0.6989 0.2629 0.7255 644.16 0.5204 0.6054
0.3955 1.0881 0.5261 0.3304 0.5286 630.38 0.8387 0.9757
1.0372 2.1393 0.4041 0.3145 0.3876 608.78 1.1767 1.3689
2.0252 3.7260 0.3088 0.2670 0.2798 589.65 1.5301 1.7801
3.4149 5.9373 0.2315 0.2109 0.1961 578.44 1.8964 2.2063
5.2584 8.8562 0.1690 0.1573 0.1322 578.07 2.2737 2.6452
7.6053 12.5612 0.1195 0.1114 0.0851 589.63 2.6607 3.0954
LDA method ∗ ξi,1 ∗ ξi,2 Ei,1 Ei,2 Ei,3 TC Δi,1 Δi,2
0.3441 0.9995 0.5412 0.3285 0.5461 578.73 0.8043 0.9357
0.6082 1.1657 0.3492 0.2905 0.3251 504.63 1.3689 1.5926
1.0685 1.8467 0.2212 0.2025 0.1853 448.62 1.9525 2.2715
1.6042 2.6612 0.1315 0.1227 0.0962 421.35 2.5569 2.9747
2.2135 3.5985 0.0718 0.0658 0.0443 423.55 3.1805 3.7002
2.8934 4.6511 0.0356 0.0313 0.0179 449.84 3.8213 4.4457
3.6413 5.8129 0.0159 0.0133 0.0062 492.52 4.4777 5.2092
Example 2. We use laser data (linear degradation paths, i.e., δ = 1) from Meeker and Escobar (1998, Table C.17, page 642) to illustrate the classification procedure. The QC of a laser device is its operating current. To
Optimal classification policy and comparisons...
337
maintain nearly constant light output, the laser device contains a feedback mechanism for increasing operating current when its light intensity degrades. Figure 1(b) shows a plot of the operating current for 15 tested subjects. The measured frequency of the currents is 250 hours and the experiment was terminated at 4000 hours. The results of the ML estimates, sample loglikelihoods and AICs of laser data are summarized in the supplemental file. In this case the degradation model M42 (divided into two subpopulations) is ˆ under the degradation model adequate. Hence, we use the ML estimated θ M42 and the assumed cost configuration (Ci,1 , Ci,2 , Cmea , Cope ) = (65, 90, 0.0004, 0.1) in this classification policy. Following the same procedure, the optimal cutoff points, estimated misclassification probabilities and total cost at each measuring time are available in the supplemental file. The optimal test stopping time of NCD, CD and LDA methods are 2500, 3000 and 2500 hours, respectively. The corresponding optimal total costs are 41.31, 51.15 and 41.31, respectively. Our proposed LDA method and the NCD method outperform the CD method with regard to the total cost and the sum of misclassification probabilities. However, the LDA method and the NCD method have the same total cost and misclassification probabilities for each measuring time. This leads to an interesting question: is the LDA method equivalent to the NCD method under the conditions δ = 1 or (ση , σ ) = (0, 0) in the degradation model? Note that by using the LDA/NCD method, we can shorten the test stopping time by 37.5 %(= 1 − 2500/4000) for an on-line test in Phase II. Example 3. Light emitting diodes (LEDs) have been applied in many fields (e.g., traffic signals and full-color displays etc.) and they are desirable because of their brightness, low power consumption and high reliability. We use an LED data set (T25) in Hamada et al. (2008, Table 8.6, page 292) to illustrate the procedure of classification policy. Figure 1(c) shows a plot of the luminous flux (in lumens) of 9744 hours for 25 tested subjects. The measured frequency of lumens is 336 hours for 29 inspection times. The results of ML estimates, sample log-likelihoods and AICs of T25 data are shown in the supplemental file. The degradation model M02 (divided into two subpopulations) is adequate. Following the same procedure and the assumed cost setting (Ci,1 , Ci,2 , Cmea , Cope ) = (65, 90, 0.0004, 0.1), the optimal cutoff points, estimated misclassification probabilities and total cost at each measuring time are found, as shown in the supplemental file. The optimal test stopping time of NCD, CD and LDA methods are 3024, 4368 and 2352 hours, respectively. The corresponding optimal total costs are 210.18, 245.58 and 148.94, respectively. Again, our proposed LDA method is superior to
338
C.-Y. Peng
NCD and CD methods in terms of the sum of misclassification probabilities and the total cost for each measuring time, except for the first measuring time. The corresponding reduction of the test stopping time is a dramatic 75.86 %(= 1 − 2352/9744) for an on-line test in Phase II. Observations and concerns raised from inspecting these examples are as follows: (i) the LDA method is useful to discriminate these degradation data sets no matter which degradation model is chosen, how many groups are identified or how the cost configuration is given; (ii) it seems that the value of θ in a degradation model plays an important role in judging whether the LDA method is equivalent to the NCD method; (iii) does the given function Λ(t) = tδ provide any support in degradation modeling under these three classification policies? Based on the proposed degradation model in (2.3) it is of interest to investigate the following problems: (i) What is a sufficient condition under which the LDA (or NCD, or CD) method is a better classification policy? (ii) Which parameter in θ plays an important role in a classification policy? (iii) What is the effect of the shape function Λ for these three classification policies? In the following section, we discuss and analyze these problems. 6 Analytical Comparisons In this section, based on the degradation model (2.3), we compare these three classification policies in terms of the two criteria: (i) the sum of misclassification probabilities; (ii) the total cost of conducting a classification test. Since the measurements are available for all degradation paths at different measuring times, with different numbers of measurements for each subject, the comparisons of three methods will be affected by this assumption. Hence, we use single subject to investigate these three classification methods. Some i,k denote the MSD of two additional notations are needed. Let Δi,k and Δ different linear combinations with the vector of coefficients ai and bi , respectively, i.e., Δi,k =
ηk+1 ai Λ(tmi ) − ηk ai Λ(tmi ) , ai Σ m i ai
m ) − ηk bi Λ(tmi ) i,k = ηk+1 bi Λ(t i Δ b i Σm i b i
for k = 1, . . . , g − 1.
Optimal classification policy and comparisons...
339
The symbol “∼ ” denotes researchers/engineers using another classification policy. In addition, the relationships between mis-classification probabilities αi,k (βi,k ) and α i,k (βi,k ) under two classification policies and cost settings are given by
Ni,k =
Ti,k =
i,k , βi,k < βi,k αi,k < α
for (−1)Ii,k Ri,k > 1,
i,k , (−1)Ii,k βi,k > (−1)Ii,k βi,k (−1)Ii,k αi,k < (−1)Ii,k α
for 0 < (−1)Ii,k Ri,k < 1,
i,k , βi,k > βi,k αi,k > α
for (−1)Ii,k Ri,k > 1,
i,k , (−1)Ii,k βi,k < (−1)Ii,k βi,k (−1)Ii,k αi,k > (−1)Ii,k α
for 0 < (−1)Ii,k Ri,k < 1,
where
Ii,k = and Ri,k =
0, if Ci,k pk < Ci,k+1 pk+1 , 1, otherwise,
i,k Δi,k Δ , k = 1, . . . , g − 1. −2 ln(Ci,k pk /[Ci,k+1 pk+1 ])
We will investigate the conditions for choosing a better strategy from the two vectors of coefficients ai and bi . Lemma 6.1. For two different linear combinations in a classification test, we have i,k < Δi,k , 1 ≤ k = k1 ≤ g − 1. i,k < Δi,k ⇔ Δ Δ 1 1 From Lemma 6.1, it is clear that we need to understand only one relationship between two MSDs, which is equivalent to understanding g − 1 relationships among pairwise MSDs for two different linear combinations in a classification test. Hence, under the order restriction, we use the single re i,1 < Δi,1 to represent Δ i,k < Δi,k for k = 1, . . . , g − 1 and omit lationship Δ the index g in the degradation models M0g -M5g in the following inferences. Theorem 6.1. For two different linear combinations in a classification test, we have the following results: i,1 < Δi,1 and a given relationship of cost setting (I1 , I2 , . . . , Ig−1 ), (i) If Δ then (Ni,1 , . . . , Ni,g−1 ). i,1 > Δi,1 and a given relationship of cost setting (I1 , I2 , . . . , Ig−1 ), (ii) If Δ then (Ti,1 , . . . , Ti,g−1 ).
340
C.-Y. Peng
i,1 < Δi,1 From Theorem 6.1 (i), it can be seen that under the condition Δ and different cost settings, we have the relationships between αi,k and βi,k for k = 1, . . . , g − 1. The following theorem is used to distinguish which method is superior in terms of the sum of misclassification probabilities and the total cost. i,1 < Δi,1 , then Theorem 6.2. Given the set of measuring times tmi , if Δ (i)
g
Ei,k <
k=1
g
i,k , E
k=1
Ci . (ii) T Ci < T Note that the results of Lemma 6.1 and Theorem 6.2 are also true if the symbol (<) is replaced by the symbol (>). When m1 = · · · = mn = m and t1,1 = · · · = tn,1 , . . . , t1,m = · · · = tn,m , Theorems 2 and 3 in Tseng and Peng (2004) are special cases (i.e., g = 2) of Theorems 6.1 and 6.2, respectively. From Theorems 6.1 and 6.2, the sufficient condition for choosing a better strategy is to compare the MSD of the two different methods without the effect of assumptions on the cost configurations. The large MSD not only accelerates the difference between two subpopulations, which implies having a small sum of mis-classification probabilities, but also provides a more economical classification policy. Consequently, we only need to compare the MSD of NCD, CD and LDA methods. CD LDA denote the MSD of NCD, CD and LDA Let ΔNCD i,1 , Δi,1 and Δi,1 methods based on the degradation model in (2.3). Since the NCD method uses information only on current observations (i.e., ai = emi ) and the LDA method uses all information from the observations, we have ΔNCD = i,1
(η2 − η1 )Λ(tmi ) 2 + σB tmi + σ2
and
ση2 Λ(tmi )2
ΔLDA = (η2 − η1 ) i,1
Λ(tmi ) Σ−1 mi Λ(tmi ).
Note that when only one measurement is observed (i.e., tmi = t1 ), it is easily seen that ΔNCD = ΔLDA i,1 i,1 . It means that there is no difference between the two methods in terms of the sum of misclassification probabilities and total cost, no matter the given conditions. Applying the CD method to the degradation model in (2.2) and using Lemma 1 in Tseng and Peng (2004), we obtain t t t t Y (x)dx = Θ Λ(x)dx + σB B(x)dx + σ dx 0
0
0
0
341
Optimal classification policy and comparisons... 2 t t 2 t3 σB 2 2 2 Λ(x)dx + ∼ N η Λ(x)dx, ση + t σ . 3 0 0 Hence, the MSD of the CD method, ΔCD i,1 , can be expressed as tm i (η2 − η1 ) Λ(x)dx CD 0 Δi,1 = . 2 tm 3 i t m 2 + t2 σ 2 ση2 Λ(x)dx + i σB mi 3 0
The conditions under which the LDA, NCD or CD method is a better classification policy is given in the following theorem. Theorem 6.3. Under the degradation model M0 , i.e., Y (t) = ΘΛ(t) + σB B(t) + σ , let
A
=
B
=
2 2 (Λ, σB , σ )(σB tmi + σ2 )Λ(tmi ) Ω−1 mi Λ(tmi ) − Λ(tmi ) = 0 , "
! tm 2 2 i (Λ, σB , σ )(σB tmi + 3σ2 )t2mi Λ(tmi ) Ω−1 Λ(t ) − 3 Λ(x)dx ≥ 0 , mi mi
C
=
0
2 (Λ, σB , σ )(σB tmi + 3σ2 )t2mi Λ(tmi )2 − 3
!
t mi
"
2
Λ(x)dx
2 (σB t mi
+
σ2 )
≥0 .
0
Then, we have NCD and the equality holds as (Λ, σ , σ ) ∈ A; (i) ΔLDA B i,1 ≥ Δi,1 CD (ii) ΔLDA i,1 ≥ Δi,1 ⇔ (Λ, σB , σ ) ∈ B;
(iii) ΔNCD ≥ ΔCD i,1 i,1 ⇔ (Λ, σB , σ ) ∈ C. Moreover, the set C is included in the set B, i.e., C ⊂ B. For the degradation model M0 , there are 2g + 2 parameters and one shape function: η, ση , σB , σ , p and Λ. From Lemma 6.1 and Theorem 6.3, the choice of a better classification policy is affected by only two parameters (σB , σ ) and the shape function Λ. The LDA method is equivalent or superior to the NCD method, no matter the given conditions. The LDA method is equivalent to the NCD method if and only if (Λ, σB , σ ) ∈ A. If the parameters (Λ, σB , σ ) are in the set B, then the LDA method is better than the CD method. When (Λ, σB , σ ) is in the set C, we can conclude that the CD method is not the best choice because C ⊂ B. More precisely, the functional inequality in the set C is solved for the shape function Λ in the following result.
342
C.-Y. Peng
Corollary 6.1. From Theorem 6.3 (iii), we have #
ΔNCD ΔCD i,1 i,1 ⇔ Λ(t) K
2 (σB t
2 σ2 )(σB t
3σ2 )
2 σB t
2σ2
√3
+ + + + 2 3(σB t + σ2 ) 2 2 2 2 2 2 σB t + 3σ2 3(σB t + σ )(σB t + 3σ ) + 2σB t + 3σ2
for any constant K = 0. Without loss of generality, the constant K in Corollary 6.1 can be chosen to adjust the coefficient to be 1 in the shape function Λ(t) in order to avoid the identifiable problem. Note that for the sets A and B, the explicit solution of Λ is not easy to obtain because the complex expression of Λ(tmi ) Ω−1 mi Λ(tmi ) is in the functional inequality. Theorem 6.4. If the degradation model is without measurement error, i.e., Y (t) = ΘΛ(t) + σB B(t), then we have NCD and ΔLDA = ΔNCD ⇔ Λ is linear, i.e., Λ(t) = Λ(1)t; (i) ΔLDA i,1 ≥ Δi,1 i,1 i,1
(ii) ΔLDA ≥ ΔCD ⇔ Λ satisfies the functional inequality t3mi Λ(tmi ) Q−1 mi i,1 i,1 t 2 mi Λ(tmi ) ≥ 3 0 Λ(x)dx ; = ΔCD (iii) ΔNCD i,1 i,1 ⇔ Λ(t) = Kt
√
3−1
for any constant K = 0.
Without loss of generality, take Λ(1) = 1 in the linear function Λ(t) and use K = 1 in Theorem 6.4 (iii). From Theorem 6.4 (i), it is of interest to note that the LDA method can outperform the NCD method when the shape function is non-linear. The NCD method is equivalent to the CD method √ if and only if the shape function is the specific power of time t (i.e., Λ(t) ∝ t 3−1 ). Theorem 6.5. Assume that the degradation model is linear, i.e., Y (t) = Θt + σB B(t) + σ . We have NCD > ΔCD ; (i) ΔLDA i,1 ≥ Δi,1 i,1 NCD ⇔ σ = 0. (ii) ΔLDA i,1 = Δi,1
Optimal classification policy and comparisons...
343
From Theorem 6.5, under the linearity assumption (i.e., Λ(t) = t), the LDA method is superior to the NCD method and the CD method is the least effective. Furthermore, the LDA method is equivalent to the NCD method, when based on the linear degradation models M3 and M4 . = ΔNCD can be obtained from Remark 1. Although the result ΔLDA i,1 i,1 Theorem 6.4 (i) or Theorem 6.5 (ii), it should be questioned whether the two theoretical results could be combined into one brief statement: NCD ⇔ Λ(t) = t and σ = 0. ΔLDA i,1 = Δi,1
Here a truth table can be used to identify whether the two statements are logically equivalent or not. Unfortunately, both compound statements are not logically equivalent as demonstrated in the following. Define the simple statements: P : Λ(t) = t,
NCD S : ΔLDA i,1 = Δi,1 ,
R : σ = 0.
The statement in Theorem 6.4 (i) and Theorem 6.5 (ii) can be represented as R ⇒ (S ⇔ P) and P ⇒ (S ⇔ R), respectively. Here we claim that (P ⇒ (S ⇔ R)) ∧ (R ⇒ (S ⇔ P)) is equivalent to S ⇔ (P ∧ R), where ∧ denotes a logical conjunction. The corresponding truth table is shown in Table 3 (details are available in the supplemental file). From Table 3, it can be seen that seven cases in eight have identical truth values. However, when the simple statement (P, S, R) is (F, T, F), then the compound statement P ⇒ (D ⇔ R) ∧ R ⇒ (S ⇔ P) is true, but the compound statement S ⇔ (P ∧ R) is false.
P
S
T T T T F F F F
T T F F T T F F
Table 3: Truth table (T: true, F: false) R (P ⇒ (S ⇔ R)) ∧(R ⇒ (S ⇔ P)) T T F F T F F T T F F T T T F T
S ⇔ (P ∧ R) T F F T F F T T
344
C.-Y. Peng The following are immediate consequences of Theorem 6.3.
Corollary 6.2. For the random-effects degradation model, i.e., Y (t) = ΘΛ(t) + σ . we have NCD ≥ ΔCD ; (i) ΔLDA i,1 > Δi,1 i,1
= ΔCD (ii) ΔNCD i,1 i,1 ⇔ Λ(t) = K for any constant K = 0. Without loss of generality, use Λ(t) = 1 as the constant function. From Corollary 6.2, the LDA method is superior to the other two approaches based on the degradation models M2 and M5 . Corollary 6.3. Assume the degradation model has a random intercept, i.e., Y (t) = Θ + σB B(t) + σ . We have
(i) ΔLDA ≥ ΔNCD and the equality holds as (σB , σ ) ∈ (σB , σ ) i,1 i,1 −1 2t 2 (σB mi + σ )1mi Ωmi 1mi = 1 , 2
−1 CD 2 (ii) ΔLDA i,1 ≥ Δi,1 ⇔ (σB , σ ) ∈ (σB , σ ) (σB tmi +3σ )1mi Ωmi 1mi −3 ≥ 0 ; NCD and ΔCD = ΔNCD ⇔ σ = 0. (iii) ΔCD B i,1 ≥ Δi,1 i,1 i,1
The NCD method is equivalent to the CD method by Corollary 6.2 (ii) or Corollary 6.3 (iii). Similar to the previous remark, it is not necessary to combine the results in Corollary 6.2 (ii) and Corollary 6.3 (iii) into one brief statement. In addition, under the assumption of the degradation model with Λ(t) = 1, the equally-spaced sampling scheme (i.e., ti,j −ti,j−1 = Δti for j = 1, . . . , mi ), and mi ≥ 3, the following theorem proves the LDA method is superior to the others. Theorem 6.6. Assume the degradation model with Λ(t) = 1, i.e., Y (t) = Θ + σB B(t) + σ , > ΔCD the equally-spaced sampling scheme and mi ≥ 3. We have ΔLDA i,1 i,1 ≥ NCD Δi,1 .
Optimal classification policy and comparisons...
345
These theoretical results indicate that the effectiveness of the LDA method makes it more applicable in the classification tests. Refer to the given shape function Λ(t) = tδ using in Examples 1-3. From 2 and σ 2 : Theorem 6.3 (iii), we have the following relationships among δ, σB 2t 2 3(σB mi +σ ) NCD ⇔ δ ∈ 0, ΔCD −1 , i,1 > Δi,1 σ 2 t +3σ 2 B mi
ΔNCD = ΔCD i,1 i,1 ⇔ δ = NCD CD Δi,1 > Δi,1 ⇔ δ ∈
2t 2 3(σB mi +σ ) 2t 2 σB mi +3σ 2t 2 3(σB mi +σ ) 2 σB tmi +3σ2
− 1 and − 1, ∞ .
More precisely, if σ = 0, we then have NCD ⇔ δ = 1, ΔLDA i,1 = Δi,1 √ NCD ⇔ δ ∈ [0, 3 − 1), ΔCD i,1 > Δi,1 √ = ΔCD 3 − 1 and ΔNCD i,1 i,1 ⇔ δ = √ ΔNCD > ΔCD i,1 i,1 ⇔ δ ∈ ( 3 − 1, ∞).
= ΔCD Under the case σB = 0, we have ΔNCD i,1 i,1 √⇔ δ = 0. In our Example 1, ˆ since the estimated parameter δ = 1.67712 ∈ ( 3 − 1, ∞), from Table 2 and Theorem 6.4 (i) and (iii), both numerical and theoretical results show the same conclusion ΔLDA > ΔNCD > ΔCD i,1 i,1 i,1 for each measuring time. Furthermore, the numerical results obtained in Example 2, ΔLDA = ΔNCD > ΔCD i,1 i,1 i,1 , coincide with Theorems 6.4 and 6.5. Consequently, the rationality of using Λ(t) = tδ can be justified and it makes the degradation model widely applicable. 7 Conclusions For highly reliable products, we propose a non-linear degradation model using a Gaussian mixture process that simultaneously considers unit-to-unit variability, within-unit variability and measurement error. By using an EMtype algorithm and model selection criterion, the estimated parameters and the determination of the number of groups can be obtained. We provide a three-step classification policy to easily determine the optimal coefficients, optimal cutoff points and optimal test stopping time, with the last two being subject to cost. Numerical comparisons in Examples 1-3 clearly show the advantages of the LDA approach over other methods in reducing production cost. Analytic comparisons under small sample size circumstances in Section
346
C.-Y. Peng
6 demonstrate that the LDA approach can outperform the other methods for most cases. The superior performance of the LDA method in a classification test can be attributed to the benefit of information from effectively using all observations in the degradation data. Although the precise results are given based on the degradation model in (2.3), and the methods used in the comparisons of the NCD, CD and LDA procedures for a Gaussian mixture process, the user must verify these assumptions before applying the proposed method. Approaches to extending the work studied in this paper to a more general model setting (such as unequal covariance matrices for each subpopulation, non-parametric degradation models, multivariate degradation measures, more general stochastic processes, covariate information, window size consideration, etc.) may be investigated in future research. Acknowledgments. This work was supported by the National Science Council (Grant No: NSC-101-2118-M-001-004) of Taiwan, Republic of China. The author thanks the Editor and the referees for their valuable comments. The author would also like to thank S. C. Hsu and K. H. Lin for their discussions and assistances in numerical evaluations.
References bian, l. and gebraeel, n. (2012). Computing and Updating First-Passage Time Distributions for Randomly Evolving Degradation Signals. IIE Transactions 44, 974– 987. boldea, o. and magnus, j. r. (2009). Maximum Likelihood Estimation of the Multivariate Normal Mixture Model. Journal of the American Statistical Association 104, 1539– 1549. chao, m. t. (1999). Degradation Analysis and Related Topics: Some Thoughts and a Review. The Proceedings of the National Science Council, Part A 23, 555–566. cheng, y. s. and peng, c. y. (2012). Integrated Degradation Models in R Using iDEMO. Journal of Statistical Software 49, 2, 1–22. dempster, a. p., laird, n. m. and rubin, d. b. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. B 39, 1–22. feng, q., peng, h. and coit, d. w. (2010). A Degradation-Based Model for Joint Optimization of Burn-in, Quality Inspection, and Maintenance: A Light Display Device Application. The International Journal of Advanced Manufacturing Technology 50, 801–808. hamada, m. s., wilson, a. g., reese, c. s. and martz, h. f. (2008). Bayesian Reliability. Springer, New York. johnson, r. a. and wichern, d. w. (2007). Applied Multivariate Statistical Analysis, 6th ed. Prentice Hall, New Jersey. lai, c. d. and xie, m. (2006). Stochastic Ageing and Dependence for Reliability. Springer, New York.
Optimal classification policy and comparisons...
347
lawless, j. and crowder, m. (2004). Covariates and Random Effects in a Gamma Process Model with Application to Degradation and Failure. Lifetime Data Analysis 10, 213– 227. lu, c. j. and meeker, w. q. (1993). Using Degradation Measures to Estimate a Time-toFailure Distribution. Technometrics 35, 161–174. mclachlan, g. j. and krishnan, t. (2008). The EM Algorithm and Extensions, 2nd ed. Wiley, New York. meeker, w. q. and escobar, l. a. (1998). Statistical Methods for Reliability Data. Wiley, New York. nelson, w. (1990). Accelerated Testing: Statistical Models, Test Plans, and Data Analysis. Wiley, New York. peng, c. y. (2012). A Note on Optimal Allocations for the Second Elementary Symmetric Function with Applications for Optimal Reliability Design. Naval Research Logistics 59, 278–284. peng, c. y. (2015). Inverse Gaussian processes with random effects and explanatory variables for degradation data. Technometrics 57, 100–111. peng, c. y. and hsu, s. c. (2012). A Note on a Wiener Process with Measurement Error. Applied Mathematics Letters 25, 729–732. peng, c. y. and tseng, s. t. (2009). Mis-specification Analysis of Linear Degradation Models. IEEE Transactions on Reliability 58, 444–455. peng, c. y. and tseng, s. t. (2013). Statistical Lifetime Inference with Skew-Wiener Linear Degradation Models. IEEE Transactions on Reliability 62, 338–350. schott, j. r. (2005). Matrix analysis for statistics, 2nd ed. John Wiley & Sons, New York. singpurwalla, n. d. (1995). Survival in Dynamic Environments. Statistical Science 10, 86–103. tseng, s. t. and peng, c. y. (2004). Optimal Burn-in Policy by Using an Integrated Wiener Process. IIE Transactions 36, 1161–1170. tseng, s. t. and tang, j. (2001). Optimal Burn-in Time for Highly Reliable Products. International Journal of Industrial Engineering 8, 329–338. tseng, s. t., tang, j. and ku, i. h. (2003). Determination of Optimal Burn-in Parameter and Residual Life for Highly Reliable Products. Naval Research Logistics 50, 1–14. tsai, c. c., tseng, s. t. and balakrishnan, n. (2011). Optimal Burn-in Policy for Highly Reliable Products Using Gamma Degradation Process. IEEE Transactions on Reliability 60, 234–245. van noortwijk, j. m. (2009). A Survey of the Application of Gamma Processes in Maintenance. Reliability Engineering and System Safety 94, 2–21. whitmore, g. a. (1995). Estimating Degradation by a Wiener Diffusion Process Subject to Measurement Error. Lifetime Data Analysis 1, 307–319. wu, w. f. and ni, c. c. (2003). A Study of Stochastic Fatigue Crack Growth Modeling Through Experimental Data. Probabilistic Engineering Mechanics 18, 107–118. wu, s. and xie, m. (2007). Classifying Weak, and Strong Components Using ROC Analysis with Application to Burn-in. IEEE Transactions on Reliability 56, 552–561. xiang, y., coit, d. w. and feng, q. (2013). n Subpopulations Experiencing Stochastic Degradation: Reliability Modeling, Burn-in, and Preventive Replacement Optimization. IIE Transactions 45, 391–408. yang, g. b. (2012). Optimum Degradation Tests for Comparison of Products. IEEE Transactions on Reliability 61, 220–226. yu, h. f. (2003). Optimal Classification of Highly Reliable Products Whose Degradation Paths Satisfy Wiener Processes. Engineering Optimization 35, 313–324.
348
C.-Y. Peng
yu, h. f. and yu, w. c. (2006). Optimal Classification of Highly Reliable Products with a Linearized Degradation Model. Journal of Industrial and Production Engineering 23, 382–392. zhou, r. r., serban, n. and gebraeel, n. (2011). Degradation Modeling and Lifetime Monitoring Using Functional Data Analysis. Annals of Applied Statistics 5, 1586– 1610.
Appendix Proof of Theorem 3.1. The goal is to find an optimal and nonzero vector a∗ such that n (ai Λ(tmi ))2 a ∝ arg max , a=0 ai Σmi ai ∗
i=1
which is equivalent to (ai Λ(tmi ))2 ai =0 ai Σmi ai
a∗i ∝ arg max
for i = 1, . . . , n.
By Maximization Lemma in Johnson and Wichern (2007), we have a∗i = Ki Σ−1 mi Λ(tmi ) for any constant Ki = 0 and the corresponding opti(a∗ Λ(tmi ))2 mal value is i∗ = Λ(tmi ) Σ−1 mi Λ(tmi ). The result thus follows. ai Σmi a∗i Proof of Theorem 3.2. The optimization problem is given by ξ ∗ = arg min ξ
n
M Ci (ξ i |a∗i , tmi ) =
i=1
n
arg min M Ci (ξ i |a∗i , tmi ). ξi
i=1
Consequently, given a∗i and tmi and taking the first partial derivatives of M Ci (ξ i |a∗i , tmi ) with respect to ξi,k for k = 1, . . . , g − 1, we obtain ∂M Ci (ξ i |a∗i , tmi ) ∂ξi,k
ξi,k − ηk a∗i Λ(tmi ) = ∗ φ ai Σmi a∗i a∗i Σmi a∗i Ci,k+1 npk+1 ξi,k − ηk+1 a∗i Λ(tmi ) , φ − ∗ ai Σmi a∗i a∗i Σmi a∗i Ci,k npk
where φ denotes the density function of the standard normal distribution. Setting these g − 1 derivatives to zero and solving the g − 1 equations,
Optimal classification policy and comparisons...
349
∗ as (3.4). Since Σ we obtain ξi,k mi is positive definite and the constraints ηk+1 > ηk for k = 1, . . . , g − 1, we have
∂ 2 M Ci (ξi |a∗i , tmi ) ∂ξi,k1 ∂ξi,k2 ∗ ξi,k =ξi,k ∂ 2 M Ci (ξi |a∗i , tmi ) 2 ∂ξi,k ξi,k =ξ∗
=
0, 1 ≤ k1 = k2 ≤ g − 1,
=
i,k
$ % ∗ (ξi,k − ηk a∗i Λ(tmi ))2 Ci,k pk n exp − 3 2a∗i Σmi a∗i 2π a∗i Σmi a∗i
(ηk+1 − ηk )Λ(tmi ) Σ−1 mi Λ(tmi ) >
0, k = 1, . . . , g − 1.
Hence, the desired result follows. Proof of Lemma 6.1. Since the order restriction is ηk+1 > ηk for i,k is equivalent to k = 1, . . . , g − 1, it is easy to see that Δi,k1 > Δ 1 a Λ(t ) b Λ(t ) i mi > i mi ai Σmi ai b i Σ mi b i
⇔
(ηk+1 −ηk )ai Λ(tmi )
√
ai Σmi ai
>
(ηk+1 −ηk )bi Λ(tmi )
√
bi Σmi bi
i,k , 1 ≤ k = k1 ≤ g − 1. ⇔ Δi,k > Δ The result follows. Proof of Theorem 6.1. Without loss of generality, we only show the i,k and we have the case of either Δi,k > Δ i,k case of (i). Compare Δi,k with Δ i,k for k = 1, . . . , g − 1. The remaining proofs directly follow the or Δi,k < Δ proof of Theorem 2 in Tseng and Peng (2004). Proof of Theorem 6.2. Similar to the proof of Theorem 3 in Tseng and Peng (2004), the proof is omitted here since the steps are routine. Proof of Theorem 6.3. In order to compare the three classification CD LDA policies (NCD, CD and LDA) in terms of ΔNCD i,1 , Δi,1 and Δi,1 , additional notation is needed. Let Am be an m × m matrix and |Am |j be the jth leading principal minors determinant of Am , i.e., a11 . . . a1j . . .. , j = 1, . . . , m. |Am |j = ... . . aj1 . . . ajj In addition, we first prove a main lemma that will be used in the proof of Theorem 6.3.
350
C.-Y. Peng
Lemma 7.1. Σ−1 mi −
emi emi is positive semidefinite. emi Σmi emi
Proof of Lemma 7.1. ⎛ 0 ⎜ .. ⎜ . emi emi ⎜ =⎜ 0 ⎜ emi Σmi emi ⎝ 0
Since ... 0 . . .. . . ... 0 ... 0
⎞
0 .. . 0 1 2t 2 2 2 ση Λ(tmi ) + σB mi + σ
⎟ ⎟ ⎟ ⎟, ⎟ ⎠
−1 emi emi > 0 for j = 1, . . . , mi − 1. It is only to show we have Σmi − emi Σmi emi j −1 emi emi that Σmi − ≥ 0. By using Theorem 7.5 in Schott (2005), e mi Σ mi e mi m i we have −1 −1 e e e e Σ m m m m mi i i i i Σm − = Σmi m I mi − i i emi Σmi emi m e mi Σ mi e mi m i i Σ e −1 e m m m i i = 0. = Σmi m 1 − i i emi Σmi emi 1 Now, we return to the proof of Theorem 6.3. (i) By the definitions of and ΔLDA ΔNCD i,1 i,1 , we have Λ(tmi ) emi emi Λ(tmi ) NCD −1 ΔLDA ≥ Δ ⇔ Λ(t ) Σ Λ(t ) ≥ mi mi i,1 i,1 mi emi Σmi emi e e m m i i Λ(tmi ) ≥ 0. ⇔ Λ(tmi ) Σ−1 mi − e mi Σ mi e mi NCD By Lemma 7.1, we obtain ΔLDA i,1 ≥ Δi,1 . (ii) By (2.5) and Theorem 1.7 of Schott (2005), we have −1 Σ−1 mi = Ωmi −
ση2 1 + ση2 Λ(tmi ) Ω−1 mi Λ(tmi )
−1 Ω−1 mi Λ(tmi )Λ(tmi ) Ωmi .
Thus, one obtains Λ(tmi ) Σ−1 mi Λ(tmi ) =
Λ(tmi ) Ω−1 mi Λ(tmi )
1+ση2 Λ(tmi ) Ω−1 mi Λ(tmi )
.
(7.1)
351
Optimal classification policy and comparisons... Consequently, by using (7.1), we get 2 (ΔLDA i,1 )
≥
2 (ΔCD i,1 )
& t 2 m 3 0 i Λ(x)dx Λ(tmi ) Ω−1 mi Λ(tmi ) ≥ ⇔ & t 2 m 1 + ση2 Λ(tmi ) Ω−1 2 mi Λ(tmi ) 3ση2 0 i Λ(x)dx +t3mi σB + 3t2mi σ2
! tm 2 i 2 ⇔ (σB tmi + 3σ2 )t2mi Λ(tmi ) Ω−1 Λ(x)dx ≥ 0. mi Λ(tmi ) − 3 0
NCD 2 (iii) By the definitions of ΔNCD and ΔCD i,1 i,1 , it is easily shown that (Δi,1 ) ≥ t 2 mi 2 2 2 2 2 2t + (ΔCD Λ(x)dx (σB mi i,1 ) is equivalent to (σB tmi +3σ )tmi Λ(tmi ) −3 0
σ2 ) ≥ 0. In addition, similar to the proof of Lemma 7.1, it can be seen that 2 tm i Λ(x)dx e e 3 mi mi 0 −1 Ωm i − 2 2 2 (σB tmi + 3σ )tmi Λ(tmi )2
for j = 1, . . . , mi − 1,
>
0
≥
2 0 ⇔ (σB tmi + 3σ2 )t2mi Λ(tmi )2
j
and 2 tm i 3 Λ(x)dx e e m m i 0 i −1 Ωm i − 2 2 2 2 (σB tmi + 3σ )tmi Λ(tmi ) mi
2
tm i
−3
Λ(x)dx
2 (σB tmi + σ2 ) ≥ 0.
0
Hence, we have 2 (σB tmi
+
3σ2 )t2mi Λ(tmi )2 ⎛
⇒ Λ(tmi )
⇔
⎜
2 tmi (σB
−1 ⎝Ωmi
+
−
3
−3
t mi 0
2t (σB mi
t mi
2 Λ(x)dx
0
2
2 (σB tmi + σ2 ) ≥ 0 ⎞
emi emi ⎟ ⎠ Λ(tmi ) ≥ 0 + 3σ2 )t2mi Λ(tmi )2 Λ(x)dx
3σ2 )t2mi Λ(tmi ) Ω−1 mi Λ(tmi )
−3
t mi
2 Λ(x)dx
≥ 0.
0
That is C ⊂ B. The desired result follows. Proof of Corollary 6.1. Without loss of generality, we only show the case of the functional equation. From the set C, we have 2 t + σ2) 1 3(σB Λ(t) = . (7.2) t 2 t σB t + 3σ2 Λ(x)dx 0
352
C.-Y. Peng
By integrating equation (7.2), we immediately get t 2 x + σ2) 1 3(σB Λ(x)dx = exp 2 x + 3σ 2 dx . x σB 0 Use the fact that
1 x
⎧ ⎫ √3 ⎪ ⎪ ⎪ ⎪ ⎨x ⎬ (ax + b)(ax + 3b) + ax + 2b 3(ax + b) dx = ln + K ∗ for a, b > 0, ⎪ ⎪ ax + 3b 3(ax + b)(ax + 3b) + 2ax + 3b ⎪ ⎪ ⎩ ⎭
where K ∗ is a constant. After some simple manipulations, we obtain the result. Proof of Theorem 6.4. The following lemma is used in the proof of Theorem 6.4. Lemma 7.2. (i) All continuously differentiable functions Λ : (0, ∞) → (0, ∞) satisfying the functional equation uΛ(v) = vΛ(u) are in the form of Λ(v) = Λ(1)v. (ii) All continuously differentiable functions √ v Λ : (0, ∞) → (0, ∞) satisfying the functional equation vΛ(v) = 3 0 Λ(x)dx are in the form of Λ(v) = √ Kv 3−1 for any constant K = 0. Proof of Lemma 7.2. (i) The difference between uΛ(v) = vΛ(u) and (u + 1)Λ(v) = vΛ(u + 1) gives Λ(v) = v(Λ(u) − Λ(u + 1)).
(7.3)
Substituting u = 0 into (7.3), this concludes the proof. (ii) After taking the first partial derivative of both sides of this functional equation, we immediately get √ dΛ(v)/dv 3−1 = . (7.4) Λ(v) v By integrating equation (7.4), we find the desired function. Now we are ready to prove our main result. (i)-(iii) The result follows immediately after substituting σ = 0 into Theorem 6.3.√ By using 3−1 . Here, Lemma 7.2 (ii), it is easy to see that ΔNCD = ΔCD i,1 i,1 ⇔ Λ(t) = Kt we claim that ΔLDA = ΔNCD ⇔ Λ(t) = Λ(1)t. From σ = 0 and (7.1), we i,1 i,1 have
353
Optimal classification policy and comparisons...
−1 2 Λ(tmi ) ση2 Λ(tmi )Λ(tmi ) + σB Q mi Λ(tmi ) =
Λ(tmi ) Q−1 mi Λ(tmi )
2 σB
+ ση2 Λ(tmi ) Q−1 mi Λ(tmi )
.
(7.5)
Hence, by (7.5), it is easily seen that 2 NCD 2 2 ⇔ tmi Λ(tmi ) Q−1 (ΔLDA i,1 ) = (Δi,1 ) mi Λ(tmi ) − Λ(tmi ) = 0. 2 It is only to prove tmi Λ(tmi ) Q−1 mi Λ(tmi ) − Λ(tmi ) = 0 ⇔ Λ(t) = Λ(1)t. The necessary part is obtained by using the result, tmi Q−1 mi tmi = tmi , in Peng and Tseng (2009, page 454). For the sufficient part, the proof is by mathematical 2 induction. If mi = 2, then t2 Λ(t2 ) Q−1 2 Λ(t2 )−Λ(t2 ) = 0, which is equivalent 2 to (Λ(t1 )t2 − t1 Λ(t2 )) = 0. By Lemma 7.2 (i), the shape function is linear, i.e., Λ(t) = Λ(1)t. Assume that the result holds for mi = l. That is 2 tl Λ(tl ) Q−1 l Λ(tl ) − Λ(tl ) = 0 ⇒ Λ(t) = Λ(1)t.
For the case mi = l + 1, let Ql+1 =
Ql t l tl tl+1
, then by Theorem 7.1 in
Schott (2005), we have Q−1 l+1
=
1 tl+1 −tl
(tl+1 − tl )Q−1 l + el el −el −el 1
.
Consequently, one has Λ(tl+1 ) Q−1 l+1 Λ(tl+1 ) =
1 2 2 (tl+1 − tl )Λ(tl ) Q−1 . l Λ(tl ) + Λ(tl ) − 2Λ(tl )Λ(tl+1 ) + Λ(tl+1 ) tl+1 − tl
2 Thus, let tl+1 Λ(tl+1 ) Q−1 l+1 Λ(tl+1 ) − Λ(tl+1 ) = 0, then we have
tl+1 tl+1 −tl
2 2 (tl+1 −tl )Λ(tl ) Q−1 l Λ(tl ) + Λ(tl ) −2Λ(tl )Λ(tl+1 ) + Λ(tl+1 ) −
(tl+1 − tl ) Λ(tl+1 )2 = 0 tl+1
tl (Λ(tl+1 )tl − Λ(tl )tl+1 )2 = 0. This means that Λ(t) = tl+1 − tl Λ(1)t. The proof is complete.
implying
354
C.-Y. Peng
Proof of Theorem 6.5. (i) The result follows immediately after substituting Λ(t) = t into Theorem 6.3. (ii) From (7.1) and Λ(t) = t, we have tmi Σ−1 mi tmi = Let Ωmi = we have
2t σB Ωmi −1 mi −1 2 t 2 2 σB mi −1 σB tmi + σ
tmi Ω−1 mi t mi
1 + ση2 tmi Ω−1 mi tmi
.
. By Theorem 7.4(b) in Schott (2005),
2 4 |Ωmi | = |Ωmi −1 | σB tmi + σ2 − σB tmi −1 Ω−1 (7.6) mi −1 tmi −1 . 2 −1 2 4 Since Ω−1 mi −1 and Ωmi are positive definite, we have σB tmi + σ −σB tmi −1 Ω−1 mi −1 tmi −1 > 0. Using (7.6) and Theorem 7.1 in Schott (2005), we obtain 2 −1 −1 2 4 Ω−1 mi = σB tmi + σ − σB tmi −1 Ωmi −1 tmi −1
A11
2 −σB tmi −1 Ω−1 mi −1
2 −σB Ω−1 mi −1 tmi −1 , 1
−1 −1 −1 −1 2t 2 4 where A11 = (σB mi + σ )Ωmi −1 + σB (Ωmi −1 tmi −1 tmi −1 Ωmi −1 − tmi −1 Ωmi −1 tmi −1 Ω−1 mi −1). Hence, we have the identity
tmi Ω−1 mi t mi
=
−1 2t 2 2 (−σB mi +σ )tm −1 Ωm −1 tmi −1 +tmi i i . 2 −1 4 t σB tmi +σ2 −σB m −1 Ωm −1 tmi −1 i
(7.7)
i
After some algebraic manipulations by using (7.7), we get 2 2 = ΔNCD ⇔ (σB tmi + σ2 )tmi Ω−1 ΔLDA 1 1 mi tmi − tmi = 0
tmi −1 Ω−1 mi −1 tmi −1 = 0 ⇔ σ = 0. ⇔ σ4 2 −1 σ tm + σ2 − σ 4 t i B B mi −1 Ωmi −1 tmi −1 This concludes the proof of the theorem. > ΔNCD can be obProof of Corollary 6.2. (i) The part ΔLDA i,1 i,1 tained easily by substituting σB = 0 into Theorem 6.3. When σB = 0 in Theorem 6.3, it is sufficient to show that tm i NCD CD Λ(x)dx. Δi,1 ≥ Δi,1 ⇔ tmi Λ(tmi ) ≥ v
0
Let ϕ : (0, ∞) → (0, ∞) : ϕ(v) = vΛ(v) − 0 Λ(x)dx. We have dϕ(v)/dv = vdΛ(v)/dv ≥ 0 by definition. Thus, we obtain ϕ(v) ≥ ϕ(0) = 0 for v > 0,
Optimal classification policy and comparisons...
355
which gives the asserted form. (ii) We claim that all continuously differentiable functions Λ : (0, ∞) → (0, ∞) satisfying the functional equation v vΛ(v) = 0 Λ(x)dx are in the form of Λ(v) = K for any constant K = 0. v After taking the first partial derivative of vΛ(v)− 0 Λ(x)dx = 0 with respect to v, we immediately obtain dΛ(v)/dv = 0, which implies Λ(v) = K. Proof of Corollary 6.3. Substituting Λ(t) = 1 into Theorem 6.3, the results follow directly. Proof of Theorem 6.6. We need + the following auxiliary results to 2 c = 1 for n2 < n1 to avoid prove Theorem 6.6. In addition, we define ni=n 1 i reverse product in the following formula. Theorem 7.1. Let Ω# mi be the adjoint of Ωmi , then we have (i) 1mi Ω# mi tmi =
mi
2 r−1 2 mi −r (σB ) (σ )
ti,i1
1≤i1 <···
r=1
r−1
(ti,ij+1 − ti,ij ),
j=1
(ii) 1mi Ω# m i 1m i =
mi
2 r−1 2 mi −r (σB ) (σ )
r−1
(ti,ij+1 − ti,ij ).
1≤i1 <···
r=1
Moreover, under the equally-spaced sampling scheme, i.e., ti,j − ti,j−1 = Δti for j = 1, . . . , mi , we have (iii)
ti,i1
1≤i1 <···
(ti,ij+1 − ti,ij ) =
j=1
(iv)
r−1
1≤i1 <···
r−1
(ti,ij+1 − ti,ij ) =
mi + r (Δti )r , mi − r
mi + r − 1 (Δti )r−1 . mi − r
Proof of Theorem 7.1. Similar to the proof of Theorem 1 in Peng and Hsu (2012), the proofs of (i) and (ii) are omitted here since the steps are routine. (iii) The method of generating functions can be used here for
356
C.-Y. Peng
proving combinatorial identities. Under the assumption (i.e., ti,j − ti,j−1 = Δti for j = 1, . . . , mi ), let
F (mi , r) =
i1
1≤i1 <···
r−1
(ij+1 − ij ).
j=1
First, it is easy to verify the following recurrence relation for F : F (mi , r + 1) + F (mi + 2, r + 1) = 2F (mi + 1, r + 1) + F (mi + 1, r) (7.8) and F (mi , mi ) = together with the initial conditions F (mi , 1) = mi (m i +1)/2 r i F (m 1. If we define the generating functions Ami (x) = m i , r)x , then we r=1 have A1 (x) = x and A2 (x) = 3x + x2 . After multiplying (7.8) by xr and summing over 1 ≤ r ≤ mi , we obtain 1 1 mi +2 ) x (Ami (x)−F (mi , 1)x)+ x (Ami +2 (x)−F (mi +2, 1)x−F(mi +2, mi +2)x = x2 (Ami +1 (x) − F (mi + 1, 1)x) + (Ami +1 (x) − F (mi + 1, mi + 1)xmi +1 ),
which is equivalent to Ami (x) − (2 + x)Ami +1 (x) + Ami +2 (x) − x = 0.
(7.9)
To solve (7.9), we introduce the generating function ψ(x, y) = mi ≥1 Ami (x)y mi . Then, if we multiply (7.9) by y mi and sum over mi ≥ 1, we get ψ(x, y) −
xy (2 + x)(ψ(x, y) − A1 (x)y) ψ(x, y) − A1 (x)y − A2 (x)y 2 + . = y y2 1−y
If we use the initial conditions and solve for ψ, the generating function for F is given by ψ(x, y) =
(1 −
y)(y 2
xy . − (2 + x)y + 1)
(7.10)
The unknown numbers F (mi , r) is the coefficient of xr y mi in the series expansion of the above ψ(x, y). Finally, we find that k + 2r (xy)r y k (xy)r = ψ(x, y) = k (1 − y)2r+1 r≥1
r≥1 k≥0
mi mi + r mi + r r mi mi r y x = x y = mi − r mi − r r≥1 mi ≥r
mi ≥1 r=1
Optimal classification policy and comparisons...
357
as an explicit formula for the F . (iv) Similar to the proof of (iii), let r−1
G(mi , r) =
(ij+1 − ij ).
1≤i1 <···
The recurrence relation for G can be found that
G(mi , r + 1) + G(mi + 2, r + 1) = 2G(mi + 1, r + 1) + G(mi + 1, r)
with the initial conditions G(mi , 1) = mi and G(mi , mi ) = 1. The remaining proof is similar and is omitted to save space. The theorem can be established. Now, we return to the proof of Theorem 6.6. We only show that ΔLDA i,1 > CD 2 2 Δi,1 . By Corollary 6.3 (ii), it is sufficient to prove (σB mi Δti + 3σ )1mi Ω# mi 1mi − 3|Ωmi | > 0. By using Theorem 7.1 and Theorem 1 in Peng and Hsu (2012), we have 2 (σB mi Δti + 3σ2 )1mi Ω# mi 1mi − 3|Ωmi |
2 mi Δti + 3σ2 ) = (σB
mi
2 (σB Δti )r−1 (σ2 )mi −r
r=1
−3
mi
2 (σB Δti )r (σ2 )mi −r
r=0
=
m i −1
2 (σB Δti )r (σ2 )mi −r
r=1
mi + r mi − r
mi + r − 1 mi − r
, mi + r − 1 mi + r mi + r mi −3 +3 mi − r mi − r mi − r − 1
2 Δti )mi (mi − 3) + 3(σ2 )mi (mi − 1) + (σB
=
m i −1 r=1
2 (σB Δti )r (σ2 )mi −r
(mi + r − 1) · · · (2r + 3)(2r + 2) h(r) (mi − r)!
2 + (σB Δti )mi (mi − 3) + 3(σ2 )mi (mi − 1),
358
C.-Y. Peng
where
4mi + 3 h(r) = (4mi − 9) r − 8mi − 18
2 +
h1 (mi ) , 4(4mi − 9)
h1 (mi ) = 48m3i − 172m2i + 84mi − 9. We claim that h(r) ≥ 0 with mi ≥ 3 for r = 1, . . . , mi − 1. For mi = 3, it can be seen that h(1) > 0 and h(2) = 0. In addition, we have h1 (mi ) ≥ h1 (4) > 0 because of dh1 (mi )/dmi > 0 for mi ≥ 4. Consequently, the theorem can be established by using Corollary 6.3 (iii).
Chien-Yu Peng Institute of Statistical Science, Academia Sinica Taipei, 11529, Taiwan E-mail:
[email protected]
Paper received: 13 August 2014.