Science in China Series A: Mathematics Sep., 2008, Vol. 51, No. 9, 1593–1608 www.scichina.com math.scichina.com www.springerlink.com
Statistical inference on mixing proportion

XU XingZhong¹† & LIU Fang¹,²
¹ Department of Mathematics, Beijing Institute of Technology, Beijing 100081, China
² The People's Insurance Company (Group) of China, Beijing 100084, China
(email: [email protected], liu [email protected])
Abstract  In this paper, interval estimation and hypothesis testing for the mixing proportion in mixture distributions are considered. A statistical inference method is proposed, inspired by the generalized p-value and the generalized pivotal quantity. In some situations, the true levels of the tests given in the paper equal the nominal levels, and the true coverage of the interval estimates or confidence bounds also equals the nominal one. In other situations, under mild conditions, the tests are consistent and the coverage of the interval estimates or confidence bounds is asymptotically equal to the nominal coverage. Simulations showing that the method performs satisfactorily are also reported.

Keywords: asymptotic properties, generalized pivotal quantity, generalized p-value, mixing proportion, mixture distributions

MSC(2000): 62F03

1  Introduction
Tukey[1] proposed a mixture model for the case where a normal population is contaminated, which is called the Tukey model. Mixture models have been widely used in biology, medicine, finance and engineering in recent decades; for a detailed account of applications and statistical analysis, see [2–4]. For mixture models, testing hypotheses on the mixing proportion and giving confidence intervals or bounds for it are important, because we are often concerned with whether a population is contaminated, or with the proportions of the components in a compound. Moreover, some parameters, such as the mean life and the reliability, are linear functions of the proportion when the population is a mixture of two known life distributions; inference on these parameters is then equivalent to inference on the proportion. Even when the component distributions are unknown, we can substitute estimators for the component parameters. Thus, if the parameters of interest are monotone in the proportion, inference on them can be transformed into inference on the proportion. However, most published papers are concerned only with point estimation of the proportion (see [5–7]). Some papers discuss homogeneity testing in mixture models, i.e., testing whether the population is a mixture. Unfortunately, the classical likelihood ratio test (LRT) is intractable. Lindsay[3] pointed out that the asymptotic distribution of the LRT statistic for mixture models has long been a mystery. Liu and Shao[8] pointed out that "the asymptotic behavior of the LRT is unknown even for the following seemingly simple

Received June 21, 2006; accepted September 21, 2007; published online June 13, 2008
DOI: 10.1007/s11425-008-0016-0
† Corresponding author
This work was supported by the National Natural Science Foundation of China (Grant Nos. 10271013, 10771015)
homogeneity test: H0: N(0, 1) against H1: pN(t, 1) + (1 − p)N(0, 1), p ∈ (0, 1], t ∈ R\{0}". In fact, Hartigan[9] discovered that the LRT statistic for the above hypotheses diverges to +∞ in probability. Hence, for testing homogeneity of a mixture model, restricted or modified LRT statistics are considered, as in [8] and [10–13]. Because the LRT statistic is intractable, the generalized inference method is adopted in this paper. For one-sided hypothesis testing and interval estimation of the mixing proportion, we have found no published results.

In the presence of nuisance parameters, Tsui and Weerahandi[14] proposed the generalized p-value for hypothesis testing problems to which standard procedures do not apply. Subsequently, Weerahandi[15] proposed the generalized pivotal quantity and the generalized confidence interval for the parameter of interest. When the mixing proportion is the parameter of interest, the nuisance parameters are those of the component distributions. Faced with exactly this situation, in which no standard procedure applies, we employ the generalized inference method for mixture models.

In generalized inference, a random variable (or pivotal quantity) with known distribution plays a key role. If the distribution function F(x) of a variable X is continuous, then the random variable F(X) has the known distribution U(0, 1). This simple fact underlies our method.

Let X = (X1, X2, ..., Xn), where X1, X2, ..., Xn are independently and identically distributed (i.i.d.) with distribution function

$$pF(x,\theta)+(1-p)G(x,\theta),\qquad(1.1)$$
where p is the mixing proportion, F(x, θ) and G(x, θ) are the component cumulative distribution functions, and θ is the vector of nuisance parameters. Throughout the paper vectors are row vectors and are denoted by boldface letters. An observation of X is denoted by x = (x1, x2, ..., xn). The method of this paper is inspired by the generalized pivotal quantity[15].

Definition 1.1. Let S = S(X; x, η) be a function of X, x and η = (p, θ) (but not necessarily of all of them). The random variable S is called a generalized pivotal quantity if it satisfies the following two conditions:

Condition 1. S has a probability distribution free of the unknown parameter η;

Condition 2. The observed pivotal, defined as s_obs = S(x; x, η), does not depend on the nuisance parameter θ.

If F(x, θ) and G(x, θ) are continuous in x, then pF(X, θ) + (1 − p)G(X, θ) follows the uniform distribution U(0, 1). Suppose that {U1, U2, ..., Un} is an i.i.d. sample from U(0, 1). Then

$$\sum_{i=1}^n\big(pF(X_i,\theta)+(1-p)G(X_i,\theta)\big)\overset{d}{=}\sum_{i=1}^n U_i,$$

where $\overset{d}{=}$ means that the random variables on the two sides have an identical distribution. Given x, let S be the minimizer of the minimization problem

$$\min_{0\le p\le 1}\left|\sum_{i=1}^n\big(pF(x_i,\theta)+(1-p)G(x_i,\theta)-U_i\big)\right|.$$
Then

$$S=\begin{cases}0,& p_{\min}(U;x,\theta)<0,\\ p_{\min}(U;x,\theta),& 0\le p_{\min}(U;x,\theta)\le 1,\\ 1,& p_{\min}(U;x,\theta)>1,\end{cases}$$

where

$$p_{\min}(U;x,\theta)=\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i,\theta)}{\sum_{i=1}^n F(x_i,\theta)-\sum_{i=1}^n G(x_i,\theta)}$$

and U = (U1, U2, ..., Un). Replacing Ui in S by pF(Xi, θ) + (1 − p)G(Xi, θ), i = 1, 2, ..., n, we have

$$S^*=S^*(X;x,\eta)=\begin{cases}0,& S<0,\\ S,& 0\le S\le 1,\\ 1,& S>1,\end{cases}$$

where

$$S=S(X;x,\eta)=\frac{\sum_{i=1}^n\big(pF(X_i,\theta)+(1-p)G(X_i,\theta)\big)-\sum_{i=1}^n G(x_i,\theta)}{\sum_{i=1}^n\big(F(x_i,\theta)-G(x_i,\theta)\big)}.$$
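Given the observed data x and a known θ, one realization of S* is obtained from a fresh U(0, 1) sample by clipping p_min to [0, 1]; repeating this gives the Monte Carlo distribution of S*. A minimal sketch (the components N(3, 1.5²) and N(0, 1) are borrowed from the simulation example of Section 3, and the function names are illustrative):

```python
import numpy as np
from statistics import NormalDist

# Illustrative components: F = N(3, 1.5^2), G = N(0, 1); theta is treated as known.
F = np.vectorize(NormalDist(3.0, 1.5).cdf)
G = np.vectorize(NormalDist(0.0, 1.0).cdf)

def pivot_draw(x, u):
    """One realization of S*: p_min(U; x, theta) clipped to [0, 1]."""
    p_min = (u.sum() - G(x).sum()) / (F(x).sum() - G(x).sum())
    return float(min(max(p_min, 0.0), 1.0))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)      # observed data (drawn here with p = 0)
u = rng.uniform(size=x.size)      # a fresh U(0,1) sample
s_star = pivot_draw(x, u)
assert 0.0 <= s_star <= 1.0
```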
Because S(x; x, η) = S*(x; x, η) = p, Condition 2 of Definition 1.1 is satisfied by both S and S*. When θ is known, Condition 1 of Definition 1.1 is also satisfied by them. Interval estimation and hypothesis testing for p are discussed in this paper on the basis of S*. Three situations are considered separately. Section 2 concerns the case in which the nuisance parameter θ is known and a stochastic order between the component distributions exists. Section 3 continues the case of Section 2, but without requiring the existence of a stochastic order. For the case in which the nuisance parameter θ is unknown, some asymptotic results are given in Section 4 under mild conditions. A practical example is discussed in Section 5.

2  Known nuisance parameter and ordered component distributions
If the vector θ of nuisance parameters is known, write F(x, θ) = F(x) and G(x, θ) = G(x); then model (1.1) can be written as

$$pF(x)+(1-p)G(x).\qquad(2.1)$$
In this section, assume F(x) > G(x) for any x ∈ R. This assumption means that the variable Y′ is stochastically larger than the variable Y if Y and Y′ have distributions F(x) and G(x) respectively. Intuitively, the relation between Y and Y′ can be thought of as Y′ ≥ Y; i.e., there are two functions g1(z) and g2(z) and some random variable Z such that g1(z) ≤ g2(z) for all z ∈ R, and Y = g1(Z) and Y′ = g2(Z) (see Lemma 1 in Chapter 3 of [16]). Any two distributions in a location parameter family, or in a family with monotone likelihood ratio, have such a stochastic order. The assumption that the component distributions are known is practical. For instance, in measuring fish lengths, one may have accurate information about the component distributions from previous studies of past generations[7]. Likewise, the forms of grain size distributions for various individual minerals can be established in the laboratory as a preliminary to the analysis of an ash deposit that is a mixture of minerals[4].
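For a location family the coupling described above is explicit: take Z ~ N(0, 1), g1(z) = z and g2(z) = z + δ, so Y = g1(Z) ≤ g2(Z) = Y′. A quick numeric check of the pointwise ordering F(x) > G(x) (the shift δ = 3 is an arbitrary illustration):

```python
import numpy as np
from statistics import NormalDist

delta = 3.0
F = NormalDist(0.0, 1.0)       # distribution of Y  = Z
G = NormalDist(delta, 1.0)     # distribution of Y' = Z + delta

xs = np.linspace(-6.0, 9.0, 301)
# F(x) = Phi(x) > Phi(x - delta) = G(x) for every x: Y' is stochastically larger.
assert all(F.cdf(x) > G.cdf(x) for x in xs)
```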
2.1  Hypothesis testing
Consider the one-sided hypotheses

$$H_0: p\le p_0\quad\text{vs}\quad H_1: p>p_0,\qquad(2.2)$$

and

$$H_0: p\ge p_0\quad\text{vs}\quad H_1: p<p_0,\qquad(2.3)$$

where p0 is a pre-specified value of the parameter p and 0 < p0 < 1. Although there are no nuisance parameters at present, generalized p-values can still be used to test these hypotheses. Below is the definition of the generalized p-value for testing the hypotheses (2.2), given by Tsui and Weerahandi[14].

Definition 2.1. A generalized test variable is a function T(X; x, η) of X, x and η satisfying the following three conditions:

Condition 1. t = T(x; x, η) does not depend on unknown parameters;

Condition 2. For fixed x and η = (p0, θ), the distribution of T(X; x, η) is free of the nuisance parameter θ;

Condition 3. For fixed x and θ, P(T(X; x, η) ≥ t | x, (p, θ)) is nondecreasing in p.

The generalized p-value corresponding to the generalized test variable T(X; x, η) is then given by

$$p(x)=\sup_{p\le p_0}P\big(T(X;x,\eta)\ge T(x;x,\eta)\,\big|\,x,\eta\big).$$

Now, because θ is known, the argument η can be denoted simply by p. Let T1(X; x, p) = p − S*(X; x, η). The variable T1(X; x, p) satisfies the three conditions of a generalized test variable in Definition 2.1. Then by Definition 2.1, the generalized p-value is

$$p_1(x)=\sup_{p\le p_0}P\big(T_1(X;x,p)\ge T_1(x;x,p)\,\big|\,x,p\big)$$
$$=P\big(S^*\le p_0\,\big|\,x,p_0\big)
=P\left(\frac{\sum_{i=1}^n\big(p_0F(X_i)+(1-p_0)G(X_i)\big)-\sum_{i=1}^n G(x_i)}{\sum_{i=1}^n\big(F(x_i)-G(x_i)\big)}\le p_0\,\middle|\,x,p_0\right).$$

To investigate the frequentist properties of the generalized p-value p1(x), the fixed-level test determined by it is considered. For a given significance level α, 0 < α < 1, the rejection region is taken as D1(α) = {x : p1(x) ≤ α}. This test has the following frequentist property.

Theorem 2.1. The test given by the rejection region D1(α) is an unbiased test of the hypotheses (2.2), and P(X ∈ D1(α) | p = p0) = α.

Proof. For convenience, denote by P^Z the probability measure determined by a random vector Z. In what follows, Z will be taken as

Z = X = (X1, X2, ..., Xn),  Z = U = (U1, U2, ..., Un),  Z = U* = (U1*, U2*, ..., Un*),

respectively, where U* is a random vector independent of U and identically distributed as U. The rejection region D1(α) can be written as

$$D_1(\alpha)=\{x: p_1(x)\le\alpha\}=\left\{x: P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i)}{\sum_{i=1}^n\big(F(x_i)-G(x_i)\big)}\le p_0\right)\le\alpha\right\}.$$
If p ≤ p0, then

$$\begin{aligned}
P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i)}{\sum_{i=1}^n\big(F(x_i)-G(x_i)\big)}\le p_0\right)
&=P^U\left(\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i)\le p_0\Big(\sum_{i=1}^n F(x_i)-\sum_{i=1}^n G(x_i)\Big)\right)\\
&\ge P^U\left(\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i)\le p\Big(\sum_{i=1}^n F(x_i)-\sum_{i=1}^n G(x_i)\Big)\right)\\
&=P^U\left(\sum_{i=1}^n U_i\le\sum_{i=1}^n\big(pF(x_i)+(1-p)G(x_i)\big)\right),
\end{aligned}$$

and the equality holds when p = p0. Therefore,

$$P^X(X\in D_1(\alpha))\le P^X\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n\big(pF(X_i)+(1-p)G(X_i)\big)\Big)\le\alpha\right)
=P^{U^*}\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n U_i^*\Big)\le\alpha\right)=\alpha.$$

The last equality is due to the fact that ∑_{i=1}^n U_i and ∑_{i=1}^n U_i* have a common continuous cumulative distribution function. If p > p0, then

$$P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(x_i)}{\sum_{i=1}^n\big(F(x_i)-G(x_i)\big)}\le p_0\right)\le P^U\left(\sum_{i=1}^n U_i\le\sum_{i=1}^n\big(pF(x_i)+(1-p)G(x_i)\big)\right).$$

Thus,

$$P(X\in D_1(\alpha))=P^X\left(P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(X_i)}{\sum_{i=1}^n F(X_i)-\sum_{i=1}^n G(X_i)}\le p_0\right)\le\alpha\right)
\ge P^{U^*}\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n U_i^*\Big)\le\alpha\right)=\alpha.$$

For testing the hypotheses (2.3), an analogous result can be obtained by reversing the inequality signs in the above argument, and the proof is therefore omitted.

In order to test whether the population is a mixture or not, consider the hypotheses

$$H_0: p=0\quad\text{vs}\quad H_1: 0<p\le 1,\qquad(2.4)$$

or

$$H_0: p=1\quad\text{vs}\quad H_1: 0\le p<1.\qquad(2.5)$$

Since testing (2.5) is analogous to testing (2.4), only (2.4) is considered. Let p2(x) = P(S* = 0 | x, 0) and D2(α) = {x : p2(x) ≤ α}. Then we have the following result.

Theorem 2.2. The test given by the rejection region D2(α) is an unbiased test of the hypotheses (2.4), and P(X ∈ D2(α) | p = 0) = α.
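Before proving Theorem 2.2, note that p2(x) is straightforward to evaluate by Monte Carlo, since p2(x) = P^U(∑U_i ≤ ∑G(x_i)). A sketch with illustrative components chosen so that F(x) > G(x) for all x, as this section assumes (here F = N(−3, 1) and G = N(0, 1); all names are mine):

```python
import numpy as np
from statistics import NormalDist

G = np.vectorize(NormalDist(0.0, 1.0).cdf)   # component G = N(0, 1)
# Component F is taken as N(-3, 1), so F(x) > G(x) for every x.

def p2(x, m, rng):
    """Monte Carlo p2(x) = P(S* = 0 | x, 0) = P^U(sum U_i <= sum G(x_i))."""
    u_sums = rng.uniform(size=(m, x.size)).sum(axis=1)
    return float(np.mean(u_sums <= G(x).sum()))

rng = np.random.default_rng(5)
x_null = rng.normal(0.0, 1.0, 100)                  # p = 0: all data from G
mix = rng.random(100) < 0.5
x_alt = np.where(mix, rng.normal(-3.0, 1.0, 100),   # p = 0.5: half from F
                 rng.normal(0.0, 1.0, 100))

# Under H0 the p-value is roughly uniform on (0,1); under the mixture it is near 0.
assert p2(x_alt, 5000, rng) < 0.05
```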
Proof. Consider the probability of the rejection region D2(α):

$$\begin{aligned}
P(X\in D_2(\alpha)\,|\,p=0)
&=P^X\left(P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(X_i)}{\sum_{i=1}^n F(X_i)-\sum_{i=1}^n G(X_i)}\le 0\right)\le\alpha\,\middle|\,p=0\right)\\
&=P^X\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n G(X_i)\Big)\le\alpha\,\middle|\,p=0\right)\\
&=P^{U^*}\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n U_i^*\Big)\le\alpha\right)=\alpha,
\end{aligned}$$

and

$$\begin{aligned}
P(X\in D_2(\alpha)\,|\,0<p\le 1)
&=P^X\left(P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(X_i)}{\sum_{i=1}^n F(X_i)-\sum_{i=1}^n G(X_i)}\le 0\right)\le\alpha\,\middle|\,0<p\le 1\right)\\
&=P^X\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n G(X_i)\Big)\le\alpha\,\middle|\,0<p\le 1\right)\\
&\ge P^X\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n\big(pF(X_i)+(1-p)G(X_i)\big)\Big)\le\alpha\right)\\
&=P^{U^*}\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n U_i^*\Big)\le\alpha\right)=\alpha.
\end{aligned}$$

The proof is completed.

2.2  Interval estimation
To obtain confidence intervals or bounds for the mixing proportion p, the quantiles of the distribution of the generalized pivotal quantity S* are naturally considered. For 0 < γ < 1, let Sγ(x) = sup{z : P(S* ≤ z) ≤ γ}. The following results are established.

Theorem 2.3. For given 0 < α < 1, S_{1−α}(x) and S_α(x) are upper and lower confidence bounds for p, and [S_{α/2}(x), S_{1−α/2}(x)] is a confidence interval for p; for any 0 < p < 1,

$$P(S_{1-\alpha}(X)\ge p)=1-\alpha,\qquad(2.6)$$

$$P(S_{\alpha}(X)\le p)=1-\alpha,\qquad(2.7)$$

$$P(S_{\alpha/2}(X)\le p\le S_{1-\alpha/2}(X))=1-\alpha.\qquad(2.8)$$

Proof. By the definition of S_{1−α}(X), for any 0 < p < 1,

$$\begin{aligned}
P(S_{1-\alpha}(X)\ge p)&=P^X\big(P^U(S^*\le p)\le 1-\alpha\big)\\
&=P^X\left(P^U\left(\frac{\sum_{i=1}^n U_i-\sum_{i=1}^n G(X_i)}{\sum_{i=1}^n F(X_i)-\sum_{i=1}^n G(X_i)}\le p\right)\le 1-\alpha\right)\\
&=P^X\left(P^U\Big(\sum_{i=1}^n U_i\le\sum_{i=1}^n\big(pF(X_i)+(1-p)G(X_i)\big)\Big)\le 1-\alpha\right)\\
&=1-\alpha,
\end{aligned}$$

where the last equality is due to the fact that ∑_{i=1}^n U_i and ∑_{i=1}^n (pF(X_i) + (1 − p)G(X_i)) are independent and identically distributed. For (2.7),

$$P(S_{\alpha}(X)\le p)=1-P(S_{\alpha}(X)>p)=1-P(S_{\alpha}(X)\ge p)=1-\alpha.$$

The second-to-last equality follows from the fact that P^U(S* ≤ z) is a continuous and strictly increasing function of z over the interval (0, 1). Hence, for any 0 < γ < 1 and 0 < p < 1,

$$P(S_{\gamma}(X)=p)=P^X\big(P^U(S^*\le p)=\gamma\big)=P^X\left(p\sum_{i=1}^n F(X_i)+(1-p)\sum_{i=1}^n G(X_i)=H_n^{-1}(\gamma)\right)=0,$$

where Hn(·) is the distribution function of ∑_{i=1}^n U_i and Hn^{−1}(·) is the inverse function of Hn(·). The proof of (2.8) is analogous.

A byproduct of the confidence interval in (2.8) is the following confidence band for the population distribution function.

Corollary 2.1. Denote KL(z) = S_{α/2}(x)(F(z) − G(z)) + G(z) and KU(z) = S_{1−α/2}(x)(F(z) − G(z)) + G(z). Then

$$P\{K_L(z)\le pF(z)+(1-p)G(z)\le K_U(z),\ \forall\,z\in\mathbb{R}\}=1-\alpha.$$

Proof. Since F(z) > G(z) for any z ∈ R,

{x : KL(z) ≤ pF(z) + (1 − p)G(z) ≤ KU(z), ∀ z ∈ R} = {x : p ∈ [S_{α/2}(x), S_{1−α/2}(x)]}.

From (2.8), Corollary 2.1 holds.

3  Known nuisance parameter and non-ordered component distributions
We continue with the mixture distribution (2.1), but now a stochastic order between the component distributions is not assumed. An interesting property of the mixture distribution (2.1) helps to approach the problems of hypothesis testing and interval estimation.

Property 1. Suppose that the random variable X follows the distribution (2.1), and that both F(x) and G(x) are continuous in x. Then E(F(X) − G(X)) is independent of p.

Proof.

$$E(F(X))=\int_{-\infty}^{+\infty}F(x)\,d\big(pF(x)+(1-p)G(x)\big)=p\int_{-\infty}^{+\infty}F(x)\,dF(x)+(1-p)\int_{-\infty}^{+\infty}F(x)\,dG(x).$$

From

$$\frac12=\int_{-\infty}^{+\infty}\big(pF(x)+(1-p)G(x)\big)\,d\big(pF(x)+(1-p)G(x)\big),$$

we have

$$\int_{-\infty}^{+\infty}F(x)\,dG(x)+\int_{-\infty}^{+\infty}G(x)\,dF(x)=1.$$

Hence

$$E(F(X))-E(G(X))=\int_{-\infty}^{+\infty}\big(F(x)-G(x)\big)\,d\big(pF(x)+(1-p)G(x)\big)=\int_{-\infty}^{+\infty}F(x)\,dG(x)-\frac12.$$
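Property 1 is easy to check by Monte Carlo for any convenient continuous components; the sketch below (the components N(3, 1.5²) and N(0, 1) are an arbitrary illustration) estimates E(F(X) − G(X)) at several values of p:

```python
import numpy as np
from statistics import NormalDist

Fcdf = np.vectorize(NormalDist(3.0, 1.5).cdf)   # illustrative F
Gcdf = np.vectorize(NormalDist(0.0, 1.0).cdf)   # illustrative G

rng = np.random.default_rng(1)
n = 100_000
estimates = []
for p in (0.1, 0.5, 0.9):
    from_f = rng.random(n) < p                   # draw X from pF + (1-p)G
    x = np.where(from_f, rng.normal(3.0, 1.5, n), rng.normal(0.0, 1.0, n))
    estimates.append(float(np.mean(Fcdf(x) - Gcdf(x))))

# E(F(X) - G(X)) does not depend on p; for this pair it equals
# integral F dG - 1/2 = Phi(-3 / sqrt(1 + 1.5^2)) - 1/2, about -0.452.
assert max(estimates) - min(estimates) < 0.01
```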
In this section, by Property 1, we can assume E(F(X)) > E(G(X)). Consider the hypotheses (2.2). The generalized p-value p1(x) is again employed. For the fixed-level test determined by p1(x), the following conclusion is established.

Theorem 3.1. The test given by the rejection region D1(α) is consistent for the hypotheses (2.2); that is, as n → ∞,

$$P(X\in D_1(\alpha))\to\begin{cases}0,&\text{when } p<p_0,\\ \alpha,&\text{when } p=p_0,\\ 1,&\text{when } p>p_0.\end{cases}$$

To prove Theorem 3.1, the following two lemmas are needed.

Lemma 3.1. Suppose that U1, U2, ..., Un are i.i.d. random variables with common distribution U(0, 1), and that Hn(x) is the distribution function of ∑_{i=1}^n U_i. Then Hn^{−1}(t)/n → 1/2 as n → ∞ for any 0 < t < 1, where Hn^{−1}(·) is the inverse function of Hn(·).

Proof. Let

$$t=H_n(x)=P\left(\sum_{i=1}^n U_i\le x\right)=P\left(\frac{\sqrt{12}\,\big(\sum_{i=1}^n U_i-n/2\big)}{\sqrt n}\le\frac{\sqrt{12}\,(x-n/2)}{\sqrt n}\right).$$

Denote

$$z_n(t)=\frac{\sqrt{12}\,(x-n/2)}{\sqrt n}.$$

Then Hn^{−1}(t) = √n·z_n(t)/√12 + n/2, and Hn^{−1}(t)/n = z_n(t)/√(12n) + 1/2. Since

$$\frac{\sqrt{12}\,\big(\sum_{i=1}^n U_i-n/2\big)}{\sqrt n}\xrightarrow{L}N(0,1)\quad\text{as }n\to\infty,$$

where →_L denotes convergence in distribution, lim_{n→∞} z_n(t) = z(t) for any t, where z(·) is the inverse function of Φ(·), the standard normal distribution function. So, for any 0 < t < 1, lim_{n→∞} Hn^{−1}(t)/n = 1/2.

Lemma 3.2. Suppose that {X1, X2, ..., Xn} is an i.i.d. sample from pF(x) + (1 − p)G(x). Then, as n → ∞,

$$p_1(X)\xrightarrow{L}\begin{cases}1,&\text{when } p<p_0,\\ U(0,1),&\text{when } p=p_0,\\ 0,&\text{when } p>p_0.\end{cases}$$

Proof. Denote

$$C_n=\left\{\sum_{i=1}^n F(X_i)-\sum_{i=1}^n G(X_i)>0\right\}.$$
Since E(F(X)) > E(G(X)), lim_{n→∞} P(Cn) = 1. For any 0 < t < 1,

$$\begin{aligned}
\lim_{n\to\infty}P(p_1(X)\le t)&=\lim_{n\to\infty}P(p_1(X)\le t,\,C_n)\\
&=\lim_{n\to\infty}P\left(H_n\Big(\sum_{i=1}^n G(X_i)+p_0\sum_{i=1}^n\big(F(X_i)-G(X_i)\big)\Big)\le t,\ C_n\right)\\
&=\lim_{n\to\infty}P\left(H_n\Big(\sum_{i=1}^n G(X_i)+p_0\sum_{i=1}^n\big(F(X_i)-G(X_i)\big)\Big)\le t\right)\\
&=\lim_{n\to\infty}P\left(\sum_{i=1}^n G(X_i)+p_0\sum_{i=1}^n\big(F(X_i)-G(X_i)\big)\le H_n^{-1}(t)\right)\\
&=\lim_{n\to\infty}P\left(\frac1n\sum_{i=1}^n\big(pF(X_i)+(1-p)G(X_i)\big)+(p_0-p)\,\frac1n\sum_{i=1}^n\big(F(X_i)-G(X_i)\big)\le\frac{H_n^{-1}(t)}n\right).
\end{aligned}$$

Thus, from Lemma 3.1, lim_{n→∞} P(p1(X) ≤ t) = 0 when p < p0, lim_{n→∞} P(p1(X) ≤ t) = 1 when p > p0, and lim_{n→∞} P(p1(X) ≤ t) = t when p = p0. Therefore, Lemma 3.2 holds.

It can be seen that Theorem 3.1 is a direct consequence of Lemma 3.2. For testing the hypotheses (2.4), the following conclusion is also established.

Theorem 3.2. The test given by the rejection region D2(α) is consistent for the hypotheses (2.4).
Adopting the notation of Section 2, the properties of the confidence regions of p are given below.

Theorem 3.3. For a given confidence level 1 − α and 0 < p < 1,

$$\lim_{n\to\infty}P(S_{1-\alpha}(X)\ge p)=1-\alpha,$$

$$\lim_{n\to\infty}P(S_{\alpha}(X)\le p)=1-\alpha,$$

$$\lim_{n\to\infty}P(S_{\alpha/2}(X)\le p\le S_{1-\alpha/2}(X))=1-\alpha.$$

Proof. Substituting p for p0 in the expression for p1(x), we find

$$\lim_{n\to\infty}P(S_{1-\alpha}(X)\ge p)=\lim_{n\to\infty}P(p_1(X)\le 1-\alpha).$$

From Lemma 3.2, lim_{n→∞} P(S_{1−α}(X) ≥ p) = 1 − α. The proofs of the other two conclusions are analogous.

Now, to investigate the power functions of the tests and the coverage of the confidence regions, some numerical simulations are performed. The simulations are carried out in the following steps:

Step 1. For given η = (p, θ) and sample size n, generate x from the distribution (2.1).

Step 2. Taking α = 0.05, generate a sample of size n from the uniform distribution U(0, 1) and compute S*.

Step 3. Repeat Step 2 10000 times. The value of p1(x) is computed as the frequency of S* ≤ p0, and the value of Sα(x) is computed as the 10000α-th order statistic of S* (from smallest to largest).

Step 4. Repeat Steps 1–3 2000 times. The power function β(p, n) is computed as the frequency of p1(x) ≤ α, and the coverage of Sα(x) is computed as the frequency of Sα(x) ≤ p.

For the mixture distribution pΦ((x − 3)/1.5) + (1 − p)Φ(x), consider testing the hypotheses

$$H_0: p\le 0.4\quad\text{vs}\quad H_1: p>0.4,\qquad(3.1)$$

and

$$H_0: p=0\quad\text{vs}\quad H_1: 0<p\le 1.\qquad(3.2)$$
The simulation results for (3.1) and (3.2) are displayed in Table 1. From the first part of Table 1 it can be seen that, for fixed p, β(p, n) is decreasing in n when p < 0.4 and increasing in n when p > 0.4. For fixed n, β(p, n) is an increasing function of p and is approximately equal to 0.05 at p = 0.4. The second part of Table 1 exhibits analogous behavior of β(p, n).

Table 1

H0: p ≤ 0.4 vs H1: p > 0.4
β(p, n)    p=0.2    p=0.3    p=0.4    p=0.5
n=20       0.0005   0.0020   0.0435   0.5925
n=60       0.0000   0.0015   0.0480   0.7955
n=100      0.0000   0.0000   0.0490   1.0000

H0: p = 0 vs H1: 0 < p ≤ 1
β(p, n)    p=0.2    p=0.3    p=0.4    p=0.5
n=20       0.0455   0.2695   0.8860   0.9900
n=60       0.0445   0.3395   0.9650   1.0000
n=100      0.0550   0.6382   1.0000   1.0000
Consider the mixture distribution pΦ((x − 3)/2) + (1 − p)Φ(x). When p = 0.3, the coverages of the upper confidence bound S_{1−α}(x) and of the confidence interval [S_{α/2}(x), S_{1−α/2}(x)] are displayed in Table 2. From Table 2, the frequency with which the upper confidence bound covers the true parameter 0.3 is close to 0.95. When p > 0.3, the frequency with which the upper confidence bound covers p is less than 0.95 and decreases as p increases. For fixed p > 0.3, the coverage decreases with the sample size n. The second part of Table 2 displays analogous results.

Table 2

P(R0.95 > p)   p=0.3    p=0.4    p=0.5    p=0.6
n=10           0.9480   0.8845   0.7825   0.6130
n=20           0.9585   0.8535   0.6700   0.4130
n=60           0.9620   0.7215   0.2750   0.0510

P(R0.025 ≤ p ≤ R0.975)   p=0.2    p=0.3    p=0.4    p=0.5
n=10                     0.9160   0.9490   0.9355   0.8630
n=20                     0.8955   0.9510   0.8955   0.7400
n=60                     0.8150   0.9525   0.8005   0.3820

4  Unknown nuisance parameters
When the nuisance parameters are unknown, S* of Section 1 is no longer a generalized pivotal quantity. But it can still be used if θ is replaced by an appropriate estimator. Two preliminary lemmas are needed to establish our results.

Lemma 4.1. Suppose that Hn(x) is the distribution function given in Lemma 3.1. Then there is a constant A independent of n such that

$$|H_n(x_1)-H_n(x_2)|\le\sqrt{\frac6\pi}\,\frac{|x_1-x_2|}{\sqrt n}+\frac A{\sqrt n}$$

for any x1, x2 ∈ R and n ≥ 1.

Proof.

$$H_n(x)=P\left(\sum_{i=1}^n U_i\le x\right)=P\left(\sqrt{12n}\Big(\frac1n\sum_{i=1}^n U_i-\frac12\Big)\le\frac{\sqrt{12}\,x}{\sqrt n}-\sqrt{3n}\right)=\widetilde H_n\left(\frac{\sqrt{12}\,x}{\sqrt n}-\sqrt{3n}\right),$$

where H̃n(x) is the distribution function of the random variable √(12n)((1/n)∑_{i=1}^n U_i − 1/2). It follows that

$$\begin{aligned}
|H_n(x_1)-H_n(x_2)|&=\left|\widetilde H_n\Big(\frac{\sqrt{12}\,x_1}{\sqrt n}-\sqrt{3n}\Big)-\widetilde H_n\Big(\frac{\sqrt{12}\,x_2}{\sqrt n}-\sqrt{3n}\Big)\right|\\
&\le\left|\widetilde H_n\Big(\frac{\sqrt{12}\,x_1}{\sqrt n}-\sqrt{3n}\Big)-\Phi\Big(\frac{\sqrt{12}\,x_1}{\sqrt n}-\sqrt{3n}\Big)\right|\\
&\quad+\left|\widetilde H_n\Big(\frac{\sqrt{12}\,x_2}{\sqrt n}-\sqrt{3n}\Big)-\Phi\Big(\frac{\sqrt{12}\,x_2}{\sqrt n}-\sqrt{3n}\Big)\right|\\
&\quad+\left|\Phi\Big(\frac{\sqrt{12}\,x_1}{\sqrt n}-\sqrt{3n}\Big)-\Phi\Big(\frac{\sqrt{12}\,x_2}{\sqrt n}-\sqrt{3n}\Big)\right|.
\end{aligned}$$

Hence, by the Berry–Esseen bound,

$$|H_n(x_1)-H_n(x_2)|\le\frac{bc}{\sqrt n}+\sqrt{\frac6\pi}\,\frac{|x_1-x_2|}{\sqrt n},$$

where b is a constant independent of n and c = 12^{3/2} E(|U1 − 1/2|³). Taking A = bc, Lemma 4.1 holds.

Lemma 4.2. Suppose that X1, X2, ..., Xn are i.i.d. random variables with a continuous distribution function C(x, θ), θ ∈ Θ, where Θ is an open set in R^m. Denote Cθ(x, θ) = ∂C(x, θ)/∂θ. If

(1) for any θ0 ∈ Θ there is a neighborhood δ_{θ0} of θ0 such that Cθ(x, θ) is uniformly bounded for all x and θ ∈ δ_{θ0}, and

(2) θ̂n is a √n-consistent estimator of θ,

then

$$H_n\left(\sum_{i=1}^n C(X_i,\hat\theta_n)\right)\xrightarrow{L}U(0,1)\quad\text{as }n\to\infty,$$

where Hn(x) is given in Lemma 3.1.

Proof. Expanding ∑_{i=1}^n C(Xi, θ̂n) at θ gives

$$\sum_{i=1}^n C(X_i,\hat\theta_n)=\sum_{i=1}^n C(X_i,\theta)+\sum_{i=1}^n C_\theta(X_i,\tilde\theta)(\hat\theta_n-\theta)',$$

where θ̃ lies between θ and θ̂n. Immediately,

$$\frac1{\sqrt n}\left(\sum_{i=1}^n C(X_i,\hat\theta_n)-\sum_{i=1}^n C(X_i,\theta)\right)=\frac1n\sum_{i=1}^n C_\theta(X_i,\tilde\theta)\cdot\sqrt n\,(\hat\theta_n-\theta)'.$$

From the conditions of the lemma, (1/n)∑_{i=1}^n Cθ(Xi, θ̃) = Op(1) and √n(θ̂n − θ) = op(1). It follows that

$$\left|H_n\left(\sum_{i=1}^n C(X_i,\hat\theta_n)\right)-H_n\left(\sum_{i=1}^n C(X_i,\theta)\right)\right|\le\sqrt{\frac6\pi}\,\frac1{\sqrt n}\left|\sum_{i=1}^n C(X_i,\hat\theta_n)-\sum_{i=1}^n C(X_i,\theta)\right|+\frac A{\sqrt n}=o_p(1).$$

Since Hn(∑_{i=1}^n C(Xi, θ)) ~ U(0, 1) for every n, by Slutsky's theorem,

$$H_n\left(\sum_{i=1}^n C(X_i,\hat\theta_n)\right)\xrightarrow{L}U(0,1)\quad\text{as }n\to\infty.$$
Theorem 4.1. Suppose that X1, X2, ..., Xn are i.i.d. random variables with distribution function pF(x, θ) + (1 − p)G(x, θ) which satisfies the conditions on the distribution function C(x, θ) in Lemma 4.2, and let θ̂n be a √n-consistent estimator of θ. If E(F(X, θ)) > E(G(X, θ)) for all θ ∈ Θ, then lim_{n→∞} P(X ∈ D(α)) = α, where D(α) = {x : p(x, θ̂n) ≤ α},

$$p(x,\theta)=P\left(\frac{\sum_{i=1}^n\big(pF(X_i,\theta)+(1-p)G(X_i,\theta)\big)-\sum_{i=1}^n G(x_i,\theta)}{\sum_{i=1}^n\big(F(x_i,\theta)-G(x_i,\theta)\big)}\le p\right),$$

and p is the true parameter.

Proof. Analogously to the proof of Lemma 3.2, we have

$$\lim_{n\to\infty}P\left(\sum_{i=1}^n\big(F(X_i,\hat\theta_n)-G(X_i,\hat\theta_n)\big)>0\right)=1$$

and

$$\lim_{n\to\infty}P(X\in D(\alpha))=\lim_{n\to\infty}P(p(X,\hat\theta_n)\le\alpha)=\lim_{n\to\infty}P\left(H_n\Big(p\sum_{i=1}^n F(X_i,\hat\theta_n)+(1-p)\sum_{i=1}^n G(X_i,\hat\theta_n)\Big)\le\alpha\right)=\alpha.$$

For the tests of the hypotheses (2.2) and (2.4) and the interval estimation of p, we have the following conclusions.

Theorem 4.2. Under the assumptions of Theorem 4.1, the test given by D1(α) is consistent for the hypotheses (2.2), where D1(α) = {x : p1(x, θ̂n) ≤ α} and

$$p_1(x,\theta)=P\left(\frac{\sum_{i=1}^n\big(pF(X_i,\theta)+(1-p)G(X_i,\theta)\big)-\sum_{i=1}^n G(x_i,\theta)}{\sum_{i=1}^n\big(F(x_i,\theta)-G(x_i,\theta)\big)}\le p_0\right).$$

Theorem 4.3. Under the assumptions of Theorem 4.1, the test given by D2(α) is consistent for the hypotheses (2.4), where D2(α) = {x : p2(x, θ̂n) ≤ α} and

$$p_2(x,\theta)=P\left(\frac{\sum_{i=1}^n\big(pF(X_i,\theta)+(1-p)G(X_i,\theta)\big)-\sum_{i=1}^n G(x_i,\theta)}{\sum_{i=1}^n\big(F(x_i,\theta)-G(x_i,\theta)\big)}\le 0\right).$$

Theorem 4.4. Under the assumptions of Theorem 4.1, for the given confidence level 1 − α,

$$\lim_{n\to\infty}P(S_{1-\alpha}(x,\hat\theta_n)\ge p)=1-\alpha,$$

$$\lim_{n\to\infty}P(S_{\alpha}(x,\hat\theta_n)\le p)=1-\alpha,$$

$$\lim_{n\to\infty}P(S_{\alpha/2}(x,\hat\theta_n)\le p\le S_{1-\alpha/2}(x,\hat\theta_n))=1-\alpha,$$

where Sγ(x, θ) = sup{z : P(S* ≤ z) ≤ γ}.

For common mixture models such as normal mixtures and exponential mixtures, the conditions of Theorem 4.1 are easily verified. As for θ̂n, which must be a √n-consistent estimator of θ, the maximum likelihood estimator (MLE) is √n-consistent. Here, we employ the estimators of θ obtained by the EM algorithm. The point estimation of θ is, however, a difficult problem, because the MLEs of the parameters may not exist for a mixture model. For example, if the population is a mixture of two normal distributions with unknown and different component means and variances, then the MLEs of the parameters do not exist. Whether or not the MLEs exist, the EM algorithm gives good estimates of the parameters in mixture models, called the EM algorithm estimates by Nityasuddhi and Böhning[17], but a larger sample size is needed; they pointed out that "reasonably small bias in mean and variance seem to occur for the sample size greater than 200" for EM algorithm estimates under a normal mixture model.

Now we consider simulations of the above asymptotic results. The simulation procedure is the same as that in Section 3, but θ in S is replaced by the estimators obtained by the EM algorithm. For the mixture distribution pΦ((x − μ1)/σ1) + (1 − p)Φ((x − μ2)/σ2) with σ1 = 1.5, σ2 = 1 and both μ1 and μ2 unknown, the simulation results for testing the hypotheses (3.1) are displayed in the first part of Table 3 (I) for μ1 = 4, μ2 = 0. If all four nuisance parameters are unknown, the simulation results for testing (3.1) are displayed in the second part of Table 3 (II) for μ1 = 4, μ2 = 0, σ1 = 1.5 and σ2 = 1. For the mixture distribution pΦ(x − μ1) + (1 − p)Φ(x − μ2) with both μ1 and μ2 unknown, the simulation results for testing the following hypotheses (4.1) are displayed in the third part of Table 3 (III) for μ1 = 3 and μ2 = 0:

$$H_0: p\le 0.7\quad\text{vs}\quad H_1: p>0.7.\qquad(4.1)$$

Consider the exponential mixture pE(λ1) + (1 − p)E(λ2), where E(λ) denotes an exponential distribution with expectation λ. For testing

$$H_0: p\le 0.3\quad\text{vs}\quad H_1: p>0.3,\qquad(4.2)$$

the simulation results are displayed in the fourth part of Table 3 (IV) for λ1 = 1 and λ2 = 5.
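The EM estimates used in these simulations can be produced with the standard two-component normal-mixture EM iteration; the sketch below is a generic textbook EM with unequal component variances (the initialization, sample size and parameter values are my own illustrative choices, not the paper's):

```python
import numpy as np

def em_normal_mixture(x, n_iter=200):
    """EM for p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2); returns (p, mu1, s1, mu2, s2)."""
    x = np.asarray(x, dtype=float)
    # Crude initialization from the 90th/10th percentiles (an arbitrary choice).
    p, mu1, mu2 = 0.5, np.quantile(x, 0.9), np.quantile(x, 0.1)
    s1 = s2 = x.std()
    for _ in range(n_iter):
        # E-step: responsibility of component 1 (normalizing constants cancel).
        d1 = np.exp(-0.5 * ((x - mu1) / s1) ** 2) / s1
        d2 = np.exp(-0.5 * ((x - mu2) / s2) ** 2) / s2
        w = p * d1 / (p * d1 + (1 - p) * d2)
        # M-step: weighted updates of proportion, means and standard deviations.
        p = w.mean()
        mu1 = (w * x).sum() / w.sum()
        mu2 = ((1 - w) * x).sum() / (1 - w).sum()
        s1 = np.sqrt((w * (x - mu1) ** 2).sum() / w.sum())
        s2 = np.sqrt(((1 - w) * (x - mu2) ** 2).sum() / (1 - w).sum())
    return p, mu1, s1, mu2, s2

rng = np.random.default_rng(3)
n, p_true = 1000, 0.3
from_f = rng.random(n) < p_true
x = np.where(from_f, rng.normal(4.0, 1.5, n), rng.normal(0.0, 1.0, n))
p_hat, mu1_hat, s1_hat, mu2_hat, s2_hat = em_normal_mixture(x)
assert abs(p_hat - p_true) < 0.1 and abs(mu1_hat - 4.0) < 0.5
```

With well-separated components the iteration converges from this crude quantile initialization; in harder cases several random starts are customary.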
From Table 3 it can be seen that the power function β(p, n) behaves as in the simulations of Section 3 and displays the asymptotic properties given by the theorems of Section 4. For the exponential mixture, the coverages of S_{1−α}(x, θ̂n) and of [S_{α/2}(x, θ̂n), S_{1−α/2}(x, θ̂n)] are displayed in Table 4 for p = 0.3, λ1 = 1, λ2 = 5.

Table 3

I
β(p, n)    p=0.3    p=0.4    p=0.5    p=0.7
n=100      0.0025   0.0546   0.1190   0.4275
n=200      0.0000   0.0460   0.5010   0.9990
n=500      0.0000   0.0490   0.7460   1.0000

II
β(p, n)    p=0.3    p=0.4    p=0.5    p=0.7
n=100      0.0010   0.0610   0.5060   0.7628
n=200      0.7580   0.0475   0.7575   0.9248
n=500      0.0000   0.0460   0.9850   1.0000

III
β(p, n)    p=0.6    p=0.7    p=0.8    p=0.9
n=100      0.0048   0.0480   0.8060   0.9127
n=200      0.0032   0.0512   0.9154   0.9760
n=500      0.0013   0.0395   0.9950   1.0000

IV
β(p, n)    p=0.2    p=0.3    p=0.4    p=0.5
n=100      0.0085   0.0640   0.4265   0.8570
n=200      0.0015   0.0530   0.6420   0.9970
n=500      0.0000   0.0475   0.9050   1.0000
Table 4

P(R0.95 > p)   p=0.3    p=0.4    p=0.5    p=0.6
n=100          0.9710   0.6270   0.1710   0.0170
n=200          0.9430   0.4060   0.0260   0.0010
n=500          0.9540   0.1060   0.0000   0.0000

P(R0.025 ≤ p ≤ R0.975)   p=0.2    p=0.3    p=0.4    p=0.5
n=100                    0.8314   0.9480   0.8285   0.6400
n=200                    0.4218   0.9510   0.3685   0.0927
n=500                    0.1012   0.9500   0.1463   0.0000

5  A practical example
We now discuss an example of fitting mixture models to data on the release time of a relay. Table 5 gives the results of 466 measurements of the release time of a maximum-time relay (measured in seconds)[18]. From the corresponding histogram, Hald thought that the distribution is composed of two presumably normal distributions in a ratio of about 1:1, with the same standard deviation and a difference between the means of about 0.08 second. Therefore, we fit these data with a mixture of two normal distributions in which the two component standard deviations are assumed to be equal or unequal, respectively. Table 6 displays the numerical results. The parameter estimates given by the EM algorithm are p̂ = 0.5817, μ̂1 = 1.0657, μ̂2 = 1.1530, σ̂1 = 0.0220 and σ̂2 = 0.0240 when the standard deviations are unequal, and p̂ = 0.5909, μ̂1 = 1.0663, μ̂2 = 1.1540 and σ̂1 = σ̂2 = 0.0225 when the standard deviations are equal. The numerical results are similar in the two cases. However, the estimates of the mixing proportion p in the two cases are clearly not equal to 0.5. The p-values corresponding to the hypothesis H0: p ≤ 0.5 are 0.0011 and 0.0004 respectively, and the 95% confidence intervals are [0.53, 0.64] and [0.54, 0.64] respectively. Hence the hypotheses H0:

Table 5  The distribution of 466 measurements of release time of a relay (class length: 0.01 sec)
Release time (sec)   Number of measurements   |   Release time (sec)   Number of measurements
1.00                  1                       |   1.12                 12
1.01                  0                       |   1.13                 28
1.02                  2                       |   1.14                 20
1.03                 20                       |   1.15                 27
1.04                 23                       |   1.16                 39
1.05                 49                       |   1.17                 30
1.06                 41                       |   1.18                 14
1.07                 43                       |   1.19                  8
1.08                 39                       |   1.20                  6
1.09                 27                       |   1.21                  1
1.10                 21                       |   1.22                  0
1.11                 14                       |   1.23                  1
Total               466
Table 6  Numerical results for unequal and equal component standard deviations

Unequal standard deviations
EM estimates: p̂ = 0.5817, μ̂1 = 1.0657, σ̂1 = 0.0220, μ̂2 = 1.1530, σ̂2 = 0.0240

H0: p ≤ p0 vs H1: p > p0   p0=0.4   p0=0.5   p0=0.55   p0=0.59   p0=0.6
Generalized p-value        0        0.0011   0.1102    0.5910    0.7190
95% confidence interval: [0.53, 0.64]

Equal standard deviations
EM estimates: p̂ = 0.5909, μ̂1 = 1.0663, σ̂1 = 0.0225, μ̂2 = 1.1540, σ̂2 = 0.0225

H0: p ≤ p0 vs H1: p > p0   p0=0.4   p0=0.5   p0=0.55   p0=0.59   p0=0.6
Generalized p-value        0        0.0004   0.0504    0.4402    0.5724
95% confidence interval: [0.54, 0.64]
p ≤ 0.5 and H0: p = 0.5 should be rejected. These two decisions are consistent with the data, since more observations lie around μ̂1 than around μ̂2. As for the assumption on the component standard deviations, the unequal-variance model should be selected, because the p-values in the unequal case are clearly larger than those in the equal case for H0: p ≤ p0 when p0 ≥ 0.5.

References

1  Tukey J W. A survey of sampling from contaminated distributions. In: Olkin I, et al., eds. Contributions to Probability and Statistics. Stanford: Stanford University Press, 1960
2  McLachlan G J, Peel D. Finite Mixture Models. New York: Wiley, 2000
3  Lindsay B G. The geometry of mixture likelihoods: a general theory. Ann Statist, 11: 86–94 (1983)
4  Titterington D M, Smith A F M, Makov U E. Statistical Analysis of Finite Mixture Distributions. New York: Wiley, 1985
5  Woodward W A, Whitney P, Eslinger P W. Minimum Hellinger distance estimation of mixture proportions. J Statist Plann Inference, 48: 303–319 (1995)
6  Clarke B R. An unbiased minimum distance estimator of the proportion parameter in a mixture of two normal distributions. Statist Probab Lett, 7: 275–281 (1989)
7  James I R. Estimation of the mixing proportion in a mixture of two normal distributions from simple rapid measurements. Biometrics, 34: 265–275 (1978)
8  Liu X, Shao Y. Asymptotics for the likelihood ratio test in a two-component normal mixture model. J Statist Plann Inference, 123: 61–81 (2004)
9  Hartigan J A. A failure of likelihood ratio asymptotics for normal mixtures. In: Le Cam L, Olshen A, eds. Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Monterey, CA: Wadsworth Advanced Books, 1985
10 Liang K Y, Rathouz P J. Hypothesis testing under mixture models: application to genetic linkage analysis. Biometrics, 55: 65–74 (1999)
11 Garel B. Likelihood ratio test for univariate Gaussian mixture. J Statist Plann Inference, 96: 325–350 (2001)
12 Chen H, Chen J. Large sample distribution of the likelihood ratio test for normal mixtures. Statist Probab Lett, 52: 125–133 (2001)
13 Mosler K, Seidel W. Testing for homogeneity in an exponential mixture model. Aust N Z J Stat, 43(2): 231–247 (2001)
14 Tsui K W, Weerahandi S. Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. J Amer Statist Assoc, 84: 602–607 (1989)
15 Weerahandi S. Generalized confidence intervals. J Amer Statist Assoc, 88: 899–905 (1993)
16 Lehmann E L. Testing Statistical Hypotheses. New York: Wiley, 1983
17 Nityasuddhi D, Böhning D. Asymptotic properties of the EM algorithm estimate for normal mixture models with component specific variances. Comput Statist Data Anal, 41: 591–601 (2003)
18 Hald A. Statistical Theory with Engineering Applications. New York: Wiley, 1955