Metrika (2000) 52: 115–131
© Springer-Verlag 2000

Mean square error estimation in multi-stage sampling

Arijit Chaudhuri, Arun Kumar Adhikary and Shankar Dihidar
Indian Statistical Institute, 203 Barrackpore Trunk Road, Calcutta 700035, INDIA

Received: 19 February 1999

Summary: Suppose for a homogeneous linear unbiased function of the sampled first stage unit (fsu)-values taken as an estimator of a survey population total, the sampling variance is expressed as a homogeneous quadratic function of the fsu-values. When the fsu-values are not ascertainable but unbiased estimators for them are separately available through sampling in later stages and substituted into the estimator, Raj (1968) gave a simple variance estimator formula for this multi-stage estimator of the population total. He requires that the variances of the estimated fsu-values in sampling at later stages and their unbiased estimators are available in certain 'simple forms'. For the same set-up Rao (1975) derived an alternative variance estimator when the later stage sampling variances have more 'complex forms'. Here we pursue Raj's (1968) simple forms to derive a few alternative variance and mean square error estimators when the condition of homogeneity or unbiasedness in the original estimator of the total is relaxed and the variance of the original estimator is not expressed as a quadratic form. We illustrate a particular three-stage sampling strategy and present a simulation-based numerical exercise showing the relative efficacies of two alternative variance estimators.

Key words: Multi-stage sampling, Survey population, Variance estimation, Varying probability sampling.

AMS Subject Classification: 62 D05

1 Introduction

Suppose a finite survey population $U = (1, \ldots, i, \ldots, N)$ has $N$ first stage units (fsu) with values $y_i$ $(i = 1, \ldots, N)$ of a variable $y$ of interest. Based on a sample $s$ of fsu's suitably taken in the first stage of sampling with a probability $p(s)$ from $U$, let for the population total $Y = \sum y_i = y_1 + \cdots + y_N$ an estimator be taken in the form

$$t = \sum b_{si} I_{si} y_i. \tag{1.1}$$
Here $I_{si} = 1$ if $i \in s$ and $0$ otherwise; later we shall also use $I_{sij} = I_{si} I_{sj}$; the $b_{si}$'s are constants free of $Y = (y_1, \ldots, y_i, \ldots, y_N)$. Writing $E_1$ as the operator for expectation over the first stage of sampling, let

$$E_1(b_{si} I_{si}) = 1 \quad \text{for every } i \text{ in } U. \tag{1.2}$$
Then $E_1(t) = Y$, i.e. $t$ is unbiased for $Y$. Denoting by $V_1$ the operator for variance in the first stage of sampling and by $\sum\sum$ the sum over $i, j = 1, \ldots, N$ $(i \neq j)$, we may write, following Raj (1968),

$$V_1(t) = \sum c_i y_i^2 + \sum\sum c_{ij} y_i y_j \tag{1.3}$$

where

$$c_i = E_1(b_{si}^2 I_{si}) - 1, \qquad c_{ij} = E_1\left[(b_{si} I_{si} - 1)(b_{sj} I_{sj} - 1)\right].$$
Let

$$v_1(t) = \sum c_{si} I_{si} y_i^2 + \sum\sum c_{sij} I_{sij} y_i y_j \tag{1.4}$$

be an unbiased estimator for $V_1(t)$ so that

$$E_1(c_{si} I_{si}) = c_i, \qquad E_1(c_{sij} I_{sij}) = c_{ij}. \tag{1.5}$$

Later we shall use $\sum'\sum'$ to denote summing over $i, j = 1, \ldots, N$ without the restriction $i \neq j$.

In this 'set-up' treated by Raj (1968), let the $y_i$-values be unobservable but sampling be carried out in one or more subsequent stages in such a way that the following conditions hold, with $E_L$ and $V_L$ as operators respectively for expectation and variance in the 'later' stages of sampling:

(i) there exist estimators $r_i$ for $y_i$ such that $E_L(r_i) = y_i$;
(ii) $V_L(r_i) = V_i$;
(iii) the $r_i$'s are 'independently' distributed;
(iv) there exist estimators $v_i$ for $V_i$ such that $E_L(v_i) = V_i$.
Under these conditions Raj (1968) recommended for $Y$ the multi-stage estimator

$$e = \sum b_{si} I_{si} r_i$$

which is $t$ evaluated at "$Y$ equal to $R = (r_1, \ldots, r_N)$". Thus, if we write $t = t(s, Y)$, then $e = t(s, R)$. Let $E = E_1 E_L$ be the overall operator for expectation and $V = V_1 E_L + E_1 V_L$ the over-all variance operator. It follows that

$$E(e) = E_1 E_L(e) = E_1(t) = Y. \tag{1.6}$$
Thus, $e$ is an unbiased estimator for $Y$. Also,

$$V(e) = V_1 E_L(e) + E_1 V_L(e) = V_1(t) + E_1 \sum b_{si}^2 I_{si} V_i; \tag{1.7}$$

$$V_1(e) = \sum c_i r_i^2 + \sum\sum c_{ij} r_i r_j \tag{1.8}$$

which is $V_1(t)$ evaluated at $Y = R$; and

$$v_1(e) = \sum c_{si} I_{si} r_i^2 + \sum\sum c_{sij} I_{sij} r_i r_j \tag{1.9}$$

which is $v_1(t)$ evaluated at $Y = R$. Then using (1.2), (1.3), and (1.7), for

$$v(e) = v_1(e) + \sum b_{si} I_{si} v_i \tag{1.10}$$

we have

$$E\,v(e) = V(e) \tag{1.11}$$

as observed by Raj (1968). This observation led Raj to recommend $v(e)$ as an unbiased estimator for $V(e)$.

We may remark that if we write $V = (v_1, \ldots, v_N)$, then $\sum b_{si} I_{si} v_i$ in (1.10) may be expressed as $t(s, V)$. Thus, Raj's (1968) multi-stage variance estimation rule for $e$ is

$$v(e) = v_1(t)\big|_{Y=R} + t\big|_{Y=V} = v_1(t)\big|_{Y=R} + t(s, V). \tag{1.12}$$
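Rule (1.12) is a pure plug-in operation, which the following minimal Python sketch tries to make concrete. The first-stage design used for illustration (SRSWOR of $n$ fsu's out of $N$, so that $b_{si} = N/n$, with its textbook variance estimator) is an assumption made here and is not taken from the paper; the names `raj_variance_estimate` and `make_srswor` and all numbers are hypothetical.

```python
from typing import Callable, Mapping, Sequence

# A first-stage statistic: takes the sample s and unit-level values, returns a number.
Statistic = Callable[[Sequence[int], Mapping[int, float]], float]


def raj_variance_estimate(t: Statistic, v1: Statistic, s: Sequence[int],
                          r: Mapping[int, float], v: Mapping[int, float]) -> float:
    """Rule (1.12): v1(t) evaluated at Y = R, plus t evaluated at Y = V."""
    return v1(s, r) + t(s, v)


def make_srswor(N: int):
    """Assumed first-stage design: SRSWOR of n fsu's out of N (b_si = N/n)."""
    def t(s, y):
        # expansion estimator (N/n) * sum of sampled values
        return N / len(s) * sum(y[i] for i in s)

    def v1(s, y):
        # usual unbiased variance estimator N^2 (1/n - 1/N) s_y^2
        n = len(s)
        mean = sum(y[i] for i in s) / n
        s2 = sum((y[i] - mean) ** 2 for i in s) / (n - 1)
        return N ** 2 * (1 / n - 1 / N) * s2

    return t, v1


t, v1 = make_srswor(N=10)
s = [0, 3, 7]                      # sampled fsu labels (illustrative)
r = {0: 12.0, 3: 15.0, 7: 9.0}     # later-stage estimates r_i of y_i
v = {0: 1.0, 3: 2.0, 7: 0.5}       # unbiased estimates v_i of V_i
e = t(s, r)                        # multi-stage estimator e = t(s, R)
v_e = raj_variance_estimate(t, v1, s, r, v)
```

The point of the design is that `t` and `v1` are written once for the single-stage case and reused unchanged with `r` and `v` substituted for the unobservable fsu-values.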
It may be remarked that Durbin (1953) earlier gave a version of this rule with $t$ as the Horvitz and Thompson's (1952) estimator, in particular.

Retaining the above set-up but with the modifications that (ii) and (iv) above are respectively replaced by (ii)′ and (iv)′, where

(ii)′ $V_L(r_i) = V_{si}$ for $i$ in $s$;
(iv)′ there exist estimators $v_{si}$ for $V_{si}$ such that $E_L(v_{si}) = V_{si}$ when $i \in s$,

Rao (1975) recommended for $V(e)$ the estimator

$$v(e) = v_1(e) + \sum (b_{si}^2 - c_{si}) I_{si} v_{si} \tag{1.13}$$

for which he proved the unbiasedness condition

$$E\,v(e) = V(e). \tag{1.14}$$
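The unbiasedness in (1.14) can be checked in a few lines. The following is only a sketch of one way to see it, using (1.5), (ii)′, (iv)′, the independence condition (iii) (so that $E_L(r_i r_j) = y_i y_j$ for $i \neq j$) and $I_{si}^2 = I_{si}$; it is not reproduced from Rao (1975).

```latex
\begin{align*}
E_L\, v_1(e) &= \sum c_{si} I_{si}\,(y_i^2 + V_{si}) + \sum\sum c_{sij} I_{sij}\, y_i y_j
              = v_1(t) + \sum c_{si} I_{si} V_{si},\\
E_L\, v(e)   &= v_1(t) + \sum c_{si} I_{si} V_{si} + \sum (b_{si}^2 - c_{si}) I_{si} V_{si}
              = v_1(t) + \sum b_{si}^2 I_{si} V_{si},\\
E\, v(e)     &= E_1 v_1(t) + E_1 \sum b_{si}^2 I_{si} V_{si}
              = V_1(t) + E_1 V_L(e) = V(e).
\end{align*}
```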
For the results (1.10)–(1.12) of Raj (1968) and (1.13)–(1.14) of Rao (1975) the relations (1.1)–(1.5) are all essential. If (1.1) is replaced by

$$t' = t'(s, Y) = a_s + \sum b_{si} I_{si} y_i \tag{1.15}$$

such that $a_s \neq 0$ but $E_1(a_s) = 0$ and (1.2) is retained, that is, a "non-homogeneous" linear unbiased estimator is tried, then (1.10), (1.12)–(1.14) need not follow. If (1.1) is retained but (1.2) is relaxed, then the problem of variance estimation reduces to one of estimating the "Mean Square Error" (MSE) of $e$, namely

$$M(e) = E_1 E_L (e - Y)^2.$$

Then,

$$M(e) = E_1 E_L \left[(e - E_L e) + (E_L e - Y)\right]^2 = E_1 V_L(e) + E_1 (t - Y)^2 = E_1 V_L(e) + M_1(t),$$

where $M_1(t) = E_1 (t - Y)^2$ is the MSE of $t$ in the first stage of sampling. This problem we intend next to address.

Again, suppose (1.1)–(1.2) are retained and, following Rao (1979), the first stage sampling variance of $t$ is expressed as

$$V_1(t) = -\frac{1}{2} \sum{}'\sum{}' d_{ij} w_i w_j \left(\frac{y_i}{w_i} - \frac{y_j}{w_j}\right)^2, \tag{1.16}$$

where $w_i \neq 0$ and

$$d_{ij} = E_1\left[(b_{si} I_{si} - 1)(b_{sj} I_{sj} - 1)\right],$$

and that, following Rao (1979) again, an unbiased estimator of this $V_1(t)$ is taken as

$$v_2(t) = -\frac{1}{2} \sum{}'\sum{}' d_{sij} I_{sij} w_i w_j \left(\frac{y_i}{w_i} - \frac{y_j}{w_j}\right)^2, \tag{1.17}$$

such that $E_1(d_{sij} I_{sij}) = d_{ij}$. Then also Raj's (1968) and Rao's (1975) methods of estimating $V(e)$ shown earlier are not immediately applicable to derive estimators for $V(e) = V_1 E_L(e) + E_1 V_L(e)$ with $V_1(t)$ as in (1.16) and $t$ as in (1.1)–(1.2), unless one tediously re-expresses $v_2(t)$ in (1.17) as a quadratic form in the $y_i$'s.

Raj (1956) gave a well-known estimator for $Y$ based on the method of sampling with probabilities proportional to sizes (PPS) without replacement (WOR), as briefly described below. Suppose there are numbers $p_i$ $(0 < p_i < 1,\ \sum p_i = 1)$ called 'normed size-measures'. Then in PPSWOR sampling, distinct units from $U = (1, \ldots, i, \ldots, N)$ in $n\ (\geq 2)$ successive draws, namely $i_1, \ldots, i_n$, are respectively chosen with probabilities

$$p_{i_1},\quad \frac{p_{i_2}}{1 - p_{i_1}},\quad \ldots,\quad \frac{p_{i_n}}{1 - p_{i_1} - \cdots - p_{i_{n-1}}}; \qquad (i_1, \ldots, i_n = 1, \ldots, N;\ i_1 \neq \cdots \neq i_n;\ 2 \leq n < N).$$
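Then, Raj's (1956) unbiased estimator for $Y$ is

$$t_D = \frac{1}{n} \sum_{j=1}^{n} t_j, \quad \text{where} \quad t_1 = \frac{y_{i_1}}{p_{i_1}}, \qquad t_j = y_{i_1} + \cdots + y_{i_{j-1}} + \frac{y_{i_j}}{p_{i_j}} \left(1 - p_{i_1} - \cdots - p_{i_{j-1}}\right), \quad j = 2, \ldots, n.$$

For this $t_D$ a simple unbiased variance estimator given by Raj (1956) is

$$v_D = \frac{1}{n(n-1)} \sum_{j=1}^{n} (t_j - t_D)^2.$$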
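The two formulas above translate directly into code. The following minimal Python sketch computes $t_D$ and $v_D$ from the $y$-values and normed size-measures listed in the order of draw; the function name `des_raj_estimate` and the two-draw numbers are illustrative assumptions.

```python
from typing import Sequence, Tuple


def des_raj_estimate(y: Sequence[float], p: Sequence[float]) -> Tuple[float, float]:
    """Return (t_D, v_D) for units listed in their order of draw.

    y[j], p[j] are the y-value and normed size-measure of the unit
    selected on draw j+1 of a PPSWOR sample (n >= 2 draws).
    """
    n = len(y)
    if n < 2 or len(p) != n:
        raise ValueError("need n >= 2 draws and matching y, p")
    t = []
    y_cum = 0.0   # y_{i1} + ... + y_{i,j-1}
    p_cum = 0.0   # p_{i1} + ... + p_{i,j-1}
    for yj, pj in zip(y, p):
        # for the first draw both running sums are zero, giving t_1 = y/p
        t.append(y_cum + yj / pj * (1.0 - p_cum))
        y_cum += yj
        p_cum += pj
    t_D = sum(t) / n
    v_D = sum((tj - t_D) ** 2 for tj in t) / (n * (n - 1))
    return t_D, v_D


# Illustrative two-draw sample: t_1 = 50, t_2 = 10 + 40 * 0.8 = 42.
t_D, v_D = des_raj_estimate(y=[10.0, 20.0], p=[0.2, 0.5])
```

Because $t_D$ and $v_D$ are computed from the ordered $t_j$'s rather than from coefficients $b_{si}$, $c_{si}$, $c_{sij}$, they illustrate why recasting them into the forms (1.1) and (1.4) is awkward.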
Throwing $t_D$ and $v_D$ into the forms (1.1), (1.4) respectively would be a tedious exercise needed to apply Raj's (1968) and Rao's (1975) variance estimation formulae if this strategy of Raj (1956) is to be extended to cover the multi-stage sampling situation.

Moreover, it is worthwhile to mention that in a given survey for certain variables the fsu-values may be "ascertainable" but not for some others. For example, villages may be fsu's and one may know the number of households in them classified by the occupations of their principal earners, the numbers of schools, health care centres, business establishments etc. they respectively have, but one may not know the age and sex-wise distribution in the households of the villages, the extent of indebtedness of the household members, the household expenses on their necessities etc. In that case further sampling of the 'households', which are the ssu's, may be needed to gather village level information. In such cases it is useful, for the sake of easy computerised processing, to use a standard uni-stage variance estimator like $v_2(t)$ or $v_D$ above to cover the 'former set' and consider its easy modification like (1.12) or (1.13) applicable to cover the 'latter set' of variables. But the approaches of Raj (1968) and Rao (1975) do not readily make it evident that it may really be always possible to do so. Bearing these in mind we develop and present some results in the next section.

2 Developing 'Variance and MSE-estimators' in multi-stage sampling

Retaining the conditions (i)–(iv) in Raj's (1968) set-up it is possible to claim that "$E_1$ commutes with $E_L$", i.e. $E = E_1 E_L = E_L E_1$. But with Rao's (1975) approach, when (ii), (iv) are replaced by (ii)′, (iv)′, this cannot be the case as we shall see. We feel it is worthwhile to verify this commutativity property with one illustration which we shall utilize in the sequel.

Let us consider a case of sampling in two stages for which a sample of $n$ fsu's is drawn from the population of $N$ fsu's employing the scheme due to Rao, Hartley and Cochran (RHC, 1962) using known positive normed size-measures $p_i$ $(0 < p_i < 1;\ i = 1, \ldots, N;\ \sum p_i = 1)$. The $i$th fsu is supposed to consist of $M_i$ second stage units (ssu) bearing known normed positive size-measures $g_{ij}$ $(j = 1, \ldots, M_i;\ i = 1, \ldots, N)$. From each selected fsu, say $i$, a sample of $m_i$ ssu's is then selected applying the RHC scheme again using these $g_{ij}$'s; the selection is done independently across the selected fsu's. In applying the RHC scheme in the first stage, $n$ non-overlapping 'groups' are formed at random out of the $N$ fsu's, the $i$th group containing $N_i$ fsu's subject to $\sum_n N_i = N$ $(i = 1, \ldots, n)$; the $N_i$'s are chosen as integers closest to $N/n$; here $\sum_n$ denotes summing over the $n$ groups. From the $i$th group one fsu is selected out of the $N_i$ fsu's with a probability proportional to its $p$-value; this is repeated independently over the groups. Writing $Q_i = p_{i1} + \cdots + p_{iN_i}$ and denoting for simplicity by $(p_i, y_i)$ the $p$- and $y$-values for the fsu selected from the $i$th group, the unbiased estimator for $Y$ given by RHC for the single-stage sampling is

$$t_R = \sum_n y_i \frac{Q_i}{p_i}. \tag{2.1}$$

Writing $A = \dfrac{\sum_n N_i^2 - N}{N(N-1)}$, the variance of $t_R$ is

$$V_1(t_R) = A \left(\sum \frac{y_i^2}{p_i} - Y^2\right) = \frac{A}{2} \sum\sum p_i p_j \left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2. \tag{2.2}$$
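The RHC steps described above (random grouping, one PPS draw per group, then (2.1) and (2.2)) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, units are indexed from 0, and the rule that the first $N \bmod n$ groups receive the extra unit is just one way of taking the $N_i$'s as integers closest to $N/n$.

```python
import random
from typing import List, Sequence, Tuple


def rhc_group_sizes(N: int, n: int) -> List[int]:
    """Group sizes N_i: integers as close to N/n as possible, summing to N."""
    base, extra = divmod(N, n)
    return [base + 1] * extra + [base] * (n - extra)


def rhc_sample(p: Sequence[float], n: int, rng: random.Random) -> List[Tuple[int, float]]:
    """Draw one unit per random group; return (unit index, Q_i) pairs."""
    units = list(range(len(p)))
    rng.shuffle(units)                       # random formation of the n groups
    picks, start = [], 0
    for size in rhc_group_sizes(len(p), n):
        group = units[start:start + size]
        start += size
        Q = sum(p[k] for k in group)         # Q_i = sum of p-values in the group
        # one draw within the group, with probability proportional to p
        chosen = rng.choices(group, weights=[p[k] for k in group])[0]
        picks.append((chosen, Q))
    return picks


def rhc_estimate(y: Sequence[float], p: Sequence[float],
                 picks: Sequence[Tuple[int, float]]) -> float:
    """t_R = sum over groups of y_i * Q_i / p_i, as in (2.1)."""
    return sum(y[k] * Q / p[k] for k, Q in picks)


def rhc_variance(y: Sequence[float], p: Sequence[float], n: int) -> float:
    """V_1(t_R) = A * (sum y_i^2 / p_i - Y^2), the first form in (2.2)."""
    N = len(y)
    sizes = rhc_group_sizes(N, n)
    A = (sum(m * m for m in sizes) - N) / (N * (N - 1))
    Y = sum(y)
    return A * (sum(yi * yi / pi for yi, pi in zip(y, p)) - Y * Y)
```

A quick sanity check on any such implementation: when $y_i$ is exactly proportional to $p_i$, every sample gives $t_R = Y$ and (2.2) is zero.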
When $y_i$ is not ascertainable, from each selected fsu the ssu's are independently selected by the RHC scheme again. Using somewhat obvious notations we may write the RHC estimator $r_i$ for $y_i$ as

$$r_i = \sum_{m_i} y_{ij} \frac{H_{ij}}{g_{ij}}. \tag{2.3}$$

Here $\sum_{m_i}$ is the sum over the $m_i$ groups into which the $M_i$ ssu's in the $i$th fsu are to be split up to choose from them a sample of $m_i$ ssu's by the RHC scheme; $(g_{ij}, y_{ij})$ are the known normed size-measure and the $y$-value for the single ssu selected from the $ij$th group $(j = 1, \ldots, m_i)$ corresponding to the $i$th fsu; $H_{ij}$ is the sum of the normed size-measure values over the $N_{ij}$ ssu's taken in the $ij$th group, each $N_{ij}$ taken close to $M_i/m_i$ subject to $\sum_{m_i} N_{ij} = M_i$. By $v_{ij}$ we shall denote the $V_i$-value corresponding to the $j$th $(j = 1, \ldots, N_i)$ fsu falling in the $i$th group in choosing the sample of $n$ fsu's $(i = 1, \ldots, n)$ by the RHC scheme. Further, we shall denote by $E_G$ the operator of expectation for a given grouping and by $E_S$ that over formation of these $n$ groups. With this background we have

$$e_R = \sum_n \frac{Q_i}{p_i} r_i = t_R\big|_{Y=R}$$

for which it follows that

$$E_L(e_R) = t_R, \qquad E_1(e_R) = \sum r_i = R, \text{ say};$$
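$$E(e_R) = E_1 E_L(e_R) = E_1(t_R) = Y \quad \text{and also} \quad E(e_R) = E_L E_1(e_R) = E_L(R) = Y.$$

Then,

$$\begin{aligned}
V(e_R) &= E_1 E_L (e_R - Y)^2 = E_1 V_L(e_R) + V_1 E_L(e_R) \\
&= E_1\left[\sum_n \left(\frac{Q_i}{p_i}\right)^2 V_i\right] + V_1(t_R) \\
&= E_S\left[\sum_n (p_{i1} + \cdots + p_{iN_i})^2 \sum_{j=1}^{N_i} \frac{v_{ij}}{p_{ij}^2} \cdot \frac{p_{ij}}{p_{i1} + \cdots + p_{iN_i}}\right] + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right) \\
&= E_S\left[\sum_n \left(\sum_{j=1}^{N_i} p_{ij}\right)\left(\sum_{j=1}^{N_i} \frac{v_{ij}}{p_{ij}}\right)\right] + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right) \\
&= \sum_n \frac{N_i}{N} \sum V_i + \sum_n \frac{N_i(N_i - 1)}{N(N-1)} \sum \frac{V_i}{p_i}(1 - p_i) + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right) \\
&= \sum V_i + A\left(\sum \frac{V_i}{p_i} - \sum V_i\right) + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right) \\
&= (1 - A)\sum V_i + A \sum \frac{V_i}{p_i} + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right).
\end{aligned} \tag{2.4}$$

On the other hand, reversing the order of the operators of expectation we get

$$\begin{aligned}
V(e_R) &= E_L E_1 (e_R - Y)^2 = E_L V_1(e_R) + V_L E_1(e_R) \\
&= E_L\left[A\left(\sum \frac{r_i^2}{p_i} - R^2\right)\right] + V_L(R) \\
&= A\left(\sum \frac{y_i^2}{p_i} - Y^2\right) + A\left(\sum \frac{V_i}{p_i} - \sum V_i\right) + \sum V_i \\
&= A \sum \frac{V_i}{p_i} + (1 - A)\sum V_i + A\left(\sum \frac{y_i^2}{p_i} - Y^2\right).
\end{aligned} \tag{2.5}$$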
From (2.5) and (2.4) our claim that "$E_1 E_L = E_L E_1$" is verified in this case. In some other examples also we checked this to be true. So, from now on we presume that when (i)–(iv) hold good we have "$E_1 E_L = E_L E_1$".
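As a numerical companion to this verification, the short Python sketch below evaluates $V(e_R)$ once through the final line of (2.4) and once through the $E_L V_1 + V_L E_1$ route of (2.5), using $E_L(r_i^2) = y_i^2 + V_i$ and $E_L(R^2) = Y^2 + \sum V_i$ (which relies on the independence condition (iii)). The function name and the input values in the usage line are hypothetical.

```python
from typing import Sequence, Tuple


def variance_e_R_two_ways(y: Sequence[float], p: Sequence[float],
                          V: Sequence[float], A: float) -> Tuple[float, float]:
    """Evaluate V(e_R) via the last line of (2.4) and via the route of (2.5)."""
    Y = sum(y)
    sum_V = sum(V)
    sum_V_over_p = sum(Vi / pi for Vi, pi in zip(V, p))
    sum_y2_over_p = sum(yi * yi / pi for yi, pi in zip(y, p))
    # (2.4): E_1 V_L(e_R) + V_1 E_L(e_R)
    via_24 = (1 - A) * sum_V + A * sum_V_over_p + A * (sum_y2_over_p - Y * Y)
    # (2.5): E_L V_1(e_R) + V_L E_1(e_R), with E_L r_i^2 = y_i^2 + V_i
    # and E_L R^2 = Y^2 + sum V_i under independence of the r_i's
    sum_Er2_over_p = sum((yi * yi + Vi) / pi for yi, Vi, pi in zip(y, V, p))
    via_25 = A * (sum_Er2_over_p - (Y * Y + sum_V)) + sum_V
    return via_24, via_25


# Illustrative values: N = 4 fsu's in n = 2 groups of size 2, so A = 1/3.
a, b = variance_e_R_two_ways([3.0, 1.0, 4.0, 1.0], [0.1, 0.2, 0.3, 0.4],
                             [0.5, 1.0, 1.5, 2.0], A=1.0 / 3.0)
```

The two returned values agree up to floating-point rounding, which is the commutativity claim in closed form for this scheme.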
We may however note that if (ii)′ and (iv)′ hold, that is, if we intend to cover Rao's (1975) approach, then we cannot use this 'commutativity'. This is because $V_{si}$, $v_{si}$ depend on $s$ and as such, without operating first by $E_L$ on terms involving $v_{si}$, the operator $E_1$ cannot be applied, as one may check with any of the examples treated by Rao (1975). So, in what follows we shall throughout assume that (i)–(iv) hold and $E_1$ commutes with $E_L$. Let us present below a few results of interest in the present context.

Theorem 1. Let (i)–(iv) hold and $E_1$ commute with $E_L$; let $t = t(s, Y)$ satisfy $E_1(t) = Y$ and $e = t(s, R)$ satisfy $E_L(e) = t$. Then,

(a) $E_1(e) = R$ (with $R = \sum r_i$ as above),
(b) $E(e) = Y$,
(c) $V(e) = E_L V_1(e) + V_L E_1(e) = E_L V_1(e) + \sum V_i$;
(d) if there exists any $v_1(t) = v_1(s, Y)$ satisfying $E_1 v_1(t) = V_1(t)$, then writing $v_1(e)$ for $v_1(t)$ with $Y$ in the latter equal to $R$, and $v$ for $t$ with $Y$ in the latter replaced by $V$, it follows that

$$v(e) = v_1(e) + v \tag{2.6}$$

satisfies $E\,v(e) = V(e)$.

Proof: Easy and hence omitted.

Remark I. (2.6) is a generalization of (1.12). For example, $t$ in Theorem 1 may be chosen as $t'$ of (1.15), $V_1(t)$ may be as in (1.16), $v_1(e)$ may be taken in the form $v_D$ in section 1, and in each such case the simple formula (2.6) applies.

To establish our next result let $t = \sum b_{si} I_{si} y_i$, for which $E_1(t)$ may not equal $Y$, i.e. '(1.2) is relaxed', but let there exist $w_i\ (\neq 0)$ such that

$$t \text{ equals } Y \quad \text{if } y_i \propto w_i.$$

Rao (1979) has illustrated many such situations and from this source we know that we may write

$$M_1(t) = E_1 (t - Y)^2 = -\frac{1}{2} \sum{}'\sum{}' d_{ij} w_i w_j \left(\frac{y_i}{w_i} - \frac{y_j}{w_j}\right)^2$$

with $d_{ij}$ as in (1.16). Then, for $e = \sum b_{si} I_{si} r_i$, we have

$$M(e) = E(e - Y)^2 = E_1 E_L \left[(e - E_L(e)) + (E_L(e) - Y)\right]^2 = E_1 V_L(e) + M_1(t)$$

and we have

Theorem 2. With $d_{sij}$ as in (1.17) and

$$m_1(e) = -\frac{1}{2} \sum{}'\sum{}' d_{sij} I_{sij} w_i w_j \left(\frac{r_i}{w_i} - \frac{r_j}{w_j}\right)^2, \tag{2.7}$$

an unbiased estimator for $M(e)$ is

$$m(e) = m_1(e) + \frac{1}{2} \sum{}'\sum{}' d_{sij} I_{sij} w_i w_j \left(\frac{v_i}{w_i^2} + \frac{v_j}{w_j^2}\right) + \sum b_{si}^2 I_{si} v_i. \tag{2.8}$$
Proof: That $E\,m(e)$ equals $M(e)$ follows immediately.

To work out our next result, let $t$ be as in (1.15), allowing $a_s = 0$ or $a_s \neq 0$, and relax (1.2). Suppose we commit negligible errors if we ignore the discrepancies $\Delta = E_1(t) - Y$ and $\delta = E_1(e) - R$ for $t$ and $e = t|_{Y=R}$, assuming the sample-size to be large so that for $Y$, $R$ respectively, $t = t(s, Y)$, $e = t(s, R)$ may be regarded as 'asymptotically design unbiased' (ADU) and 'asymptotically design consistent' (ADC) estimators, in the sense of Brewer's (1979) asymptotic approach. Then we have the following proposition.

Proposition:

$$M(e) = E_L E_1 \left[(e - E_1(e)) + (E_1(e) - Y)\right]^2$$

'approximately equals'

$$E_L E_1 (e - R)^2 + E_L (R - Y)^2 = E_L E_1 (e - R)^2 + \sum V_i.$$

Then, if there exists a function $m_2(t)$ such that $E_1 m_2(t)$ 'approximately equals' $M_1(t)$, and $m_2(e) = m_2(t)|_{Y=R}$, then an "approximately unbiased" estimator for $M(e)$ is

$$v(e) = m_2(e) + \sum b_{si} I_{si} v_i = m_2(e) + \left(t\big|_{Y=V} - a_s\right) \tag{2.9}$$

with $a_s$ equal to or not equal to zero.

For an illustration let $x$ be a variable well-correlated with $y$ having values $x_i$ and a total $X$. Let $\pi_i = E_1(I_{si}) > 0$, $\pi_{ij} = E_1(I_{sij}) > 0$, $\Delta_{ij} = \pi_i \pi_j - \pi_{ij}$; let $R_i\ (> 0)$ be freely assignable constants like

$$\frac{1}{x_i},\quad \frac{1}{x_i^2},\quad \frac{1}{\pi_i x_i},\quad \frac{1 - \pi_i}{\pi_i x_i}$$

etc. Then, letting

$$b_R = \frac{\sum y_i x_i R_i I_{si}}{\sum x_i^2 R_i I_{si}}, \qquad B_R = \frac{\sum y_i x_i R_i \pi_i}{\sum x_i^2 R_i \pi_i}, \qquad e_i = y_i - b_R x_i, \qquad E_i = y_i - B_R x_i,$$

the well-known ADU and ADC estimator for $Y$ based on a single-stage sample is Cassel, Särndal and Wretman's (CSW, 1976) generalized regression (GREG) estimator, of the form $t'$ as in (1.15) with $a_s = 0$, and is given by

$$t_G = \sum \frac{y_i}{\pi_i} I_{si} + b_R \left(X - \sum \frac{x_i}{\pi_i} I_{si}\right) = \sum \frac{y_i}{\pi_i} g_{si} I_{si},$$

writing

$$g_{si} = 1 + \left(X - \sum \frac{x_i}{\pi_i} I_{si}\right) \frac{x_i R_i \pi_i}{\sum x_i^2 R_i I_{si}}.$$
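The two equivalent forms of $t_G$ above can be checked numerically. The following Python sketch computes $b_R$, the weights $g_{si}$ and $t_G$ from the sampled units only (so every $I_{si}$ equals 1); the function name `greg_estimate`, the argument layout and the numbers used to exercise it are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def greg_estimate(y: Sequence[float], x: Sequence[float], pi: Sequence[float],
                  R: Sequence[float], X_total: float) -> Tuple[float, List[float]]:
    """Return (t_G, g) computed from the sampled units only (I_si = 1)."""
    denom = sum(xi * xi * Ri for xi, Ri in zip(x, R))        # sum x_i^2 R_i I_si
    b_R = sum(yi * xi * Ri for yi, xi, Ri in zip(y, x, R)) / denom
    ht_x = sum(xi / p for xi, p in zip(x, pi))               # sum x_i I_si / pi_i
    ht_y = sum(yi / p for yi, p in zip(y, pi))               # sum y_i I_si / pi_i
    # g_si = 1 + (X - sum x_i I_si / pi_i) * x_i R_i pi_i / sum x_i^2 R_i I_si
    g = [1.0 + (X_total - ht_x) * xi * Ri * p / denom
         for xi, Ri, p in zip(x, R, pi)]
    t_G = ht_y + b_R * (X_total - ht_x)
    return t_G, g


# Illustrative sample of 3 units with R_i = 1/x_i and y_i = 2 x_i exactly.
x_s = [1.0, 2.0, 4.0]
y_s = [2.0, 4.0, 8.0]
pi_s = [0.5, 0.25, 0.5]
R_s = [1.0 / xi for xi in x_s]
t_G, g = greg_estimate(y_s, x_s, pi_s, R_s, X_total=20.0)
```

When $y_i$ is an exact multiple of $x_i$, as in this usage example, the estimator reproduces the same multiple of $X$, and $\sum (y_i/\pi_i) g_{si}$ over the sample returns the same value as the regression form.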
From Särndal (1982) we know that its approximate MSE is given by

$$M(t_G) = \frac{1}{2} \sum\sum \Delta_{ij} \left(\frac{E_i}{\pi_i} - \frac{E_j}{\pi_j}\right)^2.$$

Särndal (1982) has also given 2 estimators for $M(t_G)$ as

$$m_k(t_G) = \frac{1}{2} \sum\sum \Delta_{ij} \frac{I_{sij}}{\pi_{ij}} \left(\frac{a_{ki} e_i}{\pi_i} - \frac{a_{kj} e_j}{\pi_j}\right)^2, \qquad k = 1, 2; \quad a_{1i} = 1,\ a_{2i} = g_{si}.$$

Chaudhuri and Maiti (1994) considered the following versions of $t_G$, $M(t_G)$, $m_k(t_G)$ when the sample is chosen employing the RHC scheme. They respectively take the forms

$$t_{GR} = \sum_n y_i \frac{Q_i}{p_i} + b_R \left(X - \sum_n x_i \frac{Q_i}{p_i}\right) = \sum_n y_i \frac{Q_i}{p_i} h_{si},$$

where

$$h_{si} = 1 + \left(X - \sum_n x_i \frac{Q_i}{p_i}\right) \frac{x_i R_i\, p_i / Q_i}{\sum_n x_i^2 R_i};$$

$$M(t_{GR}) = \frac{A}{2} \sum\sum p_i p_j \left(\frac{F_i}{p_i} - \frac{F_j}{p_j}\right)^2, \qquad F_i = y_i - C_R x_i, \qquad C_R = \frac{\sum y_i x_i R_i\, p_i / Q_i}{\sum x_i^2 R_i\, p_i / Q_i};$$

$$m_k(t_{GR}) = B \sum_n \sum_n Q_i Q_j \left(\frac{e_i b_{ki}}{p_i} - \frac{e_j b_{kj}}{p_j}\right)^2, \qquad k = 1, 2; \quad b_{1i} = 1,\ b_{2i} = h_{si},$$

a formula due to RHC (1962), with

$$B = \frac{\sum_n N_i^2 - N}{N^2 - \sum_n N_i^2},$$

and

$$m'_k(t_{GR}) = C \sum_n \sum_n \frac{Q_i Q_j}{N_i N_j} \left(\frac{e_i b_{ki}}{p_i} - \frac{e_j b_{kj}}{p_j}\right)^2, \qquad k = 1, 2,$$

a formula due to Ohlsson (1989), with

$$C = \frac{\sum_n N_i^2 - N}{N(N-1)}.$$

Here $\sum_n \sum_n$ denotes summing over the distinct pairs of the $n$ groups with no overlap.

Corresponding to $t_G$, $t_{GR}$, $m_k(t_G)$, $m'_k(t_{GR})$, the multi-stage estimators, applying (2.8), are respectively

$$e_G = \sum r_i g_{si} \frac{I_{si}}{\pi_i}, \qquad e_{GR} = \sum_n \frac{Q_i}{p_i} r_i h_{si}, \tag{2.10}$$

$$v_k(e_G) = m_k(e_G) + \sum v_i g_{si} \frac{I_{si}}{\pi_i}, \qquad k = 1, 2, \tag{2.11}$$

writing $m_k(e_G)$ for $m_k(t_G)$ with $Y$ replaced by $R$,

$$v_k(e_{GR}) = m_k(e_{GR}) + \sum_n \frac{Q_i}{p_i} v_i h_{si}, \qquad k = 1, 2, \tag{2.12}$$

$$v'_k(e_{GR}) = m'_k(e_{GR}) + \sum_n \frac{Q_i}{p_i} v_i h_{si}, \qquad k = 1, 2, \tag{2.13}$$

writing $m_k(e_{GR})$ for $m_k(t_{GR})$ with $Y$ replaced by $R$, and $m'_k(e_{GR})$ for $m'_k(t_{GR})$ with $Y$ replaced by $R$.

In the next section we report certain results that we developed along the above lines as we needed them to apply in implementing two surveys in Indian Statistical Institute, Calcutta. There we actually adopted a three-stage sampling scheme in which the RHC scheme was adopted in the first two stages and a simple random sample (SRS) was taken without replacement (WOR) in the third stage.

3 Variance estimation in a three stage sampling scheme

Suppose a sample is to be chosen in three stages. In the first two stages the selection is by adopting the RHC scheme as described in Section 2; let the $ij$th ssu in the $i$th fsu consist of $T_{ij}$ third stage units (tsu) and a sample of $t_{ij}$ tsu's be selected out of these $T_{ij}$ tsu's by the SRSWOR method. Using the $N_{ij}$'s introduced in Section 2 and also $g_{ij}$, $H_{ij}$ etc., let

$$A_i = \frac{\sum_{m_i} N_{ij}^2 - M_i}{M_i (M_i - 1)}, \qquad B_i = \frac{\sum_{m_i} N_{ij}^2 - M_i}{M_i^2 - \sum_{m_i} N_{ij}^2};$$
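further let

$$x_i = \sum_{m_i} y_{ij} \frac{H_{ij}}{g_{ij}}, \qquad z_i = \sum_{m_i} w_{ij} \frac{H_{ij}}{g_{ij}}, \qquad e = \sum_n \frac{Q_i}{p_i} x_i, \qquad X = \sum x_i, \qquad Z = \sum z_i;$$

writing $y_{ijk}$ as the $y$-value for the $k$th tsu in the $ij$th ssu and $\sum_k$ as the sum over the $t_{ij}$ tsu's in the sample, let

$$w_{ij} = \frac{T_{ij}}{t_{ij}} \sum_k y_{ijk}, \qquad w_i = \sum_{j=1}^{M_i} w_{ij}.$$

Then, writing $E_i$, $V_i$, $i = 1, 2, 3$, as operators for expectation and variance respectively for the $i$th stage of sampling, we have

$$E_2(x_i) = y_i, \qquad E_1 E_2(e) = Y, \qquad E_3(w_{ij}) = y_{ij}, \qquad E_3(z_i) = x_i.$$

Also, we shall write $E_{123} = E_1(E_2 E_3) = E_1 E_2 (E_3) = E_1 E_2 E_3 = E$, the over-all operator of expectation over the three stages of sampling, and $V$ for the overall variance operator. Next, let us write

$$u = \sum_n \frac{Q_i}{p_i} z_i, \qquad v_2(x_i) = B_i \left(\sum_{m_i} \frac{H_{ij}}{g_{ij}^2} y_{ij}^2 - x_i^2\right),$$

$$v_3(w_{ij}) = T_{ij}^2 \left(\frac{1}{t_{ij}} - \frac{1}{T_{ij}}\right) \frac{1}{t_{ij} - 1} \sum_k \left(y_{ijk} - \frac{w_{ij}}{T_{ij}}\right)^2, \qquad v_2(z_i) = B_i \left(\sum_{m_i} \frac{H_{ij}}{g_{ij}^2} w_{ij}^2 - z_i^2\right).$$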
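The third-stage quantities $w_{ij}$ and $v_3(w_{ij})$ defined above are the usual SRSWOR expansion estimator and its variance estimator for one sampled ssu. A minimal Python sketch follows; the function name `third_stage_estimates` and the sample values are hypothetical, and at least two sampled tsu's are assumed so that the divisor $t_{ij} - 1$ is positive.

```python
from typing import Sequence, Tuple


def third_stage_estimates(y_sample: Sequence[float], T: int) -> Tuple[float, float]:
    """Return (w_ij, v_3(w_ij)) for one sampled ssu containing T tsu's.

    y_sample holds the y-values of the t tsu's drawn by SRSWOR.
    """
    t = len(y_sample)
    if not 2 <= t <= T:
        raise ValueError("need 2 <= t <= T sampled tsu's")
    w = T / t * sum(y_sample)                     # w_ij = (T/t) * sum_k y_ijk
    mean = w / T                                  # sample mean, equal to w_ij / T_ij
    ss = sum((yk - mean) ** 2 for yk in y_sample)
    v3 = T ** 2 * (1 / t - 1 / T) * ss / (t - 1)  # v_3(w_ij)
    return w, v3


# Illustrative ssu with T = 10 households, of which 3 are sampled.
w_ij, v3_ij = third_stage_estimates([1.0, 2.0, 3.0], T=10)
```

With these inputs the expansion estimate is 20 and the finite-population factor $(1/t - 1/T)$ makes the variance estimate vanish when all $T$ tsu's are observed.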
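Then, we may observe the following.

$$E_2 v_2(x_i) = V_2(x_i) = A_i \left(\sum_{j=1}^{M_i} \frac{y_{ij}^2}{g_{ij}} - y_i^2\right),$$

$$V_2(z_i) = A_i \left(\sum_{j=1}^{M_i} \frac{w_{ij}^2}{g_{ij}} - w_i^2\right), \qquad E_2 v_2(z_i) = V_2(z_i), \qquad E_2(z_i) = w_i,$$

$$V_3 E_2(z_i) = \sum_{j=1}^{M_i} V_3(w_{ij}), \qquad E_3 v_3(w_{ij}) = V_3(w_{ij}),$$

$$E_2 E_3 \sum_{m_i} \frac{H_{ij}}{g_{ij}} v_3(w_{ij}) = \sum_{j=1}^{M_i} V_3(w_{ij}) = V_3 E_2(z_i),$$

$$E_1 E_2 E_3 \sum_n \frac{Q_i}{p_i} \left[v_2(z_i) + \sum_{m_i} \frac{H_{ij}}{g_{ij}} v_3(w_{ij})\right] = V_{23}(Z) = E_{23}(Z - Y)^2.$$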
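$$E(u) = Y, \qquad V(u) = E_{123}(u - Y)^2 = E_{23} E_1 (u - Y)^2 = E_{23}\left[E_1 (u - Z)^2 + (Z - Y)^2\right].$$

Now,

$$E_1 (u - Z)^2 = A \left(\sum \frac{z_i^2}{p_i} - Z^2\right)$$

and it follows that for

$$d_1 = B \left(\sum_n \frac{Q_i}{p_i^2} z_i^2 - u^2\right) = B \sum_n \sum_n Q_i Q_j \left(\frac{z_i}{p_i} - \frac{z_j}{p_j}\right)^2$$

and

$$d_2 = C \sum_n \sum_n \frac{Q_i Q_j}{N_i N_j} \left(\frac{z_i}{p_i} - \frac{z_j}{p_j}\right)^2,$$

$$E_1(d_1) = E_1(d_2) = E_1 (u - Z)^2.$$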
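The two expressions given for $d_1$ agree because the group totals $Q_i$ add up to 1. The following Python sketch computes $u$ and both forms of $d_1$ for one first-stage RHC sample; the function name `rhc_first_stage_terms` and the inputs are illustrative assumptions, and $d_2$ is deliberately left out of the sketch.

```python
from itertools import combinations
from typing import Sequence, Tuple


def rhc_first_stage_terms(z: Sequence[float], p: Sequence[float], Q: Sequence[float],
                          group_sizes: Sequence[int], N: int) -> Tuple[float, float, float]:
    """Return (u, d_1 in 'sum minus u^2' form, d_1 in pairwise form).

    z, p, Q hold one entry per RHC group (the selected fsu's z_i, p_i
    and the group total Q_i); group_sizes holds the N_i's.
    """
    sum_sq = sum(m * m for m in group_sizes)
    B = (sum_sq - N) / (N * N - sum_sq)
    ratios = [zi / pi for zi, pi in zip(z, p)]            # z_i / p_i
    u = sum(Qi * ri for Qi, ri in zip(Q, ratios))         # u = sum_n Q_i z_i / p_i
    d1_square = B * (sum(Qi * ri * ri for Qi, ri in zip(Q, ratios)) - u * u)
    # pairwise form: sum over distinct pairs of groups
    d1_pairs = B * sum(Q[i] * Q[j] * (ratios[i] - ratios[j]) ** 2
                       for i, j in combinations(range(len(z)), 2))
    return u, d1_square, d1_pairs


# Illustrative sample: N = 4 fsu's split into n = 2 groups of size 2.
u, d1_a, d1_b = rhc_first_stage_terms(z=[5.0, 30.0], p=[0.1, 0.4], Q=[0.3, 0.7],
                                      group_sizes=[2, 2], N=4)
```

The pairwise form makes it evident that $d_1$ cannot be negative, whereas the first form is the cheaper one to compute.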
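Then, we have

Theorem 3. Given

$$v_1(u) = B \left(\sum_n \frac{Q_i}{p_i^2} z_i^2 - u^2\right) + \sum_n \frac{Q_i}{p_i} \left[v_2(z_i) + \sum_{m_i} \frac{H_{ij}}{g_{ij}} v_3(w_{ij})\right]$$

and

$$v_2(u) = C \sum_n \sum_n \frac{Q_i Q_j}{N_i N_j} \left(\frac{z_i}{p_i} - \frac{z_j}{p_j}\right)^2 + \sum_n \frac{Q_i}{p_i} \left[v_2(z_i) + \sum_{m_i} \frac{H_{ij}}{g_{ij}} v_3(w_{ij})\right],$$

it follows that $E\,v_1(u) = E\,v_2(u) = V(u)$.

Proof. Easy and hence omitted.

In hitting upon Theorem 3 we applied the approach of our Theorem 1. An alternative estimator for $V(u)$ is available following the approach of Raj (1968) as follows. Let us express the RHC estimator of the variance of the RHC estimator for $Y$, namely $t_R = \sum_n \dfrac{Q_i}{p_i} y_i$, in the form

$$v_1(t_R) = \sum d_{si} I_{si} y_i^2 + \sum\sum d_{sij} I_{sij} y_i y_j$$

so that

$$E_1(d_{si} I_{si}) = E_1\left[\left(\frac{Q_i}{p_i}\right)^2 I_{si}\right] - 1.$$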
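Let

$$v_2(t_R) = \sum d_{si} I_{si} x_i^2 + \sum\sum d_{sij} I_{sij} x_i x_j, \qquad v_3(t_R) = \sum d_{si} I_{si} z_i^2 + \sum\sum d_{sij} I_{sij} z_i z_j,$$

$$v'_2(z_i) = v_2(z_i) - B_i \sum_{m_i} \frac{H_{ij}(1 - H_{ij})}{g_{ij}^2} v_3(w_{ij}).$$

Then we have

Theorem 4. Given

$$v(u) = v_3(t_R) - \sum d_{si} I_{si} v_3(z_i) + \sum_n \frac{Q_i}{p_i} v'_2(z_i) + \sum_n \left(\frac{Q_i}{p_i}\right)^2 v_3(z_i),$$

where $v_3(z_i)$ is taken to satisfy $E_3 v_3(z_i) = V_3(z_i)$, it follows that $E\,v(u) = V(u)$.

Proof:

$$E_3 v_3(t_R) = v_2(t_R) + \sum d_{si} I_{si} V_3(z_i),$$

$$E_3 \sum_n \frac{Q_i}{p_i} v'_2(z_i) = \sum_n \frac{Q_i}{p_i} B_i \left[\sum_{m_i} \frac{H_{ij}}{g_{ij}^2} y_{ij}^2 - x_i^2\right] = \sum_n \frac{Q_i}{p_i} v_2(x_i).$$

So

$$E_3 v(u) = v_2(t_R) + \sum_n \frac{Q_i}{p_i} v_2(x_i) + \sum_n \left(\frac{Q_i}{p_i}\right)^2 V_3(z_i),$$

$$E_2 E_3 v(u) = v_1(t_R) + \sum d_{si} I_{si} V_2(x_i) + \sum_n \frac{Q_i}{p_i} V_2(x_i) + \sum_n \left(\frac{Q_i}{p_i}\right)^2 E_2 V_3(z_i).$$

So,

$$\begin{aligned}
E\,v(u) = E_1 E_2 E_3 v(u) &= E_1 v_1(t_R) + E_1\left[\sum d_{si} I_{si} V_2(x_i) + \sum_n \frac{Q_i}{p_i} V_2(x_i)\right] + E_1 \sum_n \left(\frac{Q_i}{p_i}\right)^2 E_2 V_3(z_i) \\
&= V_1(t_R) + E_1 V_2\left(\sum_n \frac{Q_i}{p_i} x_i\right) + E_1\left[\sum_n \left(\frac{Q_i}{p_i}\right)^2 E_2 V_3(z_i)\right].
\end{aligned}$$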
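This is because

$$\begin{aligned}
E_1\left[\sum d_{si} I_{si} V_2(x_i) + \sum_n \frac{Q_i}{p_i} V_2(x_i)\right] &= E_1\left[\left(\sum_n \left(\frac{Q_i}{p_i}\right)^2 V_2(x_i) - \sum_n \frac{Q_i}{p_i} V_2(x_i)\right) + \sum_n \frac{Q_i}{p_i} V_2(x_i)\right] \\
&= E_1\left[\sum_n \left(\frac{Q_i}{p_i}\right)^2 V_2(x_i)\right] = E_1 V_2\left(\sum_n \frac{Q_i}{p_i} x_i\right).
\end{aligned}$$

So, finally,

$$\begin{aligned}
V(u) &= E_1 E_2 E_3 (u - Y)^2 = E_1 E_2 E_3 \left[(u - E_{23} u) + (E_{23} u - Y)\right]^2 \\
&= E_1 (t_R - Y)^2 + E_1 V_{23}(u) \\
&= V_1(t_R) + E_1\left[V_2\left(\sum_n \frac{Q_i}{p_i} x_i\right) + \sum_n \left(\frac{Q_i}{p_i}\right)^2 E_2 V_3(z_i)\right] = E\,v(u).
\end{aligned}$$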
Remark II. There seems to be no guarantee that $v_1(u)$, $v_2(u)$, $v(u)$ must be non-negative for every sample of observations. In Section 4 below we shall numerically examine the relative efficacies of the two alternative estimators $v_1(u)$ and $v(u)$ for the variance $V(u)$ through a simulation exercise.

4 Relative efficacies of two variance estimators in a three stage sampling

In two different surveys carried out at the Indian Statistical Institute, Calcutta, the above-mentioned three stage sampling was implemented. In one of them the variance estimator $v_1(u)$ and in the other $v(u)$ was applied, as is reported for the two surveys. To compare the relative efficacies of $v_1(u)$ and $v(u)$ we consider it useful to apply certain performance criteria which may be evaluated only if certain details are used for numerical calculations. So, we consider it appropriate to undertake a simulation study.

Let us consider certain fictitious data relating to a district composed of 10 administrative blocks. The blocks are taken as the fsu's and they are supposed to be composed of a number of villages which are the ssu's. The households (hh) in the villages are the tsu's. Some details are given in Table 1.

Table 1. Composition of 10 blocks in a district

Serial number   Number of villages   Total population
of block        in block             in block
 1              39                   23239
 2              30                   22253
 3              55                   32756
 4              51                   29074
 5              60                   35079
 6              59                   33624
 7              56                   31373
 8              41                   21435
 9              33                   19219
10              42                   23934
Total                                271986

The number of villages in a 'block' is taken as its size-measure; using this and applying the RHC scheme, 4 blocks are sampled. Using the number of people in a 'village' as its size-measure, from each selected block 22 percent of the villages, rounded up to the nearest integer, is sampled applying the RHC scheme again. A 4 percent sample of households, rounded up to the higher integer, is taken by the SRSWOR method from each village. The purpose is to estimate the total population in the district. Note that though the size-measures are chosen in the manners described, the total population $Y = 271986$ will not be estimated free of error because the households are chosen by the SRSWOR method. To compare $v_1(u)$ with $v(u)$ we repeat the drawing of the sample a total of $F = 1000$ times divided into 3 disjoint sets of 300, 300 and 400 replicates. Writing $w$ for $v_1(u)$ and $v(u)$ in turn, we calculate the percentage of replicates for which the intervals $(u - 1.96\sqrt{w},\ u + 1.96\sqrt{w})$ cover the value of $Y$. Each interval has a nominal confidence coefficient of 95 per cent assuming normality. This realized per cent is called the ACP, the actual coverage percent. Also, we calculate the ACV, the average coefficient of variation. This is the average, over the $F = 1000$ replicates, of the value of $100\,\sqrt{w}/u$. This reflects the length of the confidence interval. Between $v_1(u)$ and $v(u)$ that one is preferable for which the ACP is closer to 95 per cent and the ACV is smaller. The actual simulated findings are shown below in Table 2.

Table 2. A summary of efficacy of $v_1(u)$ vs $v(u)$

Serial number   Number of      ACP using            ACV using            Percent of replicates
of 'set' of     replicated     $v_1(u)$   $v(u)$    $v_1(u)$   $v(u)$    giving $v_1(u)$
replicated      samples in                                               less than $v(u)$
samples         the 'set'
(1)             (2)            (3)        (4)       (5)        (6)       (7)
1               300            94.34      92.67     5.55       5.53      54.67
2               300            95.33      95.00     5.57       5.54      58.00
3               400            97.00      96.75     5.59       5.58      54.50
Total           1000           95.70      95.00     5.57       5.55      55.60

Column (8): RE $= 100\, v_1(u)/v(u)$ for the last 10 replicates in the sets
Set 1: 107.67, 98.49, 104.06, 100.43, 111.82, 98.57, 86.50, 111.00, 90.53, 89.89
Set 2: 81.10, 92.10, 93.48, 103.89, 106.98, 76.06, 88.88, 115.21, 84.48, 94.84
Set 3: 102.22, 100.12, 96.44, 81.83, 02.83, 100.05, 75.58, 97.89, 123.26, 97.57
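The two performance criteria just defined, ACP and ACV, can be computed from the replicate-level values as in the following minimal Python sketch. The function name `acp_acv` and the four-replicate inputs are hypothetical; every variance estimate is assumed to be non-negative and every point estimate positive, since a negative $w$ would make the square root undefined.

```python
import math
from typing import Sequence, Tuple


def acp_acv(u: Sequence[float], w: Sequence[float], Y: float,
            z: float = 1.96) -> Tuple[float, float]:
    """Return (ACP, ACV), both in per cent, over the replicates.

    u[f] is the point estimate and w[f] the variance estimate of
    replicate f; Y is the known population total.
    """
    if len(u) != len(w) or not u:
        raise ValueError("u and w must be non-empty and of equal length")
    covered = 0
    cv_sum = 0.0
    for uf, wf in zip(u, w):
        half_width = z * math.sqrt(wf)
        if uf - half_width <= Y <= uf + half_width:
            covered += 1                          # interval covers Y
        cv_sum += 100.0 * math.sqrt(wf) / uf      # estimated CV in per cent
    F = len(u)
    return 100.0 * covered / F, cv_sum / F


# Illustrative replicates (not the paper's data).
acp, acv = acp_acv(u=[100.0, 110.0, 90.0, 130.0], w=[25.0, 25.0, 25.0, 25.0], Y=100.0)
```

Calling the function once with $w = v_1(u)$ and once with $w = v(u)$ over the same replicates gives the pairs of ACP and ACV values that the comparison rests on.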
Here we also indicate the values of RE $= 100\, v_1(u)/v(u)$ for the last 10 replicates out of each of the 3 sets of replicates of samples numbering 300, 300 and 400 mentioned above, to illustrate the efficiency of $v(u)$ relative to $v_1(u)$; the smaller it is, the better the one proposed by us relative to the one given by Raj (1968).

Remark III. Each replicate gave us positive values for $v_1(u)$ and $v(u)$.

Conclusion: The two variance estimators tried turn out quite competitive and adequately effective. In an actual survey both should be calculated and a confidence interval may be reported in terms of the one for which its length happens to be shorter. Our method at least provides a serviceable competitor against Raj's (1968).

Acknowledgment: The authors gratefully acknowledge the helpful comments on an earlier draft from a referee which led to a substantial improvement in the presentation.

References

Brewer KRW (1979) A class of robust sampling designs for large-scale surveys. Jour Amer Stat Assoc 74:911–915
Cassel CM, Särndal CE, Wretman JH (1976) Some results in generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63:615–620
Chaudhuri A, Maiti T (1994) On the regression adjustment to Rao, Hartley, Cochran estimator. Jour Stat Res 29(2):71–78
Durbin J (1953) Some results in sampling theory when the units are selected with unequal probabilities. Jour Roy Stat Soc Ser B 15:262–269
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. Jour Amer Stat Assoc 47:663–685
Murthy MN (1957) Ordered and unordered estimators in sampling without replacement. Sankhyā 18:379–390
Ohlsson E (1989) Variance estimation in Rao, Hartley, Cochran procedure. Sankhyā B 51:348–367
Raj Des (1956) Some estimators in sampling with varying probabilities without replacement. Jour Amer Stat Assoc 51:269–284
Raj Des (1968) Sampling theory. McGraw Hill, NY
Rao JNK (1975) Unbiased variance estimation for multi-stage designs. Sankhyā C 37:133–139
Rao JNK (1979) On deriving mean square errors and their non-negative unbiased estimators in finite population sampling. Jour Ind Stat Assoc 17:125–136
Rao JNK, Hartley HO, Cochran WG (1962) On a simple procedure of unequal probability sampling without replacement. Jour Roy Stat Soc B24:482–491
Särndal CE (1982) Implications of survey design for generalized regression estimation of linear functions. Jour Stat Plan Inf 7:155–170