Statistische Hefte / Statistical Papers
Statistische Hefte 27, 227-237 (1986)
© Springer-Verlag 1986

Unbiased estimation of income inequality

R. Pérez, C. Caso and M.A. Gil

Received: July 7, 1986
In a previous paper, Gastwirth showed that a broad family of measures of inequality can be accurately estimated when the tax data are known in groups (more precisely, when we know the number of returns in each of several class intervals and their corresponding average income). In the present paper we show that some measures of the preceding family can be unbiasedly estimated when the tax data are individually known for a sample from the population. Specifically, we construct unbiased estimators of a particular measure of inequality in the samplings with and without replacement, and in the stratified samplings with and without replacement.

Keywords: finite population, sample, income inequality, unbiased estimator, samplings with and without replacement, stratified samplings.
1. INTRODUCTION.

Gastwirth (1975) pointed out that many useful measures of inequality (such as the Theil and Herfindahl measures) belong to the family of measures of relative inequality defined by

$$M_h = \frac{\int_0^\infty h(x)\,dF(x) - h(\mu)}{h(\mu)},$$

where $x$ are positive incomes following a probability law with distribution function $F(x)$, $\mu$ is the average income (that is, $\mu = \int_0^\infty x\,dF(x)$), and $h(x)$ is a convex function. (In order to derive bounds for $\int_0^\infty h(x)\,dF(x)$, Gastwirth occasionally supposed that $h(0) = 0$).
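As a minimal sketch (not part of the original paper), the family $M_h$ can be evaluated for a discrete income distribution as shown here. The function name `relative_inequality` and the income values and relative frequencies are hypothetical, chosen only for illustration.

```python
import math

def relative_inequality(incomes, probs, h):
    """M_h = (E[h(X)] - h(mu)) / h(mu) for a discrete income distribution."""
    mu = sum(x * p for x, p in zip(incomes, probs))             # average income
    expected_h = sum(h(x) * p for x, p in zip(incomes, probs))  # E[h(X)]
    return (expected_h - h(mu)) / h(mu)

# Hypothetical illustration data: three income values with relative frequencies.
incomes = [10.0, 20.0, 40.0]
probs = [0.5, 0.3, 0.2]

measures = {
    "h(x) = 1/x": lambda x: 1.0 / x,
    "h(x) = x log x (Theil)": lambda x: x * math.log(x),
    "h(x) = x^2 (Herfindahl)": lambda x: x * x,
}
for name, h in measures.items():
    print(name, relative_inequality(incomes, probs, h))
```

Each measure vanishes when all incomes are equal; for convex $h$ with $h(\mu) > 0$ it is non-negative by Jensen's inequality.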
The preceding family of measures has been mentioned in previous papers (e.g., Aigner and Hein (1967), Atkinson (1970) and Bentzel (1970)). The measures $M_h$ have been accurately estimated by means of a method suggested by Gastwirth (1972, 1975). The application of this method assumed that the tax data are grouped, so that $k$ classes $[a_0,a_1], [a_1,a_2], \ldots, [a_{k-1},a_k]$ are chosen, and the following is the available information: i) the number of incomes in each class, and ii) the average income in each class.

In this paper, we are first going to verify that if we consider a random sample from this population in order to estimate the population income inequality, and make use of the measure $M_h$ for the convex function $h(x) = x^{-1}$, we can define an unbiased estimator of the population income inequality on the basis of the sample income inequality in the usual random samplings. We also remark that some other measures of inequality could then be unbiasedly estimated, although their estimation entails more difficulties and computation, whereas unbiased estimators of the Theil and Herfindahl measures cannot be immediately defined by following a similar method. Finally, we emphasize that the study above could be developed for the estimation of the industrial concentration.

REMARK 1.1: It should be pointed out that the measure $M_h$, for $h(x) = x^{-1}$, is the "additively decomposable index of order -1" belonging to the "family of additively decomposable indices" exhaustively studied and characterized by several authors (e.g., Bourguignon (1979), Cowell (1980), Eichhorn and Gehrig (1982), Shorrocks (1980) and Zagier (1983)). This index satisfies such properties as normalization (or minimality), symmetry, population mean independence (or income homogeneity), population size independence (or principle of population replication), principle of transfers (or Pigou-Dalton condition), continuity and additive decomposability (or decomposability into "between-groups" and "within-groups").
2. UNBIASED ESTIMATORS IN THE SAMPLINGS WITH AND WITHOUT REPLACEMENT.

Consider a finite population with $N$ income earners, and suppose that each individual income for a certain period of time is positive. Let $x_i^*$, $i=1,\ldots,M$, be the possible different income values in this population ($x_i^* > 0$). Let $N_i$ be the number of individuals in the population with income $x_i^*$, $i=1,\ldots,M$ ($\sum_{i=1}^M N_i = N$). Then, if we denote $p_i = N_i/N$, $i=1,\ldots,M$, and quantify the income inequality by means of the measure $M_h$ with $h(x) = x^{-1}$, we have

DEFINITION 2.1.- The value $I^{-1}$ defined by

$$I^{-1} = \sum_{i=1}^M \frac{\mu}{x_i^*}\,p_i - 1 = \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*}\,p_i p_j - 1,$$

where $\mu = \sum_{j=1}^M x_j^* p_j$ is the average income, is called population income inequality.
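The two expressions in Definition 2.1 can be compared numerically. The following is a minimal sketch, assuming integer incomes so that exact rational arithmetic can be used; the function names and the four-person population are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def population_inequality(incomes):
    """Double-sum form: sum_i sum_j (x_j / x_i) p_i p_j - 1 over distinct values."""
    N = len(incomes)
    counts = Counter(incomes)                           # N_i for each distinct x_i*
    p = {x: Fraction(c, N) for x, c in counts.items()}  # p_i = N_i / N
    double_sum = sum(Fraction(xj, xi) * p[xi] * p[xj] for xi in p for xj in p)
    return double_sum - 1

def population_inequality_via_mean(incomes):
    """Equivalent form: mu * sum_i p_i / x_i - 1, computed individual by individual."""
    N = len(incomes)
    mu = Fraction(sum(incomes), N)
    return mu * sum(Fraction(1, x) for x in incomes) / N - 1

incomes = [1, 2, 2, 4]   # hypothetical population, N = 4
print(population_inequality(incomes), population_inequality_via_mean(incomes))
```

Both functions return the same exact fraction, since the double sum factorizes as the product of the mean income and the mean of the reciprocal incomes.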
In order to estimate the population inequality, we now consider a sample of $n$ income earners drawn at random with or without replacement from the population. Let $n_i$ be the number of individuals in this sample with income $x_i^*$, $i=1,\ldots,M$ ($\sum_{i=1}^M n_i = n$). Then, if we denote $f_i = n_i/n$, $i=1,\ldots,M$, and quantify the income inequality by means of the measure $M_h$ with $h(x) = x^{-1}$, we have

DEFINITION 2.2.- The value $I_n^{-1}$ defined by

$$I_n^{-1} = \sum_{i=1}^M \frac{\bar{x}_n}{x_i^*}\,f_i - 1 = \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*}\,f_i f_j - 1,$$

where $\bar{x}_n = \sum_{j=1}^M x_j^* f_j$ is the sample average income, is called sample income inequality.

Obviously, $I_n^{-1}$ is an analogue estimate of the value $I^{-1}$. We are first going to examine the goodness of this estimate by determining the expected value of $I_n^{-1}$ over all samples of size $n$, when we adopt a random sampling with replacement.
This expected value is given by

$$E(I_n^{-1}) = \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*}\,E(f_i f_j) - 1 = \sum_{i=1}^M E(f_i^2) + \sum_{i=1}^M \sum_{\substack{j=1 \\ j \neq i}}^M \frac{x_j^*}{x_i^*}\,E(f_i f_j) - 1$$

and, the random vector $(nf_1,\ldots,nf_M)$ having multinomial distribution with parameters $n, p_1, \ldots, p_M$, we have

$$E(I_n^{-1}) = (n-1)\,I^{-1}/n.$$

Secondly, we are going to examine the goodness of $I_n^{-1}$ by determining the expected value of $I_n^{-1}$ over all samples of size $n$, when we adopt a random sampling without replacement. This expected value is given by

$$E(I_n^{-1}) = N(n-1)\,I^{-1}/\big((N-1)\,n\big)$$

since the random vector $(nf_1,\ldots,nf_M)$ has multivariate hypergeometric distribution with parameters $N$, $D_1 = Np_1, \ldots, D_M = Np_M$, and $n$.
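Both expected values can be checked by brute force on a small population. The sketch below is an illustration under stated assumptions (integer incomes, a hypothetical four-person population, helper names introduced here): it averages the sample income inequality over every equally likely sample and prints it next to $(n-1)I^{-1}/n$ and $N(n-1)I^{-1}/((N-1)n)$.

```python
from fractions import Fraction
from itertools import combinations, product

def inequality(values):
    """Index of order -1 of a list of positive integer incomes: mean * mean(1/x) - 1."""
    m = len(values)
    mean = Fraction(sum(values), m)
    mean_inverse = sum(Fraction(1, v) for v in values) / m
    return mean * mean_inverse - 1

def expected_sample_inequality(population, n, replace):
    """Average the sample inequality over all equally likely samples of size n."""
    if replace:
        samples = list(product(population, repeat=n))   # ordered draws with replacement
    else:
        samples = list(combinations(population, n))     # subsets of distinct individuals
    return sum(inequality(s) for s in samples) / len(samples)

population = [1, 2, 2, 4]   # hypothetical incomes, N = 4
N, n = len(population), 3
I = inequality(population)

print(expected_sample_inequality(population, n, True), Fraction(n - 1, n) * I)
print(expected_sample_inequality(population, n, False),
      Fraction(N * (n - 1), (N - 1) * n) * I)
```

Exact rational arithmetic is used so that the comparison is an identity rather than a floating-point approximation.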
The preceding results allow us to state exact, rather than merely approximate, relations between $E(I_n^{-1})$ and $I^{-1}$. In addition, unbiased estimators of $I^{-1}$ can be immediately established in both the samplings with and without replacement, on the basis of the expected values above computed.

THEOREM 2.1.- In the random sampling with replacement from the population, the estimator $\hat{I}_n^{-1}$ allocating to each sample of $n$ individuals the value

$$\hat{I}_n^{-1} = n\,I_n^{-1}/(n-1)$$

is an unbiased estimator of $I^{-1}$.

THEOREM 2.2.- In the random sampling without replacement from the population, the estimator $(\hat{I}_n^{-1})^c$ allocating to each sample of $n$ individuals the value

$$(\hat{I}_n^{-1})^c = (N-1)\,n\,I_n^{-1}/\big(N(n-1)\big)$$

is an unbiased estimator of $I^{-1}$.

REMARK 2.1: Obviously, when $N \to \infty$ the expressions for $\hat{I}_n^{-1}$ and $(\hat{I}_n^{-1})^c$ coincide. In fact, when $N \to \infty$ the multinomial distribution with parameters $n, p_1, \ldots, p_M$ is the limit of the multivariate hypergeometric distribution with parameters $N$, $D_1 = Np_1, \ldots, D_M = Np_M$.
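The estimators of Theorems 2.1 and 2.2 amount to rescaling the sample income inequality. A minimal sketch follows, assuming integer incomes and a hypothetical population; the function names are introduced here for illustration. Unbiasedness is checked by averaging each estimator over all possible samples.

```python
from fractions import Fraction
from itertools import combinations, product

def inequality(values):
    """Index of order -1: mean * mean(1/x) - 1, in exact arithmetic."""
    m = len(values)
    return Fraction(sum(values), m) * sum(Fraction(1, v) for v in values) / m - 1

def estimate_with_replacement(sample):
    """Theorem 2.1: n * I_n / (n - 1)."""
    n = len(sample)
    return Fraction(n, n - 1) * inequality(sample)

def estimate_without_replacement(sample, N):
    """Theorem 2.2: (N - 1) * n * I_n / (N * (n - 1))."""
    n = len(sample)
    return Fraction((N - 1) * n, N * (n - 1)) * inequality(sample)

population = [1, 2, 2, 4]   # hypothetical incomes, N = 4
N, n = len(population), 3

# Average each estimator over all equally likely samples.
mean_wr = sum(estimate_with_replacement(s)
              for s in product(population, repeat=n)) / N ** n
subsets = list(combinations(population, n))
mean_wor = sum(estimate_without_replacement(s, N) for s in subsets) / len(subsets)

print(mean_wr, mean_wor, inequality(population))
```

Both estimators require $n \geq 2$, since a single observation carries no information about inequality and the factor $n/(n-1)$ is undefined.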
3. UNBIASED ESTIMATORS IN THE STRATIFIED SAMPLINGS WITH AND WITHOUT REPLACEMENT.

Consider a finite population with $N$ income earners, and suppose that it is possible to divide it into $r$ subgroups (as relatively homogeneous with respect to the income inequality as possible, on the basis of the information supplied by the available publications of the Census or other Services reports). Assume that each individual income for a certain period of time is positive, and let $x_i^*$, $i=1,\ldots,M$, be the possible different income values in the considered population ($x_i^* > 0$). Let $N_{ik}$ be the number of individuals in the k-th subgroup with income $x_i^*$, $i=1,\ldots,M$, $k=1,\ldots,r$ ($\sum_{i=1}^M \sum_{k=1}^r N_{ik} = N$). Then, if we denote $p_{ik} = N_{ik}/N$, $i=1,\ldots,M$, $k=1,\ldots,r$, and quantify the income inequality by means of the measure $M_h$ with $h(x) = x^{-1}$, we have

DEFINITION 3.1.- The value $I^{-1}$ defined by

$$I^{-1} = \sum_{i=1}^M \frac{\mu}{x_i^*} \sum_{k=1}^r p_{ik} - 1 = \sum_{i=1}^M \sum_{j=1}^M \sum_{k=1}^r \sum_{l=1}^r \frac{x_j^*}{x_i^*}\,p_{ik}\,p_{jl} - 1,$$

where $\mu = \sum_{j=1}^M \sum_{l=1}^r x_j^* p_{jl}$ is the average income, is called population income inequality.

In order to estimate population inequality, we now consider a sample of $n$ income earners drawn at random from the population according to a stratified sampling with proportional allocation. In other words, we now consider a sample of $n_k$ individuals drawn at random with or without replacement from the k-th subgroup, being $n_k/n = N_k/N$ (with $N_k = \sum_{i=1}^M N_{ik}$), $k=1,\ldots,r$. Let $n_{ik}$ be the number of individuals with income $x_i^*$ in the sample from the k-th subgroup ($\sum_{i=1}^M \sum_{k=1}^r n_{ik} = \sum_{k=1}^r n_k = n$). Then, if we denote $f_{ik} = n_{ik}/n$, $i=1,\ldots,M$, $k=1,\ldots,r$, and quantify the income inequality by means of the measure $M_h$ with $h(x) = x^{-1}$, we have

DEFINITION 3.2.- The value $I_n^{-1}$ defined by

$$I_n^{-1} = \sum_{i=1}^M \frac{\bar{x}_n}{x_i^*} \sum_{k=1}^r f_{ik} - 1 = \sum_{i=1}^M \sum_{j=1}^M \sum_{k=1}^r \sum_{l=1}^r \frac{x_j^*}{x_i^*}\,f_{ik}\,f_{jl} - 1,$$

where $\bar{x}_n = \sum_{j=1}^M \sum_{l=1}^r x_j^* f_{jl}$ is the sample average income, is called sample income inequality.
As in the preceding section we are first going to compute the expected value of the analogue estimate $I_n^{-1}$ over all samples of size $n$, when we adopt stratified sampling with replacement and proportional allocation. This expected value is given by

$$E(I_n^{-1}) = \sum_{i=1}^M \sum_{j=1}^M \sum_{k=1}^r \sum_{l=1}^r \frac{x_j^*}{x_i^*}\,E(f_{ik} f_{jl}) - 1$$

$$= \sum_{i=1}^M \sum_{k=1}^r E(f_{ik}^2) + \sum_{i=1}^M \sum_{\substack{j=1 \\ j \neq i}}^M \frac{x_j^*}{x_i^*} \sum_{k=1}^r E(f_{ik} f_{jk}) + \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*} \sum_{k=1}^r \sum_{\substack{l=1 \\ l \neq k}}^r E(f_{ik} f_{jl}) - 1$$

and the random vectors $(nf_{1k},\ldots,nf_{Mk})$ being independent with multinomial distribution with parameters $n_k$, $Np_{1k}/N_k, \ldots, Np_{Mk}/N_k$ ($k=1,\ldots,r$), we have

$$E(I_n^{-1}) = I^{-1} - \sum_{k=1}^r \frac{n_k}{n^2}\,I^{-1}(k)$$

with $I^{-1}(k)$ the income inequality of the k-th subgroup, for which an unbiased estimator can be constructed by applying Theorem 2.1, as follows

$$\hat{I}^{-1}(k)_{n_k} = n_k \left[ \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*}\,\frac{nf_{ik}}{n_k}\,\frac{nf_{jk}}{n_k} - 1 \right] \Big/ (n_k - 1).$$

Consequently, an unbiased estimator of $I^{-1}$ can be immediately established in the stratified random sampling above studied.

THEOREM 3.1.- In the stratified random sampling with replacement and proportional allocation, the estimator $(\hat{I}_n^{-1})^s$ allocating to each sample of $n$ individuals the value

$$(\hat{I}_n^{-1})^s = I_n^{-1} + \sum_{k=1}^r \frac{n_k}{n^2}\,\hat{I}^{-1}(k)_{n_k}$$

is an unbiased estimator of $I^{-1}$.

On the other hand, we are now going to compute the expected value of the analogue estimator $I_n^{-1}$ over all samples of size $n$, when we adopt the stratified random sampling without replacement and with proportional allocation. This expected value is given by

$$E(I_n^{-1}) = I^{-1} - \sum_{k=1}^r \frac{n_k}{n^2}\,\frac{N_k - n_k}{N_k - 1}\,I^{-1}(k)$$

since the random vectors $(nf_{1k},\ldots,nf_{Mk})$ are independent with multivariate hypergeometric distribution with parameters $N_k$, $D_{1k} = Np_{1k}, \ldots, D_{Mk} = Np_{Mk}$, and $n_k$ ($k=1,\ldots,r$), $I^{-1}(k)$ being the income inequality of the k-th subgroup, for which an unbiased estimator can be constructed by applying Theorem 2.2, as follows

$$(\hat{I}^{-1})^c(k)_{n_k} = n_k (N_k - 1) \left[ \sum_{i=1}^M \sum_{j=1}^M \frac{x_j^*}{x_i^*}\,\frac{nf_{ik}}{n_k}\,\frac{nf_{jk}}{n_k} - 1 \right] \Big/ \big((n_k - 1)\,N_k\big).$$

Consequently, an unbiased estimator of $I^{-1}$ can be immediately established.

THEOREM 3.2.- In the stratified random sampling without replacement and with proportional allocation, the estimator $(\hat{I}_n^{-1})^{sc}$ allocating to each sample of $n$ individuals the value

$$(\hat{I}_n^{-1})^{sc} = I_n^{-1} + \sum_{k=1}^r \frac{n_k}{n^2}\,\frac{N_k - n_k}{N_k - 1}\,(\hat{I}^{-1})^c(k)_{n_k}$$

is an unbiased estimator of $I^{-1}$.

REMARK 3.1: Obviously, when $N_k \to \infty$ ($k=1,\ldots,r$), $(\hat{I}_n^{-1})^s$ and $(\hat{I}_n^{-1})^{sc}$ coincide.

REMARK 3.2: When $r=1$, $(\hat{I}_n^{-1})^s$ (in Theorem 3.1) and $\hat{I}_n^{-1}$ (in Theorem 2.1) coincide, and $(\hat{I}_n^{-1})^{sc}$ (in Theorem 3.2) and $(\hat{I}_n^{-1})^c$ (in Theorem 2.2) coincide too.

REMARK 3.3: Results in this section cannot be immediately derived from Section 2 and the decomposability of the index of order -1, since this decomposition includes coefficients depending also on the sample frequencies.
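The stratified estimators can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: incomes are integers, the two strata are hypothetical, and each stratum correction is written as the bias coefficient from the expected-value formula ($n_k/n^2$ with replacement, $n_k(N_k-n_k)/(n^2(N_k-1))$ without) times the unbiased within-stratum estimator from Theorem 2.1 or 2.2. Unbiasedness is then checked by averaging over every possible stratified sample.

```python
from fractions import Fraction
from itertools import combinations, product

def inequality(values):
    """Index of order -1: mean * mean(1/x) - 1, in exact arithmetic."""
    m = len(values)
    return Fraction(sum(values), m) * sum(Fraction(1, v) for v in values) / m - 1

def stratified_estimate(stratum_samples, stratum_sizes=None):
    """With replacement if stratum_sizes is None, otherwise without replacement."""
    pooled = [x for s in stratum_samples for x in s]
    n = len(pooled)
    estimate = inequality(pooled)                    # analogue estimate I_n
    for k, s in enumerate(stratum_samples):
        n_k = len(s)
        if stratum_sizes is None:
            weight = Fraction(n_k, n * n)
            within = Fraction(n_k, n_k - 1) * inequality(s)
        else:
            N_k = stratum_sizes[k]
            weight = Fraction(n_k * (N_k - n_k), n * n * (N_k - 1))
            within = Fraction((N_k - 1) * n_k, N_k * (n_k - 1)) * inequality(s)
        estimate += weight * within
    return estimate

def average_over_samples(strata, sample_sizes, replace):
    """Average the estimator over all equally likely stratified samples."""
    per_stratum = [list(product(s, repeat=m)) if replace else list(combinations(s, m))
                   for s, m in zip(strata, sample_sizes)]
    sizes = None if replace else [len(s) for s in strata]
    draws = list(product(*per_stratum))              # strata are sampled independently
    return sum(stratified_estimate(d, sizes) for d in draws) / len(draws)

strata = [[1, 2, 2, 4], [3, 5, 6, 6, 9, 12]]   # hypothetical strata, N_k = 4 and 6
sample_sizes = [2, 3]                           # proportional: n_k / n = N_k / N
population = [x for s in strata for x in s]
print(average_over_samples(strata, sample_sizes, True),
      average_over_samples(strata, sample_sizes, False),
      inequality(population))
```

Every stratum needs $n_k \geq 2$ for the within-stratum estimators to be defined, and the check relies on the allocation being exactly proportional.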
4. DIFFICULTIES IN THE UNBIASED ESTIMATION OF OTHER MEASURES OF INCOME INEQUALITY.

The procedures and results in Sections 2 and 3 could be developed for the measures of inequality $M_h$ with $h(x) = x^{-m}$ ($m \in \mathbb{N}$). The only problem in such developments is to compute moments of high order for the statistical distributions which arise in the samplings above considered.

On the other hand, to state exact relations between the expected value of the sample income inequality and the population income inequality, when they are quantified by means of the Theil and Herfindahl measures, is not possible, since from the convexity of the functions $h(x) = x \log x$ and $h(x) = x^2$ we can only establish some boundary relations.

This last argument could be used for justifying the difficulties in estimating other measures in the family $M_h$.
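A small numerical illustration of this difficulty, offered as a sketch with hypothetical populations and helper names: under sampling with replacement, the ratio of the expected sample index to the population index equals $(n-1)/n$ for $h(x) = x^{-1}$ whatever the population, whereas for the Herfindahl choice $h(x) = x^2$ the ratio changes from one population to another, so no fixed rescaling of the sample index can remove the bias in these examples.

```python
from fractions import Fraction
from itertools import product

def index(values, h):
    """M_h = mean(h(x)) / h(mean(x)) - 1 for a list of positive incomes."""
    m = len(values)
    mean = Fraction(sum(values), m)
    return sum(h(Fraction(v)) for v in values) / m / h(mean) - 1

def bias_ratio(population, n, h):
    """E(sample index) / population index under sampling with replacement."""
    samples = list(product(population, repeat=n))
    expected = sum(index(s, h) for s in samples) / len(samples)
    return expected / index(population, h)

inverse = lambda x: 1 / x    # h(x) = x^{-1}
square = lambda x: x * x     # h(x) = x^2 (Herfindahl)

for population in ([1, 2], [1, 2, 3]):   # hypothetical populations
    print(population, bias_ratio(population, 2, inverse), bias_ratio(population, 2, square))
```

The Theil choice $h(x) = x \log x$ is omitted from the sketch only because it cannot be evaluated in exact rational arithmetic.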
5. CONCLUDING REMARKS.

The method used in the preceding sections could be followed in order to estimate the industrial and ecological concentration in a finite population. In particular, this method has been expounded for the entropy of a variable in a finite population (Pérez et al. (1986)) in both random samplings, with and without replacement, and results in this last paper might be applied to the concentration estimation.

In addition, the unbiased estimation of the mutual information concerning two random variables in the random sampling with replacement has been previously developed (Gil et al. (1986)). In particular, this estimation could be applied to approximate the mean information processed by a discrete constant channel with finite input and output alphabet, or the increase in concentration under a classification process motivated by the adoption of another classification process. Clearly, the ideas in this paper could also be followed for estimating the mutual information in the stratified samplings.

These latter studies and the present one may be completed by calculating and estimating the precision of the unbiased estimators and by determining the suitable sample size for estimating the population income inequality, entropy or mutual information, which could be achieved with the use of the moments in the accompanying Appendix. Such a process has already been accomplished for the estimation of the entropy in a finite population in the nonstratified samplings (Pérez et al. (1986), Gil et al. (1986)).
APPENDIX

1. MOMENTS OF THE MULTINOMIAL DISTRIBUTION

Let $(n_1,\ldots,n_M)$ be a random vector with multinomial distribution, where $n, p_1,\ldots,p_M$ ($\sum_{i=1}^M p_i = 1$) are the parameters. Then,

$$E(n_i) = n p_i$$

$$E(n_i^2) = n p_i\,[(n-1)p_i + 1]$$

$$E(n_i^3) = n p_i\,[(n-1)(n-2)p_i^2 + 3(n-1)p_i + 1]$$

$$E(n_i^4) = n p_i\,[(n-1)(n-2)(n-3)p_i^3 + 6(n-1)(n-2)p_i^2 + 7(n-1)p_i + 1]$$

$$E(n_i n_j) = n(n-1)\,p_i p_j, \quad i \neq j$$

$$E(n_i^2 n_j^2) = n(n-1)\,p_i p_j\,[(n-2)(n-3)p_i p_j + (n-2)(p_i + p_j) + 1], \quad i \neq j$$

$$E(n_i^3 n_j) = n(n-1)\,p_i p_j\,[(n-2)(n-3)p_i^2 + 3(n-2)p_i + 1], \quad i \neq j$$

$$E(n_i^2 n_j n_k) = n(n-1)(n-2)\,p_i p_j p_k\,[(n-3)p_i + 1], \quad j \neq i,\ k \neq i,j$$

$$E(n_i n_j n_k n_l) = n(n-1)(n-2)(n-3)\,p_i p_j p_k p_l, \quad j \neq i,\ k \neq i,j,\ l \neq i,j,k$$

2. MOMENTS OF THE MULTIVARIATE HYPERGEOMETRIC DISTRIBUTION

Let $(n_1,\ldots,n_M)$ be a random vector with multivariate hypergeometric distribution, where $N$, $D_1 = Np_1, \ldots, D_M = Np_M$ and $n$ are the corresponding parameters. Then,

$$E(n_i) = n p_i$$

$$E(n_i^2) = (N-n)\,n p_i\,[N(n-1)p_i/(N-n) + 1]/(N-1)$$

$$E(n_i^3) = (N-n)\,n p_i\,[N^2(n-1)(n-2)p_i^2/(N-n) + 3N(n-1)p_i + N - 2n]/\big((N-1)(N-2)\big)$$

$$E(n_i^4) = (N-n)\,n p_i\,[N^3(n-1)(n-2)(n-3)p_i^3/(N-n) + 6(n-1)(n-2)N^2 p_i^2 + N(n-1)(7N-11n+1)p_i + N(N+1) - 6n(N-n)]/\big((N-1)(N-2)(N-3)\big)$$

$$E(n_i n_j) = N n(n-1)\,p_i p_j/(N-1), \quad i \neq j$$

$$E(n_i^2 n_j^2) = n(n-1)N^2(N-n)\,p_i p_j\,[N(n-2)(n-3)p_i p_j/(N-n) + (n-2)(p_i + p_j) + (N-n-1)/N]/\big((N-1)(N-2)(N-3)\big), \quad i \neq j$$

$$E(n_i^3 n_j) = (N-n)N^2 n(n-1)\,p_i p_j\,[N(n-2)(n-3)p_i^2/(N-n) + 3(n-2)p_i + (N-2n+1)/N]/\big((N-1)(N-2)(N-3)\big), \quad i \neq j$$

$$E(n_i^2 n_j n_k) = (N-n)N^2 n(n-1)(n-2)\,p_i p_j p_k\,[N(n-3)p_i/(N-n) + 1]/\big((N-1)(N-2)(N-3)\big), \quad j \neq i,\ k \neq i,j$$

$$E(n_i n_j n_k n_l) = N^3 n(n-1)(n-2)(n-3)\,p_i p_j p_k p_l/\big((N-1)(N-2)(N-3)\big), \quad j \neq i;\ k \neq i,j;\ l \neq i,j,k$$
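Moment formulas of this kind can be spot-checked by exhaustive enumeration for small parameter values. The sketch below uses hypothetical parameters and helper names introduced here; it compares $E(n_i^2 n_j^2)$ for a multinomial vector and $E(n_i^3)$ for a multivariate hypergeometric vector with the closed forms, in exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product

def multinomial_moment(n, p, f):
    """E[f(counts)] for n independent draws over categories with probabilities p."""
    total = Fraction(0)
    for draw in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for c in draw:
            prob *= p[c]                      # probability of this ordered draw
        total += prob * f([draw.count(c) for c in range(len(p))])
    return total

def hypergeometric_moment(n, D, f):
    """E[f(counts)] for n draws without replacement from D[c] units of category c."""
    units = [c for c, d in enumerate(D) for _ in range(d)]
    samples = list(combinations(units, n))    # all subsets are equally likely
    return sum(Fraction(f([s.count(c) for c in range(len(D))]))
               for s in samples) / len(samples)

# Multinomial check: E(n_i^2 n_j^2) with hypothetical n = 4 and p = (1/2, 1/3, 1/6).
n = 4
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
pi, pj = p[0], p[1]
multinomial_formula = (n * (n - 1) * pi * pj
                       * ((n - 2) * (n - 3) * pi * pj + (n - 2) * (pi + pj) + 1))
multinomial_exact = multinomial_moment(n, p, lambda c: c[0] ** 2 * c[1] ** 2)

# Hypergeometric check: E(n_i^3) with hypothetical N = 6, D = (3, 2, 1), sample size 3.
N, D, m = 6, [3, 2, 1], 3
q = Fraction(D[0], N)
hyper_formula = ((N - m) * m * q
                 * (N ** 2 * (m - 1) * (m - 2) * q ** 2 / (N - m)
                    + 3 * N * (m - 1) * q + N - 2 * m)
                 / ((N - 1) * (N - 2)))
hyper_exact = hypergeometric_moment(m, D, lambda c: c[0] ** 3)

print(multinomial_exact, multinomial_formula)
print(hyper_exact, hyper_formula)
```

The enumeration grows exponentially with the sample size, so this is a verification aid for small cases rather than a way of computing the moments in practice.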
REFERENCES

Aigner, D.J. and A.J. Hein (1967), A social welfare view of the measurement of income equality, Review of Income and Wealth 73, 12-25.

Atkinson, A.B. (1970), On the measurement of inequality, Journal of Economic Theory, 2, 244-263.

Bentzel, R. (1970), Measures of income inequality, Review of Income and Wealth.

Bourguignon, F. (1979), Decomposable income inequality measures, Econometrica, 47, 901-920.

Cowell, F.A. (1980), On the structure of additive inequality measures, Review of Economic Studies, 47, 521-531.

Eichhorn, W. and W. Gehrig (1982), Measurement of Inequality in Economics, in: B. Korte ed.: Modern applied Mathematics-Optimization and Operations Research (North-Holland, Amsterdam).

Gastwirth, J.L. (1972), The estimation of the Lorenz curve and Gini index, The Review of Economics and Statistics, 63, no. 3, 306-316.

Gastwirth, J.L. (1975), The estimation of a family of measures of economic inequality, Journal of Econometrics, 3, 61-70.

Gil, M.A., Pérez, R. and I. Martínez (1986), The Mutual Information. Estimation in the sampling with replacement, R.A.I.R.O.- Rech. Opér., Vol. 20, n° 3.

Herfindahl, O.C. (1950), Concentration in the steel industry, Ph.D. dissertation (Columbia University, New York).

Pérez, R., Gil, M.A. and P. Gil (1986), Estimating the uncertainty associated with a variable in a finite population, Kybernetes (in press).

Shorrocks, A.F. (1980), The class of additively decomposable inequality measures, Econometrica, 48, 613-625.

Theil, H. (1967), Economics and Information Theory (North-Holland, Amsterdam).

Zagier, D. (1983), On the decomposability of the Gini coefficient and other indices of inequality, Discussion paper No. 108, Projektgruppe Theoretische Modelle (Universität Bonn).
R. Pérez and C. Caso
Departamento de Estadística y Econometría
Facultad de Económicas
Universidad de Oviedo
33071 Oviedo, SPAIN

M.A. Gil
Departamento de Matemáticas
Facultades de Ciencias
Universidad de Oviedo
33071 Oviedo, SPAIN