Stat Papers (2012) 53:971–985 DOI 10.1007/s00362-011-0401-6 REGULAR ARTICLE
A measure of asymmetry P. N. Patil · P. P. Patil · D. Bagkavos
Received: 27 May 2010 / Revised: 14 July 2011 / Published online: 9 August 2011 © Springer-Verlag 2011
Abstract It is general practice to make assertions about the symmetry or asymmetry of a probability density function based on coefficients of skewness. Since most coefficients of skewness are designed to be zero for a symmetric density, they do, overall, provide an indication of symmetry. However, skewness is primarily influenced by the tail behavior of a density function, and skewness coefficients are designed to capture this behavior; thus they do not calibrate the asymmetry of density curves. We provide a necessary condition for a probability density function to be symmetric and use it to measure asymmetry in a continuous density curve on a scale of −1 to 1. We show through examples that the proposed measure does an admirable job of capturing the visual impression of asymmetry of a continuous density function.

Keywords Asymmetry measure · Correlation · Nonparametric · Skewness
1 Introduction Qualitative features such as skewness and symmetry of a probability model play an important role in mathematical as well as in applied statistics. For objectivity, their mathematical quantification is essential. But, for example, in the case of skewness,
P. N. Patil School of Mathematics and Statistics, The University of Birmingham, Birmingham, B15 2TT, UK P. P. Patil Department of Statistics, Miami University, Oxford, OH 45056, USA D. Bagkavos (B) Accenture, 1 Arkadias Street, 14564 Athens, Greece e-mail:
[email protected]
owing to its usefulness and to a degree of difficulty in capturing and representing it in a 'simple' numerical value, there are several measures of skewness; not all are useful when comparing the skewness of one probability density function with that of another. Nevertheless, by and large, the mathematical understanding of when one probability density function is more (or less) skewed than another is provided by van Zwet (1964), and the requirement it induces on any measure of skewness is well accepted. For the latest attempt to measure skewness, together with references to earlier work on the topic, see Critchley and Jones (2008). In contrast, the concept of symmetry, at least in one dimension, seems to be well understood in mathematical terms and plays a critical role in statistical procedures. For example, symmetry of the underlying population is one of the main assumptions of Mann-Whitney type tests of a location parameter, whereas, as shown in Arias-Nicolás et al. (2009), asymmetric probability distributions play a role in robust Bayesian inference. In this article the focus is on univariate symmetry. Nonetheless, we note that there are interesting and diverse generalizations of the concept of univariate symmetry to higher dimensions; see for example measures of symmetry for copulas in Siburg and Stoimenov (2008), and symmetry around the main diagonal in contingency tables in Ye and Bhattacharya (2009). Owing to the important role symmetry plays in statistics, one finds several procedures in the statistics literature to test symmetry. But surprisingly, there is a very limited amount of work which could be used to suggest whether one probability model (or data set) is more symmetric or asymmetric than another. The very first structured attempt to measure symmetry is in Doksum (1975), where an index of asymmetry for the central 100(1 − α)% (0 < α < 1) of the distribution is defined (see also MacGillivray 1986 for a short discussion).
In Li and Morris (1991) the drawbacks of the classical skewness measure based on the third central moment, when used as an asymmetry measure, are noted, and a new measure (s4, discussed in Sect. 2) of symmetry is proposed; very recently, Boshnakov (2007) developed a measure of asymmetry with respect to the mode. Although these concepts or measures may, in some sense, seem a reasonable way to quantify asymmetry, they do not seem either user friendly or intuitive enough to visualize the amount of asymmetry present in a density curve. In fact, this lack of user friendly mathematical quantification has more often led to the use of skewness measures to make assertions about symmetry. While this is valid to an extent, as shown in Li and Morris (1991) and in Sect. 3.6, assessing the symmetry of two skewed probability density functions on the basis of their skewness coefficients is not appropriate. This is because skewness is primarily influenced by the tail behavior of the density function, whereas an assessment of symmetry or asymmetry is based on the visual impression of the whole curve, or at least the majority of the curve. In this article we provide a measure on a scale of −1 to 1 to quantify the amount of asymmetry of a continuous probability density function, with the value zero referring to a symmetric density and +1 and −1 referring to the most positively and most negatively asymmetric densities, respectively. Our proposal is based on a necessary (but not sufficient) condition for a density to be symmetric, and hence it may be appropriate to refer to it as a weak measure of asymmetry. However, we show through examples that it does an admirable job of capturing the visual impression of asymmetry of continuous density curves. In fact, it certainly explains asymmetry in a continuous density curve better
than the skewness coefficient. For example, the magnitudes of the proposed measure for two different continuous densities are in sync with the visual ordering of their asymmetry. This suggests one of the most important applications of the proposed measure, in economics. Although economists do not agree on how symmetric or asymmetric the ideal income distribution should be, it is generally modelled by a continuous curve. In this respect, the proposed measure would provide a more meaningful and objective comparison of continuous income distributions over different periods, or of different countries, by means of their asymmetry. Since in this article we deal mainly with continuous probability density functions, a density function is always assumed to be continuous unless specified otherwise. In the next section we state the necessary condition for a probability density function to be symmetric and use it to propose a measure of asymmetry (and hence a measure of symmetry). To illustrate the appropriateness of the proposed measure, we provide several examples of continuous density curves in Sect. 3, together with their asymmetry coefficients. Estimation of the proposed measure of asymmetry from a sample is considered in the last section.
2 Measure of asymmetry

A continuous probability density function f(x) with associated distribution function F(x), x ∈ R, is said to be symmetric about θ if F(θ − x) = 1 − F(θ + x) (or, equivalently, f(θ − x) = f(θ + x)) for every x ∈ R. Based on this definition there are several tests in the literature to assess the symmetry of an unknown density f(x) from a random sample, see for example Ekström and Jammalamadaka (2007) and references therein. However, neither this definition nor the tests developed from it help to compare or quantify the asymmetry of one probability density function against another. For example, Butler (1969) proposed a test of symmetry based on the sample version of s1(F) = sup_{x ≤ 0} |F(θ + x) + F(θ − x) − 1|, whereas Boos (1982) and Rothman and Woodroofe (1972) respectively use the sample versions of s2(F) = ∫ [F(θ + x) + F(θ − x) − 1]² dx and s3(F) = ∫ [F(θ + x) + F(θ − x) − 1]² dF(x) to test symmetry. Note that s1, s2 and s3 are all zero for symmetric and positive for asymmetric probability density functions. Also, the numerical values of s1, s2 and s3 by themselves do not help in determining the direction of asymmetry. But, as pointed out by one of the referees, there may be ways to obtain direction indicators based on these measures. For example, in the case of s2(F), set dθ(x) = F(θ + x) + F(θ − x) − 1. Then the difference ∫_{x: dθ(x)>0} dθ²(x) dx − ∫_{x: dθ(x)<0} dθ²(x) dx can be taken as an indicator of the direction of asymmetry. Thus the sample versions of s1, s2 and s3 do indicate departure from symmetry, and hence they provide useful tools for testing symmetry. However, they were not designed to quantify and/or calibrate asymmetry and accordingly are not useful in doing so. For example, if η measures asymmetry, then based on the visual impressions of the Folded Normal and Log-normal probability density functions (Fig. 1), one would like η of the Folded Normal density to be bigger than η of the Log-normal density function. But, with θ = median, it is easy to see that s1(Folded Normal)
< s1(Log-normal), indicating that the Folded Normal density is less asymmetric than the Log-normal density, which is the exact opposite of the visual impression. Further, it should also be noted that in order to compute si (i = 1, 2, 3), one must know the point of symmetry, or the point from which one would like to measure the asymmetry. Some of the negative attributes associated with si (i = 1, 2, 3) are remedied in Li and Morris (1991), where the measure of asymmetry is taken to be s4 = ∫ |f(θ + x) − f(θ − x)| dx. Though intuitive, s4 is not as user friendly as the measure proposed in this article and, in addition, estimation of s4 from a sample poses additional problems. Finally, as with si (i = 1, 2, 3), to compute s4 one needs to know θ and, moreover, s4 does not indicate the direction of asymmetry. Here, we first establish a new necessary condition, as stated in Lemma 1 below, for a continuous random variable with continuous probability density function f(x) to be symmetric. It is then used to develop a new measure of symmetry.

Lemma 1 Let X be a continuous symmetric random variable with square integrable continuous probability density function f(x) and distribution function F(x). Then Cov(f(X), F(X)) = 0.

Proof Note that Cov(f(X), F(X)) = 0 means E[f(X)F(X)] = E[F(X)]E[f(X)]. Now, since F(X) is uniformly distributed over (0, 1), to prove the lemma we need to show that
∫_R F(x) f²(x) dx = (1/2) ∫_R f²(x) dx.
For that, without loss of generality take the point of symmetry to be zero. Then, since X is a symmetric random variable, we have F(−x) = 1 − F(x) and f(−x) = f(x) for any x ∈ R, and ∫_0^∞ f²(x) dx = (1/2) ∫_R f²(x) dx. Using these identities we have,
∫_R F(x) f²(x) dx = ∫_{−∞}^{0} F(x) f²(x) dx + ∫_{0}^{∞} F(x) f²(x) dx
= ∫_{0}^{∞} F(−y) f²(−y) dy + ∫_{0}^{∞} F(x) f²(x) dx
= ∫_{0}^{∞} [1 − F(y)] f²(y) dy + ∫_{0}^{∞} F(x) f²(x) dx
= ∫_{0}^{∞} f²(x) dx
= (1/2) ∫_R f²(x) dx,
which completes the proof.
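As a quick sanity check (not part of the paper's proof), the covariance condition of Lemma 1 can be verified numerically for the standard normal density; the integration limits and step count below are arbitrary choices:

```python
import math

# Numerical check of Lemma 1: for the standard normal density,
# E[f(X)F(X)] should equal (1/2) * integral of f^2, i.e. Cov(f(X), F(X)) = 0.

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def simpson(g, a, b, n=4000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

lhs = simpson(lambda x: Phi(x) * phi(x) ** 2, -10, 10)  # E[f(X)F(X)]
rhs = 0.5 * simpson(lambda x: phi(x) ** 2, -10, 10)     # (1/2) * integral of f^2
print(abs(lhs - rhs) < 1e-9)  # True
```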
Remark 1 The fact that the above condition is not sufficient may be noticed from the proof. One can easily construct density functions, not necessarily continuous, with the above property which are not symmetric. Nevertheless, a counterexample where the density function is continuous is in order. Take f(x) to be the probability density function of a Weibull random variable with scale parameter σ and shape parameter α, that is, f(x) = (α/σ)(x/σ)^{α−1} exp{−(x/σ)^α}, for 0 < x < ∞, where σ > 0 and α > 1/2. In this case E[f(X)F(X)] = E[F(X)]E[f(X)] for α = (2 − ln 2/ln 1.5)^{−1}, but f(x), according to the definition, is not a symmetric function in the strict mathematical sense.

Now note that, for a random variable with a (continuous) monotonically decreasing probability density on its support (positively asymmetric), Cov(f(X), F(X)) < 0, and for a monotonically increasing probability density function on its support (negatively asymmetric), Cov(f(X), F(X)) > 0. Thus, the values of Cov(f(X), F(X)) clearly distinguish symmetric, positively asymmetric and negatively asymmetric densities. Further, by changing the origin, though the location of the probability density function would change, there would not be any change in its symmetry or asymmetry, and the value of Cov(f(X), F(X)) will still have the same interpretation as before. But seeing Cov(f(X), F(X)) directly as a measure of symmetry brings a couple of drawbacks. By changing the scale there will not be any change in the symmetry or asymmetry of the density, in the sense that its visual impression with regard to symmetry or asymmetry will not change; the value of Cov(f(X), F(X)), however, will change. Further, if we assume that Cov(f(X), F(X)) measures the amount of symmetry or asymmetry, comparing its values for two different densities may not match the visual impression of the order of symmetry or asymmetry in the two densities, because Cov(f(X), F(X)) is not a standardized quantity.
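The Weibull counterexample of Remark 1 can likewise be checked numerically; a minimal sketch, assuming scale σ = 1 (the truncation point and step count are arbitrary):

```python
import math

# Remark 1: for Weibull shape alpha = (2 - ln 2 / ln 1.5)^(-1) the covariance
# condition E[f(X)F(X)] = E[f(X)]E[F(X)] holds although f is not symmetric.

alpha = 1 / (2 - math.log(2) / math.log(1.5))   # ~ 3.44

def f(x):
    return alpha * x ** (alpha - 1) * math.exp(-x ** alpha)

def F(x):
    return 1 - math.exp(-x ** alpha)

def simpson(g, a, b, n=4000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

E_fF = simpson(lambda x: F(x) * f(x) ** 2, 0, 6)   # E[f(X)F(X)]
E_f = simpson(lambda x: f(x) ** 2, 0, 6)           # E[f(X)]; E[F(X)] = 1/2
print(round(alpha, 2), abs(E_fF - 0.5 * E_f) < 1e-8)  # 3.44 True
```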
Thus, to overcome these drawbacks we propose to define η(X), a measure or coefficient of asymmetry of a random variable X, as

η(X) = −Corr(f(X), F(X)) if 0 < Var(f(X)) < ∞, and η(X) = 0 if Var(f(X)) = 0.
The requirement Var(f(X)) < ∞ leads to the condition

∫_{−∞}^{∞} f³(x) dx < ∞.    (1)
In the present context one may replace requirement (1) by the stronger but easier to understand condition that f(x) be bounded above for every x ∈ R. Note that we have also assumed that f(x) is a continuous function. The important and obvious properties of this measure are:

P1. For a symmetric random variable X, if (1) holds, then η(X) = 0.
P2. If Y = aX + b, where a > 0 and b is any real number, then η(X) = η(Y).
P3. If Y = −X, then η(X) = −η(Y).

Thus values of η(X) closer to zero mean the density function is close to being symmetric, whereas values closer to ±1 mean the density function is close to being the most positively or most negatively asymmetric function. If, instead of asymmetry, one were to quantify symmetry, say by a symmetry coefficient η′, then η′ = 1 − |η|. In this case, for a symmetric density η′ = 1, and for a density which is 'less' symmetric the symmetry coefficient η′ lies between zero and one. Of course, one may multiply it by the sign of η to indicate 'positive' or 'negative' symmetry, although it does not seem as natural to call a curve positively (or negatively) symmetric as it does to call a curve positively (or negatively) asymmetric. We now provide some examples which help to show how this measure reflects the visual impression of the symmetry or asymmetry of a probability density function, particularly when comparing two density curves.

3 Examples and discussion

3.1 Symmetric density functions

Clearly for the Cauchy, Normal, Uniform and any other continuous symmetric density function bounded above, η(X) = 0 by Lemma 1.

3.2 Exponential probability density function

Let X follow the exponential distribution with probability density function f1(x) = exp(−x), if 0 < x < ∞, and zero otherwise. Then, since F1(x) = 1 − f1(x), if 0 < x < ∞, and zero otherwise, clearly the asymmetry coefficient of the exponential density is η1 = 1.
That is, the exponential random variable has the most positively asymmetric probability density function. Similarly, for the negative exponential random variable Y (= −X), η(Y) = −1, and it has the most negatively asymmetric probability density function.
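Because F1 = 1 − f1 on the support, f1(X) and F1(X) are perfectly negatively linearly related, so the correlation is exactly −1. A small simulation (a sketch, not from the paper) makes this concrete:

```python
import math, random

# With f1 and F1 known, the sample correlation of (f1(X_i), F1(X_i)) for
# exponential data is exactly -1 (up to float rounding), since f1 = 1 - F1.

random.seed(1)
xs = [random.expovariate(1.0) for _ in range(2000)]
u = [math.exp(-x) for x in xs]          # f1(X_i)
v = [1 - math.exp(-x) for x in xs]      # F1(X_i)

n = len(xs)
mu, mv = sum(u) / n, sum(v) / n
cov = sum(ui * vi for ui, vi in zip(u, v)) / n - mu * mv
var_u = sum(ui * ui for ui in u) / n - mu * mu
var_v = sum(vi * vi for vi in v) / n - mv * mv
eta = -cov / math.sqrt(var_u * var_v)
print(abs(eta - 1) < 1e-9)  # True: eta_1 = 1
```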
Thus the values of the asymmetry coefficient η appropriately reflect asymmetry in the two extreme cases, namely symmetric and completely (positively or negatively) asymmetric curves. Now we consider examples where the density curves fall between these two extremes with respect to symmetry.

3.3 Folded normal and log-normal probability density functions

The folded normal distribution with probability density function

f2(x) = 2(2π)^{−1/2} exp(−x²/2), if x > 0, and zero otherwise,

is clearly asymmetric. But when plotted against the exponential (Fig. 1), it is obvious that the amount of asymmetry in the folded normal is slightly less than what one notices in the exponential density curve. Thus an ideal measure of asymmetry would assign a smaller value to the folded normal than to the exponential, and this
is indeed true for the proposed measure. Note that for A = P[0 < Z1 < Z2/√2 < ∞] and B = P[0 < Z1 < Z2 < ∞], where Z1 and Z2 are independent standard normal variables, η2 = −8(A − B) 3^{3/4} (2 − √3)^{−1/2} = 0.9526, which is less than η1. Similarly, the log-normal distribution with probability density function

f3(x) = x^{−1}(2π)^{−1/2} exp(−(log x)²/2), if x > 0, and zero otherwise,

seems less asymmetric than the folded normal density, see Fig. 1. Again, as one would like, for C = P[−∞ < Z1 < Z2/√2 − 1/2, −∞ < Z2 < ∞], the asymmetry coefficient for the log-normal is η3 = −3^{3/4}(2C − 1)(2 exp(1/6) − √3)^{−1/2} = 0.909649, which is less than η2.

3.4 Mixtures of normal probability density functions

Let f4(x; p, μ1, μ2, σ1², σ2²) be the probability density function of the mixture p N(μ1, σ1²) + (1 − p) N(μ2, σ2²). The exact expression for the asymmetry coefficient η4(p, μ1, μ2, σ1², σ2²) of the normal mixture is too long and is given in the Appendix. For illustration, we fix μ1 = 0, μ2 = 2, σ1² = 1 and σ2² = 4, and the asymmetry coefficient η4(p) as a function of p is plotted in Fig. 2. Clearly, for each attained value of the asymmetry coefficient there is a pair of values of p, with maximum asymmetry at p ≈ 0.491. The density curves with asymmetry coefficient equal to 0.1, 0.2, 0.3, and 0.4, which correspond to p = 0.945, 0.872, 0.773 and 0.606 respectively, are plotted in Fig. 3. The other set of values of p producing curves with asymmetry coefficients 0.1, 0.2, 0.3, and 0.4 is 0.101, 0.175, 0.256 and 0.382 respectively.
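The values of η2 and η3 quoted in Sect. 3.3 can be checked by direct numerical integration of the definition η = −Corr(f(X), F(X)); the sketch below (not the paper's closed-form route via A, B and C) uses composite Simpson quadrature with arbitrary truncation points:

```python
import math

def simpson(g, a, b, n=20000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def eta(f, F, a, b):
    # eta = -Cov(f(X), F(X)) / sqrt(Var(f(X)) Var(F(X))), with Var(F(X)) = 1/12
    Ef = simpson(lambda x: f(x) ** 2, a, b)           # E[f(X)]
    EfF = simpson(lambda x: f(x) ** 2 * F(x), a, b)   # E[f(X)F(X)]
    Ef2 = simpson(lambda x: f(x) ** 3, a, b)          # E[f(X)^2]
    return -(EfF - Ef / 2) / math.sqrt((Ef2 - Ef ** 2) / 12)

fn_f = lambda x: 2 * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # folded normal
fn_F = lambda x: math.erf(x / math.sqrt(2))
ln_f = lambda x: math.exp(-math.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))
ln_F = lambda x: 0.5 * (1 + math.erf(math.log(x) / math.sqrt(2)))   # log-normal

e2 = eta(fn_f, fn_F, 1e-9, 10)   # close to the quoted 0.9526
e3 = eta(ln_f, ln_F, 1e-9, 60)   # close to the quoted 0.9096
print(1 > e2 > e3)  # True
```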
Fig. 1 Probability density curves for the exponential, folded normal and log-normal distributions

Fig. 2 Asymmetry coefficient η4(p) as a function of p
3.5 Weibull probability density function

Since the proposed coefficient of asymmetry is invariant under scale transformations, we rewrite the Weibull probability density function with only the shape parameter,

f5(x) = α x^{α−1} exp(−x^α), if x > 0, and zero otherwise,

where now α ≥ 1, since the asymmetry coefficient η is not defined for α < 1. Here, since the shape changes with α, the asymmetry coefficient is a function of α, denoted η5(α), and is given by
Fig. 3 Normal mixture density curves with asymmetry coefficient η4 equal to 0.1, 0.2, 0.3, and 0.4

Fig. 4 Weibull density curves for shape parameter α = 2, 3, 4, 5
η5(α) = −√3 Γ((2α−1)/α) [3^{(2α−1)/α} − 2^{1+(2α−1)/α}] 6^{−(2α−1)/α} × [Γ((3α−2)/α) 3^{−(3α−2)/α} − Γ²((2α−1)/α) 2^{−2(2α−1)/α}]^{−1/2},

where Γ(u) is the gamma function. The Weibull density curves for selected values of α are plotted in Fig. 4 and the function η5(α) is plotted in Fig. 5. Although none of the coefficients of skewness provides quantified information on the amount of asymmetry, their zero values do reflect actual symmetry, with a few exceptions. That is because they are designed to be zero only by one of the many
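The closed form above can be evaluated directly with the gamma function; a short sketch (the shorthand q and r for the two gamma arguments is ours):

```python
import math

# eta_5(alpha) for the Weibull shape parameter, transcribed from the
# displayed closed form.
def eta5(alpha):
    q = (2 * alpha - 1) / alpha          # (2*alpha - 1)/alpha
    r = (3 * alpha - 2) / alpha          # (3*alpha - 2)/alpha
    num = math.sqrt(3) * math.gamma(q) * (3 ** q - 2 ** (1 + q)) / 6 ** q
    den = math.sqrt(math.gamma(r) / 3 ** r - math.gamma(q) ** 2 / 2 ** (2 * q))
    return -num / den

a0 = 1 / (2 - math.log(2) / math.log(1.5))   # ~ 3.44, where eta_5 vanishes
print(round(eta5(1.0), 6))   # 1.0 (the exponential case)
print(abs(eta5(a0)) < 1e-9)  # True
```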
Fig. 5 Function η5(α) for α ranging from 1 to 10
necessary conditions for a density to be symmetric. For example, the necessity of the third central moment being zero plays such a role in the skewness coefficient based on the third moment, and the necessity of the equality of the median and mode of a unimodal symmetric density plays a similar role in the skewness coefficient based on the difference between the median and mode. Nonetheless, in the case of the Weibull distribution, for α1 ≈ 3.6 and α2 ≈ 3.26 respectively, the third central moment and the difference between the median and mode are zero (MacGillivray 1986), yielding corresponding skewness coefficients of zero. But at none of these values of α is the Weibull density symmetric in the strict mathematical sense, owing to the very thin and negligible tail, though for all practical purposes these curves give the visual impression of being close to symmetric. Note that the proposed asymmetry coefficient η5(α) = 0 for α = (2 − ln 2/ln 1.5)^{−1} ≈ 3.44, indicating symmetry. But again, though the visual impression of the density curve at α ≈ 3.44 is of a symmetric curve, it is not symmetric in the strict mathematical sense. More importantly, this value of α is not only in a close neighborhood of α1 and α2, but is almost at the middle of the interval (α2, α1).

3.6 Asymmetry and skewness

Consider the probability density function f6(x) = (α − 1)(x + 1)^{−α}, if x > 0, and zero otherwise, where α > 1, and compare it with f1(x) for asymmetry and skewness. Let α = 2 for the pictorial illustration. Then from Fig. 6, clearly f6(x) is more skewed to the right than f1, and we verify this using the van Zwet (1964) criterion. For that, note that the distribution functions of f6 and f1 are respectively F6(x) = 1 − (x + 1)^{−α+1}, for x > 0,
Fig. 6 Density curves f1 and f6
and F1(x) = 1 − e^{−x}, for x > 0. Then F6^{−1}(F1(x)) = e^{x/(α−1)} − 1, for x > 0, which is convex on 0 < x < ∞. This implies, in van Zwet's notation, F1 <_c F6, that is, f6 is more skewed to the right than f1. But as far as asymmetry is concerned, from Fig. 6 the (probability) mass under f6 is relatively more evenly spread than the mass under f1, suggesting that the asymmetry in f6 is less than the asymmetry in f1. This is indeed the case when asymmetry is measured by the asymmetry coefficient proposed here. That is, η6(α) = √(3(α − 1)(3α − 1))/(3α − 2) < η1 = 1 for every α > 1; in particular, η6(2) = 0.968, and η6(α) → 1 as α → ∞.
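A quick numerical check of the closed form for η6(α) (a sketch, not from the paper):

```python
import math

# eta_6(alpha) stays below eta_1 = 1 for every alpha > 1 and approaches 1
# as alpha grows, matching the comparison in Sect. 3.6.
def eta6(alpha):
    return math.sqrt(3 * (alpha - 1) * (3 * alpha - 1)) / (3 * alpha - 2)

print(round(eta6(2), 3))                        # 0.968
print(all(eta6(a) < 1 for a in range(2, 200)))  # True
```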
3.7 Estimation of η

Our main interest in this section is to propose methods of estimation of the asymmetry coefficient rather than to evaluate the properties of the proposed estimators analytically. Nevertheless, it is obvious that the estimators proposed below can be shown to be consistent, for example by arguments similar to those in Giné and Mason (2008). Further, the simulation study carried out at the end provides insight into the biases and variances of the proposed estimators. Let X1, X2, …, Xn be a random sample from a continuous density function f(x). The simplest estimate of η is the obvious sample counterpart of η, that is, the negative of the sample correlation coefficient between f(Xi) and F(Xi), i = 1, 2, …, n. But since f and F are unknown, we replace them by their respective nonparametric estimates. For example, we take the estimate of f to be
fˆ(x) = (nh)^{−1} Σ_{i=1}^{n} K((x − Xi)/h),
where K is a kernel and h the bandwidth, and we replace F by the smooth estimator

Fˆ(x) = ∫_{−∞}^{x} fˆ(u) du.
Thus, for Ui = fˆ(Xi), Vi = Fˆ(Xi), U¯ = Σ_{i=1}^{n} Ui/n and V¯ = Σ_{i=1}^{n} Vi/n, we have

ηˆ = −(Σ_{i=1}^{n} Ui Vi − n U¯ V¯) / √((Σ_{i=1}^{n} Ui² − n U¯²)(Σ_{i=1}^{n} Vi² − n V¯²)).

Improved estimation of η can be achieved by using better estimators of f and F. The first alternative is to use the nonparametric maximum likelihood estimator Fn(x) of F(x), i.e. the empirical distribution function. This results in η˘, defined by

η˘ = −(Σ_{i=1}^{n} Ui Wi − n U¯ W¯) / √((Σ_{i=1}^{n} Ui² − n U¯²)(Σ_{i=1}^{n} Wi² − n W¯²)),

where Wi = Fn(Xi) and W¯ = Σ_{i=1}^{n} Wi/n. Since F(X) ∼ U(0, 1), a second alternative estimator η˜ is obtained by replacing E[F(X)] and Var(F(X)) in the definition of η by 1/2 and 1/12 respectively. That is,

η˜ = −√12 (Σ_{i=1}^{n} Ui Vi − (n/2) U¯) / √(n (Σ_{i=1}^{n} Ui² − n U¯²)).
We now implement the estimators ηˆ, η˘ and η˜ for the Weibull distribution with shape parameter a = 2, 3, 4, 5 (W(a)), the standard normal (N) and folded normal (FN) distributions, the log-normal distribution (LN) and the normal mixtures (NM1-NM8), which correspond to p = 0.945, 0.101, 0.872, 0.175, 0.773, 0.256, 0.606, 0.382 respectively, discussed in Sect. 3.4. Note that in implementing fˆ we use reflection to avoid boundary issues, see for example Jones (1993, p. 137). The results of the simulation are given in Tables 1 and 2, where we use samples of size 30 and 50 to calculate the η estimates for each distribution, together with their MSEs and variances. The average of each quantity over 1000 iterations is reported in the tables. In terms of MSE we feel that the results are all acceptable as estimates of the true η, given the small sample sizes used. The results indicate that η˜ and η˘ compete closely in terms of precision, with η˜ marginally better, especially in the normal mixture examples, for both sample sizes.
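The three estimators can be sketched in a few lines with a Gaussian kernel and the normal-reference bandwidth; the reflection boundary correction used in the simulations is omitted here, so this is only an illustrative sketch:

```python
import math, random

# Sketch of eta-hat, eta-breve and eta-tilde (Gaussian kernel, normal
# reference bandwidth; no boundary reflection).

def kde_factory(xs, h):
    c = 1 / (len(xs) * h * math.sqrt(2 * math.pi))
    def fhat(x):
        return c * sum(math.exp(-((x - xi) / h) ** 2 / 2) for xi in xs)
    def Fhat(x):
        # integral of the Gaussian-kernel density estimate up to x
        return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2))))
                   for xi in xs) / len(xs)
    return fhat, Fhat

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum(a * b for a, b in zip(u, v)) / n - mu * mv
    su = math.sqrt(sum(a * a for a in u) / n - mu * mu)
    sv = math.sqrt(sum(b * b for b in v) / n - mv * mv)
    return cov / (su * sv)

def eta_estimates(xs):
    n = len(xs)
    sd = math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)
    h = 1.06 * sd * n ** (-1 / 5)               # normal reference rule
    fhat, Fhat = kde_factory(xs, h)
    U = [fhat(x) for x in xs]
    V = [Fhat(x) for x in xs]
    ranks = {x: (i + 1) / n for i, x in enumerate(sorted(xs))}
    W = [ranks[x] for x in xs]                  # empirical distribution function
    eta_hat = -corr(U, V)
    eta_breve = -corr(U, W)
    mu = sum(U) / n
    var_u = sum(a * a for a in U) / n - mu * mu
    eta_tilde = -math.sqrt(12) * (sum(a * b for a, b in zip(U, V)) / n
                                  - mu / 2) / math.sqrt(var_u)
    return eta_hat, eta_breve, eta_tilde

random.seed(7)
sample = [random.gauss(0, 1) for _ in range(300)]
print([round(e, 2) for e in eta_estimates(sample)])  # all near 0 for N(0,1)
```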
Table 1 Average mean square errors, estimate values and variances for estimators ηˆ, η˘ and η˜ from 1000 iterations with sample size n = 30

f       η       MSE                      Average value             Variance
                ηˆ      η˘      η˜       ηˆ      η˘      η˜        ηˆ      η˘      η˜
W(2)    0.42    0.077   0.070   0.069    0.366   0.343   0.339     0.074   0.063   0.062
W(3)    0.08    0.066   0.055   0.056    0.128   0.118   0.118     0.064   0.054   0.055
W(4)   −0.07    0.085   0.072   0.073   −0.073  −0.068  −0.068     0.085   0.072   0.073
W(5)   −0.16    0.079   0.068   0.068   −0.145  −0.134  −0.134     0.078   0.067   0.067
FN      0.95    0.023   0.027   0.032    0.899   0.874   0.825     0.020   0.021   0.016
LN      0.91    0.006   0.005   0.006    0.946   0.905   0.852     0.005   0.005   0.003
N       0       0.080   0.076   0.069   −0.023  −0.023  −0.022     0.079   0.075   0.068
NM1     0.1     0.085   0.074   0.073    0.071   0.065   0.066     0.084   0.072   0.072
NM2     0.1     0.106   0.093   0.091    0.139   0.130   0.129     0.105   0.092   0.090
NM3     0.2     0.073   0.063   0.062    0.209   0.194   0.194     0.073   0.063   0.062
NM4     0.2     0.104   0.091   0.089    0.212   0.198   0.197     0.103   0.091   0.089
NM5     0.3     0.064   0.054   0.053    0.338   0.313   0.313     0.062   0.054   0.053
NM6     0.3     0.095   0.084   0.082    0.279   0.261   0.259     0.094   0.082   0.080
NM7     0.4     0.058   0.047   0.046    0.422   0.388   0.387     0.054   0.046   0.045
NM8     0.4     0.069   0.059   0.058    0.384   0.355   0.351     0.069   0.059   0.058

The second column, titled η, is the true asymmetry coefficient given with precision of 2 decimal digits

Table 2 Average mean square errors, estimate values and variances for estimators ηˆ, η˘ and η˜ from 1000 iterations with sample size n = 50
f       η       MSE                      Average value             Variance
                ηˆ      η˘      η˜       ηˆ      η˘      η˜        ηˆ      η˘      η˜
W(2)    0.42    0.057   0.052   0.052    0.397   0.374   0.372     0.057   0.050   0.049
W(3)    0.08    0.053   0.046   0.046    0.108   0.101   0.101     0.052   0.046   0.046
W(4)   −0.07    0.051   0.045   0.045   −0.073  −0.068  −0.069     0.051   0.045   0.045
W(5)   −0.16    0.055   0.049   0.049   −0.156  −0.146  −0.146     0.055   0.048   0.049
FN      0.95    0.009   0.011   0.016    0.916   0.897   0.853     0.008   0.008   0.006
LN      0.91    0.006   0.005   0.004    0.956   0.922   0.872     0.004   0.005   0.003
N       0       0.047   0.045   0.042    0.019   0.023   0.018     0.047   0.045   0.042
NM1     0.1     0.051   0.045   0.045    0.082   0.077   0.077     0.051   0.044   0.045
NM2     0.1     0.070   0.062   0.061    0.138   0.130   0.130     0.068   0.061   0.060
NM3     0.2     0.045   0.039   0.039    0.219   0.205   0.206     0.044   0.039   0.039
NM4     0.2     0.068   0.061   0.060    0.207   0.196   0.195     0.068   0.061   0.060
NM5     0.3     0.040   0.033   0.033    0.353   0.331   0.331     0.037   0.032   0.032
NM6     0.3     0.058   0.052   0.051    0.304   0.288   0.286     0.058   0.051   0.051
NM7     0.4     0.037   0.029   0.029    0.432   0.402   0.402     0.032   0.027   0.027
NM8     0.4     0.040   0.034   0.033    0.401   0.375   0.373     0.038   0.034   0.033

The second column, titled η, is the true asymmetry coefficient given with precision of 2 decimal digits
Appendix 1

In the following let N(μ, σ²) denote the density function of a normal random variable with mean μ and variance σ², and Φ the standard normal distribution function. To compute η4(p, μ1, μ2, σ1², σ2²), first note that

N(μ1, σ1²) × N(μ2, σ2²) = (2π(σ1² + σ2²))^{−1/2} exp{−(μ1 − μ2)²/(2(σ1² + σ2²))} × N(μ*, σ*²),   (a.1)

where μ* = (μ1σ2² + μ2σ1²)/(σ1² + σ2²) and σ*² = σ1²σ2²/(σ1² + σ2²), and

∫_R Φ((x − μ2)/σ2) N(μ1, σ1²) dx = Φ((μ1 − μ2)/√(σ1² + σ2²)).   (a.2)

Then, using (a.1) and (a.2), it is easy to see that

η4(p, μ1, μ2, σ1², σ2²) = −Corr(f4(X), F4(X)) = −√12 (A − B/2)/√(C − B²),

where F4 is the distribution function corresponding to f4 and A, B and C are as defined below. A = Σ_{i=1}^{6} Ai, where

A1 = p³/(4σ1√π),
A2 = [p²(1 − p)/(2σ1√π)] Φ((μ1 − μ2)/√(σ1²/2 + σ2²)),
A3 = [p(1 − p)²/(2σ2√π)] Φ((μ2 − μ1)/√(σ1² + σ2²/2)),
A4 = (1 − p)³/(4σ2√π),
A5 = [2p²(1 − p)/√(2π(σ1² + σ2²))] exp{−(μ1 − μ2)²/(2(σ1² + σ2²))} Φ((μ* − μ1)/√(σ*² + σ1²)),
A6 = [2p(1 − p)²/√(2π(σ1² + σ2²))] exp{−(μ1 − μ2)²/(2(σ1² + σ2²))} Φ((μ* − μ2)/√(σ*² + σ2²)).

Further,

B = p²/(2σ1√π) + (1 − p)²/(2σ2√π) + [2p(1 − p)/√(2π(σ1² + σ2²))] exp{−(μ1 − μ2)²/(2(σ1² + σ2²))},

and C = Σ_{i=1}^{4} Ci, where

C1 = p³/(2√3 π σ1²),
C2 = [3p²(1 − p)/(2√2 π σ1 √(σ1²/2 + σ2²))] exp{−(μ1 − μ2)²/(2(σ1²/2 + σ2²))},
C3 = [3p(1 − p)²/(2√2 π σ2 √(σ1² + σ2²/2))] exp{−(μ1 − μ2)²/(2(σ1² + σ2²/2))},
C4 = (1 − p)³/(2√3 π σ2²).

Acknowledgements The first author is grateful to Professor A. T. A. Wood for helpful discussions. We are also thankful to the two anonymous referees for very helpful comments and suggestions.
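The Appendix expressions can be sanity-checked by comparing the closed form against brute-force numerical integration of −Corr(f4(X), F4(X)); a sketch (not from the paper) for the parameters of NM1, i.e. p = 0.945, μ1 = 0, μ2 = 2, σ1² = 1, σ2² = 4, where η4 is roughly 0.1:

```python
import math

SQ2PI = math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def eta4(p, m1, m2, v1, v2):
    # closed form from the Appendix (A, B, C as defined there)
    q = 1 - p
    s1, s2 = math.sqrt(v1), math.sqrt(v2)
    K = math.exp(-(m1 - m2) ** 2 / (2 * (v1 + v2))) / (SQ2PI * math.sqrt(v1 + v2))
    ms = (m1 * v2 + m2 * v1) / (v1 + v2)          # mu*
    vs = v1 * v2 / (v1 + v2)                      # sigma*^2
    A = (p ** 3 / (4 * s1 * math.sqrt(math.pi))
         + p ** 2 * q / (2 * s1 * math.sqrt(math.pi)) * Phi((m1 - m2) / math.sqrt(v1 / 2 + v2))
         + p * q ** 2 / (2 * s2 * math.sqrt(math.pi)) * Phi((m2 - m1) / math.sqrt(v1 + v2 / 2))
         + q ** 3 / (4 * s2 * math.sqrt(math.pi))
         + 2 * p ** 2 * q * K * Phi((ms - m1) / math.sqrt(vs + v1))
         + 2 * p * q ** 2 * K * Phi((ms - m2) / math.sqrt(vs + v2)))
    B = (p ** 2 / (2 * s1 * math.sqrt(math.pi))
         + q ** 2 / (2 * s2 * math.sqrt(math.pi)) + 2 * p * q * K)
    C = (p ** 3 / (2 * math.sqrt(3) * math.pi * v1)
         + 3 * p ** 2 * q * math.exp(-(m1 - m2) ** 2 / (2 * (v1 / 2 + v2)))
           / (2 * math.sqrt(2) * s1 * math.pi * math.sqrt(v1 / 2 + v2))
         + 3 * p * q ** 2 * math.exp(-(m1 - m2) ** 2 / (2 * (v1 + v2 / 2)))
           / (2 * math.sqrt(2) * s2 * math.pi * math.sqrt(v1 + v2 / 2))
         + q ** 3 / (2 * math.sqrt(3) * math.pi * v2))
    return -math.sqrt(12) * (A - B / 2) / math.sqrt(C - B ** 2)

def eta4_numeric(p, m1, m2, v1, v2, lo=-15.0, hi=25.0, n=20000):
    # brute-force Simpson evaluation of -Corr(f4(X), F4(X))
    f = lambda x: (p * math.exp(-(x - m1) ** 2 / (2 * v1)) / (SQ2PI * math.sqrt(v1))
                   + (1 - p) * math.exp(-(x - m2) ** 2 / (2 * v2)) / (SQ2PI * math.sqrt(v2)))
    F = lambda x: p * Phi((x - m1) / math.sqrt(v1)) + (1 - p) * Phi((x - m2) / math.sqrt(v2))
    h = (hi - lo) / n
    I = lambda g: h / 3 * (g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n)))
    Ef, EfF, Ef2 = I(lambda x: f(x) ** 2), I(lambda x: f(x) ** 2 * F(x)), I(lambda x: f(x) ** 3)
    return -(EfF - Ef / 2) / math.sqrt((Ef2 - Ef ** 2) / 12)

a, b = eta4(0.945, 0.0, 2.0, 1.0, 4.0), eta4_numeric(0.945, 0.0, 2.0, 1.0, 4.0)
print(abs(a - b) < 1e-6)
```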
References

Arias-Nicolás JP, Martín J, Suárez-Llorens A (2009) Lp loss functions: a robust Bayesian approach. Stat Papers 50:501–509
Boos DD (1982) A test for asymmetry associated with the Hodges-Lehmann estimator. J Am Stat Assoc 77:647–649
Boshnakov GN (2007) Some measures for asymmetry of distributions. Stat Prob Lett 77:1111–1116
Butler CC (1969) A test for symmetry using the sample distribution function. Ann Math Stat 40:2209–2210
Critchley F, Jones MC (2008) Asymmetry and gradient asymmetry functions: density-based skewness and kurtosis. Scand J Stat 35:415–437
Doksum KA (1975) Measures of location and asymmetry. Scand J Stat 1:11–22
Ekström M, Jammalamadaka SR (2007) An asymptotically distribution-free test of symmetry. J Stat Plan Inference 137:799–810
Giné E, Mason D (2008) Uniform in bandwidth estimation of integral functionals of the density function. Scand J Stat 35:739–761
Jones MC (1993) Simple boundary correction for kernel density estimation. Stat Comput 3:135–146
Li X, Morris JM (1991) On measuring asymmetry and the reliability of the skewness measure. Stat Prob Lett 12:267–271
MacGillivray HL (1986) Skewness and asymmetry: measures and orderings. Ann Stat 14:994–1011
Rothman ED, Woodroofe M (1972) A Cramér-von Mises type statistic for testing symmetry. Ann Math Stat 43:2035–2038
Siburg KF, Stoimenov PA (2008) Symmetry of functions and exchangeability of random variables. Stat Papers. doi:10.1007/s00362-008-0195-3
Ye P, Bhattacharya B (2009) Tests of symmetry with one-sided alternatives in three-way contingency tables. Stat Papers. doi:10.1007/s00362-009-0198-8
van Zwet WR (1964) Convex transformations of random variables. Math Centrum, Amsterdam