Extremes 5, 157–180, 2002. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
On Exponential Representations of Log-Spacings of Extreme Order Statistics

J. BEIRLANT, University Center of Statistics, W. de Croylaan 52B, 3001 Heverlee, Belgium. E-mail: [email protected]
G. DIERCKX, Department of Statistics, Potchefstroom University for CHE, Potchefstroom 2520, South Africa. E-mail: [email protected]
A. GUILLOU, Université Paris VI, LSTA, Boîte 158, 4 place Jussieu, 75252 Paris cedex 05, France. E-mail: [email protected]
C. STĂRICĂ, Mathematical Statistics, Chalmers University of Technology, S-412 96 Göteborg, Sweden. E-mail: [email protected]

[Received October 10, 2000; Revised September 23, 2002; Accepted October 22, 2002]

Abstract. In Beirlant et al. (1999) and Feuerverger and Hall (1999) an exponential regression model (ERM) was introduced on the basis of scaled log-spacings between subsequent extreme order statistics from a Pareto-type distribution. This led to the construction of new bias-corrected estimators for the tail index. In this note, under quite general conditions, asymptotic justification for this regression model is given, as well as for the resulting tail index estimators. We also discuss diagnostic methods for adaptive selection of the threshold when using the Hill (1975) estimator which follow from the ERM approach. We show how the diagnostic presented in Guillou and Hall (2001) is linked to the ERM, and a new proposal is suggested. We also provide some small sample comparisons with other existing methods.

Key words. Pareto index, quantile plots, regression, asymptotics

AMS 2000 Subject Classification.
Primary 62G32; Secondary 62F12

1. Introduction
Let $X_1, X_2, \ldots, X_n$ be a sequence of positive independent and identically distributed random variables with distribution function $F$, and let $X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n}$ denote the order statistics based on the first $n$ observations.
We suppose that $F$ is of Pareto-type, i.e. there exists a positive constant $\gamma$ for which

  $1 - F(x) = x^{-1/\gamma}\, \ell_F(x)$,   (1)

where $\ell_F$ is a slowly varying function at infinity satisfying

  $\frac{\ell_F(\lambda x)}{\ell_F(x)} \to 1$ when $x \to \infty$, for all $\lambda > 0$.   (2)
This model is well known to be equivalent to

  $U(x) = x^{\gamma}\, \ell(x)$,   (3)

where $U(x) = \inf\{y;\ F(y) \ge 1 - 1/x\}$, $x > 1$, and with $\ell$ slowly varying. Most references on the estimation of the tail index $\gamma$ (see for instance Hall, 1982; Csörgő et al., 1985; Beirlant et al., 1996; Drees and Kaufmann, 1998) make use of the following general condition on the slowly varying function $\ell$:

Assumption $(R_\ell)$: There exists a real constant $\rho < 0$ and a rate function $b$ satisfying $b(x) \to 0$ as $x \to \infty$, such that for all $\lambda \ge 1$, as $x \to \infty$,

  $\log\frac{\ell(\lambda x)}{\ell(x)} \sim b(x)\, k_\rho(\lambda)$,   (4)

with $k_\rho(\lambda) = \int_1^\lambda v^{\rho-1}\, dv$.
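For $\rho \ne 0$ the function $k_\rho(\lambda)$ has the closed form $(\lambda^\rho - 1)/\rho$. As a quick numerical illustration (a sketch; the function names are ours, not from the paper), the integral definition in (4) can be checked against this closed form:

```python
def k_rho_integral(lam, rho, steps=100_000):
    # midpoint-rule approximation of int_1^lam v^(rho-1) dv
    h = (lam - 1.0) / steps
    return h * sum((1.0 + (i + 0.5) * h) ** (rho - 1.0) for i in range(steps))

def k_rho_closed(lam, rho):
    # closed form (lam^rho - 1)/rho, valid for rho != 0
    return (lam ** rho - 1.0) / rho

lam, rho = 3.0, -0.5
assert abs(k_rho_integral(lam, rho) - k_rho_closed(lam, rho)) < 1e-6
```

For $\rho < 0$, $k_\rho(\lambda)$ stays bounded by $1/|\rho|$ as $\lambda \to \infty$, which is why $\rho$ controls the speed of convergence in (2).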
Using this framework, Beirlant et al. (1999) motivated the following approximation for log-spacings of upper order statistics:

  $Z_j := j\,(\log X_{n-j+1,n} - \log X_{n-j,n}) \approx \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j$,  $1 \le j \le k$,   (5)

where $(f_1, \ldots, f_k)$ is a vector of independent and standard exponentially distributed random variables and $b_{n,k} := b((n+1)/(k+1))$, $2 \le k \le n-1$. A maximum likelihood estimator of $\gamma$ can then be constructed by maximizing the likelihood induced by the right-hand side of (5), or of

  $Z_j \approx \gamma\, \exp\!\left( d_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j$,  $1 \le j \le k$,   (6)

with $d_{n,k} = b_{n,k}/\gamma$ and using the approximation $\exp( d_{n,k} (j/(k+1))^{-\rho} ) \approx 1 + d_{n,k} (j/(k+1))^{-\rho}$; see Feuerverger and Hall (1999).
After deletion of the term $b_{n,k}(j/(k+1))^{-\rho}$ from these ERMs, the maximum likelihood estimator of $\gamma$ is given by the famous Hill (1975) estimator:

  $H_{k,n} = \frac{1}{k} \sum_{j=1}^{k} \log X_{n-j+1,n} - \log X_{n-k,n}$.

In this note representation (5) will be made more precise in the asymptotic sense for $n \to \infty$ and $k/n \to 0$. In Section 2 this result is derived. Next, in Section 3, based on the result given in Section 2, we derive asymptotic results for a class of weighted linear combinations of the $Z_j$, and in particular for some estimators of $\gamma$ and $b_{n,k}$ based on the statistical models induced by (5) or (6). In particular, it is shown that the bias of the Hill estimator and other classical estimators is eliminated over the range of values of $k$ for which $\sqrt{k}\, b_{n,k} = O(1)$ as $k, n \to \infty$, $k/n \to 0$. In a final section we discuss applications of the statistical analysis of model (5) to the selection of the optimal sample fraction in extreme value estimation. In the case of the Hill estimator the optimal number $k_{n,opt}$ of largest order statistics was given by Hall and Welsh (1985) for a particular subclass of the slowly varying functions satisfying (4). In the meantime other methods for an adaptive choice of $k$ have been proposed: see, for example, Beirlant et al. (1996), Drees and Kaufmann (1998), Danielsson et al. (2001), Gomes and Oliveira (2001). We derive the connection between the diagnostic proposed in Guillou and Hall (2001) and the ERM (5). When concentrating on the minimization of the (asymptotic) mean squared error, the value of the minimizer $k_{n,opt}$ can be estimated by simple imputation of the estimates obtained from (5). Asymptotic properties of this approach are inspected. The small sample behavior of this approach is investigated using a case study and through a simulation study, which is comparable to the one in Matthys and Beirlant (2000).
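The deletion step is easy to check numerically: by summation by parts, the average of the scaled log-spacings $Z_1, \ldots, Z_k$ equals $H_{k,n}$ exactly. The following sketch (helper names are ours; the sample is strict Pareto, where (5) holds with $b_{n,k} = 0$) illustrates this:

```python
import math, random

def log_spacings(sample, k):
    # Z_j = j * (log X_{n-j+1,n} - log X_{n-j,n}), j = 1..k
    x = sorted(sample)
    n = len(x)
    return [j * (math.log(x[n - j]) - math.log(x[n - j - 1])) for j in range(1, k + 1)]

def hill(sample, k):
    # H_{k,n} = (1/k) * sum_{j=1}^k log X_{n-j+1,n}  -  log X_{n-k,n}
    x = sorted(sample)
    n = len(x)
    return sum(math.log(x[n - j]) for j in range(1, k + 1)) / k - math.log(x[n - k - 1])

random.seed(1)
data = [random.random() ** -0.5 for _ in range(2000)]   # strict Pareto, gamma = 0.5
k = 200
# summation by parts: (1/k) * sum_j Z_j == H_{k,n}
assert abs(hill(data, k) - sum(log_spacings(data, k)) / k) < 1e-9
```

Since $\frac{1}{k}\sum_{j=1}^k Z_j = H_{k,n}$, fitting the full models (5) or (6) can be read as a bias correction of the Hill estimator.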
2. Exponential representations of log-spacings for Pareto-type distributions
In this section we formalize (5). The basic result from this section reads as follows.

Theorem 2.1: Suppose (3) holds together with $(R_\ell)$. Then there exist random variables $b_{j,n}$ and standard exponential random variables $f_{j,n}$ (independent within each $n$) such that

  $\sup_{1 \le j \le k} \left| j \log\frac{X_{n-j+1,n}}{X_{n-j,n}} - \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_{j,n} - b_{j,n} \right| = o_p(b_{n,k})$,   (7)

as $k, n \to \infty$ with $k/n \to 0$, where, uniformly in $j \in \{1, \ldots, k\}$,

  $\sum_{i=j}^{k} \frac{b_{i,n}}{i} = o_p\!\left( b_{n,k}\, \max\!\left( \log\frac{k+1}{j},\ 1 \right) \right)$.
This main result is close in spirit to approximations established by Kaufmann and Reiss (1998) and Drees et al. (2000). Some important differences are present, however; for instance, the random factor $f_{j,n}$ appears here also in the bias component $b_{n,k}(j/(k+1))^{-\rho}$. In the sequel we omit the second subscript in $b_{j,n}$ and $f_{j,n}$.

We also state a slightly stronger version of this result, which will turn out to be useful in deriving asymptotic results for weighted linear combinations of the $Z_j$. Its statement appears more complicated, however; the proof is essentially the same as that of Theorem 2.1 itself. We use the notation $\log_+(1/w) = \log(1/w) \vee 1$ for $w \in (0,1)$.

Theorem 2.2: Suppose (3) holds together with $(R_\ell)$. Then (7) holds with

  $\sum_{j=1}^{k} \left\{ \frac{1}{j} \int_0^{j/k} u(t)\, dt \right\} b_j = o_p(b_{n,k})$

for every function $u$ defined on (0,1) satisfying

  $\left| k \int_{(j-1)/k}^{j/k} u(t)\, dt \right| \le f\!\left(\frac{j}{k+1}\right)$,  $1 \le j \le k$,

for some fixed positive continuous function $f$ defined on (0,1) satisfying $\int_0^1 \log_+(1/w)\, f(w)\, dw < \infty$.

The relation between the two versions of the main result follows from

  $\sum_{j=1}^{k} \left\{ \frac{1}{j} \int_0^{j/k} u(t)\, dt \right\} b_j = \sum_{j=1}^{k} \frac{b_j}{j} \sum_{i=1}^{j} \int_{(i-1)/k}^{i/k} u(t)\, dt = \sum_{i=1}^{k} \left\{ \int_{(i-1)/k}^{i/k} u(t)\, dt \right\} \sum_{j=i}^{k} \frac{b_j}{j}$,

and hence

  $\left| \sum_{j=1}^{k} \left\{ \frac{1}{j} \int_0^{j/k} u(t)\, dt \right\} b_j \right| \le \frac{1}{k} \sum_{i=1}^{k} f\!\left(\frac{i}{k+1}\right) \left| \sum_{j=i}^{k} \frac{b_j}{j} \right|$.

The proof of the main result (7) is split into several parts; it follows from the combination of the following lemmas. In these, we use the function $R(x) = U(\exp x)$, together with the facts

  $(X_1, X_2, \ldots, X_n) =_d (R(E_1), R(E_2), \ldots, R(E_n)) =_d \left( U\!\left(\frac{1}{V_1}\right), U\!\left(\frac{1}{V_2}\right), \ldots, U\!\left(\frac{1}{V_n}\right) \right)$,

with $(V_1, \ldots, V_n)$ a sequence of independent uniformly (0,1) distributed random variables and $(E_1, \ldots, E_n)$ a sequence of independent standard exponentially distributed random variables. $E_{1,n} \le \cdots \le E_{n,n}$ will denote the order statistics based on $E_1, \ldots, E_n$.
Lemma 2.1: Suppose (3) holds together with $(R_\ell)$. If $k = k(n)$ is an intermediate sequence, i.e. $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$, then the following result holds for every $j \in \{1, \ldots, k\}$ and some function $\tilde b$ which is asymptotically equivalent to $b$: for any $\varepsilon > 0$ and any $\delta > 0$ there exists $n_0$ such that for every $n \ge n_0$ we have, with probability at least $1 - \delta$,

  $\log\frac{X_{n-j+1,n}}{X_{n-k,n}} = A_{k,n}(j) + B_{k,n}(j)$,   (8)

where

  $A_{k,n}(j) = \gamma\,(E_{n-j+1,n} - E_{n-k,n}) + \tilde b(e^{E_{n-k,n}})\, \frac{e^{\rho(E_{n-j+1,n} - E_{n-k,n})} - 1}{\rho}$,   (9)

and

  $|B_{k,n}(j)| \le \varepsilon\, \tilde b(e^{E_{n-k,n}})\, e^{\rho(E_{n-j+1,n} - E_{n-k,n})}\, \max\left\{ (1+\varepsilon)\, e^{\varepsilon(E_{n-j+1,n} - E_{n-k,n})} - 1,\ 1 - (1-\varepsilon)\, e^{-\varepsilon(E_{n-j+1,n} - E_{n-k,n})} \right\}$.   (10)
Proof: Condition $(R_\ell)$ implies (cf., for instance, Dekkers et al., 1989, Lemma 2.5) that for some function $\tilde b$ asymptotically equivalent to $b$ the following holds: for any $\varepsilon > 0$ there exists $x_0$ such that, for any $x \ge x_0$ and $\lambda \ge 1$,

  $(1-\varepsilon)\, k_{\rho-\varepsilon}(\lambda) \le \frac{\log \ell(\lambda x) - \log \ell(x)}{\tilde b(x)} \le (1+\varepsilon)\, k_{\rho+\varepsilon}(\lambda)$.

Rearranging, and using (3) (without loss of generality, assume that $\tilde b(x)$ is ultimately positive for large values of $x$), this statement yields

  $-\varepsilon\, \tilde b(x)\, \lambda^{\rho} \left( 1 - (1-\varepsilon)\lambda^{-\varepsilon} \right) \le \log\frac{U(\lambda x)}{U(x)} - \gamma \log\lambda - \tilde b(x)\, \frac{\lambda^{\rho} - 1}{\rho} \le \varepsilon\, \tilde b(x)\, \lambda^{\rho} \left( (1+\varepsilon)\lambda^{\varepsilon} - 1 \right)$,

for $x \ge x_0$ and every $\lambda \ge 1$.
Setting $\log\lambda = E_{n-j+1,n} - E_{n-k,n}$, with $j \in \{1, \ldots, k+1\}$, and $\log x = E_{n-k,n}$, there exists $n_0$ such that for any $n \ge n_0$, with arbitrarily large probability,

  $\left| \log\frac{X_{n-j+1,n}}{X_{n-k,n}} - \gamma\,(E_{n-j+1,n} - E_{n-k,n}) - \tilde b(e^{E_{n-k,n}})\, \frac{e^{\rho(E_{n-j+1,n} - E_{n-k,n})} - 1}{\rho} \right|$
  $\le \varepsilon\, \tilde b(e^{E_{n-k,n}})\, e^{\rho(E_{n-j+1,n} - E_{n-k,n})}\, \max\left\{ (1+\varepsilon)\, e^{\varepsilon(E_{n-j+1,n} - E_{n-k,n})} - 1,\ 1 - (1-\varepsilon)\, e^{-\varepsilon(E_{n-j+1,n} - E_{n-k,n})} \right\}$,

from which the lemma follows with

  $B_{k,n}(j) = \log\frac{X_{n-j+1,n}}{X_{n-k,n}} - \gamma\,(E_{n-j+1,n} - E_{n-k,n}) - \tilde b(e^{E_{n-k,n}})\, \frac{e^{\rho(E_{n-j+1,n} - E_{n-k,n})} - 1}{\rho}$. □
In the following two results we analyze the two terms in the right-hand side of (8) further. We set $A_{k,n}(k+1) := 0$ and $B_{k,n}(k+1) := 0$. Asymptotically, $j(A_{k,n}(j) - A_{k,n}(j+1))$ will yield the right-hand side in (5); this is shown in Lemma 2.2. The asymptotic negligibility of the $B_{k,n}(j)$-terms is treated in Lemma 2.3.

Lemma 2.2: With the same notations as before, we have under $(R_\ell)$ that

  $\sup_{1 \le j \le k} \left| j\,(A_{k,n}(j) - A_{k,n}(j+1)) - \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j \right| = o_p(b_{n,k})$,

as $k, n \to \infty$ with $k/n \to 0$.

Proof: By the Rényi representation, we know that for every $j \in \{1, \ldots, k\}$, $j\,(E_{n-j+1,n} - E_{n-j,n}) =_d f_j$, where $f_1, \ldots, f_k$ are independent and standard exponentially distributed random variables. From this representation and (9) it then follows that

  $j\,(A_{k,n}(j) - A_{k,n}(j+1)) =_d \gamma f_j + \tilde b(e^{E_{n-k,n}})\, e^{-\rho E_{n-k,n}}\, \frac{j\left( e^{\rho E_{n-j+1,n}} - e^{\rho E_{n-j,n}} \right)}{\rho}$.

The mean value theorem entails that

  $e^{\rho E_{n-j+1,n}} - e^{\rho E_{n-j,n}} = \rho\,(E_{n-j+1,n} - E_{n-j,n})\, \exp(\rho Q_{j,n})$,

with $E_{n-j,n} \le Q_{j,n} \le E_{n-j+1,n}$.
From this we see that, for every $j \in \{1, \ldots, k\}$,

  $j\,(A_{k,n}(j) - A_{k,n}(j+1)) =_d \gamma f_j + \tilde b(e^{E_{n-k,n}}) \left(\frac{j}{k+1}\right)^{-\rho} f_j + \tilde b(e^{E_{n-k,n}}) \left( e^{\rho(Q_{j,n} - E_{n-k,n})} - \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j$.

Now,

  $\frac{\tilde b(e^{E_{n-k,n}})}{b_{n,k}} = \frac{\tilde b(1/V_{k+1,n})}{b((n+1)/(k+1))} \to_p 1$,

since $\tilde b$ is regularly varying and $(n+1)V_{k+1,n}/(k+1) \to_p 1$ as $k, n \to \infty$. Hence, as $\max_{1 \le j \le k} f_j = O_p(\log k)$, if we can prove that, as $k, n \to \infty$,

  $(\log k)\, \sup_{1 \le j \le k} \left| e^{\rho(Q_{j,n} - E_{n-k,n})} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \to_p 0$,   (11)

then

  $\sup_{1 \le j \le k} \left| j\,(A_{k,n}(j) - A_{k,n}(j+1)) - \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j \right| = o_p(b_{n,k})$.

We end the proof by showing (11). Remark that

  $\left| e^{\rho(Q_{j,n} - E_{n-k,n})} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \le \max\left\{ \left| e^{\rho(E_{n-j+1,n} - E_{n-k,n})} - \left(\frac{j}{k+1}\right)^{-\rho} \right|,\ \left| e^{\rho(E_{n-j,n} - E_{n-k,n})} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \right\}$
  $=_d \max\left\{ \left| \left(\frac{V_{j,n}}{V_{k+1,n}}\right)^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right|,\ \left| \left(\frac{V_{j+1,n}}{V_{k+1,n}}\right)^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \right\}$
  $=_d \max\left\{ \left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right|,\ \left| V_{j+1,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \right\}$
  $\le \max\left\{ \left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right|,\ \left| V_{j+1,k}^{-\rho} - \left(\frac{j+1}{k+1}\right)^{-\rho} \right| \right\} + c_k$,

where $V_{1,k} \le \cdots \le V_{k,k}$ are uniform (0,1) order statistics from an i.i.d. sample of size $k$ (defining $V_{k+1,k}$ as 1), and where $0 \le c_k \le |\rho| \max\{(1/(k+1))^{-\rho},\ 1/(k+1)\}$. In order to prove (11) we thus have to show that

  $(\log k)\, \max_{1 \le j \le k} \left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \to_p 0$,

as $k, n \to \infty$ and $k/n \to 0$. First suppose that $|\rho| \ge 1$. Then, for some constant $C$ only depending on $\rho$,

  $(\log k)\, \max_{1 \le j \le k} \left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \le C\,(\log k)\, \max_{1 \le j \le k} \left| V_{j,k} - \frac{j}{k+1} \right|$,

which tends to zero in probability, as the sup-norm of the uniform empirical quantile process based on a uniform sample of size $k$, $\beta_k(p) = \sqrt{k}\,(G_k^{-1}(p) - p)$, $p \in (0,1)$, with $G_k^{-1}(p) = V_{j,k}$ for $p \in ((j-1)/k,\ j/k]$, is bounded in probability (see, for example, Shorack and Wellner, 1986, Section 9.2). In case $0 < |\rho| < 1$,

  $\left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| = |\rho|\, T_{j,k}^{-\rho-1} \left| V_{j,k} - \frac{j}{k+1} \right|$,

where $T_{j,k}$ lies between $V_{j,k}$ and $j/(k+1)$. Using the fact that constants $b > 1$ can be constructed such that, with arbitrarily large probability, the bounds $b^{-1} j/(k+1) \le V_{j,k} \le b\, j/(k+1)$ hold uniformly in $j = 1, \ldots, k$, together with the fact that for any $\delta \in (0,1)$,

  $\sqrt{k}\, \sup_{p \in [1/(k+1),\ k/(k+1)]} \frac{\left| G_k^{-1}(p) - p \right|}{p^{(2-\delta)/2}} = O_p(1)$

(see Sections 10.3 and 11.5 in Shorack and Wellner, 1986), we find that for $\delta \in (0, |\rho|)$

  $(\log k)\, \max_{1 \le j \le k} \left| V_{j,k}^{-\rho} - \left(\frac{j}{k+1}\right)^{-\rho} \right| \le C_{b,\rho}\, \frac{\log k}{\sqrt{k}}\, \sqrt{k}\, \sup_{p \in [1/(k+1),\ k/(k+1)]} \frac{| G_k^{-1}(p) - p |}{p^{(2-\delta)/2}} \to_p 0$.

Now, set $b_j = j\,(B_{k,n}(j) - B_{k,n}(j+1))$. Then the proof of the main result will be concluded with the next lemma. □

Lemma 2.3: With the same notations as before, we have that, uniformly in $j \in \{1, \ldots, k\}$,

  $|B_{k,n}(j)| = o_p\!\left( b_{n,k}\, \max\!\left( \log\frac{k+1}{j},\ 1 \right) \right)$,

as $k, n \to \infty$ with $k/n \to 0$.
Proof: From (10) it follows that we need to show that, uniformly in $j$,

  $e^{\rho(E_{n-j+1,n} - E_{n-k,n})} \left[ 1 - (1-\varepsilon)\, e^{-\varepsilon(E_{n-j+1,n} - E_{n-k,n})} \right] \left\{ \max\!\left( \log\frac{k+1}{j},\ 1 \right) \right\}^{-1}$   (12)

and

  $e^{\rho(E_{n-j+1,n} - E_{n-k,n})} \left[ (1+\varepsilon)\, e^{\varepsilon(E_{n-j+1,n} - E_{n-k,n})} - 1 \right] \left\{ \max\!\left( \log\frac{k+1}{j},\ 1 \right) \right\}^{-1}$   (13)

can be made arbitrarily small with arbitrarily large probability by taking $\varepsilon$ small enough. To prove (13) we can use the inequality $e^x - 1 \le x e^x$, valid for any $x \ge 0$. Let $\varepsilon$ be arbitrarily small with $\rho + \varepsilon < 0$. Then

  $e^{\rho(E_{n-j+1,n} - E_{n-k,n})} \left[ (1+\varepsilon)\, e^{\varepsilon(E_{n-j+1,n} - E_{n-k,n})} - 1 \right] \le \varepsilon\, e^{(\rho+\varepsilon)(E_{n-j+1,n} - E_{n-k,n})} \left[ 1 + (E_{n-j+1,n} - E_{n-k,n}) \right]$
  $=_d \varepsilon\, V_{j,k}^{-(\rho+\varepsilon)} \left[ 1 + \log\frac{1}{V_{j,k}} \right]$.

Using linear-in-probability bounds $b^{-1} j/(k+1) \le V_{j,k} \le b\, j/(k+1)$, $b > 1$, (13) now follows. The proof of (12) is similar, making use of the inequality $1 - e^{-x} \le x$ instead of $e^x - 1 \le x e^x$. □

3. Asymptotic normality of bias-corrected estimators of γ
We now show how the main result of the preceding section can be used to deduce the asymptotic normality of kernel-type statistics

  $\frac{1}{k} \sum_{j=1}^{k} K\!\left(\frac{j}{k+1}\right) Z_j$,

where the kernel $K$ can be written as $K(t) = (1/t) \int_0^t u(v)\, dv$, $0 < t < 1$, for some function $u$ defined on (0,1). This result can be compared with Theorem 1 in Csörgő et al. (1985), which holds for positive, nonincreasing and right-continuous kernels $K$ for which $\int_0^\infty u^{-1/2} K(u)\, du < \infty$ and $\int_0^\infty K(u)\, du = 1$. See also Csörgő and Viharos (1998). Our conditions on the rate of increase of the kernel are somewhat more restrictive, but the result is a consequence of our main result, which was obtained using more elementary methods. Remark also that the kernel does not need to be positive or monotone.
Theorem 3.1: Suppose (3) holds together with $(R_\ell)$. Let $K(t) = (1/t) \int_0^t u(v)\, dv$ for some function $u$ satisfying

  $\left| k \int_{(j-1)/k}^{j/k} u(t)\, dt \right| \le f\!\left(\frac{j}{k+1}\right)$,  $1 \le j \le k$,

for some positive continuous function $f$ defined on (0,1) such that $\int_0^1 \log_+(1/w)\, f(w)\, dw < \infty$, and $\int_0^1 |K(w)|^{2+\delta}\, dw < \infty$ for some $\delta > 0$. Suppose $\sqrt{k}\, b_{n,k} = O(1)$ as $k, n \to \infty$ with $k/n \to 0$. Then, with the same notations as before, we have that

  $\sqrt{k} \left[ \frac{1}{k} \sum_{j=1}^{k} K\!\left(\frac{j}{k+1}\right) Z_j - \frac{1}{k} \sum_{j=1}^{k} K\!\left(\frac{j}{k+1}\right) \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) \right]$

converges to a $N\!\left(0,\ \gamma^2 \int_0^1 K^2(u)\, du\right)$ distribution.

Proof: The result follows from the Lindeberg–Feller central limit theorem after showing that

  $\sqrt{k} \left[ \sum_{j=1}^{k} \left\{ \frac{1}{j} \int_0^{j/k} u(t)\, dt \right\} Z_j - \sum_{j=1}^{k} \left\{ \frac{1}{j} \int_0^{j/k} u(t)\, dt \right\} \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j \right] \to_p 0$.

This follows directly from Theorem 2.2 and the restriction $\sqrt{k}\, b_{n,k} = O(1)$ as $k, n \to \infty$ with $k/n \to 0$. □
The generalized linear regression models (5) and (6) can be used to construct bias-reduced estimators for $\gamma$. Alternatively, changing the generalized linear models (5) and (6) respectively into regression models with additive noise (replacing the random factors $f_j$ by their expected values in the bias term in Theorem 2.1), we obtain

  $Z_j \approx \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} + \gamma\,(f_j - 1)$,  $1 \le j \le k$,   (14)

or

  $Z_j \approx \gamma\, \exp\!\left( d_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) + \gamma\,(f_j - 1)$,  $1 \le j \le k$,   (15)

leading to least-squares type estimators. We first consider the case where these models are solved for $\gamma$ and $b_{n,k}$, or $\gamma$ and $d_{n,k}$,
after substituting a consistent estimator $\tilde\rho_{k,n}$ for $\rho$ in the models given above. For brevity we focus on the least squares estimators based on (14), leading to

  $\hat\gamma^{(1)}_{LS}(\tilde\rho) = \bar Z_k - \frac{\hat b^{(1)}_{LS}(\tilde\rho)}{1 - \tilde\rho}$,   (16)

  $\hat b^{(1)}_{LS}(\tilde\rho) = \frac{(1-\tilde\rho)^2 (1-2\tilde\rho)}{\tilde\rho^2}\, \frac{1}{k} \sum_{j=1}^{k} \left( \left(\frac{j}{k+1}\right)^{-\tilde\rho} - \frac{1}{1-\tilde\rho} \right) Z_j$.   (17)

Here we approximated $(1/k) \sum_{j=1}^{k} (j/(k+1))^{-\tilde\rho}$ by $1/(1-\tilde\rho)$, and $(1/k) \sum_{j=1}^{k} \left( (j/(k+1))^{-\tilde\rho} - 1/(1-\tilde\rho) \right)^2$ by $\tilde\rho^2 / ((1-\tilde\rho)^2 (1-2\tilde\rho))$.
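A direct implementation of (16) and (17) for a given value of $\tilde\rho$ might look as follows (a sketch with our own function names; on noiseless "data" $Z_j = \gamma + b\,(j/(k+1))^{-\rho}$ the estimators should recover $\gamma$ and $b$ up to the Riemann-sum approximations mentioned above):

```python
def ls_estimates(Z, rho):
    """Least-squares estimators (16)-(17) of (gamma, b_{n,k}) for a fixed rho < 0.

    Z is the list of scaled log-spacings Z_1..Z_k; rho plays the role of the
    (estimated) second-order parameter rho-tilde."""
    k = len(Z)
    v = [(j / (k + 1.0)) ** (-rho) for j in range(1, k + 1)]
    zbar = sum(Z) / k
    # (17): b-hat
    b_hat = ((1 - rho) ** 2 * (1 - 2 * rho) / rho ** 2) * \
        sum((v[j] - 1.0 / (1 - rho)) * Z[j] for j in range(k)) / k
    # (16): gamma-hat
    g_hat = zbar - b_hat / (1 - rho)
    return g_hat, b_hat

# Noiseless check: Z_j = gamma + b * (j/(k+1))^{-rho} with rho = -1
gamma, b, rho, k = 0.5, 0.2, -1.0, 5000
Z = [gamma + b * (j / (k + 1.0)) ** (-rho) for j in range(1, k + 1)]
g_hat, b_hat = ls_estimates(Z, rho)
assert abs(g_hat - gamma) < 0.01 and abs(b_hat - b) < 0.01
```

In practice $Z$ would carry the exponential noise of model (14), and $\tilde\rho$ would itself be estimated, e.g. as in Remark 2 below.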
Theorem 3.2: Suppose (3) holds together with $(R_\ell)$. If, in the estimation procedure, we substitute for $\rho$ a consistent estimator $\tilde\rho$, then when $k = k_n \to \infty$ and $k/n \to 0$ at a rate such that $\sqrt{k}\, b_{n,k} = O(1)$,

  $\sqrt{k} \left( \hat\gamma^{(1)}_{LS}(\tilde\rho) - \gamma \right) \to_d N(0, \sigma_1^2)$,

where $\sigma_1^2 = \gamma^2 \left( (1-\rho)/\rho \right)^2$. Also, $\hat b^{(1)}_{LS}(\tilde\rho)$ satisfies the following asymptotic representation as $k = k_n \to \infty$ and $k/n \to 0$:

  $\hat b^{(1)}_{LS}(\tilde\rho) = b_{n,k} + \gamma\, \frac{(1-\rho)\sqrt{1-2\rho}}{\rho}\, \frac{N_k}{\sqrt{k}} + o_p(b_{n,k})$,

where $N_k \to_d N(0,1)$ as $k \to \infty$.

Proof:
Remark that

  $\hat\gamma^{(1)}_{LS}(\tilde\rho) = \frac{1}{k} \sum_{j=1}^{k} K_{\tilde\rho}\!\left(\frac{j}{k+1}\right) Z_j$,

where

  $K_\rho(t) = \left(\frac{1-\rho}{\rho}\right)^2 - \frac{(1-\rho)(1-2\rho)}{\rho^2}\, t^{-\rho}$.

The asymptotic normality of $(1/k) \sum_{j=1}^{k} K_\rho(j/(k+1)) Z_j$ is a direct consequence of Theorem 3.1; here $f$ can be taken to be constant, equal to $2(1-\rho)^3/\rho^2$. Remark that $(1/k) \sum_{j=1}^{k} K_\rho(j/(k+1)) (j/(k+1))^{-\rho} \to 0$ as $k \to \infty$, so that under the given conditions $\sqrt{k}\, b_{n,k}\, (1/k) \sum_{j=1}^{k} K_\rho(j/(k+1)) (j/(k+1))^{-\rho} \to 0$. Remark also that $(1/k) \sum_{j=1}^{k} K_\rho(j/(k+1)) = 1 + o(k^{-1/2})$, and

  $\frac{1}{k} \sum_{j=1}^{k} \left( K_{\tilde\rho}\!\left(\frac{j}{k+1}\right) - K_\rho\!\left(\frac{j}{k+1}\right) \right) = o_p(k^{-1/2})$.

It remains to verify that $\sqrt{k}\, \frac{1}{k} \sum_{j=1}^{k} \left( K_{\tilde\rho}(j/(k+1)) - K_\rho(j/(k+1)) \right) Z_j \to_p 0$ under the given conditions. Starting from (16) and using the mean value theorem, this difference can be written as the sum of two terms of the form

  $(\tilde\rho - \rho)\, \hat C_{1,\rho}\, \sqrt{k}\, \hat b^{(1)}_{LS}(\rho^*)$,

respectively

  $(\tilde\rho - \rho)\, \hat C_{2,\rho}\, \sqrt{k}\, \frac{1}{k} \sum_{j=1}^{k} \left[ \left(\frac{j}{k+1}\right)^{-\rho^*} \log\frac{k+1}{j} - \frac{1}{(1-\rho^*)^2} \right] Z_j$,

where $\hat C_{1,\rho}$ and $\hat C_{2,\rho}$ denote sequences of random variables tending in probability to a constant depending only on $\rho$, and $\rho^*$ converges in probability to $\rho$. Using the method of proof of Theorem 3.1, it is seen that $\sqrt{k}\, \hat b^{(1)}_{LS}(\rho^*) = O_p(1)$ and

  $\sqrt{k}\, \frac{1}{k} \sum_{j=1}^{k} \left[ \left(\frac{j}{k+1}\right)^{-\rho^*} \log\frac{k+1}{j} - \frac{1}{(1-\rho^*)^2} \right] Z_j = O_p(1)$.

The asymptotic representation for $\hat b^{(1)}_{LS}(\tilde\rho)$ follows along similar lines, using the mean value theorem and Theorem 2.1. Indeed,

  $\hat b^{(1)}_{LS}(\tilde\rho) =_d \frac{(1-\rho)^2 (1-2\rho)}{\rho^2}\, \frac{1}{k} \sum_{j=1}^{k} \left( \left(\frac{j}{k+1}\right)^{-\rho} - \frac{1}{1-\rho} \right) \left\{ \left( \gamma + b_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} \right) f_j + b_j \right\} (1 + o_p(1))$.

The statement then follows from Theorem 2.2, the fact that, as $k \to \infty$,

  $\frac{1}{k} \sum_{j=1}^{k} \left(\frac{j}{k+1}\right)^{-\rho} \left( \left(\frac{j}{k+1}\right)^{-\rho} - \frac{1}{1-\rho} \right) f_j \to_p \frac{\rho^2}{(1-2\rho)(1-\rho)^2}$,

and by setting

  $N_k^{(1)} = \frac{(1-\rho)\sqrt{1-2\rho}}{\rho}\, \frac{1}{\sqrt{k}} \sum_{j=1}^{k} \left( \left(\frac{j}{k+1}\right)^{-\rho} - \frac{1}{1-\rho} \right) f_j$. □
Remark 1: This result implies that the estimators for $\gamma$ defined by (16) still possess the convergence rate $k^{-1/2}$ common to all classical estimators, in spite of the fact that the corresponding estimator for $b_{n,k}$ given by (17) is only consistent when $\sqrt{k}\, b_{n,k} \to \infty$. When $\rho$ is close to 0 the variance increases drastically, which corresponds with the conclusions from the simulation study in Beirlant et al. (1999) and motivates the need for some truncation on $\rho$ in practice, as discussed in that paper. However, the mean squared errors (m.s.e.) of the present estimators are comparable to that of the Hill estimator, provided all estimators are based on their respective optimal number of order statistics. Their m.s.e.'s find their minima at much higher values of $k$ than those of classical estimators such as the Hill estimator. So the estimators under study combine unbiasedness with m.s.e. accuracy comparable to classical estimators that do exhibit large biases, in particular when $|\rho|$ is small. These characteristics also hold for the estimators based on joint estimation of $\gamma$, $b_{n,k}$ and $\rho$, as will be derived below.

Remark 2: As a consistent estimator $\tilde\rho_{\tilde k,n,\lambda} = \tilde\rho(\lambda)$ for $\rho$ one can, for instance, propose the estimator

  $\tilde\rho(\lambda) = -\frac{1}{\log\lambda}\, \log\!\left( \frac{ H_{\lfloor \lambda^2 \tilde k \rfloor,n} - H_{\lfloor \lambda \tilde k \rfloor,n} }{ H_{\lfloor \lambda \tilde k \rfloor,n} - H_{\tilde k,n} } \right)$,

for some $\lambda \in (0,1)$ and with $\tilde k$ taken in the range where $\sqrt{\tilde k}\, b_{n,\tilde k} \to \infty$, as proposed in Drees and Kaufmann (1998), where an adaptive choice for $\tilde k$ in this range is also given. It can also be shown that the estimators of $\rho$ based on the regression models discussed here share this consistency property as $\sqrt{\tilde k}\, b_{n,\tilde k} \to \infty$. For a more elaborate discussion of the estimation of $\rho$ and several other estimators of $\rho$ we refer the reader to Gomes et al. (2000) and Fraga Alves et al. (2003). Under a third order condition

  $\log\frac{\ell(\lambda x)}{\ell(x)} = b(x)\, k_\rho(\lambda) + c(x)\, H(\lambda; \rho, \beta)\, (1 + o(1))$,   (18)

with $\beta < \rho$, $c$ regularly varying at infinity with index $\beta$, and

  $H(\lambda; \rho, \beta) = \frac{1}{\beta} \left\{ k_{\rho+\beta}(\lambda) - k_\rho(\lambda) \right\}$,

it is shown that for $\tilde\rho_{\tilde k,n} - \rho$ to be asymptotically normal one needs $\sqrt{\tilde k}\, b_{n,\tilde k} \to \infty$, $\sqrt{\tilde k}\, b^2_{n,\tilde k} \to c_1$ finite, and $\sqrt{\tilde k}\, c_{n,\tilde k} \to c_2$ finite.
Remark 3: With the same methods as used in the proof of Theorem 3.2, one can find asymptotic results concerning $\hat b^{(1)}_{LS}(-1)$, the least squares estimator of $b_{n,k}$ obtained after setting $\tilde\rho = -1$ in (17), given by

  $\hat b^{(1)}_{LS}(-1) = \frac{12}{k} \sum_{j=1}^{k} \left( \frac{j}{k+1} - \frac{1}{2} \right) Z_j$.

Indeed, under the same conditions as in Theorem 3.2,

  $\sqrt{\frac{k}{12}}\, \frac{\hat b^{(1)}_{LS}(-1)}{\gamma} \to_d N\!\left( \frac{c\, |\rho| \sqrt{3}}{\gamma\,(1-\rho)(2-\rho)},\ 1 \right)$   (19)

if $\sqrt{k}\, b_{n,k} \to c \in \mathbb{R}$;

  $\frac{\hat b^{(1)}_{LS}(-1)}{b_{n,k}} \to_p \frac{6 |\rho|}{(1-\rho)(2-\rho)}$   (20)

if $\sqrt{k}\, b_{n,k} \to \infty$.
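Remark 3's estimator is simple to compute; a minimal sketch (the function name is ours), checked on noiseless data following the $\rho = -1$ regression model:

```python
def b_ls_minus1(Z):
    """b-hat_LS(-1): least-squares estimator of b_{n,k} from Remark 3,
    obtained by fixing rho-tilde = -1 in (17)."""
    k = len(Z)
    return (12.0 / k) * sum((j / (k + 1.0) - 0.5) * Z[j - 1] for j in range(1, k + 1))

# Sanity check on noiseless Z_j = gamma + b * j/(k+1) (the rho = -1 model):
gamma, b, k = 1.0, 0.3, 2000
Z = [gamma + b * j / (k + 1.0) for j in range(1, k + 1)]
assert abs(b_ls_minus1(Z) - b) < 0.001
```

The $\gamma$ term drops out exactly because the weights $j/(k+1) - 1/2$ sum to zero, which is why this statistic isolates the slope $b_{n,k}$.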
We next turn to the asymptotic analysis of estimators obtained from (6) in the case where the parameters $\gamma$, $b_{n,k}$ and $\rho$ are estimated simultaneously. A similar program can be carried out for the other models, but this is omitted; those lead to optimization problems which are mathematically and notationally more involved. More precisely, we construct a specific algorithm to solve the optimization problem, for which we then give the asymptotic result. The algorithm was essentially given in Box and Tidwell (1962). We denote the resulting estimator by $\hat\gamma^{(2)}_{BT}$. The asymptotic result is in the spirit of classical theorems on likelihood methods, which specify that a one-step Newton algorithm starting from a consistent estimator with the appropriate rate yields an efficient estimator. Here an initial estimator for $\rho$ is used, for which the appropriate rate was discussed in Remark 2. It is easily seen that

  $\hat\gamma^{(2)}_{BT} = \frac{1}{k} \sum_{j=1}^{k} Z_j\, e^{-\hat d_{n,k} (j/(k+1))^{-\hat\rho}}$,   (21)

where $\hat d_{n,k}$ and $\hat\rho$ are found by minimizing

  $\frac{1}{k} \sum_{j=1}^{k} d_{n,k} \left(\frac{j}{k+1}\right)^{-\rho} + \log\!\left[ \frac{1}{k} \sum_{j=1}^{k} Z_j\, e^{-d_{n,k} (j/(k+1))^{-\rho}} \right]$,

or

  $\frac{1}{k} \sum_{j=1}^{k} Z_j\, \exp\!\left( -d_{n,k}\, ( v_{j,k}(\rho) - \bar v_k(\rho) ) \right)$,   (22)

where $v_{j,k}(\rho) = (j/(k+1))^{-\rho}$ and $\bar v_k(\rho) = \frac{1}{k} \sum_{j=1}^{k} v_{j,k}(\rho)$.
Following the approach given in Box and Tidwell (1962), we propose to use a Taylor expansion of (22) in $1/(1-\rho)$ around a consistent estimator $\tilde\rho$:

  $\frac{1}{k} \sum_{j=1}^{k} Z_j\, \exp\!\left( -d_{n,k} \left( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) \right) - d^*_{n,k}\,(1-\tilde\rho)^2 \left( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) \right) \right)$,

where $d^*_{n,k} = d_{n,k} \left( 1/(1-\rho) - 1/(1-\tilde\rho) \right)$, $t_{j,k}(\tilde\rho) = (j/(k+1))^{-\tilde\rho} \log((k+1)/j)$ and $\bar t_k(\tilde\rho) = \frac{1}{k} \sum_{j=1}^{k} t_{j,k}(\tilde\rho)$. This expression is then optimized in $d_{n,k}$ and $d^*_{n,k}$, leading to $\hat d_{n,k}$ and $\hat d^*_{n,k}$, after which the estimator for $\rho$ follows from

  $\frac{1}{1-\hat\rho} = \frac{1}{1-\tilde\rho} + \frac{\hat d^*_{n,k}}{\hat d_{n,k}}$.   (23)
The equations in $d_{n,k}$ and $d^*_{n,k}$ are

  $\frac{1}{k} \sum_{j=1}^{k} Z_j\, e^{ -d_{n,k}( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) ) - d^*_{n,k}(1-\tilde\rho)^2 ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) ) } \left( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) \right) = 0$,
  $\frac{1}{k} \sum_{j=1}^{k} Z_j\, e^{ -d_{n,k}( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) ) - d^*_{n,k}(1-\tilde\rho)^2 ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) ) } \left( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) \right) = 0$,

or, by linearizing the equations in $d_{n,k}$ and $d^*_{n,k}$ (since $d_{n,k} \to 0$ as $k/n \to 0$),

  $\frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) ) = d_{n,k}\, \frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )^2 + d^*_{n,k}(1-\tilde\rho)^2\, \frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) )$,
  $\frac{1}{k} \sum_{j=1}^{k} Z_j ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) ) = d_{n,k}\, \frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) ) + d^*_{n,k}(1-\tilde\rho)^2\, \frac{1}{k} \sum_{j=1}^{k} Z_j ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) )^2$.   (24)

Remark here that terms of order $b^2_{n,k}$ can be deleted, since $d^*_{n,k}/b^2_{n,k}$ is unbounded in probability if $\tilde\rho$ is based on a number $\tilde k$ of extreme order statistics for which $\sqrt{\tilde k}\, b_{n,\tilde k} \to \infty$, $\sqrt{\tilde k}\, b^2_{n,\tilde k} \to c_1$ finite and $\sqrt{\tilde k}\, c_{n,\tilde k} \to c_2$ finite. Indeed, then $\sqrt{\tilde k}\, b_{n,\tilde k} (\tilde\rho - \rho)$ is asymptotically normal, while $\sqrt{\tilde k}\, b_{n,\tilde k}\, b_{n,k} \to 0$.

In the next theorem we derive the basic asymptotic normality result for the Box–Tidwell type algorithm described above.
Theorem 3.3: Suppose (18) holds, and that $k = k_n \to \infty$, $k/n \to 0$ at a rate such that $\sqrt{k}\, b_{n,k} = O(1)$. Then, for the estimation procedure defined by (21), (23) and (24), starting with $\tilde\rho_{\tilde k}$ where $\sqrt{\tilde k}\, b_{n,\tilde k} \to \infty$, $\sqrt{\tilde k}\, b^2_{n,\tilde k} \to c_1$ finite and $\sqrt{\tilde k}\, c_{n,\tilde k} \to c_2$ finite, we have that

  $\sqrt{k} \left( \hat\gamma^{(2)}_{BT} - \gamma \right) \to_d N(0, \sigma_2^2)$,

where $\sigma_2^2 = \gamma^2 \left( (1-\rho)/\rho \right)^4$.

Proof: Using the facts that, as $k, n \to \infty$ with $k/n \to 0$,

  $\frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )^2 \to_p \gamma\, \frac{\rho^2}{(1-2\rho)(1-\rho)^2}$,

  $\frac{1}{k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) ) \to_p \gamma\, \frac{\rho\,(1-\rho-\rho^2)}{(1-2\rho)^2 (1-\rho)^3}$,

  $\frac{1}{k} \sum_{j=1}^{k} Z_j ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) )^2 \to_p \gamma\, \frac{1-2\rho+2\rho^4}{(1-2\rho)^3 (1-\rho)^4}$,

the system of linear equations (24) can be written as follows:

  $\begin{pmatrix} \dfrac{\rho^2}{(1-2\rho)(1-\rho)^2} + o_p(1) & \dfrac{\rho(1-\rho-\rho^2)}{(1-2\rho)^2(1-\rho)} + o_p(1) \\[2mm] \dfrac{\rho(1-\rho-\rho^2)}{(1-2\rho)^2(1-\rho)^3} + o_p(1) & \dfrac{1-2\rho+2\rho^4}{(1-2\rho)^3(1-\rho)^2} + o_p(1) \end{pmatrix} \begin{pmatrix} d_{n,k} \\ d^*_{n,k} \end{pmatrix} = \begin{pmatrix} U^{(1)}_{k,n} \\ U^{(2)}_{k,n} \end{pmatrix}$,

where the factor $(1-\tilde\rho)^2$ multiplying $d^*_{n,k}$ in (24) has been absorbed into the second column, and

  $U^{(1)}_{k,n} = \frac{1}{\gamma k} \sum_{j=1}^{k} Z_j ( v_{j,k}(\tilde\rho) - \bar v_k(\tilde\rho) )$,  $U^{(2)}_{k,n} = \frac{1}{\gamma k} \sum_{j=1}^{k} Z_j ( t_{j,k}(\tilde\rho) - \bar t_k(\tilde\rho) )$.
Now by inversion, we ®nd that 0 @
1 d^n;k
1 ^ dn;k
1
2r4
1 r4
1
A
0 6@
1
1
r2
2r 2r4 2r3
1 r2
! op
1 r
1 r r2
1 2r2
1 r
op
1
r
1 r r2
1 2r2
1 r3
op
1
1
r2 2r
1
2
r
op
1
op
1
10 A@
1
Uk;n
2
Uk;n
1 A:
Consequently the following asymptotic representations are found for the Box-Tidwell type p estimators of dn;k and dn;k as kbn;k O
1: p
1 k d^n;k
1 dn;k d
p
1 k^ dn;k d
k 2r
1 2r 2r4 1 X p fj vj;k
r r4 k j1 2
1
2r
1
1
2r
1 r r3
1 r
1
r
1 r3
r
vk
r
k r2 1 X p fj tj;k
r k j1
k r2 1 X p fj vj;k
r k j1
2
k 2r 1 X p fj tj;k
r r2 k j1 3
vk
r
tk
r op
1;
tk
r op
1:
The result now follows from p ^g
2 k BT g
! 1
k p 1 X k f k j1 j
! 1
p k
1 d^n;k
1
^
1 r
dn;k 1 r
! op
1;
while, using (23), p k
1 d^n;k ^
1 1 r
1 d^n;k ~ 1 r p k ^
1 d ~ n;k 1 r
p k
!
dn;k 1 r
! dn;k
1 ^ dn;k 1 r p
1 dn;k k^dn;k op
1:
The result now follows from combination of the above steps.
&
174
BEIRLANT ET AL.
Remark 4: Since the slope parameter bn;k in the regression model is not ®xed but tends to zero with the sample size n tending pto in®nity, the maximum likelihood estimators of bn;k and r are not consistent when kbn;k O
1 and as a consequence the proof of asymptotic normality of any estimator based on joint estimation of the parameters g, bn;k and r is not conventional compared for instance to the techniques used in standard cases (as found, for example, in Lehmann, 1983, Chapter 6). On the other hand, proofs for iterative estimation schemes have to start with consistent estimators with the appropriate rate. Remark that the asymptotic normal distribution is the same as stated in the main theorem in Feuerverger and Hall (1999). ~n;k~ Remark 5: The number of extreme order statistics to be used in an initial estimator r
1 ^ ~ can p be based on bLS
1. Indeed, based on (20), a value k such that
1 k~jb^LS
1j [
M1 log n; M2 log n for some constants M1 ; M2 > 0 asymptotically will lead to a number k~ which satis®es the requirements stated in Theorem 3.3 whatever the value of r and b in (18).
4.
Selection of the optimal sample fraction for the Hill estimator
In this ®nal section we discuss the estimation of the optimal sample fraction when applying a classical tail index estimator such as the Hill (1975) estimator. It should be intuitively clear that the estimates of bn;k , the parameter which dominates the bias of the Hill estimator, should be helpful to locate the values of k for which the bias of the Hill estimator is too large, or for which the mean squared error of the estimator is minimal. ^ Guillou and Hall (2001) propose to choose Hk;n ^ where k is the smallest value of k for which r ^
1
1 b k LS > ccrit ; Hk;n 12 where ccrit is a critical value such as 1.25 or 1.5. So, after appropriate standardization of
1 b^LS
1, the procedure given in Guillou and Hall (2001) can be considered as an
1 asymptotic test for zero (asymptotic) expectation of b^LS
1 based on (19): the bias in the Hill estimator is considered to be too large, and hence the hypothesis of zero bias is rejected, when the asymptotic mean in the limit result appears signi®cantly different from zero. In a somewhat more restrictive setting Hall and Welsh (1985) gave the optimal sample fraction kn;opt of largest order statistics for the Hill estimator when the unknown distribution function admits the following expansion: 1
F
x Cx
1=g
1 Dxr=g
1 o
1
x ? ?;
25
175
ON EXPONENTIAL REPRESENTATIONS OF LOG-SPACINGS
or U
x Cg xg 1 gDCr xr
1 o
1
x ? ?;
26
for some constants C > 0, D [ R. This class of Pareto-type distributions clearly satis®es
R` with b
x rgDCr xr
1 o
1
x ? ?. In this case the asymptotic mean squared error of the Hill estimator is minimal for 2r
C
kn;opt *
1
r
2
!1=
1
2r
n
2D2
r3 2
*
b
n
1=
1
r2 2r
g2
1
2r
2r=
1
2r
!1=
1
2r
n ? ?:
Here, due to the particular form of b we obtain that n kn;opt * b2 k0
1=
1
2r
k0
2r=
1
2r
g2
1
r 2r
2
!1=
1
2r
;
27
for any secondary value k0 [ f1; . . . ; ng with k0 o
n. From (27) we arrive at the idea of plugging in consistent estimators of bn;k0 , r and g in this expression. For instance, the estimators discussed in the preceding section can be used here, all based on the upper k0 extremes. In this way, for each value of k0 an estimator of kn;opt is obtained. The following result can now be stated concerning h
k^n;k0 b^2n;k0
i
1=
1
2~ r
k0
2~ r=
1
2~ r
^g2
1
~ 2 r 2~ r
!1=
1
2~ r
;
28
~ again denotes a consistent estimator of r. where r Theorem 4.1: Suppose (25) is satis®ed. Then as k0 ; n ? ?; k0 =n ? 0 and p k0 bn;k0 = log k0 ? ?, and if, in the estimation procedure, we substitute for r a consistent ~k;n , then estimator r k^n;k0 ? p 1: kn;opt In particular, Hk^n;k ;n has the same asymptotic ef®ciency as Hkn;opt ;n . 0
176
BEIRLANT ET AL.
(a)
(b)
Figure 1. The storm insurance data set: (a) Plot of the Hill estimates Hk;n as a function of log k; (b) Plot of log k^n;k0 for k0 1; . . . ; 24,316.
ON EXPONENTIAL REPRESENTATIONS OF LOG-SPACINGS
177
The asymptotics developed in the preceding sections can be used to validate this result, especially see Theorem 3.2. Remark that the ®nal assertion in the statement of Theorem 4.1 follows from Hall and Welsh (1985), Theorem 4.1. Of course, a drawback p of this approach is that in practice one needs to identify the k0 region for which k0 bn;k0 ? ? in order to obtain a consistent method. However, as
1 suggested in the statement concerning b^LS
~ r in Theorem 3.2, k^n;k0 =kn;opt approximately p behaves as a realization from a normal distribution centered at 1 when k0 bn;k0 ? c for ^ some c [ R. As a consequence, graphs pof log kn;k0 as a function of k0 are quite stable, except for the k0 -regions corresponding to k0 bn;k0 ? 0. This is illustrated in Figure 1 for a data set of a French storm insurance portfolio consisting of 24,316 claims. The plot of log k^n;k0 is stable from k0 5000 up to 12,000 indicating a log k^ value around 6. This value corresponds to the endpoint of a stable horizontal area in the Hill plot Figure 1(a) with height at 0.5. In order to set up an automatic method, from a practical point of view we propose to use ^ the median of the ®rst n=2 k-values as an overall estimate for kn;opt : n no k^n;med median k^n;k0 : k0 3; . . . ; : 2
29
In the insurance example this leads to log k^n;med 6:2. We conclude with a comparison of some recently published adaptive threshold selection methods with a simulation study. In general, two approaches to estimating the optimal sample fraction can be distinguished. In one class of methods, estimators for AMSE of the Hk;n are constructed and the threshold that minimizes the estimated mean squared error is selected, whereas a second group derives estimators directly for kn;opt , based on the asymptotic representation of this quantity. In the following, we give an overview of the main types of adaptive threshold selection methods which are then compared in a simulation study. First method.
It is the method described above based on k^n;med .
Second method. The idea of subsample bootstrapping has been ®rst proposed by Hall (1990) and is taken up in a more general method by Danielsson et al. (2001). Instead of bootstrapping the MSE of the Hill estimator itself, they use an auxiliary statistic, the MSE of which converges at the same rate and which has a known asymptotic mean, independent of the parameter g and r. Third method. Drees and Kaufmann (1998) present a sequential procedure to select the optimal sample fraction kn;opt . From a law of the iterated logarithm they construct ``stopping times'' for the sequence Hk;n of Hill estimators that are asymptotically equivalent to a deterministic sequence. An ingenious combination of two such stopping times then attains the same rate of convergence as kn;opt . In our simulations, we have also ®xed the value of r to the canonical choice 1.
178
BEIRLANT ET AL.
Fourth method. Guillou and Hall (2001) proposed to choose $H_{\hat k,n}$ where $\hat k$ is the smallest value of $k \in [n^a, n^b]$ for which

$$ T_k := \left(\frac{k}{12}\right)^{1/2} \frac{|\hat b_{LS}|}{H_{k,n}} > c_{\mathrm{crit}}, $$

where $c_{\mathrm{crit}}$ is a critical value close to 1. In practice, it is often advantageous to dampen stochastic fluctuations of $T_k$ by working with a moving average of its squares,

$$ Q_k = \left\{ \frac{1}{2\ell + 1} \sum_{j=-\ell}^{\ell} T_{k+j}^2 \right\}^{1/2}, $$

where $\ell$ denotes the integer part of $k/2$. Numerical experiments suggest taking $a = 0.6$, $b = 0.8$ and $c_{\mathrm{crit}} = 1.25$.

In order to compare the different methods we performed extensive simulations. The patterns for different distributions are very similar; therefore we only show the results for four representative examples:
- a Fréchet(2) distribution, $F(x) = \exp(-x^{-2})$, with $\gamma = 0.5$ and $\rho = -1$;
- a Burr(1, 0.5, 2) distribution, $F(x) = 1 - (1 + \sqrt{x})^{-2}$, with $\gamma = 1$ and $\rho = -0.5$;
- a Student $t_6$ distribution, with $\gamma = 1/6$ and $\rho = -1/3$;
- a loggamma(2, 1) distribution, with density $f(x) = x^{-2} \log x$ on $(1, \infty)$, with $\gamma = 1$ and $\rho = 0$.
Fréchet(2) distribution

Method   MSE(H_{k̂_opt,n})   MSE-ratio
1        0.52 × 10^{-2}      1.68
2        1.09 × 10^{-2}      3.53
3        0.41 × 10^{-2}      1.33
4        0.49 × 10^{-2}      1.52

Burr(1, 0.5, 2) distribution

Method   MSE(H_{k̂_opt,n})   MSE-ratio
1        0.93 × 10^{-1}      1.01
2        1.46 × 10^{-1}      1.57
3        1.24 × 10^{-1}      1.33
4        1.45 × 10^{-1}      1.55
Student t_6 distribution

Method   MSE(H_{k̂_opt,n})   MSE-ratio
1        1.10 × 10^{-2}      1.28
2        1.76 × 10^{-2}      2.04
3        1.29 × 10^{-2}      1.50
4        1.90 × 10^{-2}      2.44

Loggamma(2, 1) distribution

Method   MSE(H_{k̂_opt,n})   MSE-ratio
1        6.02 × 10^{-2}      1.11
2        9.04 × 10^{-2}      1.67
3        7.84 × 10^{-2}      1.45
4        6.89 × 10^{-2}      1.36
For each distribution, 100 samples of size $n = 500$ were randomly generated, and the four adaptive threshold selection methods were applied to these samples. For the bootstrap procedure we used $B = 250$ resamples and a subsample size $n_1 = 100$. In the preceding tables, the empirical MSE of $H_{\hat k_{\mathrm{opt}},n}$ is given together with the ratio of this MSE to the minimal empirical MSE of $H_{k,n}$ in the simulations,

$$ \text{MSE-ratio} = \frac{\mathrm{MSE}(H_{\hat k_{\mathrm{opt}},n})}{\min_k \mathrm{MSE}(H_{k,n})}, $$

which gives an indication of the quality of the adaptive selection procedure. Although the bootstrap method and the diagnostic procedure (Method 4) tend to give quite variable estimates of the optimal sample fraction, the results for all four adaptive Hill estimators are well in line. The sequential procedure and the method based on $\hat k_{n,\mathrm{med}}$ give the best results, even when setting $\rho = -1$. The influence of the mis-specification of the parameter $\rho$ in these methods does not seem to be of major importance. The method presented here even performs well for the loggamma distribution in comparison with the other methods, in spite of the fact that this distribution does not belong to the Hall class. Overall, the method presented here performs best when $|\rho|$ is smaller than 1.
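The MSE-ratio is easy to compute from simulation output. In the sketch below (assuming NumPy), `H` and `k_hat` are placeholders: `H[i, k-1]` holds the Hill estimate $H_{k,n}$ for the $i$-th simulated sample, and `k_hat[i]` is the adaptively selected $k$ for that sample.

```python
import numpy as np

def mse_ratio(H, k_hat, gamma):
    """MSE-ratio = MSE(H_{khat,n}) / min_k MSE(H_{k,n}).

    H     : (n_sim, K) array, H[i, k-1] = Hill estimate for sample i at level k
    k_hat : (n_sim,) integer array of adaptively selected k per sample
    gamma : true tail index
    """
    n_sim = H.shape[0]
    adaptive = H[np.arange(n_sim), k_hat - 1]        # H_{khat,n} per sample
    mse_adaptive = np.mean((adaptive - gamma) ** 2)  # empirical MSE, adaptive
    mse_fixed = np.mean((H - gamma) ** 2, axis=0)    # empirical MSE per fixed k
    return mse_adaptive / mse_fixed.min()            # benchmark: oracle fixed k
```

A ratio close to 1 means the adaptive rule nearly matches the best fixed-$k$ choice that could only be known in hindsight.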
Acknowledgment

The authors are very grateful to one of the referees for the careful reading of the different versions of the paper; these comments led to significant improvements.
References

Beirlant, J., Dierckx, G., Goegebeur, Y., and Matthys, G., ``Tail index estimation and an exponential regression model,'' Extremes 2, 177-200, (1999).
Beirlant, J., Vynckier, P., and Teugels, J.L., ``Tail index estimation, Pareto quantile plots, and regression diagnostics,'' J. Amer. Statist. Assoc. 91, 1659-1667, (1996).
Box, G.E.P. and Tidwell, P.W., ``Transformation of the independent variables,'' Technometrics 4, 531-550, (1962).
Csörgő, S., Deheuvels, P., and Mason, D., ``Kernel estimates of the tail index of a distribution,'' Ann. Statist. 13, 1050-1077, (1985).
Csörgő, S. and Viharos, L., ``Estimating the tail index,'' in Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.), North-Holland, Amsterdam, 833-881, (1998).
Danielsson, J., de Haan, L., Peng, L., and de Vries, C.G., ``Using a bootstrap method to choose the sample fraction in tail index estimation,'' J. Multivariate Anal. 76, 226-248, (2001).
Dekkers, A.L.M., Einmahl, J.H.J., and de Haan, L., ``A moment estimator for the index of an extreme-value distribution,'' Ann. Statist. 17, 1833-1855, (1989).
Drees, H., de Haan, L., and Resnick, S., ``How to make a Hill plot,'' Ann. Statist. 28, 254-274, (2000).
Drees, H. and Kaufmann, E., ``Selecting the optimal sample fraction in univariate extreme value estimation,'' Stoch. Proc. Appl. 75, 149-172, (1998).
Feuerverger, A. and Hall, P., ``Estimating a tail exponent by modeling departure from a Pareto distribution,'' Ann. Statist. 27, 760-781, (1999).
Fraga Alves, M.I., Gomes, M.I., and de Haan, L., ``A new class of semi-parametric estimators of the second order parameter,'' Notas e Comunicações C.E.A.U.L. 4/2001, University of Lisbon; to appear in Portugaliae Mathematica (2003).
Gomes, M.I., de Haan, L., and Peng, L., ``Semi-parametric estimation of the second order parameter: asymptotic and finite sample behavior,'' Notas e Comunicações C.E.A.U.L. 8/2000, University of Lisbon (2000).
Gomes, M.I. and Oliveira, O., ``The use of the bootstrap methodology in Statistics of Extremes: choice of the optimal sample fraction,'' Extremes 4, 331-358, (2001).
Guillou, A. and Hall, P., ``A diagnostic for selecting the threshold in extreme-value analysis,'' J. R. Statist. Soc. Ser. B 63, 293-305, (2001).
Hall, P., ``On some simple estimates of an exponent of regular variation,'' J. Roy. Statist. Soc. Ser. B 44, 37-42, (1982).
Hall, P., ``Using the bootstrap to estimate mean squared error and select smoothing parameter in nonparametric problems,'' J. Multivariate Anal. 32, 177-203, (1990).
Hall, P. and Welsh, A.H., ``Adaptive estimates of parameters of regular variation,'' Ann. Statist. 13, 331-341, (1985).
Hill, B.M., ``A simple general approach to inference about the tail of a distribution,'' Ann. Statist. 3, 1163-1174, (1975).
Kaufmann, E. and Reiss, R.-D., ``Approximation of the Hill estimator process,'' Statist. Probab. Lett. 39, 347-354, (1998).
Lehmann, E.L., Theory of Point Estimation, Wiley, New York, 1983.
Matthys, G. and Beirlant, J., ``Adaptive threshold selection in tail index estimation,'' in Extremes and Integrated Risk Management, Risk Books, 2000.
Shorack, G.R. and Wellner, J.A., Empirical Processes with Applications to Statistics, Wiley, New York, 1986.