Ann. Inst. Statist. Math. Vol. 42, No. 2, 305-329 (1990)
RECURSIVE KERNEL DENSITY ESTIMATORS UNDER A WEAK DEPENDENCE CONDITION

LANH TAT TRAN

Department of Mathematics, Indiana University, Bloomington, IN 47405, U.S.A.

(Received March 8, 1989; revised June 19, 1989)
Abstract. Let X_t, t = ..., -1, 0, 1, ... be a strictly stationary sequence of random variables (r.v.'s) defined on a probability space (Ω, F, P) and taking values in R^d. Let X_1, ..., X_n be n consecutive observations of X_t. Let f be the density of X_1. As an estimator of f(x), we shall consider

f_n(x) = n^{-1} Σ_{j=1}^n b_j^{-d} K((x - X_j)/b_j).

Here K is a kernel function and b_n is a sequence of bandwidths tending to zero as n → ∞. The asymptotic distribution and uniform convergence of f_n are obtained under general conditions. Appropriate bandwidths are given explicitly. The process X_t is assumed to satisfy a weak dependence condition defined in terms of joint densities. The results are applicable to a large class of time series models.
Key words and phrases: Asymptotic normality, uniform convergence, absolute regularity, density estimation, kernel, bandwidth.
1. Introduction
The nonparametric estimation of a probability density f is an interesting problem in statistical inference and plays an important role in communication theory and pattern recognition. An account of this information can be found in Fukunaga (1972) and Fukunaga and Hostetler (1975). The purpose of this paper is to investigate recursive density estimators when the observations are dependent. Let X_t, t = ..., -1, 0, 1, ... be a strictly stationary sequence of random variables (r.v.'s) defined on a probability space (Ω, F, P) and taking values in R^d. Let X_1, ..., X_n be n consecutive observations of X_t. Let f be the density of X_1. As an estimator of f(x) we shall consider
(1.1) f_n(x) = n^{-1} Σ_{i=1}^n b_i^{-d} K((x - X_i)/b_i),
which was introduced by Wolverton and Wagner (1969a, 1969b) and Yamato (1971). Note that f_n can be computed recursively by

(1.2) f_n(x) = ((n - 1)/n) f_{n-1}(x) + (1/(n b_n^d)) K((x - X_n)/b_n).
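As a numerical illustration (a sketch, not from the paper; the Gaussian kernel, the bandwidths b_j = j^{-1/3} and the simulated data are placeholder choices of mine), the recursion (1.2) updates the estimate with a single kernel evaluation per new observation:

```python
import numpy as np

def recursive_kde(data, x, K, bandwidths):
    """Wolverton-Wagner/Yamato estimator f_n(x) computed via the recursion (1.2).

    Returns the whole sequence f_1(x), ..., f_n(x); each step reuses the
    previous value, so adding one observation costs O(1) work.
    """
    f, history = 0.0, []
    for n, (Xn, bn) in enumerate(zip(data, bandwidths), start=1):
        # f_n = ((n-1)/n) f_{n-1} + (1/(n b_n^d)) K((x - X_n)/b_n), with d = 1 here
        f = (n - 1) / n * f + K((x - Xn) / bn) / (n * bn)
        history.append(f)
    return history

K = lambda u: np.exp(-u * u / 2) / np.sqrt(2 * np.pi)  # Gaussian kernel (illustrative)
rng = np.random.default_rng(0)
data = rng.standard_normal(2000)
bands = [j ** (-1 / 3) for j in range(1, 2001)]
est = recursive_kde(data, 0.0, K, bands)
```

The last entry of est coincides (up to rounding) with the non-recursive form (1.1) evaluated on the full sample, which is the point of the recursion.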
This property is particularly useful for large sample sizes, since f_n can be easily updated with each additional observation. Here K is a kernel function and b_n is a sequence of bandwidths tending to zero as n → ∞. Assume the joint densities of (X_1, ..., X_k) exist for all k ≥ 1 and that X_t satisfies the dependence condition defined below:
DEFINITION 1.1. Let m, k and l be arbitrary positive integers. Let X = (X_1, ..., X_k) and Y = (X_{k+l+1}, ..., X_{k+l+m}). Let f_{X,Y}, f_X, f_Y be the densities of (X, Y), X and Y, respectively. The process X_t is said to satisfy the absolute regularity condition in the locally transitive sense (ARLT) if for some function h
(1.3) ∫∫_{R^{kd}×R^{md}} |f_{X,Y}(x, y) - f_X(x) f_Y(y)| dx dy ≤ h(k, m) φ(l),

where φ(l) ↓ 0 as l → ∞. Here x, y denote values of X, Y. The letter C will be used to denote constants whose values are unimportant and may vary. We assume throughout the paper that h(k, m) = C(k + m)^θ for some θ ≥ 0. Let

(1.4)
β(k, l, m) = E{sup |P(B | σ(X_t: 1 ≤ t ≤ k)) - P(B)|},

where the supremum is taken over all sets B ∈ σ(X_t: k + l + 1 ≤ t ≤ k + l + m); and where σ(X_t: 1 ≤ t ≤ k) and σ(X_t: k + l + 1 ≤ t ≤ k + l + m) are the σ-fields generated, respectively, by {X_t: 1 ≤ t ≤ k} and {X_t: k + l + 1 ≤ t ≤ k + l + m}. Employing Lemma 2 of Ibragimov and Rozanov ((1978), p. 118),
(1.5) ∫∫_{R^{kd}×R^{md}} |f_{X,Y}(x, y) - f_X(x) f_Y(y)| dx dy = 2β(k, l, m).
It is now easy to see that the ARLT condition is weaker than the absolute regularity condition defined below when the joint densities of (X_1, ..., X_k) exist for all k ≥ 1.
DEFINITION 1.2. Let F_{-∞}^0 and F_n^∞ be the σ-fields generated by {X_t: t ≤ 0} and {X_t: t ≥ n}, respectively. Then X_t is absolutely regular if

(1.6) β(n) = E{sup |P(A | F_{-∞}^0) - P(A)|: A ∈ F_n^∞} ↓ 0

as n → ∞.
From Definitions 1.1 and 1.2, it is clear that β(k, l, m) ≤ β(l). For relevant literature on absolutely regular processes, the reader is referred to Yoshihara (1976, 1978 and 1984). Pham and Tran (1985) and Pham (1986) have shown that a large class of time series is absolutely regular. In particular, autoregressive moving average time series models and bilinear time series models are absolutely regular with β(n) decaying to zero exponentially fast under weak conditions. Thus, autoregressive processes and bilinear time series models satisfy the ARLT condition with h(k, m) = 1. The absolute regularity condition is weaker than the φ-mixing condition but is stronger than the strong mixing condition. The definitions of these dependence conditions can be found in Ibragimov and Rozanov (1978).

In the independent case, f_n has been thoroughly examined in Wegman and Davies (1979). In the dependent case, quadratic mean convergence and asymptotic normality of f_n have been obtained by Masry (1986) under various assumptions on the dependence of X_t. Strong pointwise consistency of f_n has been proved by Gyorfi (1981). Takahata (1980) and Masry and Gyorfi (1987) obtained sharp a.s. rates of convergence of f_n to f for the class of asymptotically uncorrelated processes, the definition of which can be found in Takahata (1977a). Masry (1987) established sharp rates of almost sure convergence of f_n to f for vector-valued stationary strong mixing processes under weak assumptions on the strong mixing condition. These rates were recently improved by Tran (1989a).

The role of the smoothing parameter b_n is crucial in kernel density estimation. The book of Devroye and Gyorfi (1985) points out the prominent role of b_n in the behaviour of kernel type estimators. Our main effort is devoted to finding the appropriate bandwidths and determining the trade-off between the rates at which b_n and φ(n) tend to zero.
The paper is organized as follows: in Section 2, weak and explicit conditions under which f_n is asymptotically normal are found. Section 3 deals with the almost sure convergence of f_n to f. The methods of proof are closely related to those of Yoshihara (1976, 1978 and 1984), Takahata (1980), Masry (1986) and Tran (1989a). The conditions under which f_n converges to f uniformly on compacts are weaker than those assumed by Tran (1989a), and the rates of convergence obtained are sharp. As an example, choose b_n = C(n^{-1} log n)^{1/(d+2)} where C > 0; then under the conditions of Theorem 3.1, the uniform rate of convergence of f_n to f on compact sets is of order (n^{-1} log n)^{1/(d+2)}. This is the optimal uniform rate of convergence for nonparametric estimators of a density function (see Stone (1983)). An interesting open problem is to find the best constant C to minimize the L_∞ distance. In the independent case, Silverman (1978) has developed a practical method to determine b_n to give the best possible rate of uniform consistency for nonrecursive kernel density estimators. The reader is referred to Silverman (1978) for further statistical motivation.
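For concreteness (an illustrative sketch, not the paper's code; C = 1 below is an arbitrary placeholder constant), the bandwidth choice b_n = C(n^{-1} log n)^{1/(d+2)} and its dependence on d can be tabulated:

```python
import math

def bandwidth(n, d, C=1.0):
    # b_n = C (n^{-1} log n)^{1/(d+2)}; under Theorem 3.1 this choice attains the
    # optimal uniform rate (n^{-1} log n)^{1/(d+2)} on compact sets (Stone (1983)).
    return C * (math.log(n) / n) ** (1.0 / (d + 2))

for d in (1, 2, 5):
    for n in (10**3, 10**6):
        print(f"d={d}, n={n}: b_n = {bandwidth(n, d):.4g}")
```

The rate deteriorates as the dimension d grows, the usual curse of dimensionality.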
Let J_n = ∫ |f_n(x) - f(x)| dx be the L_1 distance, where integration is over the entire space. Assume that K ∈ L_2 and X_t is stationary and ergodic, and that there is an integer m > 0 such that the conditional distribution of X_m given F_{-∞}^0 is absolutely continuous a.s. Under some additional conditions requiring no information on the dependence structure of X_t, Gyorfi and Masry (1988) have shown that J_n converges to zero a.s. This result is also presented in Gyorfi et al. ((1988), Theorem 4.3.1). The conditions on the dependence structure of X_t in Gyorfi and Masry (1988) are quite weak in comparison with those of the present paper. Here, appropriate rates of convergence of φ(n) to zero are assumed; also, the assumption made earlier that the joint densities of (X_1, ..., X_k) exist for all k ≥ 1 is itself a dependence condition. Recently, for ARLT processes, Tran (1989b) has shown that J_n tends to zero completely under the assumption that K ∈ L_∞, with φ(n) tending to zero sufficiently fast.

The literature on density estimation under dependence is extensive. The book of Gyorfi et al. (1988) gives an in-depth treatment of the subject. For a bibliography and additional background material, the reader is referred to Roussas (1969, 1988), Rosenblatt (1970), Takahata (1977a, 1977b, 1979 and 1980), Robinson (1983), Yoshihara (1984), Masry (1986), Masry and Gyorfi (1987), Yakowitz (1987), Ioannides and Roussas (1987) and Hart and Vieu (1988).
2. Asymptotic normality of f_n

Let K_n(x) be the averaging kernel defined by K_n(x) = (1/b_n^d) K(x/b_n). Then f_n(x) = (1/n) Σ_{j=1}^n K_j(x - X_j). Let p_j = E[K_j(x - X_j)]. By (1.2),

(2.1) f_n(x) - E f_n(x) = (1/n) Σ_{j=1}^n [K_j(x - X_j) - p_j].

Let A_j(x) = K_j(x - X_j) - p_j. Then

f_n(x) - E f_n(x) = (1/n) Σ_{j=1}^n A_j(x).
LEMMA 2.1. Let U and V be random variables measurable with respect to σ(X_t: 1 ≤ t ≤ k) and σ(X_t: k + l + 1 ≤ t ≤ k + l + m), respectively. Let r, s be positive numbers. Assume that ||U||_r < ∞ and ||V||_s < ∞, where ||U||_r = {E|U|^r}^{1/r}. If r^{-1} + s^{-1} + h^{-1} = 1, then

|EUV - EU EV| ≤ C ||U||_r ||V||_s {β(k, l, m)}^{1/h}.
Lemma 2.1 can be found in Yoshihara (1984). Let δ > 0. Assume E|U|^{2+δ} < ∞ and E|V|^{2+δ} < ∞. Then by Lemma 2.1,

|EUV - EU EV| ≤ C{E|U|^{2+δ} E|V|^{2+δ}}^{1/(2+δ)} {β(k, l, m)}^{δ/(2+δ)}.
LEMMA 2.2. Let N ≥ 1 be a positive integer. Assume n = (p + q)r for some positive integers p, q and r. Let {η_j, 1 ≤ j ≤ r} be a family of N-dimensional random vectors such that for each j (1 ≤ j ≤ r), η_j is measurable with respect to the σ-field generated by X(k), (j - 1)(p + q) + 1 ≤ k ≤ (j - 1)(p + q) + p. Let g(x_1, ..., x_r) be a Borel function such that |g(x_1, ..., x_r)| ≤ M for some constant M, where x_1, ..., x_r are N-dimensional vectors. Let F^{(1)} and F^{(2)} be the distribution functions of the random vectors (η_1, ..., η_j) and (η_{j+1}, ..., η_r), with 1 ≤ j < r. Then

(2.2) |E g(η_1, ..., η_r) - ∫⋯∫ g(x_1, ..., x_j, x_{j+1}, ..., x_r) dF^{(1)}(x_1, ..., x_j) dF^{(2)}(x_{j+1}, ..., x_r)| ≤ 2Mβ(n, q, n) ≤ C n^θ φ(q).
PROOF. By assumption, |g(x_1, ..., x_r)| ≤ M. Using Lemma 1 in Yoshihara (1978), the left-hand side of (2.2) is bounded by 2M E{sup |P(B | σ(η_k: 1 ≤ k ≤ j)) - P(B)|}, where the supremum is taken over all sets B ∈ σ(η_k: j + 1 ≤ k ≤ r); and where σ(η_k: 1 ≤ k ≤ j) and σ(η_k: j + 1 ≤ k ≤ r) are the σ-fields generated, respectively, by {η_k: 1 ≤ k ≤ j} and {η_k: j + 1 ≤ k ≤ r}. The blocks of observations generating these two σ-fields are set apart by a distance of at least q. Thus clearly, E{sup |P(B | σ(η_k: 1 ≤ k ≤ j)) - P(B)|} ≤ β(n, q, n), from which the lemma follows.

LEMMA 2.3. Assume n = (p + q)r for some positive integers p, q and r. Let {η_j: 1 ≤ j ≤ r} be a family of real valued r.v.'s such that for each j (1 ≤ j ≤ r), η_j is measurable with respect to the σ-field generated by X(k), (j - 1)(p + q) + 1 ≤ k ≤ (j - 1)(p + q) + p.
Let ε > 0. Then

(2.3) P[Σ_{i=1}^r η_i ≤ ε] ≤ P[Σ_{i=1}^r Z_i ≤ ε] + 2rβ(n, q, n),

(2.4) P[Σ_{i=1}^r η_i ≤ ε] ≥ P[Σ_{i=1}^r Z_i ≤ ε] - 2rβ(n, q, n),

where {Z_i: 1 ≤ i ≤ r} are independent r.v.'s such that for each i (1 ≤ i ≤ r), Z_i has the same distribution function as that of the random variable η_i.

PROOF.
We will prove (2.3) since the proof of (2.4) is similar. Let

A = {(x_1, ..., x_r): x_1 + ⋯ + x_r ≤ ε}.

Set

g(x_1, ..., x_r) = 1 if (x_1, ..., x_r) ∈ A, and 0 otherwise.

Then g(x_1, ..., x_r) is bounded by M = 1. Let F^{(1)}, ..., F^{(r)} be, respectively, the distribution functions of the random variables η_1, ..., η_r. Using Lemma 2.2, we get

(2.5) P[Σ_{i=1}^r η_i ≤ ε] = E g(η_1, ..., η_r)
≤ ∫⋯∫ g(x_1, ..., x_r) dF^{(1)}(x_1) ⋯ dF^{(r)}(x_r) + 2rβ(n, q, n)
= P[Σ_{i=1}^r Z_i ≤ ε] + 2rβ(n, q, n).
ASSUMPTION 1. The kernel function K ∈ L_1, with ∫ K(x) dx = 1, and K has an integrable radial majorant Q(x); that is, Q(x) = sup {|K(y)|: ||y|| ≥ ||x||} is integrable, where ||x|| denotes the Euclidean norm of x and the integration is over the entire space. Assume in addition that K satisfies the following Lipschitz condition:

|K(x) - K(y)| ≤ C||x - y||.

Remark 2.1. Note that Assumption 1 implies that |K| is bounded since K is Lipschitz and absolutely integrable.
ASSUMPTION 2. The bandwidth sequence {b_n} satisfies

(1/n) Σ_{j=1}^n (b_n/b_j)^{dr} → θ_{dr} as n → ∞ for 1 ≤ r ≤ 2.

ASSUMPTION 3. (i) The joint probability density f(x, y, k) of the r.v.'s X_1 and X_{1+k} exists and satisfies |f(x, y, k) - f(x)f(y)| ≤ M < ∞ for all x, y and k ≥ 1.
(ii) Assume X_t satisfies the ARLT condition with φ(n) = O(n^{-v}) for some v > 2.

The following is an immediate consequence of the Lebesgue Density Theorem (see Devroye and Gyorfi (1985)).

LEMMA 2.4. Assume that Assumption 1 holds. Then

lim_{n→∞} ∫_{R^d} K_n(x - u) f(u) du = f(x) ∫ K(u) du = f(x).
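Lemma 2.4 can be checked numerically (an illustrative sketch with d = 1; the Gaussian kernel and standard normal f are my own choices, and the closed form quoted in the comment holds only for this pair): as b → 0, the smoothed density (K_b * f)(x) approaches f(x).

```python
import numpy as np

def smoothed_density(x, b, K, f, grid):
    """Approximate (K_b * f)(x) = ∫ b^{-1} K((x - u)/b) f(u) du on a fine grid (d = 1)."""
    vals = K((x - grid) / b) / b * f(grid)
    return float(np.sum(vals) * (grid[1] - grid[0]))  # simple Riemann sum

K = lambda z: np.exp(-z * z / 2) / np.sqrt(2 * np.pi)  # Lipschitz, integrates to 1
f = lambda u: np.exp(-u * u / 2) / np.sqrt(2 * np.pi)  # continuous density
grid = np.linspace(-10.0, 10.0, 20001)
for b in (1.0, 0.3, 0.1, 0.03):
    # For this Gaussian pair the exact value is the N(0, 1 + b^2) density at x
    print(b, smoothed_density(0.0, b, K, f, grid))
```

The printed values move toward f(0) = (2π)^{-1/2} ≈ 0.3989 as b decreases.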
LEMMA 2.5. Assume X_t is a strictly stationary ARLT process satisfying Assumptions 1-3. Let x be a point of continuity of f. Suppose that b_n → 0 and n b_n^d → ∞ as n → ∞. Then

(2.6) lim_{n→∞} n b_n^d var f_n(x) = θ_d f(x) ∫ K²(u) du,

where var f_n(x) denotes the variance of f_n(x).

PROOF. The proof of Lemma 2.5 is essentially the same as the proof of Theorem 3 of Masry (1986), except that here the theorem is proved under the ARLT condition and X_t takes values in R^d instead of R^1. We will therefore just sketch the proof. Note that var f_n(x) = I_n(x) + R_n(x), where
I_n(x) = n^{-2} Σ_{i=1}^n var K_i(x - X_i),

(2.7) R_n(x) = n^{-2} Σ_{i=1}^n Σ_{j=1, j≠i}^n cov {K_i(x - X_i), K_j(x - X_j)}.

Using Assumptions 1 and 2,

(2.8) lim_{n→∞} n b_n^d I_n(x) = θ_d f(x) ∫ K²(u) du.

From (2.7),

(2.9) |R_n(x)| ≤ 2n^{-2} Σ Σ_{i>j} |cov {K_i(x - X_i), K_j(x - X_j)}|.
Choose c_n = b_n^{-d(1-γ)/η}, where η = -1 - ε + (1 - γ)α^{-1} and α = 1/v, with γ and ε being small positive numbers such that α^{-1} - (1 + ε)(1 - γ)^{-1} > 1. This can be done since 0 < α < 1/2. Note that

η > -1 - ε + (1 - γ)[1 + (1 + ε)(1 - γ)^{-1}] = 1 - γ.
Split the sum above into two regions, S_1 and S_2,

(2.10) S_1 = {i, j ∈ {1, ..., n}: 1 ≤ i - j ≤ c_n},
S_2 = {i, j ∈ {1, ..., n}: c_n + 1 ≤ i - j ≤ n - 1},

and write the bound as

(2.11) |R_n(x)| ≤ J_1 + J_2.
Note that c_n b_n^d = b_n^{d[1-(1-γ)/η]}, which tends to zero as n → ∞ since ((1 - γ)/η) - 1 < 0 and b_n → 0 as n → ∞. Using Assumption 3(i),

(2.12) J_1 ≤ 2Mn^{-2} [∫ |K(v)| dv]² Σ_{S_1} 1 = O(c_n/n) = O(c_n b_n^d/(n b_n^d)) = o(1/(n b_n^d)).
Let δ = 2(1 - γ)/γ, or γ = 2/(2 + δ). Define

(2.13) q_n(x) = ∫ (1/b_n^d) |K((x - u)/b_n)|^{2+δ} f(u) du,

which tends to f(x) ∫ |K(u)|^{2+δ} du < ∞ by Lemma 2.4, since ∫ |K(u)|^{2+δ} du is finite by Assumption 1. Using Lemma 2.1 with r = s = 2 + δ and h = (2 + δ)/δ and following a computation similar to (3.25) of Masry (1986), we have
J_2 ≤ C n^{-2} Σ_{(i,j)∈S_2} [β(1, i - j, 1)]^{1-γ} (b_i b_j)^{-d(1+δ)/(2+δ)}.

Using the Cauchy-Schwarz inequality and Assumption 2, we obtain

(2.14) J_2 ≤ C n^{-1} b_n^{-d(2-γ)} Σ_{l ≥ c_n} [φ(l)]^{1-γ}.

Thus

(2.15) n b_n^d J_2 ≤ C b_n^{-d(1-γ)} Σ_{l ≥ c_n} [φ(l)]^{1-γ} ≤ C b_n^{-d(1-γ)} c_n^{-η} Σ_{l=1}^∞ l^η [φ(l)]^{1-γ}.
n
All we need to show is that ~ P[~o(I)]I-~' < oo. However, l=l
(2.16)
~ P [ f ( l ) ] l - : ' _< C ~ l-'-*+(l-Y)"l -II-)''v= C ~ 1-1~ < oo . I=l
/=1
/=1
Finally by (2.12), (2.15) and (2.16), nb~& --* 0 and n b ~ l R , ( x ) l ~ 0 as n ~ oo since b2d¢l-:')c2 '1 = 1 and c, ~ oo. ASSUMPTION 4.
Suppose sup I£'(x)[
f f denotes the partial derivative o f f with respect to xj, the j-th coordinate of x. Suppose also that )2 (be/b~) = O(n) as n ~ oo. i=1
THEOREM 2.1. (i) Suppose Assumptions 1-3 are satisfied and b_n ↓ 0 as n → ∞ slowly enough that

(2.17) n b_n^{d(3-2γ)} → ∞

for some γ < 1 - v^{-1}. Assume also

(2.18) n^{(v-γv-1)/2} b_n^{d[(3-2γ)v(1-γ)-1]/2} → ∞

and

(2.19) n^{(v-2θ-1)/2} b_n^{d(3v-2γv+1)/2} → ∞.

Then (n b_n^d)^{1/2}(f_n(x) - E f_n(x))/σ has asymptotically a standard normal distribution as n → ∞, where

σ² = θ_d f(x) ∫ K²(u) du.

(ii) If, in addition, Assumption 4 is satisfied and n b_n^{d+2} → 0, then (n b_n^d)^{1/2}(f_n(x) - f(x))/σ has asymptotically a standard normal distribution as n → ∞.
PROOF OF THEOREM 2.1(i). Define Y_j = b_n^{d/2} A_j(x). Then

(2.20) f_n(x) - E f_n(x) = (1/n) Σ_{j=1}^n A_j(x) = (n b_n^{d/2})^{-1} S_n, where S_n = Σ_{j=1}^n Y_j.

Let ω(n) be a positive function increasing to infinity sufficiently slowly that

(2.21) (n b_n^{d(3-2γ)})^{1/2} (ω(n))^{-2} → ∞,

(2.22) n^{(-v+γv+1)/2} b_n^{d[(3-2γ)v(-1+γ)+1]/2} (ω(n))^{2[(1-γ)v-1]} → 0,
n^{(2θ+1-v)/2} b_n^{-d(3v-2γv+1)/2} (ω(n))^{1+2v} → 0.

Choose q = q_n = [(n b_n^{d(3-2γ)})^{1/2}(ω(n))^{-2}] and p = p_n = [(n b_n^d)^{1/2}/ω(n)], where [a] denotes the integer part of a. Then

(2.23) q/p = b_n^{d(1-γ)}(ω(n))^{-1} = o(1).
Assume for now that n/(p + q) = r, where r is a positive integer. If n is not a multiple of p + q, then the proof can be altered, but the result remains valid, as will be pointed out later. We now group the r.v.'s Y_j in alternate blocks of size p and q. Let

(2.24) U(n, x, j) = Σ_{i=(j-1)(p+q)+1}^{(j-1)(p+q)+p} Y_i(x),

V(n, x, j) = Σ_{i=(j-1)(p+q)+p+1}^{j(p+q)} Y_i(x),

where j = 1, ..., r. Let

(2.25) S_n' = Σ_{j=1}^r U(n, x, j) and S_n'' = Σ_{j=1}^r V(n, x, j)
be the sums of the r.v.'s Y_j over the large blocks and the small blocks, respectively. Let ε > 0 and let σ²(S_n) denote the variance of S_n. It is easy to see that

(2.26) P[S_n/σ(S_n) ≤ x] ≤ P[|S_n''|/σ(S_n) ≤ ε, S_n/σ(S_n) ≤ x] + P[|S_n''|/σ(S_n) > ε]
≤ P[(S_n - S_n'')/σ(S_n) ≤ ε + x] + P[|S_n''|/σ(S_n) > ε]
= P[S_n'/σ(S_n) ≤ ε + x] + P[|S_n''|/σ(S_n) > ε].
By (2.3) of Lemma 2.3,

(2.27) P[S_n'/σ(S_n) ≤ ε + x] ≤ P[Σ_{i=1}^r Z_{ni} ≤ ε + x] + 2rβ(n, q, n),

where Z_{ni} (1 ≤ i ≤ r) are independent r.v.'s such that for each i, Z_{ni} has the same distribution as that of U(n, x, i)/σ(S_n). By (2.26) and (2.27),

(2.28) P[S_n/σ(S_n) ≤ x] ≤ P[Σ_{i=1}^r Z_{ni} ≤ x + ε] + P[|S_n''|/σ(S_n) ≥ ε] + 2rβ(n, q, n).

Similarly, using (2.4), it can be shown that

(2.29) P[S_n/σ(S_n) ≤ x] ≥ P[Σ_{i=1}^r Z_{ni} ≤ x - ε] - P[|S_n''|/σ(S_n) ≥ ε] - 2rβ(n, q, n).

By a simple computation, we have

(2.30) rβ(n, q, n) ≤ C n(p + q)^{-1} n^θ q^{-v}
≤ C n(p + q)^{-1} n^θ {(n b_n^{d(3-2γ)})^{1/2}(ω(n))^{-2}}^{-v}
≤ C n^{1+θ-(v/2)} (p + q)^{-1} (ω(n))^{2v} b_n^{-dv(3-2γ)/2}
≤ C n^{1+θ-(v/2)} (n b_n^d)^{-1/2} (ω(n))^{1+2v} b_n^{-dv(3-2γ)/2}
≤ C n^{(2θ+1-v)/2} (ω(n))^{1+2v} b_n^{-d(3v-2γv+1)/2} = o(1),
by (2.22). Using (2.25),

(2.31) (1/n) E|S_n''|² = (1/n) Σ_{i=1}^r E[V(n, x, i)]² + (2/n) Σ_{i=1}^r Σ_{j=1, j<i}^r cov{V(n, x, i), V(n, x, j)}.

Employing (2.24),

(2.32) E[V(n, x, j)]² = Σ_{i=1}^q E[Y_{(j-1)(p+q)+p+i}]² + Σ_{i=1}^q Σ_{l=1, l≠i}^q cov{Y_{(j-1)(p+q)+p+i}, Y_{(j-1)(p+q)+p+l}}.

By a simple computation using Lemma 2.4,

(2.33) E[Y_{(j-1)(p+q)+p+i}]² ≤ M_1(x),
for some constant M_1(x) independent of i. Note that Σ_{l=1}^∞ [φ(l)]^{1-γ} < ∞ since γ < 1 - v^{-1}. By Lemma 2.1,

(2.34) cov{Y_{(j-1)(p+q)+p+k}, Y_{(j-1)(p+q)+p+l}} ≤ C[φ(|k - l|)]^{1-γ} ||Y_{(j-1)(p+q)+p+k}||_{2/γ} ||Y_{(j-1)(p+q)+p+l}||_{2/γ}.

Again, by Lemma 2.4,

(2.35) max_{1≤i≤n} ||Y_i||_{2/γ} ≤ [M_2(x) b_n^{d(1-(1/γ))}]^{γ/2},

for some constant M_2(x) independent of i. Therefore

(2.36) (1/n) Σ_{i=1}^r E[V(n, x, i)]² ≤ (1/n) Σ_{i=1}^r Σ_{k=1}^q M_1(x) + C(1/n) [M_2(x) b_n^{d(1-(1/γ))}]^γ Σ_{i=1}^r Σ_{k=1}^q Σ_{l=1, l≠k}^q [φ(|k - l|)]^{1-γ}
≤ M_1(x) n^{-1} rq + C M_2(x) n^{-1} b_n^{d(γ-1)} rq,

since

Σ_{l=1, l≠k}^q [φ(|k - l|)]^{1-γ} ≤ 2 Σ_{l=1}^∞ [φ(l)]^{1-γ} < ∞.
Clearly,

(2.37) 2 Σ_{i=1}^r Σ_{j=1, j<i}^r cov{V(n, x, i), V(n, x, j)} = 2 Σ Σ_{i>j} Σ_{k=(j-1)(p+q)+p+1}^{j(p+q)} Σ_{l=(i-1)(p+q)+p+1}^{i(p+q)} cov{Y_k, Y_l}.
For i > j, the indices k, l differ by at least p, so by (2.35) and Lemma 2.1,

(2.38) 2 Σ Σ_{i>j} cov{V(n, x, i), V(n, x, j)} ≤ 4 Σ_{k=1}^{n-p} Σ_{l=k+p}^n |cov{Y_k, Y_l}|
≤ C Σ_{k=1}^{n-p} Σ_{l=k+p}^n [φ(l - k)]^{1-γ} ||Y_k||_{2/γ} ||Y_l||_{2/γ}
≤ C M_2(x) b_n^{d(γ-1)} n Σ_{k=p}^∞ [φ(k)]^{1-γ}.
By (2.31), (2.36) and (2.38),

(2.39) (1/n) E|S_n''|² ≤ C n^{-1} rq + C n^{-1} b_n^{d(γ-1)} rq + C b_n^{d(γ-1)} Σ_{k=p}^∞ [φ(k)]^{1-γ}.

Since q/p → 0 as n → ∞,

(2.40) rq/n = nq/((p + q)n) = q/(p + q) = o(1).

By (2.23),

(2.41) n^{-1} b_n^{d(γ-1)} rq = n^{-1} b_n^{d(γ-1)} n(p + q)^{-1} q = b_n^{d(γ-1)}(p + q)^{-1} q ≤ b_n^{d(γ-1)} p^{-1} q = 1/ω(n) → 0
as n → ∞. Since q ≤ p,

(2.42) C b_n^{d(γ-1)} Σ_{k=p}^∞ [φ(k)]^{1-γ} ≤ C b_n^{d(γ-1)} q^{-(1-γ)v+1}
= C b_n^{d(γ-1)} [(n b_n^{d(3-2γ)})^{1/2}(ω(n))^{-2}]^{-(1-γ)v+1}
≤ C n^{(-v+γv+1)/2} b_n^{d[(3-2γ)v(-1+γ)+1]/2} (ω(n))^{2[(1-γ)v-1]} = o(1),

by (2.22).
Finally, from (2.39)-(2.42),

(2.43) (1/n) E|S_n''|² → 0,

which entails

(2.44) P[|S_n''| > ε n^{1/2}] = o(1).

Lemma 2.5 implies

(2.45) lim_{n→∞} (σ²(S_n)/n) = lim_{n→∞} n b_n^d var(f_n(x)) = σ².

It follows from (2.44) and (2.45) that

(2.46) P[|S_n''| > ε σ(S_n)] = o(1).

Since S_n = S_n' + S_n'', by (2.43) and (2.45),

(2.47) (1/n) E|S_n'|² → σ².
Note that

(2.48) (1/n) E|S_n'|² = (1/n) Σ_{j=1}^r E U²(n, x, j) + (2/n) Σ Σ_{i>j} cov{U(n, x, i), U(n, x, j)}.

Similar to the proof of (2.41),

(2.49) (2/n) Σ Σ_{i>j} cov{U(n, x, i), U(n, x, j)} ≤ C b_n^{d(γ-1)} Σ_{k=q}^∞ [φ(k)]^{1-γ} = o(1),

by (2.42). Employing (2.47), (2.48) and (2.49),
(2.50) (1/n) Σ_{j=1}^r E U²(n, x, j) → σ²,

which together with (2.45) gives

(2.51) (1/σ²(S_n)) Σ_{i=1}^r E U²(n, x, i) → 1.
Note that

(2.52) |Y_i| = b_n^{d/2} |A_i(x)| ≤ C(b_n^{d/2} b_i^{-d} + b_n^{d/2}),

since ∫ K_j(x - u) f(u) du → f(x) as j → ∞ and |K| is bounded by Remark 2.1. Hence max_{1≤i≤n} |Y_i| ≤ C b_n^{-d/2} and

(2.53) max_{1≤j≤r} |U(n, x, j)| ≤ C p b_n^{-d/2},
which entails

(2.54) (nσ²)^{-1} Σ_{i=1}^r E[U²(n, x, i) I{|U(n, x, i)| > η n^{1/2} σ}]
≤ (nσ²)^{-1} p² b_n^{-d} r max_{1≤i≤r} P[|U(n, x, i)| > η n^{1/2} σ] → 0,

since p b_n^{-d/2}(η n^{1/2} σ)^{-1} ≤ C p(n b_n^d)^{-1/2} → 0 by the choice of p. Here I{A} denotes the indicator of the set A. By (2.45) and (2.54),

(2.55) (1/σ²(S_n)) Σ_{i=1}^r E[U²(n, x, i) I{|U(n, x, i)| ≥ η σ(S_n)}] → 0,

or equivalently

(2.56) Σ_{i=1}^r E[Z_{ni}² I{|Z_{ni}| > η}] → 0,

where the Z_{ni}'s are defined in (2.27). Let
(2.57) s_{nr}² = var(Z_{n1} + ⋯ + Z_{nr}).

By (2.45) and (2.51),

(2.58) lim s_{nr} = 1 as n → ∞.

Employing (2.56) and (2.58),

(2.59) Σ_{i=1}^r E[(Z_{ni}/s_{nr})² I{|Z_{ni}| > η s_{nr}}] → 0.

Then by the Lindeberg-Feller theorem,

(2.60) P[Σ_{i=1}^r (Z_{ni}/s_{nr}) ≤ x + ε] → Φ(x + ε).

Since ε can be chosen arbitrarily small, by (2.28), (2.29), (2.30), (2.44) and (2.60),

(2.61) P[S_n/σ(S_n) ≤ x] → Φ(x).

The proof of (i) is completed by (2.20) and Lemma 2.5. Finally, suppose n is not an integer multiple of p + q. Let r = [n/(p + q)]. Define

S_n''' = Y_{r(p+q)+1} + ⋯ + Y_n.

It is not hard to show that P[|S_n'''|/σ(S_n) > ε] → 0. The proof of the theorem can be completed in the same manner as in the case n = (p + q)r, since S_n = S_n' + S_n'' + S_n'''.
PROOF OF THEOREM 2.1(ii). We have

(2.62) E f_n(x) = n^{-1} Σ_{i=1}^n b_i^{-d} ∫ K((x - y)/b_i) f(y) dy
= n^{-1} Σ_{i=1}^n ∫ K(z) f(x - b_i z) dz
= n^{-1} Σ_{i=1}^n ∫ K(z)[f(x - b_i z) - f(x) + f(x)] dz
= f(x) + n^{-1} Σ_{i=1}^n ∫ K(z)[f(x - b_i z) - f(x)] dz.

By Taylor's theorem,
(2.63) |E f_n(x) - f(x)| ≤ n^{-1} Σ_{i=1}^n ∫ |K(z)| |f(x - b_i z) - f(x)| dz
≤ C sup_{1≤j≤d} sup_x |f_j'(x)| n^{-1} Σ_{i=1}^n b_i ∫ ||z|| |K(z)| dz
≤ C n^{-1} Σ_{i=1}^n b_i = C b_n n^{-1} Σ_{i=1}^n (b_i/b_n) ≤ C b_n,

by Assumption 4. Therefore

(n b_n^d)^{1/2} |E f_n(x) - f(x)|/σ ≤ C(n b_n^d)^{1/2} b_n = C n^{1/2} b_n^{(d/2)+1},

which tends to zero since n b_n^{d+2} → 0 by assumption.
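The bound |E f_n(x) - f(x)| ≤ C b_n of (2.63) can be verified in closed form for one concrete pair (my own illustrative choice, not the paper's): with a standard normal kernel and a standard normal f, the i-th term of (2.62) is the N(0, 1 + b_i²) density at x, so E f_n(x) is available exactly.

```python
import math

def mean_fn(x, bands):
    # E f_n(x) = n^{-1} Σ_i ∫ K(z) f(x - b_i z) dz; for standard normal K and f
    # the i-th integral is the N(0, 1 + b_i^2) density at x (exact, no simulation).
    return sum(
        math.exp(-x * x / (2 * (1 + b * b))) / math.sqrt(2 * math.pi * (1 + b * b))
        for b in bands
    ) / len(bands)

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

x = 2.0
for n in (10, 100, 1000):
    bands = [i ** (-1 / 3) for i in range(1, n + 1)]
    bias = abs(mean_fn(x, bands) - f(x))
    print(n, bias, bias / bands[-1])  # the ratio bias/b_n stays bounded
```

Here b_i = i^{-1/3} is a placeholder bandwidth sequence; the printed ratio bias/b_n remaining bounded is what (2.63) asserts.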
Example 2.1. Let X_t be a stationary autoregressive process of order 1, that is, X_t = aX_{t-1} + e_t, where |a| < 1. Assume the e_t's are i.i.d. and each e_t has a Cauchy density, which is symmetric about zero. Then X_t satisfies the ARLT condition with θ = 0 and φ(n) = O(e^{-sn}) for some s > 0. Hence φ(n) = O(n^{-v}) for all v > 0. It is not hard to show that Assumption 3 is satisfied. Consider the case where b_n = n^{-ρ} for some ρ > 0. Then (2.17), (2.18) and (2.19) are satisfied if, for some arbitrarily large v and some γ,

(2.64) ρ < 1/(d(3 - 2γ)),

(2.65) ρ < (v - γv - 1)/(d{(3 - 2γ)v(1 - γ) - 1})

and

(2.66) ρ < (v - 1)/(d(3v - 2γv + 1)).

By letting v → ∞ and γ → 1, it is seen that (2.64)-(2.66) are satisfied if ρ < 1/d. The condition n b_n^{d+2} → 0 is satisfied if ρ > 1/(d + 2).
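Example 2.1 can be simulated (an illustrative sketch, not from the paper; the seed, sample size and exponent ρ = 0.4 are my choices): with standard Cauchy innovations and |a| < 1, the stationary law of X_t is Cauchy with scale 1/(1 - |a|), because a geometric sum of independent Cauchy variables is again Cauchy, so the estimate can be compared with the exact density.

```python
import numpy as np

rng = np.random.default_rng(1)
a, n = 0.5, 50_000
e = rng.standard_cauchy(n)           # i.i.d. standard Cauchy innovations
X = np.empty(n)
X[0] = e[0]
for t in range(1, n):                # AR(1): X_t = a X_{t-1} + e_t
    X[t] = a * X[t - 1] + e[t]

rho = 0.4                            # 1/(d+2) < rho < 1/d with d = 1
b = np.arange(1, n + 1) ** (-rho)    # b_j = j^{-rho}
K = lambda u: np.exp(-u * u / 2) / np.sqrt(2 * np.pi)

x0 = 0.0
fn = float(np.mean(K((x0 - X) / b) / b))      # f_n(x0), non-recursive form (1.1)
scale = 1 / (1 - abs(a))                      # stationary law: Cauchy(0, 2)
truth = scale / (np.pi * (scale**2 + x0**2))
print(fn, truth)
```

At this sample size the two printed numbers should agree to roughly two decimal places.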
3. Uniform convergence of f_n
LEMMA 3.1. Assume the conditions of Lemma 2.5 hold and in addition f is continuous on R^d. Let D be a compact subset of R^d. Let I_n(x) be defined as in the proof of Lemma 2.5 and let

R_n*(x) = n^{-2} Σ_{i=1}^n Σ_{j=1, j≠i}^n |cov{K_i(x - X_i), K_j(x - X_j)}|.

Then for some constant C > 0,

n b_n^d I_n(x) ≤ C and n b_n^d R_n*(x) ≤ C for all x ∈ D.
Lemma 3.1 can be obtained by the same argument as that of Lemma 2.5 by noting that f is uniformly continuous on any compact subset of R^d.

LEMMA 3.2. Assume n = 2pq for some positive integers p, q. Let {(η_{j1}, ..., η_{jN}), 1 ≤ j ≤ q} be a family of N-dimensional random vectors such that for each j (1 ≤ j ≤ q), η_j = (η_{j1}, ..., η_{jN}) is measurable with respect to the σ-field generated by X(k), (2j - 1)p + 1 ≤ k ≤ 2jp. Let ε > 0. Then

P[max_{1≤i≤N} |Σ_{j=1}^q η_{ji}| > ε] ≤ P[max_{1≤i≤N} |Σ_{j=1}^q Z_{ji}| > ε] + 2qβ(n, p, n),

where {(Z_{j1}, ..., Z_{jN}), 1 ≤ j ≤ q} are independent random vectors such that for each j (1 ≤ j ≤ q), Z_j = (Z_{j1}, ..., Z_{jN}) has the same distribution function as that of the random vector η_j.

Lemma 3.2 is essentially the same as Lemma 3.1 of Yoshihara (1984). The proof can be easily obtained from Lemma 2.2.

LEMMA 3.3. Suppose Assumptions 1 and 2 hold and in addition f is continuous on R^d. Suppose b_n → 0 slowly enough that

(3.1) n b_n^d (log n)^{-1} → ∞.
Let Ψ(n) = (log n)^{1/2}(n b_n^d)^{-1/2} and λ_n = (n b_n^d log n)^{1/2}. Then for any compact subset D of R^d, sup_{x∈D} |f_n(x) - E f_n(x)| = O(Ψ(n)) a.s. as n → ∞ if b_n tends to zero in a manner that

(3.2) (λ_n g(n)/b_n^d) n^θ φ([n b_n^d/(λ_n g(n))]) h(n) → 0

for some function g(n) increasing to infinity arbitrarily slowly, and some function h(n) > 0 with Σ_{n=1}^∞ 1/h(n) < ∞.
PROOF. Let

(3.3) l_n = n^{-(3/2)-(1/d)} (log n)^{(3/2)+(1/d)} b_n^{-d/2}.

Since D is compact, it can be covered by, say, ν cubes I_k with side length l_n, with sides parallel to the coordinate axes and with centers at x_k. Now

(3.4) sup_{x∈D} |f_n(x) - E f_n(x)| ≤ max_{1≤k≤ν} sup_{x∈I_k} |f_n(x) - f_n(x_k)|
+ max_{1≤k≤ν} |f_n(x_k) - E f_n(x_k)|
+ max_{1≤k≤ν} sup_{x∈I_k} |E f_n(x_k) - E f_n(x)|.
F o r x e Ik, using the Lipschitz condition of K and noting that b / > C ( l o g j ) / j by (3.1), n
(3.5)
I/,(x) - f , ( x k ) l --< (C/n) jZ bfa(llx - yll/b3 ~ /i
< Clan - Ij~=2.:(j/logj)l+I~/a~ <_ Clan I+la/a)(log n)-~-~/a) :
o(~(n))
a.s. as n -- ~o. Therefore m a x sup [ j ~ ( x ) - j ~ ( x k ) [ = O(~(n)) (3.6)
a.s.
as
n -- oo,
a.s.
as
n ~ ~.
I-
m a x sup [Ef/1(xk) - EJ~(x)l = O(~(n))
I~k<-°
,:¢:Ik
It remains to s h o w that (3.7)
max I J ~ ( x k ) - E~(x~)[ = O ( ~ ( n ) ) .
I<_k~o
Assume n = 2pq for some increasing integer valued functions p = p(n), q = q(n). Then the r.v.'s A_j defined in (2.1) can be grouped successively into 2q blocks of size p. Let S(n, x) = f_n(x) - E f_n(x) = (1/n) Σ_{i=1}^n A_i(x). Write S(n, x) as

S(n, x) = S(n, x, 1) + S(n, x, 2),

where

S(n, x, 1) = Σ_{j=1}^q V(n, x, 2j), S(n, x, 2) = Σ_{j=1}^q V(n, x, 2j - 1),

with

V(n, x, j) = (1/n) Σ_{i=(j-1)p+1}^{jp} A_i(x)  (j = 1, ..., 2q).

Note that S(n, x, 1) and S(n, x, 2) are, respectively, the sums over the even-numbered and odd-numbered groups. If it is not the case that n = 2pq, then the last blocks of S(n, x, 1) and S(n, x, 2) can be shorter than p, but this does not affect the proofs of the results, as will be seen. Let ε_n = ηΨ(n), where η is a large number to be specified later. Observe that
(3.8) P[max_{1≤k≤ν} |f_n(x_k) - E f_n(x_k)| > ε_n] = P[max_{1≤k≤ν} |S(n, x_k, 1) + S(n, x_k, 2)| > ε_n]
≤ P[max_{1≤k≤ν} |S(n, x_k, 1)| > ε_n/2] + P[max_{1≤k≤ν} |S(n, x_k, 2)| > ε_n/2].

Since |K| is bounded by Remark 2.1, we have

(3.9) |V(n, x, j)| ≤ (1/n) Σ_{i=(j-1)p+1}^{jp} b_i^{-d} |K((x - X_i)/b_i) - E K((x - X_i)/b_i)| ≤ C n^{-1} Σ_{i=(j-1)p+1}^{jp} b_i^{-d}.

By (3.1) and (3.2), there exists a function g*(n) increasing to infinity such that

n b_n^d (log n)^{-1}/g*(n) → ∞,

and in addition (3.2) is satisfied with g(n) = g*(n). Choose

(3.10) p = [n b_n^d/(λ_n g*(n))] ≥ C(n b_n^d)^{1/2}(log n)^{-1/2}/g*(n).

Then p → ∞. By Assumption 2 and (3.9),

(3.11) λ_n |V(n, x, j)| ≤ C λ_n n^{-1} b_n^{-d} Σ_{i=(j-1)p+1}^{jp} (b_n/b_i)^d ≤ C λ_n p n^{-1} b_n^{-d} ≤ C/g*(n),
which tends to zero. We now approximate the ARLT r.v.'s by independent ones. By Lemma 3.2,

(3.12) P[max_{1≤k≤ν} |S(n, x_k, 1)| > ε_n/2] ≤ P[max_{1≤k≤ν} |Σ_{j=1}^q V_j*(x_k)| > ε_n/2] + 4qβ(n, p, n),

where {(V_j*(x_1), ..., V_j*(x_ν)), j = 1, ..., q} is a family of independent ν-dimensional random vectors such that for each j, the random vector (V_j*(x_1), ..., V_j*(x_ν)) has the same distribution as that of (V(n, x_1, 2j), ..., V(n, x_ν, 2j)). Note that |λ_n V_j*(x)| ≤ 1/2 for large n. Hence there exists an N such that

(3.13) exp(λ_n V_j*(x)) ≤ 1 + λ_n V_j*(x) + (V_j*(x))² λ_n²,
exp(-λ_n V_j*(x)) ≤ 1 - λ_n V_j*(x) + (V_j*(x))² λ_n²
for n > N. Using the independence of the V_j*(x)'s and applying Markov's inequality separately to the sums Σ_{j=1}^q V_j*(x) and Σ_{j=1}^q (-V_j*(x)), we obtain

(3.14) P[|Σ_{j=1}^q V_j*(x)| > ε_n] ≤ P[Σ_{j=1}^q V_j*(x) > ε_n] + P[Σ_{j=1}^q (-V_j*(x)) > ε_n]
≤ e^{-λ_n ε_n} (Π_{j=1}^q E[exp(λ_n V_j*(x))] + Π_{j=1}^q E[exp(-λ_n V_j*(x))])
≤ 2 exp(-λ_n ε_n + λ_n² Σ_{j=1}^q E(V_j*(x))²).

Clearly,

(3.15) Σ_{j=1}^q E(V_j*(x))² ≤ n^{-2} [Σ_{k=1}^n E A_k² + 2 Σ Σ_{k<l} |E A_k A_l|] = I_n + R_n*.

From (3.15) and Lemma 3.1,

(3.16) Σ_{j=1}^q E(V_j*(x))² ≤ C/(n b_n^d).

By (3.12), (3.14) and (3.16),
(3.17) P[max_{1≤k≤ν} |S(n, x_k, 1)| > ε_n/2] ≤ C ν exp[-λ_n ε_n/2 + C λ_n²/(n b_n^d)] + 4qβ(n, p, n).

Similarly, P[max_{1≤k≤ν} |S(n, x_k, 2)| > ε_n/2] is bounded by the right-hand side of (3.17). Now λ_n ε_n = η log n and λ_n²/(n b_n^d) = log n. From (3.8) and (3.17),

(3.18) P[max_{1≤k≤ν} |f_n(x_k) - E f_n(x_k)| > ε_n] ≤ C ν exp[-λ_n ε_n/2 + C λ_n²/(n b_n^d)] + 4qβ(n, p, n)
= C ν exp[-(η/2) log n + C log n] + 4qβ(n, p, n)
= C ν n^{-(η/2)+C} + 4qβ(n, p, n).

From (3.3),

(3.19) ν ≤ C n^{1+(3d/2)} (log n)^{-(3d/2)-1} b_n^{d²/2}.

By (3.1),

(3.20) b_n ≥ C(n^{-1} log n)^{1/d}.

Using (3.19) and (3.20), it is easy to show that for sufficiently large η,

(3.21) Σ_{n=1}^∞ C ν n^{-(η/2)+C} < ∞.

Note that (3.2) implies that Σ_{n=1}^∞ qβ(n, p, n) is finite. The proof of the lemma follows by the Borel-Cantelli lemma from (3.2), (3.18) and (3.21).

THEOREM 3.1. (i) Assume that φ(n) = O(n^{-v}) for some v > 3. Suppose Assumptions 1 and 2 hold and b_n ↓ 0 in a manner that

(3.22) n^{((v-3)/2)-θ} (log n)^{-(3+v)/2} (b_n)^{(1+v)d/2} (log log n)^{-(1+ε)} → ∞

for some ε > 0. Then sup_{x∈D} |f_n(x) - E f_n(x)| = O(Ψ(n)) a.s. for any compact subset D of R^d.

(ii) If, in addition, Assumption 4 holds and n(log n)^{-1} b_n^{d+2} = O(1), then

sup_{x∈D} |f_n(x) - f(x)| = O(Ψ(n)) a.s.
Remark 3.1. Since b_n → 0, for (3.22) to be satisfied it is necessary that ((v - 3)/2) - θ > 0, or v > 3 + 2θ.
PROOF. (i) By (3.22), there exists a function g(n) increasing to infinity such that

(3.23) n^{((3-v)/2)+θ} (log n)^{(3+v)/2} (b_n)^{-(1+v)d/2} (log log n)^{1+ε} (g(n))^{1+v} → 0.

By (3.23),

([n b_n^d log n]^{1/2} g(n)/b_n^d) n^θ [n b_n^d/((n b_n^d log n)^{1/2} g(n))]^{-v} n log n (log log n)^{1+ε} → 0,

which implies (3.2) of Lemma 3.3 with h(n) = n log n (log log n)^{1+ε}. Note that (3.23) implies (3.1). The theorem thus follows from Lemma 3.3.

(ii) We have

(3.24) E f_n(x) = n^{-1} Σ_{i=1}^n b_i^{-d} ∫ K((x - y)/b_i) f(y) dy.

By the proof of Theorem 2.1(ii), it follows that sup_x |E f_n(x) - f(x)| ≤ C b_n = O(Ψ(n)) since n(log n)^{-1} b_n^{d+2} = O(1). Part (ii) follows from (i) and the fact that

sup_{x∈D} |f_n(x) - f(x)| ≤ sup_{x∈D} |f_n(x) - E f_n(x)| + sup_{x∈D} |E f_n(x) - f(x)|.
Acknowledgements

The author would like to thank the referees for many valuable comments.
REFERENCES

Devroye, L. and Gyorfi, L. (1985). Nonparametric Density Estimation: The L1 View, Wiley, New York.
Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition, Academic Press, New York.
Fukunaga, K. and Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Trans. Inform. Theory, 21, 32-40.
Gyorfi, L. (1981). Strong consistent density estimate from ergodic sample, J. Multivariate Anal., 11, 81-84.
Gyorfi, L. and Masry, E. (1988). The L1 and L2 strong consistency of recursive kernel density estimation from dependent samples, Report, TU Budapest.
Gyorfi, L., Hardle, W., Sarda, P. and Vieu, P. (1988). Nonparametric curve estimation from time series, Preprint of book, 1988.
Hart, J. D. and Vieu, P. (1988). Data driven bandwidth choice for density estimation based on dependent data, Tech. Report, Department of Statistics, Texas A&M University, College Station, Texas.
Ibragimov, I. A. and Rozanov, Yu. V. (1978). Gaussian Random Processes, Springer, New York.
Ioannides, D. A. and Roussas, G. G. (1987). Note on the uniform convergence of density estimates for mixing random variables, Statist. Probab. Lett., 5, 279-285.
Masry, E. (1986). Recursive probability density estimation for weakly dependent stationary processes, IEEE Trans. Inform. Theory, 32, 254-267.
Masry, E. (1987). Almost sure convergence of recursive density estimators for stationary mixing processes, Statist. Probab. Lett., 5, 249-254.
Masry, E. and Gyorfi, L. (1987). Strong consistency and rates for recursive probability density estimators of stationary processes, J. Multivariate Anal., 22, 79-93.
Pham, D. T. (1986). The mixing property of bilinear and generalized random coefficient autoregressive models, Stochastic Process. Appl., 23, 291-300.
Pham, D. T. and Tran, L. T. (1985). Some mixing properties of time series models, Stochastic Process. Appl., 19, 297-303.
Robinson, P. M. (1983). Nonparametric estimators for time series, J. Time Ser. Anal., 4, 185-207.
Rosenblatt, M. (1970). Density estimates and Markov sequences, Nonparametric Techniques in Statistical Inference (ed. M. Puri), 199-210, Cambridge Univ. Press, London.
Roussas, G. G. (1969). Nonparametric estimation of the transition distribution function of a Markov process, Ann. Math. Statist., 40, 1386-1400.
Roussas, G. G. (1988). Nonparametric estimation in mixing sequences of random variables, J. Statist. Plann. Inference, 18, 135-149.
Silverman, B. W. (1978). Choosing a window width when estimating a density, Biometrika, 65, 1-11.
Stone, C. J. (1983). Optimal uniform rate of convergence for nonparametric estimators of a density function and its derivatives, Recent Advances in Statistics: Papers in Honor of H. Chernoff (eds. M. H. Rezvi, J. S. Rustagi and D. Siegmund), 393-406, Academic Press, New York.
Takahata, H. (1977a). Limiting behavior of density estimates for stationary asymptotically uncorrelated processes, Bull. Tokyo Gakugei Univ. (4), 29, 1-9.
Takahata, H. (1977b). On recursive density estimators for a class of stationary processes, Bull. Tokyo Gakugei Univ. (4), 29, 10-18.
Takahata, H. (1979). An application of the Strassen law of iterated logarithm to density estimations, Bull. Tokyo Gakugei Univ. (4), 31, 45-50.
Takahata, H. (1980). Almost sure convergence of density estimators for weakly dependent stationary processes, Bull. Tokyo Gakugei Univ. (4), 32, 11-32.
Tran, L. T. (1989a). Recursive density estimation under dependence, IEEE Trans. Inform. Theory, 35, 1103-1108.
Tran, L. T. (1989b). The L1 complete convergence of recursive density estimators under weak dependence (manuscript).
Wegman, E. J. and Davies, H. I. (1979). Remarks on some recursive estimators of a probability density function, Ann. Statist., 7, 316-327.
Wolverton, C. T. and Wagner, T. J. (1969a). Asymptotically optimal discriminant functions for pattern classification, IEEE Trans. Inform. Theory, 15, 258-265.
Wolverton, C. T. and Wagner, T. J. (1969b). Recursive estimates of probability densities, IEEE Trans. Syst. Sci. Cybern., 5, 307.
Yakowitz, S. (1987). Nearest-neighbor methods for time series analysis, J. Time Ser. Anal., 8, 235-247.
Yamato, H. (1971). Sequential estimation of a continuous probability density function and mode, Bull. Math. Statist., 14, 1-12.
Yoshihara, K. (1976). Limiting behavior of U-statistics for stationary absolutely regular processes, Z. Wahrsch. Verw. Gebiete, 35, 237-252.
Yoshihara, K. (1978). Probability inequalities for sums of absolutely regular processes and their applications, Z. Wahrsch. Verw. Gebiete, 43, 319-330.
Yoshihara, K. (1984). Density estimation for samples satisfying a certain absolute regularity condition, J. Statist. Plann. Inference, 9, 19-32.