Sankhy¯ a B : The Indian Journal of Statistics DOI 10.1007/s13571-014-0085-8 c 2014, Indian Statistical Institute
Fitting EXPAR Models Through the Extended Kalman Filter Himadri Ghosh, Bishal Gurung and Prajneshu Gupta Indian Agricultural Statistics Research Institute, New Delhi, India Abstract Exponential autoregressive (EXPAR) family of parametric nonlinear timeseries models, which is a discrete-time approximation of continuous-time nonlinear stochastic dynamical system, is considered. A heartening feature of this model is that it is capable of describing those data sets that depict cyclical variations. The estimation procedure for EXPAR models is developed using extended Kalman filter (EKF). Through simulation studies, it is shown that EKF is very efficient for fitting EXPAR models. Formulae for optimal one-step and two-step ahead out-of-sample forecasts are derived analytically by recursive use of conditional expectation. Conditions for the existence of limit cycle behaviour for EXPAR models are also established. Superiority of EKF method vis-a-vis Genetic algorithms (GA) method for fitting EXPAR models is shown through simulation studies. As an illustration, EXPAR models are employed for modelling and forecasting Oil sardine, Mackerel and Bombay duck time-series landings data in India. It is shown that all the three fitted models exhibit the desirable feature of existence of limit cycle behaviour. It is concluded that the EXPAR model performs better than ARIMA methodology for both modelling and forecasting purposes for the data sets under consideration. AMS (2000) subject classification. Primary 62M10; Secondary 62M20. Keywords and phrases. Exponential autoregressive model, extended Kalman filter, forecast performance, landings data, out-of-sample forecast.
1 Introduction Researchers possess a well-established methodology based on linear timeseries models, called the Box-Jenkins methodology (Box et al., 2008), for modelling and forecasting of data collected sequentially in time. It consists of fitting the Autoregressive integrated moving average (ARIMA) models by model selection and estimation of parameters followed by significance tests to check the adequacy of fitted model. Popularity of these models is certainly due to their relative simplicity and also due to the existence of computer software incorporating the same. The ARIMA model, however, is
2
H. Ghosh et al.
insufficient as it is not able to capture many important features present in practical situations. For over three decades, time-series analysis has moved towards the nonlinear domain, which is generally more appropriate for accurately describing dynamics of a time-series and for making better multi-step ahead forecasts (Fan and Yao, 2003). Several important parametric and nonparametric nonlinear time-series models have been developed. In respect of the former, four important models are: Bilinear model, Generalized autoregressive conditional heteroscedastic (GARCH) model, Mixture model and Self-exciting threshold autoregressive (SETAR) model, whereas in respect of the latter approach, two important families of models are: Functional- coefficient autoregressive model (FCAR) and Wavelet analysis. Bilinear model is used when there are outliers at random epochs, GARCH model is used for volatile data, Mixture model is used when there are sudden bursts along with flat stretches, and SETAR model is used for cyclical data. Under the nonparametric formulation, FCAR model is appropriate for describing cyclical data, whereas Wavelet analysis may be adopted for nonstationary long memory data. For modelling and forecasting of cyclical data, another important family of parametric nonlinear time-series model is the Exponential autoregressive (EXPAR) family, which is described in Section 2. Several methods for estimation of parameters of this model along with their limitations are also discussed in that section. A very promising method of parameter estimation of time-series data is the state space modelling technique (Durbin and Koopman, 2000). It is used effectively in situations where observations are composed of unobserved state variables (Harvey, 2001). State space modelling is mainly utilized for prediction, smoothing, filtering and likelihood evaluation for estimation of parameters through Kalman filter (KF). This is a recursive procedure for computing optimal estimator of state vector at time t, based on information available at that time. However, as KF is applicable only when the model is linear, it is not an efficient procedure for fitting EXPAR models, which can be expressed in a nonlinear state space model using local linearization approach. To this end, extended Kalman filtering (EKF) technique, which is a modification of usual KF, may be employed for fitting nonlinear EXPAR models. This aspect along with conditions for existence of limit cycle behaviour for EXPAR models in linear state space representation are discussed in Section 3. Formulae for carrying out out-of-sample forecasting are derived analytically by recursive use of conditional expectation in Section 4. In Section 5, superiority of EKF method vis-a-vis another state-of-the-art method, viz. Genetic algorithms (GA) method for fitting
Fitting expar models through the extended kalman filter 3 EXPAR model is shown through simulation studies. Section 6 illustrates fitting EXPAR models to Oil sardine, Mackerel and Bombay duck landings data in India. Finally, in Section 7, some problems for future research work are identified. 2 Description of EXPAR Models The EXPAR model of order p, i.e. EXPAR(p) model is given by Xt+1 = {ϕ1 + π1 exp(−γXt2 )}Xt +
··· + {ϕp + πp
exp(−γXt2 )}Xt−p+1 + ξt+1 , (2.1) where γ > 0 is a scaling constant, and {ξt } is white noise process with mean zero and variance σξ2 . If the parameters (π1 , . . . , πp ) are all zeros, (2.1) reduces to AR(p) model. Also, note that (2.1) may be thought of as a threshold autoregressive model, in the sense that, if |Xt | is large, it is similar to an autoregressive model with parameters approximately equal to (ϕ1 , . . . , ϕp ), while if |Xt | is small, autoregressive parameters switch to (ϕ1 +π1 , . . . , ϕp +πp ). Ghazal and Elhassanein (2009) discussed the dynamics of EXPAR(p) models, conditions of stability and existence of equilibria. It may be pointed out that marginal distribution time-series generated by EXPAR(1) model, i.e. Xt+1 = {ϕ1 + π1 exp(−γXt2 )}Xt + ξt+1 ,
(2.2)
is capable of exhibiting various types of non-Gaussian behaviours, like long heavy tailed, short light tails with concentration in centre, and bimodality only when π1 = 0. Allal and Melhaoui (2006) studied the problem of testing ordinary AR(1) dependence against alternative of EXPAR(1) model. Equation (2.2) exhibits a regime-switching behaviour with respect to delayed observation, in the sense that if |Xt | is large, it is similar to AR(1) model with parameter ϕ1 , but if |Xt | is small, the autoregressive parameter switches to ϕ1 + π1 . Although EXPAR(1) model is capable of exhibiting cyclical pattern, it has the limitation that it cannot describe the desirable feature of existence of a limit cycle behaviour. For p > 1, the EXPAR(p) models become appropriate for fitting non-Gaussian oscillatory time-series whose spectrum has peaks and are capable of exhibiting the desirable feature of existence of limit cycle behaviour. The maximum likelihood estimator of parameter γ is a very time consuming nonlinear optimization procedure. Moreover, objective function for the nonlinear coefficient γ is not convex and multiple local optima may exist. Therefore, there is no guarantee that this method will converge to the global optimum (Messaoud et al., 2006). Baragona et al. (2011) have given
4
H. Ghosh et al.
a good description of employing genetic algorithm for estimating the parameters of EXPAR(p) models. However, genetic algorithm searches the solution through evolution operators, such as selection, crossover and mutation. Evolution is inductive. In reality, life does not always evolve towards a good solution. Ghosh et al. (2011) used grid-search technique for fitting the EXPAR(p) models and applied them to some fisheries data. However, a limitation of this approach is that it is heuristic in nature. 3
Development of Estimation Procedure using Extended Kalman Filter In analogy with Ozaki (1993), estimation of parameters of (2.2) can be carried out through estimating parameters of discrete p-dimensional nonlinear state space model of stochastic dynamical system (SDS) in continuous time, given by the following transition and measurement equations: αt+1 = Φt (αt |θ)αt + η t
(3.1)
Xt = Ht αt ,
(3.2)
where ⎡ ⎢ ⎢ ⎢ ⎢ Φt = ⎢ ⎢ ⎢ ⎢ ⎣ Ht = H = (0
0 0
...
⎤ f1 (xt ) 1 0 0 . . . 0 ⎡ ζp,t f2 (xt ) 1 1 0 . . . 0 ⎥ ⎥ ⎢ ⎢ ζp−1,t ˙ 0 1 1 ... 0 ⎥ ⎥ . ⎥ .. ⎥ , αt = ⎢ ⎢ .. ˙ 0 0 1 ... . ⎥ ⎢ ⎣ ζ2,t ⎥ .. .. .. ˙ . . . ... 1 ⎦ xt fp (xt ) 0 0 0 . . . 1
⎤ ⎥ ⎥ ⎥ ⎥, ⎥ ⎦
1), fi (xt ) = ϕi + πi exp(−γx2t ), i = 1, 2, . . . , p,
and ς i,t stands for the (i − 1)th order differential coefficient of xt . Here η t = (ηt , 0, . . . , 0) , where ηt is a Gaussian white noise process with mean zero and variance ση2 . As the matrix Φt is a nonlinear function of αt and θ, EKF , which therefore there is a need to employ the EKF approach to obtain θ is the maximum likelihood estimator of θ = (ϕi , πi , γ) , i = 1, 2, .., p, based on prediction error decomposition. The EKF methodology is to apply the standard KF to nonlinear systems with additive white noise by continually updating a linearization around the current state estimate, starting with an initial guess. In respect of EXPAR(p) modelling in continuous time through
Fitting expar models through the extended kalman filter 5 EKF, the gradient matrix Ft (Anderson and Moore, 2005), after a good deal of algebra, is obtained as ⎡
Ft =
⎢ ⎢ ∂Φt αt t|t = ⎢ |αt = α ⎢ ∂αt ⎣
Ψ1 1 0 .. . 0
Ψ2 1 1 .. . 0
Ψ3 0 1 .. . 0
Ψ4 0 0 .. . 0
··· ··· ··· .. . ···
Ψp−1 0 0 .. . 1
Ψp 0 0 .. . 1
⎤
⎥ ⎥ ⎥, ⎥ ⎦
where 2 p,t|t )for i = 1, 2, . . . , p − 1, Ψi = ϕi + πi exp(−γ α p−1 2 p,t|t exp(−γ α πi α i,t|t {−2γ α p,t|t )} + Ψp = i=1
2 2 2 p,t|t )} + {−2γ α p,t|t πp exp(−γ α p,t|t )}, {ϕp + πp exp(−γ α
t|t of αt . Also Ht = α i,t|t is the i th coordinate of measurement update α 0|−1 and time (0 0 0 . . . 1) . After linearization and initializing α update prediction error matrix Σ0|−1 , the EKF equations are obtained. Since the transition (3.1) is nonlinear with dispersion matrix of measure t|t−1 }, t = 1, 2, · · · ment error as zero, therefore error sequences {Xt − Ht α with conditional variance-covariance matrix Ωt = Ht Σt|t−1 Ht are considered to be normally distributed for pseudo-likelihood estimation. Using prediction error and conditional variance, the likelihood function can be written
as
N t|t−1 }2 {Xt − Ht α logΩt + , L(X, θ) = − t=1 Ωt t|t−1 is time update of αt (Anderson where Xt ’s are scalar quantities, and α and Moore, 2005). Finally, it is maximized for various orders of models given by (3.1) and (3.2). Cao (2008) developed a code for EKF, which is available on MATLAB CENTRAL. However, this code is useful only when parameters are known. Therefore, it is not applicable in our case as we have to estimate the parameters of EXPAR models through EKF by maximizing the likelihood function using prediction error decomposition. However, this could be achieved by employing “fminsearch” function of MATLAB (2007) software package. The accuracy or stopping criteria of search procedure for optimum value may be kept as 10−5 , say. The best model is chosen on the basis of minimum Akaike information criterion (AIC) and Bayesian information criterion (BIC), given by
N t|t−1 }2 {Xt − Ht α logΩt + + 2q AIC = t=1 Ωt
6
H. Ghosh et al. BIC =
N t=1
t|t−1 }2 {Xt − Ht α logΩt + + q log N, Ωt
where N is the effective number of observations and q = 2p+2 is the number of parameters. To save space, in our subsequent discussion, we shall confine our attention only to EXPAR(2) model, i.e. Xt+1 = {ϕ1 + π1 exp(−γXt2 )}Xt + {ϕ2 + π2 exp(−γXt2 )}Xt−1 + ξt+1 , (3.3) However, the methodology may be extended for the EXPAR(p) models but the expressions become very lengthy. Estimated parameters of (3.1) and (3.2) may be used in order to estimate parameters of (3.3) through their suitable two-dimensional state space representation. The continuous counterpart of EXPAR(2) model, represented by SDS, is given by x ¨t + a(xt )x˙ t + b(xt )xt = ηt ,
(3.4)
where ηt is a Gaussian white noise with variance ση2 . Equation (3.4) can be written as a first order two-dimensional SDS given by ˙ t = f (αt |θ) + η t , α
(3.5)
where η t = [ηt , 0] , αt = [α1t , α2t ] = [x, ˙ x] , f (αt |θ) = [−a(α2t )α1t − b(α2t |θ)α2t , α1t ] .
Using local linearization approach, discrete state-space version of (3.5) is given by (3.6) αt+t = A(αt |θ)αt + B(αt |θ)η t+t , where
A(αt |θ) = exp(log{I + Jt−1 (eJt t − I)V (αt |θ)}),
and V (αt |θ) satisfies
V (αt |θ)αt = f (αt |θ).
(3.7)
By drawing analogy between differential and difference equations, an attempt is made here to estimate the parameters of (3.3) as follows. We first represent it in the form of two-dimensional first order stochastic difference equation. In (3.5), replace αt by Zt , where Zt = (Z1t , Z2t ) = (ΔXt , Xt ) , and also replace differential operator by difference operator Δ, where ΔXt = Xt+1 − Xt . Note that Xt+1 = Z1,t + Z2,t and ΔZ1,t = ΔXt+1 − Z1,t = Xt+2 − (2Z1,t + Z2,t ). Writing f (Zt |θ) = (f1 (Zt ), f2 (Zt )) , it is noted that
Fitting expar models through the extended kalman filter 7 f1 (Zt ) is a linear combination of components of Zt and f2 (Zt ) = Z1t . Also, taking first order approximation of eJtt , (3.6) yields A(αt |θ) = I +V (αt |θ). Thus, under discrete set up, linearized state-space model in two-dimensional Zt may be written in terms of V (Zt |θ), which satisfies the discretized form of (3.7). A straightforward but lengthy algebra yields the elements of matrix V (Z t |θ) as V1,1 (Zt )
=
2 [(ϕ1 − 2) + π1 exp {−γXt+1 }],
V1,2 (Zt )
=
2 [(ϕ1 + ϕ2 − 1) + (π1 + π2 ) exp {−γXt+1 }], V2,1 (Zt ) = 1, V2,2 (Zt ) = 0.
Therefore, (3.3) gives the linearized state-space model in two-dimensional Zt as Zt+1 = A(Zt |θ)Zt + η t , (3.8) where A11 (Zt |θ)
=
2 [(ϕ1 − 1) + π1 exp {−γXt+1 }],
A12 (Zt |θ)
=
2 [(ϕ1 + ϕ2 − 1) + (π1 + π2 ) exp {−γXt+1 }],
A21 (Zt |θ)
=
1,
A22 (Zt |θ)
=
1.
The transition matrix A(Zt |θ) involves exponential functions of Zt . It may be pointed out that transition matrix Φt (αt |θ) of the corresponding continuous time model given by (3.1) with p=2 also involves exponential functions of αt . Therefore, equating coefficients of transition functions A11 (Zt |θ) and A12 (Zt |θ) in (3.8) with f1 (xt ) and f2 (xt ) respectively in (3.1), estimators of parameters in (3.3) are obtained as ϕ 1 = ϕ 1,EKF + 1, π 1 = π 1,EKF , ϕ 2 = ϕ 2,EKF − ϕ 1,EKF , 2,EKF − π 1,EKF, γ =γ EKF . π 2 = π The conditions for existence of limit cycle behaviour for the model given by (3.8) are (Ozaki, 1993): (i) Roots of the quadratic equation x2 − (1 + ϕ1,EKF + π1,EKF )x + {(ϕ1,EKF + π1,EKF )−(ϕ2,EKF + π2,EKF )}= 0 (3.9)
are less than unity. (ii) Absolute values of roots of the quadratic equation x2 − (1 + ϕ1,EKF )x + (ϕ1,EKF − ϕ2,EKF ) = 0 are greater than unity.
(3.10)
8
H. Ghosh et al.
As (3.8) is the linearized state space version of (3.3), therefore, to the first order of approximation, the conditions given in (3.9) need to be satisfied to ensure existence of limit cycle behaviour for EXPAR(2) model given by (3.3). In this article, an attempt is made to study the existence of cyclical behaviour when second order of approximation of (3.3) is also considered. To this end, taking second order approximation of eJt t appearing in (3.6), we get Jt (3.11) A(Zt |θ) = I + V (Zt |θ) + V (Zt |θ), 2 where
Jt =
∂f (Z|θ) ∂Z
= ((Jt,ij )) . Z=Zt
Using chain rule of differentiation and after straightforward but lengthy algebra, elements of Jt are obtained as follows: 2 }[π1 − 2γXt+1 {π1 Xt+1 + π2 Xt }], Jt,11 = (ϕ1 − 2) + exp{−γXt+1 2 Jt,12 = (ϕ1 + ϕ2 − 1) + exp{−γXt+1 }[(π1 + π2 )
−2γXt+1 + {π1 Xt+1 + π2 Xt }], Jt,21 = 1, Jt,22 = 0 Now, cyclical pattern of EXPAR(2) model may be explained, if for small (large) Xt+1 , absolute value of determinant of A(Zt |θ) is large (small). Thus, the mechanistic interpretation of EXPAR(2) model with second order approximation in its state space representation is that the timeseries follows the sequence of stationarity and non-stationarity or vice versa. 4 Derivation of Formulae for Out-of-Sample Forecasts In this section, we confine our attention only to deriving out-of-sample one-step and two-step forecast formulae in respect of the EXPAR(p) models. However, formulae for more than two-step ahead forecasts, though quite complicated, can be derived along similar lines. The optimal predictor, which minimizes the mean one-step ahead squared prediction error is the conditional expectation, obtained from (3.1) and (3.2), is N +i = E{XN +i |χN +i−1 , θ} = H α N +i|N +i−1 X N +i−1|N +i−1 , N +i−1|N +i−1 α = H ΦN +i−1 |α
(4.1)
Fitting expar models through the extended kalman filter 9 where χT −1 is the information contained in {X0 , X1 , . . . , XT −1 }. Using recursive conditional expectation approach, we get N +i+1 = E[{XN +i+1 |χN +i−1 , θ}] = E[E[XN +i+1 |χN +i , θ]|χN +i−1 , θ] X N +i+1|N +i )|χN +i−1 , θ] = E[HN +i+1 (α N +i|N +i )|χN +i−1 , θ]. = E[H N +i+1 (ΦN +i |α N +i|N +i α By approximating the measurement update equation by time update equation through EKF applied on (3.1) and (3.2), two-step ahead predictor is given by N +i+1 = E[HN +i+1 {ΦN +i |α N +i|N +i−1 + KN +i (XN +1+i X N +i|N +i−1 α N +i|N +i−1 ))}|χN +i−1 , θ] −HN +i (α N +i|N +i−1 = HN +i+1 ΦN +i |α N +i|N +i−1 α N +i|N +i−1 ) = pth component of (ΦN +i |α N +i|N +i−1 α
(4.2)
N +i−1|N +i−1 ), = p component of (ΦN +i |α N +i|N +i−1 ΦN +i−1 |α N +i−1|N +i−1 α th
where the Kalman gain function Kt is given by (Anderson and Moore, 2005): Kt = Σt|t−1 Ht {Ht Σt|t−1 Ht }−1 . 5
Performance Study of EKF for Fitting EXPAR Models through Simulation To compare the performance of EKF method vis-a-vis GA method for fitting the EXPAR(p) models, simulation study is carried out for p=1,2,...,6. However, to save space, only the results for p=2 are reported here. Taking various values of parameters of EXPAR(2) model, time-series data of size 50 are generated and the model is fitted to simulated data using EKF and GA methods. The biases and variances of the parameter estimates are also computed based on 100 iterations and the same are reported respectively in Tables 1 and 2. It may be pointed out that biases as well as estimates of variances for all the 5 parameters, viz. γ, ϕ1 , ϕ2 , π1 and π2 are throughout less for EKF method vis-a-vis GA method, thereby indicating its superiority. Table 1 also shows that, even when the number of data points is as small as 50, estimated biases and variances throughout are very small, implying thereby that EKF method for fitting of EXPAR(p) models is very efficient.
2 ) V ar(ϕ 1 ϕ (0.0023,0.0073) (0.0182,0.0312) (0.0130,0.0029) (0.0231,0.0021) (0.0087,0.0021) (0.0021,0.0028) (0.0032,0.0031) (0.0022,0.0028)
γ 1 1 0.01 0.01 0.9 0.9 0.5 0.5
(π1 , π2 ) (-0.1,0.6) (0.9,0.8) (0.2,0.7) (0.6,0.5) (0.7,0.1) (0.6,0.9) (0.3,0.6) (0.7,0.8)
( π1 , π 2 ) (-0.094,0.623) (0.886,0.919) (0.179,0.586) (0.745,0.446) (0.732,0.171) (0.578,1.023) (0.219,0.635) (0.787,0.878) Bias (0.006,0.023) (-0.014,0.119) (-0.021,0.114) (0.145,-0.054) (0.032,0.071) (-0.022,0.123) (-0.081,0.035) (0.087,0.078)
V ar ( π1 π 2 ) (0.0021,0.0023) (0.0109,0.0089) (0.0093,0.0191) (0.0084,0.0104) (0.0023,0.0034) (0.0032,0.0039) (0.0021,0.0073) (0.0023,0.0048)
Table 1: Simulation results for EXPAR(2) model using EKF technique. γ bias V ar( γ) (ϕ1 ϕ2 ) (ϕ 1 ϕ 2 ) 1.043 0.043 0.0079 (0.9,0.5) (1.034,0.478) 0.908 -0.092 0.0045 (0.1,0.5) (0.245,0.494) 0.023 0.013 0.0067 (0.8,0.1) (0.787,0.093) 0.021 0.011 0.0055 (0.3,0.7) (0.256,0.643) 0.843 -0.057 0.0054 (0.4,0.8) (0.425,0.698) 0.917 0.017 0.0051 (-0.1,0.6) (0.015,0.713) 0.542 0.042 0.0083 (0.7,0.2) (0.734,0.231) 0.532 0.032 0.0029 (0.1,0.8) (0.085,0.765)
Bias (0.134,-0.022) (0.145,-0.006) (-0.013, -0.007) (-0.044,-0.057) (0.025,-0.102) (0.115,0.113) (0.034,0.031) (-0.015,-0.035)
10 H. Ghosh et al.
2 ) V ar(ϕ 1 , ϕ (0.0045,0.0702) (0.0243,0.0584) (0.0238,0.0260) (0.0586,0.0296) (0.0109,0.0389) (0.0053,0.0301) (0.0100,0.0692) (0.0163,0.0490)
γ 1 1 0.01 0.01 0.9 0.9 0.5 0.5
(π1 , π2 ) ( π1 , π 2 ) (-0.1,0.6) (-0.157,0.498) (0.9,0.8) (1.029,0.629) (0.2,0.7) (0.321,0.635) (0.6,0.5) (0.785,0.423) (0.7,0.1) (0.809,0.015) (0.6,0.9) (0.716,0.747) (0.3,0.6) (0.476,0.519) (0.7,0.8) (0.813,0.674) Bias (-0.057,-0.101) (0.129,-0.170) (0.121,-0.064) (0.185,-0.076) (0.109,-0.084) (0.116,-0.152) (0.176,-0.081) (0.113,-0.125)
model using GA technique. (ϕ1 , ϕ2 ) (ϕ 1 , ϕ 2 ) (0.9,0.5) (0.752,0.697) (0.1,0.5) (0.271,0.619) (0.8,0.1) (0.916,0.213) (0.3,0.7) (0.213,0.788) (0.4,0.8) (0.302,0.931) (-0.1,0.6) (0.049,0.756) (0.7,0.2) (0.593,0.356) (0.1,0.8) (0.213,0.894)
V ar( π1 , π 2 ) (0.0656,0.0049) (0.0729,0.0107) (0.0383,0.0211) (0.0370,0.0143) (0.0139,0.0065) (0.1115,0.0058) (0.1635,0.0086) (0.0821,0.0062)
Table 2: Simulation results for EXPAR(2) γ bias V ar( γ) 1.107 0.107 0.0097 0.870 -0.130 0.0130 0.043 0.033 0.0084 0.038 0.028 0.0083 0.782 -0.118 0.0087 0.812 -0.088 0.0074 0.587 0.087 0.0101 0.584 0.084 0.0049
Bias (-0.148,0.196) (0.171,0.119) (0.116, 0.113) (-0.087,0.088) (-0.098,0.131) (0.149,0.156) (-0.107,0.157) (0.113,0.094)
Fitting expar models through the extended kalman filter 11
12
H. Ghosh et al.
6 An Illustration for Fitting EXPAR Models Oil sardine, Mackerel and Bombay duck landings time-series data (’000 tonnes) from 1961 to 2008 in Kerala, Karnataka and Maharashtra States of India respectively, obtained from Central Marine Fisheries Research Institute, Kochi, India, are considered. From the total 48 data points denoted as {Xt , t = 0, 1, . . . , 47}, first 45 data points corresponding to fish landings for the period 1961 to 2005 are used for building the model and remaining 3 data points for validation purpose. The entire data analysis is carried out using MATLAB (2007) software package. Directed scatter diagrams for the given data sets are exhibited in Figure 1. A perusal shows that, for Oil sardine time-series data, there is a void at lag 2, which indicates that the joint distribution of (Xt , Xt−2 ) is non-Gaussian. Also, in the case of Mackerel and Bombay duck time-series data, directed scatter plots at lag 3 indicate a strong presence of non-Gaussian joint distribution of {Xt }. This typical feature of scatter plots with void indicates possibility of presence of limit cycle behaviours. Further, for all the three datasets, it is noticed that periodogram ordinates are significant, which also indicate towards possible presence of limit cycle behaviours. In respect of
300000 200000 100000 0 0
50000
100000
150000
200000
250000
300000
Directed scatter diagram of Oil sardine landings data at lag 2 100000 50000 0 0
20000
40000
60000
80000
100000
Directed scatter diagram of Mackerel landings data at lag 3 100000
50000
0 0
20000
40000
60000
80000
100000
Directed scatter diagram of Bombay duck landings data at lag 3
Figure 1: Directed scatter diagrams of fish landings data.
Fitting expar models through the extended kalman filter 13 Table 3: Estimates of parameters of EXPAR(2) models. Species Oil sardine Mackerel Bombay duck Parameters γ 0.00021 (0.000017)* 0.00103 (0.00082) 0.61(0.29) 1.10 (0.08) 0.61 (0.23) 0.89 (0.34) ϕ 1 π 1 0.12 (0.03) 0.95 (0.35) 80.36 (1.35) ϕ 2 -0.10 (0.02) -0.12 (0.04) 0.08 (0.02) π 2 -0.51 (0.21) 0.16 (0.05) -57.85 (1.79) σξ2 2149.32 (5.29) 1035.24 (4.22) 1010.20 (5.98) *Entries within brackets ( ) indicate corresponding standard errors.
all the three datasets, various values of p, i.e. order of EXPAR(p) models were tried for p=1,2,...,6 and, on the basis minimum AIC and BIC values, it is found that the optimum value of is 2 for all the three data sets. Using the methodology described in Section 3, parameter estimates for EXPAR(2) model in respect of the three data sets are computed and the same are reported in Table 3. To get a visual idea, graphs of fitted models along with data points are exhibited in Figure 2. A visual inspection shows that the fitted models are able to properly capture cyclical fluctuations present in the data sets. Subsequently, for all the three data sets, roots of the quadratic equations appearing in (3.9) and (3.10) are calculated and the same are reported in Table 4. Evidently, for all the three data sets, conditions given in (3.9) and (3.10) are satisfied, implying thereby existence of limit cycle behaviours. Further, determinants of A(Zt |θ) appearing in (3.11) are computed for all the three data sets. When |Xt+1 | are small (large), values of determinants of A(Zt |θ) are found to be large (small). This implies that, for all the three data sets, fitted EXPAR(2) models with second order approximation in their state space model follow the sequences of stationarity and nonstationarity or vice versa. This analysis again points out towards possible existence of limit cycle behaviours, when second order approximation is also considered. Table 4: Roots of the quadratic equations. Roots Quadratic equation in (3.9) Quadratic equation in (3.10) Species Root 1 Root 2 Root 1 Root 2 Oil sardine 0.03 0.41 1.96 1.43 Mackerel 0.03 0.87 1.70 1.10 Bombay duck 0.08 0.97 2.71 8.91
14
H. Ghosh et al. 250 200 150 100
2003
2005
1999
2001
1997
1995
1993
1991
1989
1987
1985
1983
1981
1979
1977
1975
1973
1971
1969
1967
0 - 50
1963
50 1965
Oil sardine landings
300
Mackerel landings
120 100 80 60 40 20
100 80 60 40 20 2005
2003
2001
1999
1997
1995
1993
1991
1989
1987
1985
1983
1981
1979
1977
1975
1973
1971
1969
1967
1965
0 1963
Bombay duck landings
1963 1965 1967 1969 1971 1973 1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005
0
Figure 2: Fitted EXPAR(2) models along with data points (’000 tonnes).
6.1. Evaluation of Goodness-of-fit and Forecasting Performance of Fitted ARIMA and EXPAR Models. In the first instance, ARIMA model is fitted to the three data sets. On the basis of minimum AIC criterion, the AR(2) model is found to be best for all the three data sets, and parameter estimates of fitted models are reported in Table 5. Thereafter, a comparative study is carried out to evaluate goodness-of-fit performance of fitted ARIMA and EXPAR models on the basis of Mean square error (MSE), Mean absolute
Table 5: Estimates of parameters of fitted AR models. Species Oil sardine Mackerel Bombay duck Parameters Constant 43.40 (1.90)∗ 18.53 (0.83) 7.60 (0.49) AR 1 0.77 (0.14) 0.38 (0.17) 0.65 (0.19) AR 2 -0.08 (0.04) -0.08 (0.02) 0.12 (0.03) *Entries within brackets ( ) indicate corresponding standard errors.
Fitting expar models through the extended kalman filter 15 Species Criterion MAE MSE RMAE
Table 6: Goodness-of-fit of fitted EXPAR Oil sardine Mackerel ARIMA EXPAR ARIMA EXPAR 39.51 36.51 11.27 9.35 2356.86 2149.32 289.25 135.24 151.86 85.83 76.22 61.99
models. Bombay duck ARIMA EXPAR 8.26 7.80 122.95 110.20 27.31 24.12
error (MAE) and Relative mean absolute error (RMAE) criteria and the results are reported in Table 6. The Mean square prediction error (MSPE), Mean absolute prediction error (MAPE) and Relative mean absolute prediction error (RMAPE) values for the forecasts produced for hold-out data by fitted AR and EXPAR models are also computed and the same are reported in Table 7. A perusal of Tables 6 and 7 indicate the superiority of EXPAR model over ARIMA model for modelling as well as forecasting for the data sets under consideration. Finally, for all the three data sets, optimal out-of-sample forecasts for fitted EXPAR(2) model are computed, with parameters being replaced by their corresponding estimates. Using (4.1), one-step ahead optimal forecasts for hold-out data (’000 tonnes) in respect of the three data sets are computed and the same are reported in Table 8. One-step ahead forecasts are also computed using ARIMA model and the same are also reported in Table 8. It may be noted that EXPAR model has throughout performed better than ARIMA model for one-step ahead forecasting. Computations are also carried out for two-step ahead forecasting and the superiority of EXPAR model over ARIMA model is again noticed. Further, it is found that all the one-step and two-step ahead forecasts for EXPAR(2) model are quite close to the actual values, implying thereby that the model is able to perform forecasting for hold-out data quite satisfactorily.
Species Criterion MAPE MSPE RMAPE
Table 7: Forecast performance for hold-out data. Oil Sardine Mackerel Bombay duck ARIMA 32.49 1282.77 13.59
EXPAR 17.37 351.73 7.27
ARIMA 9.13 145.71 24.10
EXPAR 7.81 135.63 19.36
ARIMA 2.88 8.41 11.22
EXPAR 1.89 7.05 7.72
Table 8: Comparison of EXPAR and ARIMA models for one-step ahead forecasting in respect of hold-out data. Species Oil sardine (’000 tonnes) Mackerel (’000 tonnes) Bombay duck (’000 tonnes) Year Actual ARIMA EXPAR Actual ARIMA EXPAR Actual ARIMA EXPAR 2006 226.27 193.89 221.04 20.93 22.81 22.35 25.23 22.95 22.27 2007 250.47 200.11 228.46 45.22 25.14 42.20 27.49 25.58 24.24 2008 231.64 218.16 242.71 28.58 34.04 30.59 24.03 28.49 26.47
16 H. Ghosh et al.
Fitting expar models through the extended kalman filter 17 7 Concluding Remarks In this article, methodology for fitting EXPAR nonlinear time-series models through the EKF technique is developed and is demonstrated by applying it to three data sets. A heartening aspect is that, for all the three data sets, desirable feature of existence of limit cycle behaviour is established for EXPAR(2) models with first order approximation in its state space representation. It is shown that, when second order approximation is also taken into account, the time-series follow the sequences of stationarity and nonstationarity or vice versa. Work is in progress to develop the methodology to find out as to whether or not limit cycle behaviour continues to exist, when second and higher order approximations are also taken into account and shall be reported separately in due course of time. It is advocated that, for modelling and forecasting cyclical time-series data, researchers should apply this model rather than the ARIMA model. As a future research problem, this type of work needs to be extended for EXPAR moving average models. Parameter estimation of another extended EXPAR model, viz. Exponential functional autoregressive using Local polynomial regression technique, where the regression function is non-parametric could also be attempted. Finally, possibility of application of other variants of KF, such as Unscented Kalman filter (Wong et al., 2002) and Gauss-Hermite Kalman filter may be explored. Acknowledgements. Drs. Prajneshu and Himadri Ghosh thank the Department of Science and Technology, New Delhi, India for providing financial assistance during the course of this research work. Authors are grateful to the two anonymous referees and an Editor for their valuable comments, which have led to considerable improvement in the quality of this article. References allal, j. and melhaoui, s.e. (2006). Optimal detection of exponential component in autoregressive models. J. Tim. Ser. Anal. 27, 793–810. anderson, b.d.o. and moore, j.b. (2005). Optimal filtering. Prentice Hall, New Jersey. baragona, r., battaglia, f. and poli, f. (2011). Evolutionary statistical procedures. Springer, Germany. box, g.e.p., jenkins, g.m. and reinsel, g.c. (2008). Time series analysis: forecasting and control. Prentice Hall, San Francisco. cao, y. (2008) An implementation of extended Kalman filter for nonlinear estimation. http://www.mathworks.in/matlabcentral/fileexchange/18189-learning-the-extendedKalman-filter. durbin, j. and koopman, s.j. (2000). Time-series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives. J. R. Stat. Soc. B62, 3–56. fan, j. and yao, q. (2003). Nonlinear time-series: nonparametric and parametric methods. Springer, New York.
18
H. Ghosh et al.
ghazal, m.a. and elhassanein, a. (2009). Dynamics of EXPAR models for high frequency data. Int. J. Appl. Maths. Stat. 14, 88–96. ghosh, h., gurung, b. and prajneshu (2011). Methodology for combining linear and nonlinear time-series models for cyclical data. J. Ind. Soc. Agric. Stat. 65, 249–256. harvey, a.c. (2001). Testing in unobserved components models. J. Forecasting 20, 1–19. matlab (2007). Ver.7.4 (R2007a). The Math Works Inc., Massachusetts. messaoud, a., weihs, c. and hering, f. (2006) Nonlinear time series modelling: Monitoring a drilling process. In From data and information analysis to knowledge engineering. (M. Spilipoulu, R. Kruse, C. Borgelt, A. Nurnberger, V. Gaul, eds.). pp. 302-09. Springer, Germany. ozaki, t. (1993). Non-Gaussian characteristics of exponential autoregressive processes. In Developments in time-series analysis. (T. Subba Rao, ed.). pp. 257-273. Chapman and Hall, London. wong, y.s., voitcu, o. and popescu, c.a. (2002) An expert data mining system for nonlinear aero elastic response prediction. 23rd International Congress of Aeronautical Sciences, 8-13 September, 2002, Toronto, Canada.
Himadri Ghosh, Bishal Gurung and Prajneshu Gupta Indian Agricultural Statistics, Research Institute, New Delhi 110012, India E-mail:
[email protected] [email protected] [email protected]
Paper received: 22 May 2012; revised: 22 May 2014.