Neural Process Lett (2010) 32:269–276 DOI 10.1007/s11063-010-9156-7
Forecast Combination by Using Artificial Neural Networks Cagdas Hakan Aladag · Erol Egrioglu · Ufuk Yolcu
Published online: 30 October 2010 © Springer Science+Business Media, LLC. 2010
Abstract One efficient way of obtaining accurate forecasts is the forecast combination method. This approach combines the forecast values obtained from different forecasting models. Artificial neural networks and fuzzy time series approaches have also proved successful in the field of forecasting. In this study, a new forecast combination approach based on artificial neural networks is proposed. The forecasts obtained from different fuzzy time series models are combined by utilizing artificial neural networks. The proposed method is applied to the index of the Istanbul stock exchange (IMKB) time series, and the results are compared with those of other forecast combination methods available in the literature. The implementation shows that the proposed forecast combination approach produces better forecasts than the other methods.
Keywords Artificial neural networks · Forecast combination · Forecasting · Fuzzy time series
C. H. Aladag (B) Department of Statistics, Hacettepe University, Ankara, Turkey e-mail: [email protected]
E. Egrioglu · U. Yolcu Department of Statistics, Ondokuz Mayis University, Samsun, Turkey

1 Introduction
In time series analysis, many studies have sought more accurate forecasts, and various forecasting approaches have been proposed in the literature. Artificial neural networks and fuzzy time series methods have recently been used successfully in many implementations [1,6]. Another forecasting method is the forecast combination approach, which combines different forecasting methods in order to obtain more accurate forecasts. The forecast combination approach was first proposed by Bates and Granger [2]. Then, Granger and Ramanathan [9], Newbold and Granger [12], and Winkler and Makridakis [15] applied forecast combination with more than two forecasting models. Wong et al. [16] applied
three different forecast combination methods to four time series and compared the obtained results.

A forecast value obtained from forecast combination is the output of a function consisting of the sum of the weighted forecasts produced by different forecasting models. The weights show the contribution of the corresponding forecasting models to the combination. They can be determined by taking some assumptions into account or by using various optimization methods. To obtain accurate forecasts, the key point in forecast combination is the determination of the weights and of the type of combination function. The combination function can be chosen as a linear or a nonlinear function.

In this study, artificial neural networks are used to determine the best values of the weights in forecast combination. The forecasts obtained from various forecasting models are taken as the inputs of the artificial neural network, and the output is the combination of the forecasts. With this configuration, the optimum values of the weights that provide the best nonlinear mapping can be obtained. Thus, both the combination function and the optimum values of the weights can be determined by using artificial neural networks. In the application of the proposed combination method, various fuzzy time series forecasting models are utilized.

Fuzzy time series definitions and some fuzzy time series forecasting models are presented in Sect. 2. Three well-known forecast combination methods from the literature are given in Sect. 3. The proposed forecast combination method is introduced in Sect. 4 and applied to the IMKB time series in Sect. 5. In the last section, the obtained results are discussed.

2 Fuzzy Time Series
The fuzzy time series approach was first proposed by Song and Chissom [13,14]. Chen [3] proposed utilizing fuzzy relation tables instead of performing the complex matrix calculations in Song and Chissom's method.
Many methods have also been proposed in the literature to improve forecasting accuracy in fuzzy time series [7]. Some general definitions of fuzzy time series can be given as follows [1]:

Let $U$ be the universe of discourse, where $U = \{u_1, u_2, \ldots, u_b\}$. A fuzzy set $A_i$ of $U$ is defined as $A_i = f_{A_i}(u_1)/u_1 + f_{A_i}(u_2)/u_2 + \cdots + f_{A_i}(u_b)/u_b$, where $f_{A_i}$ is the membership function of the fuzzy set $A_i$, $f_{A_i}: U \to [0, 1]$. Here $u_a$ is a generic element of the fuzzy set $A_i$, and $f_{A_i}(u_a)$ is the degree of belongingness of $u_a$ to $A_i$, with $f_{A_i}(u_a) \in [0, 1]$ and $1 \le a \le b$.

Definition 1 Fuzzy time series. Let $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$, a subset of real numbers, be the universe of discourse on which fuzzy sets $f_j(t)$ are defined. If $F(t)$ is a collection of $f_1(t), f_2(t), \ldots$, then $F(t)$ is called a fuzzy time series defined on $Y(t)$.

Definition 2 Fuzzy time series relationships. Assume that $F(t)$ is caused only by $F(t-1)$. Then the relationship can be expressed as $F(t) = F(t-1) * R(t, t-1)$, where $R(t, t-1)$ is the fuzzy relationship between $F(t)$ and $F(t-1)$ and $*$ represents an operator. To sum up, let $F(t-1) = A_i$ and $F(t) = A_j$. The fuzzy logical relationship between $F(t)$ and $F(t-1)$ can be denoted as $A_i \to A_j$, where $A_i$ is the left-hand side and $A_j$ is the right-hand side of the fuzzy logical relationship. Furthermore, these fuzzy logical relationships can be grouped to establish different fuzzy relationships.

Definition 3 Let $F(t)$ be a fuzzy time series. If $F(t)$ is caused by $F(t-1), F(t-2), \ldots, F(t-m)$, then this fuzzy logical relationship is represented by $F(t-m), \ldots, F(t-2), F(t-1) \to F(t)$, and it is called the $m$th order fuzzy time series forecasting model.
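As an illustration of Definitions 2 and 3, the following minimal Python sketch extracts and groups fuzzy logical relationships from a short series. It rests on assumptions that are not part of the definitions above: the universe of discourse is split into equal-length intervals, each observation is assigned to the single fuzzy set whose interval contains it, and the function names `fuzzify` and `relationship_groups` as well as the sample values are hypothetical.

```python
from collections import defaultdict


def fuzzify(series, lower, upper, n_sets):
    """Return, for each observation, the index i of the fuzzy set A_i
    whose interval contains it (equal-length intervals are an assumption)."""
    width = (upper - lower) / n_sets
    labels = []
    for y in series:
        i = int((y - lower) / width)
        # Clamp so that values on or beyond the bounds fall into the end sets.
        labels.append(min(max(i, 0), n_sets - 1) + 1)  # sets are numbered 1..n_sets
    return labels


def relationship_groups(labels, order=1):
    """Group the relationships F(t-order), ..., F(t-1) -> F(t) by left-hand side."""
    groups = defaultdict(set)
    for t in range(order, len(labels)):
        lhs = tuple(labels[t - order:t])
        groups[lhs].add(labels[t])
    return {lhs: sorted(rhs) for lhs, rhs in groups.items()}


# Illustrative values only; they are not taken from the IMKB data.
series = [10.0, 12.0, 25.0, 31.0, 14.0, 27.0]
labels = fuzzify(series, lower=10.0, upper=40.0, n_sets=3)
first_order = relationship_groups(labels, order=1)
second_order = relationship_groups(labels, order=2)
print(labels)        # indices of the fuzzy sets A_i
print(first_order)   # e.g. (1,) -> [1, 2] stands for A_1 -> A_1, A_2
print(second_order)  # left-hand sides are pairs (F(t-2), F(t-1))
```

The first-order groups correspond to the relationships $A_i \to A_j$ of Definition 2, and setting `order=m` gives the left-hand sides of the $m$th order model in Definition 3.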
3 The Forecast Combination Methods
Three well-known forecast combination methods in the literature are the simple forecast combination, variance–covariance, and discounted mean square forecast error (MSFE) methods [16]. These methods are briefly described below.

3.1 Simple Forecast Combination
In this method, the forecasts obtained from two or more forecasting models are first multiplied by weights and then summed. The combined forecast $f_c$ is calculated by using the formula

$$f_c = \sum_{i=1}^{n} w_i f_i \tag{1}$$
Here, $f_i$ and $n$ denote the forecast value obtained from forecasting model $i$ and the number of forecasting models, respectively. In this method, the weights $w_i$ can be equal to each other or can take different values. If the weights are determined as

$$w_i = 1/n \tag{2}$$
then this method is called the simple average combination. In simple forecast combination, the combined forecasts are a linear combination of the forecasts of the different models, whether or not the weights are equal.

3.2 Variance–Covariance Method
The variance–covariance method is a forecast combination approach in which the weights are determined based on the performances of the forecasting models used in the combination. In this method, the weights $w_i$, subject to $\sum_{i=1}^{n} w_i = 1$, are calculated as follows:

$$w^t = \frac{u^t \Sigma^{-1}}{u^t \Sigma^{-1} u} \tag{3}$$

where $\Sigma$ and $u$ represent the sample covariance matrix and a column vector of ones ($u = [1, 1, \ldots, 1]^t$, where $t$ denotes the transpose), respectively. The weights are calculated by using formula (3), and then the combined forecasts are calculated by using formula (1).

3.3 Discounted Mean Square Forecast Error
In this forecast combination method, the weight for forecasting model $i$ is of the form

$$w_i = \frac{1 \Big/ \sum_{a=1}^{T} \beta^{T-a+1} e_{ia}^2}{\sum_{j=1}^{n} \left( 1 \Big/ \sum_{a=1}^{T} \beta^{T-a+1} e_{ja}^2 \right)} \tag{4}$$

where $\beta$ is a multiplier with $0 < \beta < 1$, $e_{ia}$ is the error of forecasting model $i$ for observation $a$, and $T$ is the number of observations. The weights are computed from (4), and then the combined forecasts are calculated by using formula (1).
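The three weighting schemes of this section can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the covariance matrix passed to `variance_covariance_weights` is assumed to be the symmetric sample covariance matrix of the models' forecast errors, the helper `solve` is a small Gaussian elimination written for this example, and all function names and sample numbers are hypothetical.

```python
def simple_average_weights(n):
    """Eq. (2): equal weights w_i = 1/n."""
    return [1.0 / n] * n


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def variance_covariance_weights(sigma):
    """Eq. (3) for a symmetric sigma: w = sigma^{-1} u / (u^t sigma^{-1} u)."""
    n = len(sigma)
    z = solve(sigma, [1.0] * n)  # z = sigma^{-1} u
    total = sum(z)               # u^t sigma^{-1} u
    return [zi / total for zi in z]


def discounted_msfe_weights(errors, beta):
    """Eq. (4): errors[i][a - 1] is e_ia, the error of model i at observation a."""
    inverse_sums = []
    for e in errors:
        T = len(e)
        discounted = sum(beta ** (T - a + 1) * e[a - 1] ** 2 for a in range(1, T + 1))
        inverse_sums.append(1.0 / discounted)  # assumes at least one nonzero error
    total = sum(inverse_sums)
    return [v / total for v in inverse_sums]


def combine(weights, forecasts):
    """Eq. (1): weighted sum of the individual forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Illustrative numbers only.
w_avg = simple_average_weights(2)
w_vc = variance_covariance_weights([[4.0, 0.0], [0.0, 1.0]])
w_dm = discounted_msfe_weights([[1.0, 1.0], [2.0, 2.0]], beta=0.9)
print(w_avg, w_vc, w_dm)
print(combine(w_avg, [10.0, 20.0]))
```

In both data-driven schemes the model with the smaller error variance or smaller discounted squared error receives the larger weight, and the weights sum to one by construction.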
4 The Proposed Forecast Combination Method
The forecast combination methods given in Sect. 3 use different techniques to determine the weights. However, the combination function used in all of these methods has the same linear form as (1). The combination function can also be chosen as a nonlinear function. Freitas and Rodrigues [8] used a combination function that includes a nonlinear term. They considered the following combination, which is linear in the weights, for two forecasting models:

$$f_c = w_0 + w_1 f_1 + w_2 f_2 + w_{nl} f_1 f_2 \tag{5}$$
Here, $f_1$ and $f_2$ represent the forecasts obtained from models 1 and 2, respectively. $w_{nl}$ is the weight for the nonlinear term and can be found by minimizing the sum of squared errors. The combination function given in (5) is the same as (1) with $n = 2$ when $w_0 = 0$ and $w_{nl} = 0$. Although (5) has a term that is nonlinear in the forecasts, it does not include a term that is nonlinear in the weights. Therefore, this function is a linear combination of the weights.

In this study, we use feed forward neural networks to combine the forecasts obtained from different fuzzy time series forecasting models. In other words, artificial neural networks are utilized as a nonlinear combination function in the combination step. In the proposed forecast combination approach, the forecasts produced by different models are the inputs of a feed forward neural network, and the output is the combined forecast. By training a feed forward neural network configured in this way, the weights that provide the best nonlinear mapping can be found. The number of inputs of the feed forward neural network is equal to the number of forecasting models used. In the implementation, one neuron is employed in the hidden layer in order not to lose the generalization ability of the feed forward neural network. One neuron is employed in the output layer, and its output is the combined forecast value. Five fuzzy time series forecasting models are used, so five forecast values are obtained for each observation. The architecture of the feed forward neural network employed in the combination step is shown in Fig. 1. For the neurons in the hidden layer and the output layer, the logistic activation function given in (6) and the linear activation function given in (7) are used, respectively.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

$$f(x) = x \tag{7}$$

Fig. 1 The architecture of used feed forward neural network
By taking into account the bias weights, the neural network model given in Fig. 1 can be expressed mathematically as follows:

$$f_c = \frac{1}{1 + e^{-\left( \sum_{i=1}^{5} w(1,i) f_i + w_1 \right)}} \, w(2,1) + w_2 \tag{8}$$

where $f_i$ is the forecast value obtained from model $i$ ($i = 1, 2, 3, 4, 5$), $w(1,i)$ are the weights between the input and the hidden layers, $w_1$ is the weight between the input bias and the hidden layer neuron, $w(2,1)$ is the weight between the hidden layer neuron and the output neuron, and $w_2$ is the weight between the hidden layer bias and the output neuron. The mathematical model given in (8) clearly has a nonlinear form. When the feed forward neural network whose mathematical model is expressed in (8) is trained, the calculated weights are the optimum weights of the forecast combination and the obtained outputs are the combined forecasts. We would also like to note that the function given in (8) is a nonlinear combination of the weights. In order to train a neural network, an optimization algorithm is used. In this study, the Levenberg–Marquardt algorithm [11] is used as the training algorithm, since it is an efficient nonlinear optimization method that is available in many optimization packages. In other words, the best values of the weights $w$ in (8) are calculated by using the Levenberg–Marquardt algorithm.
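The forward pass of the network described above (one logistic hidden neuron, one linear output neuron, and two bias weights) can be sketched in Python as follows. The sketch covers only the evaluation of the combination function for given weights; training with the Levenberg–Marquardt algorithm is not shown. The function names, the sample inputs, and the assumption that the inputs have been scaled to a small range beforehand are illustrative and not taken from the paper.

```python
import math


def logistic(x):
    """Logistic activation of Eq. (6), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_forecast(f, w_in, w1, w21, w2):
    """One logistic hidden neuron followed by one linear output neuron.

    f    : forecasts f_i of the individual models (assumed scaled beforehand)
    w_in : weights w(1, i) between the inputs and the hidden neuron
    w1   : bias weight of the hidden neuron
    w21  : weight w(2, 1) between the hidden neuron and the output neuron
    w2   : bias weight of the output neuron
    """
    hidden = logistic(sum(w * fi for w, fi in zip(w_in, f)) + w1)
    return w21 * hidden + w2  # linear output activation, Eq. (7)


# Hypothetical scaled forecasts and weights, for illustration only.
f = [0.2, 0.4, 0.1, 0.3, 0.5]
zero_weights = combined_forecast(f, [0.0] * 5, w1=0.0, w21=2.0, w2=1.0)
some_weights = combined_forecast(f, [1.0, -1.0, 0.5, 0.0, 2.0], w1=-0.5, w21=2.0, w2=1.0)
print(zero_weights, some_weights)
```

Because the input weights sit inside the logistic function, the output is nonlinear in those weights, in contrast to the combination functions (1) and (5).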
5 The Application of the Proposed Method
The IMKB time series, which includes 95 observations, is used in the implementation. The daily observations cover the period between 20th May 2008 and 26th September 2008. The graph of the time series is given in Fig. 2. The 10 observations between 16th September and 26th September 2008 are chosen as the test set. For the test set, the best forecast values obtained from the models proposed by Cheng et al. [5], Chen [3,4], and Huarng [10] are given in Table 1. In Chen's [3,4] methods, the interval lengths are 1300 and 900, respectively. The order of Chen's [4] model is 2. For the method of Cheng et al. [5], the number of sets is taken as 6. The root mean square error (RMSE) values calculated for the forecasting models are given in the last row of Table 1.
Fig. 2 The time series used in the implementation
Table 1 The forecasts obtained from the used forecasting models

Time        Observations  Chen [3]  Huarng [10]a  Huarng [10]b  Chen [4]  Cheng [5]
16.09.2008  33736.3       34816.6   35075         35000         35750     35626.9
17.09.2008  32727.5       34600     33950         34000         33350     35626.9
18.09.2008  32216.4       33950     32750         32750         32750     35626.9
19.09.2008  36370.1       33950     32150         32250         32450     35626.9
20.09.2008  36183.6       36550     37550         37750         34850     35626.9
22.09.2008  35454.1       36550     36050         37750         36050     35626.9
23.09.2008  35177.1       34816.6   35150         35000         35600     35626.9
24.09.2008  36361.8       34816.6   35075         35000         36050     35626.9
25.09.2008  36556.6       36550     37550         37750         35150     35626.9
26.09.2008  36051.3       36550     35750         35750         36650     35626.9
RMSE                      1328.04   1777.68       1622.87       1576.1    1621.45

a, b denote the distribution based and mean based methods proposed by Huarng [10], respectively
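The RMSE values in the last row of Table 1 can be checked with a short Python sketch. The RMSE formula is not stated explicitly in the text, so the sketch assumes the usual definition, the square root of the mean squared error over the 10 test observations; the function name `rmse` is illustrative. The data are the Observations and Chen [3] columns of Table 1.

```python
import math


def rmse(observations, forecasts):
    """Root mean square error over paired observations and forecasts."""
    squared = [(y - f) ** 2 for y, f in zip(observations, forecasts)]
    return math.sqrt(sum(squared) / len(squared))


observations = [33736.3, 32727.5, 32216.4, 36370.1, 36183.6,
                35454.1, 35177.1, 36361.8, 36556.6, 36051.3]
chen_1996 = [34816.6, 34600.0, 33950.0, 33950.0, 36550.0,
             36550.0, 34816.6, 34816.6, 36550.0, 36550.0]

chen_rmse = rmse(observations, chen_1996)
print(round(chen_rmse, 2))  # close to the 1328.04 reported for Chen [3]
```

Under this definition the computed value agrees with the reported 1328.04 to within rounding.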
Table 2 The combined forecasts

Time        Observations  SFC      VC       MSFE     The proposed
16.09.2008  33736.3       35253.7  34973    35269.3  33736.3
17.09.2008  32727.5       34305.3  34426.3  34480.3  32471.9
18.09.2008  32216.4       33565.3  33885.8  33845.4  32471.9
19.09.2008  36370.1       33285.3  33868.3  33624.5  36116.7
20.09.2008  36183.6       36465.3  35993.2  36333.7  36116.7
22.09.2008  35454.1       36405.3  35611.5  36272.2  35454.1
23.09.2008  35177.1       35238.7  34975.8  35257.0  36116.7
24.09.2008  36361.8       35313.7  35027.6  35320.4  36116.7
25.09.2008  36556.6       36525.3  36047.8  36384.8  36116.7
26.09.2008  36051.3       36065.3  36708.4  36068.0  36116.7
RMSE                      1349.63  1266.24  1320.09  366.07
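The SFC column of Table 2 can be reproduced from Table 1 with a short Python sketch, assuming that SFC here means the equal-weight average of the five model forecasts, as in (2); that reading is consistent with the tabulated values but is not stated explicitly next to the table. The variable and function names are illustrative.

```python
import math

observations = [33736.3, 32727.5, 32216.4, 36370.1, 36183.6,
                35454.1, 35177.1, 36361.8, 36556.6, 36051.3]

# Rows of Table 1: Chen [3], Huarng [10]a, Huarng [10]b, Chen [4], Cheng [5].
model_forecasts = [
    [34816.6, 35075.0, 35000.0, 35750.0, 35626.9],
    [34600.0, 33950.0, 34000.0, 33350.0, 35626.9],
    [33950.0, 32750.0, 32750.0, 32750.0, 35626.9],
    [33950.0, 32150.0, 32250.0, 32450.0, 35626.9],
    [36550.0, 37550.0, 37750.0, 34850.0, 35626.9],
    [36550.0, 36050.0, 37750.0, 36050.0, 35626.9],
    [34816.6, 35150.0, 35000.0, 35600.0, 35626.9],
    [34816.6, 35075.0, 35000.0, 36050.0, 35626.9],
    [36550.0, 37550.0, 37750.0, 35150.0, 35626.9],
    [36550.0, 35750.0, 35750.0, 36650.0, 35626.9],
]

# SFC column as printed in Table 2.
sfc_reported = [35253.7, 34305.3, 33565.3, 33285.3, 36465.3,
                36405.3, 35238.7, 35313.7, 36525.3, 36065.3]


def rmse(y, f):
    """Root mean square error over paired observations and forecasts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


# Equal weights w_i = 1/5 applied to every row.
sfc = [sum(row) / len(row) for row in model_forecasts]
largest_gap = max(abs(a - b) for a, b in zip(sfc, sfc_reported))
sfc_rmse = rmse(observations, sfc)
print([round(v, 1) for v in sfc])
print(round(largest_gap, 2), round(sfc_rmse, 2))
```

The averages match the SFC column to within the rounding of the printed values, and the resulting RMSE is close to the reported 1349.63.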
The forecasts obtained from the forecasting models are combined by using the proposed method based on artificial neural networks and the simple forecast combination (SFC), variance–covariance (VC), and MSFE methods. To determine the optimum weight values for the VC, MSFE, and proposed combination methods, the same data set, which consists of the observations between 16th and 26th September 2008, is used. The obtained combined forecasts and the corresponding RMSE values are presented in Table 2. Table 2 shows that the proposed method produces better combined forecasts than the other methods: it gives the lowest RMSE value (366.07, against 1266.24 for VC, the best of the other three methods). To show the results visually, the graph of the observations and of the forecasts obtained from the proposed and the other combination methods is given in Fig. 3, where the vertical axis shows the values and the horizontal axis shows the time. The visual examination in Fig. 3 also indicates that the proposed method gives more accurate combined forecasts than the other combination methods.

Fig. 3 The graph of observations and the forecasts

Table 3 The best weight values for the used feed forward neural network

w(1,1)     w(1,2)    w(1,3)    w(1,4)
−28.9906   267.3430  357.0639  −124.2035

w(1,5)     w1        w(2,1)    w2
−354.301   21969.00  0.8398    0.0589

For the feed forward neural network whose mathematical model is given in (8), the best weight values are presented in Table 3. As mentioned in the previous section, the values given in Table 3 are obtained by using the Levenberg–Marquardt algorithm. All calculations in the implementation are done by using the 2009 version of MATLAB.
6 Conclusion
Many methods have been introduced in time series analysis to obtain more accurate forecasts. These methods have advantages and disadvantages. For example, the forecasts obtained from one method can reflect the turning points while the error between the forecasts and the observations is high. Another method can give forecasts with small error that nevertheless fail to reflect the turning points. Therefore, combining the forecasts obtained from different forecasting models can produce more accurate results. Some studies in the literature have been inspired by this idea, and forecast combination methods such as simple forecast combination, variance–covariance, and MSFE were proposed. These methods aggregate the weighted forecasts obtained from different forecasting models. In forecast combination, determining the weights and choosing the combination function are vital choices. In most forecast combination methods, the combination function is linear.

In this study, we propose a new forecast combination approach based on artificial neural networks. The proposed method employs a feed forward neural network as the combination function. The inputs of the network are the forecasts from different forecasting models, and the output is the combined forecast. Hence, after the feed forward neural network is trained, the obtained weight values are the best weight values for the combination.

In the implementation, the proposed method is applied to the IMKB time series. Different forecasts are obtained from various fuzzy time series forecasting models, and these forecasts are then combined by employing the proposed method. For comparison, the forecasts are also combined by using other well-known forecast combination techniques, namely the simple forecast combination, variance–covariance, and MSFE methods. The comparison shows that the proposed forecast combination method gives better forecasts than the other methods in terms of RMSE. The results are also examined visually, and it is observed that the proposed method produces more accurate forecasts.
References
1. Aladag CH, Basaran MA, Egrioglu E, Yolcu U, Uslu VR (2009) Forecasting in high order fuzzy time series by using neural networks to define fuzzy relations. Expert Syst Appl 36:4228–4231
2. Bates JM, Granger CWJ (1969) The combination of forecasts. Oper Res Q 20(4):451–468
3. Chen SM (1996) Forecasting enrollments based on fuzzy time-series. Fuzzy Sets Syst 81:311–319
4. Chen SM (2002) Forecasting enrollments based on high order fuzzy time series. Cybern Syst 33:1–16
5. Cheng CH, Cheng GW, Wang JW (2008) Multi-attribute fuzzy time series method based on fuzzy clustering. Expert Syst Appl 34:1235–1242
6. Egrioglu E, Aladag CH, Günay S (2008) A new model selection strategy in artificial neural network. Appl Math Comput 195:591–597
7. Egrioglu E, Aladag CH, Uslu VR, Basaran MA, Yolcu U (2009) A new hybrid approach based on SARIMA and partial high order bivariate fuzzy time series forecasting model. Expert Syst Appl 36(4):7424–7434
8. Freitas PSA, Rodrigues AJL (2006) Model combination in neural-based forecasting. Eur J Oper Res 173:801–814
9. Granger CWJ, Ramanathan R (1984) Improved methods of combined forecasts. J Forecast 3:197–204
10. Huarng K (2001) Effective length of intervals to improve forecasting in fuzzy time-series. Fuzzy Sets Syst 123:387–394
11. Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Q Appl Math 2:164–168
12. Newbold P, Granger CWJ (1974) Experience with forecasting time series and combination of forecasts. J R Stat Soc A 137(2):131–165
13. Song Q, Chissom BS (1993) Fuzzy time series and its models. Fuzzy Sets Syst 54:269–277
14. Song Q, Chissom BS (1993) Forecasting enrollments with fuzzy time series: part I. Fuzzy Sets Syst 54:1–10
15. Winkler RL, Makridakis S (1983) The combination of forecasts. J R Stat Soc A 146(2):150–157
16. Wong KKF, Song H, Witt SF, Wu DC (2007) Tourism forecasting: to combine or not to combine? Tour Manag 28:1068–1078