Water Resour Manage https://doi.org/10.1007/s11269-017-1857-5
Subset Modeling Basis ANFIS for Prediction of the Reference Evapotranspiration
Behrooz Keshtegar¹ & Ozgur Kisi² & Hamed Ghohani Arab³ & Mohammad Zounemat-Kermani⁴
Received: 2 September 2016 / Accepted: 7 November 2017 © Springer Science+Business Media B.V., part of Springer Nature 2017
Abstract This study investigates the accuracy of a new modeling scheme, the subset adaptive neuro-fuzzy inference system (subset ANFIS), in estimating daily reference evapotranspiration (ET0). Daily weather data of relative humidity, solar radiation, air temperature and wind speed from three stations in the Central Anatolian Region of Turkey were used as inputs to the models. The input data set for modeling ET0 was divided into several subsets, and a local ANFIS model was calibrated on the data of each subset. The estimates of the subset ANFIS models were compared with those of the M5 model tree (M5Tree), ANFIS and artificial neural network (ANN) models. Mean absolute error (MAE), root mean square error (RMSE) and the model efficiency factor were used to evaluate the models. Subset ANFIS significantly improved the accuracy of daily ET0 estimates: it reduced the RMSE and MAE of the M5Tree models by 15.3% to 32.5% and 14.4% to 24.2%, of the ANN models by 24.3% to 65.3% and 34.1% to 47%, and of the single ANFIS models by 17.4% to 35.4% and 10.8% to 28.3%, respectively.
Keywords Reference evapotranspiration · ANFIS · M5 model tree · ANN · Subset ANFIS
* Behrooz Keshtegar
1 Department of Civil Engineering, University of Zabol, P.B. 9861335-856, Zabol, Iran
2 Faculty of Natural Sciences and Engineering, Ilia State University, 0162 Tbilisi, Georgia
3 Department of Civil Engineering, University of Sistan and Baluchestan, Zahedan, Iran
4 Water Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
Keshtegar B. et al.
1 Introduction Evapotranspiration (ET), the process by which water is vaporized from the soil and free water surfaces to the atmosphere through evaporation and by transpiration from vegetation and plants, depends on several interacting climatological and agricultural factors (Kisi 2008; Rahimikhoob 2016). The evapotranspiration rate from a reference surface (e.g. alfalfa) is called the reference crop evapotranspiration and is denoted ET0. Determining ET0 is essential for almost all hydrologic projects, for water resources management and for water and irrigation engineering (Guven et al. 2008). Soft computing techniques are among the most comprehensive categories of indirect approaches for this purpose; they range from machine learning methods and fuzzy logic to evolutionary learning. In this respect, the use of ANNs to predict ET0 as a function of climatic variables has been reported in many studies (Aghajanloo et al. 2013; Guven et al. 2008; Kisi 2016; Kumar et al. 2008; Kumar et al. 2002; Traore et al. 2016; Zanetti et al. 2007). Comprehensive reviews of artificial neural network (ANN) applications and empirical models for evapotranspiration simulation can be found in Abdullah and Malek (2016), Kisi (2008) and Landeras et al. (2008). In addition to ANNs, several ANFIS models, which are soft computing techniques based on fuzzy logic concepts, have been applied to evapotranspiration modeling (Citakoglu et al. 2014; Cobaner 2011; Dogan 2009; Kisi 2016; Kisi and Zounemat-Kermani 2014; Petković et al. 2015; Pour-Ali Baba et al. 2013). For instance, Citakoglu et al. (2014) modeled monthly ET0 using ANN and ANFIS methods; their findings showed that the ANFIS and ANN models were superior to the classical FAO-56 PM method and can be successfully applied to estimate monthly ET0. Kisi and Zounemat-Kermani (2014) compared the accuracy of two ANFIS models, one based on grid partitioning and one based on subtractive clustering, in estimating daily ET0. Pour-Ali Baba et al. (2013) showed that ANN and ANFIS provided accurate predictions of ET0 from climatic data. However, applications of the M5 model tree, a machine learning method, to evapotranspiration estimation are limited in the literature (Kisi 2016; Pal and Deswal 2009; Rahimikhoob 2014; Rahimikhoob 2016). Kisi (2016) investigated the accuracy of three machine learning approaches, least squares support vector regression (LSSVR), multivariate adaptive regression splines (MARS) and M5Tree, in approximating ET0; the accuracy of the models was also investigated using input and output data from nearby stations. The results showed that the M5Tree models outperformed the other models. Rahimikhoob (2014, 2016) compared feed-forward ANN and M5Tree for estimating ET0 at meteorological sites in an arid climate, and showed that both M5Tree and ANN predicted ET0 well, in close agreement with FAO-56 PM. The ability of the support vector machine (SVM) to predict ET0 accurately was examined by Wen et al. (2015) and Yin et al. (2017), who compared daily ET0 predictions of SVM (Wen et al. 2015) and of a genetic-algorithm-based SVM (Yin et al. 2017) with those of ANN; in both studies the SVM provided better results than the ANN models. Accurate prediction of ET0 with meta-models is an important issue in hydrological systems. The input data points used to calibrate nonlinear ET0 models strongly affect the accuracy of the predictions, and locally separating the calibration data set may help select the effective input data for building a nonlinear model of a hydrological system.
Subset modeling Basis ANFIS for Prediction of the Reference...
The objectives of this study are 1) to develop an ANFIS-based modeling approach with subset data selection (subset ANFIS) and 2) to compare several data-driven models, including ANN, ANFIS, M5Tree and subset ANFIS, for estimating reference evapotranspiration from weather data. The weather variables used in this study are solar radiation, relative humidity, air temperature and wind speed. For this aim, daily data covering almost 20 years from three climatic stations in Turkey (Ankara, Kirikkale and Cankiri) were used for training and testing the models. To the authors' knowledge, no previous study has applied the subset ANFIS approach to estimating daily ET0.
2 Case Study The study uses daily climatic data, including air temperature (Ta), relative humidity (RH), solar radiation (SR) and wind speed (u), from the Ankara, Kirikkale and Cankiri stations in the Central Anatolian Region of Turkey. The data were obtained from the Turkish State Meteorological Service (TSMS). Figure 1 shows the location of the stations in the Central Anatolia Region, which lies in the center of Turkey and covers about 19% of the country's area. Central Anatolia is mostly a plateau bounded by the Taurus Mountains to the south, the North Anatolian Mountains to the north and the high mountains of Eastern Anatolia. Its climate is cooler, drier and less sunny than that of the coastal regions, with snowfall in winter. The study area lies in Central Anatolia and covers the cities of Ankara, Kirikkale and Cankiri. The major mountain chains are in the northern and southern parts of the region and run mostly parallel to the coast: the Northern Anatolian Mountains in the north and the Taurus Mountains in the south. These mountains are separated from each other by wide plains in the central part of Anatolia (Akkemik and Aras 2005). The mean annual total rainfall for this district is above 800 mm. In Central Anatolia, the mean annual precipitation ranges between about 350 and 500 mm (Kahya and Çağatay Karabörk 2001). The district lies at an altitude of about 1000 m above sea level and has a flat topography surrounded by highlands. The area has a semi-arid climate, characterized by hot, dry summers and cold, rainy-snowy winters. The mean yearly rainfall ranges from 12 to 51 mm, and heavy precipitation is concentrated between December and April. The dry season occurs between June and September. Humidity rises to 84% in the rainy season. Evapotranspiration exceeds precipitation in August and September (Akkemik and Aras 2005).

Fig. 1 The location of stations in the Central Anatolia Region of Turkey (map of Turkey marking the Ankara, Kirikkale and Cankiri stations)
3 Methods 3.1 M5 Model Tree The M5 model tree (M5Tree) algorithm is one of the most commonly used algorithms of the decision tree family. It deals with complex problems by dividing them into smaller subproblems, which makes it a robust and suitable approach for simulating, predicting and classifying phenomena. Unlike conventional decision tree algorithms, M5Tree places linear regression functions at the leaves of the tree (Rahimikhoob 2016; Solomatine and Xue 2004). Improved M5Tree models (also known as M5′ models) generally involve three steps: building, pruning and smoothing the tree. During building, a regression tree is constructed by splitting the input space according to the standard deviation reduction (SDR) factor. The standard deviation of the output values that reach a node from the root through the related branch is treated as a measure of the error at that node; each attribute is tested at that node to compute the expected reduction of this error, and the attribute that maximizes the expected error reduction is chosen. Splitting terminates when only a few instances remain or when the output values of all instances reaching the node vary only slightly (Etemad-Shahidi and Taghipour 2012). Figure 2a shows a schematic illustration of splitting the input space into three sub-spaces, and, for a 2D input domain of x1 and x2, Fig. 2b illustrates the corresponding structure of an M5Tree obtained by training. After the tree is built, a multiple linear regression model is fitted at each node. These linear models may then be simplified by dropping attributes, based on the expected error on future data. After this simplification, every sub-tree is considered for pruning: a sub-tree is pruned if the estimated error of its linear model is smaller than or equal to the expected error of the sub-tree. Pruning may leave discontinuities between adjacent leaves, so a smoothing (regularization) procedure is applied to remove them. Readers are referred to Etemad-Shahidi and Mahjoobi (2009) and Pal and Deswal (2009) for comprehensive information about the M5Tree model.

Fig. 2 a Illustration of the splitting of the input space into three sub-spaces and the three consequent regression models; b corresponding structure of the M5Tree model
3.2 Artificial Neural Network Artificial neural networks (ANNs) are computational modeling tools that have demonstrated their capability for modeling complex, nonlinear phenomena. ANNs are parallel-distributed information-processing structures made of interconnected adaptive processing elements (artificial neurons or nodes), resembling the arrangement of neurons in a biological nervous system. They can perform parallel computations and can provide promising results both for precisely and imprecisely formulated problems and for phenomena known only through experimental data and field observations (Zounemat-Kermani 2012). In this study, the multilayer perceptron neural network (MLPNN), one of the most important classes of neural networks, is applied to ET0 prediction. An MLPNN typically consists of an input layer, at least one hidden layer of nonlinear processing elements and an output layer (Fig. 3a). Comprehensive information about MLPNN can be found in Gardner and Dorling (1998) and Graupe (2013).
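A minimal sketch of the forward pass of a (4,2,1) MLP like the one in Table 1 (four inputs Ta, SR, RH and u, two hidden neurons, one output) is given below. The tanh hidden activation, the linear output neuron and the random untrained weights are assumptions for illustration; the study's activation functions and training algorithm are not reproduced here, and `mlp_forward` is a hypothetical name.

```python
import numpy as np


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer perceptron with one output.

    x:        (n_samples, n_inputs) scaled inputs, e.g. Ta, SR, RH, u
    w_hidden: (n_inputs, n_hidden), b_hidden: (n_hidden,)
    w_out:    (n_hidden,),          b_out: scalar
    """
    hidden = np.tanh(x @ w_hidden + b_hidden)  # nonlinear hidden neurons
    return hidden @ w_out + b_out              # linear output neuron (ET0)


# A (4,2,1) network with random, untrained weights (illustration only)
rng = np.random.default_rng(0)
w_h, b_h = rng.normal(size=(4, 2)), np.zeros(2)
w_o, b_o = rng.normal(size=2), 0.1
x = rng.uniform(size=(5, 4))  # five hypothetical scaled daily records
et0_hat = mlp_forward(x, w_h, b_h, w_o, b_o)
```

Training would adjust `w_h`, `b_h`, `w_o` and `b_o` to minimize the error on the training data; only the network structure is shown here.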
3.3 ANFIS with Fuzzy c-Means Clustering Method The adaptive neuro-fuzzy inference system (ANFIS) is a type of feed-forward neural network that implements a Takagi–Sugeno fuzzy inference system (FIS). Exploiting the ability of ANNs to update adjustable parameters, ANFIS tunes the parameters of the Takagi–Sugeno inference by learning from the training data. In brief, an ANFIS architecture consists of five layers. The first layer, called the fuzzification layer, transforms crisp inputs into fuzzy values via membership functions (MFs); the second (rule) layer computes the firing strength of each rule with the product operator; the third layer normalizes the firing strengths; the fourth (defuzzification) layer weights the consequent output of each rule by its normalized firing strength; and the fifth (output) layer sums the outputs of the nodes in layer four. More information about the ANFIS network structure is given in Zounemat-Kermani and Teshnehlab (2008).
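The five layers described above can be sketched as follows for a first-order Takagi–Sugeno ANFIS with Gaussian MFs and one MF per input in each rule (as obtained when each cluster defines one rule). This is an illustrative forward pass only, assuming these MF and rule forms; parameter training is omitted, and `anfis_forward` is a hypothetical name.

```python
import numpy as np


def anfis_forward(x, centers, sigmas, coeffs, biases):
    """Output of a first-order Takagi-Sugeno ANFIS with Gaussian MFs.

    x:       (n, d) crisp inputs
    centers: (r, d) MF centres, one row per rule (e.g. one per FCM cluster)
    sigmas:  (r, d) MF widths
    coeffs:  (r, d) linear consequent coefficients
    biases:  (r,)   consequent constants
    """
    # Layer 1 (fuzzification): membership of each input in each rule's MF
    z = (x[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    mu = np.exp(-0.5 * z ** 2)                      # (n, r, d)
    # Layer 2 (rules): firing strength = product of memberships
    w = mu.prod(axis=2)                             # (n, r)
    # Layer 3: normalized firing strengths (guard against all-zero rows)
    w_bar = w / np.maximum(w.sum(axis=1, keepdims=True), np.finfo(float).tiny)
    # Layer 4 (defuzzification): weighted rule consequents p_r . x + b_r
    f = x @ coeffs.T + biases                       # (n, r)
    # Layer 5 (output): sum of the layer-4 outputs
    return (w_bar * f).sum(axis=1)
```

With a single rule the normalized firing strength is 1, so the output reduces to that rule's linear consequent; with several rules the output is a membership-weighted blend of the local linear models.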
Fig. 3 a Basic structure of an MLP-ANN with m inputs, n neurons and 2 outputs; b basic structure of an ANFIS-FCM
To design an ANFIS model, one first constructs it and then trains it. In the construction process, the number and type of membership functions are defined using methods such as subtractive clustering, grid partitioning and fuzzy c-means (FCM) clustering. In this study, FCM is used to create the fuzzy MFs and the fuzzy rule base. The basic structure of the ANFIS-FCM is shown in Fig. 3b. More information about FCM can be found in Abdulshahed et al. (2015) and Esfahanipour and Aghamiri (2010).
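For reference, a compact version of the standard fuzzy c-means iteration (fuzzifier m = 2 by default) is sketched below: memberships u_ik = 1 / Σ_j (d_ik/d_jk)^(2/(m−1)) and centres v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m are updated alternately until the memberships stop changing. The resulting cluster centres could serve as MF centres of the ANFIS rules, but how the study's toolbox derives MF widths and consequents is not reproduced; `fcm` and its arguments are hypothetical.

```python
import numpy as np


def fcm(x, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means clustering of the rows of x.

    Returns (centers, u): centers is (c, d); u[i, k] is the membership of
    sample k in cluster i, and each column of u sums to 1.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.shape[0]))
    u /= u.sum(axis=0, keepdims=True)              # random initial memberships
    for _ in range(max_iter):
        um = u ** m
        # Centre update: membership-weighted mean of the samples
        centers = um @ x / um.sum(axis=1, keepdims=True)
        # Distance of every sample to every centre, shape (c, n)
        dist = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # avoid division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return centers, u
```

Each sample belongs to every cluster to some degree, which is what allows the clusters to be turned directly into overlapping fuzzy rules.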
3.4 Subset-ANFIS Model To achieve accurate predictions, a subset modeling approach is developed based on the ANFIS model. A local ANFIS model is built for each subset, and the resulting scheme is named the subset ANFIS model. Such local models may improve the accuracy of the predicted ET0. The input data are separated into K subsets so that each ANFIS model is built on the effective input data of its own range: the modeling domain of the target variable (here ET0) is divided into K subsets, and a local ANFIS model is then calibrated on the data of each subset. The subset data points are selected from the training and testing data by dividing the range between the minimum and maximum ET0 uniformly, one interval per local ANFIS model. The validation data points are then predicted adaptively with the subset ANFIS models for each station. The approach can be implemented in a programming environment such as MATLAB with the following steps:

Step 0: Select the input data for the training, testing and validation periods, and the number of subsets for modeling (K).

Step 1: Divide the training and testing data points into K subsets as follows:

$$D = \frac{ET_{\max} - ET_{\min}}{K} \qquad (1)$$
where D is the domain (width) of each subset, and ETmax and ETmin are the maximum and minimum ET0, respectively. According to Eq. (1), all subsets have the same width; that is, the ET0 range is divided uniformly into K equal intervals, as shown schematically in Fig. 4.

Fig. 4 Schematic of the ET0 subsets with equal domains
Therefore, the interval of the kth subset (k = 1, 2, …, K) can be defined from the ET0 in the training and testing data as follows:

$$ET_{\min}(k) = ET_{\min} + (k-1)D \qquad (2)$$

$$ET_{\max}(k) = ET_{\min} + kD \qquad (3)$$

where ETmin(k) and ETmax(k) are the minimum and maximum ET0 of the kth subset, respectively. The training and testing ET0 data of the kth subset, $Y_S^k$, therefore lie in the interval [ETmin(k), ETmax(k)]. The observed ET0 values in the training and testing periods are divided with Eqs. (1)–(3). The input data (X, i.e. temperature, relative humidity, wind velocity and solar radiation) of each subset are then selected according to $Y_S^k$: the input records whose ET0 lies in the interval of the kth subset form the training and testing database of that subset, $X_S^k$. Figure 4 illustrates the K subsets defined on the output data. Every training and testing ET0 point lies in exactly one of subsets 1 to K, so the output subsets are mutually disjoint ($Y_S^i \cap Y_S^j = \emptyset$ for $i \neq j$), whereas the input data of one subset may overlap with those of another.
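Eqs. (1)-(3) and the subset assignment can be sketched as below. The paper writes closed intervals, so treating each interval as half-open [ETmin(k), ETmax(k)) (with ETmax itself placed in the last subset) is an assumption made here to keep the subsets disjoint; the clamping of out-of-range values follows Step 4 iv. `subset_bounds` and `assign_subsets` are hypothetical names, and subset indices are 0-based.

```python
import numpy as np


def subset_bounds(et_train, k_subsets):
    """Equal-width ET0 intervals of Eqs. (1)-(3): returns (D, lower, upper)."""
    et_min, et_max = float(np.min(et_train)), float(np.max(et_train))
    d = (et_max - et_min) / k_subsets          # Eq. (1)
    k = np.arange(1, k_subsets + 1)
    lower = et_min + (k - 1) * d               # Eq. (2)
    upper = et_min + k * d                     # Eq. (3)
    return d, lower, upper


def assign_subsets(et, lower, upper):
    """0-based subset index of each ET0 value.

    Intervals are treated as half-open [lower, upper) so that they do not
    overlap; values below the first bound go to the first subset and values
    at or above the last bound go to the last subset (Step 4 iv).
    """
    idx = np.searchsorted(upper, np.asarray(et, dtype=float), side="right")
    return np.clip(idx, 0, len(upper) - 1)


d, lo, hi = subset_bounds([1.0, 2.0, 3.0, 4.0, 5.0], 2)
labels = assign_subsets([1, 2, 2.99, 3, 5, 6, 0.5], lo, hi)
```

The same routing function can be used on observed ET0 (Step 1) and on predicted ET0 (Step 4).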
Step 2: Generate a FIS for each subset and build an ANFIS model from the input data selected for that subset in Step 1 (ANFIS-1, ANFIS-2, ANFIS-3, …, ANFIS-K). In this step, the subset ANFIS models are calibrated on the subset data points and the ET0 in the test period of each station is predicted. The calibrated models may differ between subsets because each subset covers a different range of ET0.

Step 3: Create an ANFIS model from all input data (training and testing, i.e. $X_T$ (input data) and $Y_T$ (ET0)) and predict ET0 with this all-data ANFIS model. This step provides the initial guide for assigning each data point to a subset. The RMSE of this ANFIS model is computed for the validation data points and is used as the starting value of the stopping criterion for updating the points in each subset.

Step 4: Divide the test and validation data points of each station into subsets, applying the subset domains and intervals obtained in Step 1 to the predicted (rather than observed) ET0 of the test and validation periods, as follows:
i) Take the predicted test and validation ET0 from the ANFIS model ($Y_T$) in the first iteration, or from the subset ANFIS ($Y_{TS}$) in later iterations.
ii) Take the interval (minimum and maximum) of each subset determined in Step 1.
iii) Divide the test and validation input data points according to the predicted ET0 from (i) and the subset intervals from (ii).
iv) If the predicted ET0 < ETmin, the data point is assigned to the first subset; if the predicted ET0 ≥ ETmax, it is assigned to the last subset.
The input data points of each subset are thus selected based on the previously predicted ET0, from the ANFIS (first prediction) or from the subset ANFIS (subsequent predictions). The framework of the subset ANFIS is plotted in Fig. 5.

Step 5: Predict ET0 ($Y_{TS}$) from the subset input data of Step 4 with the subset ANFIS models generated in Step 2, and compute the RMSE of these predictions.

Step 6: Check convergence with |RMSE1 − RMSE| < 0.0001, where RMSE1 is the RMSE of the new predictions and RMSE is the previous RMSE. If the criterion is met, output the subset ANFIS predictions ($Y_{TS}$); otherwise, set RMSE = RMSE1 and return to Step 4, repeating the division (Step 4), the prediction (Step 5) and the convergence check (Step 6).

Fig. 5 Framework of the subset ANFIS to predict the ET0
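The iteration in Steps 3-6 can be sketched as follows. To keep the example self-contained, an ordinary least-squares linear model stands in for each ANFIS (the hypothetical `fit_linear`), only one prediction set (validation) is routed, and, as in Step 3, the RMSE used for the stopping test is computed against the observed validation ET0. This is a simplified illustration under those assumptions, not the authors' MATLAB code; `subset_predict` is a hypothetical name.

```python
import numpy as np


def fit_linear(x, y):
    """Stand-in for training one ANFIS (assumption): least-squares linear model."""
    a = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return lambda z: np.column_stack([z, np.ones(len(z))]) @ coef


def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))


def subset_predict(x_tr, y_tr, x_val, y_val, k_subsets,
                   fit=fit_linear, tol=1e-4, max_iter=50):
    """Iterative subset prediction following Steps 1-6 (simplified sketch).

    Assumes every training subset contains enough points to fit a model.
    """
    d = (y_tr.max() - y_tr.min()) / k_subsets                  # Eq. (1)
    upper = y_tr.min() + d * np.arange(1, k_subsets + 1)      # Eq. (3)

    def route(et):
        # Subset index of each ET0 value; values below ETmin go to the first
        # subset and values at or above ETmax to the last (Step 4 iv)
        return np.clip(np.searchsorted(upper, et, side="right"), 0, k_subsets - 1)

    # Steps 1-2: one local model per subset of the observed training ET0
    s_tr = route(y_tr)
    local = [fit(x_tr[s_tr == k], y_tr[s_tr == k]) for k in range(k_subsets)]

    # Step 3: a model trained on all data gives the first prediction and RMSE
    pred = fit(x_tr, y_tr)(x_val)
    err = err_new = rmse(y_val, pred)

    for _ in range(max_iter):
        s_val = route(pred)                   # Step 4: split by predicted ET0
        new = np.empty_like(pred)
        for k in range(k_subsets):            # Step 5: local predictions
            mask = s_val == k
            if mask.any():
                new[mask] = local[k](x_val[mask])
        err_new = rmse(y_val, new)
        pred = new
        if abs(err_new - err) < tol:          # Step 6: convergence check
            break
        err = err_new
    return pred, err_new
```

Any fitting function that returns a predictor can be passed as `fit`, for example a wrapper around an ANFIS trainer.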
Steps 1–6 are summarized in the framework of Fig. 5. Subset ANFIS generates the FIS of each subset model in the same way as a single ANFIS. However, because the modeling is divided among several subsets, each local model may predict its subset more accurately than a FIS generated from all data. Since the ANFIS models are calibrated locally, the resulting set of local models may increase the agreement between the predicted and observed ET0. It is assumed that each subset contains more than $N_S^{\max}/4$ data points, where $N_S^{\max}$ is the number of data points in the largest subset. This ensures that enough input data points are available to build each artificial-intelligence-based model (i.e. ANFIS). The first and last subsets may contain few data points when a large number of subsets is used. The number of subsets should therefore be reduced when the subsets do not satisfy this condition, i.e. when $N_S^{\min} < N_S^{\max}/4$, where $N_S^{\min}$ is the number of data points in the smallest subset. Accordingly, the number of subsets is chosen as $2 \le K < K_{\max}$, where $K_{\max}$ is the number of subsets for which $N_S^{\min} < N_S^{\max}/4$. The maximum number of subsets depends on the total number of training data points and on the maximum and minimum ET0 in the training data, which determine the subset domain through Eq. (1). The approach was coded in MATLAB using the Fuzzy Logic Toolbox.
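A possible way to search for K under the condition above is sketched below: equal-width subsets are formed for K = 2, 3, … and the search stops at the first K for which $N_S^{\min} < N_S^{\max}/4$, returning the last K that satisfied the condition. Stopping at the first violation and the upper search limit `k_limit` are assumptions; `max_subsets` is a hypothetical name.

```python
import numpy as np


def max_subsets(et_train, k_limit=20):
    """Largest K (>= 2) for which every equal-width subset keeps at least
    N_S^max / 4 training points, i.e. N_S^min >= N_S^max / 4.

    The search stops at the first K that violates the condition.
    Returns None if even K = 2 violates it.
    """
    et = np.asarray(et_train, dtype=float)
    best = None
    for k in range(2, k_limit + 1):
        edges = np.linspace(et.min(), et.max(), k + 1)   # Eqs. (1)-(3)
        counts, _ = np.histogram(et, bins=edges)         # points per subset
        if counts.min() < counts.max() / 4:              # condition violated
            break
        best = k
    return best
```

A skewed training set, with a few very large ET0 values stretching the range, leaves the last subsets nearly empty and therefore limits K.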
4 Comparative Statistics In this study, several statistical parameters were used to analyze the performance of the models. They are defined as follows (Keshtegar et al. 2016b; Nash and Sutcliffe 1970; Willmott 1981).

4.1 Root Mean Square Error (RMSE)

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(ET_0)_i - (ET_p)_i\right]^2} \qquad (4)$$

4.2 Mean Absolute Error (MAE)

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|(ET_0)_i - (ET_p)_i\right| \qquad (5)$$

4.3 Residual Mean Error (RME)

$$RME = \frac{1}{N}\sum_{i=1}^{N}\left[(ET_0)_i - (ET_p)_i\right] \qquad (6)$$
where N is the number of data points, and (ET0)i and (ETp)i are the observed and predicted reference evapotranspiration for the ith data point, respectively. RMSE measures the average magnitude of the differences between the predicted (ETp) and observed (ET0) values. Lower values of RMSE and MAE, and RME values closer to zero, indicate better predictions.
4.4 Model Efficiency Factor (EF)

$$EF = 1 - \frac{\sum_{i=1}^{N}\left[(ET_0)_i - (ET_p)_i\right]^2}{\sum_{i=1}^{N}\left[(ET_0)_i - \overline{ET_0}\right]^2}, \qquad -\infty < EF \le 1 \qquad (7)$$
The efficiency factor (EF) compares the squared deviations between the predicted and observed data with the squared deviations of the observed data from their mean. EF = 1 indicates a perfect match, and EF ≤ 0 indicates that the model performs no better than the observed mean (Keshtegar et al. 2016a).
4.5 Agreement Index (d)

$$d = 1 - \frac{\sum_{i=1}^{N}\left[(ET_0)_i - (ET_p)_i\right]^2}{\sum_{i=1}^{N}\left[\left|(ET_0)_i - \overline{ET_0}\right| + \left|(ET_p)_i - \overline{ET_0}\right|\right]^2}, \qquad 0 \le d \le 1 \qquad (8)$$

where $\overline{ET_0} = \frac{1}{N}\sum_{i=1}^{N}(ET_0)_i$ is the average of the observed ET0. The index d is a descriptive measure that can be used to cross-compare models. It ranges from 0, for no agreement between the observed and predicted ET0, to 1, for perfect predictions (Keshtegar et al. 2017).
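4.6 Relative Error (RE)

$$RE = \frac{\sum_{i=1}^{N}(ET_0)_i - \sum_{i=1}^{N}(ET_p)_i}{\sum_{i=1}^{N}(ET_0)_i} \times 100 \qquad (9)$$

The relative error is a normalized measure (in %) of the bias in the total ET0; RE = 0 indicates that the predicted total equals the observed total.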
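The statistics of Eqs. (4)-(9) can be computed together as in the minimal sketch below (`evaluate` is a hypothetical name). With the sign convention of Eq. (9), overestimation gives a negative RE. Note that the relative errors listed in Table 2 appear to follow the opposite convention: for Ankara's M5Tree, (5433.28 − 5447.49)/5433.28 × 100 ≈ −0.26%, whereas Table 2 lists 0.2615.

```python
import numpy as np


def evaluate(obs, pred):
    """Statistics of Eqs. (4)-(9) for observed (ET0) and predicted (ETp) series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    res = obs - pred                       # (ET0)_i - (ETp)_i
    mean_obs = obs.mean()                  # mean observed ET0
    sse = np.sum(res ** 2)
    agreement_den = np.sum((np.abs(pred - mean_obs) + np.abs(obs - mean_obs)) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(res ** 2))),                  # Eq. (4)
        "MAE": float(np.mean(np.abs(res))),                         # Eq. (5)
        "RME": float(np.mean(res)),                                 # Eq. (6)
        "EF": float(1 - sse / np.sum((obs - mean_obs) ** 2)),       # Eq. (7)
        "d": float(1 - sse / agreement_den),                        # Eq. (8)
        "RE": float((obs.sum() - pred.sum()) / obs.sum() * 100),    # Eq. (9)
    }
```

EF and d are undefined when all observed values are equal, since their denominators vanish, so the function is meant for ordinary multi-day series.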
5 Results The accuracies of the M5Tree, ANN, ANFIS and subset ANFIS methods in modeling ET0 were investigated. For the ANN models, different numbers of hidden nodes (from 1 to 10) were tried, and the optimal number, giving the lowest RMSE in the test stage, was selected. Similarly, different numbers of clusters (from 2 to 8) and iterations (from 10 to 100) were tried for the ANFIS models, and the optimal values were selected according to the test stage. Table 1 reports the error statistics of each model in the training and test stages. In the table, (4,2,1) denotes an ANN model with 4 input nodes (air temperature, solar radiation, relative humidity and wind speed), 2 hidden nodes and 1 output node, and (8,gauss,100) denotes an ANFIS model with 8 clusters, Gaussian MFs for each input and 100 iterations. Table 1 shows that the subset ANFIS models perform better than the M5Tree, ANN and single ANFIS models in both the training and test stages. Among the subset ANFIS models, the 6-subset ANFIS provides the best accuracy in modeling ET0 at the Ankara, Kirikkale and Cankiri stations. The M5Tree, ANN, ANFIS and subset ANFIS models are compared for the validation stage in Table 2. The EF, d, RMSE and MAE statistics show that the subset ANFIS models are more accurate than the other models. Figures 6-8 show the observed and estimated ET0 of the Ankara, Kirikkale and Cankiri stations in the validation stage, respectively. It can be concluded from Table 2 and Figs. 6-8 that the 4-subset ANFIS performs better than the other models in estimating Ankara's ET0 in the validation stage. For Kirikkale's ET0, the best scores in the validation stage were obtained with the 6-subset ANFIS model, and the fit line equations and R² values in Fig. 7 show that this model produces less scattered estimates than the other models. The scatterplots of the Cankiri station
Ankara M5Tree ANN ANFIS 2-subset ANFIS 4-subset ANFIS 6-subset ANFIS Kirikkale M5Tree ANN ANFIS 2-subset ANFIS 4-subset ANFIS 6-subset ANFIS Cankiri M5Tree ANN ANFIS 2-subset ANFIS 4-subset ANFIS 6-subset ANFIS
Model EF
d
0.0577 0.1051 0.0614 0.0518 0.0484 0.0438 0.0648 0.1119 0.0770 0.0714 0.0680 0.0449 0.0587 0.0960 0.0679 0.0625 0.0488 0.0472
0.0832 0.1443 0.0893 0.0731 0.0702 0.0646
0.0968 0.1569 0.1145 0.1072 0.0994 0.0747
0.0822 0.1369 0.0908 0.0858 0.0715 0.0665
(4,2,1) (8,gauss,100) (8,gauss,100) (8,gauss,100) (8,gauss,100)
(4,2,1) (8,gauss,100) (8,gauss,100) (8,gauss,100) (8,gauss,100)
0.9979 0.9946 0.9971 0.9975 0.9978 0.9988 0.9966 0.9905 0.9958 0.9963 0.9974 0.9978
−5.0E-9 3.7E-9 6.5E-7 7.3E-4 1.0E-4 3.1E-4
0.9983 0.9947 0.9980 0.9987 0.9988 0.9989
7.6E-9 1.3E-5 −7.4E-8 −6.6E-4 −5.8E-4 −1.5E-3
2.37E-9 −4.8E-5 −1.5E-7 −7.4E-5 −7.8E-4 −2.8E-4
0.9991 0.9976 0.9990 0.9991 0.9994 0.9994
0.9995 0.9986 0.9993 0.9994 0.9995 0.9997
0.9996 0.9987 0.9995 0.9997 0.9997 0.9997
0.1469 0.1717 0.1166 0.1066 0.1192 0.0974
0.3116 0.5822 0.2788 0.2962 0.2762 0.2589
0.1488 0.1988 0.1494 0.1200 0.1164 0.1094
RMSE (mm)
RME (mm)
RMSE (mm)
MAE (mm)
Test
Training
(4,2,1) (8,gauss,100) (8,gauss,100) (8,gauss,100) (8,gauss,100)
Structure
Table 1 Error statistics for each model in training and test stages
0.0913 0.1149 0.0830 0.0760 0.0706 0.0635
0.2150 0.2811 0.1987 0.2028 0.1985 0.1838
0.0908 0.1402 0.1006 0.0805 0.0744 0.0739
MAE (mm)
−0.0127 0.0098 −0.0216 −0.0202 −0.0265 −0.0140
0.0518 −0.0862 0.0320 0.0173 0.0091 0.0281
0.0010 0.0201 −0.0167 −0.0190 −0.0117 −0.0104
RME (mm)
0.9833 0.9771 0.9895 0.9912 0.9890 0.9926
0.9644 0.8758 0.9715 0.9719 0.9721 0.9773
0.9937 0.9887 0.9936 0.9959 0.9961 0.9967
EF
0.9958 0.9941 0.9974 0.9978 0.9973 0.9982
0.9913 0.9718 0.9930 0.9928 0.9932 0.9940
0.9984 0.9972 0.9984 0.9990 0.9990 0.9992
d
Table 2 Comparison of models in estimating ET0 of the Ankara, Kirikkale and Cankiri stations in the validation stage

Station    Model           Structure      EF      d       RMSE (mm)  MAE (mm)  Total ET0* (mm)  Average ET0* (mm)  Relative error (%)
Ankara     M5Tree          -              0.9958  0.9990  0.1290     0.0789    5447.49          2.9817             0.2615
           ANN             (4,2,1)        0.9916  0.9979  0.1816     0.1273    5461.40          2.9893             0.5176
           ANFIS           (8,gauss,100)  0.9940  0.9985  0.1542     0.0942    5467.48          2.9926             0.6295
           2-subset ANFIS  (8,gauss,100)  0.9960  0.9990  0.1251     0.0758    5464.75          2.9896             0.5706
           4-subset ANFIS  (8,gauss,100)  0.9970  0.9992  0.1093     0.0675    5460.77          2.9889             0.5060
           6-subset ANFIS  (8,gauss,100)  0.9958  0.9990  0.1290     0.0789    5468.95          2.9934             0.6566
Kirikkale  M5Tree          -              0.9524  0.9881  0.3171     0.1819    4689.60          2.5682             −1.4395
           ANN             (4,2,1)        0.8204  0.9541  0.6160     0.2224    4761.86          2.6078             0.0792
           ANFIS           (8,gauss,100)  0.9683  0.9920  0.2590     0.1643    4685.47          2.5660             −1.5263
           2-subset ANFIS  (8,gauss,100)  0.9733  0.9933  0.2375     0.1568    4692.63          2.5699             −1.3758
           4-subset ANFIS  (8,gauss,100)  0.9756  0.9940  0.2270     0.1623    4710.31          2.5796             −1.0043
           6-subset ANFIS  (8,gauss,100)  0.9783  0.9946  0.2140     0.1465    4706.17          2.5773             −1.0913
Cankiri    M5Tree          -              0.9774  0.9944  0.1646     0.0998    4193.6240        2.2954             0.0889
           ANN             (4,2,1)        0.9730  0.9932  0.1801     0.1206    4253.6659        2.3282             1.5219
           ANFIS           (8,gauss,100)  0.9629  0.9908  0.2111     0.0989    4254.0883        2.3285             1.5320
           2-subset ANFIS  (8,gauss,100)  0.9795  0.9949  0.1569     0.0911    4244.9233        2.3234             1.3132
           4-subset ANFIS  (8,gauss,100)  0.9805  0.9952  0.1530     0.0828    4244.8418        2.3234             1.3113
           6-subset ANFIS  (8,gauss,100)  0.9845  0.9962  0.1363     0.0756    4226.3518        2.3133             0.8700

*The observed total and average ET0 for the Ankara station in the validation period are Tot. ET0 = 5433.28 mm and Ave. ET0 = 2.9738 mm, respectively.
*The observed total and average ET0 for the Kirikkale station in the validation period are Tot. ET0 = 4758.09 mm and Ave. ET0 = 2.6057 mm, respectively.
*The observed total and average ET0 for the Cankiri station in the validation period are Tot. ET0 = 4189.90 mm and Ave. ET0 = 2.2933 mm, respectively.
in Fig. 8 clearly indicate that the 6-subset ANFIS model approximates the observed ET0 values better than the other models.
6 Discussion The results clearly indicate the superiority of the subset ANFIS models over the single models: subset ANFIS considerably improved on the accuracy of the alternative models in estimating ET0 in the validation stage. For example, for the Ankara station, the 4-subset ANFIS reduced the RMSE and MAE of the M5Tree, ANN and single ANFIS models by 15.3% and 14.4%, 39.8% and 47%, and 29.1% and 28.3%, respectively. For the Kirikkale station, the 6-subset ANFIS model reduced the RMSE and MAE of the M5Tree, ANN and single ANFIS models by 32.5% and 19.5%, 65.3% and 34.1%, and 17.4% and 10.8%, respectively. Moreover, for the Cankiri station, the 6-subset ANFIS model reduced the RMSE and MAE of the M5Tree, ANN and single ANFIS models by 17.2% and 24.2%, 24.3% and 37.3%, and 35.4% and 23.6%, respectively. The main reason for this superiority is that subset ANFIS is a compound model that uses a separate single model for each data range; it therefore has more parameters and rules, and more flexibility and power, than a single model. It should be noted that the 2-subset ANFIS is also more accurate than the single models, but not as much as the 4- and 6-subset models. The number of subsets is variable and can be chosen according to the data structure; for this reason, different numbers of subsets were found to be optimal for different stations. This suggests trying different numbers of subsets and selecting the best one when modeling ET0 at each station. Overall, the results indicate that the subset ANFIS models perform better than the other models. The main advantage of these models is that separate ANFIS models are trained independently for different ranges, and the climatic data of each range are used to optimize the parameters of the corresponding ANFIS model. In this way, no individual ANFIS is exposed to the entire range of widely differing values, which makes the subset ANFIS estimates less likely to deviate significantly from the corresponding observations. A single ANFIS, in contrast, is trained on the entire data set without range specification and is therefore continuously exposed to both very high and very low ET0 values. This constrains the weights within the ANFIS and results in poorer overall estimates (Cigizoglu and Kisi 2006).

Fig. 6 The observed and estimated ET0 of the Ankara Station in validation stage
Fig. 7 The observed and estimated ET0 of the Kirikkale Station in validation stage
Fig. 8 The observed and estimated ET0 of the Cankiri Station in validation stage
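The percentages quoted in the discussion can be reproduced from the validation RMSE and MAE values in Table 2, assuming they are relative error reductions, (base − new)/base × 100, of the best subset ANFIS model with respect to each alternative model. The sketch below does this for the three stations; `table2` and `improvement` are hypothetical names, and the dictionary simply transcribes the relevant Table 2 entries.

```python
# Validation RMSE and MAE (mm) transcribed from Table 2
table2 = {
    "Ankara":    {"M5Tree": (0.1290, 0.0789), "ANN": (0.1816, 0.1273),
                  "ANFIS": (0.1542, 0.0942), "4-subset ANFIS": (0.1093, 0.0675)},
    "Kirikkale": {"M5Tree": (0.3171, 0.1819), "ANN": (0.6160, 0.2224),
                  "ANFIS": (0.2590, 0.1643), "6-subset ANFIS": (0.2140, 0.1465)},
    "Cankiri":   {"M5Tree": (0.1646, 0.0998), "ANN": (0.1801, 0.1206),
                  "ANFIS": (0.2111, 0.0989), "6-subset ANFIS": (0.1363, 0.0756)},
}


def improvement(base, new):
    """Relative error reduction in percent: (base - new) / base * 100."""
    return (base - new) / base * 100


for station, models in table2.items():
    # The best subset ANFIS model of each station is the one listed last
    best_name = next(name for name in models if "subset" in name)
    best_rmse, best_mae = models[best_name]
    for name in ("M5Tree", "ANN", "ANFIS"):
        rmse, mae = models[name]
        print(f"{station:9s} {name:6s} "
              f"RMSE {improvement(rmse, best_rmse):5.1f}%  "
              f"MAE {improvement(mae, best_mae):5.1f}%")
```

For example, the Ankara M5Tree row gives 15.3% for RMSE and 14.4% for MAE.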
7 Conclusion
A new modeling scheme, subset ANFIS, was developed in this study for modeling reference evapotranspiration. The effective data points were divided into k subsets by uniform selection from the training input data, and ANFIS models were used to model ET0 on these subsets. The modeling process of the subset ANFIS was based on an active learning loop to obtain accurate predictions of ET0. The results of these models were compared with those of the M5Tree, ANN and single ANFIS models. The subset ANFIS models were found to be superior to the other models in modeling daily ET0 from daily climatic inputs (air temperature, solar radiation, relative humidity and wind speed) obtained from three stations in the Central Anatolian Region of Turkey. The 4-subset ANFIS model outperformed the M5Tree, ANN and ANFIS models at the Ankara station, while the 6-subset ANFIS provided the best accuracy in modeling daily ET0 at the Kirikkale and Cankiri stations. Subset ANFIS models significantly increased the estimation accuracy over the M5Tree (from 15.3% to 32.5% in RMSE and from 14.4% to 24.2% in MAE), ANN (from 24.3% to 65.3% in RMSE and from 34.1% to 47% in MAE) and ANFIS (from 17.4% to 35.4% in RMSE and from 10.8% to 28.3% in MAE) models. Overall, the results suggest that subset ANFIS models are a better alternative to the M5Tree, ANN and single ANFIS models for estimating daily ET0. In the current study, three subset numbers (2, 4 and 6) were used for the subset ANFIS models. Future studies may try more subset numbers and compare the results with empirical methods. Because daily data from only three stations were used here, more data, at daily or monthly resolution and from different regions, are needed to draw firmer conclusions about the capability of subset ANFIS.
Acknowledgements
This work was supported by University of Zabol under Grant No. UOZ-GR-9517-3.
References
Abdullah SS, Malek MA (2016) Empirical Penman-Monteith equation and artificial intelligence techniques in predicting reference evapotranspiration: a review. Int J Water 10(1):55–66. https://doi.org/10.1504/IJW.2016.073741
Abdulshahed AM, Longstaff AP, Fletcher S, Myers A (2015) Thermal error modelling of machine tools based on ANFIS with fuzzy c-means clustering using a thermal imaging camera. Appl Math Model 39(7):1837–1852. https://doi.org/10.1016/j.apm.2014.10.016
Aghajanloo M-B, Sabziparvar A-A, Talaee PH (2013) Artificial neural network–genetic algorithm for estimation of crop evapotranspiration in a semi-arid region of Iran. Neural Comput & Applic 23(5):1387–1393. https://doi.org/10.1007/s00521-012-1087-y
Akkemik Ü, Aras A (2005) Reconstruction (1689–1994 AD) of April–August precipitation in the southern part of central Turkey. Int J Climatol 25(4):537–548. https://doi.org/10.1002/joc.1145
Cigizoglu HK, Kisi Ö (2006) Methods to improve the neural network performance in suspended sediment estimation. J Hydrol 317(3-4):221–238. https://doi.org/10.1016/j.jhydrol.2005.05.019
Citakoglu H, Cobaner M, Haktanir T, Kisi O (2014) Estimation of monthly mean reference evapotranspiration in Turkey. Water Resour Manag 28(1):99–113. https://doi.org/10.1007/s11269-013-0474-1
Cobaner M (2011) Evapotranspiration estimation by two different neuro-fuzzy inference systems. J Hydrol 398(3-4):292–302. https://doi.org/10.1016/j.jhydrol.2010.12.030
Dogan E (2009) Reference evapotranspiration estimation using adaptive neuro-fuzzy inference systems. Irrig Drain 58(5):617–628. https://doi.org/10.1002/ird.445
Esfahanipour A, Aghamiri W (2010) Adapted neuro-fuzzy inference system on indirect approach TSK fuzzy rule base for stock market analysis. Expert Syst Appl 37(7):4742–4748. https://doi.org/10.1016/j.eswa.2009.11.020
Etemad-Shahidi A, Mahjoobi J (2009) Comparison between M5′ model tree and neural networks for prediction of significant wave height in Lake Superior. Ocean Eng 36(15-16):1175–1181. https://doi.org/10.1016/j.oceaneng.2009.08.008
Etemad-Shahidi A, Taghipour M (2012) Predicting longitudinal dispersion coefficient in natural streams using M5′ model tree. J Hydraul Eng 138(6):542–554. https://doi.org/10.1061/(ASCE)HY.1943-7900.0000550
Gardner MW, Dorling S (1998) Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos Environ 32(14-15):2627–2636. https://doi.org/10.1016/S1352-2310(97)00447-0
Graupe D (2013) Principles of artificial neural networks, vol 7, 3rd edn. World Scientific Publishing Co. Pte. Ltd., Singapore, p 506224
Guven A, Aytek A, Yuce MI, Aksoy H (2008) Genetic programming-based empirical model for daily reference evapotranspiration estimation. Clean–Soil, Air, Water 36(10-11):905–912. https://doi.org/10.1002/clen.200800009
Kahya E, Çağatay Karabörk M (2001) The analysis of El Nino and La Nina signals in streamflows of Turkey. Int J Climatol 21(10):1231–1250. https://doi.org/10.1002/joc.663
Keshtegar B, Allawi MF, Afan HA, El-Shafie A (2016a) Optimized river stream-flow forecasting model utilizing high-order response surface method. Water Resour Manag 30(11):3899–3914. https://doi.org/10.1007/s11269-016-1397-4
Keshtegar B, Piri J, Kisi O (2016b) A nonlinear mathematical modeling of daily pan evaporation based on conjugate gradient method. Comput Electron Agric 127:120–130. https://doi.org/10.1016/j.compag.2016.05.018
Keshtegar B, Sadeghian P, Gholampour A, Ozbakkaloglu T (2017) Nonlinear modeling of ultimate strength and strain of FRP-confined concrete using chaos control method. Compos Struct 163:423–431. https://doi.org/10.1016/j.compstruct.2016.12.023
Kisi O (2008) The potential of different ANN techniques in evapotranspiration modelling. Hydrol Process 22(14):2449–2460. https://doi.org/10.1002/hyp.6837
Kisi O (2016) Modeling reference evapotranspiration using three different heuristic regression approaches. Agric Water Manag 169:162–172. https://doi.org/10.1016/j.agwat.2016.02.026
Kisi O, Zounemat-Kermani M (2014) Comparison of two different adaptive neuro-fuzzy inference systems in modelling daily reference evapotranspiration. Water Resour Manag 28(9):2655–2675. https://doi.org/10.1007/s11269-014-0632-0
Kumar M, Raghuwanshi N, Singh R, Wallender W, Pruitt W (2002) Estimating evapotranspiration using artificial neural network. J Irrig Drain Eng 128(4):224–233. https://doi.org/10.1061/(ASCE)0733-9437(2002)128:4(224)
Kumar M, Bandyopadhyay A, Raghuwanshi N, Singh R (2008) Comparative study of conventional and artificial neural network-based ETo estimation models. Irrig Sci 26(6):531–545. https://doi.org/10.1007/s00271-008-0114-3
Landeras G, Ortiz-Barredo A, López JJ (2008) Comparison of artificial neural network models and empirical and semi-empirical equations for daily reference evapotranspiration estimation in the Basque Country (Northern Spain). Agric Water Manag 95(5):553–565. https://doi.org/10.1016/j.agwat.2007.12.011
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I—A discussion of principles. J Hydrol 10(3):282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Pal M, Deswal S (2009) M5 model tree based modelling of reference evapotranspiration. Hydrol Process 23(10):1437–1443. https://doi.org/10.1002/hyp.7266
Petković D, Gocic M, Trajkovic S, Shamshirband S, Motamedi S, Hashim R, Bonakdari H (2015) Determination of the most influential weather parameters on reference evapotranspiration by adaptive neuro-fuzzy methodology. Comput Electron Agric 114:277–284. https://doi.org/10.1016/j.compag.2015.04.012
Pour-Ali Baba A, Shiri J, Kisi O, Fard AF, Kim S, Amini R (2013) Estimating daily reference evapotranspiration using available and estimated climatic data by adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN). Hydrol Res 44(1):131–146. https://doi.org/10.2166/nh.2012.074
Rahimikhoob A (2014) Comparison between M5 model tree and neural networks for estimating reference evapotranspiration in an arid environment. Water Resour Manag 28(3):657–669. https://doi.org/10.1007/s11269-013-0506-x
Rahimikhoob A (2016) Comparison of M5 model tree and artificial neural network’s methodologies in modelling daily reference evapotranspiration from NOAA satellite images. Water Resour Manag 30(9):3063–3075. https://doi.org/10.1007/s11269-016-1331-9
Solomatine DP, Xue Y (2004) M5 model trees and neural networks: application to flood forecasting in the upper reach of the Huai River in China. J Hydrol Eng 9(6):491–501. https://doi.org/10.1061/(ASCE)1084-0699(2004)9:6(491)
Traore S, Luo Y, Fipps G (2016) Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages. Agric Water Manag 163:363–379. https://doi.org/10.1016/j.agwat.2015.10.009
Wen X, Si J, He Z, Wu J, Shao H, Yu H (2015) Support-vector-machine-based models for modeling daily reference evapotranspiration with limited climatic data in extreme arid regions. Water Resour Manag 29(9):3195–3209. https://doi.org/10.1007/s11269-015-0990-2
Willmott CJ (1981) On the validation of models. Phys Geogr 2:184–194
Yin Z, Wen X, Feng Q, He Z, Zou S, Yang L (2017) Integrating genetic algorithm and support vector machine for modeling daily reference evapotranspiration in a semi-arid mountain area. Hydrol Res 48(5):1177–1191. https://doi.org/10.2166/nh.2016.205
Zanetti S, Sousa E, Oliveira V, Almeida F, Bernardo S (2007) Estimating evapotranspiration using artificial neural network and minimum climatological data. J Irrig Drain Eng 133(2):83–89. https://doi.org/10.1061/(ASCE)0733-9437(2007)133:2(83)
Zounemat-Kermani M (2012) Hourly predictive Levenberg–Marquardt ANN and multi linear regression models for predicting of dew point temperature. Meteorog Atmos Phys 117(3-4):181–192. https://doi.org/10.1007/s00703-012-0192-x
Zounemat-Kermani M, Teshnehlab M (2008) Using adaptive neuro-fuzzy inference system for hydrological time series prediction. Appl Soft Comput 8(2):928–936. https://doi.org/10.1016/j.asoc.2007.07.011