Modeling Earth Systems and Environment https://doi.org/10.1007/s40808-017-0398-5
ORIGINAL ARTICLE
Genetic algorithm tuned fuzzy inference system to evolve optimal groundwater extraction strategies to control saltwater intrusion in multi-layered coastal aquifers under parameter uncertainty Dilip Kumar Roy1 · Bithin Datta1 Received: 25 September 2017 / Accepted: 6 November 2017 © Springer International Publishing AG, part of Springer Nature 2017
Abstract Excessive withdrawal of groundwater resources poses significant challenges to the management of saltwater intrusion processes in coastal aquifers. Optimization of groundwater withdrawal rates plays a vital role in sustainable management of coastal aquifers. This study proposes a genetic algorithm (GA) tuned Fuzzy Inference System (FIS) hybrid model (GA-FIS) for developing a regional scale saltwater intrusion management strategy. GA is used to tune the FIS parameters in order to obtain the optimal FIS structure. The GA-FIS models thus obtained are linked externally to the Controlled Elitist Multi-objective Genetic Algorithm (CEMGA) in order to derive optimal pumping management strategies using a linked simulation–optimization approach. The performance of the hybrid GA-FIS-CEMGA based saltwater intrusion management model is compared with that of a basic adaptive neuro fuzzy inference system (ANFIS) based management model (ANFIS-CEMGA). The parameters of the ANFIS model are tuned using a hybrid algorithm. To achieve computational efficiency, the proposed optimization routine is run on a parallel processing platform. An illustrative multi-layered coastal aquifer system is used to evaluate the performance of both management models. The illustrative aquifer system considers uncertainties associated with the hydrogeological parameters, e.g. hydraulic conductivity, compressibility, bulk density, and aquifer recharge. The evaluation results show that the proposed saltwater intrusion management models are able to evolve reliable optimal groundwater extraction strategies to control saltwater intrusion for the illustrative multi-layered coastal aquifer system. However, a closer look at the performance evaluation results demonstrates the superiority of the GA-FIS-CEMGA based management model over the ANFIS-CEMGA based saltwater intrusion management model.
Keywords Saltwater intrusion · Fuzzy inference system · Genetic algorithm · Controlled elitist multi-objective genetic algorithm · Parameter uncertainty
* Dilip Kumar Roy [email protected]
Bithin Datta [email protected]
1 Discipline of Civil Engineering, College of Science and Engineering, James Cook University, Douglas, QLD 4811, Australia
Introduction
Saltwater intrusion in coastal aquifers has become a major water resources management problem. This management issue is directly related to unplanned and irrational over-extraction of coastal groundwater resources, often to satisfy demands for urban water supply and irrigation. The extent of future saltwater intrusion in coastal aquifers can be estimated by simulating the physical processes together with spatial and temporal groundwater extraction patterns. However, prediction of saltwater intrusion processes in coastal aquifers is a challenging task because of the inherent uncertainties associated with the model structure and with the accompanying model parameters (Sreekanth and Datta 2011b; Sreekanth et al. 2012). Another source of uncertainty arises from inappropriate and inadequate characterization of the accompanying physical processes in the subsurface system. Multidimensional heterogeneity of aquifer properties such as hydraulic conductivity, compressibility, and bulk density is considered a major source of uncertainty in groundwater modelling systems (Ababou and Al-Bitar 2004). Other sources of uncertainty are associated with the spatial and temporal variability of hydrologic as well as human
interventions, e.g. aquifer recharge and transient groundwater extraction patterns. This study addresses uncertainties arising from estimating hydrogeological model parameters (hydraulic conductivity, compressibility, bulk density, and aquifer recharge), and from estimating the spatial and temporal variation of groundwater extraction patterns.
Density dependent coupled flow and salt transport processes in a typical coastal aquifer system are nonlinear and complex. Simulation of these processes therefore carries a large computational burden. This is especially true in situations where repetitive use of these simulation models is necessary, e.g. in a linked simulation-optimization (S/O) methodology to develop regional scale saltwater intrusion management models. In these situations, a sufficiently accurate approximate simulation of the complex physical processes in a coastal aquifer would be very useful. The reliability and accuracy of such an approximate simulator depend on how accurately the simulator captures the accompanying physical processes. This study proposes artificial intelligence (AI) based emulators as approximate simulators of the density reliant coupled flow and salt transport processes in a multi-layered coastal aquifer system under parameter uncertainty.
AI based emulators have gained popularity in recent years to approximate physical processes in coastal aquifers. Learning from experience is one of the characteristics of AI methods. This feature gives them the ability to address real world problems. Artificial Neural Network (ANN) (Bhattacharjya and Datta 2009; Kourakos and Mantoglou 2009; Sreekanth and Datta 2010), Genetic Programming (GP) (Sreekanth and Datta 2010, 2011a), Evolutionary Polynomial Regression (EPR) (Hussain et al. 2015), Fuzzy Inference System (FIS) (Roy and Datta 2017a), cubic Radial Basis Function (RBF) (Christelis and Mantoglou 2016), Multivariate Adaptive Regression Spline (MARS) (Roy and Datta 2017b), and Adaptive Neuro Fuzzy Inference System (ANFIS) (Roy and Datta 2017c) have been used as computationally efficient substitutes for simulating the density reliant coupled flow and solute transport processes in coastal aquifers. Despite achieving computational benefits, some of these emulators have certain limitations. The drawbacks of ANN models include proneness to premature convergence in local minima, the "Black-Box" nature of the models, higher computational burden, and susceptibility to model overfitting (Holman et al. 2014). In addition, ANN models are not stable when the number of training datasets is insufficient (Hsieh and Tang 1998). GP, an explicit mathematical formulation (Shiri and Kişi 2011), produces simple regression models (Sreekanth and Datta 2011a) that can easily be linked within an optimization algorithm to achieve computational efficiency in a linked S/O methodology. However, GP requires extensive training time
for evaluating millions of model structures before finding the optimal structure (Sreekanth and Datta 2011a). Besides, GP can become trapped in local minima (Pillay 2004). RBF is quite simple in formulation (Sóbester et al. 2014) and easy to implement in any number of dimensions with reasonable accuracy for certain types of radial functions (Piret 2007). However, RBF methods have stability issues that can be a serious concern, and computational cost is another significant drawback. Polynomial regression becomes unstable when the polynomial order is high. In addition, individual observations of the training datasets can have an unexpected influence on remote parts of the curve in polynomial regression (Green and Silverman 1993).
To overcome some of the limitations of the existing meta-modelling approaches, the present study investigates the applicability of a Genetic Algorithm (GA) tuned FIS (GA-FIS) as an approximate simulator of the physical processes in a coastal aquifer system considering a set of uncertain model parameters. Recently, FIS models have received considerable attention in the modelling of nonlinear systems. FISs are based on fuzzy set theory (Zadeh 1965). FISs incorporate a human-like reasoning process in reaching conclusions from the predictor–response relationships of a nonlinear system. FISs are recognized as a successful computing framework because of their applicability in multi-dimensional fields (Jang et al. 1997). They are also recognized as an effective tool to model complex and nonlinear processes by capturing the nonlinear relationships between the predictors and responses (Sugeno and Yasukawa 1993; Takagi and Sugeno 1985).
Obtaining an optimal model structure is a challenging task in the development of an FIS based emulator of the saltwater intrusion processes. Among several techniques used to improve the accuracy of fuzzy based modelling techniques (Casillas et al. 2003), tuning of the antecedent and consequent parameters has achieved a reasonable degree of accuracy (Lee and Teng 2001). Parameter tuning has a significant impact on the accuracy of FIS based models (Zeng and Singh 1996). The antecedent and consequent parameters of an FIS structure should be tuned appropriately to obtain an optimal set of rules. The antecedent or premise parameters are nonlinear and more difficult to tune than the consequent or conclusive parameters, which are linear in nature. Optimal FIS parameters are commonly obtained through parameter tuning using a hybrid algorithm (Jang 1993). In the hybrid algorithm, the premise parameters are estimated by gradient descent (GD) through error backpropagation whereas the consequent (linear) parameters are estimated by recursive least squares estimation. This tuning process is complex and time consuming because the gradient must be propagated through all layers and recomputed at every step. Moreover, convergence of parameters depends on the initial value of the associated
parameters. Convergence to globally optimal parameter values is a slow process, and the parameter values may become trapped in local optima. As an improvement over the gradient based search approach, tuning of the more complex and nonlinear antecedent part has been accomplished by population based optimization algorithms (Araghi et al. 2015; Basser et al. 2015; Jalalkamali 2015; Oliveira and Schirru 2009; Rini et al. 2016; Tang et al. 2005; Zanaganeh et al. 2009). The present study utilizes GA to tune the parameters of the FIS structure, which is later used to predict saltwater intrusion processes in coastal aquifers under parameter uncertainty.
The concept of GA was first introduced by Holland (1975). For complex and difficult problems, GA can provide better and more flexible solutions than conventional optimization methods. Broad and diverse applicability, ease of application, and global perspective are some of the distinguishing features that make GA a popular search algorithm in various optimization problems (Goldberg 1989). GA works differently from classical search or optimization methods. GA originates from Darwin's principle of natural selection, which is based on biological evolution. As the algorithm progresses, a population of individual solutions is repeatedly modified over several iterations or steps. At each step, GA randomly chooses individuals from the current population and uses these individuals as parents to generate children for the next generation. Over successive generations, the population "evolves" toward an optimal solution.
The contribution of the present study is therefore to develop a saltwater intrusion management model using computationally efficient fuzzy logic based emulators whose parameters are tuned using GA. These emulators are used to approximate coupled flow and salt transport processes in a multi-layered coastal aquifer system considering a set of uncertain model parameters. This study adopts a multi-layered anisotropic coastal aquifer system in which each layer represents a different material characterized by its own hydraulic conductivity values. The flow and transport processes considered are transient and density dependent.
Methodology
The proposed methodology consists of a density reliant coupled flow and salt transport numerical simulation model, a fuzzy logic based emulator of the physical processes in the coastal aquifer (FIS), and a population based optimization algorithm (GA). The simulation model is used to simulate physical processes in the aquifer and to generate input–output (predictor–response) training patterns for FIS models. FIS models are used as approximate emulators of the numerical simulation model, and GA is used to optimize the parameters of the FIS models. GA tuned FIS models (GA-FIS) are used to predict salinity concentrations at different observation points (OPs) in the multi-layered coastal aquifer system. Finally, a saltwater intrusion management model is developed by externally linking the GA-FIS models to a Controlled Elitist Multi-objective Genetic Algorithm (CEMGA) to obtain optimal groundwater extraction strategies. The performance of the GA-FIS-CEMGA based management model is compared with that of a management model obtained by linking basic ANFIS models to CEMGA (ANFIS-CEMGA). A brief description of each component of the proposed methodology is presented in the following subsections.
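The predictor side of the training patterns is, according to the paper, drawn by Latin Hypercube Sampling from a uniform distribution over a 0 to 1300 m3/day extraction range, for 80 extraction inputs. The following is a minimal sketch of such a sampler using NumPy only. The function name `latin_hypercube`, the number of patterns (500), and the seed are hypothetical choices for illustration and are not taken from the study.

```python
import numpy as np

def latin_hypercube(n_samples, n_vars, low, high, seed=0):
    """Uniform Latin Hypercube sample of shape (n_samples, n_vars) in [low, high)."""
    rng = np.random.default_rng(seed)
    # One random point inside each of n_samples equal-width strata of [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_vars))) / n_samples
    # Shuffle the strata independently for every variable.
    for j in range(n_vars):
        u[:, j] = rng.permutation(u[:, j])
    return low + u * (high - low)

# 80 extraction inputs and the 0-1300 m3/day bounds come from the text;
# the number of patterns (500) is a hypothetical choice.
patterns = latin_hypercube(n_samples=500, n_vars=80, low=0.0, high=1300.0)
print(patterns.shape)  # (500, 80)
```

Each row of `patterns` would be one candidate extraction pattern passed to the numerical simulation model; the simulated salinity concentrations at the OPs would then form the matching responses.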
Flow and salt transport model
FEMWATER (Lin et al. 1997), a three-dimensional (3D) finite element based, density reliant coupled flow and solute transport numerical simulation model, is used to simulate the physical processes of a multi-layered coastal aquifer system with spatial and temporal groundwater extraction patterns. Spatiotemporal groundwater extraction patterns are obtained using the Latin Hypercube Sampling (LHS) technique (Pebesma and Heuvelink 1999). These transient groundwater extraction patterns are used as inputs to FEMWATER to obtain salinity concentrations at specified OPs as outputs. The governing equations of the coupled flow and solute transport processes are expressed as follows (Lin et al. 1997):

$$\frac{\rho}{\rho_0} F \frac{\partial h}{\partial t} = \nabla \cdot \left[ K \cdot \left( \nabla h + \frac{\rho}{\rho_0} \nabla z \right) \right] + \frac{\rho^*}{\rho_0} q \tag{1}$$

$$F = \alpha' \frac{\theta}{n} + \beta' \theta + n \frac{dS}{dh} \tag{2}$$
where $F$ denotes the storage coefficient, $h$ is the pressure head, $t$ represents time, $K$ is the hydraulic conductivity tensor, $z$ symbolizes the potential head, $q$ represents either a source or a sink, $\rho$ represents the water density at chemical concentration $C$, $\rho_0$ indicates the reference water density at zero chemical concentration, $\rho^*$ represents the density of the injected fluid or of the withdrawn water, $\theta$ represents the moisture content, $\alpha'$ and $\beta'$ are the modified compressibilities of the medium and of water, respectively, $n$ represents the porosity of the medium, and $S$ stands for saturation. The hydraulic conductivity tensor $K$ is represented by

$$K = \frac{\rho g}{\mu} k = \frac{\rho / \rho_0}{\mu / \mu_0} \, \frac{\rho_0 g}{\mu_0} \, k_s k_r = \frac{\rho / \rho_0}{\mu / \mu_0} \, K_{so} k_r \tag{3}$$

in which $\mu$ stands for the dynamic viscosity of water at chemical concentration $C$, $\mu_0$ represents the reference dynamic viscosity at zero chemical concentration, $k_s$ is the saturated permeability tensor, $k_r$ is the relative permeability or relative hydraulic conductivity, and $K_{so}$ stands for the reference saturated conductivity tensor. The 3D solute transport equation is expressed as:
$$\theta \frac{\partial C}{\partial t} + \rho_b \frac{\partial S}{\partial t} + V \cdot \nabla C - \nabla \cdot (\theta D \cdot \nabla C) = -\left( \alpha' \frac{\partial h}{\partial t} + \lambda \right) (\theta C + \rho_b S) - (\theta K_w C + \rho_b K_s S) + m - \frac{\rho^*}{\rho} q C + \left( F \frac{\partial h}{\partial t} + \frac{\rho_0}{\rho} V \cdot \nabla \left( \frac{\rho}{\rho_0} \right) - \frac{\partial \theta}{\partial t} \right) C \tag{4}$$

where $\rho_b$ symbolizes the bulk density of the medium, $C$ stands for the material concentration in the aqueous phase, $S$ is the material concentration in the adsorbed phase, $t$ is time, $V$ represents the discharge, $\nabla$ stands for the del operator, $D$ indicates the dispersion coefficient tensor, $\lambda$ denotes the decay constant, $m = q C_{in}$ is the artificial mass rate, $q$ is the source rate of water, $C_{in}$ is the material concentration in the source, $K_w$ is the first order biodegradation rate constant through the dissolved phase, $K_s$ is the first order biodegradation rate constant through the adsorbed phase, and $K_d$ is the distribution coefficient. The dispersion coefficient tensor $D$ in Eq. (4) is represented by

$$\theta D = a_T |V| \delta + (a_L - a_T) \frac{V V}{|V|} + a_m \theta \tau \delta \tag{5}$$

where $|V|$ is the magnitude of $V$, $\delta$ is the Kronecker delta tensor, $a_T$ is the lateral dispersivity, $a_L$ is the longitudinal dispersivity, $a_m$ is the molecular diffusion coefficient, and $\tau$ is the tortuosity.
Dataset preparation (predictor-response training pattern)
Learning from specified trends using human expert knowledge is one of the most important features of FIS based modelling approaches. The proposed GA-FIS models learn from predictor–response patterns generated by the numerical simulation model. Predictors are the spatiotemporal groundwater extraction patterns from a set of production bores and barrier extraction wells. In the performance evaluation scenarios incorporated in this study, a practical extraction range of 0–1300 m3/day is assigned to each of the production and barrier extraction wells. As mentioned earlier, the extraction values used for training are sampled from a uniform distribution using LHS, and these randomized groundwater extraction values are the inputs to the simulation model. Saltwater concentrations obtained as solution outputs from the simulation model are the corresponding responses. A set of such predictor–response patterns is used for training and validation of the proposed GA-FIS based prediction models.
AI based models (FIS and ANFIS)
FISs are suitable for nonlinear mapping of predictor–response relationships. A Sugeno-type FIS, also known as the Takagi–Sugeno–Kang FIS, introduced in 1985 (Sugeno 1985), is well suited to this nonlinear mapping. FISs utilize a set of fuzzy if-then rules to establish these predictor–response relationships. For a first order Sugeno type FIS with two inputs (α and β), one output (γ), and two fuzzy rules, the simplest form of the fuzzy if-then rules is expressed as:
Rule 1: If α is P1 and β is Q1 then f1 = p1 α + q1 β + r1, (6)
Rule 2: If α is P2 and β is Q2 then f2 = p2 α + q2 β + r2. (7)
This is illustrated in Fig. 1, which shows the equivalent type 3 ANFIS architecture derived from the type 3 fuzzy reasoning represented in Eqs. (6) and (7). The resulting ANFIS structure has five layers, namely a fuzzy layer, a product layer, a normalized layer, a defuzzification layer, and a total output layer. A detailed description of these layers can be found in Jang (1993) and is not repeated here. An ANFIS combines the advantages of both fuzzy logic theory and artificial neural networks. The simple structure of a Sugeno type ANFIS has a good learning capability compared to other types of ANFIS architectures (Jang et al. 1997). Desired ANFIS structures are obtained from an initial FIS structure whose parameters are optimized by using either a hybrid algorithm (Jang 1993) or a population based optimization algorithm, e.g. GA (Goldberg 1989).
Fig. 1 ANFIS architecture based on a two-input first-order Sugeno FIS (Layers 1 to 5; inputs α and β; membership functions P1, P2, Q1, Q2; output f)
In situations where the considered problem has a large number of input variables, compressing the data space by using a suitable clustering algorithm may be very useful. Fuzzy c-means clustering (FCM) (Bezdek et al. 1984) is a useful tool for compressing the dataset by partitioning it into a set of clusters. This technique
greatly reduces the number of modifiable parameters (linear and nonlinear) and fuzzy if-then rules of an FIS.
Training rule and algorithm for adaptive networks
GD and the chain rule, proposed in the early 1970s by Werbos (1974), were the basic learning rules of adaptive networks. However, this gradient based method is slow to converge and tends to become trapped in local minima. Therefore, to speed up the learning process and to avoid premature convergence in local optima, Jang (1993) proposed a hybrid learning rule (Jang 1991), which integrates the GD approach and the least squares estimate (LSE) to identify optimal parameters of adaptive networks. Each iteration of this hybrid learning approach consists of both a forward and a backward pass. Suppose that the parameter set S can be decomposed into two subsets such that S = S1 ⊕ S2, where ⊕ indicates the direct sum. Then, for given fixed parameter values in S1, the parameter values obtained in S2 are guaranteed to be the global optimum in the parameter space of S2 (Jang 1993). This hybrid learning algorithm not only reduces the dimension of the gradient method's search space, but also achieves a substantial gain in computational efficiency (Jang 1993).
FCM
Clustering is a useful tool to identify the natural groupings in a large set of data and to generate a succinct representation of the behavior of a system. FCM is an approach in which the entire dataset is grouped into n clusters. Every individual data point belongs to every cluster with a certain degree of belongingness. A data point that lies close to the center of a particular cluster has a higher degree of belongingness to that cluster than a data point that lies far away from its center. Several clustering criteria have been used to identify optimal fuzzy c-partitions. Among them, the generalized least-squared errors criterion is one of the most widely used (Bezdek et al. 1984).
In this method, FCM is based on minimizing the following objective function

$$J_m = \sum_{i=1}^{D} \sum_{j=1}^{N} \mu_{ij}^{m} \left\| x_i - c_j \right\|^2 , \tag{8}$$
where N is the number of clusters, D represents the number of data points, m indicates the fuzzy partition matrix exponent with m > 1, xi is the ith data point, cj denotes the center of the jth cluster, and µij indicates the degree of membership of xi in the jth cluster. A value of m is chosen such that the fuzzy overlap between adjacent clusters is minimal. Fuzzy overlap is a measure of the fuzziness of the boundaries between clusters. In other words, fuzzy overlap refers to the number of data points that belong to more than one cluster with significant membership grades. The closer m is to 1, the crisper the cluster boundaries and the smaller the overlap (MathWorks 2017a). A value of m = 2 is found to be optimal in the present study.
Optimum number of clusters
Selecting the optimum number of clusters is an important preprocessing step of FIS model development using the FCM algorithm. The right number of clusters is decided based on the type of problem and the dimension of the input space. A model with a simple architecture is preferable in most cases (Mohammadi et al. 2016). In this research, the optimum number of clusters is selected by conducting several trials using different numbers of clusters and observing the resulting root mean square error (RMSE) between the actual salinity concentration values and the predicted responses obtained using the selected FIS models. The number of clusters that produces the minimum RMSE value as well as the least variation in RMSE values between the training and test datasets is chosen as adequate. The variation in RMSE between training and test datasets is checked to protect against model over-fitting. The optimum number of clusters is selected for both the hybrid algorithm (HA) trained ANFIS and the GA trained GA-FIS models. The trial is conducted on salinity concentration values obtained at five OPs. A model architecture with two clusters provides a reasonably accurate prediction model at all OPs except OP4, where GA-FIS used three clusters to provide an accurate prediction model. The number of input and output membership functions (MF) at each OP can be expressed as:
$$N_{MF(input)} = N_X \times N_{clusters} \tag{9}$$

$$N_{MF(output)} = N_Y \times N_{clusters} \tag{10}$$

where $N_{MF(input)}$ is the number of input membership functions, $N_{MF(output)}$ is the number of output membership functions, $N_X$ is the number of predictors, $N_Y$ is the number of responses, and $N_{clusters}$ is the number of clusters. All GA-FIS and ANFIS models are developed using commands and functions of the Fuzzy Logic Toolbox in MATLAB (MathWorks 2017b).
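To make the FCM objective of Eq. (8) and the membership-function count of Eq. (9) concrete, the following is a minimal NumPy sketch of the standard FCM alternating updates (Bezdek et al. 1984). It is not the Fuzzy Logic Toolbox routine used in the study. The function name `fcm`, the fixed iteration count, the random initialization, and the synthetic two-blob dataset are assumptions made for illustration only.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: returns cluster centers, membership matrix U, objective J_m."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]          # weighted cluster centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)               # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)              # membership update
    J = float(((U ** m) * d2).sum())             # objective of Eq. (8)
    return centers, U, J

# Synthetic data (hypothetical): two well-separated blobs in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(1.0, 0.1, (50, 2))])
centers, U, J = fcm(X, n_clusters=2, m=2.0)

# Eq. (9) with the 80 predictors and 2 clusters reported in the text.
n_mf_input = 80 * 2
print(centers.round(2), n_mf_input)
```

In a trial-based selection like the one described above, such a routine would be rerun for several cluster counts and the resulting training and test RMSE values compared.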
Genetic algorithm
The working principles of GA are based on the theory of natural genetics and natural selection. The fundamental ideas of GA differ from classical optimization methods in that GAs utilize a coding of the variables rather than the variables themselves (Deb 1999). GAs use a probability based search technique to provide a population of solutions instead of a single solution. Unlike traditional optimization approaches, which rely on fixed transition rules to move between solutions, the GA search procedure begins with an initial random population. The search may therefore progress in any direction and does not depend on any crucial decision at the beginning of the search process. The working principle of GA is summarized as follows:
1. Generate an initial random set of solutions.
2. Evaluate each candidate solution in relation to the underlying problem.
3. Check a termination criterion.
4. Stop the search process and provide the optimum solution if the termination criterion is satisfied.
5. Modify the population of solutions using three main operators if the termination criterion is not satisfied.
This modified population of solutions is expected to be better than the previous population. After each iteration, the generation counter keeps note of every completed generation of the GA search. Figure 2 shows a flowchart of the fundamental working principle of GA. As shown in the flowchart, the working principle is simple and straightforward. However, although the operations are simple, GAs are nonlinear, multifaceted, complex, and stochastic in nature (Deb 1999).
Fig. 2 Flowchart of the operational principles of a GA (boxes: Start; t = 0; Initialize random population of solutions; Cond?; Genetic operators: reproduction, crossover, and mutation operators; t = t + 1)
Three basic genetic operators, namely the reproduction, crossover, and mutation operators, form the major part of the GA search mechanism. The reproduction operator identifies better (above-average) solutions in a population, creates multiple copies of these solutions, and inserts these copies into the population in place of worse solutions. The reproduction operator does not generate any new solutions in the population. It can only create more copies of good solutions at the cost
of "not-so-good" solutions. Crossover and mutation operators perform the task of creating new solutions. In a typical crossover operator, two strings are randomly taken from the mating pool and some parts of the strings are swapped between them. Crossover operators are principally responsible for the search mechanism of GAs. Mutation operators also contribute to the search, but sparingly. The role of mutation operators in the GA search process is to maintain diversity in the population. The mutation operator flips a bit between 1 and 0 with a small mutation probability, pm. The role of these three genetic operators can be summarized as follows: (a) the reproduction operator chooses good strings, (b) the crossover operator recombines good substrings from two good strings to create a relatively better string, and (c) the mutation operator alters a string locally to form a comparatively better string (Deb 1999). If bad strings are generated in any generation, these strings will be eliminated by the reproduction operator in subsequent generations (Deb 1999).
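The working steps and the three operators described in this section can be sketched as a small binary-coded GA. This is a minimal illustration under stated assumptions and not the GA configuration used in the study: binary tournament selection stands in for the reproduction operator, crossover is single-point, termination is a fixed number of generations, and the "count the ones" fitness function, the function name `run_ga`, and all parameter values are hypothetical.

```python
import numpy as np

def run_ga(fitness, n_bits=20, pop_size=40, n_gen=60, pc=0.9, pm=0.01, seed=0):
    """Maximize `fitness` over binary strings; returns the best string found and its fitness."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))    # step 1: random initial population
    best, best_fit = None, -np.inf
    for t in range(n_gen):                               # t is the generation counter
        fit = np.array([fitness(ind) for ind in pop])    # step 2: evaluate every solution
        i = int(fit.argmax())
        if fit[i] > best_fit:
            best, best_fit = pop[i].copy(), fit[i]
        # Reproduction (assumed: binary tournament): copies of better strings fill the pool.
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        pool = pop[np.where(fit[a] >= fit[b], a, b)]
        # Crossover: swap the tails of consecutive pool members at a random cut point.
        children = pool.copy()
        for k in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                cut = rng.integers(1, n_bits)
                children[k, cut:] = pool[k + 1, cut:]
                children[k + 1, cut:] = pool[k, cut:]
        # Mutation: flip each bit between 1 and 0 with small probability pm.
        flip = rng.random(children.shape) < pm
        pop = np.where(flip, 1 - children, children)
    return best, best_fit                                # steps 3-4: stop after n_gen generations

# Hypothetical toy problem: maximize the number of ones in the string.
best, best_fit = run_ga(lambda ind: int(ind.sum()))
print(best, best_fit)
```

The loop mirrors the flowchart: evaluate, check the termination condition, apply the genetic operators, and increment the generation counter.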
Development of the proposed GA-FIS coupled model
One major drawback of applying fuzzy logic based modelling approaches to high dimensional datasets is the selection of suitable rules to ensure the best performance of the model. A first order Sugeno type FIS addresses this issue by modifying the rules and learning adaptively to provide optimal sets of parameters for the FIS architecture. Tuning the parameters of the antecedent and consequent parts of the rule base is usually performed using the gradient method. However, the gradient method is generally slow to converge and is likely to become trapped in local minima. To address this issue, Jang (1993) proposed a hybrid learning rule which integrates GD and LSE to identify optimal parameters. In this hybrid algorithm, the nonlinear premise parameters are estimated by GD through error backpropagation whereas the linear consequent parameters are estimated by recursive LSE. To achieve further improvement in the tuning process of FIS models, population based search algorithms replace the traditional search techniques. Optimization algorithms commonly used to optimize FIS parameters include GA (Ishigami et al. 1995; Zanaganeh et al. 2009), particle swarm optimization (PSO) (Oliveira and Schirru 2009; Rini et al. 2016), and Cuckoo search (Araghi et al. 2015). These population based hybrid ANFIS learning techniques have been applied in different research domains, e.g. in traffic signal control (Araghi et al. 2015), in sensor monitoring (Oliveira and Schirru 2009), in determining optimum parameters of a protective spur dike (Basser et al. 2015), and in determining spatiotemporal groundwater quality parameters (Jalalkamali 2015). In this study, for the first time, this hybrid learning technique using GA is proposed to develop FIS models for approximating saltwater intrusion processes in a multi-layered coastal aquifer system under the influence of spatiotemporal groundwater extraction values.
In this work, GA-FIS hybrid models are developed to predict salinity concentrations at specified OPs situated in the aquifer. The GA-FIS models have 80 inputs and one output. Spatiotemporal groundwater extraction values from a set of production bores and barrier extraction wells are the inputs to the GA-FIS models. The outputs are the resulting salinity concentrations observed at specified OPs at the end of the simulation period of 5 years. Both antecedent and consequent parameters are tuned using GA to obtain the optimal FIS model architecture. During the training phase, GA evaluates different parameter sets until the optimal parameters for the FIS models are obtained. The cost function of the GA optimization approach is the mean squared error (MSE), which reflects the training error. Therefore, the objective is to minimize the MSE between the targets (actual) and FIS outputs (predicted) on the training dataset. The cost function is represented as:

$$\text{Minimize, } f_{MSE} = \frac{\sum_{i=1}^{n} \left( C_{i,o} - C_{i,p} \right)^2}{n} \tag{11}$$

where $f_{MSE}$ is the cost function to be minimized, $n$ is the number of training data points ($i = 1, \ldots, n$), $C_{i,o}$ is the observed salinity concentration value in the training dataset, and $C_{i,p}$ is the predicted salinity concentration in the training dataset. Optimum GA parameters are determined by conducting a set of trials using different combinations of these parameters. Based on these trials, the optimum combination of GA parameters is estimated as: population size 200, crossover fraction 0.90, mutation fraction 0.85, mutation rate 0.005, and selection pressure 8. These GA parameter values are obtained through a trial and error procedure and are regarded as optimum values at least for this example problem.
Once the best GA-FIS model structures with optimized parameters are obtained, the developed models are presented with a new set of test data and testing errors (RMSE values) are computed. Five GA-FIS models are developed, one at each of the five OPs. The performance of the proposed GA-FIS hybrid models is compared with that of the basic ANFIS models. In ANFIS models, the antecedent parameters are optimized using GD through error backpropagation and the consequent parameters are tuned using recursive LSE.
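The cost function of Eq. (11) can be sketched as a function of a flat parameter vector, which is the form a GA would evaluate for each candidate solution. This is a minimal sketch under stated assumptions: Gaussian membership functions, product firing strengths, one rule per cluster, and the parameter layout in `unpack` are illustrative choices and are not confirmed by the paper. The function names and the small synthetic dataset are hypothetical.

```python
import numpy as np

def unpack(params, n_inputs, n_rules):
    """Split a flat vector into antecedent (centers, sigmas) and consequent parameters."""
    k = n_rules * n_inputs
    centers = params[:k].reshape(n_rules, n_inputs)
    sigmas = np.abs(params[k:2 * k]).reshape(n_rules, n_inputs) + 1e-6  # keep widths positive
    conseq = params[2 * k:].reshape(n_rules, n_inputs + 1)   # linear coefficients plus constant
    return centers, sigmas, conseq

def sugeno_predict(params, X, n_rules):
    """First-order Sugeno output: firing-strength weighted average of linear rule outputs."""
    centers, sigmas, conseq = unpack(params, X.shape[1], n_rules)
    z = (X[:, None, :] - centers[None]) / sigmas[None]
    logw = -0.5 * (z ** 2).sum(axis=2)            # log of the product of Gaussian memberships
    logw -= logw.max(axis=1, keepdims=True)       # numerical stability with many inputs
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)             # normalized firing strengths
    f = X @ conseq[:, :-1].T + conseq[:, -1]      # rule outputs, as in Eqs. (6) and (7)
    return (w * f).sum(axis=1)

def f_mse(params, X, c_obs, n_rules):
    """Eq. (11): mean squared error between observed and predicted concentrations."""
    c_pred = sugeno_predict(params, X, n_rules)
    return float(np.mean((c_obs - c_pred) ** 2))

# Hypothetical check: if every rule has the same consequent, the FIS reproduces it exactly.
rng = np.random.default_rng(0)
X = rng.random((30, 3))
c_obs = 2.0 * X[:, 0] + 1.0
conseq = np.tile([2.0, 0.0, 0.0, 1.0], (2, 1))
params = np.concatenate([rng.random(6), np.ones(6), conseq.ravel()])
print(f_mse(params, X, c_obs, n_rules=2))
```

A GA would call `f_mse` once per candidate parameter vector and keep the vectors with the lowest training error; the testing RMSE is then the square root of the same quantity evaluated on unseen data.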
Development of the management model
The proposed saltwater intrusion management model is based on the concept of the linked S/O approach. However, the complex numerical simulation component is replaced by the appropriately trained and validated GA-FIS models that approximate the associated physical processes in the aquifer. The management model is multi-objective in nature, with two conflicting objectives of groundwater extraction considered. Therefore, the management model provides several alternative feasible solutions of groundwater extraction patterns from the well field, showing a trade-off between the conflicting objectives represented by a Pareto optimal front. The conflicting objectives are: (1) maximize extraction of water from the production bores for beneficial purposes; and (2) minimize water abstraction from barrier extraction wells. Water extracted from barrier extraction wells cannot be used because of its high salinity content; therefore, the objective is to minimize water extraction from these wells. Barrier extraction wells are used to create a hydraulic barrier along the coast to allow hydraulic control of saltwater intrusion. The mathematical formulation of the proposed saltwater intrusion management model is similar to the one proposed by Dhar and Datta (2009):
$$\text{Maximize: } f_1(Q_{PW}) = \sum_{m=1}^{M}\sum_{t=1}^{T} Q_{PW_m}^{t} \tag{12}$$
$$\text{Minimize: } f_2(Q_{BW}) = \sum_{n=1}^{N}\sum_{t=1}^{T} Q_{BW_n}^{t} \tag{13}$$
$$\text{s.t. } C_i = \xi(Q_{PW}, Q_{BW}) \tag{14}$$
$$C_i \le C_{max} \quad \forall i \tag{15}$$
$$Q_{PW\,min} \le Q_{PW_m}^{t} \le Q_{PW\,max} \tag{16}$$
$$Q_{BW\,min} \le Q_{BW_n}^{t} \le Q_{BW\,max} \tag{17}$$
where $Q_{PW_m}^{t}$ represents water extraction from the mth production well during the tth time period; $Q_{BW_n}^{t}$ stands for water extraction from the nth barrier extraction well during the tth time period; $C_i$ symbolizes the saltwater concentration at the ith OP at the end of the management period; $\xi(\cdot)$ denotes the density-dependent coupled flow and salt transport simulation model; constraint (14) indicates the linking of the simulation model within the optimization framework, using either a numerical simulation model or a trained and tested meta-model; constraint (15) specifies the maximum allowable salt concentration at the specified OPs; Eqs. (16) and (17) define the lower and upper limits on the water extraction rates from the production wells and barrier extraction wells, respectively; subscripts PW and BW stand for production bores and barrier extraction wells, respectively; and M, N, and T stand for the total numbers of production wells, barrier extraction wells, and time periods, respectively. The first objective, maximization of groundwater extraction from the production wells for beneficial use, is represented by Eq. (12), and the second objective, minimization of water extraction from the barrier extraction wells, is given by Eq. (13).
Optimization algorithm: CEMGA A population based optimization algorithm, CEMGA (Deb and Goel 2001), is utilized to solve the optimization routine of the linked S/O based saltwater intrusion management model. CEMGA carries individuals with relatively low fitness values over to the next generation in order to increase the diversity of the population. The algorithm therefore has a mechanism to control the elite members of the population and thereby maintain population diversity. As the algorithm progresses, this control of elite members makes the new population more diverse. The controlled elitist approach reduces the elitism effect by adding a particular fraction of the dominated individuals to the present best non-dominated individuals. CEMGA has shown better convergence to the global Pareto optimal solution for a number of complex test problems (Deb and Goel 2001) than its previous version, the non-controlled elitist multi-objective genetic algorithm (Deb et al. 2000). The elite control mechanism of CEMGA uses a pre-defined geometric distribution of the number of individuals in each front to control the maximum number of individuals allowed in the ith front, $n_i$, such that $n_i = r \times n_{i-1}$, where $r$ represents the reduction rate, the value of which should be less than one.
Coupled simulation–optimization This study proposes multiple objective optimization in the form of an externally linked S/O methodology together with a GA-FIS based meta-modelling approach to develop a saltwater intrusion management model for a multi-layered coastal aquifer system incorporating uncertainty in model parameters. Properly trained and validated GA-FIS models replace the complex numerical simulation model to approximate the 3D coupled flow and solute transport processes. GA-FIS models are linked to the optimization algorithm as constraints of the optimization routine.
Other constraints of the optimization procedure are the maximum allowable saltwater concentrations at the specified OPs. These constraints are set based on the potential usability of the water extracted from the production bores in different regions of the study area. Therefore, the aim of developing this saltwater intrusion management model is to provide several alternative feasible solutions of optimal groundwater extraction values while restricting the salinity concentrations at the specified OPs to predefined allowable limits. A similar procedure is applied to develop another management model by incorporating ANFIS models into the CEMGA as binding constraints.
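The geometric rule for front sizes stated for CEMGA, with each front allowed r times as many individuals as the previous one, can be illustrated with a short Python sketch. This is only an illustration of the recurrence as given in the text: the function name `front_size_limits` and the choice of a starting limit for the first front are assumptions, and the sketch does not reproduce the full algorithm of Deb and Goel (2001).

```python
from typing import List


def front_size_limits(first_front_limit: float, reduction_rate: float,
                      num_fronts: int) -> List[float]:
    """Maximum individuals allowed per front under n_i = r * n_(i-1), with r < 1."""
    if not 0.0 < reduction_rate < 1.0:
        raise ValueError("reduction rate r must lie strictly between 0 and 1")
    if num_fronts < 1:
        raise ValueError("at least one front is required")
    limits = [float(first_front_limit)]  # hypothetical limit for the first front
    for _ in range(num_fronts - 1):
        limits.append(reduction_rate * limits[-1])
    return limits
```

Because r is below one, the limits shrink geometrically from the best front outwards, so lower-ranked fronts always retain some representation in the next generation instead of being crowded out entirely by elite members.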
Parallel computing To achieve further computational efficiency, the multi-objective optimization formulation is executed in a parallel computation framework by distributing the objective functions and all the constraints among a parallel pool of multiple workers. Parallel computing speeds up the computations and thereby provides additional computational efficiency in the GA-FIS and ANFIS based linked S/O approach. The present study performs parallel computing on the physical cores of a CPU [Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz] by using the Parallel Computing Toolbox of MATLAB (MathWorks 2017c).
Statistical indices used for performance evaluation RMSE, Mean Absolute Percentage Relative Error (MAPRE), Index of Agreement (IOA), Coefficient of Correlation (R), and Threshold Statistics (TS) are used to evaluate the performances of both ANFIS and GA-FIS models. Root Mean Square Error (RMSE) is calculated using
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(C_{i,o} - C_{i,p}\right)^2} \tag{18}$$
Absolute Percentage Relative Error (APRE) is calculated by
$$\mathrm{APRE} = \sum_{i=1}^{n}\left|\frac{C_{i,o} - C_{i,p}}{C_{i,o}}\right| \times 100 \tag{19}$$
Mean Absolute Percentage Relative Error (MAPRE) is expressed as
$$\mathrm{MAPRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{C_{i,o} - C_{i,p}}{C_{i,o}}\right| \times 100 \tag{20}$$
Index of Agreement (IOA) is calculated by
$$d = 1 - \frac{\sum_{i=1}^{n}\left(C_{i,o} - C_{i,p}\right)^2}{\sum_{i=1}^{n}\left(\left|C_{i,p} - \bar{C}_o\right| + \left|C_{i,o} - \bar{C}_o\right|\right)^2} \tag{21}$$
Correlation Coefficient (R) is calculated by
$$R = \frac{\sum_{i=1}^{n}\left(C_{i,o} - \bar{C}_o\right)\left(C_{i,p} - \bar{C}_p\right)}{\sqrt{\sum_{i=1}^{n}\left(C_{i,o} - \bar{C}_o\right)^2 \sum_{i=1}^{n}\left(C_{i,p} - \bar{C}_p\right)^2}} \tag{22}$$
Threshold Statistics (TS) is expressed as
$$\mathrm{TS} = \frac{n_a}{n} \times 100 \tag{23}$$
where $C_{i,o}$ and $C_{i,p}$ are the observed and predicted salinity concentration values, respectively; $\bar{C}_o$ and $\bar{C}_p$ denote the means of the observed and predicted salinity concentrations, respectively; $n$ represents the number of data points; and $n_a$ is the number of data points whose APRE value is less than a specified threshold value a%.
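The indices in Eqs. (18) and (20)–(23) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, and the helper `apre` evaluates the absolute percentage relative error for a single data point, which is an assumption made here because the definition of $n_a$ in TS requires a per-point value.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, Eq. (18)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def apre(o: float, p: float) -> float:
    """Absolute percentage relative error of one data point (assumed per-point form)."""
    return abs((o - p) / o) * 100.0


def mapre(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage relative error, Eq. (20)."""
    return _mean([apre(o, p) for o, p in zip(obs, pred)])


def ioa(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Index of agreement, Eq. (21)."""
    mean_obs = _mean(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def corr(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Correlation coefficient, Eq. (22)."""
    mean_obs, mean_pred = _mean(obs), _mean(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(ss_obs * ss_pred)


def threshold_stat(obs: Sequence[float], pred: Sequence[float], a: float) -> float:
    """Threshold statistic, Eq. (23): share of points with APRE below a percent."""
    n_a = sum(1 for o, p in zip(obs, pred) if apre(o, p) < a)
    return n_a / len(obs) * 100.0
```

The functions assume non-empty inputs of equal length and non-zero observed values, since the relative error divides by the observed concentration.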
Case study An illustrative multi-layered coastal aquifer system with a set of uncertain model parameters is used to evaluate the performance of the proposed methodology. Hydraulic conductivity, compressibility, and bulk density are assumed homogeneous but uncertain within each vertical layer of material. Different realizations of these uncertain model parameters are used in each vertical material layer. An anisotropy ratio of 0.5 (kv/kh) is used for hydraulic conductivity realizations, where kv and kh represent vertical and horizontal hydraulic conductivities, respectively. Hydraulic conductivity values in the Z-direction are taken as one-tenth of the hydraulic conductivity values used in the X-direction. Different realizations of compressibility and aquifer recharge are also used. Aquifer recharge is uniformly spread over the top layer of the aquifer. These uncertain model parameters are randomly paired with each other and combined with the transient groundwater extraction values obtained from a set of production bores and barrier extraction wells. These values, along with other initial and boundary conditions, are used as inputs to the numerical simulation model to obtain salinity concentrations at the specified OPs at the end of the management period.
Description of the study area The multi-layered coastal aquifer system considered is similar to the one developed in Roy and Datta (2017a, b, c) and is illustrated in Fig. 3. The illustrative study area has an areal extent of 4.35 km². The total thickness of the aquifer is 80 m, which is divided into four distinct layers of aquifer materials. An initial head of 0 m and a constant concentration of 35,000 mg/l are assigned at the seaside boundary. The specified head of 1 m assigned at the upstream end of the river is assumed to vary linearly along the stream until it reaches 0 m at the seaside boundary. The study considers 11 production bores and 5 barrier extraction wells in a fairly well-spread extraction well field (well density = 3.68 wells/km²). In Fig. 3, production wells are represented by PW1–PW11 whereas the barrier extraction wells are indicated by BW1–BW5. Production bores are used to extract water from the aquifer for beneficial purposes. On the other hand, barrier extraction wells are placed near the shoreline to control saltwater intrusion by creating a hydraulic barrier along the coast. Water is abstracted from the second and third layers of the aquifer. The total simulation period of 5 years is divided into 5 uniform time steps of 1 year each. Within each time step, water extraction from both the production bores and barrier extraction wells is assumed constant. Saltwater concentrations at the end of the management period are monitored at 5 OPs located in 3 different salinity zones: OP1 is located in the low salinity zone, OP2 and OP3 are located in the moderate salinity zone, and OP4 and OP5 are located in the high salinity zone. OPs are placed in different salinity zones with a view to using the water extracted from different regions of the aquifer for different purposes. The proposed GA-FIS and ANFIS based linked S/O methodology considers 80 input pumping variables [16 wells (11 production bores + 5 barrier extraction wells) × 5 years] of groundwater extraction in space and time. These variables are designated by X1–X80. Variables X1–X55 represent groundwater extraction for the management period of 5 years from the production bores, PW1–PW11. Water abstracted from the barrier wells, BW1–BW5, is indicated by X56–X80.
Fig. 3 Three dimensional view of the coastal aquifer study area (PWs = production wells, BWs = barrier wells, OPs = observation points; aquifer thickness 80 m; OP1, OP2 and OP3 are labelled together with PW6, PW9 and PW10, respectively)
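The layout of the 80 decision variables and the two objectives of Eqs. (12) and (13) can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the `metamodels` argument is an assumed list of five callables standing in for the trained meta-models at OP1–OP5, the pumping bounds of 0–1300 m³/day are assumed from the sampling range reported for the training data, and the concentration limits are those listed in the header of Table 8.

```python
from typing import Callable, List, Sequence, Tuple

N_PW, N_BW, N_YEARS = 11, 5, 5  # wells and annual time steps in the case study
C_MAX = (30.0, 700.0, 700.0, 4500.0, 5000.0)  # mg/l limits at OP1-OP5 (Table 8 header)


def split_decision_vector(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split X1-X80 into production (X1-X55) and barrier (X56-X80) pumping values."""
    expected = (N_PW + N_BW) * N_YEARS
    if len(x) != expected:
        raise ValueError(f"expected {expected} variables, got {len(x)}")
    n_prod = N_PW * N_YEARS
    return list(x[:n_prod]), list(x[n_prod:])


def objectives(x: Sequence[float]) -> Tuple[float, float]:
    """Return (f1, f2): total production pumping (maximize), total barrier pumping (minimize)."""
    production, barrier = split_decision_vector(x)
    return sum(production), sum(barrier)


def is_feasible(
    x: Sequence[float],
    metamodels: Sequence[Callable[[Sequence[float]], float]],
    c_max: Sequence[float] = C_MAX,
    q_min: float = 0.0,   # assumed lower pumping bound, m3/day
    q_max: float = 1300.0,  # assumed upper pumping bound, m3/day
) -> bool:
    """Check the pumping bounds (Eqs. 16-17) and concentration limits (Eq. 15)."""
    if any(q < q_min or q > q_max for q in x):
        return False
    return all(model(x) <= limit for model, limit in zip(metamodels, c_max))
```

The ordering of variables within each block (by well or by year) does not affect the two sums, so the sketch makes no assumption about it.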
Parameter randomization Four model parameters, namely hydraulic conductivity, aquifer recharge, bulk density, and compressibility of the aquifer material, are assumed homogeneous but uncertain within each material layer of the multi-layered coastal aquifer system. Different realizations of these uncertain model parameters are obtained from different statistical distributions. A representative set of hydraulic conductivity realizations is obtained from a lognormal distribution with a specific mean and standard deviation of the associated normal distribution (Table 1).
Table 1 Parameter distributions with mean and standard deviation values used in the simulation

| Parameters | Unit | Material layer | Distribution | Mean | Standard deviation |
|---|---|---|---|---|---|
| Hydraulic conductivity in x-direction | m/d | 1 | Lognormal | 5.02 | 0.30 |
| Hydraulic conductivity in y-direction | m/d | 1 | Lognormal | 2.52 | 0.15 |
| Hydraulic conductivity in z-direction | m/d | 1 | Lognormal | 0.50 | 0.03 |
| Compressibility | m d²/kg | 1 | Uniform (LHS) | 7.37E-16 | 3.50E-16 |
| Bulk density | kg/m³ | 1 | Normal (LHS) | 1650 | 5 |
| Recharge | m/d | 1 | Uniform (LHS) | 0.00019 | 3.48E-05 |
| Hydraulic conductivity in x-direction | m/d | 2 | Lognormal | 9.97 | 0.28 |
| Hydraulic conductivity in y-direction | m/d | 2 | Lognormal | 4.98 | 0.14 |
| Hydraulic conductivity in z-direction | m/d | 2 | Lognormal | 1.00 | 0.03 |
| Compressibility | m d²/kg | 2 | Uniform (LHS) | 7.35E-18 | 3.49E-18 |
| Bulk density | kg/m³ | 2 | Normal (LHS) | 1600 | 5 |
| Hydraulic conductivity in x-direction | m/d | 3 | Lognormal | 14.95 | 0.29 |
| Hydraulic conductivity in y-direction | m/d | 3 | Lognormal | 7.47 | 0.15 |
| Hydraulic conductivity in z-direction | m/d | 3 | Lognormal | 1.49 | 0.03 |
| Compressibility | m d²/kg | 3 | Uniform (LHS) | 7.35E-18 | 3.49E-18 |
| Bulk density | kg/m³ | 3 | Normal (LHS) | 1550 | 5 |
| Hydraulic conductivity in x-direction | m/d | 4 | Lognormal | 3.05 | 0.30 |
| Hydraulic conductivity in y-direction | m/d | 4 | Lognormal | 1.53 | 0.15 |
| Hydraulic conductivity in z-direction | m/d | 4 | Lognormal | 0.31 | 0.03 |
| Compressibility | m d²/kg | 4 | Uniform (LHS) | 7.35E-17 | 3.49E-17 |
| Bulk density | kg/m³ | 4 | Normal (LHS) | 1700 | 5 |

Recharge is distributed uniformly over the first layer of the study area
The lognormal distribution of the uncertain hydraulic conductivity is sampled by dividing it into N equally probable intervals, from each of which a single value is chosen randomly. Aquifer recharge and compressibility realizations are generated from LHS uniform distributions for specific lower and upper bounds within the parameter space. Realizations of bulk density are obtained by the LHS technique from a p-dimensional multivariate normal distribution with a specific mean and covariance. The values of hydraulic conductivity, recharge, compressibility, and bulk density are then shuffled randomly and combined to obtain multivariate random realizations of the uncertain model parameters. One hundred realizations of each parameter are generated for each material layer. A total of 3000 sets of uniformly distributed groundwater extraction values are generated from the variable space with a range of 0–1300 m³/day. For each randomized uncertain model parameter set, 30 sets of transient pumping values at the well locations are assigned. The resulting 3000 randomized combined realizations of uncertain model parameters and transient groundwater extraction values are then used, along with other initial and boundary conditions, as inputs to the simulation model in order to obtain the corresponding salinity concentrations at the specified OPs.
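The stratified sampling and pairing steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: `lhs_uniform` and `pair_with_pumping` are hypothetical names, only the uniform case of Latin hypercube sampling is shown, and the pumping values are drawn by plain uniform sampling because the text does not state how they were generated.

```python
import random
from typing import Any, List, Sequence, Tuple


def lhs_uniform(n: int, lower: float, upper: float, rng: random.Random) -> List[float]:
    """Draw one value from each of n equally probable strata of U(lower, upper)."""
    width = (upper - lower) / n
    samples = [lower + (k + rng.random()) * width for k in range(n)]
    rng.shuffle(samples)  # random order so realizations pair randomly across parameters
    return samples


def pair_with_pumping(
    realizations: Sequence[Any],
    sets_per_realization: int,
    n_vars: int,
    q_min: float,
    q_max: float,
    rng: random.Random,
) -> List[Tuple[Any, List[float]]]:
    """Attach several random pumping sets to every parameter realization."""
    scenarios = []
    for params in realizations:
        for _ in range(sets_per_realization):
            pumping = [rng.uniform(q_min, q_max) for _ in range(n_vars)]
            scenarios.append((params, pumping))
    return scenarios
```

With 100 parameter realizations, 30 pumping sets each, and 80 pumping variables in the range 0–1300 m³/day, `pair_with_pumping` returns the 3000 input scenarios described in the text.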
Performance evaluation
Performances of the GA-FIS and ANFIS models Performances of the GA-FIS models in approximating density dependent coupled flow and salt transport processes are evaluated based on a set of different statistical indices. Each developed GA-FIS model is utilized to predict salinity concentrations at a specified OP. Performances of GA-FIS models are compared with those of basic ANFIS models. Results are summarized in Tables 2, 3, 4 and 5 and Figs. 4, 5 and 6. Table 2 presents the training and testing RMSE values as well as the differences between these two RMSE values at all OPs. In all cases, ANFIS models produce smaller training errors than GA-FIS models. However, ANFIS models produce higher testing errors than GA-FIS models at all OPs. As such, the difference in errors between the training and testing datasets is smaller for GA-FIS than for ANFIS models. Therefore, GA-FIS models seem to be more robust and are likely to provide smaller errors when presented with a completely new set of unseen data. Performances of both GA-FIS and ANFIS models based on the RMSE, MAPRE, IOA and R criteria are presented in Table 3. It is observed from Table 3 that both models produce relatively lower values of RMSE and MAPRE, and
Fig. 4 Actual and predicted salinity concentrations at a observation point OP1, b observation point OP2, c observation point OP3, d observation point OP4, and e observation point OP5 (x-axis: sample index; y-axis: saltwater concentration, mg/l; series: Actual, ANFIS, GA-FIS)
higher values of IOA and R. Although both GA-FIS and ANFIS models are sufficiently accurate in predicting the responses, GA-FIS exhibits relatively better performance than ANFIS models at all OPs. Therefore, it is concluded that the prediction accuracy of the GA-FIS models in terms of capturing the trends of responses at different regions is quite satisfactory based on the RMSE criterion. However, the drawback of the RMSE criterion is that it gives more weight to outlying observations. Therefore, the MAPRE criterion is used to provide comparatively better information on the distribution of errors. The MAPRE criterion also demonstrates the better performance of GA-FIS compared to ANFIS models. The developed models are also evaluated from the correlation coefficient viewpoint. All GA-FIS models produce higher values of R compared to their ANFIS counterparts.
However, R values at OPs 4 and 5 are comparatively smaller (close to 70%). Therefore, IOA (Willmott 1984) is used to ascertain the prediction capability of the proposed models as well as to overcome the insensitivity of R to differences in the actual and predicted means and variances (Legates and McCabe 1999). Both models produce reasonable and acceptable values of IOA at all OPs. Performances of GA-FIS models are also better based on the IOA criterion. Figure 4 illustrates actual versus predicted salinity concentrations at different OPs. For brevity of presentation, only the first 30 samples out of a total of 600 test datasets are presented in Fig. 4. For this performance evaluation purpose, the actual concentrations are those synthetically obtained as solutions of the numerical simulation model in response to water abstraction from the aquifer. The predicted
Fig. 5 Regression plots of actual versus predicted saltwater concentrations at a observation point OP1, b observation point OP2, c observation point OP3, d observation point OP4, e observation point OP5; subscript 1 refers to ANFIS, and subscript 2 refers to GA-FIS (x-axis: predicted salinity, mg/l; y-axis: actual salinity, mg/l)
concentrations denote the concentrations predicted by the meta-models. It is observed from Fig. 4 that the predictions of both GA-FIS and ANFIS models are very similar to the actual salinity concentrations. Both models are able to capture the trends of the data quite accurately. However, GA-FIS models provide relatively better predictions than ANFIS models. This is also evident from the regression plots of actual versus predicted salinity concentrations as shown
in Fig. 5. GA-FIS models provide a better fit of data with higher values of R compared to ANFIS models at all OPs. Figure 6 illustrates boxplots of absolute errors between the actual and predicted salinity concentration values obtained by GA-FIS and ANFIS models at different OPs. In Fig. 6, the red and blue horizontal lines indicate medians of the absolute errors produced by ANFIS and GA-FIS models, respectively. Mean of absolute errors by both models
is represented by small black circles. Figure 6 also demonstrates the superiority of GA-FIS models over ANFIS models at all OPs from the absolute error viewpoint. Performances of the proposed models are also evaluated based on the TS criterion, which describes the distribution of errors. TS indicates the percentage of samples whose Relative Error (RE) values are smaller than a pre-defined threshold value. Four threshold values (< 5, < 10, < 15, and < 20%) are used in the present study so that the RE values obtained fall within the selected thresholds. It is apparent from Table 4 that GA-FIS models generally outperform ANFIS models in terms of the TS criterion. Another important criterion that should be considered while evaluating the performance of any prediction model is the computational time required to train the model. Table 5 presents the training time required to develop both GA-FIS and ANFIS models. It is noted that the difference in training time among different OPs is not very substantial for either GA-FIS or ANFIS models. However, at all OPs, GA-FIS requires more time to train. This is because GA performs a thorough search in order to provide an optimal set of parameter values. However,
once trained, these models are able to provide prediction results very quickly (a fraction of a minute). Therefore, it is concluded that GA-FIS provides relatively better approximations of the coupled flow and salt transport processes than ANFIS models, at least based on the limited evaluations presented here.
Performance of the management model Saltwater intrusion management models are developed by integrating GA-FIS and ANFIS models separately with a population based multi-objective optimization algorithm, CEMGA. The proposed management models provide optimal solutions of groundwater abstraction in the form of Pareto optimal fronts that show the trade-offs between the two conflicting objectives of groundwater abstraction. The optimization routine is run on a parallel computing platform by distributing the objective functions and the constraints among four physical cores of a PC. The performance of the GA-FIS-CEMGA based management model is compared with that of the ANFIS-CEMGA based management model. The GA-FIS-CEMGA model evaluates 1,804,600 functions
Fig. 6 Box plots of absolute errors between actual and predicted saltwater concentration values at a observation point OP1, b observation point OP2, c observation point OP3, d observation point OP4, e observation point OP5 (x-axis: models, ANFIS and GA-FIS; y-axis: absolute error, mg/l)
Table 2 Performance of the proposed GA-FIS and ANFIS models on training and testing phase (training and test RMSE, mg/l)

| OPs | ANFIS Train RMSE | ANFIS Test RMSE | ANFIS Difference | GA-FIS Train RMSE | GA-FIS Test RMSE | GA-FIS Difference |
|---|---|---|---|---|---|---|
| OP1 | 0.81 | 1.12 | 0.30 | 0.93 | 1.06 | 0.13 |
| OP2 | 48.55 | 64.47 | 15.92 | 53.28 | 56.59 | 3.31 |
| OP3 | 63.10 | 85.39 | 22.28 | 70.09 | 76.42 | 6.33 |
| OP4 | 221.40 | 300.66 | 79.26 | 245.09 | 269.39 | 24.30 |
| OP5 | 187.86 | 245.02 | 57.16 | 207.03 | 223.70 | 16.67 |

Table 3 Performance of the proposed GA-FIS and ANFIS models on test dataset

| OPs | RMSE, mg/l (ANFIS) | RMSE, mg/l (GA-FIS) | MAPRE, % (ANFIS) | MAPRE, % (GA-FIS) | IOA (ANFIS) | IOA (GA-FIS) | R (ANFIS) | R (GA-FIS) |
|---|---|---|---|---|---|---|---|---|
| OP1 | 1.12 | 1.06 | 2.69 | 2.58 | 0.92 | 0.92 | 0.85 | 0.87 |
| OP2 | 64.47 | 56.59 | 6.19 | 5.51 | 0.89 | 0.91 | 0.80 | 0.84 |
| OP3 | 85.39 | 76.42 | 7.03 | 6.40 | 0.91 | 0.93 | 0.84 | 0.87 |
| OP4 | 300.66 | 269.39 | 4.96 | 4.50 | 0.79 | 0.82 | 0.65 | 0.71 |
| OP5 | 245.02 | 223.70 | 3.66 | 3.38 | 0.81 | 0.84 | 0.67 | 0.73 |

Table 4 Threshold statistics (%) between the actual and predicted saltwater concentration values on the test dataset

| OPs | < 5% (ANFIS) | < 5% (GA-FIS) | < 10% (ANFIS) | < 10% (GA-FIS) | < 15% (ANFIS) | < 15% (GA-FIS) | < 20% (ANFIS) | < 20% (GA-FIS) |
|---|---|---|---|---|---|---|---|---|
| OP1 | 89.50 | 91.00 | 98.50 | 99.17 | 100.0 | 100.0 | 100.0 | 100.0 |
| OP2 | 46.83 | 46.67 | 81.33 | 87.33 | 95.83 | 99.17 | 99.33 | 100.0 |
| OP3 | 39.50 | 42.83 | 73.50 | 79.00 | 92.17 | 97.17 | 98.83 | 99.83 |
| OP4 | 54.33 | 57.00 | 92.00 | 96.50 | 99.17 | 99.67 | 99.83 | 100.0 |
| OP5 | 71.83 | 76.50 | 98.00 | 99.67 | 100.0 | 100.0 | 100.0 | 100.0 |

Table 5 Training time requirement (min)

| Models | OP1 | OP2 | OP3 | OP4 | OP5 |
|---|---|---|---|---|---|
| ANFIS | 2.77 | 2.74 | 2.70 | 2.76 | 2.72 |
| GA-FIS | 12.20 | 11.28 | 10.76 | 11.60 | 16.98 |

OPs observation points
(1289 generations × 1400 populations) to decide on the global optimal solution. On the other hand, the ANFIS-CEMGA model performs 2,018,800 (1442 generations × 1400 populations) function evaluations before reaching the global Pareto optimal solution. CEMGA parameters are selected by conducting a set of numerical experiments using several combinations of different parameters. Based on these numerical trials, the CEMGA uses a population size of 1400, a crossover rate of 0.95, and a Pareto front population fraction of 0.7. The function and constraint tolerances are set as 1e-05 and 1e-03, respectively. The optimal groundwater extraction strategies in the form of a Pareto optimal front are presented in Fig. 7. The Pareto front provides 980 non-dominated solutions from which the managers can choose the right combination of production and barrier well pumping. These solutions are based on limiting the salinity concentrations at the specified OPs to the pre-defined maximum allowable limits. Total amounts of water abstraction from both the production bores and the barrier extraction wells are presented in Table 6. Ten solutions are selected randomly from different regions of the Pareto optimal front. It is observed from Table 6 that ANFIS models seem to be more efficient in
Fig. 7 Pareto optimal front of the developed management model (x-axis: total production well pumping, m³/day; y-axis: total barrier well pumping, m³/day; series: ANFIS and GA-FIS)
terms of the total amount of beneficial production well pumping with respect to barrier extraction well pumping. However, to reach a conclusion, verification of the performance of both of these models is required. This verification is conducted by running the numerical simulation model using these pumping values and by checking the constraint violations as well as the errors in prediction.
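The notion of a non-dominated solution used for the Pareto front above can be made concrete with a small Python sketch. It is a brute-force illustration with hypothetical function names, assuming each solution is a pair (total production pumping, total barrier pumping) where the first value is maximized and the second minimized; it is not the sorting procedure used inside CEMGA.

```python
from typing import List, Sequence, Tuple

Solution = Tuple[float, float]  # (total production pumping, total barrier pumping)


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions: Sequence[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]
```

For example, a solution with more production pumping and less barrier pumping than another removes the latter from the front, while two solutions that trade one objective against the other both remain.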
Verification of the management model The performance of the developed saltwater intrusion management models is verified by comparing the solution results obtained from the optimization routine with those obtained from the numerical simulation model. To do this, ten solutions of optimal groundwater extraction values (which are the same as the solutions used in Table 6) are randomly selected from different regions of the Pareto optimal front. These solutions are used as inputs to the simulation model, developed by using the average values of the uncertain model parameters, to obtain the corresponding saltwater concentration values at each OP. Table 7 presents the percentage absolute relative errors (PARE) between the saltwater concentration values predicted by the management models and the outputs of the simulation model. It is observed from Table 7 that PARE values are less than 5% for all estimates for the GA-FIS based management model. On the other hand, the ANFIS based management model produces relatively higher values of PARE, especially at OPs 3 and 5. It is noted that the ANFIS based management model produces lower PARE values at OP2 compared to the GA-FIS based management model. This example problem demonstrates the superiority of the GA-FIS based saltwater intrusion management model from the PARE viewpoint. Performances of the management models are also verified by comparing the violation of the constraints at each OP. This is presented in Table 8. Constraint violations by the ANFIS models are higher for most of the estimates. At OP5, the ANFIS models produce a considerable amount of constraint violation.

Table 6 Optimal solutions obtained using GA-FIS and ANFIS models (total pumping × 10³ m³/day)

| Solutions | Production well (GA-FIS) | Barrier well (GA-FIS) | Production well (ANFIS) | Barrier well (ANFIS) |
|---|---|---|---|---|
| 1 | 35.43 | 13.86 | 34.88 | 4.37 |
| 2 | 35.44 | 14.05 | 33.63 | 3.18 |
| 3 | 35.45 | 14.19 | 34.02 | 3.51 |
| 4 | 35.37 | 13.39 | 33.97 | 3.45 |
| 5 | 33.59 | 6.50 | 34.04 | 3.53 |
| 6 | 35.34 | 13.10 | 35.02 | 4.52 |
| 7 | 35.43 | 13.83 | 33.67 | 3.21 |
| 8 | 34.27 | 8.45 | 31.82 | 1.74 |
| 9 | 32.95 | 5.16 | 34.00 | 3.48 |
| 10 | 35.44 | 13.96 | 34.33 | 3.78 |

Table 7 Percentage absolute relative error (%) between optimal solutions and simulated salinity concentrations

| Solutions | OP1 (GA-FIS) | OP2 (GA-FIS) | OP3 (GA-FIS) | OP4 (GA-FIS) | OP5 (GA-FIS) | OP1 (ANFIS) | OP2 (ANFIS) | OP3 (ANFIS) | OP4 (ANFIS) | OP5 (ANFIS) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.04 | 2.16 | 4.01 | 0.11 | 0.49 | 0.90 | 0.65 | 12.36 | 2.35 | 6.39 |
| 2 | 0.05 | 2.13 | 3.97 | 0.07 | 0.46 | 0.28 | 0.73 | 10.04 | 2.11 | 5.16 |
| 3 | 0.06 | 2.09 | 3.96 | 0.03 | 0.45 | 0.36 | 0.10 | 11.00 | 2.27 | 5.61 |
| 4 | 0.01 | 2.24 | 4.08 | 0.20 | 0.54 | 0.32 | 0.31 | 10.51 | 2.19 | 5.42 |
| 5 | 0.60 | 2.25 | 3.56 | 0.87 | 0.84 | 0.29 | 0.28 | 11.04 | 2.27 | 5.55 |
| 6 | 0.01 | 2.31 | 4.15 | 0.28 | 0.59 | 0.91 | 0.72 | 12.43 | 2.54 | 6.45 |
| 7 | 0.04 | 2.17 | 4.02 | 0.12 | 0.50 | 0.25 | 0.51 | 10.52 | 2.11 | 5.30 |
| 8 | 0.35 | 2.44 | 3.85 | 0.75 | 0.85 | 0.77 | 1.56 | 7.84 | 2.50 | 3.54 |
| 9 | 0.80 | 2.12 | 3.20 | 1.02 | 0.87 | 0.27 | 0.18 | 11.13 | 2.17 | 5.62 |
| 10 | 0.04 | 2.14 | 3.99 | 0.09 | 0.48 | 0.46 | 0.04 | 11.42 | 2.32 | 5.73 |

Table 8 Absolute values of constraint violations at selected observation points

| Solution | OP1 (< 30 mg/l) GA-FIS | OP1 (< 30 mg/l) ANFIS | OP2 (< 700 mg/l) GA-FIS | OP2 (< 700 mg/l) ANFIS | OP3 (< 700 mg/l) GA-FIS | OP3 (< 700 mg/l) ANFIS | OP4 (< 4500 mg/l) GA-FIS | OP4 (< 4500 mg/l) ANFIS | OP5 (< 5000 mg/l) GA-FIS | OP5 (< 5000 mg/l) ANFIS |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.90 | 0.77 | 0.00 | 0.87 | 0.67 | 0.51 | 0.76 | 0.22 | 0.01 | 14.52 |
| 2 | 0.89 | 0.79 | 0.27 | 0.20 | 0.15 | 0.04 | 1.50 | 6.35 | 0.21 | 5.80 |
| 3 | 0.87 | 0.79 | 0.47 | 1.64 | 0.00 | 0.74 | 1.22 | 3.07 | 0.51 | 12.65 |
| 4 | 0.97 | 0.85 | 0.06 | 0.60 | 0.81 | 0.24 | 5.68 | 1.78 | 0.95 | 8.59 |
| 5 | 1.73 | 0.82 | 0.03 | 0.48 | 1.65 | 1.39 | 0.25 | 3.27 | 0.60 | 7.07 |
| 6 | 1.02 | 0.81 | 0.48 | 0.45 | 0.15 | 0.17 | 12.17 | 3.58 | 1.25 | 12.91 |
| 7 | 0.91 | 0.78 | 0.04 | 1.61 | 0.48 | 3.29 | 3.82 | 4.60 | 0.11 | 10.91 |
| 8 | 1.60 | 0.88 | 0.07 | 0.58 | 0.88 | 12.91 | 6.62 | 0.50 | 2.74 | 5.28 |
| 9 | 1.82 | 0.84 | 1.74 | 0.58 | 2.01 | 0.92 | 2.37 | 2.64 | 0.35 | 11.49 |
| 10 | 0.90 | 0.83 | 0.05 | 0.03 | 0.22 | 0.43 | 1.91 | 1.56 | 0.27 | 5.33 |
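The PARE check used in this verification step can be sketched as follows. The text does not give the formula explicitly, so this sketch assumes that the error is taken relative to the concentration obtained from the numerical simulation model; the function names and the 5% tolerance argument are hypothetical.

```python
from typing import Sequence


def pare(predicted: float, simulated: float) -> float:
    """Percentage absolute relative error, assumed relative to the simulated value."""
    if simulated == 0:
        raise ValueError("simulated concentration must be non-zero")
    return abs(predicted - simulated) / abs(simulated) * 100.0


def all_within(predicted: Sequence[float], simulated: Sequence[float],
               tolerance_percent: float = 5.0) -> bool:
    """True if every predicted/simulated pair has a PARE below the tolerance."""
    return all(pare(p, s) < tolerance_percent for p, s in zip(predicted, simulated))
```

Applied to the concentrations at the five OPs for one Pareto solution, `all_within` would reproduce the kind of check behind the statement that all GA-FIS estimates have PARE values below 5%.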
Based on the above discussions, it can be concluded that the proposed saltwater intrusion management models can be applied to obtain optimal groundwater abstraction values in a multi-layered coastal aquifer system under parameter uncertainty. However, GA-FIS based AI models are more efficient in developing the saltwater intrusion management model at least for this illustrative example problem.
Summary and conclusions This study proposes the use of coupled GA-FIS-CEMGA based management model to predict saltwater intrusion processes and to evolve a regional scale optimal management strategy in a multi-layered coastal aquifer system under hydrogeological parameter uncertainty. The performance of the GA-FIS-CEMGA model is compared with the performance of an ANFIS-CEMGA based management model. In the first step, GA-FIS and ANFIS based models are developed to approximate density dependent coupled flow and salt transport processes. GA is used to tune the antecedent and consequent parameters for obtaining the best FIS model structures. The parameters of ANFIS based models are tuned using the hybrid algorithm. Both models are trained using datasets consisting of predictor-response arrays of groundwater withdrawal and resultant saltwater concentrations, obtained as solutions of a density dependent 3D coupled flow and salt transport numerical simulation model. In the second step, two management models are developed by external linking of these AI based models to the optimization algorithm, CEMGA. It is demonstrated that both GAFIS and ANFIS models capture the nonlinear relationship between the spatiotemporal groundwater extraction values and the resulting salinity concentrations quite accurately. Comparison results indicate that the GA-FIS models provide better prediction capabilities compared to the ANFIS models. This study applies GA for extracting fuzzy if-then rules in the FCM method in order to develop the hybrid GA-FIS model. Management model performance evaluation result also indicates superiority of the GA-FIS-CEMGA based model. It is noted that comparison of Pareto fronts of both GAFIS-CEMGA and ANFIS-CEMGA based management models appear to indicate that ANFIS-CEMGA model allows increased amount of beneficial pumping associated with the same amount of barrier well extraction. 
However, this apparently permissible increase in pumping for a specific amount of barrier well extraction appears to be a consequence of the relatively inaccurate prediction of salinity by the ANFIS-CEMGA model, and not of better optimal solutions. Based on the PARE criterion, it can be concluded that the GA-FIS-CEMGA based management model provides more accurate estimates of the optimal groundwater extraction strategies and therefore a more accurate representation of the optimal permissible pumping. This is an important consideration in real-life applications.

The proposed GA-FIS models do not include parameter uncertainty directly; however, they incorporate it indirectly because they are developed from the solution results of a numerical model that addresses parameter uncertainty. The present study considers a geologically multi-layered coastal aquifer system in which the four vertical material layers vary across randomized realizations of the uncertain model parameters. However, the aquifer material within each layer is considered homogeneous but uncertain. Future research may be directed towards studying the applicability of the GA-FIS based approximation of the complex saltwater intrusion process for developing optimal management strategies in heterogeneous coastal aquifer systems.
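The accuracy argument above can be made concrete with a small worked example. The Python sketch below assumes that PARE denotes the percentage absolute relative error between the salinity concentration predicted by a surrogate model and the value obtained from the numerical simulation model at each monitoring location; this definition, the function name, and all concentration values are assumptions made for illustration and are not results of the study.

```python
from typing import List, Sequence


def pare(simulated: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Percentage absolute relative error at each monitoring location.

    Assumed definition: |simulated - predicted| / |simulated| * 100.
    """
    if len(simulated) != len(predicted):
        raise ValueError("simulated and predicted must have the same length")
    errors = []
    for sim, pred in zip(simulated, predicted):
        if sim == 0:
            raise ValueError("simulated concentration must be nonzero")
        errors.append(abs(sim - pred) / abs(sim) * 100.0)
    return errors


# Hypothetical concentrations at three monitoring locations (e.g. mg/L),
# evaluated at one optimal pumping strategy.
simulated = [500.0, 400.0, 250.0]      # numerical simulation model
predicted_a = [495.0, 404.0, 250.0]    # hypothetical surrogate A
predicted_b = [450.0, 440.0, 225.0]    # hypothetical surrogate B

pare_a = pare(simulated, predicted_a)
pare_b = pare(simulated, predicted_b)
print("max PARE, surrogate A:", max(pare_a))
print("max PARE, surrogate B:", max(pare_b))
```

Under this assumed definition, a surrogate that underpredicts salinity would let the optimizer accept more pumping than the simulation model supports, which is why a lower PARE at the optimal solutions is treated as evidence of a more trustworthy management model.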