Soft Comput DOI 10.1007/s00500-015-1605-9
METHODOLOGIES AND APPLICATION
Surrogate modeling based on an adaptive network and granular computing

Israel Cruz-Vega · Hugo Jair Escalante · Carlos A. Reyes · Jesus A. Gonzalez · Alejandro Rosales
© Springer-Verlag Berlin Heidelberg 2015
Abstract Reducing the number of evaluations of expensive fitness functions is one of the main concerns in evolutionary algorithms, especially when working with instances of contemporary engineering problems. To deal with this efficiency constraint, surrogate-based methods construct approximate models that estimate the fitness of solutions by modeling the relationships between solution variables and their performance. This paper proposes a methodology based on granular computing for the construction of surrogate models for evolutionary algorithms. Under the proposed method, granules are associated with representative solutions of the problem under analysis. New solutions are evaluated with the expensive (original) fitness function only if they are not already covered by an existing granule. The parameters defining the granules are periodically adapted as the search goes on by a neuro-fuzzy network that not only reduces the number of fitness function evaluations, but also provides better convergence capabilities. The proposed method is evaluated on classical benchmark functions and on a recent benchmark created to test large-scale optimization models. Our results show that the proposed method considerably reduces the actual number of fitness function evaluations without significantly degrading the quality of solutions.

Keywords Surrogate modeling · Genetic algorithms · Neuro-fuzzy networks
Communicated by V. Loia.

I. Cruz-Vega (B) · H. J. Escalante · C. A. Reyes · J. A. Gonzalez · A. Rosales
Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, 72840 Puebla, Mexico
e-mail: [email protected]
1 Introduction

Nowadays, many objective functions used in engineering applications are generated through computer simulation. In many of these cases, the optimization of such objectives cannot be done in a straightforward way, i.e., by directly applying exact optimization methods to these functions, as this would be impractical or inefficient. One reason is that simulation-based objective functions are often analytically intractable (discontinuous, non-differentiable, and inherently noisy). Another case is when sensitive information is unavailable. However, the most important case relates to the high computational cost associated with measurements/simulations. Simulations require multiple evaluations of the objective function, which may last from several hours to days or even weeks; this is not uncommon in contemporary engineering applications, despite the increase in computing power. A feasible way to handle applications that depend on costly evaluation functions consists in using surrogate models. Here, the optimization of the original objective function is replaced by the iterative re-optimization and update of an analytically tractable and computationally inexpensive surrogate. Surrogate modeling concerns the construction of approximation models that estimate the performance of a system by developing relationships between the most representative structural variables of the system and its performance. A wide variety of surrogate modeling techniques has been reported in the literature, each with its own benefits and limitations (Jin 2005). Among the available methods, those based on granular computing have recently reported acceptable results (Akbarzadeh et al. 2008; Akbarzadeh-T et al. 2003; Cruz-Vega et al. 2014). Besides their acceptable performance, granular methods have the advantage of being easy to interpret because they are based on notions tied to the human perception system.
In this article, we introduce GA-FGNFN: a genetic algorithm based on fuzzy granulation and neuro-fuzzy networks. GA-FGNFN is a surrogate modeling approach for genetic algorithms¹ in which the reduction of the number of fitness function evaluations is achieved through a granulation technique that takes into account the similarity among individuals for the construction of granules. GA-FGNFN is inspired by a previously introduced method for surrogate modeling called genetic algorithm based on adaptive fuzzy fitness granulation (GA-AFFG) (Akbarzadeh et al. 2008; Cruz-Vega and Escalante 2015). In this paper, we propose an improvement to that approach that increases its convergence capabilities and reduces its fitness-evaluation cost; the extended model is called E-GA-AFFG (extended GA-AFFG). The E-GA-AFFG method is further enhanced with a neuro-fuzzy adaptive mechanism, yielding the GA-FGNFN approach. Under GA-FGNFN, the fitness values of the solutions of a genetic algorithm are either evaluated with the original (expensive) objective function or approximated with fuzzy granules (inexpensive). The proposed method takes into account the uncertainty and imprecision of solutions belonging to a specific granule; this is achieved by combining the effect of fuzzy logic with granulation. The fuzzy-granulation process provides knowledge about the structure of the optimization surface and is used in the process of building a neuro-fuzzy network that adapts the positions and sizes of the granules (i.e., the scope of the surrogates). The network adjusts these structural parameters, updating the surrogate every certain number of generations according to the most prominent optimization values, pushing the genetic algorithm towards a faster convergence to the optimum solution and significantly reducing the number of fitness function evaluations.

A preliminary version of this work was reported in Cruz-Vega et al. (2014); this article extends that work in several ways. First, we introduce a novel probabilistic decision rule that resembles the probabilistic rule of simulated annealing (Bertsimas et al. 1993). According to this rule, solutions that are very similar to a given surrogate can still be evaluated with the original fitness function with a certain probability. The aim of this mechanism is to allow updating the positions of granules/surrogates even when a similar solution has been found. This action is particularly encouraged at the end of the search process, when the population is converging to better-quality solutions. In this way, the goal is not only the reduction of fitness evaluations, but also an improvement of the convergence towards the optimum value.
¹ One should note that although we have chosen genetic algorithms for the implementation of the proposed method, the same approach can be implemented with other population-based search strategies (e.g., particle swarm optimization or differential evolution).
Second, we propose a set of additional improvements over the GA-AFFG method described in Akbarzadeh et al. (2008), resulting in our E-GA-AFFG method; the extended approach significantly outperforms the former method. Third, we perform a comprehensive evaluation of the proposed methods over a wide variety of optimization problems and show that the proposed improvements further increase the capabilities of the model proposed in Cruz-Vega et al. (2014).

The performance of the proposed methods is extensively evaluated on a wide variety of problems. First, a suite of traditional benchmark functions is considered (De Jong 1975; Digalakis and Margaritis 2002). Second, considering that an important application of global optimization techniques is the minimization of potential structures, the Morse clusters problem is approached (Morse 1929); this problem provides a challenging benchmark for global optimization due to its high dimensionality and the complexity of the task (Pintér 2006). Finally, the CEC-2013 benchmark for large-scale global optimization (Li et al. 2013) is considered to evaluate our E-GA-AFFG algorithm; the aim of using the latter benchmark is to assess the performance of our method over a wide range of real-world large-scale optimization problems. The experimental study shows the effectiveness of the proposed methods in terms of reduction of fitness function evaluations and convergence to the optimum.

The remainder of the paper is organized as follows. The next section reviews related work on granular computing for surrogate modeling. Section 3 describes the surrogate model based on fuzzy granules that is the basis of our proposal, together with E-GA-AFFG, our extension of the GA-AFFG method. In Sect. 4, the adaptive update of the surrogate fuzzy-granule parameters by means of a neuro-fuzzy structure is presented, yielding GA-FGNFN. Numerical results are presented in Sect. 5. Finally, Sect. 6 presents the conclusions of this work as well as future work directions.
2 Related work

A wide variety of surrogate modeling techniques has been reported in the literature; some of the most important are: (i) the Polynomial Response Surface Model (PRSM) (Myers and Montgomery 2003), (ii) Kriging (Sacks et al. 1989), (iii) Radial Basis Functions (RBFs) (Karakasis and Giannakoglou 2004), (iv) Extended Radial Basis Functions (Zhang et al. 2012), (v) Artificial Neural Networks (Farina 2002), and (vi) Support Vector Regression (Clarke et al. 2005). A concise overview of the history and recent developments of surrogate-assisted evolutionary computation is presented in Jin (2011). Among this variety of surrogate modeling
approaches, those based on granular computing (GrC) have shown quite competitive performance (Jin 2005). As noticed by Yao et al. (2013), GrC has emerged as one of the fastest growing information-processing paradigms in computational intelligence and human-centric systems. GrC can be defined as the study of a general theory of problem solving based on different levels of detail (Yao 2004); it is often regarded as an umbrella term covering theories, methodologies, techniques, and tools that make use of granules in problem solving (Yao 2005). A unified view and some general paradigms for the construction of granular models are presented in Pedrycz (2014). In this paper, we focus on the construction of granules by means of fuzzy set theory. The benefits of fuzzy set theory in GrC, as stated in Zadeh (1997), are: "to exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness, low-cost and better rapport with reality". Some related works on fuzzy granules are reviewed next.

In Do Wan Kim et al. (2006), a three-step methodology is proposed for the construction of fuzzy classifiers: (1) selection of information granules, (2) construction of the associated fuzzy sets, and (3) tuning of fuzzy rules. The construction of granules is carried out by a GA. Instead, we use fuzzy granulation to reduce the number of fitness function evaluations of a GA when targeting high-dimensional problems. In Castellano et al. (2003), another method to obtain fuzzy granules is presented. The method first granulates the data through a fuzzy clustering algorithm; the granules are represented with Gaussian functional forms and are used to build a fuzzy inference system, and the Gaussian function widths are determined by solving a constrained quadratic problem on the membership values returned by the clustering algorithm. In contrast, our approach combines the interpretability of fuzzy systems with the learning power of neural networks, which updates the granule parameters (position and width) every user-specified number of generations. In Aja-Fernández et al. (2004), granulation by means of fuzzy logic is used in the decision process of a fuzzy inference system; the method is used for classification, although the adaptation of the rules or granules is not addressed. Leite et al. (2011) approach the time-series prediction problem by means of fuzzy granules; the algorithm is compared against different predictors but, again, the adaptation of granules is not considered. In Roh et al. (2014), a fuzzy c-means (FCM) clustering algorithm is used to form information granules, and particle swarm optimization (PSO) is used to search for the optimal distribution of prototypes over the feature space. The methodology proposed there is similar to our approach in the sense of constructing granules and adapting them, but that process is used to obtain granular fuzzy classifiers (under the prototype-based classification formulation), whereas we focus on the optimization
of large-scale functions. Similar approaches are presented in Park et al. (2007) and Pedrycz and Song (2012), where the authors present a methodology to reduce the feature space via the design of fuzzy models as a granulation process, which inherently corresponds to a surrogate modeling procedure; interesting analyses and results are presented on the accuracy and interpretability of the granular fuzzy models employed. In Panoutsos et al. (2010), the importance of model interpretability in a real mechanical industrial process is highlighted; the extraction of knowledge is performed with a GrC algorithm based on fuzzy rules, and the adjustment of the model is realized by means of a neuro-fuzzy modeling structure.

Our proposal is based on a previously introduced method for surrogate modeling based on granular computing, described in Akbarzadeh et al. (2008), where Akbarzadeh-T et al. proposed the use of fitness granulation via an adaptive fuzzy similarity analysis to reduce the number of fitness evaluations. In this method, called GA-AFFG, a pool of fuzzy granules with Gaussian similarity measures is constructed, and the enlargement and shrinkage of the granules are adapted in each generation depending on the population fitness. Although this surrogate proved to be effective for reducing the number of fitness function evaluations, we noticed that under GA-AFFG the position of the granules does not change during the search process (a note on this limitation of the method is presented in Cruz-Vega and Escalante (2015)); hence, the convergence of the evolutionary algorithm is significantly affected. Our proposed approach, GA-FGNFN, not only reduces the fitness evaluations of similar individuals by means of granularity, but also incorporates a neuro-fuzzy network that learns the model determining how the granule parameters are modified, thus providing the granules with an adaptive component during the search process. Our extended version of the GA-AFFG algorithm, E-GA-AFFG, additionally incorporates a probabilistic rule implemented to improve its convergence capabilities.
3 Genetic algorithm based on adaptive fuzzy granulation and its extension

The reduction of the number of fitness evaluations in the proposed surrogate model is based on the use of fuzzy granules. Specifically, we consider a genetic algorithm in which each solution is evaluated either with the original (expensive) fitness function or with a fuzzy granule that is very similar to the incoming solution. As a result, only a subset of the solutions is evaluated with the fitness function, reducing the computational cost. A general flowchart of our proposed GA-FGNFN method is shown in Fig. 1. The rest of this section describes our proposed GA-FGNFN method in detail.
Fig. 1 Flowchart of the GA-FGNFN algorithm. An incoming solution that is similar to an existing granule (and whose acceptance probability holds) is related to that granule and inherits its fitness; otherwise, its fitness is evaluated exactly and a new granule is created. Every n generations (module(i, n) == 0) the granule parameters C, f(C), σ are updated by the adaptive part (a neuro-fuzzy network); the left dashed square illustrates this granule adaptation procedure
3.1 Granular computing

As previously mentioned, GrC is a powerful tool for solving difficult problems: by partitioning a problem into particles, complex tasks can be converted into a number of relatively simpler problems (Zadeh 1997). In this paper, we use GrC for surrogate modeling, where the general goal is to find
a simple, low-cost approximation that is satisfactory enough to replace the precise solution. In our case, we want to approximate expensive fitness functions for GA-based optimization. Herein, GrC is supported by fuzzy set theory, which is characterized by the use of membership functions μ_A(x) that take values in the interval [0, 1].
The membership function of individuals or solutions within a fuzzy set is a continuous function, with grades of membership in the range [0, 1]. Different fuzzy set descriptions can be employed to represent individuals in the universe of discourse, for example triangular, trapezoidal, or Gaussian. The choice of membership function is a subjective aspect of fuzzy logic, which allows the desired values to be appropriately interpreted. Because of their smoothness, concise notation, and dependence on only two parameters (center and width, which will be updated by the neural network), Gaussian functions are employed in this work as a measure of similarity between individuals and in the construction of the fuzzy granules.
3.2 Fuzzy granules as surrogate models

In this work, taking as inspiration the work in Akbarzadeh et al. (2008), granules (surrogate models) represent prototype solutions, in such a way that the genetic algorithm can either compute the real (expensive) fitness function or reuse the fitness value of an existing granule. Granules are generated as depicted in Fig. 1: the first solution is defined as a granule or surrogate, and forthcoming solutions are compared to the existing granules. Whenever a solution is similar enough to an existing granule, the fitness value of the surrogate is used for the incoming solution; otherwise, the actual function is used to determine the fitness of the solution and a new granule is generated. Each fuzzy granule is interpreted as a density function representing a group of individuals (solutions) with similar characteristics. The following Gaussian similarity measure is used to compare incoming solutions with existing granules:

$$\mu_k(x_j^i) = \exp\left(-\frac{(x_j^i - C_k)^2}{\sigma_k^2}\right) \qquad (1)$$

where C_k is the kth granule, for k = 1, 2, ..., l fuzzy granules, and x_j^i is the jth individual (solution) in the ith generation, for the initial population P_0 = {x_1^0, x_2^0, ..., x_t^0}. The radius σ_k of each granule controls the area of the degree of similarity between individuals, which also determines the decision boundary for the fitness evaluation of individuals, so that granules can shrink or enlarge. In this work, the mean value of the squared Euclidean distance between a granule and all solutions is used to determine its radius,

$$\sigma_k = \frac{1}{t}\sum_{j=1}^{t}\left(x_j^i - C_k\right)^2 \qquad (2)$$

where t is the number of individuals in the population. Initially, σ_k is large; then, as the algorithm evolves, the best individuals are evaluated in granules of reduced radius in a zone near the optimal solutions. Therefore, Eq. (2) adapts the width of the granules so that more exact fitness computations are performed around individuals that perform very well, and fewer around individuals showing poor performance. We would like to emphasize that Eq. (2) is an improvement over the original model proposed by Akbarzadeh et al. (2008) and is therefore an additional contribution of this work.²

² In Akbarzadeh et al. (2008), the radius of the Gaussian similarity function was proposed as

$$\sigma_k = \gamma \times \frac{1}{\exp(f(C_k))^{\beta}} \qquad (3)$$

that is, in inverse proportion to the exponential of the fitness value of the granule's center, relating fitness function values and similarity measures in the similarity function. Since similarity values and fitness values do not necessarily lie on the same scale, it makes no sense to use the definition of σ_k from Eq. (3); see Cruz-Vega and Escalante (2015) for details. Instead, in our work, σ_k is only related to measures on the same scale, that is, distances in the variable space, see Eq. (2). In this way, we avoid the construction of granules that could have undefined values.

As previously explained, the fitness of an individual x_j^i is either calculated by computing the exact fitness function f(x_j^i) or taken from one of the existing granules (f(C_k)). This decision is made according to:

$$f(x_j^i) = \begin{cases} f(C_k) & \text{if } \max_k\{\mu_k\} > \theta^i \text{ and } P(x_j^i) > \gamma \\ \text{computed by the fitness function} & \text{otherwise} \end{cases} \qquad (4)$$

where θ^i is a similarity threshold that has to be surpassed to use the granule's fitness value, P(x_j^i) is the probability that the surrogate model will be used to estimate the fitness of x_j^i, and γ is a random number uniformly distributed in [0, 1]. This decision rule is another improvement over the method originally proposed in Akbarzadeh et al. (2008), in which only the similarity threshold θ^i was considered. The underlying hypothesis of using the probabilistic function P(x_j^i) is that, even when a solution x_j^i is very similar to a granule C_k, it might be necessary to use the actual fitness function to evaluate x_j^i. Consider, for instance, the search process at the final generations: because of the nature of evolutionary algorithms, most solutions (even surrogates) will have converged to a local-minima zone. In such a situation, surrogates may cover zones of the search space that are currently unexplored. Hence, the probabilistic function aims at overcoming this undesired behaviour of the granular surrogate.

The probability function is similar to the one used in simulated annealing (Bertsimas et al. 1993). Because we want the probability function to allow solutions to be evaluated with surrogates at the early stages (when surrogates are dispersed along the search space) and with the actual function at the final steps (when most solutions converge to a reduced area of the search space), the probability function is controlled by a non-increasing function T : Q → (0, ∞), called the cooling schedule, where Q is the set of positive integers and T(i) is the temperature at the ith generation change. P varies within the set of positive real numbers in the range [P_max, P_min]. For each individual x_j^i, at every generation, before deciding whether it is evaluated by the actual fitness function, it is evaluated with the similarity measure to decide whether it will be included (and with which degree of membership) in one of the existing granules. The probability function is defined as

$$P_j = \exp\left(\frac{-1}{T_j}\right) \qquad (5)$$

Note that the probability P is inversely proportional to the temperature T(i) at each ith change of generation, T = −1.0/log(P), which defines the initial and final temperatures T_ini and T_F, respectively. At every generation, the temperature is changed by a fractional reduction rate defined by α = (T_ini/T_F)^(1.0/(N−1.0)), where N is the total number of generations. As generations advance in the genetic algorithm, this probability gets smaller, so the probability of accepting an individual within an existing granule decreases. As mentioned before, within the last generations of the algorithm the convergence zone is close to the optimum zone; therefore, actual fitness evaluations and genetic operations are more frequently allowed to act on these individuals. Hence, another contribution with respect to Akbarzadeh et al. (2008) and Cruz-Vega et al. (2014) is the use of the probability function P(x_j^i), which is implemented in E-GA-AFFG, the enhanced version of the GA-AFFG approach (Akbarzadeh et al. 2008).

Finally, we describe how the similarity threshold θ^i is defined. We set up this threshold according to the similarity measure between individuals (1),

$$\theta^i = \min_k\left(\mu_k(x_j^i)\right) \qquad (6)$$
This change of threshold with respect to our previous work (Cruz-Vega et al. 2014) is due to the fact that in the first generations individuals are dispersed over the large space of solutions. The algorithm then has to be more permissive with the inclusion of individuals within the granules (a good convergence zone has not been found yet), and unnecessary fitness function evaluations should be omitted, so the threshold θ is small. After some generations have passed and the convergence zone is presumably near the optimum value, θ is preferred to be large and closer to its maximum value of similarity estimation. This allows higher selectivity for granule associability and a higher threshold for the granules. This action allows the
algorithm to perform real fitness evaluations on the individuals near the optimum zone, which have the highest chances of converging to the optimum. We have described in this section the granulation basis of the GA-FGNFN method and the changes we propose to improve the original approach (E-GA-AFFG). The following section describes how E-GA-AFFG is further enhanced by equipping it with a neuro-fuzzy network that adapts the position of the granules throughout the search process. In Sect. 5 we compare all of the variants of granular surrogates.
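Before moving to the adaptive part, the decision rule above can be made concrete with a short sketch. The following Python fragment is our own illustrative rendering, not the authors' reference implementation (the experiments in Sect. 5 use Matlab's GA toolbox); in particular, the temperature endpoints T_ini = 10 and T_F = 0.1 are assumed values. It combines the similarity measure (1), the radius rule (2), the decision rule (4), the cooling-schedule probability (5), and the threshold (6):

```python
import math
import random

import numpy as np


def gaussian_similarity(x, center, sigma):
    """Eq. (1): Gaussian similarity between solution x and a granule center."""
    return math.exp(-float(np.sum((x - center) ** 2)) / (sigma ** 2 + 1e-12))


def granule_radius(center, population):
    """Eq. (2): mean squared distance between a granule center and all solutions."""
    return float(np.mean([np.sum((p - center) ** 2) for p in population]))


def surrogate_probability(gen, n_gens, t_ini=10.0, t_final=0.1):
    """Eq. (5): P = exp(-1/T) under a geometrically cooled temperature.

    t_ini and t_final are assumed endpoints; the fractional reduction rate
    follows alpha = (T_ini / T_F)**(1/(N - 1)) from the text.
    """
    alpha = (t_final / t_ini) ** (1.0 / (n_gens - 1.0))
    return math.exp(-1.0 / (t_ini * alpha ** gen))


def evaluate(x, granules, population, fitness_fn, gen, n_gens):
    """Eq. (4): reuse a granule's fitness, or evaluate exactly and granulate."""
    if granules:
        sims = [gaussian_similarity(x, g["C"], g["sigma"]) for g in granules]
        theta = min(sims)  # Eq. (6): similarity threshold
        if max(sims) > theta and surrogate_probability(gen, n_gens) > random.random():
            return granules[int(np.argmax(sims))]["f"]  # inexpensive estimate
    fx = fitness_fn(x)  # expensive exact evaluation
    granules.append({"C": np.array(x, dtype=float),
                     "sigma": granule_radius(x, population),
                     "f": fx})  # the solution becomes a new granule
    return fx
```

Early in the run the temperature is high, so P ≈ exp(−1/T_ini) is close to one and similar solutions inherit granule fitness values; near the end, P collapses towards zero and exact evaluations dominate, which is precisely the behaviour motivated above.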
4 Neuro-fuzzy: adaptive part

The process of GrC described above suggests a reduction of the computational cost. This is achieved by means of the adaptive component added to obtain the similarity threshold between individuals, and also by changing the radius of the granules, even though their positions do not change during the search process. Therefore, the properties of mutation and crossover, which let individuals in each generation jump towards optimal values, would remain enclosed in granules established from the very first generations. We propose the use of a neuro-fuzzy network that adapts the free parameters of the granules: centers, radii, and fitness values. The adaptive step that updates the granule parameters is performed every n generations, and not at each generation, because doing so at every generation would increase the computational time as well as the number of fitness evaluations. From the last section, we have the granule values G = {C_k, σ_k, f(C_k)}, where C_k is the m-dimensional vector of centers, σ_k is the width of the membership functions of the kth fuzzy granule, and f(C_k) are the fitness values of the granules. The fuzzy system that we design has the following form:

$$f(x) = \frac{\sum_{k=1}^{M} \bar{y}_k \prod_{i=1}^{n} \exp\left(-(x_j^i - C_k)^2/\sigma_k^2\right)}{\sum_{l=1}^{M} \prod_{i=1}^{n} \exp\left(-(x_j^i - C_l)^2/\sigma_l^2\right)} \qquad (7)$$

where μ_k(x_j^i) = exp(−(x_j^i − C_k)²/σ_k²) represents the fuzzy sets or fuzzy granules, ȳ_k is the fitness value of the respective granule (f(C_k)), M is fixed and indicates the number of fuzzy rules (fuzzy granules), and ȳ_k, C_k, and σ_k are the free parameters to be updated. To determine these parameters in some optimal fashion, it is helpful to represent the fuzzy system of granules (7) as a feedforward network. Specifically, the mapping from an input individual x ∈ U ⊂ R^n to the output f(x), its estimated fitness value, can be implemented according to the following operations: first, the input x (individual) is passed through a product Gaussian operator (the granule's structure) to become z_k = Π_{i=1}^{n} exp(−(x_j^i − C_k)²/σ_k²); then, the z_k are passed through a summation operator and a weighted summation operator (current fitness value of the granule) to obtain b = Σ_{k=1}^{M} z_k and a = Σ_{k=1}^{M} ȳ_k z_k; finally, the output of the fuzzy system is computed as f(x) = a/b. This three-stage operation is shown in Fig. 2 as a three-layer feedforward network.

Fig. 2 Three-layer feedforward network as a processing unit of fuzzy granules (Layer 1: fuzzy granules; Layer 2: summation operators; Layer 3: output f(x) = a/b)

4.1 Parameter update

It is now time to update the parameters of the fuzzy granular system (7) such that the matching error

$$e_k = \frac{1}{2}\left[f_k(x) - y_k\right]^2 \qquad (8)$$

is minimized. That is, the task is to determine the parameters ȳ_k, C_k, and σ_k such that e_k of (8) is minimized. In the sequel, f will be used to denote f_k(x). A gradient descent algorithm is used to find these parameters. Specifically, the following rule is used to determine ȳ_k:

$$\bar{y}_k(q+1) = \bar{y}_k(q) - \alpha \left.\frac{\partial e}{\partial \bar{y}_k}\right|_q \qquad (9)$$

where k = 1, ..., M, q = 0, 1, 2, ..., and α is a constant step size. If ȳ_k(q) converges as q goes to infinity, then from (9) we have ∂e/∂ȳ_k = 0 at the converged ȳ_k, which means that the converged ȳ_k is a local minimum of e_k. From Fig. 2 it is seen that f (and hence e) depends on ȳ_k only through a, where a = Σ_{k=1}^{M} ȳ_k z_k, b = Σ_{k=1}^{M} z_k, f = a/b, and z_k = Π_{i=1}^{n} exp(−(x_j^i − C_k)²/σ_k²); hence, using the chain rule, we have

$$\frac{\partial e}{\partial \bar{y}_k} = (f - y)\frac{\partial f}{\partial a}\frac{\partial a}{\partial \bar{y}_k} = (f - y)\frac{1}{b}\,z_k \qquad (10)$$

Substituting (10) into (9), we obtain the training algorithm for ȳ_k:

$$\bar{y}_k(q+1) = \bar{y}_k(q) - \alpha\,\frac{f - y}{b}\,z_k \qquad (11)$$

where k = 1, 2, ..., M, and q = 0, 1, 2, .... To determine C_k, we use

$$C_k(q+1) = C_k(q) - \alpha \left.\frac{\partial e}{\partial C_k}\right|_q \qquad (12)$$

We see from Fig. 2 that f (and hence e) depends on C_k only through z_k; hence, using the chain rule, we have

$$\frac{\partial e}{\partial C_k} = (f - y)\frac{\partial f}{\partial z_k}\frac{\partial z_k}{\partial C_k} = (f - y)\frac{\bar{y}_k - f}{b}\,z_k\,\frac{2(x_j^i - C_k)}{\sigma_k^2} \qquad (13)$$

Substituting (13) into (12), we obtain the training algorithm for C_k:

$$C_k(q+1) = C_k(q) - \alpha\,\frac{f - y}{b}\,(\bar{y}_k(q) - f)\,z_k\,\frac{2(x_j^i - C_k(q))}{\sigma_k^2(q)} \qquad (14)$$

Using the same procedure, we obtain the training algorithm for σ_k:

$$\sigma_k(q+1) = \sigma_k(q) - \alpha \left.\frac{\partial e}{\partial \sigma_k}\right|_q \qquad (15)$$

$$\sigma_k(q+1) = \sigma_k(q) - \alpha\,\frac{f - y}{b}\,(\bar{y}_k(q) - f)\,z_k\,\frac{2(x_j^i - C_k(q))^2}{\sigma_k^3(q)} \qquad (16)$$

The training algorithms (11), (14), and (16) perform an error back-propagation procedure. This updates the parameters of the granules (center positions, radii, and fitness values), providing a fast convergence rate as well as a lower computational cost for the genetic algorithm process.
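The sketch below renders the forward pass of Eq. (7) and one update over Eqs. (11), (14), and (16) in Python. It is a minimal illustration rather than the authors' implementation: the single scalar width per granule mirrors Eq. (1), while the step size, the small guard added to b, and the toy targets of the usage fragment are assumptions of ours.

```python
import numpy as np


def forward(x, centers, sigmas, y_bar):
    """Forward pass of the three-layer network of Fig. 2, i.e., Eq. (7)."""
    d2 = np.sum((x - centers) ** 2, axis=1)  # squared distances to the centers C_k
    z = np.exp(-d2 / sigmas ** 2)            # layer 1: Gaussian firing strengths z_k
    b = z.sum() + 1e-12                      # layer 2: summation operator
    a = float(np.dot(y_bar, z))              # layer 2: weighted summation operator
    return a / b, z, b, d2                   # layer 3: f(x) = a / b


def gradient_step(x, y, centers, sigmas, y_bar, alpha=0.1):
    """One error back-propagation step over Eqs. (11), (14), and (16)."""
    f, z, b, d2 = forward(x, centers, sigmas, y_bar)
    err = (f - y) / b              # common factor (f - y)/b
    common = err * (y_bar - f) * z  # (f - y)/b * (y_bar_k - f) * z_k
    y_bar_new = y_bar - alpha * err * z                                       # Eq. (11)
    centers_new = centers - alpha * (2.0 * common / sigmas ** 2)[:, None] * (x - centers)  # Eq. (14)
    sigmas_new = sigmas - alpha * 2.0 * common * d2 / sigmas ** 3             # Eq. (16)
    return centers_new, sigmas_new, y_bar_new


# Usage sketch: adapt three granules in a 2-D space towards toy fitness targets.
rng = np.random.default_rng(0)
centers, sigmas, y_bar = rng.normal(size=(3, 2)), np.ones(3), np.zeros(3)
train_x = rng.normal(size=(8, 2))
train_y = np.array([float(np.sum(p ** 2)) for p in train_x])  # invented targets
for _ in range(200):
    for xi, yi in zip(train_x, train_y):
        centers, sigmas, y_bar = gradient_step(xi, yi, centers, sigmas, y_bar, alpha=0.05)
```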
4.2 Setting up the training data set

The data set used in the training phase of the neuro-fuzzy network is obtained from the granule information of the most recent generations. Out of the total number of generations N, after a number of generations n (experimentally determined) has passed, the values of the centers, widths, and fitness values of each granule, G = {C_k, σ_k, f(C_k)}, are taken as the initial parameters of the network; these parameters are then updated as the weights of the network through the training algorithm. After these n generations, in each of which the genetic operations (selection, mutation, and reproduction) have been performed, the proposed algorithm automatically selects the best-fitted values of a certain percentage of the recent generations of granules, from which the training data set is established. This selection comprises around 50 % or less of the total fitness values of the granules and
is used as the target value of the network. Selecting a certain percentage of the best fitness values is related to the "natural selection process" of individuals in GAs and is performed to avoid falling into a local optimum.
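As a minimal sketch of this selection step (the granule record layout and the assumption of a minimization problem are our own illustrative choices):

```python
import numpy as np


def build_training_set(granules, keep_ratio=0.5):
    """Select the best-fitted granules of the recent generations (Sect. 4.2).

    granules: list of records {"C": center, "sigma": width, "f": fitness}.
    keep_ratio follows the "around 50 % or less" rule; minimization is
    assumed, so the granules with the lowest fitness values are retained.
    """
    ranked = sorted(granules, key=lambda g: g["f"])
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    train_x = np.array([g["C"] for g in kept])  # network inputs: granule centers
    train_y = np.array([g["f"] for g in kept])  # targets: their fitness values
    return train_x, train_y
```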
4.3 Algorithm

The general steps of the E-GA-AFFG algorithm and of the gradient-descent update of the granules are described in the following pseudo-codes.

Algorithm 1: E-GA-AFFG

    Data: P_0 = {x_1^0, x_2^0, ..., x_t^0}
    Result: best fitness value f*(x_j^i), number of evaluations
    initialization: granule C_0 = x_1^0; i = 0
    while not stop condition do
        forall x_j^i in P_i do
            μ_k(x_j^i) = exp(−(x_j^i − C_k)² / σ_k²)
            θ^i = min μ_k(x_j^i)
            if max{μ_k} > θ^i and P(x_j^i) > random then
                f(x_j^i) = f(C_k)
            else
                f(x_j^i) = computed by the real fitness function
                C_k = x_j^i, a new granule is created
            end
        end
        P_C = Crossover(C_k); P_M = Mutation(C_k); P_R = Reproduction(C_k)
        P_{i+1} = buildNextGenerationFrom(P_C, P_M, P_R, P_i)
        if module(i, n) == 0 then
            update the granule parameters by gradient descent (Algorithm 2)
        else
            G(C_k, σ_k, f(C_k)) = G(C_k, σ_k, f(C_k))
        end
        i = i + 1
    end

Algorithm 2: Granule update

    Data: G(C_k, σ_k, f(C_k)), k = 1, 2, ..., M (number of fuzzy granules)
    Result: G(C_k*, σ_k*, f*(C_k)), the granule's updated values
    if module(i, n) == 0 then
        f*(C_k) = ȳ_k − α ((f − y)/b) z_k
        C_k* = C_k − α ((f − y)/b) (ȳ_k − f) z_k 2(x_j^i − C_k)/σ_k²
        σ_k* = σ_k − α ((f − y)/b) (ȳ_k − f) z_k 2(x_j^i − C_k)²/σ_k³
    else
        G(C_k, σ_k, f(C_k)) = G(C_k, σ_k, f(C_k))
    end

5 Numerical results

This section reports experimental results that aim at showing the effectiveness of the proposed methods. First, we describe the experimental settings and then the results obtained on three types of problems.

5.1 Experimental settings

For the experimental study, we considered a set of optimization tasks that comprises problems with the following characteristics: nonlinearity, unimodality-multimodality, and low-high dimensionality. First, traditional benchmark functions are considered (De Jong 1975; Digalakis and Margaritis 2002); these functions are selected to make our results comparable to previous studies that have used these classic problems. Next, we evaluate the performance of the proposed methods over several instances of a tough global optimization problem: the minimization of potential-energy structures with Morse clusters (Morse 1929). Finally, we also evaluate our methods on the benchmark of the special session on large-scale global optimization of the 2013 IEEE Congress on Evolutionary Computation (Li et al. 2013); the aim of using this benchmark is to assess the effectiveness of the proposed methods over a wide range of real-world large-scale optimization problems. The suite of problems comprises a wide variety of objective functions that allow us to effectively assess the performance of the considered methods.

For each problem of the different benchmarks, we report the average performance over five independent runs: the average number of fitness function evaluations, the percentage of optimal solutions reached, and the best fitness value. We use hypothesis testing to provide statistical support to the analysis of the results; specifically, we use a non-parametric test, because the conditions that guarantee the reliability of parametric tests may not be satisfied, which would undermine the credibility of a parametric analysis (Derrac et al. 2011). We report the p value obtained from the Wilcoxon signed-rank test, a non-parametric statistical procedure for performing pairwise comparisons between two algorithms (a usage sketch is given at the end of this subsection).

Simulations are compared among an ordinary genetic algorithm³ (GA); the genetic algorithm with adaptive fuzzy fitness granulation proposed in this paper (E-GA-AFFG), see Sect. 3; the genetic algorithm with adaptive fuzzy fitness granulation (GA-AFFG) proposed in Akbarzadeh et al. (2008); and the proposed GA-FGNFN algorithm.

³ We considered the GA implementation of the global optimization toolbox of Matlab (Goldberg and Holland 1988).
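As an illustration of this statistical procedure, the fragment below runs the two-sided Wilcoxon signed-rank test with SciPy (assumed available); the two arrays are invented toy values standing in for the paired best-fitness results of five runs:

```python
import numpy as np
from scipy.stats import wilcoxon

# Paired best-fitness values of two algorithms over five independent runs
# (toy numbers, not results from the paper).
ga = np.array([0.112, 0.098, 0.120, 0.105, 0.110])
e_ga_affg = np.array([0.0004, 0.0011, 0.0003, 0.0009, 0.0005])

# With only five paired samples, the smallest attainable two-sided p value
# of the exact signed-rank test is 2/2**5 = 0.0625, which is why that value
# recurs throughout Table 2.
stat, p_value = wilcoxon(ga, e_ga_affg)
print(f"W = {stat}, p = {p_value:.4f}")
```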
The same parameter settings were used for all of the methods in each of the approached problems. Thus, unless otherwise stated, the different methods were run with random initial populations, decimal-coded chromosomes, single-point crossover, mutation, fitness scaling, and an elitist stochastic universal sampling selection strategy. The evolutionary parameters were set as follows: crossover rate 1, mutation rate 0.01, population size 20, and number of generations 100. The learning rate parameter of the network for GA-FGNFN was fixed to 0.1.

5.2 Traditional benchmark functions

The set of traditional benchmark functions we considered is described in Table 1.

Table 1 Traditional benchmark functions

| Function | Formulation | Dimensions, range |
| F1: De Jong | $\sum_{i=1}^{m} x_i^2$ | m = 3, −5.12 ≤ x_i ≤ 5.12 |
| F2 | $\sum_{i=1}^{m} x_i^4 + \mathrm{Gauss}(0, 1)$ | m = 30, −1.28 ≤ x_i ≤ 1.28 |
| F3: Michalewicz | $-\sum_{i=1}^{m} \sin(x_i)\,\sin^{2n}\!\left(i\,x_i^2/\pi\right)$ | n = 10, m = 10, 0 ≤ x_i ≤ π |
| F4: Rastrigin | $10m + \sum_{i=1}^{m}\left(x_i^2 - 10\cos(2\pi x_i)\right)$ | m = 20, −5.12 ≤ x_i ≤ 5.12 |
| F5: Schwefel | $\sum_{i=1}^{m} -x_i \sin\!\left(\sqrt{|x_i|}\right)$ | m = 20, −500 ≤ x_i ≤ 500 |
| F6: Griewangk | $\sum_{i=1}^{m} \frac{x_i^2}{4000} - \prod_{i=1}^{m} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1$ | m = 20, −600 ≤ x_i ≤ 600 |

Average results of the simulations for the different methods are presented in Table 2. From this table, it can be seen that, in terms of reduction of real fitness-function evaluations (No. FFE), the proposed methods (i.e., E-GA-AFFG and GA-FGNFN) obtained better performance than GA and GA-AFFG for most of the considered functions. Both methods decreased the number of evaluations of GA by about 80 %, whereas, compared to GA-AFFG, the reduction was around 40 %. These are important reductions that significantly save computational resources without considerably degrading the quality of the solutions found. In fact, the p values for each of the benchmark functions show that there were no statistically significant differences between the performance of GA and the other methods in terms of best fitness value. On the other hand, it is clear that E-GA-AFFG significantly outperformed GA-AFFG (in terms of both No. FFE and best fitness value), evidencing the validity of the extensions proposed herein. One should note that we have optimized the parameters of GA-AFFG to have a strong baseline; the straightforward implementation from Akbarzadeh et al. (2008) resulted in worse performance.

Comparing the performances of the two proposed strategies, it is somewhat surprising that E-GA-AFFG and GA-FGNFN showed very similar performance (recall that GA-FGNFN incorporates an additional adaptation mechanism); in fact, slightly better results were obtained by E-GA-AFFG. Hence, we can say that the extensions made to the original GA-AFFG resulted in a very competitive method that does not benefit significantly from the fuzzy network on these problems. It is nevertheless worth studying the performance of GA-FGNFN when varying the frequency with which the neural network is applied. Table 3 shows the performance of GA-FGNFN when varying the number of generations after which the network is applied; the results were obtained by running the GA-FGNFN algorithm every n = {5, 10, 20, 30, 50} generations. It can be seen that the results are mixed for most functions (except F3 and F4): an acceptable performance was obtained with n = 10, whereas, for the rest, better results were obtained by applying the network less frequently. In general, applying the network too rarely (e.g., n = 50) or too frequently (e.g., n = 5) is not convenient for obtaining a good tradeoff between quality of solutions and reduction of FFE. Larger values of n are actually preferred because each time the network is applied, the computational time of GA-FGNFN increases. We conclude that fixing this parameter may be crucial for obtaining positive results with GA-FGNFN.

5.3 Morse clusters

An important application of global optimization techniques is the minimization of potential-energy structures, which is relevant in the study of proteins and nanomaterials (Pintér 2006). The Morse potential is an adequate model for several atomic clusters and provides a challenging benchmark for global optimization algorithms (Morse 1929; Pintér 2006). The model consists of pairwise atomic interactions,

$$V_{ij} = e^{2\rho(1 - r_{ij})} - 2e^{\rho(1 - r_{ij})} \qquad (17)$$

where r_{ij} is the inter-atomic Euclidean distance and ρ is a parameter that represents the equilibrium pair separation. The associated optimization problem consists of minimizing the potential energy of the N-atom cluster,

$$V = \sum_{i<j} V_{ij} \qquad (18)$$

The minimum-energy configurations are of fundamental importance in addressing the chemical and physical properties of a given system.
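The fitness function of this problem can be transcribed directly from Eqs. (17)-(18); in the sketch below, ρ = 6 is an assumed value (a common setting in Morse-cluster benchmarks), since the section does not restate the parameter used:

```python
from itertools import combinations

import numpy as np


def morse_energy(coords, rho=6.0):
    """Eqs. (17)-(18): total Morse potential energy of an N-atom cluster."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = np.linalg.norm(coords[i] - coords[j])  # inter-atomic distance r_ij
        energy += np.exp(2.0 * rho * (1.0 - r)) - 2.0 * np.exp(rho * (1.0 - r))
    return energy


# A GA individual encodes the flattened coordinates of the cluster, so the
# expensive fitness function for an N-atom cluster has dimensionality 3N.
x = np.random.default_rng(1).uniform(0.0, 2.0, size=(5, 3))
print(morse_energy(x))
```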
Table 2 Comparison on the benchmark functions of a conventional Matlab-GA algorithm, E-GA-AFFG, GA-FGNFN, and GA-AFFG (Akbarzadeh et al. 2008)

| Fnc. | Values | GA | E-GA-AFFG | GA-FGNFN | GA-AFFG (Akbarzadeh et al. 2008) |
| F1 | No. FFE | 2,000 | 190 | 205.6 | 268 |
|    | % Opt. Sol. | 80 | 80 | 80 | 80 |
|    | p value | | 0.0625 | 0.0625 | 0.0625 |
|    | Learning Par. | | β = 0.1, γ = 1 | β = 0.1, γ = 1, n = 20, n_i = 100, μ = 0.1 | β = 0.1, γ = 1 |
|    | Best-fitness | 0.10908 | 0.00036086 | 0.1092 | 0.0396086 |
| F2 | No. FFE | 2,000 | 199 | 408 | 730.6 |
|    | % Opt. Sol. | 80 | 80 | 80 | 80 |
|    | p value | | 1.0 | 0.4375 | 0.0625 |
|    | Learning Par. | | β = 0.04, γ = 1.7 | β = 0.001, γ = 2, n = 50, n_i = 100, μ = 0.1 | β = 0.04, γ = 1.7 |
|    | Best-fitness | 0.8873 | 0.9433 | 0.82914 | 1.4667 |
| F3 | No. FFE | 2,000 | 181 | 202 | 209 |
|    | % Opt. Sol. | 80 | 80 | 80 | 80 |
|    | p value | | 0.0625 | 0.0625 | 0.0625 |
|    | Learning Par. | | β = 0.04, γ = 1.85 | β = 0.04, γ = 1.85, n = 50, n_i = 5, μ = 0.1 | β = 0.04, γ = 1.85 |
|    | Best-fitness | −4.3514 | −3.2009 | −3.1562 | −2.08 |
| F4 | No. FFE | 2,000 | 191 | 205.6 | 842 |
|    | % Opt. Sol. | 60 | 60 | 60 | 60 |
|    | p value | | 0.0625 | 0.0625 | 0.0625 |
|    | Learning Par. | | β = 0.004, γ = 1.15 | β = 0.004, γ = 120, n = 50, n_i = 100, μ = 0.9 | β = 0.004, γ = 1.15 |
|    | Best-fitness | 103.8504 | 136.35 | 127.28 | 109.9558 |
| F5 | No. FFE | 2,000 | 184.4 | 181.4 | |
|    | % Opt. Sol. | 60 | 60 | 60 | |
|    | p value | | 0.0625 | 0.0625 | |
|    | Learning Par. | | β = 0.0008, γ = 300 | β = 0.0008, γ = 300, n = 50, n_i = 100, μ = 0.1 | β = 0.0008, γ = 300 |
|    | Best-fitness | −3,435.37 | −1,729.9 | −2,409 | −3,265.49 |
| F6 | No. FFE | 2,000 | 186 | 182.2 | 863.53 |
|    | % Opt. Sol. | 80 | 80 | 80 | 80 |
|    | p value | | 0.0625 | 0.125 | 0.0625 |
|    | Learning Par. | | β = 0.00012, γ = 190 | β = 0.00012, γ = 190, n = 20, n_i = 100, μ = 0.1 | β = 0.00012, γ = 190 |
|    | Best-fitness | 80.8125 | 109.68 | 88.72 | 80.4752 |
This is a very complex optimization problem in which computing the fitness function (see Eq. (17)) is computationally expensive; thus, it is appropriate for evaluating the performance of our methods. One should note that this problem has been previously approached with other stochastic optimization methods (Pintér 2006; Velasco et al. 2014; Roberts et al. 2000).

We evaluated the performance of the considered methods on this complex problem. All of the methods were run according to the experimental settings explained in Sect. 5.1, except for some parameters of the GA that were modified following previous results on using a genetic algorithm for this problem (Velasco et al. 2014). The modified parameters were: crossover rate 0.8, mutation rate 0.2, population size 100, and number of generations 5,000. The results of this experiment are presented in Tables 4 and 5 for different values of N, the dimension of the problem. For each dimension, we show the average performance of the considered methods and, for some dimensions, the known global optimum. Tables 4 and 5 show the same pattern as in the previous section: the proposed methods E-GA-AFFG and GA-FGNFN considerably reduce the number of fitness evaluations of both GA and GA-AFFG without significantly compromising the quality of the solutions. In small dimensions (Table 4), the reduction of FFE of E-GA-AFFG and GA-FGNFN over GA is huge and, in spite of this, the surrogates can obtain solutions close to the global optimum. With respect to GA-AFFG, the proposed methods still obtain considerable improvements (close to 40 %). On the other hand, for large dimensions (Table 5), the proposed surrogates still considerably reduce the number of FFE, although this time the reduction is smaller. Also, it can be seen that the quality of the solutions slightly diminishes as the dimensionality increases, which is due to the difficulty of the problem. In any case, the surrogates offer a good tradeoff between the quality of the solution and the computational-resource requirements.
Table 3 Different numbers of generations n at which the neural network in GA-FGNFN was applied: number of fitness function evaluations, percentage of optimal solutions, and best-fitness value

| Fnc. | Values | n = 5 | n = 10 | n = 20 | n = 30 | n = 50 | GA |
| F1: De Jong | No. FFE | 1,740 | 259 | 260 | 220 | 295 | 2,000 |
|  | % Opt. Sol. | 60 | 80 | 80 | 80 | 80 | 80 |
|  | Best-fitness | 1.654 | 0.1092 | 0.114 | 1.1130 | 1.6506 | 0.10908 |
| F2 | No. FFE | 2,000 | 460.8 | 569.2 | 455.8 | 440 | 2,000 |
|  | % Opt. Sol. | 80 | 40 | 40 | 80 | 80 | 80 |
|  | Best-fitness | 1.1706 | 1.0795 | 2.2572 | 1.9965 | 1.0182 | 0.8873 |
| F3: Michalewicz | No. FFE | 119.2 | 147.2 | 173 | 282 | 381.6 | 2,000 |
|  | % Opt. Sol. | 60 | 60 | 60 | 40 | 80 | 80 |
|  | Best-fitness | −2.8510 | −2.77 | −2.49 | −2.6287 | −2.967 | −4.3514 |
| F4: Rastrigin | No. FFE | 457 | 412 | 370 | 365 | 332 | 2,000 |
|  | % Opt. Sol. | 40 | 60 | 60 | 60 | 60 | 80 |
|  | Best-fitness | 128.3852 | 116.9746 | 157.8378 | 169.5834 | 109.9558 | 103.8504 |
| F5: Schwefel | No. FFE | 288 | 221 | 321 | 367 | 252.8 | 2,000 |
|  | % Opt. Sol. | 40 | 40 | 40 | 40 | 80 | 80 |
|  | Best-fitness | −4,786.24 | −4,293.34 | −3,987.31 | −3,897.3 | −3,394.5 | −3,435.37 |
| F6: Griewangk | No. FFE | 414 | 412 | 884 | 1,055 | 964 | 2,000 |
|  | % Opt. Sol. | 60 | 80 | 60 | 60 | 60 | 80 |
|  | Best-fitness | 111.3852 | 80.4752 | 86.3752 | 105.2752 | 143.9852 | |

The last column has the values for the GA implementation
5.4 CEC 2013 benchmark functions

In this section, we evaluate the proposed methods on a recent benchmark of functions for large-scale optimization (Li et al. 2013). The benchmark consists of 15 functions, each with dimension 1,000. The aim of using this benchmark is to assess the benefits of our methods when facing a wide range of real-world large-scale optimization problems. Four major categories of large-scale problems are included in the benchmark (the high-level design of these four categories is explained in Li et al. (2013)):

1. Fully separable functions (F1–F3);
2. Two types of partially separable functions:
   (a) with a set of non-separable subcomponents and one fully separable subcomponent (F4–F7);
   (b) with only a set of non-separable subcomponents and no fully separable subcomponent (F8–F11);
3. Functions with overlapping subcomponents: the subcomponents have some degree of overlap with their neighbouring subcomponents. There are two types of overlapping functions:
   (a) with conforming subcomponents: the decision variables shared by two subcomponents have the same optimum value with respect to both subcomponent functions (F13);
   (b) with conflicting subcomponents: the shared decision variables have a different optimum value with respect to each of the subcomponent functions (F14);
4. Fully non-separable functions (F15).

The base functions used to form the separable/non-separable subcomponents were the Sphere, Elliptic, Rastrigin, Ackley, Schwefel, and Rosenbrock functions. Based on the four major categories and the six base functions above, the following 15 large-scale functions are proposed in Li et al. (2013):

1. Fully separable functions: (a) F1: Elliptic function; (b) F2: Rastrigin function; (c) F3: Ackley function.
2. Partially additively separable functions. Functions with a separable subcomponent: (a) F4: Elliptic function; (b) F5: Rastrigin function; (c) F6: Ackley function; (d) F7: Schwefel's problem. Functions with no separable subcomponents: (a) F8: Elliptic function; (b) F9: Rastrigin function; (c) F10: Ackley function; (d) F11: Schwefel's problem.
3. Overlapping functions: (a) F12: Rosenbrock's function; (b) F13: Schwefel's function with conforming overlapping subcomponents; (c) F14: Schwefel's function with conflicting overlapping subcomponents.
4. Non-separable functions: (a) F15: Schwefel's problem.
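As a flavour of the base functions involved, the sketch below implements the standard high-conditioned Elliptic function. Note that this is only the base definition: the official CEC-2013 suite additionally applies shifts, rotations, and variable grouping on top of such base functions (Li et al. 2013).

```python
import numpy as np


def elliptic(x):
    """Standard Elliptic base function: f(x) = sum_i 10^(6(i-1)/(D-1)) * x_i^2."""
    x = np.asarray(x, dtype=float)
    d = x.size
    weights = 10.0 ** (6.0 * np.arange(d) / (d - 1))  # condition number of 10^6
    return float(np.sum(weights * x ** 2))


print(elliptic(np.ones(1000)))  # one evaluation at the benchmark dimensionality
```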
Table 4 Comparison of simulations for Morse clusters with GA, GA-AFFG (Akbarzadeh et al. 2008), E-GA-AFFG, and GA-FGNFN for dimensions 5–30

| N | Method | f(x̂) | SD | No. FFE | BFV | f(x*) |
| 5 | GA | −9.04 | 1.94e−05 | 500,000 | −9.04 | −9.04 |
|   | GA-AFFG | −9.01 | 0.136 | 32,496 | −9.04 | |
|   | E-GA-AFFG | −9.04 | 0.009 | 16,241 | −9.04 | |
|   | GA-FGNFN | −9.04 | 0.054 | 24,115 | −9.04 | |
| 10 | GA | −25.51 | 1.13 | 500,000 | −26.03 | −27.47 |
|    | GA-AFFG | −21.43 | 1.492 | 30,974 | −24.68 | |
|    | E-GA-AFFG | −23.574 | 1.690 | 20,125 | −26.10 | |
|    | GA-FGNFN | −23.956 | 1.903 | 25,212 | −25.92 | |
| 15 | GA | −37.20 | 5.44 | 500,000 | −38.49 | −49.75 |
|    | GA-AFFG | −29.519 | 3.619 | 35,159 | −30.15 | |
|    | E-GA-AFFG | −33.393 | 1.672 | 25,103 | −38.65 | |
|    | GA-FGNFN | −35.185 | 2.028 | 26,282 | −42.23 | |
| 20 | GA | −45.09 | 4.13 | 500,000 | −47.82 | −72.51 |
|    | GA-AFFG | −40.169 | 4.962 | 35,136 | −42.31 | |
|    | E-GA-AFFG | −42.209 | 1.857 | 28,543 | −49.156 | |
|    | GA-FGNFN | −44.543 | 3.16 | 29,836 | −50.146 | |
| 25 | GA | −55.00 | 5.20 | 500,000 | −57.12 | −95.13 |
|    | GA-AFFG | −49.621 | 4.139 | 81,615 | −57.78 | |
|    | E-GA-AFFG | −55.017 | 2.202 | 55,697 | −58.455 | |
|    | GA-FGNFN | −54.137 | 3.925 | 61,540 | −59.422 | |
| 30 | GA | −65.39 | 5.13 | 500,000 | −69.13 | −118.43 |
|    | GA-AFFG | −61.65 | 4.646 | 99,126 | −62.631 | |
|    | E-GA-AFFG | −63.54 | 3.453 | 81,741 | −65.126 | |
|    | GA-FGNFN | −64.14 | 3.868 | 85,714 | −69.765 | |

f(x̂): average best fitness over five runs; SD: standard deviation; BFV: best fitness value found; f(x*): known global optimum
We compared the performance of all of the considered methods over the aforementioned 15 functions using the experimental settings described in Sect. 5.1. In addition to the developed methods, we show the performance of a reference method called variable mesh optimization for continuous optimization problems (VMO), introduced in Puris et al. (2012). Results of this experiment are shown in Table 6. In general, it can be seen that the proposed surrogate models reduced the number of FFE by about 50 % when compared to the GA and VMO methods, which results in important savings of computational resources. Again, there is no significant detriment to the quality of the solutions (except for function F1 with E-GA-AFFG); in fact, improvements in terms of the best-fitness values were observed for functions F2, F3, F5, F6, and F9 (not shown here). Our extended version of GA-AFFG outperforms the original implementation in terms of quality of solutions, again confirming the validity of the proposed extensions. It is interesting, however, that for the considered benchmark, better results were obtained with GA-FGNFN than with E-GA-AFFG (in 14 out of the 15 functions). This result suggests that the neuro-fuzzy network is really helping the GA to converge. This is a very positive result because, even when the number of FFE is not reduced by the network, we were able to improve the convergence capabilities.
Table 5 Comparison of simulations for Morse clusters with GA, GA-AFFG (Akbarzadeh et al. 2008), E-GA-AFFG, and GA-FGNFN for dimensions 35–55

| N | Method | f(x̂) | SD | No. FFE | BFV | f(x*) |
| 35 | GA | −73.75 | 7.35 | 500,000 | −75.25 | −141.96 |
|    | GA-AFFG | −69.34 | 6.19 | 356,588 | −72.15 | |
|    | E-GA-AFFG | −72.94 | 2.67 | 294,492 | −77.46 | |
|    | GA-FGNFN | −73.38 | 3.46 | 304,987 | −79.12 | |
| 40 | GA | −70.18 | 7.09 | 500,000 | −73.15 | −167.99 |
|    | GA-AFFG | −62.64 | 7.13 | 326,152 | −70.31 | |
|    | E-GA-AFFG | −68.22 | 4.57 | 287,855 | −77.75 | |
|    | GA-FGNFN | −68.49 | 3.19 | 291,481 | −75.14 | |
| 45 | GA | −74.55 | 4.42 | 500,000 | −80.13 | −192.95 |
|    | GA-AFFG | −64.19 | 5.13 | 351,961 | −69.71 | |
|    | E-GA-AFFG | −69.16 | 8.12 | 294,238 | −73.14 | |
|    | GA-FGNFN | −68.42 | 6.19 | 314,592 | −71.15 | |
| 50 | GA | −78.26 | 4.58 | 500,000 | −80.12 | −219.82 |
|    | GA-AFFG | −59.15 | 7.13 | 381,265 | −62.15 | |
|    | E-GA-AFFG | −65.12 | 6.14 | 284,164 | −71.423 | |
|    | GA-FGNFN | −67.15 | 5.12 | 335,414 | −73.195 | |
| 55 | GA | −69.63 | 6.08 | 500,000 | −72.15 | −250.29 |
|    | GA-AFFG | −45.19 | 4.185 | 415,697 | −51.43 | |
|    | E-GA-AFFG | −55.13 | 6.125 | 294,568 | −60.19 | |
|    | GA-FGNFN | −56.14 | 7.951 | 354,857 | −62.49 | |

f(x̂): average best fitness over five runs; SD: standard deviation; BFV: best fitness value found; f(x*): known global optimum
Table 6 Experimental results on the benchmark functions from CEC-2013 with GA, GA-AFFG (Akbarzadeh et al. 2008), E-GA-AFFG, GA-FGNFN, and VMO (Puris et al. 2012); for each method, the mean best fitness (Mean) and the number of fitness function evaluations (No. FFE) are reported

| Fnc. | GA Mean | GA No. FFE | E-GA-AFFG Mean | E-GA-AFFG No. FFE | GA-FGNFN Mean | GA-FGNFN No. FFE | VMO Mean | VMO No. FFE | GA-AFFG Mean | GA-AFFG No. FFE |
| F1 | 4.44e+10 | 1.2e+05 | 2.01e+12 | 6.3e+04 | 3.16e+10 | 7.1e+04 | 1.05e+06 | 1.2e+05 | 3.14e+13 | 6.8e+04 |
| F2 | 46,809 | 1.2e+05 | 52,276 | 5.6e+04 | 47,465 | 6.3e+04 | 2.94e+04 | 1.2e+05 | 55,481 | 5.6e+04 |
| F3 | 20.94 | 1.2e+05 | 32.16 | 4.8e+04 | 21.619 | 5.5e+04 | 5.31e+02 | 1.2e+05 | 28.913 | 4.3e+04 |
| F4 | 1.51e+12 | 1.2e+05 | 1.96e+12 | 6.1e+04 | 1.74e+12 | 6.2e+04 | 9.14e+10 | 1.2e+05 | 3.49e+12 | 7.2e+04 |
| F5 | 1.70e+07 | 1.2e+05 | 1.85e+07 | 5.9e+04 | 1.68e+07 | 6.4e+04 | 7.28e+14 | 1.2e+05 | 2.91e+12 | 4.5e+04 |
| F6 | 1.86e+07 | 1.2e+05 | 1.06e+06 | 7.2e+04 | 2.84e+06 | 7.3e+04 | 9.38e+05 | 1.2e+05 | 2.91e+07 | 5.8e+04 |
| F7 | 3.11e+10 | 1.2e+05 | 2.50e+11 | 4.2e+04 | 2.14e+10 | 5.8e+04 | 3.29e+08 | 1.2e+05 | 3.91e+12 | 6.1e+04 |
| F8 | 3.88e+16 | 1.2e+05 | 4.74e+16 | 6.9e+04 | 3.29e+16 | 7.4e+04 | 4.85e+14 | 1.2e+05 | 2.49e+16 | 8.2e+04 |
| F9 | 1.38e+09 | 1.2e+05 | 1.61e+09 | 3.8e+04 | 1.24e+09 | 5.6e+04 | 1.81e+09 | 1.2e+05 | 3.89e+10 | 6.8e+04 |
| F10 | 9.50e+07 | 1.2e+05 | 9.51e+07 | 7.1e+04 | 8.96e+07 | 7.3e+04 | 2.66e+07 | 1.2e+05 | 1.96e+08 | 6.4e+04 |
| F11 | 2.61e+13 | 1.2e+05 | 3.86e+13 | 6.8e+04 | 3.15e+13 | 7.1e+04 | 3.96e+09 | 1.2e+05 | 4.71e+13 | 6.9e+04 |
| F12 | 2.72e+12 | 1.2e+05 | 3.59e+12 | 5.9e+04 | 2.89e+12 | 6.4e+04 | 6.88e+06 | 1.2e+05 | 4.88e+12 | 5.7e+04 |
| F13 | 5.64e+12 | 1.2e+05 | 3.75e+13 | 5.8e+04 | 6.49e+12 | 6.8e+04 | 8.72e+09 | 1.2e+05 | 4.91e+13 | 6.2e+04 |
| F14 | 4.95e+13 | 1.2e+05 | 5.01e+13 | 7.2e+04 | 4.81e+13 | 7.5e+04 | 5.80e+10 | 1.2e+05 | 6.71e+14 | 5.7e+04 |
| F15 | 6.94e+13 | 1.2e+05 | 8.13e+13 | 5.2e+04 | 6.00e+13 | 6.5e+04 | 1.68e+07 | 1.2e+05 | 9.77e+13 | 6.8e+04 |

6 Conclusions

Evolutionary algorithms produce candidate solutions in each generation with the objective of identifying those that present better adaptability and improved performance; this condition is measured by the fitness function. For some problems, estimating the fitness function can be a very time-consuming process, slowing the
optimization process. Surrogate modeling has been adopted to substitute the real fitness function with approximations obtained by a model. This paper introduced a surrogate modeling approach based on granular computing and a neuro-fuzzy network. Granular computing provides data compression, gains in computation time, and the extraction of knowledge from individuals with similar performance. On the one hand, we extended a granular surrogate model (GA-AFFG) by incorporating a probabilistic decision rule and other methodological improvements, resulting in a new granular surrogate model (E-GA-AFFG). The E-GA-AFFG approach was then extended by including a learning algorithm that adapts the position of the surrogates (the GA-FGNFN method). Our proposal thus extends a previous model by allowing the adaptation of the granule parameters: the new granule-construction algorithm modifies the threshold decision function and, with the aid of a probability function, improves, in general, the granule-construction process.

Experimental results over three benchmarks show the effectiveness of the proposed methods. Both proposed methods, E-GA-AFFG and GA-FGNFN, considerably reduce the number of fitness function evaluations when compared to a standard genetic algorithm, without significantly compromising the quality of the solutions. We also showed that our strategies significantly outperform GA-AFFG. The addition of the neuro-fuzzy network proved beneficial for E-GA-AFFG only on the large-scale optimization problems; hence, the adaptive learning process seems to make sense for these highly complex problems, but it is not very useful for the other two benchmarks.

We have identified several extensions of the work presented herein. First, we would like to evaluate the effectiveness of E-GA-AFFG when using different probabilistic functions (i.e., non-linearly increasing ones); in fact, we could learn this function. We also would like to explore the suitability of the developed methods in the context of online prototype generation. Another avenue for research is an extensive study of different learning procedures for training the neuro-fuzzy network. Finally, we would like to evaluate the suitability of the proposed surrogate model for other population-based heuristics, including particle swarm optimization.

Acknowledgments The first author was supported by CONACyT under a postdoctoral scholarship (CVU No. 162347).
References

Aja-Fernández S, Alberola-López C (2004) Fuzzy granules as a basic word representation for computing with words. In: 9th conference speech and computer
Akbarzadeh-T MR, Davarynejad M, Pariz N (2008) Adaptive fuzzy fitness granulation for evolutionary optimization. Int J Approx Reason 49(3):523–538
Akbarzadeh-T MR, Mosavat I, Abbasi S (2003) Friendship modeling for cooperative co-evolutionary fuzzy systems: a hybrid GA-GP algorithm. In: 22nd international conference of the North American Fuzzy Information Processing Society (NAFIPS 2003). IEEE, pp 61–66
Bertsimas D, Tsitsiklis J et al (1993) Simulated annealing. Stat Sci 8(1):10–15
Castellano G, Fanelli AM, Mencar C (2003) Fuzzy information granules: a compact, transparent and efficient representation. JACIII 7(2):160–168
Clarke SM, Griebsch JH, Simpson TW (2005) Analysis of support vector regression for approximation of complex engineering analyses. J Mech Des 127(6):1077–1087
Cruz-Vega I, Escalante HJ (2015) A note on: adaptive fuzzy fitness granulation for evolutionary optimization. Int J Approx Reason 57:40–43
Cruz-Vega I, Garcia-Limon M, Escalante HJ (2014) Adaptive-surrogate based on a neuro-fuzzy network and granular computing. In: Proceedings of the 2014 conference on genetic and evolutionary computation. ACM, pp 761–768
De Jong KA (1975) An analysis of the behavior of a class of genetic adaptive systems. Ph.D. dissertation, University of Michigan, Ann Arbor, MI, USA, AAI7609381
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
Digalakis JG, Margaritis KG (2002) An experimental study of benchmarking functions for genetic algorithms. Int J Comput Math 79(4):403–416
Do Wan Kim HJL, Park JB, Joo YH (2006) GA-based construction of fuzzy classifiers using information granules
Farina M (2002) A neural network based generalized response surface multiobjective evolutionary algorithm. In: Proceedings of the 2002 congress on evolutionary computation (CEC'02), vol 1. IEEE, pp 956–961
Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9(1):3–12
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70
Karakasis MK, Giannakoglou KC (2004) On the use of surrogate evaluation models in multi-objective evolutionary algorithms. In: Proceedings of the European congress on computational methods in applied sciences and engineering (ECCOMAS 2004)
Leite D, Gomide F, Ballini R, Costa P (2011) Fuzzy granular evolving modeling for time series prediction. In: 2011 IEEE international conference on fuzzy systems (FUZZ). IEEE, pp 2794–2801
Li X, Tang K, Omidvar MN, Yang Z, Qin K, China H (2013) Benchmark functions for the CEC 2013 special session and competition on large-scale global optimization. Gene 7:33
Morse PM (1929) Diatomic molecules according to the wave mechanics. II. Vibrational levels. Phys Rev 34(1):57
Myers WR, Montgomery DC (2003) Response surface methodology. Encycl Biopharm Stat 1:858–869
Panoutsos G, Mahfouf M (2010) A neural-fuzzy modelling framework based on granular computing: concepts and applications. Fuzzy Sets Syst 161(21):2808–2830
Park KJ, Pedrycz W, Oh SK (2007) A genetic approach to modeling fuzzy systems based on information granulation and successive generation-based evolution method. Simul Model Pract Theory 15(9):1128–1145
Pedrycz W (2014) Allocation of information granularity in optimization and decision-making models: towards building the foundations of granular computing. Eur J Oper Res 232(1):137–145
Pedrycz W, Song M (2012) A genetic reduction of feature space in the design of fuzzy models. Appl Soft Comput 12(9):2801–2816
Pintér JD (2006) Global optimization: scientific and engineering case studies, vol 85. Springer, Berlin
Puris A, Bello R, Molina D, Herrera F (2012) Variable mesh optimization for continuous optimization problems. Soft Comput 16(3):511–525
Roberts C, Johnston RL, Wilson NT (2000) A genetic algorithm for the structural optimization of Morse clusters. Theor Chem Acc 104(2):123–130
Roh SB, Pedrycz W, Ahn TC (2014) A design of granular fuzzy classifier. Expert Syst Appl 41(15):6786–6795
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4(4):409–423
Velasco J, Saucedo-Espinosa MA, Escalante HJ, Mendoza K, Villarreal-Rodríguez CE, Chacón-Mondragón OL, Rodríguez A, Berrones A (2014) An adaptive random search for unconstrained global optimization. Computación y Sistemas 18(2):243–257
Yao JT, Vasilakos AV, Pedrycz W (2013) Granular computing: perspectives and challenges. IEEE Trans Cybern 43(6):1977–1989
Yao Y (2005) Perspectives of granular computing. In: 2005 IEEE international conference on granular computing, vol 1. IEEE, pp 85–90
Yao YY (2004) Granular computing. In: Proceedings of the 4th Chinese national conference on rough sets and soft computing, vol 31, pp 1–5
Zadeh LA (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 90(2):111–127
Zhang J, Chowdhury S, Messac A (2012) An adaptive hybrid surrogate model. Struct Multidiscip Optim 46(2):223–238