J Glob Optim DOI 10.1007/s10898-013-0119-1
A modified DIRECT algorithm with bilevel partition Qunfeng Liu · Wanyou Cheng
Received: 8 February 2013 / Accepted: 17 October 2013 © Springer Science+Business Media New York 2013
Abstract It has been pointed out by D. R. Jones that the DIRECT global optimization algorithm can quickly get close to the basin of the optimum but takes much longer to achieve a high degree of accuracy. In this paper, we introduce a bilevel strategy into a modified DIRECT algorithm to overcome this shortcoming. The proposed algorithm is proved to be globally convergent. Numerical results show that the proposed algorithm is very promising.
Keywords Global optimization · Robust DIRECT algorithm · Bilevel method
1 Introduction
In the mathematical modeling of real-world systems across a broad range of applications, one often needs to solve global optimization problems. The objective functions of many such problems are multivariate and nonlinear, i.e., there are often multiple local minima. The global minimum, a point where the objective function attains its smallest value, is often hard to find. Algorithms for solving global optimization problems can be classified into stochastic algorithms and deterministic algorithms [12,19]. Many of the former use analogies to biology and physics to approach the global optimum [12]. The latter include the popular branch-and-bound algorithm and its variants [1,21] and the DIRECT algorithm [13]. In this paper, we consider the following bound-constrained global optimization problem

min_{x∈Ω} f(x),    (1)
This work was supported by MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project No. 13YJC630095) and NSF of China (Nos. 11271069, 11101081). Q. Liu (B) · W. Cheng School of Computing, Dongguan University of Technology, Dongguan 523808, China e-mail:
[email protected] W. Cheng e-mail:
[email protected]
where Ω = {x ∈ R^n | −∞ < l_i ≤ x_i ≤ u_i < ∞, i = 1, . . . , n}. The objective function f(x) is only assumed to be continuous in a neighborhood of the global minimum. We present a deterministic global optimization algorithm for problem (1), which can be regarded as an improvement of the original DIRECT algorithm [13]. Although the DIRECT algorithm is only suitable for small-scale problems, it has stimulated much interest due to its guaranteed convergence and ease of implementation [2,7,9,12,15]. Moreover, in a recent comparison of derivative-free algorithms, two algorithms based on DIRECT (MCS and TOMLAB/GLCCLUSTER) performed very well [20].
The DIRECT (DIviding RECTangles) algorithm proposed in [13] is a deterministic sampling algorithm for problem (1), where the dimension n is often small. In the initialization stage of the DIRECT algorithm, the original search space Ω is normalized into a unit hypercube. Every iteration of the DIRECT algorithm consists of two key processes. The first is to identify hyperrectangles that may contain optimal points; these are called potentially optimal hyperrectangles (see Definition 1 below). The second is to sample inside these potentially optimal hyperrectangles and divide them into smaller hyperrectangles. In [13], the set of centers of the hyperrectangles created by the DIRECT algorithm is shown to be dense in the original search space Ω. Therefore, if the objective function is continuous in a neighborhood of the global minimum, the DIRECT algorithm will deterministically sample a point arbitrarily close to the global minimum.
In [7,8], numerical experiments showed that the original DIRECT algorithm is sensitive to additive scaling of the objective function. The modified DIRECT-median algorithm was proposed in [8] to eliminate this sensitivity. In [15], theoretical results were given which show that the original DIRECT algorithm is sensitive not only to additive scaling but also to linear scaling of the objective function. In order to eliminate these sensitivities, the definition of the potentially optimal hyperrectangle is changed slightly in [15]; specifically, the definition of the minimally acceptable improvement is made insensitive to linear scaling. A class of DIRECT algorithms is then proved to be robust in [15], which includes DIRECT-median [8] and DIRECT-a [15] as special cases. We note that, in this paper, saying an algorithm is "robust" means that it is insensitive to linear scaling of the objective function.
Although robustness can thus be obtained, the original DIRECT algorithm has another defect. It is pointed out in [14] that DIRECT suffers from an eventually inefficient behavior: DIRECT quickly gets close to the basin of the optimum but takes much longer to achieve a high degree of accuracy. Actually, such a defect is common for derivative-free optimization algorithms. A common strategy to eliminate it is to combine DIRECT with a good local optimizer, as in [17]. However, because local optimizers are often sensitive to the initial point, this strategy is often highly sensitive to the number of function evaluations consumed in the global phase.
In this paper, we adopt a bilevel strategy to improve the ability of DIRECT to achieve solutions with a high degree of accuracy. In engineering and scientific computing, "bilevel" has two different meanings.
The first is bilevel optimization: an upper-level problem and a lower-level problem are employed, and the value of a constraint or objective function in the upper-level problem is itself the result of solving a lower-level optimization problem [4]. The second is bilevel (or multilevel) methods, which are among the most efficient modern techniques for solving large-scale algebraic systems arising from the discretization of partial differential equations (PDEs) [3,22]. In this paper, "bilevel" has the latter meaning. The bilevel method is designed to overcome the so-called "smooth mode phenomenon", namely the fact that part of the residual is hard to eliminate if only a single-level algorithm is used to solve a linear system. The key idea in bilevel methods is to reduce the
residual of the fine-level (large-scale) algebraic system discretized from a PDE by solving a coarse-level (small-scale) algebraic system discretized from the same PDE. Such a strategy eliminates the smooth mode phenomenon and is proved to be almost optimal [3,22]. Because the same algorithm can be adopted to solve both the fine-level and the coarse-level algebraic systems, multilevel methods are easily obtained by applying the bilevel method recursively [22].
In our previous work [16], we pointed out that the eventually inefficient behavior of partition-based algorithms (e.g., DIRECT) is similar to the smooth mode phenomenon. Inspired by the great success of multilevel methods in eliminating the smooth mode, we proposed in [16] a multilevel partition-based algorithmic framework for global optimization problems, together with a DIRECT-based multilevel partition algorithm. Preliminary numerical results showed that the multilevel strategy really helps to eliminate DIRECT's eventually inefficient behavior.
In this paper, we combine the multilevel partition strategy and the robust DIRECT algorithm. Our main purpose is to improve DIRECT's ability to find solutions with a high level of accuracy within a reasonable computational cost. More precisely, we adopt a robust DIRECT (RDIRECT) algorithm to search not only in the unit hypercube obtained by normalizing the original search space (we call it the fine-level search space) but also in the coarse-level search space, which is a subset of the whole set of hyperrectangles created in the fine level. Such a bilevel strategy allows us to switch to searching in the coarse level before the inefficient behavior appears in the fine level. We hope that the bilevel strategy can accelerate convergence through searching in the coarse search space. Although it is easy to obtain a robust DIRECT algorithm with multilevel search, we only apply the bilevel method, because our preliminary numerical results show that the multilevel strategy brings no clear improvement over the bilevel strategy.
We call the obtained algorithm RDIRECT-b. We prove that RDIRECT-b is globally convergent. Numerical results presented in this paper show that RDIRECT-b performs much better than the original DIRECT algorithm, and better than the RDIRECT algorithm, too. Although additional parameters are introduced in the RDIRECT-b algorithm, their default values seem to work well for most problems. In our numerical experiments, the bilevel DIRECT (DIRECT-b) algorithm is also compared. Our numerical results show that, when a problem is well scaled, RDIRECT-b sometimes performs worse than DIRECT-b; however, when the problem is badly scaled, RDIRECT-b performs much better than DIRECT-b. This conclusion is similar to that in [15].
We organize this paper as follows. In Sect. 2, we describe the RDIRECT-b algorithm. In Sect. 3, we prove the global convergence of the RDIRECT-b algorithm. We report numerical experiments in Sect. 4 and give some conclusions in the final section.
2 The RDIRECT-b algorithm
In this section, we describe the RDIRECT-b algorithm in detail. In every iteration of the RDIRECT-b algorithm there are two levels of search spaces: one is the fine-level search space, the other is the coarse-level search space. The fine-level search space is the whole set of hyperrectangles. In the beginning, there is only one unit hypercube in the fine level. We sample 2n points in the unit hypercube and compute the function values at these points. Specifically, if we denote c1 as the center point of the unit hypercube, then we sample c1 ± (1/3)e_i, i = 1, . . . , n, where e_i is the ith unit vector.
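As a small illustration of this initialization (not the authors' Matlab code), the following Python sketch normalizes the box Ω to the unit hypercube and evaluates f at the center and at the 2n points c1 ± (1/3)e_i; the helper names are hypothetical.

```python
import numpy as np

def denormalize(x_unit, lower, upper):
    """Map a point in the unit hypercube back to the original box [lower, upper]."""
    return lower + x_unit * (upper - lower)

def initial_sampling(f, lower, upper):
    """Evaluate f at the center c1 of the unit cube and at c1 +/- (1/3) e_i, i = 1..n."""
    n = len(lower)
    c1 = np.full(n, 0.5)                      # center of the unit hypercube
    points = [c1]
    for i in range(n):
        for sign in (+1.0, -1.0):
            p = c1.copy()
            p[i] += sign / 3.0                # c1 +/- (1/3) e_i
            points.append(p)
    values = [f(denormalize(p, np.asarray(lower), np.asarray(upper))) for p in points]
    return points, values
```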
Fig. 1 Two dimensional example of sampling and division in the fine level
After the function is evaluated at these sampled points, the unit hypercube needs to be divided into 2n + 1 smaller hyperrectangles, whose centers are the sampled points and the center of the original hypercube. We adopt the same division strategy as the original DIRECT algorithm: the hypercube is divided into three equal parts along each dimension, and dimensions with low function values are divided first. Figure 1 illustrates a two-dimensional example of sampling and division in the first iteration in the fine level.
After the division, the fine level contains many smaller hyperrectangles. At the next step, we choose some potentially optimal hyperrectangles, sample inside them, and then divide them further. A potentially optimal hyperrectangle is a hyperrectangle which may contain good, unsampled points. For the RDIRECT-b algorithm, the definition of a potentially optimal hyperrectangle is given as follows [15].

Definition 1 Given ε > 0 and the index set I of the hyperrectangles. Let f_min be the known minimal function value, and let c_i, σ_i be the center and the size of the ith hyperrectangle. Denote by f_cc any convex combination of the objective function values gathered during previous iterations of RDIRECT-b. If there is a constant γ > 0 such that the following inequalities hold,

f(c_j) − γ σ_j ≤ f(c_i) − γ σ_i,  ∀i ∈ I,    (2a)
f(c_j) − γ σ_j ≤ f_min − ε | f_min − f_cc |,    (2b)
then the jth hyperrectangle is a potentially optimal hyperrectangle (POH).

Remark 1 In the above definition, convex combination means that, for the gathered function values { f(x_k) }_{k=1}^{K}, we select α_k ∈ [0, 1], k = 1, . . . , K, such that Σ_{k=1}^{K} α_k = 1 and then define

f_cc = Σ_{k=1}^{K} α_k f(x_k).    (3)

In this paper, f_cc equals the median value of the gathered function values { f(x_k) }_{k=1}^{K}.
Remark 2 If we replace condition (2b) with the following condition

f(c_j) − γ σ_j ≤ f_min − ε | f_min |,    (4)

then we obtain the definition of a potentially optimal hyperrectangle for the DIRECT-b algorithm.
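To make Definition 1 concrete, here is a minimal Python sketch (our own reading, not the authors' implementation) that checks conditions (2a) and (2b) by bounding the admissible slopes γ; f_cc is taken as supplied, e.g. the median of the gathered values as in Remark 1.

```python
import numpy as np

def potential_optimal(centers_f, sizes, f_min, f_cc, eps=1e-4):
    """Return indices of potentially optimal hyperrectangles (Definition 1).

    centers_f : f(c_i) for each hyperrectangle
    sizes     : sizes sigma_i of the hyperrectangles
    f_min     : best function value found so far
    f_cc      : a convex combination of gathered values (the median, Remark 1)
    eps       : the balance parameter epsilon in condition (2b)
    """
    centers_f = np.asarray(centers_f, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    threshold = f_min - eps * abs(f_min - f_cc)       # right-hand side of (2b)
    poh = []
    for j in range(len(sizes)):
        same = sizes == sizes[j]
        if np.any(centers_f[same] < centers_f[j]):
            continue                                  # (2a) fails against an equal-size box
        smaller, larger = sizes < sizes[j], sizes > sizes[j]
        # lower bounds on gamma coming from smaller boxes and from (2b)
        lo = 0.0
        if np.any(smaller):
            lo = max(lo, np.max((centers_f[j] - centers_f[smaller]) /
                                (sizes[j] - sizes[smaller])))
        lo = max(lo, (centers_f[j] - threshold) / sizes[j])
        # upper bound on gamma coming from larger boxes
        hi = np.inf
        if np.any(larger):
            hi = np.min((centers_f[larger] - centers_f[j]) /
                        (sizes[larger] - sizes[j]))
        if lo <= hi and hi > 0:                       # a positive gamma exists
            poh.append(j)
    return poh
```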
Fig. 2 POHs can be identified graphically
Determining the set of POHs can be regarded as the problem of finding the extreme points on the lower-right convex hull of a set of points in the plane. In [13], a special figure is designed to help understand Definition 1 and to identify the set of POHs. Figure 2 illustrates an example. In Fig. 2, every dot corresponds to a hyperrectangle; the horizontal coordinate is the size of the hyperrectangle, and the vertical coordinate is the objective function value at its center. The three blue dots are the current POHs. The balance parameter ε prevents the dot with the best function value from being a POH.
After the current POHs are identified, we sample 2n points inside each POH and divide them further. We summarize the sampling process and the dividing process in the following algorithm.

Algorithm 1 Sampling and dividing algorithm
Initialization: Let ξ be the maximal side length of the hyperrectangle to be divided, and let I be the set of dimensions with side length ξ.
– Sampling: For all i ∈ I, evaluate the objective function at c ± (1/3)ξ e_i, where c is the center point of the hyperrectangle.
– Dividing: Let ω_i = min{ f(c + (1/3)ξ e_i), f(c − (1/3)ξ e_i) }. Divide the hyperrectangle containing c into thirds along the dimensions in I, starting with the dimension with the smallest ω_i and continuing up to the dimension with the largest ω_i.
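A rough Python sketch of Algorithm 1 follows; it assumes a hyperrectangle is stored as a (center, side_lengths) pair in the unit hypercube. The representation and helper names are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def sample_and_divide(f, center, sides):
    """Algorithm 1: sample along the longest sides and trisect the hyperrectangle.

    center : center point of the hyperrectangle (in the unit hypercube)
    sides  : side lengths of the hyperrectangle
    Returns the list of (center, sides) pairs of the resulting smaller boxes
    and a dict mapping sampled points (as tuples) to their function values.
    """
    xi = np.max(sides)
    I = [i for i in range(len(sides)) if sides[i] == xi]   # longest dimensions
    samples, omega = {}, {}
    for i in I:
        vals = []
        for sign in (+1.0, -1.0):
            p = center.copy()
            p[i] += sign * xi / 3.0                        # c +/- (1/3) xi e_i
            samples[tuple(p)] = f(p)
            vals.append(samples[tuple(p)])
        omega[i] = min(vals)                               # omega_i = min of the two values
    # trisect along the longest dimensions, smallest omega_i first;
    # the two outer thirds become new boxes, the middle third is divided further
    boxes = []
    mid_sides = sides.copy()
    for i in sorted(I, key=omega.get):
        mid_sides = mid_sides.copy()
        mid_sides[i] /= 3.0
        for sign in (+1.0, -1.0):
            c_new = center.copy()
            c_new[i] += sign * xi / 3.0                    # centers are the sampled points
            boxes.append((c_new, mid_sides.copy()))
    boxes.append((center.copy(), mid_sides))               # the box around the original center
    return boxes, samples
```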
We execute N1 iterations of the RDIRECT algorithm in the fine-level search space. In each iteration, we identify a set of POHs, then sample and divide inside them. After N1 iterations in the fine level, let S_f be the current set of hyperrectangles, whose cardinality is denoted by |S_f|. We then select the r0|S_f| smallest hyperrectangles in S_f to constitute the coarse level, where r0 ∈ (0, 1) is a constant. We execute N2 iterations of RDIRECT in the coarse-level search space. After these N2 iterations, let S_c be the set of hyperrectangles created in the coarse level. We then go back to the fine level, update the set S_f := S_f ∪ S_c, and execute N3 iterations of RDIRECT.
We summarize the RDIRECT-b algorithm as follows.

Algorithm 2 RDIRECT-b
Initialization: Normalize the search space into a unit hypercube and let c1 be its center. Compute f(c1) and set f_min = f(c1). Denote by S_f and S_c the sets of hyperrectangles created in the fine level and in the coarse level, respectively.
while (stopping conditions do not hold) do
– For i = 1 : N1   % Sampling and dividing in the fine level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_f.
– Select the r0|S_f| smallest hyperrectangles in S_f, and let S_c denote the set of these selected hyperrectangles.
– For i = 1 : N2   % Sampling and dividing in the coarse level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_c.
– Update S_f = S_f ∪ S_c and let S_c = [ ].
– For i = 1 : N3   % Sampling and dividing in the fine level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_f.
end while
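The bilevel structure of Algorithm 2 can be sketched as below. Here `rdirect_iteration` stands for one RDIRECT iteration on a given set of hyperrectangles (identify the POHs of Definition 1, then sample and divide them with Algorithm 1), and `select_smallest` for the selection rule of Remark 4; both are passed in as callables because their names and signatures are our assumptions, not part of the paper.

```python
def rdirect_b(fine_boxes, rdirect_iteration, select_smallest,
              budget, N1=3, N2=1, N3=1, r0=0.1):
    """Skeleton of the bilevel loop of Algorithm 2 (RDIRECT-b).

    fine_boxes        : initial fine-level set S_f (the divided unit hypercube)
    rdirect_iteration : callable doing one RDIRECT iteration on a set of boxes;
                        returns (updated_boxes, n_new_evaluations)
    select_smallest   : callable implementing the selection rule of Remark 4
    """
    S_f, evals = list(fine_boxes), 0
    while evals < budget:
        for _ in range(N1):                              # fine-level iterations
            S_f, new = rdirect_iteration(S_f)
            evals += new
        S_c = select_smallest(S_f, int(r0 * len(S_f)))   # coarse level: r0*|S_f| smallest boxes
        for _ in range(N2):                              # coarse-level iterations
            S_c, new = rdirect_iteration(S_c)
            evals += new
        S_f = S_f + [b for b in S_c if b not in S_f]     # S_f := S_f U S_c
        for _ in range(N3):                              # back to the fine level
            S_f, new = rdirect_iteration(S_f)
            evals += new
    return S_f
```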
Remark 3 If we let N2 = 0, then the RDIRECT-b algorithm reduces to the RDIRECT algorithm.

Remark 4 In Algorithm 2, the "r0|S_f| smallest hyperrectangles" are often not uniquely defined because many hyperrectangles have the same size, and there are different ways to break ties in practice. In our implementation, we sort the hyperrectangles lexicographically, first by size and then by the function value at the center, and select the first r0|S_f| hyperrectangles. If some hyperrectangles have both the same size and the same function value, we select the top ones; a sketch of this rule is given below.

Remark 5 If condition (2b) in Definition 1 is replaced by condition (4), then the RDIRECT-b algorithm becomes the DIRECT-b algorithm.

The RDIRECT-b algorithm terminates when a user-supplied budget of function evaluations or iterations is exhausted.
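A possible reading of the selection rule in Remark 4 (the attribute names `size` and `f_center` are hypothetical):

```python
def select_smallest(boxes, k):
    """Pick the k smallest hyperrectangles, ordering lexicographically by
    (size, function value at the center) and taking the top ones on ties."""
    return sorted(boxes, key=lambda b: (b.size, b.f_center))[:k]
```

With the default r0 = 0.1 this would be called, for example, as select_smallest(S_f, round(0.1 * len(S_f))); the exact rounding rule is not specified in the paper.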
3 Convergence analysis of the RDIRECT-b algorithm
In this section, we assume that the search space Ω is the unit hypercube. This does not affect the results but simplifies the proof of convergence. Let C_k be the set of points that have been sampled after k iterations of RDIRECT-b. Here, an iteration of RDIRECT-b consists of N1 + N3 iterations of RDIRECT in the fine level and N2 iterations of RDIRECT in the coarse level. Because the RDIRECT-b algorithm does not terminate before the budget of function evaluations is exhausted, in this section we let the budget be infinite and denote C = ∪_k C_k. We will show that the set C is dense in the unit hypercube and that, if f is continuous in a neighborhood of the global minimum, the RDIRECT-b algorithm converges to the globally optimal function value.

Theorem 1 Given any point x in the unit hypercube and any δ > 0, there is a K ∈ N such that for any k > K there is a y ∈ C_k satisfying

|y − x| < δ.    (5)
Proof We argue by contradiction. Assume that there are a point x ∈ Ω and a constant δ > 0 such that, for any K ∈ N, there is a k > K with |y − x| ≥ δ for all y ∈ C_k. Since C_k ⊆ C_{k+1} for all k, this means that, with B_δ(x) = {z | |z − x| < δ},

B_δ(x) ∩ C_k = ∅, ∀k ∈ N.    (6)

Denote by H the smallest hyperrectangle which contains B_δ(x) after K_1 iterations; then H will be divided within finitely many iterations [9]. Suppose that H is divided in the K_2-th (K_2 > K_1) iteration. If a point of B_δ(x) is sampled during the K_2-th iteration, then we have a contradiction with (6). Otherwise, denote by H′ the smallest hyperrectangle which contains B_δ(x) after K_2 iterations. Similarly, H′ will be divided within finitely many iterations. Repeating the same process, we can show that the RDIRECT-b algorithm samples at least one point in B_δ(x) within finitely many iterations, which contradicts (6). This ends the proof.

Theorem 1 shows that the set of points sampled by the RDIRECT-b algorithm is dense in Ω. The following result is a straightforward corollary of Theorem 1.

Corollary 1 Assume that f is continuous in a neighborhood of the global minimum x∗. Then for any δ > 0 the RDIRECT-b algorithm samples a point y such that

| f(y) − f(x∗) | < δ.    (7)
Remark 6 Comparing Theorem 1 with the convergence result in [13], we find that the sampling and dividing process in the fine level plays a different role from that in the coarse level: the former ensures convergence, while the latter accelerates it, as supported by the numerical results below.
4 Numerical experiments
In this section, we report some numerical results. First, we report results on the Jones test set, which has been used to test the performance of the original DIRECT algorithm [13] and its variants [2,7,8,10,15,16]. Second, we examine the sensitivity of the parameters. Third, we adopt performance profiles [5] and data profiles [18] to compare the proposed RDIRECT-b and other algorithms on an extended test set.

4.1 Test on the Jones test set
In this subsection, we compare the following algorithms:
– DIRECT: the original DIRECT algorithm proposed in [6,13].
– RDIRECT: the DIRECT-median algorithm proposed in [8]; see [15] for the proof of its insensitivity to linear scaling of the objective function.
– DIRECT-b: the bilevel form of the original DIRECT algorithm. It is the same as the GOMP-T algorithm proposed in [16].
– RDIRECT-b: the bilevel form of the RDIRECT algorithm.
Table 1 Key characteristics of the Jones test set

Function              Abbreviation   Problem dimension   Number of local minima   Number of global minima
Shekel 5              S5             4                   5                        1
Shekel 7              S7             4                   7                        1
Shekel 10             S10            4                   10                       1
Hartman 3             H3             3                   4                        1
Hartman 6             H6             6                   4                        1
Branin RCOS           BR             2                   3                        3
Goldstein and Price   GP             2                   4                        1
Six-Hump Camel        C6             2                   6                        2
2D Shubert            SHU            2                   760                      18
In our experiments, we use the Matlab implementation of DIRECT written by Finkel, which can be downloaded from http://www4.ncsu.edu/~ctk/Finkel_Direct/; see [6] for its user guide. We use the following parameters to implement the bilevel strategy:

N1 = 3, N2 = 1, N3 = 1, r0 = 0.1.    (8)
That is to say, we first perform 3 iterations of RDIRECT (or of DIRECT for DIRECT-b) in the fine level, select 10 % of the obtained hyperrectangles (see Remark 4 for details) to form the coarse level, perform 1 iteration of RDIRECT (or DIRECT) in the coarse level, and then go back to the fine level, where another 1 iteration of RDIRECT (or DIRECT) is performed. This constitutes one iteration of the RDIRECT-b (or DIRECT-b) algorithm.
In this subsection, the test problems are the Jones test set used in [2,8,10,13,15]. The Jones test set contains nine test problems; Table 1 shows their key characteristics, where the number of local minima includes the number of global minima. In our experiments, efficiency is measured by the number of function evaluations needed for convergence, and we use the same definition of convergence as in [13]. Specifically, convergence is defined in terms of the percent error from the globally optimal function value (denoted by f_global). Let f_min denote the best function value found; then the convergence test requires that the percent error satisfy

( f_min − f_global ) / | f_global | < perror.    (9)
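For reference, the convergence test can be coded directly; the absolute test for f_global = 0 used later in Sect. 4.3 (condition (10)) is included as a second branch.

```python
def converged(f_min, f_global, perror):
    """Stopping test: percent error (9), or the absolute test (10) when f_global = 0."""
    if f_global == 0:
        return f_min < perror                              # condition (10), Sect. 4.3
    return (f_min - f_global) / abs(f_global) < perror     # condition (9)
```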
In our experiments, perror = 1e−4, 1e−5, 1e−6, 1e−7, 1e−8 and 1e−9, respectively. Table 2 shows the numerical results, where '–' means that the convergence criterion (9) is not satisfied within 500,000 function evaluations. The number marked with * differs from the result (285) reported in [7,13]. As we can see from Table 2, the proposed RDIRECT-b algorithm performs very well. First, RDIRECT-b solves all 54 problem/tolerance combinations, whereas the RDIRECT algorithm fails in one case (H6, perror = 1e−9), DIRECT-b fails in 10 cases, and the original DIRECT algorithm fails in 12 cases.
Table 2 Numerical results on the Jones test set

Prob.  Solver      perror
                   1e−4    1e−5    1e−6     1e−7     1e−8    1e−9
S5     DIRECT      155     255     255      53,525   –       –
       DIRECT-b    159     265     265      18,043   –       –
       RDIRECT     155     255     255      513      777     1,217
       RDIRECT-b   159     251     251      353      551     1,031
S7     DIRECT      145     1,061   4,879    38,167   –       –
       DIRECT-b    157     321     2,543    3,523    –       –
       RDIRECT     145     255     331      373      949     1,665
       RDIRECT-b   157     255     325      347      759     989
S10    DIRECT      145     1,131   4,939    –        –       –
       DIRECT-b    157     321     2,385    265,199  –       –
       RDIRECT     145     255     331      565      1,019   2,115
       RDIRECT-b   157     255     325      465      757     979
H3     DIRECT      199     695     751      775      775     775
       DIRECT-b    173     753     809      843      843     843
       RDIRECT     199     459     773      815      815     815
       RDIRECT-b   173     497     853      895      895     895
H6     DIRECT      571     1,031   182,623  –        –       –
       DIRECT-b    559     959     107,329  –        –       –
       RDIRECT     571     915     1,191    1,681    4,293   173,517
       RDIRECT-b   559     877     1,209    2,049    3,473   6,027
BR     DIRECT      195     195     377      1,295    38,455  75,713
       DIRECT-b    159     159     431      1,167    14,777  65,155
       RDIRECT     259     333     393      1,205    5,079   6,897
       RDIRECT-b   181     181     287      519      923     1,399
GP     DIRECT      191     241     305      1,479    10,437  –
       DIRECT-b    175     229     271      1,217    5,119   40,963
       RDIRECT     191     321     535      1,113    3,395   11,337
       RDIRECT-b   175     235     373      537      895     1,303
C6     DIRECT      145*    211     211      211      44,537  203,301
       DIRECT-b    115     115     115      115      38,847  62,747
       RDIRECT     145     213     213      213      1,365   2,903
       RDIRECT-b   115     115     115      115      533     533
SHU    DIRECT      2,967   3,143   3,867    15,915   68,667  –
       DIRECT-b    2,837   3,025   3,789    16,505   27,463  –
       RDIRECT     3,663   3,839   4,407    6,671    10,955  15,111
       RDIRECT-b   3,501   3,641   4,259    6,571    11,111  12,989
Second, RDIRECT-b solves most (40) of the 54 problem/tolerance combinations with the best efficiency. In particular, when the required accuracy is high (perror is small), RDIRECT-b needs a much smaller cost than the other three algorithms. Even when the required accuracy is low, RDIRECT-b does not perform badly: for example, when perror = 1e−4, RDIRECT-b still solves 4 of the 9 problems with the best efficiency.
Moreover, we can see from Table 2 that the DIRECT-b algorithm also performs better than the original DIRECT algorithm.

4.2 Test of the parameters' sensitivity
In this subsection, we change the parameter combinations to test their sensitivity. We explore r0 = 0.05, 0.10, 0.25 and (N1 + N3) : N2 = 2, 4, 6. As a result, we have nine combinations. Specifically, we compare the following nine RDIRECT-b variants with the RDIRECT algorithm:
– RDIRECT-b1: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b2: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b3: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.25;
– RDIRECT-b4: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b5: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b6: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.25;
– RDIRECT-b7: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b8: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b9: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.25.
In this subsection, we adopt the same convergence criterion (9) to measure the efficiency of these algorithms on the Jones test set. The results are listed in Tables 3 and 4, where '–' means that the convergence criterion (9) is not satisfied within 25,000 function evaluations. From Tables 3 and 4 we can see that RDIRECT-b performs better than RDIRECT for all nine problems except the H3 problem and the SHU problem. The H3 problem seems too easy for both RDIRECT and RDIRECT-b, so the bilevel search strategy has no clear advantage there. On the SHU problem, RDIRECT-b and RDIRECT have similar performance.
Among the 81 algorithm/problem combinations (9 RDIRECT-b variants and 9 problems), the RDIRECT-b variants perform significantly better than RDIRECT in 56 combinations, perform similarly in at least 11 combinations (1 for S5, 8 for H3 and 2 for SHU), and perform worse in at most 14 combinations. Table 5 shows, for each variant, the number of problems on which RDIRECT-b performs significantly better than RDIRECT and the number of problems on which it does not. The results show that RDIRECT-b2 and RDIRECT-b3 perform worse. The reason may lie in the following two facts: first, N1 is too small to locate a good convergence basin of the global minimum; second, r0 = 0.1 or 0.25 makes the search too local. As a result, both RDIRECT-b2 and RDIRECT-b3 pay too much attention to unpromising (coarse-level) regions. If we exclude RDIRECT-b2 and RDIRECT-b3, all other RDIRECT-b variants perform significantly better than RDIRECT. In this sense we say that the parameters in RDIRECT-b are relatively insensitive.
Table 3 Numerical results on testing the parameters' sensitivity

Prob.  Solver        perror
                     1e−4    1e−5    1e−6    1e−7    1e−8     1e−9
S5     RDIRECT       155     255     255     513     777      1,217
       RDIRECT-b1    171     255     275     383     523      793
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    187     283     283     457     581      765
       RDIRECT-b4    171     245     245     381     515      831
       RDIRECT-b5    159     251     251     353     551      1,031
       RDIRECT-b6    173     259     259     385     615      885
       RDIRECT-b7    161     247     247     401     663      1,009
       RDIRECT-b8    165     251     251     415     675      1,087
       RDIRECT-b9    179     299     299     487     759      1,259
S7     RDIRECT       145     255     331     373     949      1,665
       RDIRECT-b1    167     261     317     349     739      981
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    –       –       –       –       –        –
       RDIRECT-b4    –       –       –       –       –        –
       RDIRECT-b5    157     255     325     347     759      989
       RDIRECT-b6    4,475   4,989   6,123   6,269   7,769    8,363
       RDIRECT-b7    151     261     315     339     823      1,391
       RDIRECT-b8    159     255     315     333     825      1,167
       RDIRECT-b9    157     277     341     361     885      1,247
S10    RDIRECT       145     255     331     565     1,019    2,115
       RDIRECT-b1    169     263     319     407     661      801
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    3,145   3,743   5,735   8,879   10,779   13,915
       RDIRECT-b4    –       –       –       –       –        –
       RDIRECT-b5    157     255     325     465     757      979
       RDIRECT-b6    2,241   2,525   4,569   5,173   5,989    7,187
       RDIRECT-b7    151     247     295     451     735      1,031
       RDIRECT-b8    159     255     313     513     793      1,489
       RDIRECT-b9    157     277     341     543     831      1,263
H3     RDIRECT       199     459     773     815     815      815
       RDIRECT-b1    243     411     917     959     959      959
       RDIRECT-b2    123     435     937     979     979      979
       RDIRECT-b3    175     335     1,087   1,137   1,137    1,137
       RDIRECT-b4    777     859     879     921     921      921
       RDIRECT-b5    173     497     853     895     895      895
       RDIRECT-b6    185     421     951     959     959      959
       RDIRECT-b7    189     457     823     865     865      865
       RDIRECT-b8    193     455     825     861     861      861
       RDIRECT-b9    189     449     889     929     929      929
Table 4 Numerical results on testing the parameters' sensitivity (continued)

Prob.  Solver        perror
                     1e−4    1e−5    1e−6    1e−7     1e−8     1e−9
H6     RDIRECT       571     915     1,191   1,681    4,293    –
       RDIRECT-b1    687     971     1,151   1,423    1,925    3,141
       RDIRECT-b2    787     1,219   1,435   1,757    2,287    3,625
       RDIRECT-b3    6,993   7,339   7,641   14,317   17,265   21,125
       RDIRECT-b4    593     923     1,179   1,513    2,409    5,021
       RDIRECT-b5    559     877     1,143   1,461    2,535    5,295
       RDIRECT-b6    541     871     1,131   1,529    2,375    5,153
       RDIRECT-b7    593     891     1,113   1,561    2,699    6,973
       RDIRECT-b8    553     905     1,161   1,533    2,865    7,051
       RDIRECT-b9    521     845     1,119   1,499    3,081    6,883
BR     RDIRECT       259     333     393     1,205    5,079    6,897
       RDIRECT-b1    181     181     305     423      627      627
       RDIRECT-b2    151     151     251     337      539      539
       RDIRECT-b3    155     155     289     339      537      537
       RDIRECT-b4    199     199     383     647      901      1,533
       RDIRECT-b5    181     181     287     519      923      1,399
       RDIRECT-b6    183     183     333     591      997      1,909
       RDIRECT-b7    245     245     383     593      977      1,721
       RDIRECT-b8    253     253     391     609      1,005    2,047
       RDIRECT-b9    269     269     375     637      1,041    2,763
GP     RDIRECT       191     321     535     1,113    3,395    11,337
       RDIRECT-b1    173     233     355     521      697      881
       RDIRECT-b2    177     245     367     535      723      917
       RDIRECT-b3    137     181     251     431      645      871
       RDIRECT-b4    175     227     355     673      1,033    1,433
       RDIRECT-b5    175     235     373     537      895      1,303
       RDIRECT-b6    163     249     401     573      955      1,379
       RDIRECT-b7    175     273     483     723      1,263    1,853
       RDIRECT-b8    181     289     471     771      1,335    1,931
       RDIRECT-b9    181     299     521     797      1,381    1,993
C6     RDIRECT       145     213     213     213      1,365    2,903
       RDIRECT-b1    117     237     237     237      623      623
       RDIRECT-b2    105     245     245     245      485      485
       RDIRECT-b3    107     139     139     139      397      397
       RDIRECT-b4    129     227     227     227      637      637
       RDIRECT-b5    115     115     115     115      533      533
       RDIRECT-b6    119     119     119     119      567      567
       RDIRECT-b7    125     223     223     223      951      951
       RDIRECT-b8    125     225     225     225      719      719
       RDIRECT-b9    131     205     205     205      761      761
SHU    RDIRECT       3,663   3,839   4,407   6,671    10,955   15,111
       RDIRECT-b1    3,131   3,271   3,903   6,039    10,069   13,309
       RDIRECT-b2    5,063   5,251   5,951   9,735    16,055   24,617
       RDIRECT-b3    2,367   2,511   2,815   3,975    7,067    7,659
       RDIRECT-b4    3,129   3,273   3,917   6,029    10,049   13,085
       RDIRECT-b5    3,501   3,641   4,259   6,571    11,111   12,989
       RDIRECT-b6    4,695   4,883   5,579   8,279    14,427   17,255
       RDIRECT-b7    3,939   4,123   4,795   7,063    12,103   16,915
       RDIRECT-b8    3,969   4,149   4,825   7,105    11,393   16,469
       RDIRECT-b9    4,391   4,575   5,263   7,703    12,939   15,967
Table 5 Number of problems on which RDIRECT-b performs better (or not better) than RDIRECT

Algorithm    RDIRECT-b1  RDIRECT-b2  RDIRECT-b3  RDIRECT-b4  RDIRECT-b5  RDIRECT-b6  RDIRECT-b7  RDIRECT-b8  RDIRECT-b9  Total
Better       8           4           5           6           8           5           7           7           6           56
Not better   1           5           4           3           1           4           2           2           3           25
4.3 Extended numerical results
In this subsection, we compare the performances of DIRECT, DIRECT-b, RDIRECT and RDIRECT-b on the Hedar test set [11], which largely extends the Jones test set. The Hedar test set has 27 functions; some of them have several variants (e.g., Bohachevsky, Hartman, Trid, etc.), and some of them have variable dimensions. Table 6 shows the function name, dimension, search domain and the known minimal function value. In total, we test 64 problems; the maximal dimension is 48. We note that the search domains of some problems (Bohachevsky, Griewank, Matyas, Rastrigin, Sphere, Sum squares) are modified in order to avoid the global minimum lying at the centroid of the feasible region. Specifically, for each dimension, the lower bound is multiplied by 0.8 while the upper bound is multiplied by 1.25. Both DIRECT-b and RDIRECT-b adopt the same parameters as those listed in (8). We also adopt condition (9) to measure the efficiency. In the case f_global = 0, we instead require that the following condition holds:

f_min < perror.    (10)
Because the test set is large, we adopt data profiles [18] and performance profiles [5,18] to exhibit the numerical results. Data profiles are a tool for analyzing the performance of derivative-free optimization solvers [18]. Given a budget of μ_f function evaluations, this technique records the best function values obtained until the following convergence test is satisfied:

f(x0) − f(x) ≥ (1 − τ)( f(x0) − f_L ),    (11)
where x0 is the starting point, τ > 0 is a tolerance and f_L is the smallest objective function value obtained by any solver within μ_f function evaluations. Then the memorized information of function values is used to define the data profiles and (or) the performance profiles; see [18] for more details.
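The following Python sketch shows one way to turn the test (11) into a data profile in the spirit of [18]; the bookkeeping (best-so-far histories per solver and per problem) is our assumption about how the raw data are stored.

```python
import numpy as np

def evals_to_converge(history, f0, fL, tau=1e-7):
    """Number of function evaluations a solver needs to satisfy test (11)
    on one problem, given its best-so-far history; np.inf if it never does.

    history : best function value found after 1, 2, ... evaluations
    f0      : function value at the starting point x0
    fL      : smallest value obtained by any solver within the budget mu_f
    """
    target = f0 - (1.0 - tau) * (f0 - fL)        # f(x) must drop to this level
    hist = np.asarray(history, dtype=float)
    hit = np.nonzero(hist <= target)[0]
    return hit[0] + 1 if hit.size else np.inf

def data_profile(costs, dims, alphas):
    """Data profile: fraction of problems a solver solves (in the sense of (11))
    within alpha simplex gradients, i.e. alpha*(n_p + 1) function evaluations [18].

    costs  : per-problem evaluation counts for one solver (from evals_to_converge)
    dims   : problem dimensions n_p
    alphas : grid of budgets measured in simplex gradients
    """
    scaled = np.array([c / (n + 1) for c, n in zip(costs, dims)])
    return np.array([np.mean(scaled <= a) for a in alphas])
```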
Table 6 Key characteristics of the Hedar test set

Problem               n            Ω                    f(x∗)
Ackley                2,5,10,20    [−15, 30]^n          0
Beale                 2            [−4.5, 4.5]^2        0
Bohachevsky 1         2            [−80, 125]^2         0
Bohachevsky 2         2            [−80, 125]^2         0
Bohachevsky 3         2            [−80, 125]^2         0
Booth                 2            [−100, 100]^2        0
Branin                2            [−5, 10] × [0, 15]   0.397887357729739
Colville              4            [−10, 10]^4          0
Dixon Price           2,5,10,20    [−10, 10]^n          0
Easom                 2            [−100, 100]^2        −1
Goldstein and Price   2            [−2, 2]^2            3
Griewank              2,5,10,20    [−480, 750]^n        0
Hartman 3             3            [0, 1]^3             −3.86278214782076
Hartman 6             6            [0, 1]^6             −3.32236801141551
Hump                  2            [−5, 5]^2            0
Levy                  2,5,10,20    [−10, 10]^n          0
Matyas                2            [−8, 12.5]^2         0
Michalewicz           2            [0, π]^2             −1.80130341008983
Michalewicz           5            [0, π]^5             −4.687658179
Michalewicz           10           [0, π]^10            −9.66015
Perm                  4            [−4, 4]^4            0
Powell                4,12,24,48   [−4, 5]^n            0
Power sum             4            [0, 4]^4             0
Rastrigin             2,5,10,20    [−4.1, 6.4]^n        0
Rosenbrock            2,5,10,20    [−5, 10]^n           0
Schwefel              2,5,10,20    [−500, 500]^n        0
Shekel 5              4            [0, 10]^4            −10.1531996790582
Shekel 7              4            [0, 10]^4            −10.4029405668187
Shekel 10             4            [0, 10]^4            −10.5364098166920
Shubert               2            [−10, 10]^2          −186.730908831024
Sphere                2,5,10,20    [−4.1, 6.4]^n        0
Sum squares           2,5,10,20    [−8, 12.5]^n         0
Trid                  6            [−36, 36]^6          −50
Trid                  10           [−100, 100]^10       −200
Zakharov              2,5,10,20    [−5, 10]^n           0
In this subsection, we let μ_f = 5000, which allows us to compute at least 100 simplex gradients for the problem with the maximal dimension (48).
Fig. 3 Data profile (left) and performance profile (right) on the Hedar test set
Because we care about solutions with high accuracy, we set τ = 1e−7. Figure 3 shows the obtained data profiles (left subfigure) and performance profiles (right subfigure) for the Hedar test set. Roughly speaking, for both data profiles and performance profiles, a solver whose profile lies to the upper left is better than one whose profile lies to the lower right.
From Fig. 3 we can see that DIRECT-b is the best solver among these four algorithms. RDIRECT-b performs worse than DIRECT-b but much better than DIRECT and RDIRECT. From the performance profiles we can see that DIRECT-b solves slightly less than 60 % of the problems with the best efficiency and finally solves about 80 % of the problems. RDIRECT-b solves less than 40 % of the problems with the best efficiency and finally solves about 74 % of the problems. Similarly, RDIRECT and DIRECT solve about 20 and 16 % of the problems with the best efficiency, and finally solve about 58 and 56 % of the problems, respectively. In other words, the performance gap between DIRECT-b and DIRECT is about 24 % (= 80 − 56 %), and the gap between RDIRECT-b and RDIRECT is about 16 % (= 74 − 58 %). Thus we can say that the bilevel strategy greatly improves the ability of DIRECT and RDIRECT to find solutions with high accuracy.
It is interesting that RDIRECT-b performs worse than DIRECT-b. The reason may lie in the fact that RDIRECT sometimes degrades the performance on well-scaled problems [15]. Through careful comparison, we find that RDIRECT-b performs well on all problems in Hedar's set except four (Griewank, Rastrigin, Rosenbrock and Zakharov). Because all four problems have variable dimensions (2, 5, 10, 20), RDIRECT-b in total performs worse on 16 problems, which degrades its overall performance.
In order to verify the efficiency of the robust DIRECT algorithm [15], we then scale each problem additively by 1e5 and compare the four algorithms again. Figure 4 shows the resulting data profiles and performance profiles. From Fig. 4 we can see that both RDIRECT-b and RDIRECT perform very well: RDIRECT-b can solve 90 % of the problems and RDIRECT can solve 70 %. On the other hand, DIRECT-b only solves 30 % of the problems and DIRECT solves 20 %.
Fig. 4 Data profile (left) and performance profile (right) on the Hedar test set (with scaling)
5 Conclusion and future work
Inspired by the great success of multilevel methods in eliminating the smooth mode phenomenon, in this paper we introduce a bilevel strategy into the original DIRECT algorithm and into the RDIRECT algorithm, which is insensitive to linear scaling of the objective function. The obtained DIRECT-b and RDIRECT-b algorithms are proved to be globally convergent. Numerical results show that the bilevel strategy significantly improves the ability of DIRECT and RDIRECT to find solutions with high accuracy. The results in this paper provide further evidence that the multilevel strategy proposed in our previous work [16] is really helpful in accelerating the convergence of partition-based algorithms.
An interesting property of the bilevel (or multilevel) strategy is that it accelerates convergence within the framework of DIRECT itself. As a result, our approach requires no effort to select a proper local optimizer. Moreover, our approach makes it easy to balance the (fine-level) global search phase and the (coarse-level) local search phase, and its parameters are relatively insensitive. The numerical results in this paper also support the conclusion in [15] that, although RDIRECT eliminates the sensitivity to linear scaling of the objective function, it sometimes degrades the performance on well-scaled problems.
Our work suggests possible improvements to other partition-based global optimization algorithms: as long as the bilevel or multilevel strategy is incorporated properly, the obtained algorithms should gain the ability to find solutions with high accuracy.
Acknowledgments We would like to thank the two anonymous reviewers whose suggestions greatly improved this paper. We would also like to thank Dr. D.E. Finkel and Prof. C.T. Kelley for their DIRECT codes and the Jones test set codes.
References
1. Androulakis, I.P., Maranas, C.D., Floudas, C.A.: αBB: a global optimization method for general constrained nonconvex problems. J. Glob. Optim. 7(4), 337–363 (1995)
2. Björkman, M., Holmström, K.: Global optimization using the DIRECT algorithm in Matlab. Adv. Model. Optim. 1, 17–37 (1999)
3. Briggs, W.L., Henson, V.E., McCormick, S.: A Multigrid Tutorial. SIAM, Philadelphia (2000)
4. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153, 235–256 (2007)
5. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
6. Finkel, D.E.: DIRECT optimization algorithm user guide. Center for Research in Scientific Computation CRSC-TR03-11, North Carolina State University, Raleigh (2003)
7. Finkel, D.E.: Global optimization with the DIRECT algorithm. Ph.D. Thesis, North Carolina State University (2005)
8. Finkel, D.E., Kelley, C.T.: Additive scaling and the DIRECT algorithm. J. Glob. Optim. 36, 597–608 (2006)
9. Gablonsky, J.M.: Modifications of the DIRECT algorithm. Ph.D. Thesis, North Carolina State University (2001)
10. Gablonsky, J.M., Kelley, C.T.: A locally-biased form of the DIRECT algorithm. J. Glob. Optim. 21, 27–37 (2001)
11. Hedar, A.: http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO.htm
12. Huyer, W., Neumaier, A.: Global optimization by multilevel coordinate search. J. Glob. Optim. 14(4), 331–355 (1999)
13. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
14. Jones, D.R.: The DIRECT global optimization algorithm. In: The Encyclopedia of Optimization. Kluwer Academic, Dordrecht (1999)
15. Liu, Q.: Linear scaling and the DIRECT algorithm. J. Glob. Optim. 56(3), 1233–1245 (2013)
16. Liu, Q., Zeng, J.: Global optimization by multilevel partition. J. Glob. Optim. (revised)
17. Liuzzi, G., Lucidi, S., Piccialli, V.: A DIRECT-based approach exploiting local minimizations for the solution of large-scale global optimization problems. Comput. Optim. Appl. 45(2), 353–375 (2010)
18. Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009)
19. Pardalos, P.M., Schoen, F.: Recent advances and trends in global optimization: deterministic and stochastic methods. In: Proceedings of the Sixth International Conference on Foundations of Computer-Aided Process Design, DSI 1–2004, pp. 119–131 (2004)
20. Rios, L.M., Sahinidis, N.V.: Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Glob. Optim. 56(3), 1247–1293 (2013)
21. Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005)
22. Xu, J.: An Introduction to Multilevel Methods. Oxford University Press, Oxford (1997)