J Glob Optim DOI 10.1007/s10898-013-0119-1
A modified DIRECT algorithm with bilevel partition Qunfeng Liu · Wanyou Cheng
Received: 8 February 2013 / Accepted: 17 October 2013 © Springer Science+Business Media New York 2013
Abstract It has been pointed out by D. R. Jones that the DIRECT global optimization algorithm can quickly get close to the basin of the optimum but takes much longer to achieve a high degree of accuracy. In this paper, we introduce a bilevel strategy into a modified DIRECT algorithm to overcome this shortcoming. The proposed algorithm is proved to be globally convergent. Numerical results show that the proposed algorithm is very promising.
Keywords Global optimization · Robust DIRECT algorithm · Bilevel method
1 Introduction
In the mathematical modeling of real-world systems across a broad range of applications, one often needs to solve global optimization problems. The objective functions of many such problems are multivariate and nonlinear, i.e., there are often multiple local minima. The global minimum, a point where the objective function attains its smallest value, is often hard to find. Algorithms for solving global optimization problems can be classified into stochastic algorithms and deterministic algorithms [12,19]. Many of the former use analogies to biology and physics to approach the global optimum [12]. The latter include the popular branch-and-bound algorithm and its variants [1,21] and the DIRECT algorithm [13]. In this paper, we consider the following bound-constrained global optimization problem

min_{x∈Ω} f(x),    (1)
This work was supported by MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project No. 13YJC630095) and NSF of China (Nos. 11271069, 11101081). Q. Liu (B) · W. Cheng School of Computing, Dongguan University of Technology, Dongguan 523808, China e-mail:
[email protected] W. Cheng e-mail:
[email protected]
where Ω = {x ∈ R^n | −∞ < l_i ≤ x_i ≤ u_i < ∞, i = 1, . . . , n}. The objective function f(x) is only assumed to be continuous in a neighborhood of the global minimum. We present a deterministic global optimization algorithm for problem (1), which can be regarded as an improvement of the original DIRECT algorithm [13]. Although the DIRECT algorithm is only suitable for small-scale problems, it has stimulated much interest due to its guaranteed convergence and ease of implementation [2,7,9,12,15]. Moreover, in a recent comparison of derivative-free algorithms, two algorithms based on DIRECT (MCS and TOMLAB/GLCCLUSTER) performed very well [20].
The DIRECT (DIviding RECTangles) algorithm proposed in [13] is a deterministic sampling algorithm for problem (1), where the dimension n is often small. In the initialization stage of the DIRECT algorithm, the original search space Ω is normalized into a unit hypercube. Every iteration of the DIRECT algorithm consists of two key processes. The first is to identify hyperrectangles that may contain optimal points; these are called potentially optimal hyperrectangles (see Definition 1 below). The second is to sample inside these potentially optimal hyperrectangles and divide them into smaller hyperrectangles. In [13], the set of centers of the hyperrectangles created by the DIRECT algorithm is shown to be dense in the original search space Ω. Therefore, if the objective function is continuous in a neighborhood of the global minimum, the DIRECT algorithm will deterministically sample a point arbitrarily close to the global minimum.
In [7,8], numerical experiments showed that the original DIRECT algorithm is sensitive to additive scaling of the objective function. The modified DIRECT-median algorithm was proposed in [8] to eliminate this sensitivity. In [15], theoretical results were given which show that the original DIRECT algorithm is sensitive not only to additive scaling but also to linear scaling of the objective function. In order to eliminate these sensitivities, the definition of the potentially optimal hyperrectangle is changed slightly in [15]; specifically, the definition of the minimally acceptable improvement is made insensitive to linear scaling. A class of DIRECT algorithms is then proved to be robust in [15], which includes DIRECT-median [8] and DIRECT-a [15] as special cases. We note that, in this paper, saying an algorithm is "robust" means that it is insensitive to linear scaling of the objective function.
Although robustness can thus be obtained, the original DIRECT algorithm has another defect. It is pointed out in [14] that DIRECT suffers from an eventually inefficient behavior: DIRECT quickly gets close to the basin of the optimum but takes much longer to achieve a high degree of accuracy. Actually, such a defect is common for derivative-free optimization algorithms. A common strategy to eliminate it is to combine DIRECT with a good local optimizer, as in [17]. However, because local optimizers are often sensitive to the initial point, this strategy is often highly sensitive to the number of function evaluations consumed in the global phase.
In this paper, we adopt a bilevel strategy to improve the ability of DIRECT to achieve solutions with a high degree of accuracy. In engineering and scientific computing, "bilevel" has two different meanings.
The first is bilevel optimization: an upper-level problem and a lower-level problem are employed, and the value of a constraint or objective function in the upper-level problem is itself the result of solving a lower-level optimization problem [4]. The second is bilevel (or multilevel) methods, which are among the most efficient modern techniques for solving large-scale algebraic systems arising from the discretization of partial differential equations (PDEs) [3,22]. In this paper, "bilevel" has the latter meaning. The bilevel method is designed to overcome the so-called "smooth mode phenomenon", namely the fact that part of the residual is hard to eliminate if only a single-level algorithm is used to solve a linear system. The key idea in bilevel methods is to reduce the
residual of the fine-level (large-scale) algebraic system discretized from a PDE by solving a coarse-level (small-scale) algebraic system discretized from the same PDE. Such a strategy eliminates the smooth mode phenomenon and is proved to be almost optimal [3,22]. Because the same algorithm can be adopted to solve both the fine-level and the coarse-level algebraic systems, multilevel methods are easily obtained by applying the bilevel method recursively [22].
In our previous work [16], we pointed out that the eventually inefficient behavior of partition-based algorithms (e.g., DIRECT) is similar to the smooth mode phenomenon. Inspired by the great success of multilevel methods in eliminating the smooth mode, we proposed in [16] a multilevel partition-based algorithmic framework for global optimization problems, together with a DIRECT-based multilevel partition algorithm. Preliminary numerical results showed that the multilevel strategy really helps to eliminate DIRECT's eventually inefficient behavior.
In this paper, we combine the multilevel partition strategy and the robust DIRECT algorithm. Our main purpose is to improve DIRECT's ability to find solutions with a high level of accuracy within a reasonable computational cost. More precisely, we adopt a robust DIRECT (RDIRECT) algorithm to search not only in the unit hypercube obtained by normalizing the original search space (we call it the fine-level search space) but also in the coarse-level search space, which is a subset of the whole set of hyperrectangles created in the fine level. Such a bilevel strategy allows us to switch to searching in the coarse level before the inefficient behavior appears in the fine level. We hope that the bilevel strategy can accelerate convergence through searching in the coarse search space. Although it is easy to obtain a robust DIRECT algorithm with multilevel search, we only apply the bilevel method, because our preliminary numerical results show that the multilevel strategy brings no clear improvement over the bilevel strategy.
We call the obtained algorithm RDIRECT-b. We prove that RDIRECT-b is globally convergent. Numerical results presented in this paper show that RDIRECT-b performs much better than the original DIRECT algorithm, and better than the RDIRECT algorithm, too. Although additional parameters are introduced in the RDIRECT-b algorithm, their default values seem to work well for most problems. In our numerical experiments, the bilevel DIRECT (DIRECT-b) algorithm is also compared. Our numerical results show that, when a problem is well scaled, RDIRECT-b sometimes performs worse than DIRECT-b; however, when the problem is badly scaled, RDIRECT-b performs much better than DIRECT-b. This conclusion is similar to that in [15].
We organize this paper as follows. In Sect. 2, we describe the RDIRECT-b algorithm. In Sect. 3, we prove the global convergence of the RDIRECT-b algorithm. We report numerical experiments in Sect. 4 and give some conclusions in the final section.
2 The RDIRECT-b algorithm
In this section, we describe the RDIRECT-b algorithm in detail. In every iteration of the RDIRECT-b algorithm there are two levels of search spaces: one is the fine-level search space, the other is the coarse-level search space. The fine-level search space is the whole set of hyperrectangles. In the beginning, there is only one unit hypercube in the fine level. We sample 2n points in the unit hypercube and compute the function values at these points. Specifically, if we denote c1 as the center point of the unit hypercube, then we sample c1 ± (1/3)e_i, i = 1, . . . , n, where e_i is the ith unit vector.
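As a small illustration of this initialization (not the authors' Matlab code), the following Python sketch normalizes the box Ω to the unit hypercube and evaluates f at the center and at the 2n points c1 ± (1/3)e_i; the helper names are hypothetical.

```python
import numpy as np

def denormalize(x_unit, lower, upper):
    """Map a point in the unit hypercube back to the original box [lower, upper]."""
    return lower + x_unit * (upper - lower)

def initial_sampling(f, lower, upper):
    """Evaluate f at the center c1 of the unit cube and at c1 +/- (1/3) e_i, i = 1..n."""
    n = len(lower)
    c1 = np.full(n, 0.5)                      # center of the unit hypercube
    points = [c1]
    for i in range(n):
        for sign in (+1.0, -1.0):
            p = c1.copy()
            p[i] += sign / 3.0                # c1 +/- (1/3) e_i
            points.append(p)
    values = [f(denormalize(p, np.asarray(lower), np.asarray(upper))) for p in points]
    return points, values
```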
Fig. 1 Two dimensional example of sampling and division in the fine level
After the function is evaluated at these sampled points, the unit hypercube needs to be divided into 2n + 1 smaller hyperrectangles, whose centers are the sampled points and the center of the original hypercube. We adopt the same division strategy as the original DIRECT algorithm: the hypercube is divided into three equal parts along each dimension, and dimensions with low function values are divided first. Figure 1 illustrates a two-dimensional example of sampling and division in the first iteration in the fine level.
After the division, the fine level contains many smaller hyperrectangles. At the next step, we choose some potentially optimal hyperrectangles, sample inside them, and then divide them further. A potentially optimal hyperrectangle is a hyperrectangle which may contain good, unsampled points. For the RDIRECT-b algorithm, the definition of a potentially optimal hyperrectangle is given as follows [15].

Definition 1 Given ε > 0 and the index set I of the hyperrectangles. Let f_min be the known minimal function value, and let c_i, σ_i be the center and the size of the ith hyperrectangle. Denote by f_cc any convex combination of the objective function values gathered during previous iterations of RDIRECT-b. If there is a constant γ > 0 such that the following inequalities hold,

f(c_j) − γ σ_j ≤ f(c_i) − γ σ_i,  ∀i ∈ I,    (2a)
f(c_j) − γ σ_j ≤ f_min − ε | f_min − f_cc |,    (2b)
then the jth hyperrectangle is a potentially optimal hyperrectangle (POH).

Remark 1 In the above definition, convex combination means that, for the gathered function values { f(x_k) }_{k=1}^{K}, we select α_k ∈ [0, 1], k = 1, . . . , K, such that Σ_{k=1}^{K} α_k = 1 and then define

f_cc = Σ_{k=1}^{K} α_k f(x_k).    (3)

In this paper, f_cc equals the median value of the gathered function values { f(x_k) }_{k=1}^{K}.
Remark 2 If we replace condition (2b) with the following condition

f(c_j) − γ σ_j ≤ f_min − ε | f_min |,    (4)

then we obtain the definition of a potentially optimal hyperrectangle for the DIRECT-b algorithm.
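To make Definition 1 concrete, here is a minimal Python sketch (our own reading, not the authors' implementation) that checks conditions (2a) and (2b) by bounding the admissible slopes γ; f_cc is taken as supplied, e.g. the median of the gathered values as in Remark 1.

```python
import numpy as np

def potential_optimal(centers_f, sizes, f_min, f_cc, eps=1e-4):
    """Return indices of potentially optimal hyperrectangles (Definition 1).

    centers_f : f(c_i) for each hyperrectangle
    sizes     : sizes sigma_i of the hyperrectangles
    f_min     : best function value found so far
    f_cc      : a convex combination of gathered values (the median, Remark 1)
    eps       : the balance parameter epsilon in condition (2b)
    """
    centers_f = np.asarray(centers_f, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    threshold = f_min - eps * abs(f_min - f_cc)       # right-hand side of (2b)
    poh = []
    for j in range(len(sizes)):
        same = sizes == sizes[j]
        if np.any(centers_f[same] < centers_f[j]):
            continue                                  # (2a) fails against an equal-size box
        smaller, larger = sizes < sizes[j], sizes > sizes[j]
        # lower bounds on gamma coming from smaller boxes and from (2b)
        lo = 0.0
        if np.any(smaller):
            lo = max(lo, np.max((centers_f[j] - centers_f[smaller]) /
                                (sizes[j] - sizes[smaller])))
        lo = max(lo, (centers_f[j] - threshold) / sizes[j])
        # upper bound on gamma coming from larger boxes
        hi = np.inf
        if np.any(larger):
            hi = np.min((centers_f[larger] - centers_f[j]) /
                        (sizes[larger] - sizes[j]))
        if lo <= hi and hi > 0:                       # a positive gamma exists
            poh.append(j)
    return poh
```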
Fig. 2 POHs can be identified graphically
Determining the set of POHs can be regarded as the problem of finding the extreme points on the lower-right convex hull of a set of points in the plane. In [13], a special figure is designed to help understand Definition 1 and to identify the set of POHs. Figure 2 illustrates an example. In Fig. 2, every dot corresponds to a hyperrectangle; the horizontal coordinate is the size of the hyperrectangle, and the vertical coordinate is the objective function value at its center. The three blue dots are the current POHs. The balance parameter ε prevents the dot with the best function value from being a POH.
After the current POHs are identified, we sample 2n points inside each POH and divide them further. We summarize the sampling process and the dividing process in the following algorithm.

Algorithm 1 Sampling and dividing algorithm
Initialization: Let ξ be the maximal side length of the hyperrectangle to be divided, and let I be the set of dimensions with side length ξ.
– Sampling: For all i ∈ I, evaluate the objective function at c ± (1/3)ξ e_i, where c is the center point of the hyperrectangle.
– Dividing: Let ω_i = min{ f(c + (1/3)ξ e_i), f(c − (1/3)ξ e_i) }. Divide the hyperrectangle containing c into thirds along the dimensions in I, starting with the dimension with the smallest ω_i and continuing up to the dimension with the largest ω_i.
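A rough Python sketch of Algorithm 1 follows; it assumes a hyperrectangle is stored as a (center, side_lengths) pair in the unit hypercube. The representation and helper names are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def sample_and_divide(f, center, sides):
    """Algorithm 1: sample along the longest sides and trisect the hyperrectangle.

    center : center point of the hyperrectangle (in the unit hypercube)
    sides  : side lengths of the hyperrectangle
    Returns the list of (center, sides) pairs of the resulting smaller boxes
    and a dict mapping sampled points (as tuples) to their function values.
    """
    xi = np.max(sides)
    I = [i for i in range(len(sides)) if sides[i] == xi]   # longest dimensions
    samples, omega = {}, {}
    for i in I:
        vals = []
        for sign in (+1.0, -1.0):
            p = center.copy()
            p[i] += sign * xi / 3.0                        # c +/- (1/3) xi e_i
            samples[tuple(p)] = f(p)
            vals.append(samples[tuple(p)])
        omega[i] = min(vals)                               # omega_i = min of the two values
    # trisect along the longest dimensions, smallest omega_i first;
    # the two outer thirds become new boxes, the middle third is divided further
    boxes = []
    mid_sides = sides.copy()
    for i in sorted(I, key=omega.get):
        mid_sides = mid_sides.copy()
        mid_sides[i] /= 3.0
        for sign in (+1.0, -1.0):
            c_new = center.copy()
            c_new[i] += sign * xi / 3.0                    # centers are the sampled points
            boxes.append((c_new, mid_sides.copy()))
    boxes.append((center.copy(), mid_sides))               # the box around the original center
    return boxes, samples
```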
We execute N1 iterations of the RDIRECT algorithm in the fine-level search space. In each iteration, we identify a set of POHs, then sample and divide inside them. After N1 iterations in the fine level, let S_f be the current set of hyperrectangles, whose cardinality is denoted by |S_f|. We then select the r0|S_f| smallest hyperrectangles in S_f to constitute the coarse level, where r0 ∈ (0, 1) is a constant. We execute N2 iterations of RDIRECT in the coarse-level search space. After these N2 iterations, let S_c be the set of hyperrectangles created in the coarse level. We then go back to the fine level, update the set S_f := S_f ∪ S_c, and execute N3 iterations of RDIRECT.
We summarize the RDIRECT-b algorithm as follows.

Algorithm 2 RDIRECT-b
Initialization: Normalize the search space into a unit hypercube and let c1 be its center. Compute f(c1) and set f_min = f(c1). Denote by S_f and S_c the sets of hyperrectangles created in the fine level and in the coarse level, respectively.
while (stopping conditions do not hold) do
– For i = 1 : N1   % Sampling and dividing in the fine level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_f.
– Select the r0|S_f| smallest hyperrectangles in S_f, and let S_c denote the set of these selected hyperrectangles.
– For i = 1 : N2   % Sampling and dividing in the coarse level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_c.
– Update S_f = S_f ∪ S_c and let S_c = [ ].
– For i = 1 : N3   % Sampling and dividing in the fine level
  • Identify all the POHs according to Definition 1;
  • Sample and divide every POH by Algorithm 1, and update f_min and S_f.
end while
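The bilevel structure of Algorithm 2 can be sketched as below. Here `rdirect_iteration` stands for one RDIRECT iteration on a given set of hyperrectangles (identify the POHs of Definition 1, then sample and divide them with Algorithm 1), and `select_smallest` for the selection rule of Remark 4; both are passed in as callables because their names and signatures are our assumptions, not part of the paper.

```python
def rdirect_b(fine_boxes, rdirect_iteration, select_smallest,
              budget, N1=3, N2=1, N3=1, r0=0.1):
    """Skeleton of the bilevel loop of Algorithm 2 (RDIRECT-b).

    fine_boxes        : initial fine-level set S_f (the divided unit hypercube)
    rdirect_iteration : callable doing one RDIRECT iteration on a set of boxes;
                        returns (updated_boxes, n_new_evaluations)
    select_smallest   : callable implementing the selection rule of Remark 4
    """
    S_f, evals = list(fine_boxes), 0
    while evals < budget:
        for _ in range(N1):                              # fine-level iterations
            S_f, new = rdirect_iteration(S_f)
            evals += new
        S_c = select_smallest(S_f, int(r0 * len(S_f)))   # coarse level: r0*|S_f| smallest boxes
        for _ in range(N2):                              # coarse-level iterations
            S_c, new = rdirect_iteration(S_c)
            evals += new
        S_f = S_f + [b for b in S_c if b not in S_f]     # S_f := S_f U S_c
        for _ in range(N3):                              # back to the fine level
            S_f, new = rdirect_iteration(S_f)
            evals += new
    return S_f
```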
Remark 3 If we let N2 = 0, then the RDIRECT-b algorithm reduces to the RDIRECT algorithm.

Remark 4 In Algorithm 2, the "r0|S_f| smallest hyperrectangles" are often not uniquely defined because many hyperrectangles have the same size, and there are different ways to break ties in practice. In our implementation, we sort the hyperrectangles lexicographically, first by size and then by the function value at the center, and select the first r0|S_f| hyperrectangles. If some hyperrectangles have both the same size and the same function value, we select the top ones; a sketch of this rule is given below.

Remark 5 If condition (2b) in Definition 1 is replaced by condition (4), then the RDIRECT-b algorithm becomes the DIRECT-b algorithm.

The RDIRECT-b algorithm terminates when a user-supplied budget of function evaluations or iterations is exhausted.
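A possible reading of the selection rule in Remark 4 (the attribute names `size` and `f_center` are hypothetical):

```python
def select_smallest(boxes, k):
    """Pick the k smallest hyperrectangles, ordering lexicographically by
    (size, function value at the center) and taking the top ones on ties."""
    return sorted(boxes, key=lambda b: (b.size, b.f_center))[:k]
```

With the default r0 = 0.1 this would be called, for example, as select_smallest(S_f, round(0.1 * len(S_f))); the exact rounding rule is not specified in the paper.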
3 Convergence analysis of the RDIRECT-b algorithm
In this section, we assume that the search space Ω is the unit hypercube. This does not affect the results but simplifies the proof of convergence. Let C_k be the set of points that have been sampled after k iterations of RDIRECT-b. Here, an iteration of RDIRECT-b consists of N1 + N3 iterations of RDIRECT in the fine level and N2 iterations of RDIRECT in the coarse level. Because the RDIRECT-b algorithm does not terminate before the budget of function evaluations is exhausted, in this section we let the budget be infinite and denote C = ∪_k C_k. We will show that the set C is dense in the unit hypercube and that, if f is continuous in a neighborhood of the global minimum, the RDIRECT-b algorithm converges to the globally optimal function value.

Theorem 1 Given any point x in the unit hypercube and any δ > 0, there is a K ∈ N such that for any k > K there is a y ∈ C_k satisfying

|y − x| < δ.    (5)
Proof We argue by contradiction. Assume that there are a point x ∈ Ω and a constant δ > 0 such that, for any K ∈ N, there is a k > K with |y − x| ≥ δ for all y ∈ C_k. Since C_k ⊆ C_{k+1} for all k, this means that, with B_δ(x) = {z | |z − x| < δ},

B_δ(x) ∩ C_k = ∅, ∀k ∈ N.    (6)

Denote by H the smallest hyperrectangle which contains B_δ(x) after K_1 iterations; then H will be divided within finitely many iterations [9]. Suppose that H is divided in the K_2-th (K_2 > K_1) iteration. If a point of B_δ(x) is sampled during the K_2-th iteration, then we have a contradiction with (6). Otherwise, denote by H′ the smallest hyperrectangle which contains B_δ(x) after K_2 iterations. Similarly, H′ will be divided within finitely many iterations. Repeating the same process, we can show that the RDIRECT-b algorithm samples at least one point in B_δ(x) within finitely many iterations, which contradicts (6). This ends the proof.

Theorem 1 shows that the set of points sampled by the RDIRECT-b algorithm is dense in Ω. The following result is a straightforward corollary of Theorem 1.

Corollary 1 Assume that f is continuous in a neighborhood of the global minimum x∗. Then for any δ > 0 the RDIRECT-b algorithm samples a point y such that

| f(y) − f(x∗) | < δ.    (7)
Remark 6 Comparing Theorem 1 with the convergence result in [13], we find that the sampling and dividing process in the fine level plays a different role from that in the coarse level: the former ensures convergence, while the latter accelerates it, as supported by the numerical results below.
4 Numerical experiments
In this section, we report some numerical results. First, we report results on the Jones test set, which has been used to test the performance of the original DIRECT algorithm [13] and its variants [2,7,8,10,15,16]. Second, we examine the sensitivity of the parameters. Third, we adopt performance profiles [5] and data profiles [18] to compare the proposed RDIRECT-b and other algorithms on an extended test set.

4.1 Test on the Jones test set
In this subsection, we compare the following algorithms:
– DIRECT: the original DIRECT algorithm proposed in [6,13].
– RDIRECT: the DIRECT-median algorithm proposed in [8]; see [15] for the proof of its insensitivity to linear scaling of the objective function.
– DIRECT-b: the bilevel form of the original DIRECT algorithm. It is the same as the GOMP-T algorithm proposed in [16].
– RDIRECT-b: the bilevel form of the RDIRECT algorithm.
Table 1 Key characteristics of the Jones test set

Function              Abbreviation   Problem dimension   Number of local minima   Number of global minima
Shekel 5              S5             4                   5                        1
Shekel 7              S7             4                   7                        1
Shekel 10             S10            4                   10                       1
Hartman 3             H3             3                   4                        1
Hartman 6             H6             6                   4                        1
Branin RCOS           BR             2                   3                        3
Goldstein and Price   GP             2                   4                        1
Six-Hump Camel        C6             2                   6                        2
2D Shubert            SHU            2                   760                      18
In our experiments, we use the Matlab implementation of DIRECT written by Finkel, which can be downloaded from http://www4.ncsu.edu/~ctk/Finkel_Direct/; see [6] for its user guide. We use the following parameters to implement the bilevel strategy:

N1 = 3, N2 = 1, N3 = 1, r0 = 0.1.    (8)
That is to say, we first perform 3 iterations of RDIRECT (or of DIRECT for DIRECT-b) in the fine level, select 10 % of the obtained hyperrectangles (see Remark 4 for details) to form the coarse level, perform 1 iteration of RDIRECT (or DIRECT) in the coarse level, and then go back to the fine level, where another 1 iteration of RDIRECT (or DIRECT) is performed. This constitutes one iteration of the RDIRECT-b (or DIRECT-b) algorithm.
In this subsection, the test problems are the Jones test set used in [2,8,10,13,15]. The Jones test set contains nine test problems; Table 1 shows their key characteristics, where the number of local minima includes the number of global minima. In our experiments, efficiency is measured by the number of function evaluations needed for convergence, and we use the same definition of convergence as in [13]. Specifically, convergence is defined in terms of the percent error from the globally optimal function value (denoted by f_global). Let f_min denote the best function value found; then the convergence test requires that the percent error satisfy

( f_min − f_global ) / | f_global | < perror.    (9)
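For reference, the convergence test can be coded directly; the absolute test for f_global = 0 used later in Sect. 4.3 (condition (10)) is included as a second branch.

```python
def converged(f_min, f_global, perror):
    """Stopping test: percent error (9), or the absolute test (10) when f_global = 0."""
    if f_global == 0:
        return f_min < perror                              # condition (10), Sect. 4.3
    return (f_min - f_global) / abs(f_global) < perror     # condition (9)
```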
In our experiments, perror = 1e−4, 1e−5, 1e−6, 1e−7, 1e−8 and 1e−9, respectively. Table 2 shows the numerical results, where '–' means that the convergence criterion (9) is not satisfied within 500,000 function evaluations. The number marked with * differs from the result (285) reported in [7,13]. As we can see from Table 2, the proposed RDIRECT-b algorithm performs very well. First, RDIRECT-b solves all 54 problem/tolerance combinations, whereas the RDIRECT algorithm fails in one case (H6, perror = 1e−9), DIRECT-b fails in 10 cases, and the original DIRECT algorithm fails in 12 cases.
Table 2 Numerical results on the Jones test set

Prob.  Solver      perror
                   1e−4    1e−5    1e−6     1e−7     1e−8    1e−9
S5     DIRECT      155     255     255      53,525   –       –
       DIRECT-b    159     265     265      18,043   –       –
       RDIRECT     155     255     255      513      777     1,217
       RDIRECT-b   159     251     251      353      551     1,031
S7     DIRECT      145     1,061   4,879    38,167   –       –
       DIRECT-b    157     321     2,543    3,523    –       –
       RDIRECT     145     255     331      373      949     1,665
       RDIRECT-b   157     255     325      347      759     989
S10    DIRECT      145     1,131   4,939    –        –       –
       DIRECT-b    157     321     2,385    265,199  –       –
       RDIRECT     145     255     331      565      1,019   2,115
       RDIRECT-b   157     255     325      465      757     979
H3     DIRECT      199     695     751      775      775     775
       DIRECT-b    173     753     809      843      843     843
       RDIRECT     199     459     773      815      815     815
       RDIRECT-b   173     497     853      895      895     895
H6     DIRECT      571     1,031   182,623  –        –       –
       DIRECT-b    559     959     107,329  –        –       –
       RDIRECT     571     915     1,191    1,681    4,293   173,517
       RDIRECT-b   559     877     1,209    2,049    3,473   6,027
BR     DIRECT      195     195     377      1,295    38,455  75,713
       DIRECT-b    159     159     431      1,167    14,777  65,155
       RDIRECT     259     333     393      1,205    5,079   6,897
       RDIRECT-b   181     181     287      519      923     1,399
GP     DIRECT      191     241     305      1,479    10,437  –
       DIRECT-b    175     229     271      1,217    5,119   40,963
       RDIRECT     191     321     535      1,113    3,395   11,337
       RDIRECT-b   175     235     373      537      895     1,303
C6     DIRECT      145*    211     211      211      44,537  203,301
       DIRECT-b    115     115     115      115      38,847  62,747
       RDIRECT     145     213     213      213      1,365   2,903
       RDIRECT-b   115     115     115      115      533     533
SHU    DIRECT      2,967   3,143   3,867    15,915   68,667  –
       DIRECT-b    2,837   3,025   3,789    16,505   27,463  –
       RDIRECT     3,663   3,839   4,407    6,671    10,955  15,111
       RDIRECT-b   3,501   3,641   4,259    6,571    11,111  12,989
Second, RDIRECT-b solves most (40) of the 54 problem/tolerance combinations with the best efficiency. In particular, when the required accuracy is high (perror is small), RDIRECT-b needs a much smaller cost than the other three algorithms. Even when the required accuracy is low, RDIRECT-b does not perform badly: for example, when perror = 1e−4, RDIRECT-b still solves 4 of the 9 problems with the best efficiency.
Moreover, we can see from Table 2 that the DIRECT-b algorithm also performs better than the original DIRECT algorithm.

4.2 Test of the parameters' sensitivity
In this subsection, we change the parameter combinations to test their sensitivity. We explore r0 = 0.05, 0.10, 0.25 and (N1 + N3) : N2 = 2, 4, 6. As a result, we have nine combinations. Specifically, we compare the following nine RDIRECT-b variants with the RDIRECT algorithm:
– RDIRECT-b1: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b2: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b3: RDIRECT-b with N1 = 1, N2 = 1, N3 = 1, r0 = 0.25;
– RDIRECT-b4: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b5: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b6: RDIRECT-b with N1 = 3, N2 = 1, N3 = 1, r0 = 0.25;
– RDIRECT-b7: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.05;
– RDIRECT-b8: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.10;
– RDIRECT-b9: RDIRECT-b with N1 = 5, N2 = 1, N3 = 1, r0 = 0.25.
In this subsection, we adopt the same convergence criterion (9) to measure the efficiency of these algorithms on the Jones test set. The results are listed in Tables 3 and 4, where '–' means that the convergence criterion (9) is not satisfied within 25,000 function evaluations. From Tables 3 and 4 we can see that RDIRECT-b performs better than RDIRECT for all nine problems except the H3 problem and the SHU problem. The H3 problem seems too easy for both RDIRECT and RDIRECT-b, so the bilevel search strategy has no clear advantage there. On the SHU problem, RDIRECT-b and RDIRECT have similar performance.
Among the 81 algorithm/problem combinations (9 RDIRECT-b variants and 9 problems), the RDIRECT-b variants perform significantly better than RDIRECT in 56 combinations, perform similarly in at least 11 combinations (1 for S5, 8 for H3 and 2 for SHU), and perform worse in at most 14 combinations. Table 5 shows, for each variant, the number of problems on which RDIRECT-b performs significantly better than RDIRECT and the number of problems on which it does not. The results show that RDIRECT-b2 and RDIRECT-b3 perform worse. The reason may lie in the following two facts: first, N1 is too small to locate a good convergence basin of the global minimum; second, r0 = 0.1 or 0.25 makes the search too local. As a result, both RDIRECT-b2 and RDIRECT-b3 pay too much attention to unpromising (coarse-level) regions. If we exclude RDIRECT-b2 and RDIRECT-b3, all other RDIRECT-b variants perform significantly better than RDIRECT. In this sense we say that the parameters in RDIRECT-b are relatively insensitive.
Table 3 Numerical results on testing the parameters' sensitivity

Prob.  Solver        perror
                     1e−4    1e−5    1e−6    1e−7    1e−8     1e−9
S5     RDIRECT       155     255     255     513     777      1,217
       RDIRECT-b1    171     255     275     383     523      793
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    187     283     283     457     581      765
       RDIRECT-b4    171     245     245     381     515      831
       RDIRECT-b5    159     251     251     353     551      1,031
       RDIRECT-b6    173     259     259     385     615      885
       RDIRECT-b7    161     247     247     401     663      1,009
       RDIRECT-b8    165     251     251     415     675      1,087
       RDIRECT-b9    179     299     299     487     759      1,259
S7     RDIRECT       145     255     331     373     949      1,665
       RDIRECT-b1    167     261     317     349     739      981
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    –       –       –       –       –        –
       RDIRECT-b4    –       –       –       –       –        –
       RDIRECT-b5    157     255     325     347     759      989
       RDIRECT-b6    4,475   4,989   6,123   6,269   7,769    8,363
       RDIRECT-b7    151     261     315     339     823      1,391
       RDIRECT-b8    159     255     315     333     825      1,167
       RDIRECT-b9    157     277     341     361     885      1,247
S10    RDIRECT       145     255     331     565     1,019    2,115
       RDIRECT-b1    169     263     319     407     661      801
       RDIRECT-b2    –       –       –       –       –        –
       RDIRECT-b3    3,145   3,743   5,735   8,879   10,779   13,915
       RDIRECT-b4    –       –       –       –       –        –
       RDIRECT-b5    157     255     325     465     757      979
       RDIRECT-b6    2,241   2,525   4,569   5,173   5,989    7,187
       RDIRECT-b7    151     247     295     451     735      1,031
       RDIRECT-b8    159     255     313     513     793      1,489
       RDIRECT-b9    157     277     341     543     831      1,263
H3     RDIRECT       199     459     773     815     815      815
       RDIRECT-b1    243     411     917     959     959      959
       RDIRECT-b2    123     435     937     979     979      979
       RDIRECT-b3    175     335     1,087   1,137   1,137    1,137
       RDIRECT-b4    777     859     879     921     921      921
       RDIRECT-b5    173     497     853     895     895      895
       RDIRECT-b6    185     421     951     959     959      959
       RDIRECT-b7    189     457     823     865     865      865
       RDIRECT-b8    193     455     825     861     861      861
       RDIRECT-b9    189     449     889     929     929      929
Table 4 Numerical results on testing the parameters' sensitivity (continued)

Prob.  Solver        perror
                     1e−4    1e−5    1e−6    1e−7     1e−8     1e−9
H6     RDIRECT       571     915     1,191   1,681    4,293    –
       RDIRECT-b1    687     971     1,151   1,423    1,925    3,141
       RDIRECT-b2    787     1,219   1,435   1,757    2,287    3,625
       RDIRECT-b3    6,993   7,339   7,641   14,317   17,265   21,125
       RDIRECT-b4    593     923     1,179   1,513    2,409    5,021
       RDIRECT-b5    559     877     1,143   1,461    2,535    5,295
       RDIRECT-b6    541     871     1,131   1,529    2,375    5,153
       RDIRECT-b7    593     891     1,113   1,561    2,699    6,973
       RDIRECT-b8    553     905     1,161   1,533    2,865    7,051
       RDIRECT-b9    521     845     1,119   1,499    3,081    6,883
BR     RDIRECT       259     333     393     1,205    5,079    6,897
       RDIRECT-b1    181     181     305     423      627      627
       RDIRECT-b2    151     151     251     337      539      539
       RDIRECT-b3    155     155     289     339      537      537
       RDIRECT-b4    199     199     383     647      901      1,533
       RDIRECT-b5    181     181     287     519      923      1,399
       RDIRECT-b6    183     183     333     591      997      1,909
       RDIRECT-b7    245     245     383     593      977      1,721
       RDIRECT-b8    253     253     391     609      1,005    2,047
       RDIRECT-b9    269     269     375     637      1,041    2,763
GP     RDIRECT       191     321     535     1,113    3,395    11,337
       RDIRECT-b1    173     233     355     521      697      881
       RDIRECT-b2    177     245     367     535      723      917
       RDIRECT-b3    137     181     251     431      645      871
       RDIRECT-b4    175     227     355     673      1,033    1,433
       RDIRECT-b5    175     235     373     537      895      1,303
       RDIRECT-b6    163     249     401     573      955      1,379
       RDIRECT-b7    175     273     483     723      1,263    1,853
       RDIRECT-b8    181     289     471     771      1,335    1,931
       RDIRECT-b9    181     299     521     797      1,381    1,993
C6     RDIRECT       145     213     213     213      1,365    2,903
       RDIRECT-b1    117     237     237     237      623      623
       RDIRECT-b2    105     245     245     245      485      485
       RDIRECT-b3    107     139     139     139      397      397
       RDIRECT-b4    129     227     227     227      637      637
       RDIRECT-b5    115     115     115     115      533      533
       RDIRECT-b6    119     119     119     119      567      567
       RDIRECT-b7    125     223     223     223      951      951
       RDIRECT-b8    125     225     225     225      719      719
       RDIRECT-b9    131     205     205     205      761      761
SHU    RDIRECT       3,663   3,839   4,407   6,671    10,955   15,111
       RDIRECT-b1    3,131   3,271   3,903   6,039    10,069   13,309
       RDIRECT-b2    5,063   5,251   5,951   9,735    16,055   24,617
       RDIRECT-b3    2,367   2,511   2,815   3,975    7,067    7,659
       RDIRECT-b4    3,129   3,273   3,917   6,029    10,049   13,085
       RDIRECT-b5    3,501   3,641   4,259   6,571    11,111   12,989
       RDIRECT-b6    4,695   4,883   5,579   8,279    14,427   17,255
       RDIRECT-b7    3,939   4,123   4,795   7,063    12,103   16,915
       RDIRECT-b8    3,969   4,149   4,825   7,105    11,393   16,469
       RDIRECT-b9    4,391   4,575   5,263   7,703    12,939   15,967
Table 5 Number of problems on which RDIRECT-b performs better (or not better) than RDIRECT

Algorithm    RDIRECT-b1  RDIRECT-b2  RDIRECT-b3  RDIRECT-b4  RDIRECT-b5  RDIRECT-b6  RDIRECT-b7  RDIRECT-b8  RDIRECT-b9  Total
Better       8           4           5           6           8           5           7           7           6           56
Not better   1           5           4           3           1           4           2           2           3           25
4.3 Extended numerical results
In this subsection, we compare the performances of DIRECT, DIRECT-b, RDIRECT and RDIRECT-b on the Hedar test set [11], which largely extends the Jones test set. The Hedar test set has 27 functions; some of them have several variants (e.g., Bohachevsky, Hartman, Trid, etc.), and some of them have variable dimensions. Table 6 shows the function name, dimension, search domain and the known minimal function value. In total, we test 64 problems; the maximal dimension is 48. We note that the search domains of some problems (Bohachevsky, Griewank, Matyas, Rastrigin, Sphere, Sum squares) are modified in order to avoid the global minimum lying at the centroid of the feasible region. Specifically, for each dimension, the lower bound is multiplied by 0.8 while the upper bound is multiplied by 1.25. Both DIRECT-b and RDIRECT-b adopt the same parameters as those listed in (8). We also adopt condition (9) to measure the efficiency. In the case f_global = 0, we instead require that the following condition holds:

f_min < perror.    (10)
Because the test set is large, we adopt data profiles [18] and performance profiles [5,18] to exhibit the numerical results. Data profiles are a tool for analyzing the performance of derivative-free optimization solvers [18]. Given a budget of μ_f function evaluations, this technique records the best function values obtained until the following convergence test is satisfied:

f(x0) − f(x) ≥ (1 − τ)( f(x0) − f_L ),    (11)
where x0 is the starting point, τ > 0 is a tolerance and f_L is the smallest objective function value obtained by any solver within μ_f function evaluations. Then the memorized information of function values is used to define the data profiles and (or) the performance profiles; see [18] for more details.
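The following Python sketch shows one way to turn the test (11) into a data profile in the spirit of [18]; the bookkeeping (best-so-far histories per solver and per problem) is our assumption about how the raw data are stored.

```python
import numpy as np

def evals_to_converge(history, f0, fL, tau=1e-7):
    """Number of function evaluations a solver needs to satisfy test (11)
    on one problem, given its best-so-far history; np.inf if it never does.

    history : best function value found after 1, 2, ... evaluations
    f0      : function value at the starting point x0
    fL      : smallest value obtained by any solver within the budget mu_f
    """
    target = f0 - (1.0 - tau) * (f0 - fL)        # f(x) must drop to this level
    hist = np.asarray(history, dtype=float)
    hit = np.nonzero(hist <= target)[0]
    return hit[0] + 1 if hit.size else np.inf

def data_profile(costs, dims, alphas):
    """Data profile: fraction of problems a solver solves (in the sense of (11))
    within alpha simplex gradients, i.e. alpha*(n_p + 1) function evaluations [18].

    costs  : per-problem evaluation counts for one solver (from evals_to_converge)
    dims   : problem dimensions n_p
    alphas : grid of budgets measured in simplex gradients
    """
    scaled = np.array([c / (n + 1) for c, n in zip(costs, dims)])
    return np.array([np.mean(scaled <= a) for a in alphas])
```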
Table 6 Key characteristics of the Hedar test set

Problem               n            Ω                    f(x∗)
Ackley                2,5,10,20    [−15, 30]^n          0
Beale                 2            [−4.5, 4.5]^2        0
Bohachevsky 1         2            [−80, 125]^2         0
Bohachevsky 2         2            [−80, 125]^2         0
Bohachevsky 3         2            [−80, 125]^2         0
Booth                 2            [−100, 100]^2        0
Branin                2            [−5, 10] × [0, 15]   0.397887357729739
Colville              4            [−10, 10]^4          0
Dixon Price           2,5,10,20    [−10, 10]^n          0
Easom                 2            [−100, 100]^2        −1
Goldstein and Price   2            [−2, 2]^2            3
Griewank              2,5,10,20    [−480, 750]^n        0
Hartman 3             3            [0, 1]^3             −3.86278214782076
Hartman 6             6            [0, 1]^6             −3.32236801141551
Hump                  2            [−5, 5]^2            0
Levy                  2,5,10,20    [−10, 10]^n          0
Matyas                2            [−8, 12.5]^2         0
Michalewicz           2            [0, π]^2             −1.80130341008983
Michalewicz           5            [0, π]^5             −4.687658179
Michalewicz           10           [0, π]^10            −9.66015
Perm                  4            [−4, 4]^4            0
Powell                4,12,24,48   [−4, 5]^n            0
Power sum             4            [0, 4]^4             0
Rastrigin             2,5,10,20    [−4.1, 6.4]^n        0
Rosenbrock            2,5,10,20    [−5, 10]^n           0
Schwefel              2,5,10,20    [−500, 500]^n        0
Shekel 5              4            [0, 10]^4            −10.1531996790582
Shekel 7              4            [0, 10]^4            −10.4029405668187
Shekel 10             4            [0, 10]^4            −10.5364098166920
Shubert               2            [−10, 10]^2          −186.730908831024
Sphere                2,5,10,20    [−4.1, 6.4]^n        0
Sum squares           2,5,10,20    [−8, 12.5]^n         0
Trid                  6            [−36, 36]^6          −50
Trid                  10           [−100, 100]^10       −200
Zakharov              2,5,10,20    [−5, 10]^n           0
In this subsection, we let μ_f = 5000, which allows us to compute at least 100 simplex gradients for the problem with the maximal dimension (48).
Fig. 3 Data profile (left) and performance profile (right) on the Hedar test set
Because we care about solutions with high accuracy, we set τ = 1e−7. Figure 3 shows the obtained data profiles (left subfigure) and performance profiles (right subfigure) for the Hedar test set. Roughly speaking, for both data profiles and performance profiles, a solver whose profile lies to the upper left is better than one whose profile lies to the lower right.
From Fig. 3 we can see that DIRECT-b is the best solver among these four algorithms. RDIRECT-b performs worse than DIRECT-b but much better than DIRECT and RDIRECT. From the performance profiles we can see that DIRECT-b solves slightly less than 60 % of the problems with the best efficiency and finally solves about 80 % of the problems. RDIRECT-b solves less than 40 % of the problems with the best efficiency and finally solves about 74 % of the problems. Similarly, RDIRECT and DIRECT solve about 20 and 16 % of the problems with the best efficiency, and finally solve about 58 and 56 % of the problems, respectively. In other words, the performance gap between DIRECT-b and DIRECT is about 24 % (= 80 − 56 %), and the gap between RDIRECT-b and RDIRECT is about 16 % (= 74 − 58 %). Thus we can say that the bilevel strategy greatly improves the ability of DIRECT and RDIRECT to find solutions with high accuracy.
It is interesting that RDIRECT-b performs worse than DIRECT-b. The reason may lie in the fact that RDIRECT sometimes degrades the performance on well-scaled problems [15]. Through careful comparison, we find that RDIRECT-b performs well on all problems in Hedar's set except four (Griewank, Rastrigin, Rosenbrock and Zakharov). Because all four problems have variable dimensions (2, 5, 10, 20), RDIRECT-b in total performs worse on 16 problems, which degrades its overall performance.
In order to verify the efficiency of the robust DIRECT algorithm [15], we then scale each problem additively by 1e5 and compare the four algorithms again. Figure 4 shows the resulting data profiles and performance profiles. From Fig. 4 we can see that both RDIRECT-b and RDIRECT perform very well: RDIRECT-b can solve 90 % of the problems and RDIRECT can solve 70 %. On the other hand, DIRECT-b only solves 30 % of the problems and DIRECT solves 20 %.
Fig. 4 Data profile (left) and performance profile (right) on the Hedar test set (with scaling)
5 Conclusion and future work
Inspired by the great success of multilevel methods in eliminating the smooth mode phenomenon, in this paper we introduce a bilevel strategy into the original DIRECT algorithm and into the RDIRECT algorithm, which is insensitive to linear scaling of the objective function. The obtained DIRECT-b and RDIRECT-b algorithms are proved to be globally convergent. Numerical results show that the bilevel strategy significantly improves the ability of DIRECT and RDIRECT to find solutions with high accuracy. The results in this paper provide further evidence that the multilevel strategy proposed in our previous work [16] is really helpful in accelerating the convergence of partition-based algorithms.
An interesting property of the bilevel (or multilevel) strategy is that it accelerates convergence within the framework of DIRECT itself. As a result, our approach requires no effort to select a proper local optimizer. Moreover, our approach makes it easy to balance the (fine-level) global search phase and the (coarse-level) local search phase, and its parameters are relatively insensitive. The numerical results in this paper also support the conclusion in [15] that, although RDIRECT eliminates the sensitivity to linear scaling of the objective function, it sometimes degrades the performance on well-scaled problems.
Our work suggests possible improvements to other partition-based global optimization algorithms: as long as the bilevel or multilevel strategy is incorporated properly, the obtained algorithms should gain the ability to find solutions with high accuracy.
Acknowledgments We would like to thank the two anonymous reviewers whose suggestions greatly improved this paper. We would also like to thank Dr. D.E. Finkel and Prof. C.T. Kelley for their DIRECT codes and the Jones test set codes.
References
1. Androulakis, I.P., Maranas, C.D., Floudas, C.A.: αBB: a global optimization method for general constrained nonconvex problems. J. Glob. Optim. 7(4), 337–363 (1995)
2. Björkman, M., Holmström, K.: Global optimization using the DIRECT algorithm in Matlab. Adv. Model. Optim. 1, 17–37 (1999)
3. Briggs, W.L., Henson, V.E., McCormick, S.: A Multigrid Tutorial. SIAM, Philadelphia (2000)
4. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153, 235–256 (2007)
5. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
6. Finkel, D.E.: DIRECT optimization algorithm user guide. Center for Research in Scientific Computation CRSC-TR03-11, North Carolina State University, Raleigh (2003)
7. Finkel, D.E.: Global optimization with the DIRECT algorithm. Ph.D. Thesis, North Carolina State University (2005)
8. Finkel, D.E., Kelley, C.T.: Additive scaling and the DIRECT algorithm. J. Glob. Optim. 36, 597–608 (2006)
9. Gablonsky, J.M.: Modifications of the DIRECT algorithm. Ph.D. Thesis, North Carolina State University (2001)
10. Gablonsky, J.M., Kelley, C.T.: A locally-biased form of the DIRECT algorithm. J. Glob. Optim. 21, 27–37 (2001)
11. Hedar, A.: http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO.htm
12. Huyer, W., Neumaier, A.: Global optimization by multilevel coordinate search. J. Glob. Optim. 14(4), 331–355 (1999)
13. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
14. Jones, D.R.: The DIRECT global optimization algorithm. In: The Encyclopedia of Optimization. Kluwer Academic, Dordrecht (1999)
15. Liu, Q.: Linear scaling and the DIRECT algorithm. J. Glob. Optim. 56(3), 1233–1245 (2013)
16. Liu, Q., Zeng, J.: Global optimization by multilevel partition. J. Glob. Optim. (revised)
17. Liuzzi, G., Lucidi, S., Piccialli, V.: A DIRECT-based approach exploiting local minimizations for the solution of large-scale global optimization problems. Comput. Optim. Appl. 45(2), 353–375 (2010)
18. Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009)
19. Pardalos, P.M., Schoen, F.: Recent advances and trends in global optimization: deterministic and stochastic methods. In: Proceedings of the Sixth International Conference on Foundations of Computer-Aided Process Design, DSI 1–2004, pp. 119–131 (2004)
20. Rios, L.M., Sahinidis, N.V.: Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Glob. Optim. 56(3), 1247–1293 (2013)
21. Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005)
22. Xu, J.: An Introduction to Multilevel Methods. Oxford University Press, Oxford (1997)