Ann Oper Res (2008) 164: 167–191 DOI 10.1007/s10479-008-0401-7
A hybrid algorithm for nonlinear minimax problems Fusheng Wang · Kecun Zhang
Published online: 26 July 2008 © Springer Science+Business Media, LLC 2008
Abstract In this paper, a hybrid algorithm for solving the finite minimax problem is presented. In the algorithm, we combine trust-region methods with line-search and curve-search methods. By means of this hybrid technique, the algorithm, according to the specific situation at each iteration, can adaptively perform a trust-region step, a line-search step or a curve-search step, so as to avoid possibly solving the trust-region subproblems many times and to make better use of the advantages of the different methods. Moreover, we use a second-order correction step to circumvent the difficulties of the Maratos effect that occurs in nonsmooth optimization. Under mild conditions, we prove that the new algorithm is globally convergent and locally superlinearly convergent. Preliminary experiments show that the new algorithm performs efficiently.

Keywords Trust-region methods · Hybrid technique · Second order correction · Finite minimax problem
1 Introduction

In this paper, we study the finite minimax problem of the form (P):

    min_{x ∈ R^n} max_{1 ≤ i ≤ m} f_i(x),                                  (1.1)

where the f_i : R^n → R are twice continuously differentiable. Let

    φ(x) = max_{1 ≤ i ≤ m} f_i(x).                                         (1.2)

Obviously, the objective function φ(x) is not necessarily differentiable. Therefore, (1.1) is, in nature, a nondifferentiable optimization problem. Many problems of interest in real

F. Wang (✉) · K. Zhang
Faculty of Science, Xi'an Jiaotong University, 710049 Xi'an, People's Republic of China
e-mail: [email protected]
world applications can be modeled as finite minimax problems of the form (1.1). This class of problems occurs, for instance, in curve fitting, ℓ1 and ℓ∞ approximation problems, systems of nonlinear equations, problems of finding feasible points of systems of inequalities, nonlinear programming problems, multiobjective problems, engineering design, optimal control and many other situations. The finite minimax problem is a very important class of nonsmooth optimization problems, which has attracted attention from more and more researchers (see Overton 1982; Murray and Overton 1980; Du 1995; Polak et al. 1991; Polak 1989; Fletcher 1982; Yuan 1985; Zhou and Tits 1993; Gaudioso and Monaco 1982; Charalambous and Conn 1978; Han 1981; Vardi 1992; Luksan 1986; Shen and Wang 2005; Di Pillo et al. 1993; Dem'yanov and Malozemov 1974; Xu 2001; Gigola and Gomez 1990; Li and Fang 1997; Yi 2002, 2004; Jian et al. 2007; Watson 1979; Conn et al. 2000). Many algorithms have been developed over the past decades; they fall into two classes. On the one hand, problem (1.1) can be regarded as an unconstrained nondifferentiable optimization problem, and a necessary condition for a point, x* say, to be a solution of problem (1.1) is that

    0 ∈ conv{∇f_i(x*) : i ∈ I_A(x*)},                                      (1.3)
where I_A(x*) = {i : f_i(x*) = φ(x*)}. More specifically, when φ(x) is convex, this condition is also sufficient. So one can use methods for general nondifferentiable optimization problems, such as subgradient methods, bundle methods and cutting plane methods, to solve problem (1.1) (see Overton 1982; Polak 1989; Gaudioso and Monaco 1982; Shen and Wang 2005). It is known that these techniques can be proved to be, at most, linearly convergent (Di Pillo et al. 1993). On the other hand, in view of the particular structure of its nondifferentiability, it is also suitable to make use of smooth optimization methods. At present, smooth methods mainly use three classes of techniques. The first is the so-called regularization technique, based on approximating the maximum function φ(x) via smooth functions with a parameter (see Xu 2001; Gigola and Gomez 1990; Li and Fang 1997). However, these methods need to solve a nonlinear optimization problem at each iteration to generate a new approximate solution. The second is based on the well-known fact that problem (1.1) is equivalent to a nonlinear smooth optimization problem in the n + 1 variables (x, z):

    (EQNP):   min_{(x,z) ∈ R^{n+1}} z,
              s.t. f_i(x) − z ≤ 0,  i = 1, . . . , m,                       (1.4)

where x ∈ R^n and z ∈ R. Though we have many successful algorithms to solve this equivalent problem, such as SQP methods, there are some drawbacks to directly solving problem (1.4) instead of problem (1.1), since some properties of the primal problem may be thrown away (Di Pillo et al. 1993). Nevertheless, one can still make use of the model of problem (1.4) to construct subproblems that produce a descent direction, and make use of the Kuhn-Tucker optimality conditions of problem (1.4) to recover the solution of the original problem (1.1) (see Polak et al. 1991; Charalambous and Conn 1978; Han 1981). In addition, Di Pillo et al. (1993) proposed an approach which reduces to solving an unconstrained nonlinear programming problem whose objective function is a continuously differentiable exact penalty function. This algorithm enjoys both global convergence and local superlinear convergence. However, in implementation it needs to choose suitable penalty parameters at each iteration.

Historically, it has been more difficult to generate a descent direction in nonsmooth optimization than in smooth optimization. For this reason, one
is always trying to overcome this difficulty, which explains why most existing algorithms for the minimax problem are line-search rather than trust-region based (see Overton 1982; Polak et al. 1991; Charalambous and Conn 1978; Han 1981; Dem'yanov and Malozemov 1974). Recently, Zhou and Tits (1993) proposed a nonmonotone SQP line-search method with second-order correction for the finite minimax problem; Yi (2002, 2004) developed a class of line-search algorithms based on a new approximation model for problem (1.1); and Jian et al. (2007) proposed another SQP line-search algorithm for problem (1.1). Typically, under mild assumptions, these algorithms exhibit a locally superlinear (or two-step superlinear) rate of convergence.

It is well known that trust-region methods can induce strong global convergence, and they are very important methods for solving smooth optimization problems (see Conn et al. 2000; Sun 2004; Wang et al. 2006; Nocedal and Yuan 1998; Michael Gertz 2004; Yuan 1995; Powell 1984; Jian 2006; Hager 1999; Fletcher 1987; Nocedal and Wright 2006). So it is natural to extend trust-region methods to the nonsmooth case. A pioneering trust-region method for a class of composite nonsmooth optimization problems was presented in Fletcher (1982), for problems of the following form:

    min_{x ∈ R^n} φ(x) ≡ f(x) + h(c(x)),                                   (1.5)

where f : R^n → R and c : R^n → R^m are both twice continuously differentiable functions, and h : R^m → R is a polyhedral convex function of the form

    h(c) = max_{1 ≤ i ≤ q} (h_i^T c + b_i),                                (1.6)
where the h_i and b_i are given vectors and constants, respectively. Obviously, problem (1.1) can be viewed as a special case of problem (1.5). The algorithm proposed in Fletcher (1982) employed a second-order correction step to overcome the Maratos effect and was proved to converge superlinearly in Yuan (1985). However, it may require the global solution of two (possibly nonconvex) quadratic subproblems at each iteration (Di Pillo et al. 1993), and it requires computing the exact Hessian matrices ∇²f_i(x), i = 1, . . . , m, in the subproblems at each iteration, which entails a considerable amount of gradient computation. Moreover, if the ratio r_k < 0, the algorithm must recompute the basic trial step d_k and the correction step d̃_k. Thus, it may solve two (possibly nonconvex) quadratic subproblems several times in each iteration. Another well-known trust-region algorithm for nonsmooth optimization with corrections was described in Conn et al. (2000) (see Algorithm 11.3.1 thereof); similarly, it must recompute the basic trial step d_k and the correction step d̃_k at each iteration until the ratio meets the sufficient descent condition. The trust-region based algorithms in Fletcher (1982) and Conn et al. (2000) can be applied to the minimax problem (1.1); at present, however, there is no trust-region algorithm specialized directly for the minimax problem.

In this paper, according to the special features of the minimax problem, and motivated by Nocedal and Yuan (1998) and Michael Gertz (2004), we develop a trust-region based algorithm in which we combine trust-region methods with line- or curve-search methods. That is, when d_k does not meet the sufficient descent condition defined in the trust-region scheme, we perform a second-order correction step to obtain d̃_k. If d_k + d̃_k cannot meet the sufficient descent condition either, then we perform a backtracking line search along d_k or a backtracking curve search, depending on whether r_k > 0 is satisfied, to find a step-length t_k such that the new point x_k + t_k d_k or x_k + t_k d_k + t_k² d̃_k satisfies certain descent rules. In addition, it is worth pointing out that, in order to have two
convex QP subproblems, and to ensure that the line or curve search works well, it is required that B_k be uniformly positive definite. In this manner, the new algorithm can always produce a new point at each iteration and avoid possibly solving the two subproblems many times, so that the efficiency of the algorithm is improved. The analysis shows that the new algorithm is globally convergent and locally superlinearly convergent. Moreover, preliminary numerical experiments show that the new algorithm performs efficiently in practice.

The contents of this paper are organized as follows: In Sect. 2, we establish the new algorithm. In Sect. 3, we analyze the global convergence and the rate of local convergence. In Sect. 4, we report some numerical results and compare the algorithm with some other algorithms. Finally, we end the paper with the conclusions.

We shall use the following notation and terminology. Unless otherwise stated, the vector norm used in this paper is the Euclidean vector norm on R^n, and the matrix norm is the induced operator norm on R^{n×n}. In addition, we denote

    I_A(x) = {i : f_i(x) = φ(x)},        I_N(x) = {i : f_i(x) < φ(x)},
    f(x) = (f_1(x), . . . , f_m(x))^T,   F(x) = Diag(f_1(x), . . . , f_m(x)),
    ∇f(x) = (∇f_1(x), . . . , ∇f_m(x)),  e = (1, . . . , 1)^T ∈ R^m.
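To make the notation above concrete, the following small Python sketch evaluates φ(x) and the active index set I_A(x) for an illustrative instance with m = 3 (the three functions are our own example, not taken from the paper's test set):

```python
import numpy as np

# Illustrative pieces: f1(x) = x1^2 + x2^2, f2(x) = 2 - x1, f3(x) = x1 + x2.
fs = [
    lambda x: x[0]**2 + x[1]**2,
    lambda x: 2.0 - x[0],
    lambda x: x[0] + x[1],
]

def phi(x):
    """phi(x) = max_i f_i(x), the (generally nonsmooth) objective."""
    return max(f(x) for f in fs)

def active_set(x, tol=1e-8):
    """I_A(x) = {i : f_i(x) = phi(x)}, up to a numerical tolerance."""
    p = phi(x)
    return [i for i, f in enumerate(fs) if p - f(x) <= tol]

x = np.array([1.0, 1.0])
print(phi(x))         # f1 = 2, f2 = 1, f3 = 2, so phi(x) = 2.0
print(active_set(x))  # [0, 2]: two pieces attain the maximum
```

At x = (1, 1) both f_1 and f_3 attain the maximum, so φ is nonsmooth there; this is exactly the situation where I_A(x) contains more than one index.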
2 Algorithm

Let {x_k} ⊂ R^n be the sequence generated by the algorithm. At iteration k, in order to compute the iterate x_{k+1} from x_k, an SQP trial step d_k must be obtained first. We use the solution d_k of the following trust-region subproblem:

    QP(x_k, B_k):  min_{d ∈ R^n}  (1/2)⟨d, B_k d⟩ + max_{1 ≤ i ≤ m} {⟨∇f_i(x_k), d⟩ + f_i(x_k)} − φ(x_k),
                   s.t.  ‖d‖_∞ ≤ Δ_k,                                      (2.1)

where B_k ∈ R^{n×n} is a symmetric positive definite matrix and max_{1 ≤ i ≤ m} {⟨∇f_i(x_k), d⟩ + f_i(x_k)} is a first-order approximation to φ(x_k + d). In this paper, B_k is updated by Powell's modification of the BFGS formula. The solution of (2.1) can be obtained by solving the following equivalent trust-region subproblem:

    QP(x_k, B_k):  min_{(d,z) ∈ R^{n+1}}  m_k(d, z) ≡ (1/2)⟨d, B_k d⟩ + z,
                   s.t.  ⟨∇f_i(x_k), d⟩ − z ≤ φ(x_k) − f_i(x_k),  i = 1, . . . , m,      (2.2)
                         ‖d‖_∞ ≤ Δ_k.
Remark 2.1 In the trust-region subproblems (2.1) and (2.2), we use the ℓ∞ norm for the trust-region bound, so that the subproblems become easily solved quadratic programming problems.

Remark 2.2 It is worth pointing out that the SQP trust-region subproblem (2.2) here is different from the SQP trust-region subproblem used in the smooth case: the latter can be infeasible (inconsistent), whereas the former is always feasible. In fact, (0, 0) is a feasible solution of (2.2). Therefore, we can employ any practical quadratic programming algorithm to obtain the solution of (2.1).
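Since (2.2) is a convex QP with linear inequality constraints and simple bounds (the ℓ∞ ball on d), any standard QP or NLP solver applies. The following Python sketch, an illustration under our own naming that hands the subproblem to SciPy's general-purpose SLSQP solver rather than a dedicated QP code, solves (2.2) for a two-function toy instance:

```python
import numpy as np
from scipy.optimize import minimize

def solve_subproblem(x_k, B_k, fs, grads, delta_k):
    """Solve subproblem (2.2): minimize 0.5<d, B_k d> + z subject to
    <grad f_i(x_k), d> - z <= phi(x_k) - f_i(x_k) and ||d||_inf <= delta_k.
    Returns (d_k, z_k). A sketch; a dedicated convex-QP solver would
    normally be used in practice."""
    n = x_k.size
    fvals = np.array([f(x_k) for f in fs])
    gvals = np.array([g(x_k) for g in grads])   # rows: grad f_i(x_k)
    phi_k = fvals.max()

    def obj(v):                                 # v = (d, z)
        d, z = v[:n], v[n]
        return 0.5 * d @ B_k @ d + z

    cons = [{'type': 'ineq',                    # SLSQP wants c(v) >= 0
             'fun': lambda v, i=i: (phi_k - fvals[i]) + v[n] - gvals[i] @ v[:n]}
            for i in range(len(fs))]
    bounds = [(-delta_k, delta_k)] * n + [(None, None)]
    # (0, 0) is always feasible (Remark 2.2), so it is a safe start.
    res = minimize(obj, np.zeros(n + 1), method='SLSQP',
                   bounds=bounds, constraints=cons)
    return res.x[:n], res.x[n]

# Toy instance: f1(x) = ||x||^2, f2(x) = ||x - (2,0)||^2, B_k = I, at x_k = 0.
fs = [lambda x: x @ x,
      lambda x: (x - np.array([2.0, 0.0])) @ (x - np.array([2.0, 0.0]))]
grads = [lambda x: 2.0 * x,
         lambda x: 2.0 * (x - np.array([2.0, 0.0]))]
d, z = solve_subproblem(np.array([0.0, 0.0]), np.eye(2), fs, grads, 1.0)
# Here the trust-region bound is active: d is (1, 0) and z is -4 < 0,
# consistent with Lemma 2.2 (z_k < 0 whenever d_k is nonzero).
```

Note that the ℓ∞ bound enters simply as per-coordinate box constraints on d, which is exactly why Remark 2.1 prefers it over the ℓ2 ball.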
In Fletcher (1982), in order to overcome the Maratos effect, the trust-region method requires solving for a second-order correction step d̃_k. Zhou and Tits (1993) additionally perform a curve search as follows:

    φ(x_k + t_k d_k + t_k² d̃_k) ≤ max_{0 ≤ l ≤ 2} φ(x_{k−l}) − α t_k ⟨d_k, B_k d_k⟩.      (2.3)
In this paper we follow the above ideas and make use of a hybrid technique, that is, we combine trust-region methods with line- or curve-search methods. If the basic trial step d_k is not acceptable in the trust-region scheme, we try the corrected step d_k + d̃_k. If d_k + d̃_k is not acceptable either, then we perform a line search along d_k or a curve search, depending on whether r_k > 0 is satisfied, to find a step-length t_k such that the new iterate x_k + t_k d_k or x_k + t_k d_k + t_k² d̃_k satisfies certain descent rules. That is, there are two cases in this situation. One is r_k > 0, which means that d_k is a descent direction. Thus, we can perform a line search along d_k to find t_k such that

    φ(x_k + t_k d_k) ≤ φ(x_k) − α t_k ⟨d_k, B_k d_k⟩.                      (2.4)

The other is r_k ≤ 0, which means that d_k may not be a descent direction. In this case, we perform a curve search to find t_k such that

    φ(x_k + t_k d_k + t_k² d̃_k) ≤ φ(x_k) − α t_k ⟨d_k, B_k d_k⟩,          (2.5)
where α ∈ (0, 1/2), (d_k, z_k) is the solution of (2.2), and d̃_k is the second-order correction step, which is the solution of the following quadratic program:

    min_{(d̃,z̃) ∈ R^{n+1}}  ϕ_k(d̃, z̃) ≡ (1/2)⟨d_k + d̃, B_k (d_k + d̃)⟩ + z̃,
    s.t.  ⟨∇f_i(x_k), d̃⟩ − z̃ ≤ φ(x_k + d_k) − f_i(x_k + d_k),  i = 1, . . . , m,        (2.6)
          ‖d_k + d̃‖_∞ ≤ Δ_k,

where we set d̃_k = 0 when ‖d̃_k‖ > ‖d_k‖. In addition, for problem (1.1), the Lagrangian is defined by

    L(x, λ) = Σ_{i=1}^m λ_i f_i(x),
and the merit function is φ(x). Now we give the algorithm as follows:

Algorithm 2.1
Step 0 Given initial values x_0 ∈ R^n, ε > 0, Δ_max > 0, Δ_0 ∈ (0, Δ_max), 0 < τ_1 < 1 < τ_2, α ∈ (0, 1/2), β ∈ (0, 1), 0 < μ ≤ 2α, η ∈ (0.5, 1), B_0 = I, k := 0.
Step 1 (Computation of basic trial step) Solve the quadratic program (2.2) approximately, and suppose that (d_k, z_k) is its solution. If ‖d_k‖ ≤ ε, stop; otherwise, go to Step 2.
Step 2 Compute the ratio between the actual reduction and the predicted reduction:

    r_k = [φ(x_k) − φ(x_k + d_k)] / [m_k(0, 0) − m_k(d_k, z_k)].           (2.7)

Step 3 (Update the iteration point)
(3.1) If r_k > μ, set s_k = d_k, x_{k+1} = x_k + s_k, and go to Step 4; otherwise, go to (3.2).
(3.2) Compute the second-order correction step d̃_k by solving the quadratic program (2.6) approximately. If ‖d̃_k‖ > ‖d_k‖, set d̃_k = 0.
(3.3) Compute

    r̃_k = [φ(x_k) − φ(x_k + d_k + d̃_k)] / [m_k(0, 0) − m_k(d_k, z_k)].    (2.8)

(3.4) If r̃_k > μ, set r_k := r̃_k, s_k = d_k + d̃_k, x_{k+1} = x_k + s_k, and go to Step 4; otherwise, go to (3.5).
(3.5) If r_k > 0, set d̃_k = 0.
(3.6) (Perform curve search) Compute t_k, the first number in the sequence {1, β, β², . . .} such that

    φ(x_k + t d_k + t² d̃_k) ≤ φ(x_k) − α t ⟨d_k, B_k d_k⟩,                (2.9)

and set s_k = t_k d_k + t_k² d̃_k, x_{k+1} = x_k + s_k.
Step 4 (Update Δ_k) If r_k ≤ μ, choose Δ_{k+1} ∈ [‖s_k‖, τ_1 Δ_k]; if r_k ≥ η, set Δ_{k+1} = min(τ_2 Δ_k, Δ_max); otherwise, set Δ_{k+1} = Δ_k.
Step 5 (Update B_k) Update B_k to B_{k+1}; k := k + 1, go to Step 1.

Remark 2.3 In Algorithm 2.1, we employ Powell's modification of the BFGS formula for updating B_k to B_{k+1}.

We recall the following well-known result on minimax problems (see Di Pillo et al. 1993).

Lemma 2.1 A point x* ∈ R^n is a critical point of problem (1.1) if and only if there exists a vector λ* ∈ R^m such that

    Σ_{i=1}^m λ*_i ∇f_i(x*) = 0,
    Σ_{i=1}^m λ*_i = 1,
    λ*_i ≥ 0,  i = 1, . . . , m,                                           (2.10)
    λ*_i (f_i(x*) − φ(x*)) = 0,  i = 1, . . . , m.

The conditions of Lemma 2.1 can be interpreted as the Kuhn-Tucker conditions for problem (1.4) at the point (x*, φ(x*)).

Lemma 2.2 Suppose that (d_k, z_k) is the solution of the SQP trust-region subproblem (2.2). Then
(1) if d_k = 0, then z_k = 0;
(2) if d_k ≠ 0, then z_k < 0.

Proof Without loss of generality, for simplicity, we use the ℓ2 norm in subproblem (2.2). First, we show that argument (1) is correct. By Algorithm 2.1, since
(d_k, z_k) is the solution of the SQP trust-region subproblem (2.2), there exist multipliers λ_k ∈ R^m and u_k ∈ R which satisfy

    (B_k + u_k I) d_k + ∇f(x_k) λ_k = 0,                                   (2.11)
    e^T λ_k = 1,                                                           (2.12)
    (F(x_k) + Diag(∇f_1(x_k)^T d_k, . . . , ∇f_m(x_k)^T d_k) − (φ(x_k) + z_k) I) λ_k = 0,   (2.13)
    u_k (‖d_k‖ − Δ_k) = 0,                                                 (2.14)
    ∇f(x_k)^T d_k − z_k e ≤ φ(x_k) e − f(x_k),                             (2.15)
    ‖d_k‖ ≤ Δ_k,                                                           (2.16)
    λ_k ≥ 0,                                                               (2.17)
    u_k ≥ 0.                                                               (2.18)

If d_k = 0, it follows from (2.15) that f(x_k) − z_k e ≤ φ(x_k) e. For every i ∈ I_A(x_k), we have φ(x_k) = f_i(x_k). This yields z_k ≥ 0. On the other hand, since (d, z) = (0, 0) is a feasible solution to subproblem (2.2), we obtain

    (1/2)⟨d_k, B_k d_k⟩ + z_k ≤ 0.                                         (2.19)

By assumption, d_k = 0. Thus z_k ≤ 0. Consequently, z_k = 0.

Argument (2) is obvious. In fact, since (0, 0) is a feasible solution of subproblem (2.2), by hypothesis and (2.19), we have

    z_k ≤ −(1/2)⟨d_k, B_k d_k⟩ < 0.                                        □

Theorem 2.1 Suppose that {x_k} and {d_k} are generated by Algorithm 2.1. Then
(1) d_k = 0 if and only if x_k is a KKT point of problem (1.1);
(2) {x_k} is well-defined.

Proof (1) (⇒): By Algorithm 2.1, d_k is always the solution of (2.2). If d_k = 0, then z_k = 0 by Lemma 2.2. Hence, it follows from (2.11)–(2.18) that the corresponding x_k satisfies the first-order conditions (2.10), which means that x_k is a KKT point of problem (1.1).
(1) (⇐): If x_k is a KKT point of problem (1.1), then by Lemma 2.1, (2.10) holds. Substituting Σ_{i=1}^m λ_{k,i} ∇f_i(x_k) = 0 into (2.11), we have (B_k + u_k I) d_k = 0, which implies that d_k = 0.
(2) Thanks to result (1), the stopping criterion ‖d_k‖ ≤ ε in Algorithm 2.1 (Step 1) is reasonable. Therefore, the sequence {x_k} generated by Algorithm 2.1 is well-defined.    □
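Before turning to the convergence analysis, the overall flow of Algorithm 2.1 can be sketched in Python. This is a simplified illustration, not the full method: the second-order correction step (2.6) is omitted (d̃_k = 0 throughout, so the curve search (2.5) reduces to the line search (2.4)), subproblem (2.2) is handed to SciPy's general SLSQP solver, and the Lagrangian gradient used in the damped BFGS update of Remark 2.3 is crudely approximated by the gradient of one active function. Parameter defaults follow Sect. 4.

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_minimax(fs, grads, x0, delta0=1.0, delta_max=1e5, tau1=0.5,
                   tau2=4.0, alpha=0.1, beta=0.5, mu=0.25, eta=0.75,
                   eps=1e-6, max_iter=100):
    """Simplified sketch of Algorithm 2.1 (no second-order correction)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    B, delta = np.eye(n), delta0
    phi = lambda y: max(f(y) for f in fs)
    # crude stand-in for the Lagrangian gradient: one active gradient
    act_grad = lambda y: grads[int(np.argmax([f(y) for f in fs]))](y)
    for _ in range(max_iter):
        fv = np.array([f(x) for f in fs])
        gv = np.array([g(x) for g in grads])
        p = fv.max()
        # Step 1: basic trial step (d, z) from subproblem (2.2)
        cons = [{'type': 'ineq',
                 'fun': lambda v, i=i: (p - fv[i]) + v[n] - gv[i] @ v[:n]}
                for i in range(len(fs))]
        sol = minimize(lambda v: 0.5 * v[:n] @ B @ v[:n] + v[n],
                       np.zeros(n + 1), method='SLSQP',
                       bounds=[(-delta, delta)] * n + [(None, None)],
                       constraints=cons).x
        d, z = sol[:n], sol[n]
        if np.linalg.norm(d) <= eps:
            break                                  # stopping test of Step 1
        pred = max(-(0.5 * d @ B @ d + z), 1e-16)  # m_k(0,0) - m_k(d,z)
        r = (p - phi(x + d)) / pred                # ratio (2.7)
        # Step 3: accept the trust-region step, or backtrack along d as in (2.4)
        if r > mu:
            s = d
        else:
            t = 1.0
            while t > 1e-10 and phi(x + t * d) > p - alpha * t * (d @ B @ d):
                t *= beta
            s = t * d
        x_new = x + s
        # Step 4: radius update (one concrete choice from [||s||, tau1*delta])
        if r >= eta:
            delta = min(tau2 * delta, delta_max)
        elif r <= mu:
            delta = tau1 * delta
        # Step 5: Powell's damped BFGS update keeps B positive definite
        y = act_grad(x_new) - act_grad(x)
        sBs = s @ B @ s
        if sBs > 1e-12:
            theta = 1.0 if s @ y >= 0.2 * sBs else 0.8 * sBs / (sBs - s @ y)
            yb = theta * y + (1.0 - theta) * (B @ s)
            B = B - np.outer(B @ s, B @ s) / sBs + np.outer(yb, yb) / (s @ yb)
        x = x_new
    return x

# Toy instance: min max{ ||x||^2, ||x - (2,0)||^2 }, with minimizer x* = (1, 0).
fs = [lambda x: x @ x,
      lambda x: (x - np.array([2.0, 0.0])) @ (x - np.array([2.0, 0.0]))]
grads = [lambda x: 2.0 * x,
         lambda x: 2.0 * (x - np.array([2.0, 0.0]))]
x_star = hybrid_minimax(fs, grads, [0.0, 0.0])
```

On this toy instance the first subproblem step lands (numerically) on the kink x* = (1, 0), where both pieces are active and the stopping test ‖d_k‖ ≤ ε fires at the next iteration.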
3 Convergence analysis

3.1 Global convergence

We suppose that the following standard assumptions hold throughout the analysis.

Assumption 3.1 For any point x_0 ∈ R^n, the level set L(x_0) = {x ∈ R^n : φ(x) ≤ φ(x_0)} is compact.

Assumption 3.2 For each x ∈ L(x_0), the vectors

    (∇f_i(x), −1)^T ∈ R^{n+1},  i ∈ I_A(x),

are linearly independent.

Assumption 3.3 There exist σ_1, σ_2 > 0 such that

    σ_1 ‖x‖² ≤ ⟨x, B_k x⟩ ≤ σ_2 ‖x‖²

holds for every x ∈ R^n and every integer k.

Assumption 3.1 is introduced in order to ensure the existence of a solution to problem (1.1). Assumption 3.2 is a common assumption in the literature on the minimax problem; it ensures that the constrained problem (1.4) and the original problem (1.1) have the same first-order necessary condition (2.10). Assumption 3.3 is generally required in the context of SQP-type methods if global convergence is to be guaranteed.

Lemma 3.1 Suppose that the sequences {x_k} and {d_k} are generated by Algorithm 2.1, {x_k} converges to x*, the multipliers {u_k} are bounded, and Assumptions 3.2 and 3.3 are satisfied. Then
(1) if x* is a KKT point of problem (1.1), the corresponding sequence of multipliers {λ_k} converges to λ*, the multiplier associated with x*;
(2) x* is a KKT point of problem (1.1) if and only if {d_k} converges to zero.

Proof (1) Note that {λ_k} is contained in the compact set

    Λ = { λ ∈ R^m : Σ_{i=1}^m λ_i = 1 and λ_i ≥ 0, ∀i = 1, 2, . . . , m }.    (3.1)
Thus, there exist a subset K and λ* ∈ Λ such that λ_k → λ* on K. We claim that the entire sequence {λ_k} converges to λ*. In fact, suppose on the contrary that there exist another subset K′ and λ̄* ∈ Λ (λ̄* ≠ λ*) such that λ_k → λ̄* on K′. If x* is a KKT point of problem (1.1), taking the limit on K and K′ in (2.10), respectively, we obtain

    Σ_{i=1}^m λ*_i ∇f_i(x*) = 0,    Σ_{i=1}^m λ*_i = 1,                    (3.2)

and

    Σ_{i=1}^m λ̄*_i ∇f_i(x*) = 0,    Σ_{i=1}^m λ̄*_i = 1.                    (3.3)
It yields that

    Σ_{i=1}^m (λ̄*_i − λ*_i) (∇f_i(x*), −1)^T = 0.                          (3.4)

It follows from Assumption 3.2 that

    λ̄*_i = λ*_i,  i = 1, 2, . . . , m.

This contradicts λ̄* ≠ λ*. Thus, {λ_k} is convergent.

(2) (⇐): First, we show that the sequence {λ_k} is also convergent if {d_k} converges to zero. Proceeding by contradiction, similarly to the proof of result (1), suppose that there exist a subset K and λ* ∈ Λ such that λ_k → λ* on K, and another subset K′ and λ̄* ∈ Λ (λ̄* ≠ λ*) such that λ_k → λ̄* on K′. If lim_{k→∞} d_k = 0, by Assumption 3.2 and the given hypothesis, taking the limit on K and K′ in (2.11)–(2.18), respectively, it follows that (3.2)–(3.4) hold, which implies λ̄*_i = λ*_i, i = 1, 2, . . . , m. This yields a contradiction to λ̄* ≠ λ* again. Consequently, since {(x_k, d_k, λ_k)} converges to (x*, 0, λ*) and (x*, λ*) satisfies (2.10), x* is a KKT point of problem (1.1).

(2) (⇒): If x* is a KKT point
of problem (1.1), it follows from result (1) that {λ_k} converges to some λ* ∈ Λ. In view of Assumption 3.3 and the given assumptions, taking the limit in (2.11), it follows from the boundedness of {d_k} that the sequence {d_k} is convergent and lim_{k→∞} d_k = 0.    □

Lemma 3.2 Suppose that the sequence {x_k} is generated by Algorithm 2.1 and (d_k, z_k) is the solution of subproblem (2.2). Then

    z_k ≤ −⟨d_k, (B_k + u_k I) d_k⟩,  ∀k.

Proof Let I_A(x_k) = {i : λ_{k,i} > 0}. Then, from (2.13), for every i ∈ I_A(x_k), we have ⟨∇f_i(x_k), d_k⟩ = φ(x_k) + z_k − f_i(x_k). Thus, it follows that

    Σ_{i=1}^m λ_{k,i} ⟨∇f_i(x_k), d_k⟩ = Σ_{i∈I_A(x_k)} λ_{k,i} ⟨∇f_i(x_k), d_k⟩
        = Σ_{i∈I_A(x_k)} λ_{k,i} (φ(x_k) + z_k − f_i(x_k))
        ≥ Σ_{i∈I_A(x_k)} λ_{k,i} z_k
        = z_k.                                                             (3.5)

In addition, it follows from (2.11) that

    ⟨(B_k + u_k I) d_k, d_k⟩ = −⟨∇f(x_k) λ_k, d_k⟩ = −Σ_{i=1}^m λ_{k,i} ⟨∇f_i(x_k), d_k⟩.   (3.6)
Combining (3.5) with (3.6), we have

    z_k ≤ −⟨(B_k + u_k I) d_k, d_k⟩.                                       (3.7)    □

Now we define index sets as follows:

    Ω_1 = {k : r_k ≥ μ},    Ω_2 = {k : r̃_k ≥ μ},    Ω_3 = {k : r̃_k < μ}.

Lemma 3.3 Suppose that the sequence {x_k} is generated by Algorithm 2.1 and Assumptions 3.1 and 3.3 are satisfied. Then

(1) lim_{k→∞} δ_k ‖d_k‖ = 0,                                               (3.8)

where

    δ_k = μ/(2α)  if k ∈ Ω_1 ∪ Ω_2,    δ_k = t_k  if k ∈ Ω_3;              (3.9)

(2) lim_{k→∞} ‖x_{k+1} − x_k‖ = 0.                                         (3.10)
Proof By Algorithm 2.1 and Assumption 3.1, {φ(x_k)} is monotone decreasing and bounded below. Hence, {φ(x_k)} is convergent; let

    lim_{k→∞} φ(x_k) = φ*.                                                 (3.11)

When k ∈ Ω_1, by Algorithm 2.1, it follows from Lemma 3.2 that

    φ(x_{k+1}) = φ(x_k + d_k) ≤ φ(x_k) + μ (z_k + (1/2)⟨d_k, B_k d_k⟩)
        ≤ φ(x_k) + μ (−u_k ‖d_k‖² − (1/2)⟨d_k, B_k d_k⟩)
        ≤ φ(x_k) − (1/2) μ σ_1 ‖d_k‖²,                                     (3.12)

and

    ‖x_{k+1} − x_k‖ = ‖d_k‖.                                               (3.13)

Similarly, if k ∈ Ω_2, we obtain

    φ(x_{k+1}) = φ(x_k + d_k + d̃_k) ≤ φ(x_k) + μ (z_k + (1/2)⟨d_k, B_k d_k⟩)
        ≤ φ(x_k) − (1/2) μ σ_1 ‖d_k‖².                                     (3.14)
Meanwhile, in view of ‖d̃_k‖ ≤ ‖d_k‖, we obtain

    ‖x_{k+1} − x_k‖ = ‖d_k + d̃_k‖ ≤ 2‖d_k‖.                               (3.15)

If k ∈ Ω_3, by Algorithm 2.1, we perform the line or curve search in this case. Noting that d̃_k = 0 if r_k > 0, the line search (2.4) and curve search (2.5) can be written as one identical expression:

    φ(x_{k+1}) = φ(x_k + t_k d_k + t_k² d̃_k) ≤ φ(x_k) − α t_k ⟨d_k, B_k d_k⟩ ≤ φ(x_k) − α σ_1 t_k ‖d_k‖².   (3.16)

Meanwhile, in view of ‖d̃_k‖ ≤ ‖d_k‖ and t_k < 1, we have

    ‖x_{k+1} − x_k‖ = ‖t_k d_k + t_k² d̃_k‖ ≤ 2 t_k ‖d_k‖.                 (3.17)

Let

    δ_k = μ/(2α)  if k ∈ Ω_1 ∪ Ω_2,    δ_k = t_k  if k ∈ Ω_3.              (3.18)

Combining (3.12), (3.14) and (3.16) gives

    φ(x_{k+1}) ≤ φ(x_k) − α σ_1 δ_k ‖d_k‖².                                (3.19)

Therefore, it follows from (3.11) that

    lim_{k→∞} δ_k ‖d_k‖² = 0,                                              (3.20)

which implies that (3.8) holds. In addition, by the definition of δ_k, if k ∈ Ω_1 ∪ Ω_2, there is a constant ρ_1 such that ρ_1 δ_k > 2. Let ρ = max{ρ_1, 2}. It follows from (3.13), (3.15) and (3.17) that

    ‖x_{k+1} − x_k‖ ≤ ρ δ_k ‖d_k‖,                                         (3.21)

which, together with (3.8), shows that (3.10) holds, and the proof is finished.    □
Theorem 3.1 Suppose that {x_k} is generated by Algorithm 2.1, Assumptions 3.1–3.3 hold, and the multipliers u_k are bounded. If the algorithm does not stop after finitely many steps, then every accumulation point of {x_k} is a KKT point of problem (1.1).

Proof Since Assumption 3.2 holds, problems (1.1) and (1.4) have the same KKT conditions (2.10). Thus, we can draw support from the KKT points of problem (1.4) to investigate the KKT points of the original problem (1.1). By Assumption 3.1, if the algorithm does not stop after finitely many steps, since the sequence {x_k} is contained in the compact set L(x_0), there exists a convergent subsequence, which we also denote by {x_k}; suppose that lim_{k→∞} x_k = x*, x* ∈ L(x_0). We claim that the corresponding subsequence {d_k} converges to zero. Suppose on the contrary that the assertion is false. Then there exist a small positive constant γ and an infinite subset K such that

    ‖d_k‖ > γ,  ∀k ∈ K.                                                    (3.22)
If Ω_1 ∪ Ω_2 is an infinite set, that is, there are infinitely many indices k ∈ Ω_1 ∪ Ω_2 such that δ_k = μ/(2α), then by Lemma 3.3 and (3.22), K ∩ (Ω_1 ∪ Ω_2) is only a finite set. Hence, for all sufficiently large k in K, k ∈ Ω_3. Without loss of generality, we suppose that K ⊂ Ω_3, which implies that

    δ_k = t_k, ∀k ∈ K ⊂ Ω_3;    ‖d_k‖ > γ, ∀k ∈ K ⊂ Ω_3;    lim_{k∈K, k→∞} t_k ‖d_k‖ = 0.   (3.23)

If Ω_1 ∪ Ω_2 is a finite set, since K is an infinite set, it follows that for all sufficiently large k in K, k ∈ Ω_3, and we can again suppose that K ⊂ Ω_3, which yields (3.23). Hence, it suffices to consider the case k ∈ K ⊂ Ω_3.

When k ∈ Ω_3, by Algorithm 2.1, we perform the line or curve search. Since d̃_k = 0 if r_k > 0, the line search (2.4) and curve search (2.5) can be written as the identical expression (2.5). Thus, we discuss (2.5) hereafter. We now show that there exists a t_min > 0 independent of k such that the line or curve search (2.5) is always satisfied for t_k > t_min and for all k ∈ K. Expanding f_i at x_k gives

    f_i(x_k + t d_k + t² d̃_k) = f_i(x_k) + ⟨∇f_i(x_k), t d_k + t² d̃_k⟩ + o(‖t d_k + t² d̃_k‖).

Hence, by Lemma 3.2 and thanks to the boundedness of d_k and d̃_k, for t ∈ [0, 1] and i = 1, . . . , m, we have

    f_i(x_k + t d_k + t² d̃_k) = f_i(x_k) + t⟨∇f_i(x_k), d_k⟩ + o(t)
        ≤ f_i(x_k) + t(φ(x_k) + z_k − f_i(x_k)) + o(t)
        = (1 − t) f_i(x_k) + t φ(x_k) + t z_k + o(t)
        ≤ φ(x_k) + t z_k + o(t)
        ≤ φ(x_k) − t⟨(B_k + u_k I) d_k, d_k⟩ + o(t)
        ≤ φ(x_k) − t⟨B_k d_k, d_k⟩ + o(t)
        = φ(x_k) − α t⟨B_k d_k, d_k⟩ − t(1 − α)⟨B_k d_k, d_k⟩ + o(t)
        ≤ φ(x_k) − α t⟨B_k d_k, d_k⟩ − t(1 − α) σ_1 γ² + o(t),             (3.24)

where the last inequality follows from Assumption 3.3 and the contradiction hypothesis. Since α < 1, there exist t̃_i > 0 independent of k such that, for all t ∈ [0, t̃_i],

    f_i(x_k + t d_k + t² d̃_k) ≤ φ(x_k) − α t⟨B_k d_k, d_k⟩,  i = 1, . . . , m.   (3.25)

Let t_min = min_{1≤i≤m} t̃_i. Then, for t_k ≥ t_min and all k ∈ K, we have

    φ(x_k + t_k d_k + t_k² d̃_k) ≤ φ(x_k) − α t_k ⟨B_k d_k, d_k⟩,          (3.26)

where t_k is obtained by the line or curve search. Hence, {t_k ‖d_k‖} is uniformly bounded from below on K by t_min γ, which contradicts (3.23). Therefore, d_k converges to zero, which implies that x* is a KKT point of problem (1.1) by Lemma 3.1.    □
3.2 Local superlinear convergence

In this section, let {x_k} be generated by Algorithm 2.1 with x_k → x*, let x* be a KKT point of problem (1.1), and let λ* be the corresponding multiplier of x*. Denote I_A(x*) = {i : λ*_i > 0}. In order to analyze the local superlinear convergence, we make the following hypotheses.

Assumption 3.4 The second-order sufficiency conditions with strict complementary slackness are satisfied at x*, namely,

    λ*_i > 0,  i ∈ I_A(x*),                                                (3.27)

and

    ⟨d, ∇²_{xx} L(x*, λ*) d⟩ > 0,  ∀ 0 ≠ d ∈ V,                            (3.28)

where

    V = {d ∈ R^n : ⟨d, ∇f_i(x*)⟩ = 0, ∀i ∈ I_A(x*)}.                       (3.29)

Assumption 3.5 The sequence of approximation matrices {B_k} satisfies the Dennis-Moré condition

    lim_{k→∞} ‖(B_k − Σ_{i=1}^m λ*_i ∇²f_i(x*)) d_k‖ / ‖d_k‖ = 0.          (3.30)

Remark 3.1 It is not difficult to prove that the Dennis-Moré condition (3.30) is equivalent to the following condition:

    lim_{k→∞} ‖(B_k − Σ_{i∈I_A(x*)} λ_{k,i} ∇²f_i(x_k)) d_k‖ / ‖d_k‖ = 0.   (3.31)
Since the minimax problem (1.1) is the special case of a class of composite nonsmooth optimization problems (1.5), similar to Theorems 2.1 and 2.2 given in Fletcher (1982), we have the following results: Theorem 3.2 Suppose that Assumptions 3.2–3.4 are satisfied, and the constraints of trustregion bound in the subproblems (2.2) and (2.6) are inactive. Let dk , d˜k be generated by the subproblems (2.2) and (2.6), respectively, and λˆ k , λ˜ k be their corresponding multipliers, respectively; λk , λ∗ be the multipliers of xk , x ∗ , respectively; x = xk − x ∗ , λ = λk −
180
Ann Oper Res (2008) 164: 167–191
λ∗ , = max{x , λ }. Then, for sufficiently large k, xk + dk − x ∗ = O( · x ), λˆ k − λ∗ = O(x2 ), xk + dk + d˜k − x ∗ = O( · x ), λ˜ k − λ∗ = O(x2 ). Proof The proof is similar to that of Theorem 2.1 in Fletcher (1982).
(3.32) (3.33)
Theorem 3.3 Suppose that Assumptions 3.2–3.4 are satisfied, dk , d˜k are generated by the subproblems (2.2) and (2.6), respectively, and the definition of is the same as Theorem 3.2. Then, for sufficiently large k, φ(xk ) − φ(xk + dk + d˜k ) = 1 + O(). mk (0, 0) − mk (dk , zk ) Proof The proof is similar to that of Theorem 2.2 in Fletcher (1982).
(3.34)
Theorem 3.4 Suppose that Assumptions 3.2–3.4 are satisfied, d_k, d̃_k are generated by subproblems (2.2) and (2.6), respectively, and ε is defined as in Theorem 3.2. Then, for k large enough, the trust-region bound constraints in the two subproblems are inactive.

Proof In view of x_k → x* and x* being a KKT point of problem (1.1), by Lemma 3.1, λ_k → λ*. By Lemma 3.4, d_k → 0 and d̃_k → 0. By Theorem 3.3, in Algorithm 2.1, r̃_k → 1. Hence, by the definition of Δ_{k+1} in Algorithm 2.1, Δ_{k+1} ≥ Δ_k holds for k large enough. This implies that, for k large enough, the trust-region radius is bounded away from zero in both subproblems, namely, the trust-region bound constraints in the two subproblems are inactive.    □

Lemma 3.5 Suppose that the step-length t_k is generated by Algorithm 2.1 and Assumptions 3.2–3.5 are satisfied. Then t_k ≡ 1 for sufficiently large k.

Proof By Algorithm 2.1, the step-length t_k is generated by two different searches. Since d̃_k = 0 if r_k > 0, the line search (2.4) and curve search (2.5) can be unified into the single expression (2.5). Thus, it suffices to discuss (2.5). First, we show that

    f_i(x_k + d_k) = f_j(x_k + d_k) + O(‖d_k‖²),  ∀i, j ∈ I_A(x*).         (3.35)

In fact, expanding f_i(x_k + d_k) around x_k, and noting that (d_k, z_k) is the solution of subproblem (2.2), we have

    f_i(x_k + d_k) = f_i(x_k) + ⟨∇f_i(x_k), d_k⟩ + O(‖d_k‖²)
        = φ(x_k) + z_k + O(‖d_k‖²),  ∀i ∈ I_A(x*).                         (3.36)

Similarly, we can obtain

    f_j(x_k + d_k) = f_j(x_k) + ⟨∇f_j(x_k), d_k⟩ + O(‖d_k‖²)
        = φ(x_k) + z_k + O(‖d_k‖²),  ∀j ∈ I_A(x*).                         (3.37)
Combining (3.36) and (3.37) immediately yields (3.35). By result (3) of Lemma 3.4, for k large enough,

    φ(x_k + d_k) = max_{i∈I_A(x*)} f_i(x_k + d_k)
        = f_i(x_k + d_k) + O(‖d_k‖²),  ∀i ∈ I_A(x*).                       (3.38)

Secondly, we show that

    f_i(x_k + d_k + d̃_k) = f_j(x_k + d_k + d̃_k) + O(‖d_k‖³),  ∀i, j ∈ I_A(x*).   (3.39)

In fact, by Taylor expansion and Lemma 3.4, we obtain

    f_i(x_k + d_k + d̃_k) = f_i(x_k + d_k) + ⟨∇f_i(x_k + d_k), d̃_k⟩ + O(‖d̃_k‖²)
        = f_i(x_k + d_k) + ⟨∇f_i(x_k), d̃_k⟩ + O(‖d_k‖³)
        = φ(x_k + d_k) + z̃_k + O(‖d_k‖³),  ∀i ∈ I_A(x*).                   (3.40)

Similarly, we have

    f_j(x_k + d_k + d̃_k) = f_j(x_k + d_k) + ⟨∇f_j(x_k + d_k), d̃_k⟩ + O(‖d̃_k‖²)
        = f_j(x_k + d_k) + ⟨∇f_j(x_k), d̃_k⟩ + O(‖d_k‖³)
        = φ(x_k + d_k) + z̃_k + O(‖d_k‖³),  ∀j ∈ I_A(x*).                   (3.41)
Combining (3.40) and (3.41) gives (3.39). Furthermore, similarly to (3.38), it follows, for k large enough, that

    φ(x_k + d_k + d̃_k) = max_{i∈I_A(x*)} f_i(x_k + d_k + d̃_k)
        = f_i(x_k + d_k + d̃_k) + O(‖d_k‖³),  ∀i ∈ I_A(x*),
        = Σ_{i∈I_A(x*)} λ_{k,i} f_i(x_k + d_k + d̃_k) + O(‖d_k‖³)
        = Σ_{i∈I_A(x*)} λ_{k,i} [ f_i(x_k) + ⟨∇f_i(x_k), d_k + d̃_k⟩
          + (1/2)⟨∇²f_i(x_k) d_k, d_k⟩ ] + o(‖d_k‖²).                      (3.42)

By Theorem 3.4, the Lagrangian multiplier u_k = 0, and it follows from (2.11) that

    Σ_{i∈I_A(x*)} λ_{k,i} ⟨∇f_i(x_k), d_k + d̃_k⟩ = −⟨(B_k + u_k I) d_k, d_k⟩ + o(‖d_k‖²)
        = −⟨B_k d_k, d_k⟩ + o(‖d_k‖²).                                     (3.43)

By Assumption 3.5, we obtain

    ⟨(Σ_{i∈I_A(x*)} λ_{k,i} ∇²f_i(x_k) − B_k) d_k, d_k⟩ = o(‖d_k‖²).       (3.44)
Therefore, it follows from (3.42), (3.43) and (3.44) that

φ(x_k + d_k + d̃_k) ≤ φ(x_k) − ⟨B_k d_k, d_k⟩ + ½ Σ_{i∈I_A(x*)} λ_{k,i} ⟨∇²f_i(x_k)d_k, d_k⟩ + o(‖d_k‖²)
 = φ(x_k) − α⟨B_k d_k, d_k⟩ + (α − ½)⟨B_k d_k, d_k⟩ + ½⟨( Σ_{i∈I_A(x*)} λ_{k,i} ∇²f_i(x_k) − B_k ) d_k, d_k⟩ + o(‖d_k‖²)
 = φ(x_k) − α⟨B_k d_k, d_k⟩ + (α − ½)⟨B_k d_k, d_k⟩ + o(‖d_k‖²).  (3.45)

Since α ∈ (0, ½), for sufficiently large k we have

φ(x_k + d_k + d̃_k) ≤ φ(x_k) − α⟨B_k d_k, d_k⟩,  (3.46)

which means that the step-length t_k ≡ 1, and completes the proof.
Theorem 3.5 Suppose that (1) {x_k} is generated by Algorithm 2.1, lim_{k→∞} x_k = x*, and x* is a KKT point of problem (1.1); (2) Assumptions 3.2–3.5 are satisfied. Then {x_k} converges to x* superlinearly.

Proof By Theorem 3.4, for k sufficiently large, the trust-region constraints in the subproblems (2.2) and (2.6) are inactive. By Theorem 3.2, (3.32) and (3.33) hold for k large enough, which immediately yields

‖x_k + d_k + d̃_k − x*‖ / ‖x_k − x*‖ = O( max( ‖x_k − x*‖, ‖λ_k − λ*‖ ) ).  (3.47)

In addition, by Lemma 3.1, we have

λ_k → λ*.  (3.48)

It follows from Lemma 3.5, (3.47) and (3.48) that

lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 0,  (3.49)

which completes the proof.
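The superlinear rate in (3.49) says that the error ratio ‖x_{k+1} − x*‖ / ‖x_k − x*‖ tends to zero. As a generic numerical illustration of this behaviour (Newton's method on a scalar equation, not the algorithm of this paper), the ratio of successive errors can be observed to shrink toward zero along the iterates:

```python
# Generic illustration of superlinear convergence: the error ratio
# |x_{k+1} - x*| / |x_k - x*| tends to zero along the iterates.
# Newton's method for x^3 = 2 (root x* = 2^(1/3)); not the paper's algorithm.

def newton_cbrt2(x0, iters):
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        xs.append(x - (x**3 - 2.0) / (3.0 * x**2))  # Newton step
    return xs

x_star = 2.0 ** (1.0 / 3.0)
xs = newton_cbrt2(2.0, 5)
errs = [abs(x - x_star) for x in xs]
ratios = [errs[k + 1] / errs[k] for k in range(4)]
print(ratios)  # each ratio is smaller than the previous one
```

The strictly decreasing ratios are exactly the fingerprint of a superlinear (here even quadratic) rate, as opposed to the constant ratio of a merely linear method.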
4 Numerical experiments

In order to validate the efficiency of the proposed algorithm, preliminary numerical experiments have been carried out. The algorithm is coded in MATLAB 6.5, and the tests are performed on a PC with a Pentium 4 CPU at 2.00 GHz. In the implementation, the parameters are chosen as follows: Δ_0 = 1, Δ_max = 10^5, τ_1 = 0.5, τ_2 = 4, α = 0.1, β = 0.5, μ = 0.25, η = 0.75; B_0 is chosen to be the identity matrix, and the tolerance is set to ε = 1.0e−5.

All test results are summarized in Table 1, where the performance of Algorithm 2.1 (Wang) is compared with that of the algorithms proposed in Conn et al. (2000) (CGT), Fletcher (1982) (RFL) and Zhou and Tits (1993) (ZhT); all algorithms are coded in MATLAB 6.5. In Table 1, "NI" is the number of iterations; "NF" is the total number of function evaluations; "NG" is the total number of gradient evaluations; "Time (s)" is the CPU time in seconds; φ(x) denotes the optimal value of the objective function; ‖d‖ is the norm of the solution of the SQP subproblem; and I_A(x) is the set of indices of the active functions. The letter "F" marks the cases in which an algorithm fails. The termination criterion is ‖d‖ ≤ ε or NI ≥ 10(n + m). The test examples are taken from Yi (2002), Luksan (1986) and Watson (1979), and are given in the Appendix.

Since the trust-region algorithm proposed in Fletcher (1982) is designed for the class of composite nonsmooth optimization problems (1.5), in order to apply it to the finite minimax problem (1.1) we convert it as follows.

Algorithm 4.1
Step 1 Given initial values x_0 ∈ R^n, ε > 0, Δ_0 > 0, λ_0; set k := 0.
Step 2 (Compute a basic trial step) Solve the quadratic program (2.2); let (d_k, z_k) be its solution and λ_k the corresponding multiplier.
Step 3 If ‖d_k‖ ≤ ε, stop; otherwise, go to step 4.
Step 4 Compute the ratio between the actual reduction and the predicted reduction,

r_k = ( φ(x_k) − φ(x_k + d_k) ) / ( m_k(0, 0) − m_k(d_k, z_k) ).  (4.1)

If r_k > 0.75, go to step 11; otherwise, go to step 5.
Step 5 Compute the second-order correction step d̃_k by solving the quadratic program (2.6); let (d̃_k, z̃_k) be its solution and λ̃_k the corresponding multiplier.
Step 6 Compute the ratio

r_k^e = r_k + ( m_k(0, 0) − m_k(d̃_k, z̃_k) ) / ( m_k(0, 0) − m_k(d_k, z_k) ).  (4.2)

If r_k < 0.25, go to step 8.
Step 7 If r_k^e ∈ [0.9, 1.1], set Δ_{k+1} = 2Δ_k and go to step 13; otherwise, go to step 12.
Step 8 If r_k^e ∉ [0.75, 1.25], go to step 9; otherwise, compute

r̃_k = ( φ(x_k) − φ(x_k + d_k + d̃_k) ) / ( m_k(0, 0) − m_k(d_k, z_k) ),  (4.3)

and set d_k = d_k + d̃_k, r_k = r̃_k, λ_k = λ̃_k. If r_k > 0.75, go to step 11; if r_k ∈ [0.25, 0.75], go to step 12.
Step 9 Set Δ_{k+1} = α_k ‖d_k‖, where α_k ∈ [0.1, 0.5] is a given parameter. If r_k > 0, go to step 13.
Step 10 Set x_{k+1} = x_k, λ_{k+1} = λ_k, k = k + 1, and go to step 2.
Step 11 If ‖d_k‖ < Δ_k, go to step 12; otherwise, if r_k > 0.9, set Δ_{k+1} = 4Δ_k and go to step 13; else set Δ_{k+1} = 2Δ_k and go to step 13.
Step 12 Set Δ_{k+1} = Δ_k.
Step 13 Set x_{k+1} = x_k + d_k, λ_{k+1} = λ̃_k, k = k + 1, and go to step 2.

Remark 4.1 In Algorithm 4.1, according to the original description in Fletcher (1982), the Hessian matrix used in the SQP subproblems (2.2) and (2.6) is

B_k = Σ_{i=1}^{m} λ_{k,i} ∇²f_i(x_k).
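The trust-region radius logic of Steps 4–12 can be isolated in a small decision function. The sketch below is a simplified reading of Algorithm 4.1 (the function name and the `alpha_k` default are illustrative; the branch that recomputes the ratio with the corrected step d_k + d̃_k, Eq. (4.3), needs new function evaluations and is only indicated by a comment):

```python
def update_radius(r, r_e, d_norm, delta, alpha_k=0.3):
    """Next trust-region radius from the ratio tests of Algorithm 4.1.

    r      -- actual/predicted reduction ratio r_k of (4.1)
    r_e    -- extended ratio r_k^e of (4.2)
    d_norm -- norm of the trial step d_k
    delta  -- current radius Delta_k
    """
    if r > 0.75:                      # Step 11: very successful step
        if d_norm < delta:
            return delta              # radius not binding, keep it (Step 12)
        return 4 * delta if r > 0.9 else 2 * delta
    if r >= 0.25:                     # Step 7: moderately successful step
        return 2 * delta if 0.9 <= r_e <= 1.1 else delta
    # Step 8: poor step; if r_e is also far from 1, shrink (Step 9) ...
    if not (0.75 <= r_e <= 1.25):
        return alpha_k * d_norm       # Step 9: shrink towards the step length
    # ... otherwise Algorithm 4.1 recomputes the ratio with the corrected
    # step d_k + d~_k via (4.3), which requires new function values and is
    # therefore outside this sketch.
    return delta

print(update_radius(0.95, 1.0, d_norm=1.0, delta=0.5))  # step at the bound: enlarge
print(update_radius(0.1, 2.0, d_norm=0.4, delta=1.0))   # poor step: shrink
```

The design mirrors classical trust-region management: enlarge the radius only when the step is both successful and restricted by the bound, and shrink it relative to the actual step length after a clear failure.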
Table 1 Numerical results

| Fun. | n/m | Meth. | NI | NF | NG | Time (s) | φ(x) | ‖d‖ | I_A(x) |
|------|-----|-------|----|----|----|----------|------|------|--------|
| 1 | 2/3 | Wang | 6 | 6 | 6 | 0.9710 | 1.9522 | 8.2341e–07 | 1, 2 |
| | | CGT | 5 | 9 | 5 | 1.8820 | 1.9522 | 4.6318e–07 | 1, 2 |
| | | RFL | 5 | 5 | 20 | 7.0400 | 1.9522 | 9.4757e–09 | 1, 2 |
| | | ZhT | 6 | 6 | 6 | 3.5750 | 1.9522 | 8.2341e–07 | 1, 2 |
| 2 | 2/3 | Wang | 5 | 5 | 5 | 0.8810 | 2.0000 | 9.7022e–10 | 1, 2, 3 |
| | | CGT | 4 | 7 | 4 | 1.2120 | 2.0000 | 2.5564e–12 | 1, 2, 3 |
| | | RFL | 5 | 5 | 20 | 7.3800 | 2.0000 | 3.2540e–08 | 1, 2, 3 |
| | | ZhT | 5 | 5 | 5 | 0.8310 | 2.0000 | 9.7022e–10 | 1, 2, 3 |
| 3 | 4/4 | Wang | 11 | 15 | 11 | 6.5900 | −44.0000 | 1.4745e–06 | 1, 2, 4 |
| | | CGT | 12 | 23 | 9 | 10.7660 | −44.0000 | 7.6254e–08 | 1, 2, 4 |
| | | RFL | 9 | 9 | 45 | 15.9530 | −44.0000 | 4.0894e–07 | 2, 4 |
| | | ZhT | 9 | 26 | 9 | 9.9640 | −44.0000 | 1.3006e–06 | 1, 2, 4 |
| 4 | 2/3 | Wang | 10 | 20 | 10 | 1.8930 | 0.6164 | 2.4353e–06 | 1 |
| | | CGT | 16 | 31 | 8 | 3.1250 | 0.6164 | 2.6695e–06 | 1, 3 |
| | | RFL | 12 | 12 | 48 | 5.1370 | 0.6164 | 8.7922e–08 | 1, 3 |
| | | ZhT | 9 | 12 | 9 | 1.3220 | 0.6164 | 6.7740e–08 | 1, 3 |
| 5 | 3/6 | Wang | 11 | 11 | 11 | 3.3950 | 3.5997 | 3.9095e–07 | 2, 5 |
| | | CGT | 14 | 27 | 9 | 8.2420 | 3.5997 | 1.0245e–06 | 2, 5 |
| | | RFL | 6 | 6 | 42 | 6.2790 | 3.5997 | 1.1038e–06 | 5 |
| | | ZhT | 11 | 11 | 11 | 3.1450 | 3.5997 | 1.3164e–06 | 2, 5 |
| 6 | 7/5 | Wang | 19 | 44 | 19 | 24.9960 | 678.6796 | 4.1400e–06 | 1, 2 |
| | | CGT | 26 | 51 | 18 | 34.5300 | 678.6796 | 8.4886e–06 | 1, 2 |
| | | RFL | 19 | 19 | 114 | 94.5660 | 678.6796 | 9.9549e–09 | 1, 2 |
| | | ZhT | 16 | 56 | 16 | 26.9690 | 678.6796 | 6.1418e–06 | 1, 2 |
| 7 | 10/9 | Wang | 16 | 24 | 16 | 23.4240 | 24.3062 | 8.9702e–07 | 1, 2, 3, 5, 6, 7, 9 |
| | | CGT | 18 | 35 | 14 | 37.8950 | 24.3062 | 1.6799e–06 | 1, 2, 3, 5, 6, 7, 9 |
| | | RFL | 6 | 6 | 60 | 31.1850 | 24.3062 | 1.2318e–07 | 2, 3 |
| | | ZhT | 15 | 33 | 15 | 29.4020 | 24.3062 | 3.9060e–07 | 1, 2, 3, 5, 6, 7, 9 |
| 8 | 20/18 | Wang | 21 | 48 | 21 | 99.6530 | 131.2477 | 1.3537e–06 | 1, 2, 3, 5, 6, 7, 9, 11, 12, 15, 16, 17, 18 |
| | | CGT | 39 | 77 | 24 | 171.0860 | 131.2477 | 5.1284e–06 | 1, 2, 3, 5, 6, 7, 9, 11, 12, 15, 16, 17, 18 |
| | | RFL | 10 | 10 | 190 | 212.2960 | 131.2477 | 2.1049e–08 | 1, 2, 3, 5, 6, 7, 9, 11, 12, 15, 16, 17, 18 |
| | | ZhT | 22 | 73 | 22 | 131.6190 | 131.2477 | 1.6600e–06 | 1, 2, 3, 5, 6, 7, 9, 11, 12, 15, 16, 17, 18 |
| 9 | 3/30 | Wang | 7 | 7 | 7 | 5.9590 | 0.0508 | 4.2244e–06 | 9, 23 |
| | | CGT | 8 | 15 | 8 | 12.8390 | 0.0508 | 6.3452e–11 | 9, 18, 23, 30 |
| | | RFL | 23 | 23 | 713 | 334.6610 | 0.0508 | 6.5909e–06 | 9, 23, 30 |
| | | ZhT | 7 | 7 | 7 | 5.8480 | 0.0508 | 4.2244e–06 | 9, 23 |
| 10 | 2/20 | Wang | 10 | 31 | 10 | 13.2990 | 4.6934 | 5.2753e–07 | 11 |
| | | CGT | 11 | 21 | 7 | 11.6070 | 4.6934 | 5.4729e–09 | 5, 11 |
| | | RFL | 7 | 8 | 147 | 34.4190 | 4.6934 | 1.2856e–06 | 11 |
| | | ZhT | 9 | 7 | 9 | 8.1720 | 4.6934 | 3.6387e–06 | 11 |
| 11 | 4/40 | Wang | 13 | 13 | 13 | 54.8790 | 115.7064 | 5.4233e–07 | 1, 13, 20 |
| | | CGT | 12 | 23 | 12 | 219.6860 | 115.7064 | 6.6032e–07 | 1, 13, 20 |
| | | RFL | 13 | 13 | 533 | 1.6075e+03 | 115.7064 | 6.7895e–09 | 1, 13, 20 |
| | | ZhT | 11 | 13 | 11 | 33.8590 | 115.7064 | 2.8537e–07 | 1, 13, 20 |
| 12 | 4/100 | Wang | 13 | 32 | 13 | 133.0710 | 0.0026 | 2.1507e–07 | 10, 50, 67, 100 |
| | | CGT | 32 | 63 | 13 | 389.3900 | 0.0026 | 7.6751e–08 | 10, 50, 67, 100 |
| | | RFL | F | F | F | F | F | F | F |
| | | ZhT | 14 | 41 | 14 | 174.0400 | 0.0026 | 1.1189e–07 | 10, 50, 67, 100 |
| 13 | 6/62 | Wang | 13 | 14 | 13 | 415.0370 | 0.0127 | 2.9530e–09 | 2, 13, 29, 32, 36, 53, 62 |
| | | CGT | 12 | 23 | 12 | 1.5242e+03 | 0.0127 | 1.4288e–06 | 2, 13, 29, 32, 36, 53, 62 |
| | | RFL | F | F | F | F | F | F | F |
| | | ZhT | 13 | 15 | 13 | 465.8970 | 0.0127 | 2.9562e–09 | 2, 13, 29, 32, 36, 53, 62 |
Remark 4.2 In Table 1, since problems 12 and 13 are of larger scale, and since the RFL algorithm needs to compute exact Hessian matrices at each iteration, our computer could not carry out the computations within its memory limit. In these cases, we mark "F" in all output columns.
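The tabulated optimal values can be checked independently by evaluating φ(x) = max_i f_i(x) at a known solution. A minimal sketch for Problem 6.3 of the Appendix, using the problem data and the reported solution x = (0, 1, 2, −1):

```python
# Evaluate the minimax objective phi(x) = max_i f_i(x) of Problem 6.3
# at the reported solution x = (0, 1, 2, -1); the optimal value is -44.

def f1(x):
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + 2*x3**2 + x4**2 - 5*x1 - 5*x2 - 21*x3 + 7*x4

def f2(x):
    x1, x2, x3, x4 = x
    return f1(x) - 10*(-x1**2 - x2**2 - x3**2 - x4**2 - x1 + x2 - x3 + x4 + 8)

def f3(x):
    x1, x2, x3, x4 = x
    return f1(x) - 10*(-x1**2 - 2*x2**2 - x3**2 - 2*x4**2 + x1 + x4 + 10)

def f4(x):
    x1, x2, x3, x4 = x
    return f1(x) - 10*(-x1**2 - x2**2 - x3**2 - 2*x1 + x2 + x4 + 5)

def phi(x):
    return max(f(x) for f in (f1, f2, f3, f4))

x_star = (0.0, 1.0, 2.0, -1.0)
print(phi(x_star))  # -44.0: f1 = f2 = f4 = -44 are active, f3 = -54 is not
```

The active set {1, 2, 4} recovered here agrees with the I_A(x) column reported for problem 3 in Table 1.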
From Table 1, we observe the following:

(1) Comparing our new Algorithm 2.1 (Wang) with the CGT algorithm: (a) For all test problems except problem 10, the Wang algorithm performs much better than the CGT algorithm in terms of CPU time and the number of function evaluations; for the larger-scale problems 6–9 and 11–13, in particular, the difference is significant. (b) For all test problems, the two algorithms are evenly matched in the number of gradient evaluations. (c) In terms of the number of iterations, the Wang algorithm outperforms the CGT algorithm on problems 1–10 and 12, while on problems 11 and 13 the performance of the two algorithms is very close and the difference is not significant. The reason is that the Wang algorithm uses the hybrid technique, so that it can avoid solving the two quadratic programming subproblems many times at each iteration, which improves its efficiency.

(2) Comparing the Wang algorithm with the ZhT algorithm: for all test problems, the two algorithms are very close in both the number of iterations and the number of gradient evaluations, while in CPU time and the number of function evaluations the Wang algorithm outperforms the ZhT algorithm in general, especially on the larger-scale problems.

(3) Comparing the Wang algorithm with the RFL algorithm: (a) For all test problems, the Wang algorithm performs much better than the RFL algorithm in terms of both CPU time and the number of gradient evaluations, and the difference is significant. The reason is that the latter employs exact Hessian matrices, and must compute the m Hessian matrices ∇²f_i(x), i = 1, . . . , m, when solving the subproblems at each iteration. In addition, for the larger-scale problems, the RFL algorithm fails due to the memory limit of our computer. (b) In terms of the number of iterations and function evaluations, the RFL algorithm performs a little better than the Wang algorithm, but the difference is not large; the reason may be that the exact Hessian matrices give the subproblem models a somewhat better approximation quality.

In summary, from the above preliminary tests, and weighing all criteria, in particular CPU time, which is very important for an algorithm, the comparison with the three algorithms of Fletcher (1982), Zhou and Tits (1993) and Conn et al. (2000) shows that the new algorithm proposed in this paper is efficient and advantageous.
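As a further check on the tabulated values, the optimal value φ = 0.0508 reported for problem 9 (the Bard problem, Problem 6.9 of the Appendix) can be reproduced by direct evaluation at its known solution. Since f_{i+15} = −f_i there, φ(x) = max_i |f_i(x)| over i = 1, …, 15 (a sketch using the data listed in the Appendix):

```python
# Evaluate phi(x) at the reported solution of Problem 6.9 (Bard):
# f_i(x) = -y_i + x_1 + u_i / (v_i*x_2 + w_i*x_3), i = 1..15, and
# f_{i+15} = -f_i, so phi(x) = max_i |f_i(x)|.

y = [0.14, 0.18, 0.22, 0.25, 0.29, 0.32, 0.35, 0.39,
     0.37, 0.58, 0.73, 0.96, 1.34, 2.1, 4.39]

def phi(x):
    x1, x2, x3 = x
    vals = []
    for i in range(1, 16):
        u, v = i, 16 - i
        w = min(u, v)
        vals.append(-y[i - 1] + x1 + u / (v * x2 + w * x3))
    return max(abs(f) for f in vals)

x_star = (0.053469, 1.542593, 1.957407)
print(phi(x_star))  # approximately 0.0508163, matching Table 1 and the Appendix
```

At the solution, several residuals attain the value ±0.0508163 simultaneously, which is exactly the equioscillation behaviour that makes the active set (indices 9 and 23 in Table 1) nontrivial.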
5 Conclusions

In this paper, we have described a hybrid algorithm for the finite minimax problem that combines trust-region methods with line-search and curve-search methods. Moreover, the new algorithm employs a second-order correction step, which enables the Maratos effect to be avoided. Under mild conditions, we proved that it possesses strong global convergence and locally superlinear convergence. Preliminary numerical experiments show that the new algorithm is efficient and practical.
Appendix

The test examples for the unconstrained finite minimax problem are as follows:

Problem 6.1 (Yi 2002) Here, n = 2, m = 3, and

f_1(x) = x_1^2 + x_2^4,
f_2(x) = (2 − x_1)^2 + (2 − x_2)^2,
f_3(x) = 2 exp(−x_1 + x_2).

The solution is x = (1.139037652, 0.8995599384), f_1 = f_2 = 1.952224494. The initial point is x_0 = (1, −0.1).

Problem 6.2 (Yi 2002) Here, n = 2, m = 3, and

f_1(x) = x_1^4 + x_2^2,
f_2(x) = (2 − x_1)^2 + (2 − x_2)^2,
f_3(x) = 2 exp(−x_1 + x_2).

The solution is x = (1, 1), f_1 = f_2 = f_3 = 2. The initial point is x_0 = (1, −0.1).

Problem 6.3 (Yi 2002) Here, n = 4, m = 4, and

f_1(x) = x_1^2 + x_2^2 + 2x_3^2 + x_4^2 − 5x_1 − 5x_2 − 21x_3 + 7x_4,
f_2(x) = f_1(x) − 10(−x_1^2 − x_2^2 − x_3^2 − x_4^2 − x_1 + x_2 − x_3 + x_4 + 8),
f_3(x) = f_1(x) − 10(−x_1^2 − 2x_2^2 − x_3^2 − 2x_4^2 + x_1 + x_4 + 10),
f_4(x) = f_1(x) − 10(−x_1^2 − x_2^2 − x_3^2 − 2x_1 + x_2 + x_4 + 5).

The solution is x = (0, 1, 2, −1), f_1 = f_2 = f_4 = −44. The initial point is x_0 = (0, 0, 0, 0).

Problem 6.4 (Yi 2002) Here, n = 2, m = 3, and

f_1(x) = x_1^2 + x_2^2 + x_1 x_2,
f_2(x) = sin(x_1),
f_3(x) = cos(x_2).

The solution is x = ±(0.4532962370, −0.9065924741), f_1 = f_3 = 0.6164324356. The initial point is x_0 = (3, 1).

Problem 6.5 (Yi 2002) Here, n = 3, m = 6, and

f_1(x) = x_1^2 + x_2^2 + x_3^2 − 1,
f_2(x) = x_1^2 + x_2^2 + (x_3 − 2)^2,
f_3(x) = x_1 + x_2 + x_3 − 1,
f_4(x) = x_1 + x_2 − x_3 + 1,
f_5(x) = 2x_1^3 + 6x_2^2 + 2(5x_3 − x_1 + 1)^2,
f_6(x) = x_1^2 − 9x_3.

The solution is x = (0.32825995, 0, 0.1313200636), f_2 = f_5 = 3.599719300. The initial point is x_0 = (1, 1, 1).

Problem 6.6 (Wong 1) (Luksan 1986) Here, n = 7, m = 5, and

f_1(x) = (x_1 − 10)^2 + 5(x_2 − 12)^2 + x_3^4 + 3(x_4 − 11)^2
 + 10x_5^6 + 7x_6^2 + x_7^4 − 4x_6 x_7 − 10x_6 − 8x_7,
f_2(x) = f_1(x) + 10(2x_1^2 + 3x_2^4 + x_3 + 4x_4^2 + 5x_5 − 127),
f_3(x) = f_1(x) + 10(7x_1 + 3x_2 + 10x_3^2 + x_4 − x_5 − 282),
f_4(x) = f_1(x) + 10(23x_1 + x_2^2 + 6x_6^2 − 8x_7 − 196),
f_5(x) = f_1(x) + 10(4x_1^2 + x_2^2 − 3x_1 x_2 + 2x_3^2 + 5x_6 − 11x_7).

The initial point is x_0 = (1, 2, 0, 4, 0, 1, 1).

Problem 6.7 (Wong 2) (Luksan 1986) Here, n = 10, m = 9, and

f_1(x) = x_1^2 + x_2^2 + x_1 x_2 − 14x_1 − 16x_2 + (x_3 − 10)^2 + 4(x_4 − 5)^2 + (x_5 − 3)^2 + 2(x_6 − 1)^2 + 5x_7^2 + 7(x_8 − 11)^2 + 2(x_9 − 10)^2 + (x_10 − 7)^2 + 45,
f_2(x) = f_1(x) + 10(3(x_1 − 2)^2 + 4(x_2 − 3)^2 + 2x_3^2 − 7x_4 − 120),
f_3(x) = f_1(x) + 10(5x_1^2 + 8x_2 + (x_3 − 6)^2 − 2x_4 − 40),
f_4(x) = f_1(x) + 10(0.5(x_1 − 8)^2 + 2(x_2 − 4)^2 + 3x_5^2 − x_6 − 30),
f_5(x) = f_1(x) + 10(x_1^2 + 2(x_2 − 2)^2 − 2x_1 x_2 + 14x_5 − 6x_6),
f_6(x) = f_1(x) + 10(4x_1 + 5x_2 − 3x_7 + 9x_8 − 105),
f_7(x) = f_1(x) + 10(10x_1 − 8x_2 − 17x_7 + 2x_8),
f_8(x) = f_1(x) + 10(−3x_1 + 6x_2 + 12(x_9 − 8)^2 − 7x_10),
f_9(x) = f_1(x) + 10(−8x_1 + 2x_2 + 5x_9 − 2x_10 − 12).

The initial point is x_0 = (2, 3, 5, 5, 1, 2, 7, 3, 6, 10).

Problem 6.8 (Wong 3) (Luksan 1986) Here, n = 20, m = 18, and

f_1(x) = x_1^2 + x_2^2 + x_1 x_2 − 14x_1 − 16x_2 + (x_3 − 10)^2 + 4(x_4 − 5)^2 + (x_5 − 3)^2 + 2(x_6 − 1)^2 + 5x_7^2 + 7(x_8 − 11)^2 + 2(x_9 − 10)^2 + (x_10 − 7)^2 + (x_11 − 9)^2 + 10(x_12 − 1)^2 + 5(x_13 − 7)^2 + 4(x_14 − 14)^2 + 27(x_15 − 1)^2 + x_16^4 + (x_17 − 2)^2 + 13(x_18 − 2)^2 + (x_19 − 3)^2 + x_20^2 + 95,
f_2(x) = f_1(x) + 10(3(x_1 − 2)^2 + 4(x_2 − 3)^2 + 2x_3^2 − 7x_4 − 120),
f_3(x) = f_1(x) + 10(5x_1^2 + 8x_2 + (x_3 − 6)^2 − 2x_4 − 40),
f_4(x) = f_1(x) + 10(0.5(x_1 − 8)^2 + 2(x_2 − 4)^2 + 3x_5^2 − x_6 − 30),
f_5(x) = f_1(x) + 10(x_1^2 + 2(x_2 − 2)^2 − 2x_1 x_2 + 14x_5 − 6x_6),
f_6(x) = f_1(x) + 10(4x_1 + 5x_2 − 3x_7 + 9x_8 − 105),
f_7(x) = f_1(x) + 10(10x_1 − 8x_2 − 17x_7 + 2x_8),
f_8(x) = f_1(x) + 10(−3x_1 + 6x_2 + 12(x_9 − 8)^2 − 7x_10),
f_9(x) = f_1(x) + 10(−8x_1 + 2x_2 + 5x_9 − 2x_10 − 12),
f_10(x) = f_1(x) + 10(x_1 + x_2 + 4x_11 − 21x_12),
f_11(x) = f_1(x) + 10(x_1^2 + 15x_11 − 8x_12 − 28),
f_12(x) = f_1(x) + 10(4x_1 + 9x_2 + 5x_13^2 − 9x_14 − 97),
f_13(x) = f_1(x) + 10(3x_1 + 4x_2 + 3(x_13 − 6)^2 − 14x_14 − 10),
f_14(x) = f_1(x) + 10(14x_1^2 + 35x_15 − 79x_16 − 92),
f_15(x) = f_1(x) + 10(15x_2^2 + 11x_15 − 61x_16 − 54),
f_16(x) = f_1(x) + 10(5x_1^2 + 2x_2 + 9x_17^4 − x_18 − 68),
f_17(x) = f_1(x) + 10(x_1^2 − x_9 + 19x_19 − 20x_20 + 19),
f_18(x) = f_1(x) + 10(7x_1^2 + 5x_2^2 + x_19^2 − 30x_20).

The initial point is x_0 = (2, 3, 5, 5, 1, 2, 7, 3, 6, 10, 2, 2, 6, 15, 1, 2, 1, 2, 1, 3).

Problem 6.9 (Bard) (Yi 2002; Watson 1979) Here, n = 3, m = 30, and

f_i(x) = −y_i + x_1 + u_i / (v_i x_2 + w_i x_3),  i = 1, 2, . . . , 15,
f_i(x) = −f_{i−15}(x),  i = 16, 17, . . . , 30,

where u_i = i, v_i = 16 − i, w_i = min(u_i, v_i), and y = (0.14, 0.18, 0.22, 0.25, 0.29, 0.32, 0.35, 0.39, 0.37, 0.58, 0.73, 0.96, 1.34, 2.1, 4.39). The solution is x = (0.053469, 1.542593, 1.957407), f* = 0.05081632653. The initial point is x_0 = (1, 1, 1).

Problem 6.10 (LS) (Watson 1979) Here, n = 2, m = 20, and

f_i(x) = y_i − exp(i x_1) − exp(i x_2),  i = 1, 2, . . . , 10,
f_i(x) = −f_{21−i}(x),  i = 11, 12, . . . , 20,

where y_i = 2 + 2i. The solution is x* = (0.259127, 0.259127), f* = 4.693376. The initial point is x_0 = (0.3, 0.4).

Problem 6.11 (Davidon) (Watson 1979) Here, n = 4, m = 40, and

f_i(x) = (x_1 + x_2 t_i − exp(t_i))^2 + (x_3 + x_4 sin(t_i) − cos(t_i))^2,  i = 1, 2, . . . , 20,
f_i(x) = −f_{41−i}(x),  i = 21, 22, . . . , 40,

where t_i = 0.2i. The solution is x* = (−12.243681, 14.021797, −0.451511, −0.010519), f* = 115.70644. The initial point is x_0 = (25.0, 5.0, −5.0, −1.0).

Problem 6.12 (Hettich) (Watson 1979) Here, n = 4, m = 100, and

f_i(x) = √t_i + ((x_1 t_i + x_2)t_i + x_3)^2 − x_4,  i = 1, 2, . . . , 50,
f_i(x) = −f_{101−i}(x),  i = 51, 52, . . . , 100,
where t_i = 0.25 + (i − 1) · 0.75/(50 − 1). The solution is x* = (0.087532, −0.495316, 1.118352, 1.502446), f* = 0.002459. The initial point is x_0 = (0, −0.5, 1, 1.5).

Problem 6.13 (Watson) (Watson 1979) Here, n = 6, m = 62, and

f_1(x) = x_1,
f_2(x) = x_2 − x_1^2 − 1,
f_i(x) = Σ_{j=2}^{n} (j − 1) x_j ((i − 2)/29)^{j−2} − [ Σ_{j=1}^{n} x_j ((i − 2)/29)^{j−1} ]^2 − 1,  i = 3, 4, . . . , 31,
f_i(x) = −f_{63−i}(x),  i = 32, 33, . . . , 62.

The initial point is x_0 = (0, 0, . . . , 0).

References

Charalambous, C., & Conn, A. R. (1978). An efficient method to solve the minimax problem directly. SIAM Journal on Numerical Analysis, 15, 162–187.
Conn, A. R., Gould, N. I., & Toint, P. L. (2000). Trust-region methods. Philadelphia: SIAM.
Dem'yanov, V. F., & Malozemov, V. N. (1974). Introduction to minimax. New York: Wiley.
Di Pillo, G., Grippo, L., & Lucidi, S. (1993). A smooth method for the finite minimax problem. Mathematical Programming, 60, 187–214.
Du, D. Z., & Pardalos, P. M. (Eds.) (1995). Minimax and applications. Dordrecht: Kluwer.
Fletcher, R. (1982). Second order correction for nondifferentiable optimization problems. In G. A. Watson (Ed.), Numerical analysis (pp. 85–114). Berlin: Springer.
Fletcher, R. (1987). Practical methods of optimization (2nd edn.). New York: Wiley.
Gaudioso, M., & Monaco, M. F. (1982). A bundle type approach to the unconstrained minimization of convex nonsmooth functions. Mathematical Programming, 23, 216–226.
Gigola, C., & Gomez, S. (1990). A regularization method for solving the finite convex min–max problem. SIAM Journal on Numerical Analysis, 27, 1621–1634.
Hager, W. W. (1999). Stabilized sequential quadratic programming. Computational Optimization and Applications, 12, 253–273.
Han, S. P. (1981). Variable metric methods for minimizing a class of nondifferentiable functions. Mathematical Programming, 20, 1–13.
Jian, J. B. (2006). New sequential quadratically constrained quadratic programming method of feasible directions and its convergence rate. Journal of Optimization Theory and Applications, 129, 109–130.
Jian, J.-B., Quan, R., & Hu, Q. J. (2007). A new superlinearly convergent SQP algorithm for nonlinear minimax problem. Acta Mathematicae Applicatae Sinica, 23, 395–410. English series.
Li, X.-S., & Fang, S.-C. (1997). On the entropic regularization method for solving min-max problems with applications. Mathematical Methods of Operations Research, 46, 119–130.
Luksan, L. (1986). A compact variable metric algorithm for nonlinear minimax approximation. Computing, 36, 355–373.
Gertz, E. M. (2004). A quasi-Newton trust-region method. Mathematical Programming, 100, 447–470.
Murray, W., & Overton, M. L. (1980). A projected Lagrangian algorithm for nonlinear minimax optimization. SIAM Journal on Scientific and Statistical Computing, 1, 345–370.
Nocedal, J., & Wright, S. J. (2006). Numerical optimization. Beijing: Science Press.
Nocedal, J., & Yuan, Y. (1998). Combining trust region and line search techniques. In Y. Yuan (Ed.), Advances in nonlinear programming (pp. 153–175). Dordrecht: Kluwer.
Overton, M. L. (1982). Algorithms for nonlinear l1 and l∞ fitting. In M. J. D. Powell (Ed.), Nonlinear optimization 1981 (pp. 213–221). London: Academic Press.
Ploak, E. (1989). Basics of minimax algorithms. In F. H. Clarke, V. F. Demyanov, & F. Giannessi (Eds.), Nonsmooth optimization and related topics (pp. 343–369). New York: Plenum.
Polak, E., Mayne, D. H., & Higgins, J. E. (1991). Superlinearly convergent algorithm for min-max problems. Journal of Optimization Theory and Applications, 69, 407–439.
Powell, M. J. D. (1984). On the global convergence of trust region algorithm for unconstrained minimization. Mathematical Programming, 29, 297–303.
Shen, P. P., & Wang, Y. J. (2005). A new pruning test for finding all global minimizers of nonsmooth functions. Applied Mathematics and Computation, 168, 739–755.
Sun, W. (2004). Nonmonotone trust region method for solving optimization problems. Applied Mathematics and Computation, 156, 159–174.
Vardi, A. (1992). New minimax algorithm. Journal of Optimization Theory and Applications, 75, 613–634.
Wang, F. S., Zhang, K. C., & Tan, X. L. (2006). A fractional programming algorithm based on conic quasi-Newton trust region method for unconstrained minimization. Applied Mathematics and Computation, 181, 1061–1067.
Watson, G. A. (1979). The minimax solution of an overdetermined system of nonlinear equations. Journal of the Institute of Mathematics and Its Applications, 23, 167–180.
Xu, S. (2001). Smoothing method for minimax problems. Computational Optimization and Applications, 20, 267–279.
Yi, X. (2002). The sequential quadratic programming method for solving minimax problem. Journal of Systems Science and Mathematical Sciences, 22, 355–364.
Yi, X. (2004). A Newton like algorithm for solving minimax problem. Journal on Numerical Methods and Computer Applications, 2, 108–115.
Yuan, Y. (1985). On the superlinear convergence of a trust-region algorithm for nonsmooth optimization. Mathematical Programming, 31, 269–285.
Yuan, Y. (1995). On the convergence of a new trust region algorithm. Numerische Mathematik, 70, 515–539.
Zhou, J. L., & Tits, A. L. (1993). Nonmonotone line search for minimax problems. Journal of Optimization Theory and Applications, 76, 455–476.