SCIENCE CHINA Mathematics
. ARTICLES .
September 2011 Vol. 54 No. 9: 2019–2036 doi: 10.1007/s11425-011-4232-7
A modified BFGS algorithm based on a hybrid secant equation

BABAIE-KAFAKI Saman

Department of Mathematics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, Semnan 35196-45399, Iran
Email: [email protected], [email protected]

Received August 10, 2010; accepted February 13, 2011; published online June 29, 2011
Abstract  By making a convex combination of the modified secant equations proposed by Yuan and by Wei et al., a hybrid secant equation and, based on it, a modified BFGS algorithm are proposed. The hybridization parameter is effectively computed using the available information of recent iterations. Under proper conditions, it is shown that the proposed algorithm is globally, locally and superlinearly convergent. By using the performance profile introduced by Dolan and Moré, a comparison between the implementations of the proposed algorithm and two efficient modified BFGS algorithms proposed by Yuan and by Wei et al. is made on a set of unconstrained optimization test problems from the CUTEr collection. Numerical results demonstrating the efficiency of the proposed modified BFGS algorithm are reported.

Keywords  unconstrained optimization, hybrid secant equation, BFGS update, global convergence, local and superlinear convergence

MSC(2000): 90C53, 49M37, 65K05
Citation: Babaie-Kafaki S. A modified BFGS algorithm based on a hybrid secant equation. Sci China Math, 2011, 54(9): 2019–2036, doi: 10.1007/s11425-011-4232-7
1  Introduction
Secant (quasi-Newton) equations play an essential role in approximating the Hessian matrix (the matrix of second-order partial derivatives) of the objective function at each iteration of quasi-Newton methods, which are efficient iterative methods for solving unconstrained optimization problems. A general quasi-Newton method is designed to solve the following unconstrained optimization problem,

min_{x ∈ R^n} f(x),  (1)

where f : R^n → R is a smooth nonlinear function.

Notations. For a sufficiently smooth function f, we consider the following notations:

g(x) = ∇f(x),  G(x) = ∇²f(x),

and, at the k-th iteration,

f_k = f(x_k),  g_k = g(x_k),  G_k = G(x_k),  y_k = g_{k+1} − g_k.

Here, ‖·‖ stands for the Euclidean norm and, for a nonsingular symmetric matrix E ∈ R^{n×n}, ‖·‖_E is the matrix norm defined by ‖A‖_E = ‖EAE‖_F for an arbitrary matrix A ∈ R^{n×n}. Also, the neighborhood N_r̄(x̄) of an arbitrary point x̄ ∈ R^n with radius r̄ > 0 is defined by
N_r̄(x̄) = {x ∈ R^n : ‖x − x̄‖ < r̄}.

The iterative formula of a quasi-Newton method is given by

x_{k+1} = x_k + s_k,  s_k = α_k d_k,  k = 0, 1, . . . ,

where α_k is a step length to be computed by a line search technique [19], and d_k is the search direction, calculated by solving the following linear system of equations,

B_k d_k = −g_k,  (2)
in which B_k is an approximation of G_k. Quasi-Newton methods are characterized by the fact that B_k is effectively updated to obtain a new matrix B_{k+1} in the following form:

B_{k+1} = B_k + ΔB_k,

where ΔB_k is a correction matrix. The matrix B_{k+1} is required to satisfy a suitable equation, namely a secant equation, which includes second-order derivative information. The most popular such equation is the (standard) secant equation, i.e.,

B_{k+1} s_k = y_k.

The standard secant equation uses only the gradient values available at the current iteration and ignores the function values. Efforts have been made to modify the standard secant equation so that more of the available information is employed and, consequently, a better approximation of G_{k+1} is obtained. Using an interpolation for f, Davidon [7] proposed a non-quadratic approximation model, namely the conic model, that uses both the function and gradient values available at the current iteration. Based on a quadratic interpolation for f, Yuan [27] and Wei et al. [23] proposed modified secant equations which approximate the curvature of the objective function more accurately than the standard secant equation (see also [24, 28]). Zhang et al. [30] and Zhang and Xu [29] used a quadratic interpolation for g and proposed another modified secant equation with a higher order of accuracy in approximating the curvature of the objective function. In order to switch adaptively between the standard secant equation and the modified secant equation proposed in [29, 30], slight revisions have been made by Yabe et al. [25, 26] and Babaie-Kafaki et al. [2] by embedding a parameter in this equation. All the methods mentioned in [23, 27, 29, 30] use both the gradient and function values available at the current iteration. In the cases where the Hessian has some special structure and is partially available, for example, in nonlinear least squares problems, the above-mentioned secant equations have been developed by Dennis et al. [8], Chen et al. [6], and Amini and Ghorbani Rizi [1]. A convexity assumption on the objective function plays an important role in the convergence analysis of the quasi-Newton methods based on all the above-mentioned secant equations. Remarkable secant equations have been proposed by Guo et al. [16] and Li and Fukushima [18] in order to achieve the convergence of quasi-Newton methods without a convexity assumption on the objective function.

Some other attempts have been made to use the available data from more than one previous step. Many researchers dealt with the extended secant equation, i.e.,

B_{k+1} s_i = y_i,  i = 1, 2, . . . , k,

which uses all the available gradient values and guarantees the at-most-n-step termination property for the minimization of a convex quadratic function [17]. Ford and Moghrabi [12, 13] developed the concept of multi-step quasi-Newton methods based on polynomial interpolation using the available data from the m most recent steps (see also [11] and the references therein). Here, we suggest a modified secant equation that is a convex combination of the modified secant equations proposed by Yuan [27] and Wei et al. [23].

This work is organized as follows. In Section 2, we first state our hybrid secant equation and then, based on the proposed secant equation, we suggest a modified BFGS algorithm. In Section 3, we show that under proper conditions, the proposed algorithm is globally, locally and superlinearly convergent. Computing the hybridization parameter based on the available information of recent iterations is discussed in Section 4. We make a numerical comparison between the implementations of our algorithm and two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] in Section 5. Finally, we draw conclusions in Section 6.
2  The hybrid secant equation
Here, we propose our hybrid secant equation as a convex combination of the modified secant equations proposed in [23] and [27]. Assume that the objective function f is smooth enough. If we expand the function f at x_k = x_{k+1} − s_k using Taylor's theorem, then we have

f_k = f_{k+1} − s_k^T g_{k+1} + (1/2) s_k^T G_{k+1} s_k − (1/6) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴),

where

s_k^T (T_{k+1} s_k) s_k = ∑_{i,j,l=1}^{n} [∂³f(x_{k+1}) / (∂x^i ∂x^j ∂x^l)] s_k^i s_k^j s_k^l.  (3)

Therefore,

s_k^T G_{k+1} s_k = 2(f_k − f_{k+1} + s_k^T g_{k+1}) + (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴)
                = s_k^T y_k + 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}) + (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴).

So, the following approximation can be proposed:

s_k^T G_{k+1} s_k ≈ s_k^T y_k + ϑ_k,  (4)

where ϑ_k = 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}). Since in the quasi-Newton methods B_{k+1} is required to approximate G_{k+1}, by considering (4), a modified secant equation can be proposed as follows:

B_{k+1} s_k = z_k,  z_k = y_k + [ϑ_k / (s_k^T u_k)] u_k,  (5)

where u_k ∈ R^n is a vector parameter satisfying s_k^T u_k ≠ 0. The modified secant equation (5) is justified by the following theorem (see [23]).

Theorem 2.1. If f is sufficiently smooth and ‖s_k‖ is small enough, then the following estimating relations hold:

s_k^T (G_{k+1} s_k − y_k) = (1/2) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴),
s_k^T (G_{k+1} s_k − z_k) = (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴).
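The statement below (5) that ϑ_k vanishes for a quadratic objective is easy to verify numerically. A minimal sketch (Python/NumPy); the quadratic and the two iterates are arbitrary illustrative choices, not data from the paper:

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T A x + b^T x, the correction term
# vartheta_k = 2(f_k - f_{k+1}) + s_k^T (g_k + g_{k+1}) is exactly zero,
# so the modified secant equation (5) reduces to the standard one.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x
g = lambda x: A @ x + b

xk = rng.standard_normal(n)
xk1 = rng.standard_normal(n)         # plays the role of x_{k+1}
sk = xk1 - xk

vartheta = 2.0 * (f(xk) - f(xk1)) + sk @ (g(xk) + g(xk1))
print(abs(vartheta))                 # vanishes up to rounding error
```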
For a quadratic objective function f, we have ϑ_k = 0 and, consequently, the modified secant equation (5) reduces to the standard secant equation. A comprehensive study of the properties of the modified secant equation (5) has been made by Wei et al. [23, 24]. Different choices of the vector parameter u_k in (5) lead to different modified secant equations. The particular choice u_k = s_k has been considered by Wei et al. [23, 24]. Also, if the step length α_k is computed such that the Wolfe-Powell line search conditions hold, i.e.,

f(x_k + α_k d_k) − f_k ≤ δ α_k g_k^T d_k,
g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,  (6)

with 0 < δ < σ < 1, then we have

s_k^T y_k = s_k^T g_{k+1} − s_k^T g_k ≥ −(1 − σ) s_k^T g_k > 0.  (7)
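The two tests in (6), and the positivity (7) that follows from them, can be illustrated as below. The quadratic test function and the crude step-halving search are assumptions of this sketch, not the line search implementation used in the paper:

```python
import numpy as np

# Check the Wolfe-Powell conditions (6) for a trial step length alpha,
# then verify that an accepted step yields s_k^T y_k > 0, as in (7).
def wolfe_ok(f, g, x, d, alpha, delta=1e-4, sigma=0.9):
    ok_armijo = f(x + alpha * d) - f(x) <= delta * alpha * (g(x) @ d)
    ok_curv = g(x + alpha * d) @ d >= sigma * (g(x) @ d)
    return ok_armijo and ok_curv

A = np.diag([1.0, 10.0])          # illustrative convex quadratic
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x

x = np.array([1.0, 1.0])
d = -g(x)                         # a descent direction
alpha = 1.0                       # halve alpha until (6) holds
while not wolfe_ok(f, g, x, d, alpha):
    alpha *= 0.5

s = alpha * d
y = g(x + s) - g(x)
print(alpha, s @ y > 0)
```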
Therefore, based on the Wolfe-Powell line search conditions, the particular choice u_k = y_k was first proposed by Yuan [27] and then studied by Wei et al. [23]. In the modified secant equation (5), only the information of the current iteration is used, and other available information, obtained from more than one previous iteration, is ignored. This motivated us to improve Equation (5) by considering the vector parameter u_k as a convex combination of y_k and s_k, i.e.,

u_k = (1 − θ_k) y_k + θ_k s_k,  θ_k ∈ [0, 1],  (8)

and then to compute an effective value for the hybridization parameter θ_k based on the available information of recent iterations (which will be discussed in Section 4). Note that the choices θ_k = 0 and θ_k = 1 in (8) convert Equation (5) to the modified secant equations proposed in [27] and [23], respectively.

As mentioned in Section 1, the quasi-Newton methods are characterized by the fact that B_k, which is an approximation of G_k, should be effectively updated to obtain a new approximation B_{k+1} of G_{k+1}, such that a version of the secant equation is satisfied. A well-known class of update formulae is the Broyden family [19]. Among the Broyden family, the BFGS update, proposed by Broyden [3], Fletcher [10], Goldfarb [14] and Shanno [21], is famous and popular because of its favorable numerical and theoretical behavior. For the modified secant equation (5), by the BFGS update formula, B_{k+1} is computed as follows:

B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (z_k z_k^T) / (s_k^T z_k).  (9)

Also, the dual of the update formula (9), in which the available approximation H_k of G_k^{-1} is updated to obtain a new approximation H_{k+1} of G_{k+1}^{-1}, is given by

H_{k+1} = [I − (s_k z_k^T)/(s_k^T z_k)] H_k [I − (z_k s_k^T)/(s_k^T z_k)] + (s_k s_k^T)/(s_k^T z_k).  (10)
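In matrix form, the update pair (9)-(10) can be sketched as follows (Python/NumPy). The particular B, s and z are illustrative, chosen so that s^T z > 0; the updated pair then satisfies the secant equation and remains mutually inverse:

```python
import numpy as np

# Modified BFGS update (9) for B, and its dual (10) for H = B^{-1},
# with y_k replaced by the hybrid vector z_k.
def bfgs_B(B, s, z):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(z, z) / (s @ z)

def bfgs_H(H, s, z):
    rho = 1.0 / (s @ z)
    V = np.eye(len(s)) - rho * np.outer(s, z)   # I - s z^T / (s^T z)
    return V @ H @ V.T + rho * np.outer(s, s)

B = np.diag([2.0, 3.0, 4.0])                    # illustrative data
H = np.linalg.inv(B)
s = np.array([1.0, 0.5, -0.25])
z = np.array([2.0, 1.0, 0.5])                   # s^T z = 2.375 > 0

B1 = bfgs_B(B, s, z)
H1 = bfgs_H(H, s, z)
print(np.allclose(B1 @ s, z), np.allclose(B1 @ H1, np.eye(3)))
```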
When B_k is a positive definite matrix, to guarantee the heredity of positive definiteness for B_{k+1} computed by (9), which ensures that the search direction d_k computed by (2) is a descent direction, it is important for some quasi-Newton updates, e.g., the BFGS update, that the following inequality holds [19]:

s_k^T z_k = s_k^T y_k + ϑ_k > 0.  (11)

Although the Wolfe-Powell line search conditions ensure that s_k^T y_k > 0 (see (7)), in general s_k^T z_k may not be positive and, consequently, the BFGS update (9) may not generate a positive definite matrix B_{k+1}. Here, in order to ensure Inequality (11), we follow the suggestions in [23, 27] and compute ϑ_k as follows:

ϑ_k = 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}),  if s_k^T y_k + 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}) ≥ ϵ s_k^T u_k,
ϑ_k = 0,  otherwise,  (12)

where ϵ is a positive constant. Now, we can state our modified BFGS algorithm. The convergence properties of the following algorithm will be discussed in the next section. Also, as mentioned before, computing the hybridization parameter θ_k (in Step 6 of the following algorithm) is the subject of Section 4.

Algorithm 2.1.
A modified BFGS algorithm based on the hybrid secant equation (5).
Step 0. Select x_0 ∈ R^n and a positive definite matrix B_0 (H_0) ∈ R^{n×n} as initial approximations of x* (the solution of (1)) and G(x*) (G(x*)^{-1}), respectively, and also a tolerance τ > 0 to be used in the stopping criterion. Set k = 0.
Step 1. If ‖g_k‖ < τ, then stop.
Step 2. Set d_k = −B_k^{-1} g_k (= −H_k g_k).
Step 3. Compute a step length α_k along the search direction d_k such that the Wolfe-Powell line search conditions (6) are satisfied.
Step 4. Set x_{k+1} = x_k + α_k d_k.
Step 5. Compute ϑ_k by (12). If ϑ_k = 0, then set z_k = y_k and go to Step 8.
Step 6. Select a value for the hybridization parameter θ_k in [0, 1].
Step 7. Set u_k = (1 − θ_k) y_k + θ_k s_k and z_k = y_k + [ϑ_k / (s_k^T u_k)] u_k.
Step 8. Compute B_{k+1} by (9) (H_{k+1} by (10)), set k := k + 1 and go to Step 1.
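An end-to-end sketch of the loop of Algorithm 2.1 on a convex quadratic is given below. Several simplifications are assumptions of this sketch rather than the paper's choices: a plain backtracking (Armijo) search instead of the Wolfe-Powell line search, a fixed hybridization parameter θ_k = 0.5 instead of the formula of Section 4, and the simplified safeguard "keep ϑ_k only if s_k^T z_k stays positive" in place of rule (12):

```python
import numpy as np

def modified_bfgs(f, grad, x0, tol=1e-8, theta=0.5, max_iter=500):
    n = len(x0)
    x, H = x0.astype(float), np.eye(n)
    for _ in range(max_iter):
        gk = grad(x)
        if np.linalg.norm(gk) < tol:                    # Step 1
            break
        d = -H @ gk                                     # Step 2
        alpha = 1.0                                     # Step 3 (Armijo only)
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (gk @ d):
            alpha *= 0.5
        s = alpha * d
        x_new = x + s                                   # Step 4
        g_new = grad(x_new)
        y = g_new - gk
        vartheta = 2.0 * (f(x) - f(x_new)) + s @ (gk + g_new)   # Step 5
        u = (1.0 - theta) * y + theta * s               # Steps 6-7
        z = y + (vartheta / (s @ u)) * u if s @ u != 0 else y
        if s @ z <= 1e-12 * (s @ s):                    # simplified safeguard
            z = y
        rho = 1.0 / (s @ z)                             # Step 8: update (10)
        V = np.eye(n) - rho * np.outer(s, z)
        H = V @ H @ V.T + rho * np.outer(s, s)
        x = x_new
    return x

A = np.diag([1.0, 5.0, 25.0])                           # illustrative problem
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = modified_bfgs(f, grad, np.array([1.0, 1.0, 1.0]))
print(np.linalg.norm(x_star))
```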
3  Convergence analysis
In this section, we first show that the proposed modified BFGS method is globally convergent for uniformly convex functions and then, under proper conditions, we prove the local and superlinear convergence of Algorithm 2.1.

3.1  Global convergence analysis
Here, we focus on the global convergence analysis of Algorithm 2.1 for uniformly convex functions, so the following definition is needed.

Definition 3.1. [22] Let S ⊂ R^n be a nonempty convex set and consider the function f : S → R. If there exists a constant μ > 0 such that for any x_1, x_2 ∈ S and for all α ∈ (0, 1) we have

f(αx_1 + (1 − α)x_2) ≤ αf(x_1) + (1 − α)f(x_2) − (1/2) μ α(1 − α) ‖x_1 − x_2‖²,

then f is called a uniformly (or strongly) convex function on S.

Theorem 3.1. [22] Let S ⊂ R^n be a nonempty open convex set and consider the twice continuously differentiable function f : S → R. Then f is uniformly convex if and only if its Hessian matrix is uniformly positive definite at each point of S, i.e., there exists a constant γ > 0 such that

v^T G(x) v ≥ γ ‖v‖²,  ∀ x ∈ S, ∀ v ∈ R^n.  (13)

Here, we make the following basic assumption on the objective function.

Assumption 3.1. The objective function f is twice continuously differentiable and uniformly convex in some neighborhood N of the level set L defined by

L = {x ∈ R^n | f(x) ≤ f(x_0)},  (14)
where x_0 is the starting point of Algorithm 2.1.

Remark 3.1. Consider the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1. Since the search directions {d_k}_{k=0}^∞ generated by Algorithm 2.1 are descent directions, the first of the Wolfe-Powell line search conditions (6) guarantees that {x_k}_{k=0}^∞ ⊂ L ⊂ N. Also, under Assumption 3.1, L is a bounded closed convex set [22].

Lemma 3.1. Under Assumption 3.1, g is Lipschitz continuous on L; i.e., there exists a constant Λ > 0 such that

‖g(x) − g(y)‖ ≤ Λ ‖x − y‖,  ∀ x, y ∈ L.

Proof. From the mean-value theorem, we can write

‖g(x) − g(y)‖ = ‖G(x_ξ)(x − y)‖ ≤ ‖G(x_ξ)‖ ‖x − y‖,  (15)

where x_ξ = ξx + (1 − ξ)y, for some ξ ∈ (0, 1). Since L is a closed set and the Hessian G is a continuous function, there exists a positive constant Λ such that

‖G(x)‖ ≤ Λ,  ∀ x ∈ L.  (16)
Now, because L is a convex set, from (15) and (16) the Lipschitz continuity of g (on L) is concluded.

Lemma 3.2. Under Assumption 3.1, there exists a constant ρ ∈ (0, 1] such that for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

s_k^T u_k ≥ ρ ‖s_k‖ ‖u_k‖.

Proof. From Lemma 3.1, we have

‖u_k‖ = ‖(1 − θ_k) y_k + θ_k s_k‖ ≤ (1 − θ_k) ‖y_k‖ + θ_k ‖s_k‖ ≤ (1 − θ_k) Λ ‖s_k‖ + θ_k ‖s_k‖ = [(1 − θ_k)Λ + θ_k] ‖s_k‖.

Therefore,

‖u_k‖ ≤ M_1 ‖s_k‖,  M_1 = max{Λ, 1}.  (17)

Also, from the mean-value theorem, we can write

y_k = G(x_ς) s_k,  x_ς = ςx_k + (1 − ς)x_{k+1},  (18)

for some ς ∈ (0, 1). Now, from (18), Theorem 3.1 and Inequality (17), we have

s_k^T u_k = (1 − θ_k) s_k^T y_k + θ_k ‖s_k‖² ≥ (1 − θ_k) γ ‖s_k‖² + θ_k ‖s_k‖² = [(1 − θ_k)γ + θ_k] ‖s_k‖² ≥ m_1 ‖s_k‖² ≥ (m_1/M_1) ‖s_k‖ ‖u_k‖,

in which m_1 = min{γ, 1}. Thus, to complete the proof, it is enough to let ρ = m_1/M_1.
Lemma 3.3. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

‖z_k‖ ≤ Γ ‖s_k‖.

Proof. From the mean-value theorem, we can write

|ϑ_k| = |2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1})| = |[−2g(x_ζ) + g_k + g_{k+1}]^T s_k|,

where x_ζ = ζx_k + (1 − ζ)x_{k+1}, for some ζ ∈ (0, 1). Therefore, from Lemma 3.1 we have

|ϑ_k| ≤ [‖g_k − g(x_ζ)‖ + ‖g_{k+1} − g(x_ζ)‖] ‖s_k‖ ≤ [Λ(1 − ζ)‖s_k‖ + Λζ‖s_k‖] ‖s_k‖ = Λ ‖s_k‖².  (19)

Now, from Lemmas 3.1 and 3.2, and inequality (19), we have

‖z_k‖ = ‖y_k + [ϑ_k/(s_k^T u_k)] u_k‖ ≤ ‖y_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≤ Λ‖s_k‖ + (Λ/ρ)‖s_k‖ = (Λ + Λ/ρ)‖s_k‖.

So, to complete the proof it is enough to let Γ = Λ + Λ/ρ.
Lemma 3.4. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

s_k^T z_k ≥ γ ‖s_k‖².

Proof. From Taylor's theorem, we can write

f_k = f(x_{k+1} − s_k) = f_{k+1} − s_k^T g_{k+1} + (1/2) s_k^T G(x_ι) s_k,

where x_ι = ιx_k + (1 − ι)x_{k+1}, for some ι ∈ (0, 1). Therefore, from (13) and Remark 3.1 we have

s_k^T z_k = 2(f_k − f_{k+1} + s_k^T g_{k+1}) = s_k^T G(x_ι) s_k ≥ γ ‖s_k‖².
Theorem 3.2. [22] Let f : R^n → R be continuously differentiable and bounded below, and let ∇f be uniformly continuous on the level set L defined by (14). Then, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0,

where cos ν_k = −s_k^T g_k / (‖g_k‖ ‖s_k‖).

Lemma 3.5. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0,

where

cos ν_k = s_k^T B_k s_k / (‖B_k s_k‖ ‖s_k‖).  (20)

Proof. From Assumption 3.1, f is continuously differentiable, and from Remark 3.1, the level set L is a closed subset of R^n. Therefore, f is bounded below on L. Also, Lemma 3.1 guarantees that g is uniformly continuous on L. So, because the search directions {d_k}_{k=0}^∞ generated by Algorithm 2.1 are descent directions, from Theorem 3.2 the proof is complete.

Now, using the above results, we prove the following global convergence theorem, similarly to the proof of Theorem 5.3.6 in [22].

Theorem 3.3. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, we have

lim inf_{k→∞} ‖g_k‖ = 0.
Proof. By computing the trace and determinant of B_{k+1} in the BFGS update formula (9), we obtain

tr(B_{k+1}) = tr(B_k) − ‖B_k s_k‖² / (s_k^T B_k s_k) + ‖z_k‖² / (s_k^T z_k),

and

det(B_{k+1}) = det(B_k) · (s_k^T z_k) / (s_k^T B_k s_k).  (21)

If we define

m_k = (s_k^T z_k) / ‖s_k‖²,  M_k = ‖z_k‖² / (s_k^T z_k),

then from Lemmas 3.3 and 3.4 we have

m_k ≥ γ,  M_k ≤ Γ²‖s_k‖² / (γ‖s_k‖²) = Γ²/γ =: Γ̂.

Also, if we define cos ν_k by (20) and let

q_k = (s_k^T B_k s_k) / ‖s_k‖²,

then we can write

‖B_k s_k‖² / (s_k^T B_k s_k) = [‖B_k s_k‖² ‖s_k‖² / (s_k^T B_k s_k)²] · [(s_k^T B_k s_k) / ‖s_k‖²] = q_k / cos² ν_k.

In addition, from (21) we have

det(B_{k+1}) = det(B_k) · [(s_k^T z_k)/‖s_k‖²] · [‖s_k‖²/(s_k^T B_k s_k)] = det(B_k) · m_k/q_k.

Now, we introduce the following function on the set of all positive definite matrices X ∈ R^{n×n}:

ψ(X) = tr(X) − ln(det(X)).
It is not difficult to show that ψ(B_k) > 0. By using the above results, we have

ψ(B_{k+1}) = tr(B_k) + M_k − q_k/cos² ν_k − ln(det(B_k)) − ln m_k + ln q_k
           = ψ(B_k) + (M_k − ln m_k − 1) + [1 − q_k/cos² ν_k + ln(q_k/cos² ν_k)] + ln cos² ν_k.  (22)

Note that h(t) = 1 − t + ln t ≤ 0, for all t > 0. Hence, from (22) we have

0 < ψ(B_{k+1}) ≤ ψ(B_0) + c̄(k + 1) + ∑_{j=0}^{k} ln cos² ν_j,  (23)

where the constant c̄ = Γ̂ − ln γ − 1 is assumed to be positive, without loss of generality. From Lemma 3.5 we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0.  (24)

Now, if cos ν_k → 0, then there exists k_1 > 0 such that for all j > k_1 we have ln cos² ν_j < −2c̄. So, using (23), for all k > k_1 large enough, we have

0 < ψ(B_0) + c̄(k + 1) + ∑_{j=0}^{k_1} ln cos² ν_j + ∑_{j=k_1+1}^{k} (−2c̄)
  = ψ(B_0) + ∑_{j=0}^{k_1} ln cos² ν_j + 2c̄ k_1 − c̄(k − 1) < 0,

which gives a contradiction. Therefore, the assumption cos ν_k → 0 is not true. Hence, there exists a subsequence {ν_{j_i}}_{i=1}^∞ such that

cos ν_{j_i} ≥ γ̄ > 0,  ∀ i ≥ 0.  (25)

From (24) and (25) we deduce that lim inf_{k→∞} ‖g_k‖ = 0.
3.2  Local convergence analysis
Here, we study the local convergence properties of the proposed modified BFGS algorithm (Algorithm 2.1) using the approach suggested by Dennis and Moré [4]. As in the convergence analysis of [4], we shall discuss only the local convergence properties of the direct iterations (α_k = 1) of our modified BFGS algorithm. The corresponding convergence properties of the damped version, in which a step length α_k is computed such that the Wolfe-Powell line search conditions are satisfied, can be deduced from the results for the direct iterations [5, 20]. Also, it can be shown that in some neighborhood of the optimal solution, α_k = 1 satisfies the Wolfe-Powell line search conditions (see the final part of the proof of Theorem 3.8 in [18]).

We make the following basic assumptions on the objective function.

Assumption 3.2. The function f is twice continuously differentiable in an open convex set D that contains x*, where x* is a local minimizer of f at which g(x*) = 0 and G(x*) is positive definite.

Assumption 3.3. The Hessian G(x) is Hölder continuous on D, i.e., there are positive constants p and L such that

‖G(x) − G(y)‖ ≤ L ‖x − y‖^p,  ∀ x, y ∈ D.

Remark 3.2. The continuity and positive definiteness of G(x*) imply that there are three positive constants r, m and M such that

m‖v‖² ≤ v^T G(x) v ≤ M‖v‖²,  ∀ v ∈ R^n, ∀ x ∈ N_r(x*) ⊂ D.
Below, we discuss some properties of our hybrid secant equation which, as pointed out in [29, 30], are necessary for proving the local and superlinear convergence of a BFGS-type algorithm.

Lemma 3.6. Under Assumption 3.2, there exists a constant ρ ∈ (0, 1] such that in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

s_k^T u_k ≥ ρ ‖s_k‖ ‖u_k‖.

Proof. Remark 3.2 ensures that the objective function f is uniformly convex on N_r(x*). So, the proof is similar to the proof of Lemma 3.2 and hence is omitted.

Lemma 3.7. Under Assumption 3.3, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

|ϑ_k| ≤ L ‖s_k‖² σ(x_k, x_{k+1}),

in which σ(x_k, x_{k+1}) = max{‖x_{k+1} − x*‖^p, ‖x_k − x*‖^p}.

Proof. From Taylor's theorem, we can write

f_k − f_{k+1} + g_{k+1}^T s_k = (1/2) s_k^T G(x_ξ) s_k,
f_k − f_{k+1} + g_k^T s_k = −(1/2) s_k^T G(x_η) s_k,

where

x_ξ = ξx_k + (1 − ξ)x_{k+1},  x_η = ηx_k + (1 − η)x_{k+1},

for some ξ, η ∈ (0, 1). Therefore,

2(f_k − f_{k+1}) + (g_k + g_{k+1})^T s_k = (1/2) s_k^T [G(x_ξ) − G(x_η)] s_k.  (26)

Now, from (12), (26) and Assumption 3.3 we have

|ϑ_k| ≤ (1/2) |s_k^T [G(x_ξ) − G(x_η)] s_k|
     ≤ (1/2) ‖s_k‖² (‖G(x_ξ) − G(x*)‖ + ‖G(x*) − G(x_η)‖)
     ≤ (1/2) L ‖s_k‖² (‖x_ξ − x*‖^p + ‖x* − x_η‖^p)
     ≤ L ‖s_k‖² σ(x_k, x_{k+1}).

Lemma 3.8. Under Assumptions 3.2 and 3.3, in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

‖z_k − G(x*) s_k‖ ≤ c σ(x_k, x_{k+1}) ‖s_k‖,

where c is a positive constant.

Proof. Consider the neighborhood N_r(x*) introduced in Remark 3.2. From Lemmas 3.6 and 3.7, Taylor's theorem stated by (18) and Assumption 3.3, we have

‖z_k − G(x*) s_k‖ ≤ ‖y_k − G(x*) s_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖
                 ≤ ‖G(x_ς) − G(x*)‖ ‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖
                 ≤ L ‖x_ς − x*‖^p ‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖
                 ≤ L(1 + 1/ρ) σ(x_k, x_{k+1}) ‖s_k‖.

Therefore, to complete the proof it is enough to let c = L(1 + 1/ρ).

Lemma 3.9. Under Assumptions 3.2 and 3.3, there exists a positive constant β such that in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

(1/β) ‖s_k‖ ≤ ‖z_k‖ ≤ β ‖s_k‖.

Proof. Consider the neighborhood N_r(x*) introduced in Remark 3.2. From Lemmas 3.6 and 3.7, Taylor's theorem stated by (18) and Remark 3.2, we have

‖z_k‖ ≤ ‖y_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≤ M‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖ = [M + (L/ρ) σ(x_k, x_{k+1})] ‖s_k‖,  (27)

and similarly,

‖z_k‖ ≥ ‖y_k‖ − [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≥ m‖s_k‖ − (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖ = [m − (L/ρ) σ(x_k, x_{k+1})] ‖s_k‖.  (28)

Now, we choose the positive constant ε′, as the radius of the neighborhood N_{ε′}(x*), such that

ε′ ∈ (0, min{r, (ρm/L)^{1/p}}).  (29)

Therefore, from (27) and (28), to complete the proof, it is enough to let

β = max{M + Lε′^p/ρ, [m − Lε′^p/ρ]^{-1}}.
Remark 3.3. As pointed out in [29], using Lemmas 3.8 and 3.9, the local and superlinear convergence of Algorithm 2.1 with the DFP update formula [19] in Step 8 can be proved.

Lemma 3.10. If for some t ∈ (0, 1) we have

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,  (30)

and {B_k}_{k=0}^∞ and {H_k}_{k=0}^∞ are bounded, then

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖z_k‖ = 0  (31)

ensures that the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 converges superlinearly to x*.

Proof. Since H_k = B_k^{-1} and α_k = 1, we have x_{k+1} = x_k − H_k g_k. So,

(H_k − G(x*)^{-1}) z_k = H_k g_{k+1} + s_k + [ϑ_k/(s_k^T u_k)] H_k u_k − G(x*)^{-1} z_k

⇒ H_k g_{k+1} = (H_k − G(x*)^{-1}) z_k − [ϑ_k/(s_k^T u_k)] H_k u_k + G(x*)^{-1} (z_k − G(x*) s_k).  (32)
Suppose that M_B and M_H are the supremums of {‖B_k‖}_{k=0}^∞ and {‖H_k‖}_{k=0}^∞, respectively. Thus, from (32) we have

‖g_{k+1}‖ ≤ M_B [ ‖(H_k − G(x*)^{-1}) z_k‖ + M_H (|ϑ_k|/|s_k^T u_k|) ‖u_k‖ + ‖G(x*)^{-1}‖ ‖z_k − G(x*) s_k‖ ].  (33)

Property (30) guarantees that there exists an index K such that {x_k}_{k=K}^∞ ⊂ N_{ε′}(x*), with ε′ defined by (29). So, from Lemma 3.9, we know that

‖s_k‖ ≥ (1/β) ‖z_k‖.  (34)

Considering (31) and (34), we have

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖s_k‖ = 0.  (35)

From Lemma 3.8 and property (30), we have

‖z_k − G(x*) s_k‖ / ‖s_k‖ ≤ c σ(x_k, x_{k+1})  ⇒  lim_{k→∞} ‖z_k − G(x*) s_k‖ / ‖s_k‖ = 0.  (36)

Also, from Lemmas 3.6 and 3.7, we have

[|ϑ_k|/|s_k^T u_k|] (‖u_k‖/‖s_k‖) ≤ |ϑ_k| / (ρ ‖s_k‖²) ≤ (L/ρ) σ(x_k, x_{k+1}),

which, by considering property (30), ensures that

lim_{k→∞} [|ϑ_k|/|s_k^T u_k|] (‖u_k‖/‖s_k‖) = 0.  (37)

From (33), (35), (36) and (37), we obtain the following important result:

lim_{k→∞} ‖g_{k+1}‖ / ‖s_k‖ = 0.  (38)

Now, from Taylor's theorem and Remark 3.2, we can write

‖g_{k+1}‖ = ‖g(x_{k+1}) − g(x*)‖ = ‖G(x_ξ)(x_{k+1} − x*)‖ ≥ m ‖x_{k+1} − x*‖,  (39)

where x_ξ = ξx* + (1 − ξ)x_{k+1}, for some ξ ∈ (0, 1). On the other hand, from property (30), we have

‖s_k‖ ≤ ‖x_k − x*‖ + ‖x_{k+1} − x*‖ ≤ 2 ‖x_k − x*‖.  (40)

Finally, from (38)-(40), we have

lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 0,

which guarantees the superlinear convergence.

As pointed out in [29, 30], following carefully the proof of the superlinear convergence theorem in [4], we find that in most places, if we substitute y_k by z_k, the proof of [4] remains true trivially, and to complete the proof for our method, only the results stated in Lemmas 3.8-3.10 are necessary. Here, we use these lemmas and some results in [4] to present a compact argument for the local and superlinear convergence analysis of Algorithm 2.1.

Theorem 3.4. Suppose that Assumptions 3.2 and 3.3 hold, and that there exist nonnegative constants a_1 and a_2 such that for the sequences {x_k}_{k=0}^∞ and {H_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, the following inequality holds:

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [1 + a_1 σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E + a_2 σ(x_k, x_{k+1}),
where E ∈ R^{n×n} is an arbitrary nonsingular symmetric matrix. Then, for each t ∈ (0, 1), there exist positive constants ε̄ and δ̄ such that, for x_0 ∈ R^n satisfying ‖x_0 − x*‖ < ε̄ and a positive definite matrix H_0 satisfying ‖H_0 − G(x*)^{-1}‖_E < δ̄, the sequence {x_k}_{k=0}^∞ converges to x*. Furthermore,

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,

and {H_k}_{k=0}^∞ and {B_k}_{k=0}^∞ are uniformly bounded.

Proof. The proof is similar to the proof of Theorem 3.4 in [4] and hence is omitted.

Corollary 3.1. Under the assumptions of Theorem 3.4, if some subsequence of {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ converges to zero, then {x_k}_{k=0}^∞ converges superlinearly to x*.

Proof. The proof is similar to the proof of Corollary 3.3 in [4] and hence is omitted.
Lemma 3.11. Let E ∈ R^{n×n} be a nonsingular symmetric matrix satisfying the following inequality,

‖Es − E^{-1}z‖ ≤ b ‖E^{-1}z‖,

for some b ∈ [0, 1/3] and vectors s and z in R^n with z ≠ 0. If for each symmetric matrix H ∈ R^{n×n} the matrix H̄ ∈ R^{n×n} is defined by

H̄ = [I − (s z^T)/(s^T z)] H [I − (z s^T)/(s^T z)] + (s s^T)/(s^T z),

then for any symmetric matrix A ∈ R^{n×n} the following inequality holds:

‖H̄ − A‖_E ≤ [√(1 − a_3 η²) + (5/(2(1 − b))) ‖Es − E^{-1}z‖/‖E^{-1}z‖] ‖H − A‖_E + 2(1 + 2√n) ‖E‖_F ‖s − Az‖/‖E^{-1}z‖,

where

a_3 = (1 − 2b)/(1 − b²) ∈ [3/8, 1],  η = ‖E(H − A)z‖ / (‖H − A‖_E ‖E^{-1}z‖).

Proof. The proof is similar to the proof of Lemma 5.2 in [4] and hence is omitted.
Lemma 3.12. Suppose that Assumptions 3.2 and 3.3 hold, and let E ∈ R^{n×n} be a symmetric matrix such that E² = G(x*). Then, in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, the following inequality holds:

‖Es_k − E^{-1}z_k‖ / ‖E^{-1}z_k‖ ≤ b,

for some constant b ∈ [0, 1/3].

Proof. From Lemma 3.8, we have

‖Es_k − E^{-1}z_k‖ = ‖E^{-1}(E²s_k − z_k)‖ ≤ ‖E^{-1}‖ ‖z_k − G(x*)s_k‖ ≤ c ‖E^{-1}‖ σ(x_k, x_{k+1}) ‖s_k‖.

On the other hand, from Remark 3.2 and Lemma 3.9, we have

‖E^{-1}z_k‖² = z_k^T E^{-2} z_k = z_k^T G(x*)^{-1} z_k ≥ (1/M) ‖z_k‖² ≥ (1/(β²M)) ‖s_k‖².

Therefore,

‖Es_k − E^{-1}z_k‖ / ‖E^{-1}z_k‖ ≤ c β√M ‖E^{-1}‖ σ(x_k, x_{k+1}).  (41)

So, we choose the positive constant ε, as the radius of the neighborhood N_ε(x*), such that

ε ∈ (0, min{ε′, (1/(3 c β√M ‖E^{-1}‖))^{1/p}}),
where ε′ is defined by (29). Therefore, from (41), to complete the proof, it is enough to let

b = c β√M ‖E^{-1}‖ ε^p.
Now, using the above results, we can prove the following local and superlinear convergence theorem for Algorithm 2.1.

Theorem 3.5. Suppose that Assumptions 3.2 and 3.3 hold. If x_0 ∈ D and H_0 (B_0) are sufficiently close to x* and G(x*)^{-1} (G(x*)), respectively, then the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, converges superlinearly to x*.

Proof. In Lemma 3.11, we let A = G(x*)^{-1}, s = s_k, z = z_k, H = H_k and H̄ = H_{k+1}, and let E ∈ R^{n×n} be a nonsingular symmetric matrix such that E² = G(x*). So, from Lemma 3.12, the assumptions of Lemma 3.11 hold and hence we have

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [√(1 − a_3 η_k²) + (5/(2(1 − b))) ‖Es_k − E^{-1}z_k‖/‖E^{-1}z_k‖] ‖H_k − G(x*)^{-1}‖_E
                         + 2(1 + 2√n) ‖E‖_F ‖s_k − G(x*)^{-1}z_k‖/‖E^{-1}z_k‖,

where

a_3 = (1 − 2b)/(1 − b²) ∈ [3/8, 1],  η_k = ‖E(H_k − G(x*)^{-1})z_k‖ / (‖H_k − G(x*)^{-1}‖_E ‖E^{-1}z_k‖).

Now, from Inequality (41), we can write

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [√(1 − a_3 η_k²) + (5/(2(1 − b))) c β√M ‖E^{-1}‖ σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E
                         + 2(1 + 2√n) ‖E‖_F c ‖E^{-1}‖² β√M σ(x_k, x_{k+1})
                       ≤ [√(1 − (3/8) η_k²) + a_1 σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E + a_2 σ(x_k, x_{k+1}),  (42)

where

a_1 = (5/(2(1 − b))) c β√M ‖E^{-1}‖,  a_2 = 2(1 + 2√n) ‖E‖_F c ‖E^{-1}‖² β√M.
Thus, the assumptions of Theorem 3.4 hold, and therefore the sequence {x_k}_{k=0}^∞ converges to x* locally. Furthermore, for each t ∈ (0, 1) we have

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,  (43)

and {H_k}_{k=0}^∞ and {B_k}_{k=0}^∞ are uniformly bounded.

To prove the superlinear convergence, we note that if there is a subsequence {H_{k_l}}_{l=0}^∞ which converges to G(x*)^{-1}, then Corollary 3.1 gives the desired conclusion; otherwise, {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ is bounded away from zero. Now, since √(1 − t) ≤ 1 − t/2, from inequality (42) we can write

‖H_k − G(x*)^{-1}‖_E − ‖H_{k+1} − G(x*)^{-1}‖_E ≥ (3/16) η_k² ‖H_k − G(x*)^{-1}‖_E − σ(x_k, x_{k+1}) [a_1 ‖H_k − G(x*)^{-1}‖_E + a_2].  (44)

On the other hand, from (43) we have

σ(x_k, x_{k+1}) = ‖x_k − x*‖^p ≤ t^{kp} ‖x_0 − x*‖^p.  (45)

So, (44) and (45) ensure that

∑_{k=0}^{∞} η_k² ‖H_k − G(x*)^{-1}‖_E < ∞.
Since {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ is bounded away from zero, we must have

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖z_k‖ = 0.  (46)

Now, (43) and (46) show that the assumptions of Lemma 3.10 hold. Therefore, {x_k}_{k=0}^∞ converges superlinearly to x*.
4  On the choice of the hybridization parameter θ_k
Here, by using the information of recent iterations, we propose an effective formula for computing the hybridization parameter θ_k. By multiplying both sides of Equation (5) by s_{k−1}^T, we have

s_{k−1}^T B_{k+1} s_k = s_{k−1}^T z_k.  (47)

Now, to obtain a better approximation of G_{k+1}, it is suitable for B_{k+1} to satisfy the following equation,

B_{k+1} s_{k−1} = z_{k−1},  (48)

which contains the information obtained from more than one previous iteration. It is remarkable that near the optimal solution, because of the approximately quadratic behavior of the objective function, Equation (48) approximately holds. From (47) and (48), we find that for computing θ_k the following equation, encompassing Equation (48), should be solved:

z_{k−1}^T s_k = s_{k−1}^T z_k.

After some algebraic manipulations, θ_k is determined as follows:

θ_k = (w_k^T y_k) / (w_k^T (y_k − s_k)),  (49)

in which w_k = s_{k−1} − γ_k s_k, with

γ_k = (s_k^T z_{k−1} − s_{k−1}^T y_k) / ϑ_k,
where ϑk should be nonzero. Here, if θk < 0, then we set θk = 0 and also, if θk > 1, then we set θk = 1. Thus, an interesting feature of the proposed hybrid secant equation is its adoptive applying of the modified secant equations proposed in [23] and [27]. It is remarkable that if (wkT yk )(wkT sk ) < 0, then we have θk ∈ [0, 1). Here, we suggest a procedure for computing the hybridization parameter θk . Algorithm 4.1. A procedure for computing the hybridization parameter θk in the modified secant equation (5). Step 1.
If k = 0 then select an initial value for θk ∈ [0, 1].
Step 2.
Compute θk by (49).
Step 3.
If θk < 0 then set θk = 0.
Step 4.
If θk > 1 then set θk = 1.
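Algorithm 4.1 amounts to a few lines of code. The snippet below is a hypothetical Python illustration (the paper's implementation is in MATLAB); the quantities ϑ_k and z_{k−1}, which are defined earlier in the paper, are taken as inputs supplied by the caller.

```python
import numpy as np

def hybridization_parameter(s_k, s_km1, y_k, z_km1, vartheta_k, theta0=0.5):
    # A sketch of Algorithm 4.1 (not the paper's MATLAB code):
    # compute theta_k by Equation (49), then project it onto [0, 1].
    # vartheta_k and z_{k-1} come from the modified secant equation and
    # are assumed to be supplied by the caller; vartheta_k must be nonzero.
    gamma_k = (s_k @ z_km1 - s_km1 @ y_k) / vartheta_k   # gamma_k in (49)
    w_k = s_km1 - gamma_k * s_k                          # w_k = s_{k-1} - gamma_k s_k
    denom = w_k @ (y_k - s_k)
    if denom == 0.0:                                     # (49) undefined; keep a default
        return theta0
    theta_k = (w_k @ y_k) / denom                        # Equation (49)
    return min(max(theta_k, 0.0), 1.0)                   # Steps 3 and 4
```

For k = 0, Step 1 simply uses the chosen initial value; the experiments in Section 5 use θ0 = 0.5.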
It is possible to propose other formulae for the hybridization parameter θk by using proper equations instead of Equation (48); for example,

B_{k+1} s_{k−1} = y_{k−1}.   (50)

However, a merit of Equation (48), which guarantees the satisfaction of the modified secant equation (5) at the (k − 1)-th iteration, is that it takes into account both the available function and gradient information. So, here we use Equation (48) to compute the hybridization parameter θk.
5  Numerical results
We present some numerical results obtained by applying a MATLAB 7.7.0.471 (R2008b) implementation of Algorithm 2.1 and two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] to 35 unconstrained optimization test problems of the CUTEr collection [15], with dimensions ranging from n = 2 to n = 50 as specified in Table 1. More exactly, our results are reported for the following three algorithms:
1) M1: Algorithm 2.1 with θk computed by Algorithm 4.1.
2) M2: Algorithm 2.1 with θk = 0, ∀ k ≥ 0 (or equivalently, uk = yk).
3) M3: Algorithm 2.1 with θk = 1, ∀ k ≥ 0 (or equivalently, uk = sk).

Table 1  Numerical results, reported as (Nf, Ng, efe)

Functions | n    | M1                     | M2                      | M3
AIRCRFTB  | 8    | (1273,802,961.13)      | (1028,615,743.50)       | (1240,779,934.00)
ALLINITU  | 4    | (64,37,53.00)          | (76,44,63.00)           | (57,32,46.25)
BEALE     | 2    | (56,34,62.00)          | (59,35,64.50)           | (96,60,108.00)
BIGGS3    | 6    | (2134,1263,1618.67)    | (1814,1135,1437.33)     | (2141,1323,1679.83)
BIGGS5    | 6    | (2134,1263,1618.67)    | (1814,1135,1437.33)     | (2141,1323,1679.83)
BIGGS6    | 6    | (355,231,290.17)       | (4863,2900,3710.50)     | (4215,2506,3208.50)
BOX2      | 3    | (43,33,47.33)          | (80,53,79.67)           | (85,56,84.33)
BOX3      | 3    | (43,33,47.33)          | (80,53,79.67)           | (85,56,84.33)
BRKMCC    | 2    | (17,10,18.50)          | (17,10,18.50)           | (17,10,18.50)
CHNROSNB  | 50   | (9139,5881,6063.78)    | (10569,6813,7024.38)    | (9653,6149,6342.06)
CUBE      | 2    | (895,573,1020.50)      | (1239,797,1416.50)      | (2173,1406,2492.50)
DENSCHNA  | 2    | (16,11,19.00)          | (21,14,24.50)           | (24,16,28.00)
DENSCHNB  | 2    | (20,11,21.00)          | (20,11,21.00)           | (21,12,22.50)
DENSCHNF  | 2    | (542,346,617.00)       | (541,345,615.50)        | (558,355,634.00)
DIXMAANK  | 15   | (161,148,158.73)       | (173,160,171.53)        | (195,182,195.00)
ENGVAL2   | 3    | (4001,2494,3827.67)    | (6959,4310,6629.67)     | (7349,4739,7188.67)
GROWTHLS  | 3    | (14,2,6.67)            | (14,2,6.67)             | (14,2,6.67)
HAIRY     | 2    | (50,12,37.00)          | (94,20,67.00)           | (69,16,50.50)
HELIX     | 3    | (723,448,689.00)       | (832,520,797.33)        | (733,443,687.33)
HILBERTA  | 2    | (13,13,19.50)          | (13,13,19.50)           | (13,13,19.50)
HILBERTB  | 10   | (9,5,5.90)             | (9,5,5.90)              | (9,5,5.90)
HIMMELBG  | 2    | (20,12,22.00)          | (20,12,22.00)           | (18,10,19.00)
HIMMELBH  | 2    | (16,11,19.00)          | (14,8,15.00)            | (15,10,17.50)
JENSMP    | 2    | (11,2,7.50)            | (11,2,7.50)             | (11,2,7.50)
KOWOSB    | 4    | (282,185,255.50)       | (325,209,290.25)        | (201,138,188.25)
LOGHAIRY  | 2    | (2305,2120,3272.50)    | (3888,3532,5476.00)     | (5040,4686,7206.00)
PALMER5C  | 6    | (13,7,9.17)            | (13,7,9.17)             | (13,7,9.17)
ROSENBR   | 2    | (1845,1186,2108.50)    | (1541,990,1760.50)      | (1720,1115,1975.00)
S308      | 2    | (178,110,199.00)       | (173,106,192.50)        | (181,111,201.50)
SINEVAL   | 2    | (1045,664,1186.50)     | (1345,861,1533.50)      | (1239,788,1407.50)
SISSER    | 2    | (22,21,32.00)          | (28,26,40.00)           | (22,21,32.00)
SNAIL     | 2    | (33,26,42.50)          | (22,13,24.00)           | (21,12,22.50)
VAREIGVL  | 50   | (61,31,32.22)          | (61,31,32.22)           | (63,32,33.26)
YFITU     | 3    | (9795,6403,9668.00)    | (10402,6818,10285.33)   | Failed
ZANGWIL2  | 2    | (3,3,4.50)             | (3,3,4.50)              | (3,3,4.50)
Average   | 6.26 | (809.88,530.23,717.45) | (1110.56,729.12,995.31) | (1159.85,777.00,1077.64)
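The efe column of Table 1 follows the rule that n function evaluations count as one gradient evaluation, i.e. efe = Ng + Nf/n; this formula is inferred from the reported (Nf, Ng, efe) triples themselves. A small illustrative check:

```python
def efe(nf, ng, n):
    # Effective function evaluations: n function evaluations are
    # counted as one gradient evaluation, so efe = Ng + Nf / n.
    # (Formula inferred from the (Nf, Ng, efe) triples in Table 1.)
    return ng + nf / n

# ALLINITU (n = 4) under M1 is reported as (64, 37, 53.00):
print(efe(64, 37, 4))       # 53.0
# AIRCRFTB (n = 8) under M1 is reported as (1273, 802, 961.13):
print(efe(1273, 802, 8))    # 961.125, which rounds to 961.13
```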
All of the runs have been done on a mobile computer with an Intel(R) Core(TM)2 Duo 2.00 GHz CPU and 1 GB of RAM. In the implementation of Algorithm 2.1, we let B0 = I, τ = 10^−6, ε = 10^−18 in (12), δ = 10^−4 and σ = 0.9 in (6). Also, all attempts to solve the problems were limited to a maximum of 10000 iterations. To make a proper choice for θ0 in Step 1 of Algorithm 4.1, numerical experiments have been done with different values of θ0 ∈ {0.05j}, j = 0, 1, . . . , 20. Based on the observations, we let θ0 = 0.5 because of its proper results. Table 1 shows the numerical results, given in the form (Nf, Ng, efe), where Nf and Ng respectively denote the numbers of function and gradient evaluations, and

efe = Ng + Nf/n.

Table 2  CPU time (in seconds)

Functions | n    | M1       | M2       | M3
AIRCRFTB  | 8    | 2.12E−01 | 1.88E−01 | 2.21E−01
ALLINITU  | 4    | 2.80E−02 | 2.74E−02 | 2.66E−02
BEALE     | 2    | 8.89E−03 | 9.48E−03 | 1.29E−02
BIGGS3    | 6    | 4.08E−01 | 3.39E−01 | 3.91E−01
BIGGS5    | 6    | 4.01E−01 | 3.27E−01 | 3.75E−01
BIGGS6    | 6    | 7.78E−02 | 8.64E−01 | 6.83E−01
BOX2      | 3    | 9.94E−03 | 1.42E−02 | 1.67E−02
BOX3      | 3    | 9.44E−03 | 1.45E−02 | 1.69E−02
BRKMCC    | 2    | 4.13E−03 | 3.55E−03 | 4.06E−03
CHNROSNB  | 50   | 1.00E+00 | 1.15E+00 | 1.37E+00
CUBE      | 2    | 1.50E−01 | 1.82E−01 | 2.61E−01
DENSCHNA  | 2    | 1.38E−02 | 1.76E−02 | 3.93E−02
DENSCHNB  | 2    | 4.42E−03 | 3.94E−03 | 9.08E−03
DENSCHNF  | 2    | 7.65E−02 | 5.94E−02 | 7.11E−02
DIXMAANK  | 15   | 6.43E−02 | 6.15E−02 | 5.67E−02
ENGVAL2   | 3    | 4.67E−01 | 8.29E−01 | 7.68E−01
GROWTHLS  | 3    | 2.12E−03 | 6.27E−03 | 1.82E−03
HAIRY     | 2    | 2.58E−02 | 2.73E−02 | 8.62E−03
HELIX     | 3    | 9.93E−02 | 7.10E−02 | 6.81E−02
HILBERTA  | 2    | 2.54E−02 | 3.81E−03 | 3.99E−03
HILBERTB  | 10   | 3.33E−02 | 4.58E−02 | 4.78E−02
HIMMELBG  | 2    | 1.00E−02 | 1.47E−02 | 1.32E−02
HIMMELBH  | 2    | 5.56E−03 | 3.06E−03 | 4.55E−02
JENSMP    | 2    | 6.51E−03 | 1.38E−03 | 1.18E−02
KOWOSB    | 4    | 8.24E−02 | 5.80E−02 | 2.72E−02
LOGHAIRY  | 2    | 4.39E−01 | 6.57E−01 | 8.46E−01
PALMER5C  | 6    | 3.08E−03 | 2.55E−03 | 2.88E−03
ROSENBR   | 2    | 2.11E−01 | 1.16E−01 | 1.24E−01
S308      | 2    | 2.64E−02 | 2.66E−02 | 2.50E−02
SINEVAL   | 2    | 1.21E−01 | 1.60E−01 | 1.50E−01
SISSER    | 2    | 6.66E−03 | 4.67E−02 | 2.33E−02
SNAIL     | 2    | 5.78E−03 | 1.36E−02 | 1.06E−02
VAREIGVL  | 50   | 1.55E−02 | 1.90E−02 | 1.92E−02
YFITU     | 3    | 1.05E+00 | 1.09E+00 | Failed
ZANGWIL2  | 2    | 6.83E−03 | 1.27E−03 | 6.66E−02
Average   | 6.26 | 1.19E−01 | 1.58E−01 | 1.71E−01
[Figure 1  Effective function evaluations performance profiles: p(ω) versus ω for M1, M2 and M3]

[Figure 2  CPU time performance profiles: p(ω) versus ω for M1, M2 and M3]
Here, efe (effective function evaluations) is a criterion for comparing the performance of the algorithms; note that we count n function evaluations as one gradient evaluation to account for the number of components of the gradient. Also, for each numerical result in the form (Nf, Ng, efe), the related execution time (in seconds) is shown in Table 2. In our numerical experiments, M1 and M2 successfully solved all the problems, while M3 failed in solving 'YFITU'. The averages of the numerical results and the execution times (see the last rows of Tables 1 and 2), computed without considering the results of 'YFITU', show that M1 (our modified BFGS algorithm) is more efficient than M2 and M3. Efficiency comparisons were also made using the performance profile proposed by Dolan and Moré [9]. The performance profile gives, for every ω ≥ 1, the proportion p(ω) of the test problems on which each considered algorithmic variant has a performance within a factor ω of the best. As shown in Figures 1 and 2, M1 is more efficient than the other algorithms, demonstrating the effectiveness of the proposed hybrid secant equation. Figure 2 shows that the time spent computing the hybridization parameter θk causes a slight decrease in the performance of M1 from the perspective of CPU time. Numerical comparisons also show that M2 is more efficient than M3.
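The Dolan–Moré profile described above can be computed directly from a table of per-problem costs. The following is an illustrative sketch under our own naming and failure conventions, not the software of [9]:

```python
import numpy as np

def performance_profile(costs, omegas):
    # Dolan-More performance profile: for each solver s, p_s(omega) is
    # the fraction of problems whose performance ratio
    # r_{p,s} = cost_{p,s} / min_s cost_{p,s} is at most omega.
    # A failure can be marked with np.inf (illustrative convention).
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)     # best cost on each problem
    ratios = costs / best                       # performance ratios r_{p,s}
    return np.array([(ratios <= w).mean(axis=0) for w in omegas])

# Three problems, two solvers (the costs could be efe or CPU time):
costs = [[1.0, 2.0],
         [2.0, 2.0],
         [4.0, 1.0]]
print(performance_profile(costs, [1.0, 2.0, 4.0]))
```

Each row of the result corresponds to one value of ω; plotting the columns against ω yields curves like those in Figures 1 and 2, where a higher curve indicates a more efficient solver.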
6  Conclusions

By making a convex combination of the modified secant equations proposed by Yuan [27] and Wei et al. [23], a hybrid secant equation and a modified BFGS algorithm have been proposed. The hybridization parameter has been effectively computed using the available information of recent iterations. Under proper conditions, it has been shown that the proposed algorithm is globally, locally and superlinearly convergent. Using the performance profile introduced by Dolan and Moré, the implementation of the proposed algorithm has been compared with two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] on a set of unconstrained optimization test problems from the CUTEr collection. As the numerical comparisons showed, our modified BFGS algorithm outperforms the modified BFGS algorithms proposed in [23, 27], which demonstrates the effectiveness of the proposed hybrid secant equation.

Acknowledgements  The author is grateful to the anonymous referees for their valuable comments and suggestions, which helped to improve the quality of this work. He also thanks the Research Council of Semnan University for its support.
References

1. Amini K, Ghorbani Rizi A. A new structured quasi-Newton algorithm using partial information on Hessian. J Comput Appl Math, 2010, 234: 805–811
2. Babaie-Kafaki S, Ghanbari R, Mahdavi-Amiri N. Two new conjugate gradient methods based on modified secant equations. J Comput Appl Math, 2010, 234: 1374–1386
3. Broyden C G. The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J Inst Math Appl, 1970, 6: 222–231
4. Broyden C G, Dennis Jr J E, Moré J J. On the local and superlinear convergence of quasi-Newton methods. J Inst Math Appl, 1973, 12: 223–245
5. Byrd R H, Nocedal J, Yuan Y X. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J Numer Anal, 1987, 24: 1171–1190
6. Chen L H, Deng N Y, Zhang J Z. A modified quasi-Newton method for structured optimization with partial information on the Hessian. Comput Optim Appl, 2006, 35: 5–18
7. Davidon W C. Conic approximations and collinear scalings for optimizers. SIAM J Numer Anal, 1980, 17: 268–281
8. Dennis Jr J E, Martínez H J, Tapia R A. Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. J Optim Theory Appl, 1989, 61: 161–178
9. Dolan E D, Moré J J. Benchmarking optimization software with performance profiles. Math Program, 2002, 91: 201–213
10. Fletcher R. A new approach to variable metric algorithms. Comput J, 1970, 13: 317–322
11. Ford J A. A survey of multi-step quasi-Newton methods. In: Proceedings of the International Conference on Scientific Computations, Beirut, Lebanon, 1999
12. Ford J A, Moghrabi I A. Multi-step quasi-Newton methods for optimization. J Comput Appl Math, 1994, 50: 305–323
13. Ford J A, Moghrabi I A. Using function-values in multi-step quasi-Newton methods. J Comput Appl Math, 1996, 66: 201–211
14. Goldfarb D. A family of variable-metric methods derived by variational means. Math Comp, 1970, 24: 23–26
15. Gould N I M, Orban D, Toint P L. CUTEr: a constrained and unconstrained testing environment, revisited. ACM Trans Math Softw, 2003, 29: 373–394
16. Guo Q, Liu J G, Wang D H. A modified BFGS method and its superlinear convergence in nonconvex minimization with general line search rule. J Appl Math Comput, 2008, 28: 435–446
17. Huang H Y. Unified approach to quadratically convergent algorithms for function minimization. J Optim Theory Appl, 1970, 5: 405–423
18. Li D H, Fukushima M. A modified BFGS method and its global convergence in nonconvex minimization. J Comput Appl Math, 2001, 129: 15–35
19. Nocedal J, Wright S J. Numerical Optimization, 2nd ed. New York: Springer, 2006
20. Powell M J D. Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In: Cottle R W, Lemke C E, eds. Nonlinear Programming, SIAM-AMS Proceedings, 1976, 53–72
21. Shanno D F. Conditioning of quasi-Newton methods for function minimization. Math Comp, 1970, 24: 647–656
22. Sun W, Yuan Y X. Optimization Theory and Methods: Nonlinear Programming. New York: Springer, 2006
23. Wei Z, Li G, Qi L. New quasi-Newton methods for unconstrained optimization problems. Appl Math Comput, 2006, 175: 1156–1188
24. Wei Z, Yu G, Yuan G, et al. The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput Optim Appl, 2004, 29: 315–332
25. Yabe H, Ogasawara H, Yoshino M. Local and superlinear convergence of quasi-Newton methods based on modified secant conditions. J Comput Appl Math, 2007, 205: 617–632
26. Yabe H, Takano M. Global convergence properties of nonlinear conjugate gradient methods with modified secant condition. Comput Optim Appl, 2004, 28: 203–225
27. Yuan Y X. A modified BFGS algorithm for unconstrained optimization. IMA J Numer Anal, 1991, 11: 325–332
28. Yuan Y X, Byrd R H. Non-quasi-Newton updates for unconstrained optimization. J Comput Math, 1995, 13: 95–107
29. Zhang J, Xu C. Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations. J Comput Appl Math, 2001, 137: 269–278
30. Zhang J Z, Deng N Y, Chen L H. New quasi-Newton equation and related methods for unconstrained optimization. J Optim Theory Appl, 1999, 102: 147–167