SCIENCE CHINA Mathematics
. ARTICLES .
September 2011 Vol. 54 No. 9: 2019–2036 doi: 10.1007/s11425-011-4232-7
A modified BFGS algorithm based on a hybrid secant equation

BABAIE-KAFAKI Saman

Department of Mathematics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, Semnan 35196-45399, Iran
Email: [email protected], [email protected]

Received August 10, 2010; accepted February 13, 2011; published online June 29, 2011
Abstract  By making a convex combination of the modified secant equations proposed by Yuan and by Wei et al., a hybrid secant equation and, based on it, a modified BFGS algorithm are proposed. The hybridization parameter is effectively computed using the available information of recent iterations. Under proper conditions, it is shown that the proposed algorithm is globally, locally and superlinearly convergent. By using the performance profile introduced by Dolan and Moré, a comparison between the implementations of the proposed algorithm and two efficient modified BFGS algorithms proposed by Yuan and by Wei et al. is made on a set of unconstrained optimization test problems from the CUTEr collection. Numerical results demonstrating the efficiency of the proposed modified BFGS algorithm are reported.

Keywords  unconstrained optimization, hybrid secant equation, BFGS update, global convergence, local and superlinear convergence

MSC(2000): 90C53, 49M37, 65K05
Citation: Babaie-Kafaki S. A modified BFGS algorithm based on a hybrid secant equation. Sci China Math, 2011, 54(9): 2019–2036, doi: 10.1007/s11425-011-4232-7
1  Introduction
Secant (quasi-Newton) equations play an essential role in approximating the Hessian matrix (the matrix of second-order partial derivatives) of the objective function at each iteration of quasi-Newton methods, which are efficient iterative methods for solving unconstrained optimization problems. A general quasi-Newton method is designed to solve the following unconstrained optimization problem,

min_{x ∈ R^n} f(x),  (1)

where f : R^n → R is a smooth nonlinear function.

Notations. For a sufficiently smooth function f, we consider the following notations:

g(x) = ∇f(x),  G(x) = ∇²f(x),

and, at the k-th iteration,

f_k = f(x_k),  g_k = g(x_k),  G_k = G(x_k),  y_k = g_{k+1} − g_k.

Here, ‖·‖ stands for the Euclidean norm and, for a nonsingular symmetric matrix E ∈ R^{n×n}, ‖·‖_E is the matrix norm defined by ‖A‖_E = ‖EAE‖_F for an arbitrary matrix A ∈ R^{n×n}. Also, the neighborhood N_r̄(x̄) of an arbitrary point x̄ ∈ R^n with radius r̄ > 0 is defined by
N_r̄(x̄) = {x ∈ R^n : ‖x − x̄‖ < r̄}.

The iterative formula of a quasi-Newton method is given by

x_{k+1} = x_k + s_k,  s_k = α_k d_k,  k = 0, 1, . . . ,

where α_k is a step length to be computed by a line search technique [19], and d_k is the search direction, calculated by solving the following linear system of equations,

B_k d_k = −g_k,  (2)
in which B_k is an approximation of G_k. Quasi-Newton methods are characterized by the fact that B_k is effectively updated to obtain a new matrix B_{k+1} in the following form:

B_{k+1} = B_k + ΔB_k,

where ΔB_k is a correction matrix. The matrix B_{k+1} is required to satisfy a suitable equation, namely a secant equation, which includes second-order derivative information. The most popular such equation is the (standard) secant equation, i.e.,

B_{k+1} s_k = y_k.

The standard secant equation uses only the gradient values available at the current iteration and ignores the function values. Efforts have been made to modify the standard secant equation so that more of the available information is employed and, consequently, a better approximation of G_{k+1} is obtained. Using an interpolation for f, Davidon [7] proposed a non-quadratic approximation model, namely the conic model, that uses both the function and gradient values available at the current iteration. Based on a quadratic interpolation for f, Yuan [27] and Wei et al. [23] proposed modified secant equations which approximate the curvature of the objective function more accurately than the standard secant equation (see also [24, 28]). Zhang et al. [30] and Zhang and Xu [29] used a quadratic interpolation for g and proposed another modified secant equation with a higher order of accuracy in approximating the curvature of the objective function. In order to switch adaptively between the standard secant equation and the modified secant equation proposed in [29, 30], slight revisions have been made by Yabe et al. [25, 26] and Babaie-Kafaki et al. [2] by embedding a parameter in this equation. All the methods mentioned in [23, 27, 29, 30] use both the gradient and function values available at the current iteration. In the cases where the Hessian has some special structure and is partially available, for example, in nonlinear least squares problems, the above-mentioned secant equations have been developed by Dennis et al. [8], Chen et al. [6], and Amini and Ghorbani Rizi [1]. A convexity assumption on the objective function plays an important role in the convergence analysis of the quasi-Newton methods based on all the above-mentioned secant equations. Remarkable secant equations have been proposed by Guo et al. [16] and Li and Fukushima [18] in order to achieve the convergence of quasi-Newton methods without a convexity assumption on the objective function.

Some other attempts have been made to use the available data from more than one previous step. Many researchers dealt with the extended secant equation, i.e.,

B_{k+1} s_i = y_i,  i = 1, 2, . . . , k,

which uses all the available gradient values and guarantees the at-most-n-step termination property for the minimization of a convex quadratic function [17]. Ford and Moghrabi [12, 13] developed the concept of multi-step quasi-Newton methods based on polynomial interpolation using the available data from the m most recent steps (see also [11] and the references therein). Here, we suggest a modified secant equation that is a convex combination of the modified secant equations proposed by Yuan [27] and Wei et al. [23].

This work is organized as follows. In Section 2, we first state our hybrid secant equation and then, based on the proposed secant equation, we suggest a modified BFGS algorithm. In Section 3, we show that under proper conditions, the proposed algorithm is globally, locally and superlinearly convergent. Computing the hybridization parameter based on the available information of recent iterations is discussed in Section 4. We make a numerical comparison between the implementations of our algorithm and two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] in Section 5. Finally, we draw conclusions in Section 6.
2  The hybrid secant equation
Here, we propose our hybrid secant equation as a convex combination of the modified secant equations proposed in [23] and [27]. Assume that the objective function f is smooth enough. If we expand the function f at x_k = x_{k+1} − s_k using Taylor's theorem, then we have

f_k = f_{k+1} − s_k^T g_{k+1} + (1/2) s_k^T G_{k+1} s_k − (1/6) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴),

where

s_k^T (T_{k+1} s_k) s_k = ∑_{i,j,l=1}^{n} [∂³f(x_{k+1}) / (∂x^i ∂x^j ∂x^l)] s_k^i s_k^j s_k^l.  (3)

Therefore,

s_k^T G_{k+1} s_k = 2(f_k − f_{k+1} + s_k^T g_{k+1}) + (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴)
                = s_k^T y_k + 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}) + (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴).

So, the following approximation can be proposed:

s_k^T G_{k+1} s_k ≈ s_k^T y_k + ϑ_k,  (4)

where ϑ_k = 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}). Since in the quasi-Newton methods B_{k+1} is required to approximate G_{k+1}, by considering (4), a modified secant equation can be proposed as follows:

B_{k+1} s_k = z_k,  z_k = y_k + [ϑ_k / (s_k^T u_k)] u_k,  (5)

where u_k ∈ R^n is a vector parameter satisfying s_k^T u_k ≠ 0. The modified secant equation (5) is justified by the following theorem (see [23]).

Theorem 2.1. If f is sufficiently smooth and ‖s_k‖ is small enough, then the following estimating relations hold:

s_k^T (G_{k+1} s_k − y_k) = (1/2) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴),
s_k^T (G_{k+1} s_k − z_k) = (1/3) s_k^T (T_{k+1} s_k) s_k + O(‖s_k‖⁴).
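The statement below (5) that ϑ_k vanishes for a quadratic objective is easy to verify numerically. A minimal sketch (Python/NumPy); the quadratic and the two iterates are arbitrary illustrative choices, not data from the paper:

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T A x + b^T x, the correction term
# vartheta_k = 2(f_k - f_{k+1}) + s_k^T (g_k + g_{k+1}) is exactly zero,
# so the modified secant equation (5) reduces to the standard one.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x
g = lambda x: A @ x + b

xk = rng.standard_normal(n)
xk1 = rng.standard_normal(n)         # plays the role of x_{k+1}
sk = xk1 - xk

vartheta = 2.0 * (f(xk) - f(xk1)) + sk @ (g(xk) + g(xk1))
print(abs(vartheta))                 # vanishes up to rounding error
```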
For a quadratic objective function f, we have ϑ_k = 0 and, consequently, the modified secant equation (5) reduces to the standard secant equation. A comprehensive study of the properties of the modified secant equation (5) has been made by Wei et al. [23, 24]. Different choices of the vector parameter u_k in (5) lead to different modified secant equations. The particular choice u_k = s_k has been considered by Wei et al. [23, 24]. Also, if the step length α_k is computed such that the Wolfe-Powell line search conditions hold, i.e.,

f(x_k + α_k d_k) − f_k ≤ δ α_k g_k^T d_k,
g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,  (6)

with 0 < δ < σ < 1, then we have

s_k^T y_k = s_k^T g_{k+1} − s_k^T g_k ≥ −(1 − σ) s_k^T g_k > 0.  (7)
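The two tests in (6), and the positivity (7) that follows from them, can be illustrated as below. The quadratic test function and the crude step-halving search are assumptions of this sketch, not the line search implementation used in the paper:

```python
import numpy as np

# Check the Wolfe-Powell conditions (6) for a trial step length alpha,
# then verify that an accepted step yields s_k^T y_k > 0, as in (7).
def wolfe_ok(f, g, x, d, alpha, delta=1e-4, sigma=0.9):
    ok_armijo = f(x + alpha * d) - f(x) <= delta * alpha * (g(x) @ d)
    ok_curv = g(x + alpha * d) @ d >= sigma * (g(x) @ d)
    return ok_armijo and ok_curv

A = np.diag([1.0, 10.0])          # illustrative convex quadratic
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x

x = np.array([1.0, 1.0])
d = -g(x)                         # a descent direction
alpha = 1.0                       # halve alpha until (6) holds
while not wolfe_ok(f, g, x, d, alpha):
    alpha *= 0.5

s = alpha * d
y = g(x + s) - g(x)
print(alpha, s @ y > 0)
```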
Therefore, based on the Wolfe-Powell line search conditions, the particular choice u_k = y_k was first proposed by Yuan [27] and then studied by Wei et al. [23]. In the modified secant equation (5), only the information of the current iteration is used, and other available information, obtained from more than one previous iteration, is ignored. This motivated us to improve Equation (5) by considering the vector parameter u_k as a convex combination of y_k and s_k, i.e.,

u_k = (1 − θ_k) y_k + θ_k s_k,  θ_k ∈ [0, 1],  (8)

and then to compute an effective value for the hybridization parameter θ_k based on the available information of recent iterations (which will be discussed in Section 4). Note that the choices θ_k = 0 and θ_k = 1 in (8) convert Equation (5) to the modified secant equations proposed in [27] and [23], respectively.

As mentioned in Section 1, the quasi-Newton methods are characterized by the fact that B_k, which is an approximation of G_k, should be effectively updated to obtain a new approximation B_{k+1} of G_{k+1}, such that a version of the secant equation is satisfied. A well-known class of update formulae is the Broyden family [19]. Among the Broyden family, the BFGS update, proposed by Broyden [3], Fletcher [10], Goldfarb [14] and Shanno [21], is famous and popular because of its favorable numerical and theoretical behavior. For the modified secant equation (5), by the BFGS update formula, B_{k+1} is computed as follows:

B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (z_k z_k^T) / (s_k^T z_k).  (9)

Also, the dual of the update formula (9), in which the available approximation H_k of G_k^{-1} is updated to obtain a new approximation H_{k+1} of G_{k+1}^{-1}, is given by

H_{k+1} = [I − (s_k z_k^T)/(s_k^T z_k)] H_k [I − (z_k s_k^T)/(s_k^T z_k)] + (s_k s_k^T)/(s_k^T z_k).  (10)
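In matrix form, the update pair (9)-(10) can be sketched as follows (Python/NumPy). The particular B, s and z are illustrative, chosen so that s^T z > 0; the updated pair then satisfies the secant equation and remains mutually inverse:

```python
import numpy as np

# Modified BFGS update (9) for B, and its dual (10) for H = B^{-1},
# with y_k replaced by the hybrid vector z_k.
def bfgs_B(B, s, z):
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(z, z) / (s @ z)

def bfgs_H(H, s, z):
    rho = 1.0 / (s @ z)
    V = np.eye(len(s)) - rho * np.outer(s, z)   # I - s z^T / (s^T z)
    return V @ H @ V.T + rho * np.outer(s, s)

B = np.diag([2.0, 3.0, 4.0])                    # illustrative data
H = np.linalg.inv(B)
s = np.array([1.0, 0.5, -0.25])
z = np.array([2.0, 1.0, 0.5])                   # s^T z = 2.375 > 0

B1 = bfgs_B(B, s, z)
H1 = bfgs_H(H, s, z)
print(np.allclose(B1 @ s, z), np.allclose(B1 @ H1, np.eye(3)))
```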
When B_k is a positive definite matrix, to guarantee the heredity of positive definiteness for B_{k+1} computed by (9), which ensures that the search direction d_k computed by (2) is a descent direction, it is important for some quasi-Newton updates, e.g., the BFGS update, that the following inequality holds [19]:

s_k^T z_k = s_k^T y_k + ϑ_k > 0.  (11)

Although the Wolfe-Powell line search conditions ensure that s_k^T y_k > 0 (see (7)), in general s_k^T z_k may not be positive and, consequently, the BFGS update (9) may not generate a positive definite matrix B_{k+1}. Here, in order to ensure Inequality (11), we follow the suggestions in [23, 27] and compute ϑ_k as follows:

ϑ_k = 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}),  if s_k^T y_k + 2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1}) ≥ ϵ s_k^T u_k,
ϑ_k = 0,  otherwise,  (12)

where ϵ is a positive constant. Now, we can state our modified BFGS algorithm. The convergence properties of the following algorithm will be discussed in the next section. Also, as mentioned before, computing the hybridization parameter θ_k (in Step 6 of the following algorithm) is the subject of Section 4.

Algorithm 2.1.
A modified BFGS algorithm based on the hybrid secant equation (5).
Step 0. Select x_0 ∈ R^n and a positive definite matrix B_0 (H_0) ∈ R^{n×n} as initial approximations of x* (the solution of (1)) and G(x*) (G(x*)^{-1}), respectively, and also a tolerance τ > 0 to be used in the stopping criterion. Set k = 0.
Step 1. If ‖g_k‖ < τ, then stop.
Step 2. Set d_k = −B_k^{-1} g_k (= −H_k g_k).
Step 3. Compute a step length α_k along the search direction d_k such that the Wolfe-Powell line search conditions (6) are satisfied.
Step 4. Set x_{k+1} = x_k + α_k d_k.
Step 5. Compute ϑ_k by (12). If ϑ_k = 0, then set z_k = y_k and go to Step 8.
Step 6. Select a value for the hybridization parameter θ_k in [0, 1].
Step 7. Set u_k = (1 − θ_k) y_k + θ_k s_k and z_k = y_k + [ϑ_k / (s_k^T u_k)] u_k.
Step 8. Compute B_{k+1} by (9) (H_{k+1} by (10)), set k := k + 1 and go to Step 1.
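An end-to-end sketch of the loop of Algorithm 2.1 on a convex quadratic is given below. Several simplifications are assumptions of this sketch rather than the paper's choices: a plain backtracking (Armijo) search instead of the Wolfe-Powell line search, a fixed hybridization parameter θ_k = 0.5 instead of the formula of Section 4, and the simplified safeguard "keep ϑ_k only if s_k^T z_k stays positive" in place of rule (12):

```python
import numpy as np

def modified_bfgs(f, grad, x0, tol=1e-8, theta=0.5, max_iter=500):
    n = len(x0)
    x, H = x0.astype(float), np.eye(n)
    for _ in range(max_iter):
        gk = grad(x)
        if np.linalg.norm(gk) < tol:                    # Step 1
            break
        d = -H @ gk                                     # Step 2
        alpha = 1.0                                     # Step 3 (Armijo only)
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (gk @ d):
            alpha *= 0.5
        s = alpha * d
        x_new = x + s                                   # Step 4
        g_new = grad(x_new)
        y = g_new - gk
        vartheta = 2.0 * (f(x) - f(x_new)) + s @ (gk + g_new)   # Step 5
        u = (1.0 - theta) * y + theta * s               # Steps 6-7
        z = y + (vartheta / (s @ u)) * u if s @ u != 0 else y
        if s @ z <= 1e-12 * (s @ s):                    # simplified safeguard
            z = y
        rho = 1.0 / (s @ z)                             # Step 8: update (10)
        V = np.eye(n) - rho * np.outer(s, z)
        H = V @ H @ V.T + rho * np.outer(s, s)
        x = x_new
    return x

A = np.diag([1.0, 5.0, 25.0])                           # illustrative problem
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = modified_bfgs(f, grad, np.array([1.0, 1.0, 1.0]))
print(np.linalg.norm(x_star))
```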
3  Convergence analysis
In this section, we first show that the proposed modified BFGS method is globally convergent for uniformly convex functions and then, under proper conditions, we prove the local and superlinear convergence of Algorithm 2.1.

3.1  Global convergence analysis
Here, we focus on the global convergence analysis of Algorithm 2.1 for uniformly convex functions, so the following definition is needed.

Definition 3.1. [22] Let S ⊂ R^n be a nonempty convex set and consider the function f : S → R. If there exists a constant μ > 0 such that for any x_1, x_2 ∈ S and for all α ∈ (0, 1) we have

f(αx_1 + (1 − α)x_2) ≤ αf(x_1) + (1 − α)f(x_2) − (1/2) μ α(1 − α) ‖x_1 − x_2‖²,

then f is called a uniformly (or strongly) convex function on S.

Theorem 3.1. [22] Let S ⊂ R^n be a nonempty open convex set and consider the twice continuously differentiable function f : S → R. Then f is uniformly convex if and only if its Hessian matrix is uniformly positive definite at each point of S, i.e., there exists a constant γ > 0 such that

v^T G(x) v ≥ γ ‖v‖²,  ∀ x ∈ S, ∀ v ∈ R^n.  (13)

Here, we make the following basic assumption on the objective function.

Assumption 3.1. The objective function f is twice continuously differentiable and uniformly convex in some neighborhood N of the level set L defined by

L = {x ∈ R^n | f(x) ≤ f(x_0)},  (14)
where x_0 is the starting point of Algorithm 2.1.

Remark 3.1. Consider the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1. Since the search directions {d_k}_{k=0}^∞ generated by Algorithm 2.1 are descent directions, the first of the Wolfe-Powell line search conditions (6) guarantees that {x_k}_{k=0}^∞ ⊂ L ⊂ N. Also, under Assumption 3.1, L is a bounded closed convex set [22].

Lemma 3.1. Under Assumption 3.1, g is Lipschitz continuous on L; i.e., there exists a constant Λ > 0 such that

‖g(x) − g(y)‖ ≤ Λ ‖x − y‖,  ∀ x, y ∈ L.

Proof. From the mean-value theorem, we can write

‖g(x) − g(y)‖ = ‖G(x_ξ)(x − y)‖ ≤ ‖G(x_ξ)‖ ‖x − y‖,  (15)

where x_ξ = ξx + (1 − ξ)y, for some ξ ∈ (0, 1). Since L is a closed set and the Hessian G is a continuous function, there exists a positive constant Λ such that

‖G(x)‖ ≤ Λ,  ∀ x ∈ L.  (16)
Now, because L is a convex set, from (15) and (16) the Lipschitz continuity of g (on L) is concluded.

Lemma 3.2. Under Assumption 3.1, there exists a constant ρ ∈ (0, 1] such that for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

s_k^T u_k ≥ ρ ‖s_k‖ ‖u_k‖.

Proof. From Lemma 3.1, we have

‖u_k‖ = ‖(1 − θ_k) y_k + θ_k s_k‖ ≤ (1 − θ_k) ‖y_k‖ + θ_k ‖s_k‖ ≤ (1 − θ_k) Λ ‖s_k‖ + θ_k ‖s_k‖ = [(1 − θ_k)Λ + θ_k] ‖s_k‖.

Therefore,

‖u_k‖ ≤ M_1 ‖s_k‖,  M_1 = max{Λ, 1}.  (17)

Also, from the mean-value theorem, we can write

y_k = G(x_ς) s_k,  x_ς = ςx_k + (1 − ς)x_{k+1},  (18)

for some ς ∈ (0, 1). Now, from (18), Theorem 3.1 and Inequality (17), we have

s_k^T u_k = (1 − θ_k) s_k^T y_k + θ_k ‖s_k‖² ≥ (1 − θ_k) γ ‖s_k‖² + θ_k ‖s_k‖² = [(1 − θ_k)γ + θ_k] ‖s_k‖² ≥ m_1 ‖s_k‖² ≥ (m_1/M_1) ‖s_k‖ ‖u_k‖,

in which m_1 = min{γ, 1}. Thus, to complete the proof, it is enough to let ρ = m_1/M_1.
Lemma 3.3. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

‖z_k‖ ≤ Γ ‖s_k‖.

Proof. From the mean-value theorem, we can write

|ϑ_k| = |2(f_k − f_{k+1}) + s_k^T (g_k + g_{k+1})| = |[−2g(x_ζ) + g_k + g_{k+1}]^T s_k|,

where x_ζ = ζx_k + (1 − ζ)x_{k+1}, for some ζ ∈ (0, 1). Therefore, from Lemma 3.1 we have

|ϑ_k| ≤ [‖g_k − g(x_ζ)‖ + ‖g_{k+1} − g(x_ζ)‖] ‖s_k‖ ≤ [Λ(1 − ζ)‖s_k‖ + Λζ‖s_k‖] ‖s_k‖ = Λ ‖s_k‖².  (19)

Now, from Lemmas 3.1 and 3.2, and inequality (19), we have

‖z_k‖ = ‖y_k + [ϑ_k/(s_k^T u_k)] u_k‖ ≤ ‖y_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≤ Λ‖s_k‖ + (Λ/ρ)‖s_k‖ = (Λ + Λ/ρ)‖s_k‖.

So, to complete the proof it is enough to let Γ = Λ + Λ/ρ.
Lemma 3.4. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 the following inequality holds:

s_k^T z_k ≥ γ ‖s_k‖².

Proof. From Taylor's theorem, we can write

f_k = f(x_{k+1} − s_k) = f_{k+1} − s_k^T g_{k+1} + (1/2) s_k^T G(x_ι) s_k,

where x_ι = ιx_k + (1 − ι)x_{k+1}, for some ι ∈ (0, 1). Therefore, from (13) and Remark 3.1 we have

s_k^T z_k = 2(f_k − f_{k+1} + s_k^T g_{k+1}) = s_k^T G(x_ι) s_k ≥ γ ‖s_k‖².
Theorem 3.2. [22] Let f : R^n → R be continuously differentiable and bounded below, and let ∇f be uniformly continuous on the level set L defined by (14). Then, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0,

where cos ν_k = −s_k^T g_k / (‖g_k‖ ‖s_k‖).

Lemma 3.5. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0,

where

cos ν_k = s_k^T B_k s_k / (‖B_k s_k‖ ‖s_k‖).  (20)

Proof. From Assumption 3.1, f is continuously differentiable, and from Remark 3.1, the level set L is a closed subset of R^n. Therefore, f is bounded below on L. Also, Lemma 3.1 guarantees that g is uniformly continuous on L. So, because the search directions {d_k}_{k=0}^∞ generated by Algorithm 2.1 are descent directions, from Theorem 3.2 the proof is complete.

Now, using the above results, we prove the following global convergence theorem, similarly to the proof of Theorem 5.3.6 in [22].

Theorem 3.3. Under Assumption 3.1, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, we have

lim inf_{k→∞} ‖g_k‖ = 0.
Proof. By computing the trace and determinant of B_{k+1} in the BFGS update formula (9), we obtain

tr(B_{k+1}) = tr(B_k) − ‖B_k s_k‖² / (s_k^T B_k s_k) + ‖z_k‖² / (s_k^T z_k),

and

det(B_{k+1}) = det(B_k) · (s_k^T z_k) / (s_k^T B_k s_k).  (21)

If we define

m_k = (s_k^T z_k) / ‖s_k‖²,  M_k = ‖z_k‖² / (s_k^T z_k),

then from Lemmas 3.3 and 3.4 we have

m_k ≥ γ,  M_k ≤ Γ²‖s_k‖² / (γ‖s_k‖²) = Γ²/γ =: Γ̂.

Also, if we define cos ν_k by (20) and let

q_k = (s_k^T B_k s_k) / ‖s_k‖²,

then we can write

‖B_k s_k‖² / (s_k^T B_k s_k) = [‖B_k s_k‖² ‖s_k‖² / (s_k^T B_k s_k)²] · [(s_k^T B_k s_k) / ‖s_k‖²] = q_k / cos² ν_k.

In addition, from (21) we have

det(B_{k+1}) = det(B_k) · [(s_k^T z_k)/‖s_k‖²] · [‖s_k‖²/(s_k^T B_k s_k)] = det(B_k) · m_k/q_k.

Now, we introduce the following function on the set of all positive definite matrices X ∈ R^{n×n}:

ψ(X) = tr(X) − ln(det(X)).
It is not difficult to show that ψ(B_k) > 0. By using the above results, we have

ψ(B_{k+1}) = tr(B_k) + M_k − q_k/cos² ν_k − ln(det(B_k)) − ln m_k + ln q_k
           = ψ(B_k) + (M_k − ln m_k − 1) + [1 − q_k/cos² ν_k + ln(q_k/cos² ν_k)] + ln cos² ν_k.  (22)

Note that h(t) = 1 − t + ln t ≤ 0, for all t > 0. Hence, from (22) we have

0 < ψ(B_{k+1}) ≤ ψ(B_0) + c̄(k + 1) + ∑_{j=0}^{k} ln cos² ν_j,  (23)

where the constant c̄ = Γ̂ − ln γ − 1 is assumed to be positive, without loss of generality. From Lemma 3.5 we have

lim_{k→∞} ‖g_k‖ cos ν_k = 0.  (24)

Now, if cos ν_k → 0, then there exists k_1 > 0 such that for all j > k_1 we have ln cos² ν_j < −2c̄. So, using (23), for all k > k_1 large enough, we have

0 < ψ(B_0) + c̄(k + 1) + ∑_{j=0}^{k_1} ln cos² ν_j + ∑_{j=k_1+1}^{k} (−2c̄)
  = ψ(B_0) + ∑_{j=0}^{k_1} ln cos² ν_j + 2c̄ k_1 − c̄(k − 1) < 0,

which gives a contradiction. Therefore, the assumption cos ν_k → 0 is not true. Hence, there exists a subsequence {ν_{j_i}}_{i=1}^∞ such that

cos ν_{j_i} ≥ γ̄ > 0,  ∀ i ≥ 0.  (25)

From (24) and (25) we deduce that lim inf_{k→∞} ‖g_k‖ = 0.
3.2  Local convergence analysis
Here, we study the local convergence properties of the proposed modified BFGS algorithm (Algorithm 2.1) using the approach suggested by Dennis and Moré [4]. As in the convergence analysis of [4], we shall discuss only the local convergence properties of the direct iterations (α_k = 1) of our modified BFGS algorithm. The corresponding convergence properties of the damped version, in which a step length α_k is computed such that the Wolfe-Powell line search conditions are satisfied, can be deduced from the results for the direct iterations [5, 20]. Also, it can be shown that in some neighborhood of the optimal solution, α_k = 1 satisfies the Wolfe-Powell line search conditions (see the final part of the proof of Theorem 3.8 in [18]).

We make the following basic assumptions on the objective function.

Assumption 3.2. The function f is twice continuously differentiable in an open convex set D that contains x*, where x* is a local minimizer of f at which g(x*) = 0 and G(x*) is positive definite.

Assumption 3.3. The Hessian G(x) is Hölder continuous on D, i.e., there are positive constants p and L such that

‖G(x) − G(y)‖ ≤ L ‖x − y‖^p,  ∀ x, y ∈ D.

Remark 3.2. The continuity and positive definiteness of G(x*) imply that there are three positive constants r, m and M such that

m‖v‖² ≤ v^T G(x) v ≤ M‖v‖²,  ∀ v ∈ R^n, ∀ x ∈ N_r(x*) ⊂ D.
Below, we discuss some properties of our hybrid secant equation which, as pointed out in [29, 30], are necessary for proving the local and superlinear convergence of a BFGS-type algorithm.

Lemma 3.6. Under Assumption 3.2, there exists a constant ρ ∈ (0, 1] such that in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

s_k^T u_k ≥ ρ ‖s_k‖ ‖u_k‖.

Proof. Remark 3.2 ensures that the objective function f is uniformly convex on N_r(x*). So, the proof is similar to the proof of Lemma 3.2 and hence is omitted.

Lemma 3.7. Under Assumption 3.3, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

|ϑ_k| ≤ L ‖s_k‖² σ(x_k, x_{k+1}),

in which σ(x_k, x_{k+1}) = max{‖x_{k+1} − x*‖^p, ‖x_k − x*‖^p}.

Proof. From Taylor's theorem, we can write

f_k − f_{k+1} + g_{k+1}^T s_k = (1/2) s_k^T G(x_ξ) s_k,
f_k − f_{k+1} + g_k^T s_k = −(1/2) s_k^T G(x_η) s_k,

where

x_ξ = ξx_k + (1 − ξ)x_{k+1},  x_η = ηx_k + (1 − η)x_{k+1},

for some ξ, η ∈ (0, 1). Therefore,

2(f_k − f_{k+1}) + (g_k + g_{k+1})^T s_k = (1/2) s_k^T [G(x_ξ) − G(x_η)] s_k.  (26)

Now, from (12), (26) and Assumption 3.3 we have

|ϑ_k| ≤ (1/2) |s_k^T [G(x_ξ) − G(x_η)] s_k|
     ≤ (1/2) ‖s_k‖² (‖G(x_ξ) − G(x*)‖ + ‖G(x*) − G(x_η)‖)
     ≤ (1/2) L ‖s_k‖² (‖x_ξ − x*‖^p + ‖x* − x_η‖^p)
     ≤ L ‖s_k‖² σ(x_k, x_{k+1}).

Lemma 3.8. Under Assumptions 3.2 and 3.3, in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

‖z_k − G(x*) s_k‖ ≤ c σ(x_k, x_{k+1}) ‖s_k‖,

where c is a positive constant.

Proof. Consider the neighborhood N_r(x*) introduced in Remark 3.2. From Lemmas 3.6 and 3.7, Taylor's theorem stated by (18) and Assumption 3.3, we have

‖z_k − G(x*) s_k‖ ≤ ‖y_k − G(x*) s_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖
                 ≤ ‖G(x_ς) − G(x*)‖ ‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖
                 ≤ L ‖x_ς − x*‖^p ‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖
                 ≤ L(1 + 1/ρ) σ(x_k, x_{k+1}) ‖s_k‖.

Therefore, to complete the proof it is enough to let c = L(1 + 1/ρ).

Lemma 3.9. Under Assumptions 3.2 and 3.3, there exists a positive constant β such that in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1, the following inequality holds:

(1/β) ‖s_k‖ ≤ ‖z_k‖ ≤ β ‖s_k‖.

Proof. Consider the neighborhood N_r(x*) introduced in Remark 3.2. From Lemmas 3.6 and 3.7, Taylor's theorem stated by (18) and Remark 3.2, we have

‖z_k‖ ≤ ‖y_k‖ + [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≤ M‖s_k‖ + (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖ = [M + (L/ρ) σ(x_k, x_{k+1})] ‖s_k‖,  (27)

and similarly,

‖z_k‖ ≥ ‖y_k‖ − [|ϑ_k|/|s_k^T u_k|] ‖u_k‖ ≥ m‖s_k‖ − (L/ρ) σ(x_k, x_{k+1}) ‖s_k‖ = [m − (L/ρ) σ(x_k, x_{k+1})] ‖s_k‖.  (28)

Now, we choose the positive constant ε′, as the radius of the neighborhood N_{ε′}(x*), such that

ε′ ∈ (0, min{r, (ρm/L)^{1/p}}).  (29)

Therefore, from (27) and (28), to complete the proof, it is enough to let

β = max{M + Lε′^p/ρ, [m − Lε′^p/ρ]^{-1}}.
Remark 3.3. As pointed out in [29], using Lemmas 3.8 and 3.9, the local and superlinear convergence of Algorithm 2.1 with the DFP update formula [19] in Step 8 can be proved.

Lemma 3.10. If for some t ∈ (0, 1) we have

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,  (30)

and {B_k}_{k=0}^∞ and {H_k}_{k=0}^∞ are bounded, then

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖z_k‖ = 0  (31)

ensures that the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 converges superlinearly to x*.

Proof. Since H_k = B_k^{-1} and α_k = 1, we have x_{k+1} = x_k − H_k g_k. So,

(H_k − G(x*)^{-1}) z_k = H_k g_{k+1} + s_k + [ϑ_k/(s_k^T u_k)] H_k u_k − G(x*)^{-1} z_k

⇒ H_k g_{k+1} = (H_k − G(x*)^{-1}) z_k − [ϑ_k/(s_k^T u_k)] H_k u_k + G(x*)^{-1} (z_k − G(x*) s_k).  (32)
Suppose that M_B and M_H are the supremums of {‖B_k‖}_{k=0}^∞ and {‖H_k‖}_{k=0}^∞, respectively. Thus, from (32) we have

‖g_{k+1}‖ ≤ M_B [ ‖(H_k − G(x*)^{-1}) z_k‖ + M_H (|ϑ_k|/|s_k^T u_k|) ‖u_k‖ + ‖G(x*)^{-1}‖ ‖z_k − G(x*) s_k‖ ].  (33)

Property (30) guarantees that there exists an index K such that {x_k}_{k=K}^∞ ⊂ N_{ε′}(x*), with ε′ defined by (29). So, from Lemma 3.9, we know that

‖s_k‖ ≥ (1/β) ‖z_k‖.  (34)

Considering (31) and (34), we have

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖s_k‖ = 0.  (35)

From Lemma 3.8 and property (30), we have

‖z_k − G(x*) s_k‖ / ‖s_k‖ ≤ c σ(x_k, x_{k+1})  ⇒  lim_{k→∞} ‖z_k − G(x*) s_k‖ / ‖s_k‖ = 0.  (36)

Also, from Lemmas 3.6 and 3.7, we have

[|ϑ_k|/|s_k^T u_k|] (‖u_k‖/‖s_k‖) ≤ |ϑ_k| / (ρ ‖s_k‖²) ≤ (L/ρ) σ(x_k, x_{k+1}),

which, by considering property (30), ensures that

lim_{k→∞} [|ϑ_k|/|s_k^T u_k|] (‖u_k‖/‖s_k‖) = 0.  (37)

From (33), (35), (36) and (37), we obtain the following important result:

lim_{k→∞} ‖g_{k+1}‖ / ‖s_k‖ = 0.  (38)

Now, from Taylor's theorem and Remark 3.2, we can write

‖g_{k+1}‖ = ‖g(x_{k+1}) − g(x*)‖ = ‖G(x_ξ)(x_{k+1} − x*)‖ ≥ m ‖x_{k+1} − x*‖,  (39)

where x_ξ = ξx* + (1 − ξ)x_{k+1}, for some ξ ∈ (0, 1). On the other hand, from property (30), we have

‖s_k‖ ≤ ‖x_k − x*‖ + ‖x_{k+1} − x*‖ ≤ 2 ‖x_k − x*‖.  (40)

Finally, from (38)-(40), we have

lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 0,

which guarantees the superlinear convergence.

As pointed out in [29, 30], following carefully the proof of the superlinear convergence theorem in [4], we find that in most places, if we substitute y_k by z_k, the proof of [4] remains true trivially, and to complete the proof for our method, only the results stated in Lemmas 3.8-3.10 are necessary. Here, we use these lemmas and some results in [4] to present a compact argument for the local and superlinear convergence analysis of Algorithm 2.1.

Theorem 3.4. Suppose that Assumptions 3.2 and 3.3 hold, and that there exist nonnegative constants a_1 and a_2 such that for the sequences {x_k}_{k=0}^∞ and {H_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, the following inequality holds:

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [1 + a_1 σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E + a_2 σ(x_k, x_{k+1}),
where E ∈ R^{n×n} is an arbitrary nonsingular symmetric matrix. Then, for each t ∈ (0, 1), there exist positive constants ε̄ and δ̄ such that, for x_0 ∈ R^n satisfying ‖x_0 − x*‖ < ε̄ and a positive definite matrix H_0 satisfying ‖H_0 − G(x*)^{-1}‖_E < δ̄, the sequence {x_k}_{k=0}^∞ converges to x*. Furthermore,

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,

and {H_k}_{k=0}^∞ and {B_k}_{k=0}^∞ are uniformly bounded.

Proof. The proof is similar to the proof of Theorem 3.4 in [4] and hence is omitted.

Corollary 3.1. Under the assumptions of Theorem 3.4, if some subsequence of {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ converges to zero, then {x_k}_{k=0}^∞ converges superlinearly to x*.

Proof. The proof is similar to the proof of Corollary 3.3 in [4] and hence is omitted.
Lemma 3.11. Let E ∈ R^{n×n} be a nonsingular symmetric matrix satisfying the following inequality,

‖Es − E^{-1}z‖ ≤ b ‖E^{-1}z‖,

for some b ∈ [0, 1/3] and vectors s and z in R^n with z ≠ 0. If for each symmetric matrix H ∈ R^{n×n} the matrix H̄ ∈ R^{n×n} is defined by

H̄ = [I − (s z^T)/(s^T z)] H [I − (z s^T)/(s^T z)] + (s s^T)/(s^T z),

then for any symmetric matrix A ∈ R^{n×n} the following inequality holds:

‖H̄ − A‖_E ≤ [√(1 − a_3 η²) + (5/(2(1 − b))) ‖Es − E^{-1}z‖/‖E^{-1}z‖] ‖H − A‖_E + 2(1 + 2√n) ‖E‖_F ‖s − Az‖/‖E^{-1}z‖,

where

a_3 = (1 − 2b)/(1 − b²) ∈ [3/8, 1],  η = ‖E(H − A)z‖ / (‖H − A‖_E ‖E^{-1}z‖).

Proof. The proof is similar to the proof of Lemma 5.2 in [4] and hence is omitted.
Lemma 3.12. Suppose that Assumptions 3.2 and 3.3 hold, and let E ∈ R^{n×n} be a symmetric matrix such that E² = G(x*). Then, in some neighborhood of x*, for the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, the following inequality holds:

‖Es_k − E^{-1}z_k‖ / ‖E^{-1}z_k‖ ≤ b,

for some constant b ∈ [0, 1/3].

Proof. From Lemma 3.8, we have

‖Es_k − E^{-1}z_k‖ = ‖E^{-1}(E²s_k − z_k)‖ ≤ ‖E^{-1}‖ ‖z_k − G(x*)s_k‖ ≤ c ‖E^{-1}‖ σ(x_k, x_{k+1}) ‖s_k‖.

On the other hand, from Remark 3.2 and Lemma 3.9, we have

‖E^{-1}z_k‖² = z_k^T E^{-2} z_k = z_k^T G(x*)^{-1} z_k ≥ (1/M) ‖z_k‖² ≥ (1/(β²M)) ‖s_k‖².

Therefore,

‖Es_k − E^{-1}z_k‖ / ‖E^{-1}z_k‖ ≤ c β√M ‖E^{-1}‖ σ(x_k, x_{k+1}).  (41)

So, we choose the positive constant ε, as the radius of the neighborhood N_ε(x*), such that

ε ∈ (0, min{ε′, (1/(3 c β√M ‖E^{-1}‖))^{1/p}}),
where ε′ is defined by (29). Therefore, from (41), to complete the proof, it is enough to let

b = c β√M ‖E^{-1}‖ ε^p.
Now, using the above results, we can prove the following local and superlinear convergence theorem for Algorithm 2.1.

Theorem 3.5. Suppose that Assumptions 3.2 and 3.3 hold. If x_0 ∈ D and H_0 (B_0) are sufficiently close to x* and G(x*)^{-1} (G(x*)), respectively, then the sequence {x_k}_{k=0}^∞ generated by Algorithm 2.1 with α_k = 1, for all k ≥ 0, converges superlinearly to x*.

Proof. In Lemma 3.11, we let A = G(x*)^{-1}, s = s_k, z = z_k, H = H_k and H̄ = H_{k+1}, and let E ∈ R^{n×n} be a nonsingular symmetric matrix such that E² = G(x*). So, from Lemma 3.12, the assumptions of Lemma 3.11 hold and hence we have

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [√(1 − a_3 η_k²) + (5/(2(1 − b))) ‖Es_k − E^{-1}z_k‖/‖E^{-1}z_k‖] ‖H_k − G(x*)^{-1}‖_E
                         + 2(1 + 2√n) ‖E‖_F ‖s_k − G(x*)^{-1}z_k‖/‖E^{-1}z_k‖,

where

a_3 = (1 − 2b)/(1 − b²) ∈ [3/8, 1],  η_k = ‖E(H_k − G(x*)^{-1})z_k‖ / (‖H_k − G(x*)^{-1}‖_E ‖E^{-1}z_k‖).

Now, from Inequality (41), we can write

‖H_{k+1} − G(x*)^{-1}‖_E ≤ [√(1 − a_3 η_k²) + (5/(2(1 − b))) c β√M ‖E^{-1}‖ σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E
                         + 2(1 + 2√n) ‖E‖_F c ‖E^{-1}‖² β√M σ(x_k, x_{k+1})
                       ≤ [√(1 − (3/8) η_k²) + a_1 σ(x_k, x_{k+1})] ‖H_k − G(x*)^{-1}‖_E + a_2 σ(x_k, x_{k+1}),  (42)

where

a_1 = (5/(2(1 − b))) c β√M ‖E^{-1}‖,  a_2 = 2(1 + 2√n) ‖E‖_F c ‖E^{-1}‖² β√M.
Thus, the assumptions of Theorem 3.4 hold, and therefore the sequence {x_k}_{k=0}^∞ converges to x* locally. Furthermore, for each t ∈ (0, 1) we have

‖x_{k+1} − x*‖ ≤ t ‖x_k − x*‖,  ∀ k ≥ 0,  (43)

and {H_k}_{k=0}^∞ and {B_k}_{k=0}^∞ are uniformly bounded.

To prove the superlinear convergence, we note that if there is a subsequence {H_{k_l}}_{l=0}^∞ which converges to G(x*)^{-1}, then Corollary 3.1 gives the desired conclusion; otherwise, {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ is bounded away from zero. Now, since √(1 − t) ≤ 1 − t/2, from inequality (42) we can write

‖H_k − G(x*)^{-1}‖_E − ‖H_{k+1} − G(x*)^{-1}‖_E ≥ (3/16) η_k² ‖H_k − G(x*)^{-1}‖_E − σ(x_k, x_{k+1}) [a_1 ‖H_k − G(x*)^{-1}‖_E + a_2].  (44)

On the other hand, from (43) we have

σ(x_k, x_{k+1}) = ‖x_k − x*‖^p ≤ t^{kp} ‖x_0 − x*‖^p.  (45)

So, (44) and (45) ensure that

∑_{k=0}^{∞} η_k² ‖H_k − G(x*)^{-1}‖_E < ∞.
Since {‖H_k − G(x*)^{-1}‖_E}_{k=0}^∞ is bounded away from zero, we must have

lim_{k→∞} ‖(H_k − G(x*)^{-1}) z_k‖ / ‖z_k‖ = 0.  (46)

Now, (43) and (46) show that the assumptions of Lemma 3.10 hold. Therefore, {x_k}_{k=0}^∞ converges superlinearly to x*.
4  On the choice of the hybridization parameter θ_k
Here, by using the information of recent iterations, we propose an effective formula for computing the hybridization parameter θ_k. By multiplying both sides of Equation (5) by s_{k−1}^T, we have

s_{k−1}^T B_{k+1} s_k = s_{k−1}^T z_k.  (47)

Now, to obtain a better approximation of G_{k+1}, it is suitable for B_{k+1} to satisfy the following equation,

B_{k+1} s_{k−1} = z_{k−1},  (48)

which contains the information obtained from more than one previous iteration. It is remarkable that near the optimal solution, because of the approximately quadratic behavior of the objective function, Equation (48) approximately holds. From (47) and (48), we find that for computing θ_k the following equation, encompassing Equation (48), should be solved:

z_{k−1}^T s_k = s_{k−1}^T z_k.

After some algebraic manipulations, θ_k is determined as follows:

θ_k = (w_k^T y_k) / (w_k^T (y_k − s_k)),  (49)

in which w_k = s_{k−1} − γ_k s_k, with

γ_k = (s_k^T z_{k−1} − s_{k−1}^T y_k) / ϑ_k,
where ϑk should be nonzero. Here, if θk < 0, then we set θk = 0 and also, if θk > 1, then we set θk = 1. Thus, an interesting feature of the proposed hybrid secant equation is its adoptive applying of the modified secant equations proposed in [23] and [27]. It is remarkable that if (wkT yk )(wkT sk ) < 0, then we have θk ∈ [0, 1). Here, we suggest a procedure for computing the hybridization parameter θk . Algorithm 4.1. A procedure for computing the hybridization parameter θk in the modified secant equation (5). Step 1.
If k = 0 then select an initial value for θk ∈ [0, 1].
Step 2.
Compute θk by (49).
Step 3.
If θk < 0 then set θk = 0.
Step 4.
If θk > 1 then set θk = 1.
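Algorithm 4.1 amounts to a few lines of code. The snippet below is a hypothetical Python illustration (the paper's implementation is in MATLAB); the quantities ϑ_k and z_{k−1}, which are defined earlier in the paper, are taken as inputs supplied by the caller.

```python
import numpy as np

def hybridization_parameter(s_k, s_km1, y_k, z_km1, vartheta_k, theta0=0.5):
    # A sketch of Algorithm 4.1 (not the paper's MATLAB code):
    # compute theta_k by Equation (49), then project it onto [0, 1].
    # vartheta_k and z_{k-1} come from the modified secant equation and
    # are assumed to be supplied by the caller; vartheta_k must be nonzero.
    gamma_k = (s_k @ z_km1 - s_km1 @ y_k) / vartheta_k   # gamma_k in (49)
    w_k = s_km1 - gamma_k * s_k                          # w_k = s_{k-1} - gamma_k s_k
    denom = w_k @ (y_k - s_k)
    if denom == 0.0:                                     # (49) undefined; keep a default
        return theta0
    theta_k = (w_k @ y_k) / denom                        # Equation (49)
    return min(max(theta_k, 0.0), 1.0)                   # Steps 3 and 4
```

For k = 0, Step 1 simply uses the chosen initial value; the experiments in Section 5 use θ0 = 0.5.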
It is possible to propose other formulae for the hybridization parameter θk by using proper equations instead of Equation (48); for example,

B_{k+1} s_{k−1} = y_{k−1}.   (50)

However, a merit of Equation (48), which guarantees the satisfaction of the modified secant equation (5) at the (k − 1)-th iteration, is that it takes into account both the available function and gradient information. So, here we use Equation (48) to compute the hybridization parameter θk.
5  Numerical results
We present some numerical results obtained by applying a MATLAB 7.7.0.471 (R2008b) implementation of Algorithm 2.1 and two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] to 35 unconstrained optimization test problems of the CUTEr collection [15], with dimensions ranging from n = 2 to n = 50 as specified in Table 1. More exactly, our results are reported for the following three algorithms:
1) M1: Algorithm 2.1 with θk computed by Algorithm 4.1.
2) M2: Algorithm 2.1 with θk = 0, ∀ k ≥ 0 (or equivalently, uk = yk).
3) M3: Algorithm 2.1 with θk = 1, ∀ k ≥ 0 (or equivalently, uk = sk).

Table 1  Numerical results, reported as (Nf, Ng, efe)

Functions | n    | M1                     | M2                      | M3
AIRCRFTB  | 8    | (1273,802,961.13)      | (1028,615,743.50)       | (1240,779,934.00)
ALLINITU  | 4    | (64,37,53.00)          | (76,44,63.00)           | (57,32,46.25)
BEALE     | 2    | (56,34,62.00)          | (59,35,64.50)           | (96,60,108.00)
BIGGS3    | 6    | (2134,1263,1618.67)    | (1814,1135,1437.33)     | (2141,1323,1679.83)
BIGGS5    | 6    | (2134,1263,1618.67)    | (1814,1135,1437.33)     | (2141,1323,1679.83)
BIGGS6    | 6    | (355,231,290.17)       | (4863,2900,3710.50)     | (4215,2506,3208.50)
BOX2      | 3    | (43,33,47.33)          | (80,53,79.67)           | (85,56,84.33)
BOX3      | 3    | (43,33,47.33)          | (80,53,79.67)           | (85,56,84.33)
BRKMCC    | 2    | (17,10,18.50)          | (17,10,18.50)           | (17,10,18.50)
CHNROSNB  | 50   | (9139,5881,6063.78)    | (10569,6813,7024.38)    | (9653,6149,6342.06)
CUBE      | 2    | (895,573,1020.50)      | (1239,797,1416.50)      | (2173,1406,2492.50)
DENSCHNA  | 2    | (16,11,19.00)          | (21,14,24.50)           | (24,16,28.00)
DENSCHNB  | 2    | (20,11,21.00)          | (20,11,21.00)           | (21,12,22.50)
DENSCHNF  | 2    | (542,346,617.00)       | (541,345,615.50)        | (558,355,634.00)
DIXMAANK  | 15   | (161,148,158.73)       | (173,160,171.53)        | (195,182,195.00)
ENGVAL2   | 3    | (4001,2494,3827.67)    | (6959,4310,6629.67)     | (7349,4739,7188.67)
GROWTHLS  | 3    | (14,2,6.67)            | (14,2,6.67)             | (14,2,6.67)
HAIRY     | 2    | (50,12,37.00)          | (94,20,67.00)           | (69,16,50.50)
HELIX     | 3    | (723,448,689.00)       | (832,520,797.33)        | (733,443,687.33)
HILBERTA  | 2    | (13,13,19.50)          | (13,13,19.50)           | (13,13,19.50)
HILBERTB  | 10   | (9,5,5.90)             | (9,5,5.90)              | (9,5,5.90)
HIMMELBG  | 2    | (20,12,22.00)          | (20,12,22.00)           | (18,10,19.00)
HIMMELBH  | 2    | (16,11,19.00)          | (14,8,15.00)            | (15,10,17.50)
JENSMP    | 2    | (11,2,7.50)            | (11,2,7.50)             | (11,2,7.50)
KOWOSB    | 4    | (282,185,255.50)       | (325,209,290.25)        | (201,138,188.25)
LOGHAIRY  | 2    | (2305,2120,3272.50)    | (3888,3532,5476.00)     | (5040,4686,7206.00)
PALMER5C  | 6    | (13,7,9.17)            | (13,7,9.17)             | (13,7,9.17)
ROSENBR   | 2    | (1845,1186,2108.50)    | (1541,990,1760.50)      | (1720,1115,1975.00)
S308      | 2    | (178,110,199.00)       | (173,106,192.50)        | (181,111,201.50)
SINEVAL   | 2    | (1045,664,1186.50)     | (1345,861,1533.50)      | (1239,788,1407.50)
SISSER    | 2    | (22,21,32.00)          | (28,26,40.00)           | (22,21,32.00)
SNAIL     | 2    | (33,26,42.50)          | (22,13,24.00)           | (21,12,22.50)
VAREIGVL  | 50   | (61,31,32.22)          | (61,31,32.22)           | (63,32,33.26)
YFITU     | 3    | (9795,6403,9668.00)    | (10402,6818,10285.33)   | Failed
ZANGWIL2  | 2    | (3,3,4.50)             | (3,3,4.50)              | (3,3,4.50)
Average   | 6.26 | (809.88,530.23,717.45) | (1110.56,729.12,995.31) | (1159.85,777.00,1077.64)
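The efe column of Table 1 follows the rule that n function evaluations count as one gradient evaluation, i.e. efe = Ng + Nf/n; this formula is inferred from the reported (Nf, Ng, efe) triples themselves. A small illustrative check:

```python
def efe(nf, ng, n):
    # Effective function evaluations: n function evaluations are
    # counted as one gradient evaluation, so efe = Ng + Nf / n.
    # (Formula inferred from the (Nf, Ng, efe) triples in Table 1.)
    return ng + nf / n

# ALLINITU (n = 4) under M1 is reported as (64, 37, 53.00):
print(efe(64, 37, 4))       # 53.0
# AIRCRFTB (n = 8) under M1 is reported as (1273, 802, 961.13):
print(efe(1273, 802, 8))    # 961.125, which rounds to 961.13
```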
All of the runs have been done on a mobile computer with an Intel(R) Core(TM)2 Duo 2.00 GHz CPU and 1 GB of RAM. In the implementation of Algorithm 2.1, we let B0 = I, τ = 10^−6, ε = 10^−18 in (12), δ = 10^−4 and σ = 0.9 in (6). Also, all attempts to solve the problems were limited to a maximum of 10000 iterations. To make a proper choice for θ0 in Step 1 of Algorithm 4.1, numerical experiments have been done with different values of θ0 ∈ {0.05j}, j = 0, 1, . . . , 20. Based on the observations, we let θ0 = 0.5 because of its proper results. Table 1 shows the numerical results, given in the form (Nf, Ng, efe), where Nf and Ng respectively denote the numbers of function and gradient evaluations, and

efe = Ng + Nf/n.

Table 2  CPU time (in seconds)

Functions | n    | M1       | M2       | M3
AIRCRFTB  | 8    | 2.12E−01 | 1.88E−01 | 2.21E−01
ALLINITU  | 4    | 2.80E−02 | 2.74E−02 | 2.66E−02
BEALE     | 2    | 8.89E−03 | 9.48E−03 | 1.29E−02
BIGGS3    | 6    | 4.08E−01 | 3.39E−01 | 3.91E−01
BIGGS5    | 6    | 4.01E−01 | 3.27E−01 | 3.75E−01
BIGGS6    | 6    | 7.78E−02 | 8.64E−01 | 6.83E−01
BOX2      | 3    | 9.94E−03 | 1.42E−02 | 1.67E−02
BOX3      | 3    | 9.44E−03 | 1.45E−02 | 1.69E−02
BRKMCC    | 2    | 4.13E−03 | 3.55E−03 | 4.06E−03
CHNROSNB  | 50   | 1.00E+00 | 1.15E+00 | 1.37E+00
CUBE      | 2    | 1.50E−01 | 1.82E−01 | 2.61E−01
DENSCHNA  | 2    | 1.38E−02 | 1.76E−02 | 3.93E−02
DENSCHNB  | 2    | 4.42E−03 | 3.94E−03 | 9.08E−03
DENSCHNF  | 2    | 7.65E−02 | 5.94E−02 | 7.11E−02
DIXMAANK  | 15   | 6.43E−02 | 6.15E−02 | 5.67E−02
ENGVAL2   | 3    | 4.67E−01 | 8.29E−01 | 7.68E−01
GROWTHLS  | 3    | 2.12E−03 | 6.27E−03 | 1.82E−03
HAIRY     | 2    | 2.58E−02 | 2.73E−02 | 8.62E−03
HELIX     | 3    | 9.93E−02 | 7.10E−02 | 6.81E−02
HILBERTA  | 2    | 2.54E−02 | 3.81E−03 | 3.99E−03
HILBERTB  | 10   | 3.33E−02 | 4.58E−02 | 4.78E−02
HIMMELBG  | 2    | 1.00E−02 | 1.47E−02 | 1.32E−02
HIMMELBH  | 2    | 5.56E−03 | 3.06E−03 | 4.55E−02
JENSMP    | 2    | 6.51E−03 | 1.38E−03 | 1.18E−02
KOWOSB    | 4    | 8.24E−02 | 5.80E−02 | 2.72E−02
LOGHAIRY  | 2    | 4.39E−01 | 6.57E−01 | 8.46E−01
PALMER5C  | 6    | 3.08E−03 | 2.55E−03 | 2.88E−03
ROSENBR   | 2    | 2.11E−01 | 1.16E−01 | 1.24E−01
S308      | 2    | 2.64E−02 | 2.66E−02 | 2.50E−02
SINEVAL   | 2    | 1.21E−01 | 1.60E−01 | 1.50E−01
SISSER    | 2    | 6.66E−03 | 4.67E−02 | 2.33E−02
SNAIL     | 2    | 5.78E−03 | 1.36E−02 | 1.06E−02
VAREIGVL  | 50   | 1.55E−02 | 1.90E−02 | 1.92E−02
YFITU     | 3    | 1.05E+00 | 1.09E+00 | Failed
ZANGWIL2  | 2    | 6.83E−03 | 1.27E−03 | 6.66E−02
Average   | 6.26 | 1.19E−01 | 1.58E−01 | 1.71E−01
[Figure 1  Effective function evaluations performance profiles: p(ω) versus ω for M1, M2 and M3]

[Figure 2  CPU time performance profiles: p(ω) versus ω for M1, M2 and M3]
Here, efe (effective function evaluations) is a criterion for comparing the performance of the algorithms; note that we count n function evaluations as one gradient evaluation to account for the number of components of the gradient. Also, for each numerical result in the form (Nf, Ng, efe), the related execution time (in seconds) is shown in Table 2. In our numerical experiments, M1 and M2 successfully solved all the problems, while M3 failed in solving 'YFITU'. The averages of the numerical results and the execution times (see the last rows of Tables 1 and 2), computed without considering the results of 'YFITU', show that M1 (our modified BFGS algorithm) is more efficient than M2 and M3. Efficiency comparisons were also made using the performance profile proposed by Dolan and Moré [9]. The performance profile gives, for every ω ≥ 1, the proportion p(ω) of the test problems on which each considered algorithmic variant has a performance within a factor ω of the best. As shown in Figures 1 and 2, M1 is more efficient than the other algorithms, demonstrating the effectiveness of the proposed hybrid secant equation. Figure 2 shows that the time spent computing the hybridization parameter θk causes a slight decrease in the performance of M1 from the perspective of CPU time. Numerical comparisons also show that M2 is more efficient than M3.
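The Dolan–Moré profile described above can be computed directly from a table of per-problem costs. The following is an illustrative sketch under our own naming and failure conventions, not the software of [9]:

```python
import numpy as np

def performance_profile(costs, omegas):
    # Dolan-More performance profile: for each solver s, p_s(omega) is
    # the fraction of problems whose performance ratio
    # r_{p,s} = cost_{p,s} / min_s cost_{p,s} is at most omega.
    # A failure can be marked with np.inf (illustrative convention).
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)     # best cost on each problem
    ratios = costs / best                       # performance ratios r_{p,s}
    return np.array([(ratios <= w).mean(axis=0) for w in omegas])

# Three problems, two solvers (the costs could be efe or CPU time):
costs = [[1.0, 2.0],
         [2.0, 2.0],
         [4.0, 1.0]]
print(performance_profile(costs, [1.0, 2.0, 4.0]))
```

Each row of the result corresponds to one value of ω; plotting the columns against ω yields curves like those in Figures 1 and 2, where a higher curve indicates a more efficient solver.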
6  Conclusions

By making a convex combination of the modified secant equations proposed by Yuan [27] and Wei et al. [23], a hybrid secant equation and a modified BFGS algorithm have been proposed. The hybridization parameter has been effectively computed using the available information of recent iterations. Under proper conditions, it has been shown that the proposed algorithm is globally, locally and superlinearly convergent. Using the performance profile introduced by Dolan and Moré, the implementation of the proposed algorithm has been compared with two efficient modified BFGS algorithms proposed by Yuan [27] and Wei et al. [23] on a set of unconstrained optimization test problems from the CUTEr collection. As the numerical comparisons showed, our modified BFGS algorithm outperforms the modified BFGS algorithms proposed in [23, 27], which demonstrates the effectiveness of the proposed hybrid secant equation.

Acknowledgements  The author is grateful to the anonymous referees for their valuable comments and suggestions, which helped to improve the quality of this work. He also thanks the Research Council of Semnan University for its support.
References

1. Amini K, Ghorbani Rizi A. A new structured quasi-Newton algorithm using partial information on Hessian. J Comput Appl Math, 2010, 234: 805–811
2. Babaie-Kafaki S, Ghanbari R, Mahdavi-Amiri N. Two new conjugate gradient methods based on modified secant equations. J Comput Appl Math, 2010, 234: 1374–1386
3. Broyden C G. The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J Inst Math Appl, 1970, 6: 222–231
4. Broyden C G, Dennis Jr J E, Moré J J. On the local and superlinear convergence of quasi-Newton methods. J Inst Math Appl, 1973, 12: 223–245
5. Byrd R H, Nocedal J, Yuan Y X. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J Numer Anal, 1987, 24: 1171–1190
6. Chen L H, Deng N Y, Zhang J Z. A modified quasi-Newton method for structured optimization with partial information on the Hessian. Comput Optim Appl, 2006, 35: 5–18
7. Davidon W C. Conic approximations and collinear scalings for optimizers. SIAM J Numer Anal, 1980, 17: 268–281
8. Dennis Jr J E, Martínez H J, Tapia R A. Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. J Optim Theory Appl, 1989, 61: 161–178
9. Dolan E D, Moré J J. Benchmarking optimization software with performance profiles. Math Program, 2002, 91: 201–213
10. Fletcher R. A new approach to variable metric algorithms. Comput J, 1970, 13: 317–322
11. Ford J A. A survey of multi-step quasi-Newton methods. In: Proceedings of the International Conference on Scientific Computations, Beirut, Lebanon, 1999
12. Ford J A, Moghrabi I A. Multi-step quasi-Newton methods for optimization. J Comput Appl Math, 1994, 50: 305–323
13. Ford J A, Moghrabi I A. Using function-values in multi-step quasi-Newton methods. J Comput Appl Math, 1996, 66: 201–211
14. Goldfarb D. A family of variable-metric methods derived by variational means. Math Comp, 1970, 24: 23–26
15. Gould N I M, Orban D, Toint P L. CUTEr: a constrained and unconstrained testing environment, revisited. ACM Trans Math Softw, 2003, 29: 373–394
16. Guo Q, Liu J G, Wang D H. A modified BFGS method and its superlinear convergence in nonconvex minimization with general line search rule. J Appl Math Comput, 2008, 28: 435–446
17. Huang H Y. Unified approach to quadratically convergent algorithms for function minimization. J Optim Theory Appl, 1970, 5: 405–423
18. Li D H, Fukushima M. A modified BFGS method and its global convergence in nonconvex minimization. J Comput Appl Math, 2001, 129: 15–35
19. Nocedal J, Wright S J. Numerical Optimization, 2nd ed. New York: Springer, 2006
20. Powell M J D. Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In: Cottle R W, Lemke C E, eds. Nonlinear Programming, SIAM-AMS Proceedings, 1976, 53–72
21. Shanno D F. Conditioning of quasi-Newton methods for function minimization. Math Comp, 1970, 24: 647–656
22. Sun W, Yuan Y X. Optimization Theory and Methods: Nonlinear Programming. New York: Springer, 2006
23. Wei Z, Li G, Qi L. New quasi-Newton methods for unconstrained optimization problems. Appl Math Comput, 2006, 175: 1156–1188
24. Wei Z, Yu G, Yuan G, et al. The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput Optim Appl, 2004, 29: 315–332
25. Yabe H, Ogasawara H, Yoshino M. Local and superlinear convergence of quasi-Newton methods based on modified secant conditions. J Comput Appl Math, 2007, 205: 617–632
26. Yabe H, Takano M. Global convergence properties of nonlinear conjugate gradient methods with modified secant condition. Comput Optim Appl, 2004, 28: 203–225
27. Yuan Y X. A modified BFGS algorithm for unconstrained optimization. IMA J Numer Anal, 1991, 11: 325–332
28. Yuan Y X, Byrd R H. Non-quasi-Newton updates for unconstrained optimization. J Comput Math, 1995, 13: 95–107
29. Zhang J, Xu C. Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations. J Comput Appl Math, 2001, 137: 269–278
30. Zhang J Z, Deng N Y, Chen L H. New quasi-Newton equation and related methods for unconstrained optimization. J Optim Theory Appl, 1999, 102: 147–167