Comp. Appl. Math. DOI 10.1007/s40314-015-0221-8

On the multi-point Levenberg–Marquardt method for singular nonlinear equations

Xin Zhao · Jinyan Fan

Received: 1 July 2014 / Revised: 5 January 2015 / Accepted: 4 March 2015
© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2015

Abstract In this paper, we present a multi-point iterative Levenberg–Marquardt algorithm for singular nonlinear equations. The algorithm converges globally, and its convergence order is studied under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution.

Keywords Singular nonlinear equations · Levenberg–Marquardt method · Chebyshev's method · Local error bound

Mathematics Subject Classification 90C30 · 65K05
1 Introduction

We consider the system of nonlinear equations
$$F(x) = 0, \tag{1}$$
where $F(x): \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable and $F'(x)$ is Lipschitz continuous. Throughout the paper, we assume that the solution set of (1) is nonempty and denote it by $X^*$. In all cases, $\|\cdot\|$ refers to the 2-norm.

There are many methods for nonlinear equations (Kelley 1995, 2003; Levenberg 1944; Marquardt 1963; Moré 1978; Yuan 1998, 2011). Newton's method is very efficient and popular. At every iteration, it computes
$$d_k = -J_k^{-1}F_k, \qquad x_{k+1} = x_k + d_k, \tag{2}$$
Communicated by Ernesto G. Birgin.
Jinyan Fan
[email protected]

Department of Mathematics and MOE-LSC, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China
where $F_k = F(x_k)$, and $J_k = F'(x_k)$ is the Jacobian. If $J(x)$ is Lipschitz continuous and nonsingular at the solution, Newton's method has quadratic convergence.

There are also many other one-point iterative methods for (1). For example, Chebyshev's method computes
$$d_k = -J_k^{-1}F_k, \qquad \hat d_k = -\frac{1}{2}J_k^{-1}F_k''d_k^2, \qquad x_{k+1} = x_k + d_k + \hat d_k \tag{3}$$
at every iteration (Argyros and Chen 1993). To avoid the computation of $F_k''d_k^2$, Ezquerro and Hernández approximate it by a combination of the function values at different points (Ezquerro and Hernández 2009). Let $y_k = x_k + pd_k$, where $p \in (0, 1]$ is a constant. Then,
$$F_k''d_k^2 \approx \frac{2}{p^2}\big[(p-1)F_k + F(y_k)\big],$$
which, together with (3), leads to the modified Chebyshev's method:
$$d_k = -J_k^{-1}F_k, \qquad y_k = x_k + pd_k, \qquad \hat d_k = -\frac{1}{p^2}J_k^{-1}\big[(p-1)F_k + F(y_k)\big], \qquad x_{k+1} = x_k + d_k + \hat d_k. \tag{4}$$
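For concreteness, one iteration of the modified Chebyshev's method (4) can be sketched as follows. This is a minimal NumPy sketch under our own naming, not code from the paper, and it assumes $J_k$ is nonsingular:

```python
import numpy as np

def modified_chebyshev_step(F, J, x, p=1.0):
    """One iteration of the modified Chebyshev's method (4).

    F, J : callables returning F(x) and the Jacobian J(x); J(x) must be nonsingular.
    p    : the parameter p in (0, 1].
    """
    Fk, Jk = F(x), J(x)
    d = np.linalg.solve(Jk, -Fk)            # Newton step d_k
    y = x + p * d                           # auxiliary point y_k
    d_hat = np.linalg.solve(Jk, -((p - 1.0) * Fk + F(y)) / p**2)
    return x + d + d_hat                    # x_{k+1} = x_k + d_k + \hat d_k
```

Both linear systems share the matrix $J_k$, so in practice one LU factorization serves both solves; the extra cost per iteration over Newton's method is essentially one more function evaluation.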
Note that, when the Jacobian is singular or nearly singular, $d_k$ and $\hat d_k$ given in (4) are not defined or may not be well defined. To overcome the difficulties caused by the possible singularity of $J_k$, the Levenberg–Marquardt (LM) method for (1) computes the step
$$d_k^{LM} = -(J_k^TJ_k + \lambda_kI)^{-1}J_k^TF_k \tag{5}$$
at every iteration, where the LM parameter $\lambda_k$ is nonnegative and prevents the step from being too large when $J_k^TJ_k$ is nearly singular. There have been extensive studies on the LM method. For instance, Yamashita and Fukushima choose $\lambda_k = \|F_k\|^2$ and prove that the LM method has quadratic convergence under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution (Yamashita and Fukushima 2001). Fan and Yuan choose $\lambda_k = \|F_k\|^\delta$ with $\delta \in [1, 2]$ and show that the LM method preserves the quadratic convergence (Fan and Yuan 2005). The choice $\lambda_k = \|F_k\|$ performs best in the numerical experiments.

Inspired by the modified Chebyshev's method and the Levenberg–Marquardt method, we construct a multi-point LM method in this paper. At every iteration, we first solve the linear equations
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^TF_k \quad\text{with}\quad \lambda_k = \mu_k\|F_k\| \tag{6}$$
to obtain the step $d_k$, where $\mu_k > 0$ is updated from iteration to iteration. Let $y_k = x_k + pd_k$, where $p \in (0, 1]$. Then, we solve
$$(J_k^TJ_k + \lambda_kI)\,d = -\frac{1}{p^2}J_k^T\big[(p-1)F_k + F(y_k)\big] \tag{7}$$
to obtain $\hat d_k$, and set
$$x_{k+1} = x_k + d_k + p\hat d_k. \tag{8}$$

The above multi-point LM method includes the modified LM method presented in Fan (2012) as a special case when $p = 1$. In this paper, we study its convergence properties under the local error bound condition.
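The two solves (6)–(7) and the update (8) for a single iteration can be sketched as follows (a minimal NumPy sketch with our own names for the callables, not code from the paper):

```python
import numpy as np

def multipoint_lm_step(F, J, x, mu, p):
    """One iteration of the multi-point LM step (6)-(8).

    F, J : callables returning F(x) and the Jacobian J(x).
    mu   : the current scaling parameter mu_k > 0.
    p    : the multi-point parameter, 0 < p <= 1.
    """
    Fk, Jk = F(x), J(x)
    lam = mu * np.linalg.norm(Fk)            # lambda_k = mu_k * ||F_k||, cf. (6)
    A = Jk.T @ Jk + lam * np.eye(x.size)     # same coefficient matrix for both solves
    d = np.linalg.solve(A, -Jk.T @ Fk)       # first step d_k from (6)
    y = x + p * d                            # auxiliary point y_k
    q = ((p - 1.0) * Fk + F(y)) / p**2       # right-hand-side vector of (7)
    d_hat = np.linalg.solve(A, -Jk.T @ q)    # second step \hat d_k from (7)
    return x + d + p * d_hat                 # x_{k+1} from (8)
```

Both systems share the coefficient matrix $J_k^TJ_k + \lambda_kI$, so a single Cholesky factorization can serve both solves; with $p = 1$ this reduces to the modified LM step of Fan (2012).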
The paper is organized as follows: In Sect. 2, we present a globally convergent multi-point LM algorithm using the trust region technique. In Sect. 3, we investigate the properties of dk and dˆk , and derive the convergence order of the algorithm under the local error bound condition. Finally, some numerical results are presented in Sect. 4.
2 The multi-point LM algorithm and global convergence

In this section, we present a multi-point LM algorithm. We use the trust region technique to guarantee the global convergence of the algorithm.

We take
$$\Phi(x) = \|F(x)\|^2 \tag{9}$$
as the merit function for (1). Define the actual reduction of $\Phi(x)$ at the $k$-th iteration as
$$Ared_k = \|F_k\|^2 - \|F(x_k + d_k + p\hat d_k)\|^2.$$
The predicted reduction needs to be nonnegative. Note that the step $d_k$ in (6) is the minimizer of the convex minimization problem
$$\min_{d\in\mathbb{R}^n} \|F_k + J_kd\|^2 + \lambda_k\|d\|^2 =: \varphi_{k,1}(d). \tag{10}$$
If we let
$$\Delta_{k,1} = \|d_k\| = \big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^TF_k\big\|, \tag{11}$$
it can be verified that $d_k$ is also a solution of the trust region subproblem
$$\min_{d\in\mathbb{R}^n} \|F_k + J_kd\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_{k,1}. \tag{12}$$
By the result given in Powell (1975), we know that
$$\|F_k\|^2 - \|F_k + J_kd_k\|^2 \ge \|J_k^TF_k\|\min\left\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\right\}. \tag{13}$$
Similarly, the step $\hat d_k$ in (7) is the minimizer of the problem
$$\min_{d\in\mathbb{R}^n} \|q_k + J_kd\|^2 + \lambda_k\|d\|^2 =: \varphi_{k,2}(d), \tag{14}$$
where
$$q_k = \frac{1}{p^2}\big[(p-1)F_k + F(y_k)\big].$$
Obviously, $\hat d_k$ is also a solution of the trust region subproblem
$$\min_{d\in\mathbb{R}^n} \|q_k + J_kd\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_{k,2} = \|\hat d_k\|. \tag{15}$$
So, we have
$$\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2 \ge \|J_k^Tq_k\|\min\left\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\right\}. \tag{16}$$
Based on (13) and (16), we define the new predicted reduction as
$$Pred_k = \|F_k\|^2 - \|F_k + J_kd_k\|^2 + p^2\big[\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2\big], \tag{17}$$
which satisfies
$$Pred_k \ge \|J_k^TF_k\|\min\left\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\right\} + p^2\|J_k^Tq_k\|\min\left\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\right\}. \tag{18}$$
The ratio of the actual reduction to the predicted reduction,
$$r_k = \frac{Ared_k}{Pred_k},$$
is used in deciding whether to accept the trial step and how to adjust the parameter $\mu_k$. The multi-point LM algorithm is presented as follows.

Algorithm 2.1

Step 1. Given $x_1 \in \mathbb{R}^n$, $\mu_1 > m > 0$, $0 < p_0 \le p_1 < p_2 < 1$, $0 < p \le 1$, $k := 1$.

Step 2. If $\|J_k^TF_k\| = 0$, then stop. Solve
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^TF_k \quad\text{with}\quad \lambda_k = \mu_k\|F_k\| \tag{19}$$
to obtain $d_k$, and set $y_k = x_k + pd_k$. Compute
$$q_k = \frac{1}{p^2}\big[(p-1)F_k + F(y_k)\big].$$
Solve
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^Tq_k \tag{20}$$
to obtain $\hat d_k$.

Step 3. Compute $r_k = Ared_k/Pred_k$. Set
$$x_{k+1} = \begin{cases} x_k + d_k + p\hat d_k, & \text{if } r_k \ge p_0,\\ x_k, & \text{otherwise}. \end{cases} \tag{21}$$

Step 4. Choose $\mu_{k+1}$ as
$$\mu_{k+1} = \begin{cases} 4\mu_k, & \text{if } r_k < p_1,\\ \mu_k, & \text{if } r_k \in [p_1, p_2],\\ \max\{\mu_k/4,\ m\}, & \text{if } r_k > p_2. \end{cases} \tag{22}$$
Set $k := k+1$ and go to Step 2.

Compared with the modified Chebyshev's method, Algorithm 2.1 can deal with ill-conditioned and singular nonlinear equations.

To study the global convergence of Algorithm 2.1, we make the following assumption.

Assumption 2.1 $F(x)$ is continuously differentiable, and both $F(x)$ and its Jacobian $J(x)$ are Lipschitz continuous, i.e., there exist positive constants $L_1$ and $L_2$ such that
$$\|J(y) - J(x)\| \le L_1\|y - x\|, \quad \forall x, y \in \mathbb{R}^n, \tag{23}$$
and
$$\|F(y) - F(x)\| \le L_2\|y - x\|, \quad \forall x, y \in \mathbb{R}^n. \tag{24}$$
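The whole of Algorithm 2.1 can be sketched in a few lines. The code below is our own runnable sketch (names and default tolerances are ours; the parameter defaults follow the values used in Sect. 4):

```python
import numpy as np

def multipoint_lm(F, J, x, p=1.0, mu=1e-5, m=1e-8,
                  p0=1e-4, p1=0.25, p2=0.75, tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 2.1: the multi-point LM method with trust-region globalization."""
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= tol:          # stopping test of Step 2
            break
        lam = mu * np.linalg.norm(Fk)         # lambda_k = mu_k ||F_k||, cf. (19)
        A = Jk.T @ Jk + lam * np.eye(x.size)
        d = np.linalg.solve(A, -g)            # d_k from (19)
        y = x + p * d
        q = ((p - 1.0) * Fk + F(y)) / p**2
        d_hat = np.linalg.solve(A, -Jk.T @ q) # \hat d_k from (20)
        trial = x + d + p * d_hat             # trial point of (21)
        # actual vs. predicted reduction, cf. (17)
        ared = Fk @ Fk - F(trial) @ F(trial)
        pred = (Fk @ Fk - np.sum((Fk + Jk @ d) ** 2)
                + p**2 * (q @ q - np.sum((q + Jk @ d_hat) ** 2)))
        r = ared / pred if pred > 0 else -1.0
        if r >= p0:                           # accept the trial step, (21)
            x = trial
        if r < p1:                            # update mu_{k+1} by (22)
            mu *= 4.0
        elif r > p2:
            mu = max(mu / 4.0, m)
    return x
```

Note that the step is only accepted when the actual reduction of the merit function is at least a $p_0$-fraction of the predicted one; otherwise $\mu_k$ is enlarged and the step recomputed from the same point, exactly as in Steps 3–4.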
By (23), we have
$$\|F(y) - F(x) - J(x)(y - x)\| \le L_1\|y - x\|^2, \quad \forall x, y \in \mathbb{R}^n. \tag{25}$$

Next, we show that the sequence generated by Algorithm 2.1 converges to a stationary point of the merit function.

Theorem 2.1 Under the conditions of Assumption 2.1, Algorithm 2.1 either terminates in finitely many iterations or satisfies
$$\lim_{k\to\infty} \|J_k^TF_k\| = 0. \tag{26}$$
Proof We prove by contradiction. Suppose the theorem is not true; then there exist a positive $\tau$ and infinitely many $k$ such that
$$\|J_k^TF_k\| \ge \tau. \tag{27}$$
Let $T_1, T_2$ be the index sets
$$T_1 = \{k : \|J_k^TF_k\| \ge \tau\}, \qquad T_2 = \{k : \|J_k^TF_k\| \ge \tau/2 \ \text{and}\ x_{k+1} \ne x_k\}.$$
Then, $T_1$ is an infinite set. In the following, we derive contradictions to (27) whether $T_2$ is finite or infinite.

Case (I): $T_2$ is finite. By the definition of $T_2$, the set
$$T_3 = \{k : \|J_k^TF_k\| \ge \tau \ \text{and}\ x_{k+1} \ne x_k\}$$
is also finite. Let $\tilde k$ be the largest index of $T_3$. Then, $x_{k+1} = x_k$ holds for all $k \in \{k > \tilde k : k \in T_1\}$. Define the index set
$$T_4 = \{k > \tilde k : \|J_k^TF_k\| \ge \tau \ \text{and}\ x_{k+1} = x_k\}.$$
Suppose $k \in T_4$. It is easy to see that $\|J_{k+1}^TF_{k+1}\| \ge \tau$. Moreover, we have $x_{k+2} = x_{k+1}$; otherwise, if $x_{k+2} \ne x_{k+1}$, then $k+1 \in T_3$, which contradicts the fact that $\tilde k$ is the largest index of $T_3$. Hence, we have $k+1 \in T_4$. By induction, we know that $\|J_k^TF_k\| \ge \tau$ and $x_{k+1} = x_k$ hold for all $k > \tilde k$.

By Step 3, we know that $r_k < p_0$ for all $k > \tilde k$, which implies that
$$\mu_k \to \infty \quad\text{and}\quad \lambda_k \to \infty, \tag{28}$$
due to (19) and (22). Hence, we have
$$\|d_k\| \to 0. \tag{29}$$
Moreover, it follows from (15), (25) and (28) that there exists a positive constant $\bar c$ such that
$$\begin{aligned} \|\hat d_k\| &= \frac{1}{p^2}\big\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T[(p-1)F_k + F(y_k)]\big\|\\ &\le \frac{1-p}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \frac{1}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF(y_k)\|\\ &\le \frac{1-p}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \frac{1}{p^2}\Big[\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\|\\ &\qquad + p\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TJ_kd_k\| + p^2L_1\|d_k\|^2\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\|\Big]\\ &\le \frac{2}{p^2}\|d_k\| + \frac{L_1L_2}{\lambda_k}\|d_k\|^2 \le \bar c\,\|d_k\| \end{aligned} \tag{30}$$
holds for all sufficiently large $k$. Let $s_k = d_k + p\hat d_k$. We have
$$\|s_k\| = \|d_k + p\hat d_k\| \le (1 + p\bar c)\|d_k\|. \tag{31}$$
Furthermore, it follows from (18), (24), (27), (30) and (31) that
$$\begin{aligned} |r_k - 1| &= \frac{|Ared_k - Pred_k|}{Pred_k}\\ &\le \frac{\big|\|F(x_k + d_k + p\hat d_k)\|^2 - \|F_k + J_kd_k\|^2 + \|pq_k\|^2 - \|pq_k + pJ_k\hat d_k\|^2\big|}{\|J_k^TF_k\|\min\big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\big\} + p^2\|J_k^Tq_k\|\min\big\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\big\}}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{\|J_k^TF_k\|\min\big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\big\}}\\ &\le \frac{\|F_k\|\,O(\|d_k\|^2) + \|J_k\hat d_k\|\,O(\|d_k\|^2)}{\|d_k\|} \to 0, \end{aligned} \tag{32}$$
which implies that $r_k \to 1$. In view of the updating rule of $\mu_k$, there then exists a positive constant $\tilde m > m$ such that $\mu_k < \tilde m$ holds for all sufficiently large $k$, which contradicts (28). Hence, (27) cannot be true while $T_2$ is finite.

Case (II): $T_2$ is infinite. It follows from (18) and (24) that
$$\begin{aligned} \|F_1\|^2 &\ge \sum_{k\in T_2}\big(\|F_k\|^2 - \|F_{k+1}\|^2\big) \ge \sum_{k\in T_2} p_0\,Pred_k\\ &\ge \sum_{k\in T_2} p_0\Big[\|J_k^TF_k\|\min\Big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\Big\} + p^2\|J_k^Tq_k\|\min\Big\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\Big\}\Big]\\ &\ge \sum_{k\in T_2} \frac{p_0\tau}{2}\min\Big\{\|d_k\|,\ \frac{\tau}{2L_2^2}\Big\}, \end{aligned} \tag{33}$$
which implies
$$\lim_{k\to\infty,\,k\in T_2} \|d_k\| = 0. \tag{34}$$
Then, the definition of $d_k$ gives
$$\lambda_k \to +\infty, \quad k \in T_2. \tag{35}$$
Similarly to (30), there exists a positive $\tilde c$ such that
$$\|\hat d_k\| \le \tilde c\,\|d_k\| \tag{36}$$
holds for all sufficiently large $k \in T_2$. By (36), we have
$$\|s_k\| = \|d_k + p\hat d_k\| \le (1 + p\tilde c)\|d_k\|. \tag{37}$$
Hence, we obtain from (33) that
$$\sum_{k\in T_2}\|s_k\| = \sum_{k\in T_2}\|d_k + p\hat d_k\| \le \sum_{k\in T_2}(1 + p\tilde c)\|d_k\| < +\infty. \tag{38}$$
Furthermore, it follows from (23) and (24) that
$$\sum_{k\in T_2}\big\|J_{k+1}^TF_{k+1} - J_k^TF_k\big\| < +\infty.$$
Since (27) holds for infinitely many $k$, there exists a large $\hat k$ such that $\|J_{\hat k}^TF_{\hat k}\| \ge \tau$ and
$$\sum_{k\in T_2,\,k\ge\hat k}\big\|J_{k+1}^TF_{k+1} - J_k^TF_k\big\| < \frac{\tau}{2}.$$
By induction, we see that $\|J_k^TF_k\| \ge \tau/2$ holds for all $k \ge \hat k$. Then, we deduce from (33)–(36) that $\lim_{k\to\infty} x_k$ exists and
$$\|d_k\| \to 0, \qquad \|\hat d_k\| \to 0. \tag{39}$$
So we have
$$\mu_k \to +\infty. \tag{40}$$
By the same arguments as in (32), we know that $r_k \to 1$. Thus, there exists a positive constant $\hat m > m$ such that $\mu_k < \hat m$ holds for all sufficiently large $k$, which contradicts (40). Therefore, the supposition (27) cannot be true when $T_2$ is infinite. The proof is completed. □
3 Convergence rate of Algorithm 2.1

In this section, we first investigate the properties of the steps $d_k$ and $\hat d_k$. Then, we show that the parameter $\mu_k$ is bounded. Finally, we derive the convergence rate of Algorithm 2.1 under the local error bound condition. We make the following assumption.

Assumption 3.1 (a) $F(x)$ is continuously differentiable, and $\|F(x)\|$ provides a local error bound on some neighborhood of $x^* \in X^*$, i.e., there exist constants $c_1 > 0$ and $0 < b_1 < 1$ such that
$$\|F(x)\| \ge c_1\,\mathrm{dist}(x, X^*), \quad \forall x \in N(x^*, b_1) = \{x : \|x - x^*\| \le b_1\}. \tag{41}$$
(b) The Jacobian $J(x)$ is Lipschitz continuous on $N(x^*, b_1)$, i.e., there exists a positive constant $L_1$ such that
$$\|J(y) - J(x)\| \le L_1\|y - x\|, \quad \forall x, y \in N(x^*, b_1). \tag{42}$$

If the Jacobian at a solution is nonsingular, then the solution is isolated, so $\|F(x)\|$ provides a local error bound on some neighborhood of it. However, the converse is not necessarily true; we refer to Yamashita and Fukushima (2001) for examples. Thus, the local error bound condition is weaker than the nonsingularity of the Jacobian.
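As a tiny concrete illustration (our own example, not one from Yamashita and Fukushima 2001): for $F(x) = (x_1, 0)^T$ the solution set $X^* = \{x : x_1 = 0\}$ consists of non-isolated solutions, the Jacobian has rank 1 everywhere, and yet $\|F(x)\| = |x_1| = \mathrm{dist}(x, X^*)$, so (41) holds with $c_1 = 1$:

```python
import numpy as np

# F(x) = (x1, 0): X* = {x : x1 = 0} is a whole line of solutions,
# J(x) = [[1, 0], [0, 0]] is singular everywhere, yet ||F(x)|| = |x1| = dist(x, X*),
# i.e. the local error bound (41) holds with c1 = 1 despite the singular Jacobian.
F = lambda x: np.array([x[0], 0.0])
J = lambda x: np.array([[1.0, 0.0], [0.0, 0.0]])

x = np.array([0.3, 2.0])
dist_to_solution_set = abs(x[0])        # nearest point of X* is (0, x[1])
assert np.isclose(np.linalg.norm(F(x)), dist_to_solution_set)
assert np.linalg.matrix_rank(J(x)) == 1  # rank-deficient at (and near) X*
```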
By (42), we have
$$\|F(y) - F(x) - J(x)(y - x)\| \le L_1\|y - x\|^2, \quad \forall x, y \in N(x^*, b_1). \tag{43}$$
Moreover, there exists a positive constant $L_2$ such that
$$\|F(y) - F(x)\| \le L_2\|y - x\|, \quad \forall x, y \in N(x^*, b_1). \tag{44}$$
Throughout the paper, we denote by $\bar x_k$ a vector in $X^*$ that satisfies
$$\|\bar x_k - x_k\| = \mathrm{dist}(x_k, X^*). \tag{45}$$
3.1 Properties of $d_k$ and $\hat d_k$

In the following, we investigate the relationships among $\|d_k\|$, $\|\hat d_k\|$ and $\mathrm{dist}(x_k, X^*)$. Suppose the singular value decomposition (SVD) of $\bar J_k = J(\bar x_k)$ is
$$\bar J_k = \bar U_k\bar\Sigma_k\bar V_k^T = (\bar U_{k,1}, \bar U_{k,2})\begin{pmatrix}\bar\Sigma_{k,1} & \\ & 0\end{pmatrix}\begin{pmatrix}\bar V_{k,1}^T\\ \bar V_{k,2}^T\end{pmatrix} = \bar U_{k,1}\bar\Sigma_{k,1}\bar V_{k,1}^T, \tag{46}$$
where $\bar\Sigma_{k,1} = \mathrm{diag}(\bar\sigma_{k,1}, \ldots, \bar\sigma_{k,r})$ with $\bar\sigma_{k,1} \ge \bar\sigma_{k,2} \ge \cdots \ge \bar\sigma_{k,r} > 0$. The corresponding SVD of $J_k$ is
$$J_k = U_k\Sigma_kV_k^T = (U_{k,1}, U_{k,2}, U_{k,3})\begin{pmatrix}\Sigma_{k,1} & & \\ & \Sigma_{k,2} & \\ & & 0\end{pmatrix}\begin{pmatrix}V_{k,1}^T\\ V_{k,2}^T\\ V_{k,3}^T\end{pmatrix} = U_{k,1}\Sigma_{k,1}V_{k,1}^T + U_{k,2}\Sigma_{k,2}V_{k,2}^T, \tag{47}$$
where $\Sigma_{k,1} = \mathrm{diag}(\sigma_{k,1}, \ldots, \sigma_{k,r})$ with $\sigma_{k,1} \ge \sigma_{k,2} \ge \cdots \ge \sigma_{k,r} > 0$, and $\Sigma_{k,2} = \mathrm{diag}(\sigma_{k,r+1}, \ldots, \sigma_{k,r+q})$ with $\sigma_{k,r} \ge \sigma_{k,r+1} \ge \cdots \ge \sigma_{k,r+q} > 0$. For clarity, when the context is clear, we drop the subscript $k$ in $\Sigma_{k,i}$, $U_{k,i}$, $V_{k,i}$ ($i = 1, 2, 3$), and write $J_k$ as
$$J_k = U_1\Sigma_1V_1^T + U_2\Sigma_2V_2^T. \tag{48}$$
Lemma 3.1 Under the conditions of Assumption 3.1, if $x_k, y_k \in N(x^*, b_1/2)$, then there exists a constant $c_2 > 0$ such that
$$\|s_k\| \le c_2\,\mathrm{dist}(x_k, X^*) \tag{49}$$
holds for all sufficiently large $k$.

Proof Since $x_k \in N(x^*, b_1/2)$, we have
$$\|\bar x_k - x^*\| \le \|\bar x_k - x_k\| + \|x_k - x^*\| \le 2\|x_k - x^*\| \le b_1, \tag{50}$$
which implies that $\bar x_k \in N(x^*, b_1)$. By (41) and (22), we have
$$\lambda_k = \mu_k\|F_k\| \ge mc_1\|\bar x_k - x_k\|. \tag{51}$$
Since $d_k$ is a minimizer of $\varphi_{k,1}(d)$, by (10), (43) and (51), we have
$$\begin{aligned} \|d_k\|^2 &\le \frac{\varphi_{k,1}(d_k)}{\lambda_k} \le \frac{\varphi_{k,1}(\bar x_k - x_k)}{\lambda_k} = \frac{\|F_k + J_k(\bar x_k - x_k)\|^2 + \lambda_k\|\bar x_k - x_k\|^2}{\lambda_k}\\ &\le \frac{1}{\lambda_k}L_1^2\|\bar x_k - x_k\|^4 + \|\bar x_k - x_k\|^2 \le \|\bar x_k - x_k\|^2 + c_1^{-1}m^{-1}L_1^2\|\bar x_k - x_k\|^3, \end{aligned} \tag{52}$$
so
$$\|d_k\| \le \tilde c_2\|\bar x_k - x_k\| \tag{53}$$
for sufficiently large $k$, where $\tilde c_2$ is a positive constant. It follows from (43) that
$$\begin{aligned} \|\hat d_k\| &= \frac{1}{p^2}\big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^T[(p-1)F_k + F(y_k)]\big\|\\ &\le \frac{1}{p^2}\Big[(1-p)\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\|\\ &\qquad + p\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TJ_kd_k\| + L_1p^2\|d_k\|^2\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\|\Big]\\ &\le \frac{2}{p^2}\|d_k\| + L_1\|d_k\|^2\big\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\big\|. \end{aligned} \tag{54}$$
By (47), we have
$$\begin{aligned} \big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^T\big\| &= \left\|(V_1, V_2, V_3)\begin{pmatrix}(\Sigma_1^2+\lambda_kI)^{-1}\Sigma_1 & & \\ & (\Sigma_2^2+\lambda_kI)^{-1}\Sigma_2 & \\ & & 0\end{pmatrix}\begin{pmatrix}U_1^T\\ U_2^T\\ U_3^T\end{pmatrix}\right\|\\ &\le \|\Sigma_1^{-1}\| + \lambda_k^{-1}\|\Sigma_2\|. \end{aligned} \tag{55}$$
By the theory of matrix perturbation (Stewart and Sun 1990) and the Lipschitzness of $J_k$, we have
$$\big\|\mathrm{diag}(\Sigma_1 - \bar\Sigma_1,\ \Sigma_2,\ 0)\big\| \le \|J_k - \bar J_k\| \le L_1\|\bar x_k - x_k\|,$$
which yields
$$\|\Sigma_1 - \bar\Sigma_1\| \le L_1\|\bar x_k - x_k\| \quad\text{and}\quad \|\Sigma_2\| \le L_1\|\bar x_k - x_k\|. \tag{56}$$
Since $\{x_k\}$ converges to the solution set $X^*$, we assume that $L_1\|\bar x_k - x_k\| \le \bar\sigma_r/2$ holds for all sufficiently large $k$. Then, we have
$$\|\Sigma_1^{-1}\| \le \frac{1}{\bar\sigma_r - L_1\|\bar x_k - x_k\|} \le \frac{2}{\bar\sigma_r}. \tag{57}$$
Moreover, by (51),
$$\lambda_k^{-1}\|\Sigma_2\| \le \frac{L_1\|\bar x_k - x_k\|}{mc_1\|\bar x_k - x_k\|} = \frac{L_1}{mc_1}. \tag{58}$$
The above inequalities, together with (54), imply that there exists a $\bar c_2 > 0$ such that
$$\|\hat d_k\| \le \bar c_2\|\bar x_k - x_k\|. \tag{59}$$
By (53) and (59), we have
$$\|s_k\| \le c_2\|\bar x_k - x_k\| \tag{60}$$
for some $c_2 > 0$. The proof is completed. □
3.2 Boundedness of the LM parameter

The updating rule of $\{\mu_k\}$ indicates that $\{\mu_k\}$ is bounded below by $m$. Next, we show that $\mu_k$ is bounded above.

Lemma 3.2 Under the conditions of Assumption 3.1, if $x_k, y_k, x_k + d_k \in N(x^*, b_1/2)$, then there exists a positive constant $M > m$ such that
$$\mu_k \le M \tag{61}$$
holds for all sufficiently large $k$.

Proof First, we prove that for all sufficiently large $k$,
$$Pred_k \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}, \tag{62}$$
where $c_3$ is a positive constant. We consider two cases. If $\|\bar x_k - x_k\| \le \|d_k\|$, by (41), (43) and the fact that $d_k$ is a solution of (12), we have
$$\|F_k\| - \|F_k + J_kd_k\| \ge \|F_k\| - \|F_k + J_k(\bar x_k - x_k)\| \ge c_1\|\bar x_k - x_k\| - L_1\|\bar x_k - x_k\|^2 \ge c_3\|\bar x_k - x_k\| \tag{63}$$
for some $c_3 > 0$. In the other case that $\|\bar x_k - x_k\| \ge \|d_k\|$, we have
$$\begin{aligned} \|F_k\| - \|F_k + J_kd_k\| &\ge \|F_k\| - \Big\|F_k + \frac{\|d_k\|}{\|\bar x_k - x_k\|}J_k(\bar x_k - x_k)\Big\|\\ &\ge \frac{\|d_k\|}{\|\bar x_k - x_k\|}\big(\|F_k\| - \|F_k + J_k(\bar x_k - x_k)\|\big)\\ &\ge \frac{\|d_k\|}{\|\bar x_k - x_k\|}\cdot c_3\|\bar x_k - x_k\| = c_3\|d_k\|. \end{aligned} \tag{64}$$
Both (63) and (64) indicate that
$$\|F_k\|^2 - \|F_k + J_kd_k\|^2 = \big(\|F_k\| + \|F_k + J_kd_k\|\big)\big(\|F_k\| - \|F_k + J_kd_k\|\big) \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}.$$
Since $\hat d_k$ is a solution of (15), we have $\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2 \ge 0$. So, we have
$$Pred_k = \|F_k\|^2 - \|F_k + J_kd_k\|^2 + p^2\big[\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2\big] \ge \|F_k\|^2 - \|F_k + J_kd_k\|^2 \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\},$$
which gives (62). It then follows from (43), (53) and (62) that
$$\begin{aligned} |r_k - 1| &= \frac{|Ared_k - Pred_k|}{Pred_k} = \frac{\big|\|F(x_k + d_k + p\hat d_k)\|^2 - \|pq_k + pJ_k\hat d_k\|^2 + \|pq_k\|^2 - \|F_k + J_kd_k\|^2\big|}{Pred_k}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{\|\bar x_k - x_k\|\,\|d_k\|}. \end{aligned} \tag{65}$$
In view of (44), (53), (54) and (59), we see that
$$\|F_k + J_kd_k\| \le \|F_k\| \le O(\|\bar x_k - x_k\|), \tag{66}$$
$$\|F_k + J_ks_k\| \le \|F_k + J_kd_k\| + p\|J_k\hat d_k\| \le \|F_k\| + pL_2\bar c_2\|\bar x_k - x_k\| \le O(\|\bar x_k - x_k\|), \tag{67}$$
and
$$\|s_k\|^2 = \|d_k + p\hat d_k\|^2 \le \|d_k\|^2 + 2p\|d_k\|\,\|\hat d_k\| + p^2\|\hat d_k\|^2 \le \Big(\frac{2}{p} + 1\Big)^2\|d_k\|^2 + O(\|d_k\|^3). \tag{68}$$
So, we have $r_k \to 1$. Hence, there exists a positive constant $M > m$ such that $\mu_k \le M$ holds for all sufficiently large $k$. The proof is completed. □

Lemma 3.2, together with (41) and (44), indicates that the LM parameter satisfies
$$mc_1\|\bar x_k - x_k\| \le \lambda_k = \mu_k\|F_k\| \le L_2M\|\bar x_k - x_k\|. \tag{69}$$
3.3 Convergence rate of Algorithm 2.1

By the SVD of $J_k$, we compute
$$d_k = -V_1(\Sigma_1^2 + \lambda_kI)^{-1}\Sigma_1U_1^TF_k - V_2(\Sigma_2^2 + \lambda_kI)^{-1}\Sigma_2U_2^TF_k, \tag{70}$$
and
$$F_k + pJ_kd_k = U_1\big[(1-p)\Sigma_1^2 + \lambda_kI\big](\Sigma_1^2 + \lambda_kI)^{-1}U_1^TF_k + U_2\big[(1-p)\Sigma_2^2 + \lambda_kI\big](\Sigma_2^2 + \lambda_kI)^{-1}U_2^TF_k + U_3U_3^TF_k. \tag{71}$$
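Formula (70) is easy to check numerically: the step obtained by solving the normal equations coincides with the SVD expression. A small NumPy check on random data (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 5, 4, 0.3
Jk = rng.standard_normal((m, n))
Fk = rng.standard_normal(m)

# d_k by solving the regularized normal equations (J^T J + lam I) d = -J^T F
d_solve = np.linalg.solve(Jk.T @ Jk + lam * np.eye(n), -Jk.T @ Fk)

# d_k by the SVD expression (70): d_k = -V (Sigma^2 + lam I)^{-1} Sigma U^T F
U, S, Vt = np.linalg.svd(Jk, full_matrices=False)
d_svd = -Vt.T @ (S / (S**2 + lam) * (U.T @ Fk))

assert np.allclose(d_solve, d_svd)
```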
The following two lemmas give estimates of $\|U_1U_1^TF_k\|$, $\|U_2U_2^TF_k\|$, $\|U_3U_3^TF_k\|$ as well as $\|U_1U_1^TF(y_k)\|$, $\|U_2U_2^TF(y_k)\|$ and $\|U_3U_3^TF(y_k)\|$.

Lemma 3.3 Under the conditions of Assumption 3.1, if $x_k \in N(x^*, b_1/2)$, then we have
$$\|U_1U_1^TF_k\| \le L_2\|\bar x_k - x_k\|, \tag{72}$$
$$\|U_2U_2^TF_k\| \le 2L_1\|\bar x_k - x_k\|^2, \tag{73}$$
$$\|U_3U_3^TF_k\| \le L_1\|\bar x_k - x_k\|^2, \tag{74}$$
where $L_1, L_2$ are given in (42) and (44), respectively.

Proof The result (72) follows directly from (44). Let $\bar s_k = -J_k^+F_k$, where $J_k^+$ is the pseudo-inverse of $J_k$. It is obvious that $\bar s_k$ is a least squares solution of $\min_{s\in\mathbb{R}^n}\|F_k + J_ks\|$. So, by (43), we have
$$\|U_3U_3^TF_k\| = \|F_k + J_k\bar s_k\| \le \|F_k + J_k(\bar x_k - x_k)\| \le L_1\|\bar x_k - x_k\|^2.$$
Let $\tilde J_k = U_1\Sigma_1V_1^T$ and $\tilde s_k = -\tilde J_k^+F_k$. Since $\tilde s_k$ is a least squares solution of $\min_{s\in\mathbb{R}^n}\|F_k + \tilde J_ks\|$, it follows from (43) and (56) that
$$\begin{aligned} \|(U_2U_2^T + U_3U_3^T)F_k\| &= \|F_k + \tilde J_k\tilde s_k\| \le \|F_k + \tilde J_k(\bar x_k - x_k)\|\\ &\le \|F_k + J_k(\bar x_k - x_k)\| + \|(\tilde J_k - J_k)(\bar x_k - x_k)\|\\ &\le L_1\|\bar x_k - x_k\|^2 + \|U_2\Sigma_2V_2^T(\bar x_k - x_k)\|\\ &\le L_1\|\bar x_k - x_k\|^2 + L_1\|\bar x_k - x_k\|\,\|\bar x_k - x_k\| \le 2L_1\|\bar x_k - x_k\|^2. \end{aligned} \tag{75}$$
Due to the orthogonality of $U_2$ and $U_3$, we obtain (73) and (74). □
(76)
F(yk ) ≤ (1 − p)c5 x¯k − xk + c˜5 x¯k − xk ,
(77)
F(yk ) ≤ (1 − p)c6 x¯k − xk + c˜6 x¯k − xk ,
(78)
U2 U2T U3 U3T
2
where c˜4 , c5 , c˜5 , c6 , c˜6 are positive constants.
123
2
3 3
The multi-point LM method...
Proof It follows from (71), (57) and Lemma 3.1 that Fk + p Jk dk ≤ ||U1 [(1 − p)12 + λk I ](12 + λk I )−1 U1T Fk || + ||U2 [(1 − p)22 + λk I ](22 + λk I )−1 U2T Fk || + ||U3 U3T Fk || ≤ (1 − p)U1 U1T Fk + pλk U1 (12 + λk I )−1 U1T Fk + (1 − p)U2 U2T Fk + pλk U2 (22 + λk I )−1 U2T Fk + U3 U3T Fk ≤ (1 − p)L 2 x¯k − xk + pλk U1 1−2 U1T Fk + (1 − p)2L 1 x¯k − xk 2 + pU2 U2T Fk + L 1 x¯k − xk 2 4M = (1 − p)L 2 x¯k − xk + pL 22 2 + 3L 1 x¯k − xk 2 . σ¯ r
(79)
Hence, by (43) and (53), we have F(yk ) = F(xk + pdk ) ≤ Fk + p Jk dk + L 1 p 2 dk 2 4M ≤ (1− p)L 2 x¯k −xk + pL 22 2 +3L 1 + L 1 p 2 c˜22 x¯k −xk 2 σ¯ r ≤ (1 − p)L 2 x¯k − xk + c˜4 x¯k − xk 2
(80)
for some c˜4 , which gives (76). By (41), we have
y¯k − yk ≤ c1−1 F(yk ) ≤ (1 − p)
L2 c˜4 x¯k − xk + x¯k − xk 2 . c1 c1
(81)
Let p¯ k = −Jk+ F(yk ). Then, p¯ k is the least squares solution of min p∈R n F(yk ) + Jk p. By simple calculations, we deduce from (41), (42) and (81) that U3 U3T F(yk ) = F(yk ) + Jk p¯ k ≤ F(yk ) + Jk ( y¯k − yk ) ≤ F(yk ) + J (yk )( y¯k − yk ) + (J (yk ) − Jk )( y¯k − yk ) ≤ L 1 y¯k − yk 2 + L 1 pdk y¯k − yk 2 L 2 c˜2 2 L2 ≤ L 1 (1 − p) 2 + L 1 p(1 − p) x¯k − xk 2 + c˜6 x¯k − xk 3 c1 c1 ≤ (1 − p)c6 x¯k − xk 2 + c˜6 x¯k − xk 3
(82)
for some positive c6 and c˜6 .
123
X. Zhao, J. Fan
Let J˜k = U1 1 V1T and p˜ k = −Jk+ F(yk ). Since p˜ k is the least solution of min p∈R n F(yk ) + J˜k p, we deduce from(42), (43) and (81) that (U2 U2T + U3 U3T )F(yk ) = F(yk ) + J˜k p˜ k ≤ F(yk ) + J˜k ( y¯k − yk ) ≤ F(yk ) + J (yk )( y¯k − yk ) + ( J˜k − J (yk ))( y¯k − yk ) ≤ L 1 y¯k − yk 2 + (Jk − J (yk ))( y¯k − yk ) + U2 2 V2 ( y¯k − yk ) ≤ L 1 y¯k − yk 2 + pL 1 dk y¯k − yk + L 1 x¯k − xk y¯k − yk L1 L2 L2 ≤ (1 − p) + p c˜2 + 1 x¯k − xk 2 + c˜5 x¯k − xk 3 (1 − p) c1 c1 ≤ (1 − p)c5 x¯k − xk 2 + c˜5 x¯k − xk 3 for some positive c5 and c˜5 . Due to the orthogonality of U2 to U3 , we get (73) and (74). The proof is completed.
(83)
With the help of the above two lemmas, we obtain the convergence order of Algorithm 2.1.

Theorem 3.1 Under the conditions of Assumption 3.1, the sequence $\{x_k\}$ generated by Algorithm 2.1 satisfies
$$\|\bar x_{k+1} - x_{k+1}\| \le (1-p)\,O(\|\bar x_k - x_k\|^2) + O(\|\bar x_k - x_k\|^3). \tag{84}$$

Proof We have
$$\begin{aligned} q_k + J_k\hat d_k &= \big[I - U_1\Sigma_1(\Sigma_1^2 + \lambda_kI)^{-1}\Sigma_1U_1^T - U_2\Sigma_2(\Sigma_2^2 + \lambda_kI)^{-1}\Sigma_2U_2^T\big]q_k\\ &= \lambda_kU_1(\Sigma_1^2 + \lambda_kI)^{-1}U_1^Tq_k + \lambda_kU_2(\Sigma_2^2 + \lambda_kI)^{-1}U_2^Tq_k + U_3U_3^Tq_k. \end{aligned} \tag{85}$$
It follows from (85), Lemmas 3.3 and 3.4 that
$$\begin{aligned} \|q_k + J_k\hat d_k\| &\le \lambda_k\|\Sigma_1^{-2}\|\Big[\frac{1-p}{p^2}\|U_1^TF_k\| + \frac{1}{p^2}\|U_1^TF(y_k)\|\Big] + \Big[\frac{1-p}{p^2}\|U_2^TF_k\| + \frac{1}{p^2}\|U_2^TF(y_k)\|\Big] + \Big[\frac{1-p}{p^2}\|U_3^TF_k\| + \frac{1}{p^2}\|U_3^TF(y_k)\|\Big]\\ &\le \frac{1-p}{p^2}\Big(\frac{8ML_2^2}{\bar\sigma_r^2} + 3L_1 + c_5 + c_6\Big)\|\bar x_k - x_k\|^2 + \tilde c_7\|\bar x_k - x_k\|^3\\ &\le (1-p)c_7\|\bar x_k - x_k\|^2 + \tilde c_7\|\bar x_k - x_k\|^3 \end{aligned} \tag{86}$$
for some positive $c_7$ and $\tilde c_7$. Let $z_k = x_k + d_k$. By (43), we have
$$pq_k = \frac{1}{p}\big[(p-1)F_k + F(y_k)\big] = F(z_k) + (1-p)\,O(\|d_k\|^2). \tag{87}$$
In particular, we have $F(z_k) = pq_k$ if $p = 1$.

By (70), we have
$$\begin{aligned} \|\hat d_k\| &\le \|\Sigma_1^{-1}\|\,\|U_1^Tq_k\| + \lambda_k^{-1}\|\Sigma_2\|\,\|U_2^Tq_k\|\\ &\le \frac{2}{\bar\sigma_r}\Big[\frac{1-p}{p^2}\|U_1^TF_k\| + \frac{1}{p^2}\|U_1^TF(y_k)\|\Big] + \frac{L_1}{mc_1}\Big[\frac{1-p}{p^2}\|U_2^TF_k\| + \frac{1}{p^2}\|U_2^TF(y_k)\|\Big]\\ &\le \frac{2}{\bar\sigma_r}\cdot\frac{1-p}{p^2}\cdot 2L_2\|\bar x_k - x_k\| + O(\|\bar x_k - x_k\|^2). \end{aligned} \tag{88}$$
Combining (41), (42), (53)–(55), (86), (87) and (88), we have
$$\begin{aligned} c_1\|\bar x_{k+1} - x_{k+1}\| &\le \|F(x_{k+1})\| = \|F(z_k + p\hat d_k)\|\\ &\le \|F(z_k) + pJ(z_k)\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le \|F(z_k) + pJ_k\hat d_k\| + \|(J(z_k) - J_k)p\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le p\|q_k + J_k\hat d_k\| + (1-p)\,O(\|d_k\|^2) + L_1p\|d_k\|\,\|\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le (1-p)\,O(\|\bar x_k - x_k\|^2) + O(\|\bar x_k - x_k\|^3). \end{aligned}$$
The proof is completed. □
4 Numerical experiments

We tested Algorithm 2.1 on some singular problems, and compared it with the general LM algorithm, which does not solve (20) and only computes the step $d_k$. The test problems were created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in Moré et al. (1981), and have the same form as in Schnabel and Frank (1984):
$$\hat F(x) = F(x) - J(x^*)A(A^TA)^{-1}A^T(x - x^*),$$
where $F(x)$ is the standard nonsingular test function, $x^*$ is its root, and $A \in \mathbb{R}^{n\times k}$ has full column rank with $1 \le k \le n$. Obviously, $\hat F(x^*) = 0$ and
$$\hat J(x^*) = J(x^*)\big(I - A(A^TA)^{-1}A^T\big)$$
has rank $n - k$. A disadvantage of these problems is that $\hat F(x)$ may have roots that are not roots of $F(x)$.

We created two sets of singular problems, with $\hat J(x^*)$ having rank $n-1$ and $n-2$, using
$$A \in \mathbb{R}^{n\times 1}, \quad A^T = (1, 1, \ldots, 1)$$
and
$$A \in \mathbb{R}^{n\times 2}, \quad A^T = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1\\ 1 & -1 & 1 & -1 & \cdots & \pm 1 \end{pmatrix},$$
respectively. Meanwhile, we made a slight alteration to the variable dimension problem, which has $n+2$ equations in $n$ unknowns: we eliminated the $(n-1)$-th and $n$-th equations. (The first $n$ equations in the standard problem are linear.)

We set $p_0 = 0.0001$, $p_1 = 0.25$, $p_2 = 0.75$, and $\mu_1 = 10^{-5}$ for all the tests. The algorithm is terminated when $\|J_k^TF_k\|$, i.e., the norm of half the gradient of $\Phi$ at $x_k$, is less than $10^{-5}$, or when the number of iterations exceeds $100(n+1)$.
Table 1 Results on first singular test set with rank $\hat J(x^*) = n - 1$. [Table: for each test problem of Moré et al. (1981) (problems 1, 3–6, 8–14), dimension $n$, and starting point $x_0$, $10x_0$ or $100x_0$, the counts NF/NJ/NF+NJ*n for the general LM algorithm ($s_k = d_k$) and for Algorithm 2.1 with $p = 0.6$, $0.8$ and $1$; "–" marks a failure to find a solution within $100(n+1)$ iterations.]
The experiments were implemented on a PC with a 2.30 GHz CPU and 6.0 GB RAM, using Matlab R2014b. The results for the first set of problems, of rank $n-1$, are listed in Table 1, and those for the second set, of rank $n-2$, in Table 2. The third column of each table indicates that the starting point is $x_0$, $10x_0$ or $100x_0$, where $x_0$ is suggested by Moré, Garbow and Hillstrom in Moré et al. (1981); "NF" and "NJ" represent the numbers of function calculations and Jacobian calculations, respectively. If the method failed to find a solution in $100(n+1)$ iterations, we denote this by the sign "–". Note that, for general nonlinear equations, a Jacobian calculation usually costs about $n$ times a function calculation. So, we also present the values NF+NJ*n for comparison of the total calculations. However, if the Jacobian is sparse, these values do not mean much.

From the tables, we can see that Algorithm 2.1 almost always performs best when $p = 1$, whether on the first singular test set or on the second. Though Algorithm 2.1 requires more function calculations than the general LM algorithm, its Jacobian calculations, as well as its total calculations, are much lower. The tables indicate that Algorithm 2.1 performs better and better as $p$ increases, which supports the local convergence result obtained in Sect. 3.

We also adopt the performance profiles of Dolan and Moré (2002) to compare the general LM algorithm (i.e., $s_k = d_k$) with Algorithm 2.1 for different $p$, based on the numbers of function calculations and Jacobian calculations. Figure 1 shows the performance profiles of the algorithms for the problems with rank $J(x^*) = n - 1$. It can be seen that the multi-point LM algorithm with $p = 1$ has the most wins (has the highest probability of being the optimal solver): the probability that it is the winner on a given problem is about 0.76. The multi-point LM algorithm with $p = 0.8$ has a lower number of wins than with $p = 1$, but its performance becomes much more competitive if we extend the $\tau$ of interest to 2. If we are interested in the solver that can solve 90% of the problems with the greatest efficiency, then $p = 0.6$ and
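A performance profile plots, for each solver, the fraction of problems on which its cost is within a factor $2^\tau$ of the best solver's cost. A minimal sketch of the computation behind Figs. 1 and 2 (the cost matrix here is invented for illustration, not data from the tables):

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More profile. costs[p, s] = cost of solver s on problem p (np.inf = failure).

    Returns rho with rho[t, s] = fraction of problems whose log2-scaled
    performance ratio log2(costs[p, s] / min_s costs[p, s]) is at most taus[t].
    """
    best = np.min(costs, axis=1, keepdims=True)   # best solver's cost per problem
    ratios = np.log2(costs / best)                # log2-scaled performance ratios
    return np.array([(ratios <= t).mean(axis=0) for t in taus])

# toy data: 3 problems, 2 solvers; solver 0 fails on the third problem
costs = np.array([[45.0, 43.0],
                  [590.0, 59.0],
                  [np.inf, 639.0]])
rho = performance_profile(costs, taus=[0.0, 4.0])
```

The value at $\tau = 0$ is the solver's fraction of "wins" (the probabilities 0.76 and 0.82 quoted for $p = 1$), while the limit for large $\tau$ is the fraction of problems the solver eventually solves.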
Table 2 Results on second singular test set with rank $\hat J(x^*) = n - 2$. [Table: for each test problem of Moré et al. (1981) (problems 1–6, 8–14), dimension $n$, and starting point $x_0$, $10x_0$ or $100x_0$, the counts NF/NJ/NF+NJ*n for the general LM algorithm ($s_k = d_k$) and for Algorithm 2.1 with $p = 0.6$, $0.8$ and $1$; "–" marks a failure to find a solution within $100(n+1)$ iterations.]
Fig. 1 Log2 scaled performance profile of NF + NJ*n on first singular test set with rank $J(x^*) = n - 1$
$p = 0.8$ stand out. If we hold to more stringent probabilities of a solver completing successfully, then $p = 0.8$ captures our attention with its ability to solve over 95% of the problems. Figure 2 shows the performance profiles of the algorithms for the problems with rank $J(x^*) = n - 2$. Similar results to Fig. 1 can be observed. The multi-point LM algorithm
Fig. 2 Log2 scaled performance profile of NF + NJ*n on second singular test set with rank $J(x^*) = n - 2$
with $p = 1$ has the most wins, and the probability that it is the winner on a given problem is about 0.82. The algorithms with $p = 0.8$ and $0.6$ have lower numbers of wins than with $p = 1$.

Acknowledgments The authors would like to thank the referees for their helpful and constructive suggestions, which have greatly improved the presentation of the paper.
References

Argyros IK, Chen D (1993) Results on the Chebyshev method in Banach spaces. Proyecciones 12:119–128
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91:201–213
Ezquerro JA, Hernández MA (2009) An optimization of Chebyshev's method. J Complex 25:343–361
Fan JY (2012) The modified Levenberg–Marquardt method for nonlinear equations with cubic convergence. Math Comput 81:447–466
Fan JY, Yuan YX (2005) On the quadratic convergence of the Levenberg–Marquardt method without nonsingularity assumption. Computing 74:23–39
Kelley CT (1995) Iterative methods for linear and nonlinear equations. Frontiers in applied mathematics, vol 16. SIAM, Philadelphia
Kelley CT (2003) Solving nonlinear equations with Newton's method. Fundamentals of algorithms. SIAM, Philadelphia
Levenberg K (1944) A method for the solution of certain nonlinear problems in least squares. Quart Appl Math 2:164–166
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441
Moré JJ (1978) The Levenberg–Marquardt algorithm: implementation and theory. In: Watson GA (ed) Lecture notes in mathematics: numerical analysis, vol 630. Springer, Berlin, pp 105–116
Moré JJ, Garbow BS, Hillstrom KH (1981) Testing unconstrained optimization software. ACM Trans Math Softw 7:17–41
Powell MJD (1975) Convergence properties of a class of minimization algorithms. In: Mangasarian OL, Meyer RR, Robinson SM (eds) Nonlinear programming, vol 2. Academic Press, New York, pp 1–27
Schnabel RB, Frank PD (1984) Tensor methods for nonlinear equations. SIAM J Numer Anal 21:815–843
Stewart GW, Sun JG (1990) Matrix perturbation theory. Academic Press, San Diego
Yamashita N, Fukushima M (2001) On the rate of convergence of the Levenberg–Marquardt method. Comput Suppl 15:237–249
Yuan YX (1998) Trust region algorithms for nonlinear equations. Information 1:7–20
Yuan YX (2011) Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer Algebra Control Optim 1:15–34