Comp. Appl. Math. DOI 10.1007/s40314-015-0221-8

On the multi-point Levenberg–Marquardt method for singular nonlinear equations

Xin Zhao · Jinyan Fan

Received: 1 July 2014 / Revised: 5 January 2015 / Accepted: 4 March 2015
© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2015

Abstract In this paper, we present a multi-point iterative Levenberg–Marquardt algorithm for singular nonlinear equations. The algorithm converges globally, and its convergence order is studied under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution.

Keywords Singular nonlinear equations · Levenberg–Marquardt method · Chebyshev's method · Local error bound

Mathematics Subject Classification 90C30 · 65K05
1 Introduction

We consider the system of nonlinear equations
$$F(x) = 0, \tag{1}$$
where $F(x): \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable and $F'(x)$ is Lipschitz continuous. Throughout the paper, we assume that the solution set of (1) is nonempty and denote it by $X^*$. In all cases, $\|\cdot\|$ refers to the 2-norm.

There are many methods for nonlinear equations (Kelley 1995, 2003; Levenberg 1944; Marquardt 1963; Moré 1978; Yuan 1998, 2011). Newton's method is very efficient and popular. At every iteration, it computes
$$d_k = -J_k^{-1}F_k, \qquad x_{k+1} = x_k + d_k, \tag{2}$$
Communicated by Ernesto G. Birgin.
Jinyan Fan
[email protected]

Department of Mathematics and MOE-LSC, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, People's Republic of China
where $F_k = F(x_k)$, and $J_k = F'(x_k)$ is the Jacobian. If $J(x)$ is Lipschitz continuous and nonsingular at the solution, Newton's method has quadratic convergence.

There are also many other one-point iterative methods for (1). For example, Chebyshev's method computes
$$d_k = -J_k^{-1}F_k, \qquad \hat d_k = -\frac{1}{2}J_k^{-1}F_k''d_k^2, \qquad x_{k+1} = x_k + d_k + \hat d_k \tag{3}$$
at every iteration (Argyros and Chen 1993). To avoid the computation of $F_k''d_k^2$, Ezquerro and Hernández approximate it by a combination of the function values at different points (Ezquerro and Hernández 2009). Let $y_k = x_k + pd_k$, where $p \in (0, 1]$ is a constant. Then,
$$F_k''d_k^2 \approx \frac{2}{p^2}\big[(p-1)F_k + F(y_k)\big],$$
which, together with (3), leads to the modified Chebyshev's method:
$$d_k = -J_k^{-1}F_k, \qquad y_k = x_k + pd_k, \qquad \hat d_k = -\frac{1}{p^2}J_k^{-1}\big[(p-1)F_k + F(y_k)\big], \qquad x_{k+1} = x_k + d_k + \hat d_k. \tag{4}$$
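For concreteness, one iteration of the modified Chebyshev's method (4) can be sketched as follows. This is a minimal NumPy sketch under our own naming, not code from the paper, and it assumes $J_k$ is nonsingular:

```python
import numpy as np

def modified_chebyshev_step(F, J, x, p=1.0):
    """One iteration of the modified Chebyshev's method (4).

    F, J : callables returning F(x) and the Jacobian J(x); J(x) must be nonsingular.
    p    : the parameter p in (0, 1].
    """
    Fk, Jk = F(x), J(x)
    d = np.linalg.solve(Jk, -Fk)            # Newton step d_k
    y = x + p * d                           # auxiliary point y_k
    d_hat = np.linalg.solve(Jk, -((p - 1.0) * Fk + F(y)) / p**2)
    return x + d + d_hat                    # x_{k+1} = x_k + d_k + \hat d_k
```

Both linear systems share the matrix $J_k$, so in practice one LU factorization serves both solves; the extra cost per iteration over Newton's method is essentially one more function evaluation.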
Note that, when the Jacobian is singular or nearly singular, $d_k$ and $\hat d_k$ given in (4) are not defined or may not be well defined. To overcome the difficulties caused by the possible singularity of $J_k$, the Levenberg–Marquardt (LM) method for (1) computes the step
$$d_k^{LM} = -(J_k^TJ_k + \lambda_kI)^{-1}J_k^TF_k \tag{5}$$
at every iteration, where the LM parameter $\lambda_k$ is nonnegative and prevents the step from being too large when $J_k^TJ_k$ is nearly singular. There have been extensive studies on the LM method. For instance, Yamashita and Fukushima choose $\lambda_k = \|F_k\|^2$ and prove that the LM method has quadratic convergence under the local error bound condition, which is weaker than the nonsingularity of the Jacobian at the solution (Yamashita and Fukushima 2001). Fan and Yuan choose $\lambda_k = \|F_k\|^\delta$ with $\delta \in [1, 2]$ and show that the LM method preserves the quadratic convergence (Fan and Yuan 2005). The choice $\lambda_k = \|F_k\|$ performs best in the numerical experiments.

Inspired by the modified Chebyshev's method and the Levenberg–Marquardt method, we construct a multi-point LM method in this paper. At every iteration, we first solve the linear equations
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^TF_k \quad\text{with}\quad \lambda_k = \mu_k\|F_k\| \tag{6}$$
to obtain the step $d_k$, where $\mu_k > 0$ is updated from iteration to iteration. Let $y_k = x_k + pd_k$, where $p \in (0, 1]$. Then, we solve
$$(J_k^TJ_k + \lambda_kI)\,d = -\frac{1}{p^2}J_k^T\big[(p-1)F_k + F(y_k)\big] \tag{7}$$
to obtain $\hat d_k$, and set
$$x_{k+1} = x_k + d_k + p\hat d_k. \tag{8}$$

The above multi-point LM method includes the modified LM method presented in Fan (2012) as a special case when $p = 1$. In this paper, we study its convergence properties under the local error bound condition.
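The two solves (6)–(7) and the update (8) for a single iteration can be sketched as follows (a minimal NumPy sketch with our own names for the callables, not code from the paper):

```python
import numpy as np

def multipoint_lm_step(F, J, x, mu, p):
    """One iteration of the multi-point LM step (6)-(8).

    F, J : callables returning F(x) and the Jacobian J(x).
    mu   : the current scaling parameter mu_k > 0.
    p    : the multi-point parameter, 0 < p <= 1.
    """
    Fk, Jk = F(x), J(x)
    lam = mu * np.linalg.norm(Fk)            # lambda_k = mu_k * ||F_k||, cf. (6)
    A = Jk.T @ Jk + lam * np.eye(x.size)     # same coefficient matrix for both solves
    d = np.linalg.solve(A, -Jk.T @ Fk)       # first step d_k from (6)
    y = x + p * d                            # auxiliary point y_k
    q = ((p - 1.0) * Fk + F(y)) / p**2       # right-hand-side vector of (7)
    d_hat = np.linalg.solve(A, -Jk.T @ q)    # second step \hat d_k from (7)
    return x + d + p * d_hat                 # x_{k+1} from (8)
```

Both systems share the coefficient matrix $J_k^TJ_k + \lambda_kI$, so a single Cholesky factorization can serve both solves; with $p = 1$ this reduces to the modified LM step of Fan (2012).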
The paper is organized as follows: In Sect. 2, we present a globally convergent multi-point LM algorithm using the trust region technique. In Sect. 3, we investigate the properties of dk and dˆk , and derive the convergence order of the algorithm under the local error bound condition. Finally, some numerical results are presented in Sect. 4.
2 The multi-point LM algorithm and global convergence

In this section, we present a multi-point LM algorithm. We use the trust region technique to guarantee the global convergence of the algorithm.

We take
$$\Phi(x) = \|F(x)\|^2 \tag{9}$$
as the merit function for (1). Define the actual reduction of $\Phi(x)$ at the $k$-th iteration as
$$Ared_k = \|F_k\|^2 - \|F(x_k + d_k + p\hat d_k)\|^2.$$
The predicted reduction needs to be nonnegative. Note that the step $d_k$ in (6) is the minimizer of the convex minimization problem
$$\min_{d\in\mathbb{R}^n} \|F_k + J_kd\|^2 + \lambda_k\|d\|^2 =: \varphi_{k,1}(d). \tag{10}$$
If we let
$$\Delta_{k,1} = \|d_k\| = \big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^TF_k\big\|, \tag{11}$$
it can be verified that $d_k$ is also a solution of the trust region subproblem
$$\min_{d\in\mathbb{R}^n} \|F_k + J_kd\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_{k,1}. \tag{12}$$
By the result given in Powell (1975), we know that
$$\|F_k\|^2 - \|F_k + J_kd_k\|^2 \ge \|J_k^TF_k\|\min\left\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\right\}. \tag{13}$$
Similarly, the step $\hat d_k$ in (7) is the minimizer of the problem
$$\min_{d\in\mathbb{R}^n} \|q_k + J_kd\|^2 + \lambda_k\|d\|^2 =: \varphi_{k,2}(d), \tag{14}$$
where
$$q_k = \frac{1}{p^2}\big[(p-1)F_k + F(y_k)\big].$$
Obviously, $\hat d_k$ is also a solution of the trust region subproblem
$$\min_{d\in\mathbb{R}^n} \|q_k + J_kd\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_{k,2} = \|\hat d_k\|. \tag{15}$$
So, we have
$$\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2 \ge \|J_k^Tq_k\|\min\left\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\right\}. \tag{16}$$
Based on (13) and (16), we define the new predicted reduction as
$$Pred_k = \|F_k\|^2 - \|F_k + J_kd_k\|^2 + p^2\big[\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2\big], \tag{17}$$
which satisfies
$$Pred_k \ge \|J_k^TF_k\|\min\left\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\right\} + p^2\|J_k^Tq_k\|\min\left\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\right\}. \tag{18}$$
The ratio of the actual reduction to the predicted reduction,
$$r_k = \frac{Ared_k}{Pred_k},$$
is used in deciding whether to accept the trial step and how to adjust the parameter $\mu_k$. The multi-point LM algorithm is presented as follows.

Algorithm 2.1

Step 1. Given $x_1 \in \mathbb{R}^n$, $\mu_1 > m > 0$, $0 < p_0 \le p_1 < p_2 < 1$, $0 < p \le 1$, $k := 1$.

Step 2. If $\|J_k^TF_k\| = 0$, then stop. Solve
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^TF_k \quad\text{with}\quad \lambda_k = \mu_k\|F_k\| \tag{19}$$
to obtain $d_k$, and set $y_k = x_k + pd_k$. Compute
$$q_k = \frac{1}{p^2}\big[(p-1)F_k + F(y_k)\big].$$
Solve
$$(J_k^TJ_k + \lambda_kI)\,d = -J_k^Tq_k \tag{20}$$
to obtain $\hat d_k$.

Step 3. Compute $r_k = Ared_k/Pred_k$. Set
$$x_{k+1} = \begin{cases} x_k + d_k + p\hat d_k, & \text{if } r_k \ge p_0,\\ x_k, & \text{otherwise}. \end{cases} \tag{21}$$

Step 4. Choose $\mu_{k+1}$ as
$$\mu_{k+1} = \begin{cases} 4\mu_k, & \text{if } r_k < p_1,\\ \mu_k, & \text{if } r_k \in [p_1, p_2],\\ \max\{\mu_k/4,\ m\}, & \text{if } r_k > p_2. \end{cases} \tag{22}$$
Set $k := k+1$ and go to Step 2.

Compared with the modified Chebyshev's method, Algorithm 2.1 can deal with ill-conditioned and singular nonlinear equations.

To study the global convergence of Algorithm 2.1, we make the following assumption.

Assumption 2.1 $F(x)$ is continuously differentiable, and both $F(x)$ and its Jacobian $J(x)$ are Lipschitz continuous, i.e., there exist positive constants $L_1$ and $L_2$ such that
$$\|J(y) - J(x)\| \le L_1\|y - x\|, \quad \forall x, y \in \mathbb{R}^n, \tag{23}$$
and
$$\|F(y) - F(x)\| \le L_2\|y - x\|, \quad \forall x, y \in \mathbb{R}^n. \tag{24}$$
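The whole of Algorithm 2.1 can be sketched in a few lines. The code below is our own runnable sketch (names and default tolerances are ours; the parameter defaults follow the values used in Sect. 4):

```python
import numpy as np

def multipoint_lm(F, J, x, p=1.0, mu=1e-5, m=1e-8,
                  p0=1e-4, p1=0.25, p2=0.75, tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 2.1: the multi-point LM method with trust-region globalization."""
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= tol:          # stopping test of Step 2
            break
        lam = mu * np.linalg.norm(Fk)         # lambda_k = mu_k ||F_k||, cf. (19)
        A = Jk.T @ Jk + lam * np.eye(x.size)
        d = np.linalg.solve(A, -g)            # d_k from (19)
        y = x + p * d
        q = ((p - 1.0) * Fk + F(y)) / p**2
        d_hat = np.linalg.solve(A, -Jk.T @ q) # \hat d_k from (20)
        trial = x + d + p * d_hat             # trial point of (21)
        # actual vs. predicted reduction, cf. (17)
        ared = Fk @ Fk - F(trial) @ F(trial)
        pred = (Fk @ Fk - np.sum((Fk + Jk @ d) ** 2)
                + p**2 * (q @ q - np.sum((q + Jk @ d_hat) ** 2)))
        r = ared / pred if pred > 0 else -1.0
        if r >= p0:                           # accept the trial step, (21)
            x = trial
        if r < p1:                            # update mu_{k+1} by (22)
            mu *= 4.0
        elif r > p2:
            mu = max(mu / 4.0, m)
    return x
```

Note that the step is only accepted when the actual reduction of the merit function is at least a $p_0$-fraction of the predicted one; otherwise $\mu_k$ is enlarged and the step recomputed from the same point, exactly as in Steps 3–4.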
By (23), we have
$$\|F(y) - F(x) - J(x)(y - x)\| \le L_1\|y - x\|^2, \quad \forall x, y \in \mathbb{R}^n. \tag{25}$$

Next, we show that the sequence generated by Algorithm 2.1 converges to a stationary point of the merit function.

Theorem 2.1 Under the conditions of Assumption 2.1, Algorithm 2.1 either terminates in finitely many iterations or satisfies
$$\lim_{k\to\infty} \|J_k^TF_k\| = 0. \tag{26}$$
Proof We prove by contradiction. Suppose the theorem is not true; then there exist a positive $\tau$ and infinitely many $k$ such that
$$\|J_k^TF_k\| \ge \tau. \tag{27}$$
Let $T_1, T_2$ be the index sets
$$T_1 = \{k : \|J_k^TF_k\| \ge \tau\}, \qquad T_2 = \{k : \|J_k^TF_k\| \ge \tau/2 \ \text{and}\ x_{k+1} \ne x_k\}.$$
Then, $T_1$ is an infinite set. In the following, we derive contradictions to (27) whether $T_2$ is finite or infinite.

Case (I): $T_2$ is finite. By the definition of $T_2$, the set
$$T_3 = \{k : \|J_k^TF_k\| \ge \tau \ \text{and}\ x_{k+1} \ne x_k\}$$
is also finite. Let $\tilde k$ be the largest index of $T_3$. Then, $x_{k+1} = x_k$ holds for all $k \in \{k > \tilde k : k \in T_1\}$. Define the index set
$$T_4 = \{k > \tilde k : \|J_k^TF_k\| \ge \tau \ \text{and}\ x_{k+1} = x_k\}.$$
Suppose $k \in T_4$. It is easy to see that $\|J_{k+1}^TF_{k+1}\| \ge \tau$. Moreover, we have $x_{k+2} = x_{k+1}$; otherwise, if $x_{k+2} \ne x_{k+1}$, then $k+1 \in T_3$, which contradicts the fact that $\tilde k$ is the largest index of $T_3$. Hence, we have $k+1 \in T_4$. By induction, we know that $\|J_k^TF_k\| \ge \tau$ and $x_{k+1} = x_k$ hold for all $k > \tilde k$.

By Step 3, we know that $r_k < p_0$ for all $k > \tilde k$, which implies that
$$\mu_k \to \infty \quad\text{and}\quad \lambda_k \to \infty, \tag{28}$$
due to (19) and (22). Hence, we have
$$\|d_k\| \to 0. \tag{29}$$
Moreover, it follows from (15), (25) and (28) that there exists a positive constant $\bar c$ such that
$$\begin{aligned} \|\hat d_k\| &= \frac{1}{p^2}\big\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T[(p-1)F_k + F(y_k)]\big\|\\ &\le \frac{1-p}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \frac{1}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF(y_k)\|\\ &\le \frac{1-p}{p^2}\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \frac{1}{p^2}\Big[\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\|\\ &\qquad + p\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TJ_kd_k\| + p^2L_1\|d_k\|^2\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\|\Big]\\ &\le \frac{2}{p^2}\|d_k\| + \frac{L_1L_2}{\lambda_k}\|d_k\|^2 \le \bar c\,\|d_k\| \end{aligned} \tag{30}$$
holds for all sufficiently large $k$. Let $s_k = d_k + p\hat d_k$. We have
$$\|s_k\| = \|d_k + p\hat d_k\| \le (1 + p\bar c)\|d_k\|. \tag{31}$$
Furthermore, it follows from (18), (24), (27), (30) and (31) that
$$\begin{aligned} |r_k - 1| &= \frac{|Ared_k - Pred_k|}{Pred_k}\\ &\le \frac{\big|\|F(x_k + d_k + p\hat d_k)\|^2 - \|F_k + J_kd_k\|^2 + \|pq_k\|^2 - \|pq_k + pJ_k\hat d_k\|^2\big|}{\|J_k^TF_k\|\min\big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\big\} + p^2\|J_k^Tq_k\|\min\big\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\big\}}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{\|J_k^TF_k\|\min\big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\big\}}\\ &\le \frac{\|F_k\|\,O(\|d_k\|^2) + \|J_k\hat d_k\|\,O(\|d_k\|^2)}{\|d_k\|} \to 0, \end{aligned} \tag{32}$$
which implies that $r_k \to 1$. In view of the updating rule of $\mu_k$, there then exists a positive constant $\tilde m > m$ such that $\mu_k < \tilde m$ holds for all sufficiently large $k$, which contradicts (28). Hence, (27) cannot be true while $T_2$ is finite.

Case (II): $T_2$ is infinite. It follows from (18) and (24) that
$$\begin{aligned} \|F_1\|^2 &\ge \sum_{k\in T_2}\big(\|F_k\|^2 - \|F_{k+1}\|^2\big) \ge \sum_{k\in T_2} p_0\,Pred_k\\ &\ge \sum_{k\in T_2} p_0\Big[\|J_k^TF_k\|\min\Big\{\|d_k\|,\ \frac{\|J_k^TF_k\|}{\|J_k^TJ_k\|}\Big\} + p^2\|J_k^Tq_k\|\min\Big\{\|\hat d_k\|,\ \frac{\|J_k^Tq_k\|}{\|J_k^TJ_k\|}\Big\}\Big]\\ &\ge \sum_{k\in T_2} \frac{p_0\tau}{2}\min\Big\{\|d_k\|,\ \frac{\tau}{2L_2^2}\Big\}, \end{aligned} \tag{33}$$
which implies
$$\lim_{k\to\infty,\,k\in T_2} \|d_k\| = 0. \tag{34}$$
Then, the definition of $d_k$ gives
$$\lambda_k \to +\infty, \quad k \in T_2. \tag{35}$$
Similarly to (30), there exists a positive $\tilde c$ such that
$$\|\hat d_k\| \le \tilde c\,\|d_k\| \tag{36}$$
holds for all sufficiently large $k \in T_2$. By (36), we have
$$\|s_k\| = \|d_k + p\hat d_k\| \le (1 + p\tilde c)\|d_k\|. \tag{37}$$
Hence, we obtain from (33) that
$$\sum_{k\in T_2}\|s_k\| = \sum_{k\in T_2}\|d_k + p\hat d_k\| \le \sum_{k\in T_2}(1 + p\tilde c)\|d_k\| < +\infty. \tag{38}$$
Furthermore, it follows from (23) and (24) that
$$\sum_{k\in T_2}\big\|J_{k+1}^TF_{k+1} - J_k^TF_k\big\| < +\infty.$$
Since (27) holds for infinitely many $k$, there exists a large $\hat k$ such that $\|J_{\hat k}^TF_{\hat k}\| \ge \tau$ and
$$\sum_{k\in T_2,\,k\ge\hat k}\big\|J_{k+1}^TF_{k+1} - J_k^TF_k\big\| < \frac{\tau}{2}.$$
By induction, we see that $\|J_k^TF_k\| \ge \tau/2$ holds for all $k \ge \hat k$. Then, we deduce from (33)–(36) that $\lim_{k\to\infty} x_k$ exists and
$$\|d_k\| \to 0, \qquad \|\hat d_k\| \to 0. \tag{39}$$
So we have
$$\mu_k \to +\infty. \tag{40}$$
By the same arguments as in (32), we know that $r_k \to 1$. Thus, there exists a positive constant $\hat m > m$ such that $\mu_k < \hat m$ holds for all sufficiently large $k$, which contradicts (40). Therefore, the supposition (27) cannot be true when $T_2$ is infinite. The proof is completed. □
3 Convergence rate of Algorithm 2.1

In this section, we first investigate the properties of the steps $d_k$ and $\hat d_k$. Then, we show that the parameter $\mu_k$ is bounded. Finally, we derive the convergence rate of Algorithm 2.1 under the local error bound condition. We make the following assumption.

Assumption 3.1 (a) $F(x)$ is continuously differentiable, and $\|F(x)\|$ provides a local error bound on some neighborhood of $x^* \in X^*$, i.e., there exist constants $c_1 > 0$ and $0 < b_1 < 1$ such that
$$\|F(x)\| \ge c_1\,\mathrm{dist}(x, X^*), \quad \forall x \in N(x^*, b_1) = \{x : \|x - x^*\| \le b_1\}. \tag{41}$$
(b) The Jacobian $J(x)$ is Lipschitz continuous on $N(x^*, b_1)$, i.e., there exists a positive constant $L_1$ such that
$$\|J(y) - J(x)\| \le L_1\|y - x\|, \quad \forall x, y \in N(x^*, b_1). \tag{42}$$

If the Jacobian at a solution is nonsingular, then the solution is isolated, so $\|F(x)\|$ provides a local error bound on some neighborhood of it. However, the converse is not necessarily true; we refer to Yamashita and Fukushima (2001) for examples. Thus, the local error bound condition is weaker than the nonsingularity of the Jacobian.
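As a tiny concrete illustration (our own example, not one from Yamashita and Fukushima 2001): for $F(x) = (x_1, 0)^T$ the solution set $X^* = \{x : x_1 = 0\}$ consists of non-isolated solutions, the Jacobian has rank 1 everywhere, and yet $\|F(x)\| = |x_1| = \mathrm{dist}(x, X^*)$, so (41) holds with $c_1 = 1$:

```python
import numpy as np

# F(x) = (x1, 0): X* = {x : x1 = 0} is a whole line of solutions,
# J(x) = [[1, 0], [0, 0]] is singular everywhere, yet ||F(x)|| = |x1| = dist(x, X*),
# i.e. the local error bound (41) holds with c1 = 1 despite the singular Jacobian.
F = lambda x: np.array([x[0], 0.0])
J = lambda x: np.array([[1.0, 0.0], [0.0, 0.0]])

x = np.array([0.3, 2.0])
dist_to_solution_set = abs(x[0])        # nearest point of X* is (0, x[1])
assert np.isclose(np.linalg.norm(F(x)), dist_to_solution_set)
assert np.linalg.matrix_rank(J(x)) == 1  # rank-deficient at (and near) X*
```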
By (42), we have
$$\|F(y) - F(x) - J(x)(y - x)\| \le L_1\|y - x\|^2, \quad \forall x, y \in N(x^*, b_1). \tag{43}$$
Moreover, there exists a positive constant $L_2$ such that
$$\|F(y) - F(x)\| \le L_2\|y - x\|, \quad \forall x, y \in N(x^*, b_1). \tag{44}$$
Throughout the paper, we denote by $\bar x_k$ a vector in $X^*$ that satisfies
$$\|\bar x_k - x_k\| = \mathrm{dist}(x_k, X^*). \tag{45}$$
3.1 Properties of $d_k$ and $\hat d_k$

In the following, we investigate the relationships among $\|d_k\|$, $\|\hat d_k\|$ and $\mathrm{dist}(x_k, X^*)$. Suppose the singular value decomposition (SVD) of $\bar J_k = J(\bar x_k)$ is
$$\bar J_k = \bar U_k\bar\Sigma_k\bar V_k^T = (\bar U_{k,1}, \bar U_{k,2})\begin{pmatrix}\bar\Sigma_{k,1} & \\ & 0\end{pmatrix}\begin{pmatrix}\bar V_{k,1}^T\\ \bar V_{k,2}^T\end{pmatrix} = \bar U_{k,1}\bar\Sigma_{k,1}\bar V_{k,1}^T, \tag{46}$$
where $\bar\Sigma_{k,1} = \mathrm{diag}(\bar\sigma_{k,1}, \ldots, \bar\sigma_{k,r})$ with $\bar\sigma_{k,1} \ge \bar\sigma_{k,2} \ge \cdots \ge \bar\sigma_{k,r} > 0$. The corresponding SVD of $J_k$ is
$$J_k = U_k\Sigma_kV_k^T = (U_{k,1}, U_{k,2}, U_{k,3})\begin{pmatrix}\Sigma_{k,1} & & \\ & \Sigma_{k,2} & \\ & & 0\end{pmatrix}\begin{pmatrix}V_{k,1}^T\\ V_{k,2}^T\\ V_{k,3}^T\end{pmatrix} = U_{k,1}\Sigma_{k,1}V_{k,1}^T + U_{k,2}\Sigma_{k,2}V_{k,2}^T, \tag{47}$$
where $\Sigma_{k,1} = \mathrm{diag}(\sigma_{k,1}, \ldots, \sigma_{k,r})$ with $\sigma_{k,1} \ge \sigma_{k,2} \ge \cdots \ge \sigma_{k,r} > 0$, and $\Sigma_{k,2} = \mathrm{diag}(\sigma_{k,r+1}, \ldots, \sigma_{k,r+q})$ with $\sigma_{k,r} \ge \sigma_{k,r+1} \ge \cdots \ge \sigma_{k,r+q} > 0$. For clarity, when the context is clear, we drop the subscript $k$ in $\Sigma_{k,i}$, $U_{k,i}$, $V_{k,i}$ ($i = 1, 2, 3$), and write $J_k$ as
$$J_k = U_1\Sigma_1V_1^T + U_2\Sigma_2V_2^T. \tag{48}$$
Lemma 3.1 Under the conditions of Assumption 3.1, if $x_k, y_k \in N(x^*, b_1/2)$, then there exists a constant $c_2 > 0$ such that
$$\|s_k\| \le c_2\,\mathrm{dist}(x_k, X^*) \tag{49}$$
holds for all sufficiently large $k$.

Proof Since $x_k \in N(x^*, b_1/2)$, we have
$$\|\bar x_k - x^*\| \le \|\bar x_k - x_k\| + \|x_k - x^*\| \le 2\|x_k - x^*\| \le b_1, \tag{50}$$
which implies that $\bar x_k \in N(x^*, b_1)$. By (41) and (22), we have
$$\lambda_k = \mu_k\|F_k\| \ge mc_1\|\bar x_k - x_k\|. \tag{51}$$
Since $d_k$ is a minimizer of $\varphi_{k,1}(d)$, by (10), (43) and (51), we have
$$\begin{aligned} \|d_k\|^2 &\le \frac{\varphi_{k,1}(d_k)}{\lambda_k} \le \frac{\varphi_{k,1}(\bar x_k - x_k)}{\lambda_k} = \frac{\|F_k + J_k(\bar x_k - x_k)\|^2 + \lambda_k\|\bar x_k - x_k\|^2}{\lambda_k}\\ &\le \frac{1}{\lambda_k}L_1^2\|\bar x_k - x_k\|^4 + \|\bar x_k - x_k\|^2 \le \|\bar x_k - x_k\|^2 + c_1^{-1}m^{-1}L_1^2\|\bar x_k - x_k\|^3, \end{aligned} \tag{52}$$
so
$$\|d_k\| \le \tilde c_2\|\bar x_k - x_k\| \tag{53}$$
for sufficiently large $k$, where $\tilde c_2$ is a positive constant. It follows from (43) that
$$\begin{aligned} \|\hat d_k\| &= \frac{1}{p^2}\big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^T[(p-1)F_k + F(y_k)]\big\|\\ &\le \frac{1}{p^2}\Big[(1-p)\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\| + \|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TF_k\|\\ &\qquad + p\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^TJ_kd_k\| + L_1p^2\|d_k\|^2\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\|\Big]\\ &\le \frac{2}{p^2}\|d_k\| + L_1\|d_k\|^2\big\|(J_k^TJ_k+\lambda_kI)^{-1}J_k^T\big\|. \end{aligned} \tag{54}$$
By (47), we have
$$\begin{aligned} \big\|(J_k^TJ_k + \lambda_kI)^{-1}J_k^T\big\| &= \left\|(V_1, V_2, V_3)\begin{pmatrix}(\Sigma_1^2+\lambda_kI)^{-1}\Sigma_1 & & \\ & (\Sigma_2^2+\lambda_kI)^{-1}\Sigma_2 & \\ & & 0\end{pmatrix}\begin{pmatrix}U_1^T\\ U_2^T\\ U_3^T\end{pmatrix}\right\|\\ &\le \|\Sigma_1^{-1}\| + \lambda_k^{-1}\|\Sigma_2\|. \end{aligned} \tag{55}$$
By the theory of matrix perturbation (Stewart and Sun 1990) and the Lipschitzness of $J_k$, we have
$$\big\|\mathrm{diag}(\Sigma_1 - \bar\Sigma_1,\ \Sigma_2,\ 0)\big\| \le \|J_k - \bar J_k\| \le L_1\|\bar x_k - x_k\|,$$
which yields
$$\|\Sigma_1 - \bar\Sigma_1\| \le L_1\|\bar x_k - x_k\| \quad\text{and}\quad \|\Sigma_2\| \le L_1\|\bar x_k - x_k\|. \tag{56}$$
Since $\{x_k\}$ converges to the solution set $X^*$, we assume that $L_1\|\bar x_k - x_k\| \le \bar\sigma_r/2$ holds for all sufficiently large $k$. Then, we have
$$\|\Sigma_1^{-1}\| \le \frac{1}{\bar\sigma_r - L_1\|\bar x_k - x_k\|} \le \frac{2}{\bar\sigma_r}. \tag{57}$$
Moreover, by (51),
$$\lambda_k^{-1}\|\Sigma_2\| \le \frac{L_1\|\bar x_k - x_k\|}{mc_1\|\bar x_k - x_k\|} = \frac{L_1}{mc_1}. \tag{58}$$
The above inequalities, together with (54), imply that there exists a $\bar c_2 > 0$ such that
$$\|\hat d_k\| \le \bar c_2\|\bar x_k - x_k\|. \tag{59}$$
By (53) and (59), we have
$$\|s_k\| \le c_2\|\bar x_k - x_k\| \tag{60}$$
for some $c_2 > 0$. The proof is completed. □
3.2 Boundedness of the LM parameter

The updating rule of $\{\mu_k\}$ indicates that $\{\mu_k\}$ is bounded below by $m$. Next, we show that $\mu_k$ is bounded above.

Lemma 3.2 Under the conditions of Assumption 3.1, if $x_k, y_k, x_k + d_k \in N(x^*, b_1/2)$, then there exists a positive constant $M > m$ such that
$$\mu_k \le M \tag{61}$$
holds for all sufficiently large $k$.

Proof First, we prove that for all sufficiently large $k$,
$$Pred_k \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}, \tag{62}$$
where $c_3$ is a positive constant. We consider two cases. If $\|\bar x_k - x_k\| \le \|d_k\|$, by (41), (43) and the fact that $d_k$ is a solution of (12), we have
$$\|F_k\| - \|F_k + J_kd_k\| \ge \|F_k\| - \|F_k + J_k(\bar x_k - x_k)\| \ge c_1\|\bar x_k - x_k\| - L_1\|\bar x_k - x_k\|^2 \ge c_3\|\bar x_k - x_k\| \tag{63}$$
for some $c_3 > 0$. In the other case that $\|\bar x_k - x_k\| \ge \|d_k\|$, we have
$$\begin{aligned} \|F_k\| - \|F_k + J_kd_k\| &\ge \|F_k\| - \Big\|F_k + \frac{\|d_k\|}{\|\bar x_k - x_k\|}J_k(\bar x_k - x_k)\Big\|\\ &\ge \frac{\|d_k\|}{\|\bar x_k - x_k\|}\big(\|F_k\| - \|F_k + J_k(\bar x_k - x_k)\|\big)\\ &\ge \frac{\|d_k\|}{\|\bar x_k - x_k\|}\cdot c_3\|\bar x_k - x_k\| = c_3\|d_k\|. \end{aligned} \tag{64}$$
Both (63) and (64) indicate that
$$\|F_k\|^2 - \|F_k + J_kd_k\|^2 = \big(\|F_k\| + \|F_k + J_kd_k\|\big)\big(\|F_k\| - \|F_k + J_kd_k\|\big) \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}.$$
Since $\hat d_k$ is a solution of (15), we have $\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2 \ge 0$. So, we have
$$Pred_k = \|F_k\|^2 - \|F_k + J_kd_k\|^2 + p^2\big[\|q_k\|^2 - \|q_k + J_k\hat d_k\|^2\big] \ge \|F_k\|^2 - \|F_k + J_kd_k\|^2 \ge c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\},$$
which gives (62). It then follows from (43), (53) and (62) that
$$\begin{aligned} |r_k - 1| &= \frac{|Ared_k - Pred_k|}{Pred_k} = \frac{\big|\|F(x_k + d_k + p\hat d_k)\|^2 - \|pq_k + pJ_k\hat d_k\|^2 + \|pq_k\|^2 - \|F_k + J_kd_k\|^2\big|}{Pred_k}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{c_3\|F_k\|\min\{\|d_k\|,\ \|\bar x_k - x_k\|\}}\\ &\le \frac{\|F_k + J_ks_k\|\,O(\|s_k\|^2 + \|d_k\|^2) + O(\|s_k\|^4 + \|d_k\|^4) + \|F_k + J_kd_k\|\,O(\|d_k\|^2)}{\|\bar x_k - x_k\|\,\|d_k\|}. \end{aligned} \tag{65}$$
In view of (44), (53), (54) and (59), we see that
$$\|F_k + J_kd_k\| \le \|F_k\| \le O(\|\bar x_k - x_k\|), \tag{66}$$
$$\|F_k + J_ks_k\| \le \|F_k + J_kd_k\| + p\|J_k\hat d_k\| \le \|F_k\| + pL_2\bar c_2\|\bar x_k - x_k\| \le O(\|\bar x_k - x_k\|), \tag{67}$$
and
$$\|s_k\|^2 = \|d_k + p\hat d_k\|^2 \le \|d_k\|^2 + 2p\|d_k\|\,\|\hat d_k\| + p^2\|\hat d_k\|^2 \le \Big(\frac{2}{p} + 1\Big)^2\|d_k\|^2 + O(\|d_k\|^3). \tag{68}$$
So, we have $r_k \to 1$. Hence, there exists a positive constant $M > m$ such that $\mu_k \le M$ holds for all sufficiently large $k$. The proof is completed. □

Lemma 3.2, together with (41) and (44), indicates that the LM parameter satisfies
$$mc_1\|\bar x_k - x_k\| \le \lambda_k = \mu_k\|F_k\| \le L_2M\|\bar x_k - x_k\|. \tag{69}$$
3.3 Convergence rate of Algorithm 2.1

By the SVD of $J_k$, we compute
$$d_k = -V_1(\Sigma_1^2 + \lambda_kI)^{-1}\Sigma_1U_1^TF_k - V_2(\Sigma_2^2 + \lambda_kI)^{-1}\Sigma_2U_2^TF_k, \tag{70}$$
and
$$F_k + pJ_kd_k = U_1\big[(1-p)\Sigma_1^2 + \lambda_kI\big](\Sigma_1^2 + \lambda_kI)^{-1}U_1^TF_k + U_2\big[(1-p)\Sigma_2^2 + \lambda_kI\big](\Sigma_2^2 + \lambda_kI)^{-1}U_2^TF_k + U_3U_3^TF_k. \tag{71}$$
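Formula (70) is easy to check numerically: the step obtained by solving the normal equations coincides with the SVD expression. A small NumPy check on random data (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 5, 4, 0.3
Jk = rng.standard_normal((m, n))
Fk = rng.standard_normal(m)

# d_k by solving the regularized normal equations (J^T J + lam I) d = -J^T F
d_solve = np.linalg.solve(Jk.T @ Jk + lam * np.eye(n), -Jk.T @ Fk)

# d_k by the SVD expression (70): d_k = -V (Sigma^2 + lam I)^{-1} Sigma U^T F
U, S, Vt = np.linalg.svd(Jk, full_matrices=False)
d_svd = -Vt.T @ (S / (S**2 + lam) * (U.T @ Fk))

assert np.allclose(d_solve, d_svd)
```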
The following two lemmas give estimates of $\|U_1U_1^TF_k\|$, $\|U_2U_2^TF_k\|$, $\|U_3U_3^TF_k\|$ as well as $\|U_1U_1^TF(y_k)\|$, $\|U_2U_2^TF(y_k)\|$ and $\|U_3U_3^TF(y_k)\|$.

Lemma 3.3 Under the conditions of Assumption 3.1, if $x_k \in N(x^*, b_1/2)$, then we have
$$\|U_1U_1^TF_k\| \le L_2\|\bar x_k - x_k\|, \tag{72}$$
$$\|U_2U_2^TF_k\| \le 2L_1\|\bar x_k - x_k\|^2, \tag{73}$$
$$\|U_3U_3^TF_k\| \le L_1\|\bar x_k - x_k\|^2, \tag{74}$$
where $L_1, L_2$ are given in (42) and (44), respectively.

Proof The result (72) follows directly from (44). Let $\bar s_k = -J_k^+F_k$, where $J_k^+$ is the pseudo-inverse of $J_k$. It is obvious that $\bar s_k$ is a least squares solution of $\min_{s\in\mathbb{R}^n}\|F_k + J_ks\|$. So, by (43), we have
$$\|U_3U_3^TF_k\| = \|F_k + J_k\bar s_k\| \le \|F_k + J_k(\bar x_k - x_k)\| \le L_1\|\bar x_k - x_k\|^2.$$
Let $\tilde J_k = U_1\Sigma_1V_1^T$ and $\tilde s_k = -\tilde J_k^+F_k$. Since $\tilde s_k$ is a least squares solution of $\min_{s\in\mathbb{R}^n}\|F_k + \tilde J_ks\|$, it follows from (43) and (56) that
$$\begin{aligned} \|(U_2U_2^T + U_3U_3^T)F_k\| &= \|F_k + \tilde J_k\tilde s_k\| \le \|F_k + \tilde J_k(\bar x_k - x_k)\|\\ &\le \|F_k + J_k(\bar x_k - x_k)\| + \|(\tilde J_k - J_k)(\bar x_k - x_k)\|\\ &\le L_1\|\bar x_k - x_k\|^2 + \|U_2\Sigma_2V_2^T(\bar x_k - x_k)\|\\ &\le L_1\|\bar x_k - x_k\|^2 + L_1\|\bar x_k - x_k\|\,\|\bar x_k - x_k\| \le 2L_1\|\bar x_k - x_k\|^2. \end{aligned} \tag{75}$$
Due to the orthogonality of $U_2$ and $U_3$, we obtain (73) and (74). □
(76)
F(yk ) ≤ (1 − p)c5 x¯k − xk + c˜5 x¯k − xk ,
(77)
F(yk ) ≤ (1 − p)c6 x¯k − xk + c˜6 x¯k − xk ,
(78)
U2 U2T U3 U3T
2
where c˜4 , c5 , c˜5 , c6 , c˜6 are positive constants.
123
2
3 3
The multi-point LM method...
Proof It follows from (71), (57) and Lemma 3.1 that Fk + p Jk dk ≤ ||U1 [(1 − p)12 + λk I ](12 + λk I )−1 U1T Fk || + ||U2 [(1 − p)22 + λk I ](22 + λk I )−1 U2T Fk || + ||U3 U3T Fk || ≤ (1 − p)U1 U1T Fk + pλk U1 (12 + λk I )−1 U1T Fk + (1 − p)U2 U2T Fk + pλk U2 (22 + λk I )−1 U2T Fk + U3 U3T Fk ≤ (1 − p)L 2 x¯k − xk + pλk U1 1−2 U1T Fk + (1 − p)2L 1 x¯k − xk 2 + pU2 U2T Fk + L 1 x¯k − xk 2 4M = (1 − p)L 2 x¯k − xk + pL 22 2 + 3L 1 x¯k − xk 2 . σ¯ r
(79)
Hence, by (43) and (53), we have F(yk ) = F(xk + pdk ) ≤ Fk + p Jk dk + L 1 p 2 dk 2 4M ≤ (1− p)L 2 x¯k −xk + pL 22 2 +3L 1 + L 1 p 2 c˜22 x¯k −xk 2 σ¯ r ≤ (1 − p)L 2 x¯k − xk + c˜4 x¯k − xk 2
(80)
for some c˜4 , which gives (76). By (41), we have
y¯k − yk ≤ c1−1 F(yk ) ≤ (1 − p)
L2 c˜4 x¯k − xk + x¯k − xk 2 . c1 c1
(81)
Let p¯ k = −Jk+ F(yk ). Then, p¯ k is the least squares solution of min p∈R n F(yk ) + Jk p. By simple calculations, we deduce from (41), (42) and (81) that U3 U3T F(yk ) = F(yk ) + Jk p¯ k ≤ F(yk ) + Jk ( y¯k − yk ) ≤ F(yk ) + J (yk )( y¯k − yk ) + (J (yk ) − Jk )( y¯k − yk ) ≤ L 1 y¯k − yk 2 + L 1 pdk y¯k − yk 2 L 2 c˜2 2 L2 ≤ L 1 (1 − p) 2 + L 1 p(1 − p) x¯k − xk 2 + c˜6 x¯k − xk 3 c1 c1 ≤ (1 − p)c6 x¯k − xk 2 + c˜6 x¯k − xk 3
(82)
for some positive c6 and c˜6 .
123
X. Zhao, J. Fan
Let J˜k = U1 1 V1T and p˜ k = −Jk+ F(yk ). Since p˜ k is the least solution of min p∈R n F(yk ) + J˜k p, we deduce from(42), (43) and (81) that (U2 U2T + U3 U3T )F(yk ) = F(yk ) + J˜k p˜ k ≤ F(yk ) + J˜k ( y¯k − yk ) ≤ F(yk ) + J (yk )( y¯k − yk ) + ( J˜k − J (yk ))( y¯k − yk ) ≤ L 1 y¯k − yk 2 + (Jk − J (yk ))( y¯k − yk ) + U2 2 V2 ( y¯k − yk ) ≤ L 1 y¯k − yk 2 + pL 1 dk y¯k − yk + L 1 x¯k − xk y¯k − yk L1 L2 L2 ≤ (1 − p) + p c˜2 + 1 x¯k − xk 2 + c˜5 x¯k − xk 3 (1 − p) c1 c1 ≤ (1 − p)c5 x¯k − xk 2 + c˜5 x¯k − xk 3 for some positive c5 and c˜5 . Due to the orthogonality of U2 to U3 , we get (73) and (74). The proof is completed.
(83)
With the help of the above two lemmas, we obtain the convergence order of Algorithm 2.1.

Theorem 3.1 Under the conditions of Assumption 3.1, the sequence $\{x_k\}$ generated by Algorithm 2.1 satisfies
$$\|\bar x_{k+1} - x_{k+1}\| \le (1-p)\,O(\|\bar x_k - x_k\|^2) + O(\|\bar x_k - x_k\|^3). \tag{84}$$

Proof We have
$$\begin{aligned} q_k + J_k\hat d_k &= \big[I - U_1\Sigma_1(\Sigma_1^2 + \lambda_kI)^{-1}\Sigma_1U_1^T - U_2\Sigma_2(\Sigma_2^2 + \lambda_kI)^{-1}\Sigma_2U_2^T\big]q_k\\ &= \lambda_kU_1(\Sigma_1^2 + \lambda_kI)^{-1}U_1^Tq_k + \lambda_kU_2(\Sigma_2^2 + \lambda_kI)^{-1}U_2^Tq_k + U_3U_3^Tq_k. \end{aligned} \tag{85}$$
It follows from (85), Lemmas 3.3 and 3.4 that
$$\begin{aligned} \|q_k + J_k\hat d_k\| &\le \lambda_k\|\Sigma_1^{-2}\|\Big[\frac{1-p}{p^2}\|U_1^TF_k\| + \frac{1}{p^2}\|U_1^TF(y_k)\|\Big] + \Big[\frac{1-p}{p^2}\|U_2^TF_k\| + \frac{1}{p^2}\|U_2^TF(y_k)\|\Big] + \Big[\frac{1-p}{p^2}\|U_3^TF_k\| + \frac{1}{p^2}\|U_3^TF(y_k)\|\Big]\\ &\le \frac{1-p}{p^2}\Big(\frac{8ML_2^2}{\bar\sigma_r^2} + 3L_1 + c_5 + c_6\Big)\|\bar x_k - x_k\|^2 + \tilde c_7\|\bar x_k - x_k\|^3\\ &\le (1-p)c_7\|\bar x_k - x_k\|^2 + \tilde c_7\|\bar x_k - x_k\|^3 \end{aligned} \tag{86}$$
for some positive $c_7$ and $\tilde c_7$. Let $z_k = x_k + d_k$. By (43), we have
$$pq_k = \frac{1}{p}\big[(p-1)F_k + F(y_k)\big] = F(z_k) + (1-p)\,O(\|d_k\|^2). \tag{87}$$
In particular, we have $F(z_k) = pq_k$ if $p = 1$.

By (70), we have
$$\begin{aligned} \|\hat d_k\| &\le \|\Sigma_1^{-1}\|\,\|U_1^Tq_k\| + \lambda_k^{-1}\|\Sigma_2\|\,\|U_2^Tq_k\|\\ &\le \frac{2}{\bar\sigma_r}\Big[\frac{1-p}{p^2}\|U_1^TF_k\| + \frac{1}{p^2}\|U_1^TF(y_k)\|\Big] + \frac{L_1}{mc_1}\Big[\frac{1-p}{p^2}\|U_2^TF_k\| + \frac{1}{p^2}\|U_2^TF(y_k)\|\Big]\\ &\le \frac{2}{\bar\sigma_r}\cdot\frac{1-p}{p^2}\cdot 2L_2\|\bar x_k - x_k\| + O(\|\bar x_k - x_k\|^2). \end{aligned} \tag{88}$$
Combining (41), (42), (53)–(55), (86), (87) and (88), we have
$$\begin{aligned} c_1\|\bar x_{k+1} - x_{k+1}\| &\le \|F(x_{k+1})\| = \|F(z_k + p\hat d_k)\|\\ &\le \|F(z_k) + pJ(z_k)\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le \|F(z_k) + pJ_k\hat d_k\| + \|(J(z_k) - J_k)p\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le p\|q_k + J_k\hat d_k\| + (1-p)\,O(\|d_k\|^2) + L_1p\|d_k\|\,\|\hat d_k\| + p^2L_1\|\hat d_k\|^2\\ &\le (1-p)\,O(\|\bar x_k - x_k\|^2) + O(\|\bar x_k - x_k\|^3). \end{aligned}$$
The proof is completed. □
4 Numerical experiments

We tested Algorithm 2.1 on some singular problems, and compared it with the general LM algorithm, which does not solve (20) and only computes the step $d_k$. The test problems were created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom in Moré et al. (1981), and have the same form as in Schnabel and Frank (1984):
$$\hat F(x) = F(x) - J(x^*)A(A^TA)^{-1}A^T(x - x^*),$$
where $F(x)$ is the standard nonsingular test function, $x^*$ is its root, and $A \in \mathbb{R}^{n\times k}$ has full column rank with $1 \le k \le n$. Obviously, $\hat F(x^*) = 0$ and
$$\hat J(x^*) = J(x^*)\big(I - A(A^TA)^{-1}A^T\big)$$
has rank $n - k$. A disadvantage of these problems is that $\hat F(x)$ may have roots that are not roots of $F(x)$.

We created two sets of singular problems, with $\hat J(x^*)$ having rank $n-1$ and $n-2$, using
$$A \in \mathbb{R}^{n\times 1}, \quad A^T = (1, 1, \ldots, 1)$$
and
$$A \in \mathbb{R}^{n\times 2}, \quad A^T = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1\\ 1 & -1 & 1 & -1 & \cdots & \pm 1 \end{pmatrix},$$
respectively. Meanwhile, we made a slight alteration to the variable dimension problem, which has $n+2$ equations in $n$ unknowns: we eliminated the $(n-1)$-th and $n$-th equations. (The first $n$ equations in the standard problem are linear.)

We set $p_0 = 0.0001$, $p_1 = 0.25$, $p_2 = 0.75$, and $\mu_1 = 10^{-5}$ for all the tests. The algorithm is terminated when $\|J_k^TF_k\|$, i.e., the norm of half the gradient of $\Phi$ at $x_k$, is less than $10^{-5}$, or when the number of iterations exceeds $100(n+1)$.
Table 1 Results on first singular test set with rank $\hat J(x^*) = n - 1$. [Table: for each test problem of Moré et al. (1981) (problems 1, 3–6, 8–14), dimension $n$, and starting point $x_0$, $10x_0$ or $100x_0$, the counts NF/NJ/NF+NJ*n for the general LM algorithm ($s_k = d_k$) and for Algorithm 2.1 with $p = 0.6$, $0.8$ and $1$; "–" marks a failure to find a solution within $100(n+1)$ iterations.]
The experiments were implemented on a PC with a 2.30 GHz CPU and 6.0 GB RAM, using Matlab R2014b. The results for the first set of problems, of rank $n-1$, are listed in Table 1, and those for the second set, of rank $n-2$, in Table 2. The third column of each table indicates that the starting point is $x_0$, $10x_0$ or $100x_0$, where $x_0$ is suggested by Moré, Garbow and Hillstrom in Moré et al. (1981); "NF" and "NJ" represent the numbers of function calculations and Jacobian calculations, respectively. If the method failed to find a solution in $100(n+1)$ iterations, we denote this by the sign "–". Note that, for general nonlinear equations, a Jacobian calculation usually costs about $n$ times a function calculation. So, we also present the values NF+NJ*n for comparison of the total calculations. However, if the Jacobian is sparse, these values do not mean much.

From the tables, we can see that Algorithm 2.1 almost always performs best when $p = 1$, whether on the first singular test set or on the second. Though Algorithm 2.1 requires more function calculations than the general LM algorithm, its Jacobian calculations, as well as its total calculations, are much lower. The tables indicate that Algorithm 2.1 performs better and better as $p$ increases, which supports the local convergence result obtained in Sect. 3.

We also adopt the performance profiles of Dolan and Moré (2002) to compare the general LM algorithm (i.e., $s_k = d_k$) with Algorithm 2.1 for different $p$, based on the numbers of function calculations and Jacobian calculations. Figure 1 shows the performance profiles of the algorithms for the problems with rank $J(x^*) = n - 1$. It can be seen that the multi-point LM algorithm with $p = 1$ has the most wins (has the highest probability of being the optimal solver): the probability that it is the winner on a given problem is about 0.76. The multi-point LM algorithm with $p = 0.8$ has a lower number of wins than with $p = 1$, but its performance becomes much more competitive if we extend the $\tau$ of interest to 2. If we are interested in the solver that can solve 90% of the problems with the greatest efficiency, then $p = 0.6$ and
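A performance profile plots, for each solver, the fraction of problems on which its cost is within a factor $2^\tau$ of the best solver's cost. A minimal sketch of the computation behind Figs. 1 and 2 (the cost matrix here is invented for illustration, not data from the tables):

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More profile. costs[p, s] = cost of solver s on problem p (np.inf = failure).

    Returns rho with rho[t, s] = fraction of problems whose log2-scaled
    performance ratio log2(costs[p, s] / min_s costs[p, s]) is at most taus[t].
    """
    best = np.min(costs, axis=1, keepdims=True)   # best solver's cost per problem
    ratios = np.log2(costs / best)                # log2-scaled performance ratios
    return np.array([(ratios <= t).mean(axis=0) for t in taus])

# toy data: 3 problems, 2 solvers; solver 0 fails on the third problem
costs = np.array([[45.0, 43.0],
                  [590.0, 59.0],
                  [np.inf, 639.0]])
rho = performance_profile(costs, taus=[0.0, 4.0])
```

The value at $\tau = 0$ is the solver's fraction of "wins" (the probabilities 0.76 and 0.82 quoted for $p = 1$), while the limit for large $\tau$ is the fraction of problems the solver eventually solves.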
Table 2 Results on second singular test set with rank $\hat J(x^*) = n - 2$. [Table: for each test problem of Moré et al. (1981) (problems 1–6, 8–14), dimension $n$, and starting point $x_0$, $10x_0$ or $100x_0$, the counts NF/NJ/NF+NJ*n for the general LM algorithm ($s_k = d_k$) and for Algorithm 2.1 with $p = 0.6$, $0.8$ and $1$; "–" marks a failure to find a solution within $100(n+1)$ iterations.]
Fig. 1 Log2 scaled performance profile of NF + NJ*n on first singular test set with rank $J(x^*) = n - 1$
$p = 0.8$ stand out. If we hold to more stringent probabilities of a solver completing successfully, then $p = 0.8$ captures our attention with its ability to solve over 95% of the problems. Figure 2 shows the performance profiles of the algorithms for the problems with rank $J(x^*) = n - 2$. Similar results to Fig. 1 can be observed. The multi-point LM algorithm
Fig. 2 Log2 scaled performance profile of NF + NJ*n on second singular test set with rank $J(x^*) = n - 2$
with $p = 1$ has the most wins, and the probability that it is the winner on a given problem is about 0.82. The algorithms with $p = 0.8$ and $0.6$ have lower numbers of wins than with $p = 1$.

Acknowledgments The authors would like to thank the referees for their helpful and constructive suggestions, which have greatly improved the presentation of the paper.
References

Argyros IK, Chen D (1993) Results on the Chebyshev method in Banach spaces. Proyecciones 12:119–128
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91:201–213
Ezquerro JA, Hernández MA (2009) An optimization of Chebyshev's method. J Complex 25:343–361
Fan JY (2012) The modified Levenberg–Marquardt method for nonlinear equations with cubic convergence. Math Comput 81:447–466
Fan JY, Yuan YX (2005) On the quadratic convergence of the Levenberg–Marquardt method without nonsingularity assumption. Computing 74:23–39
Kelley CT (1995) Iterative methods for linear and nonlinear equations. Frontiers in applied mathematics, vol 16. SIAM, Philadelphia
Kelley CT (2003) Solving nonlinear equations with Newton's method. Fundamentals of algorithms. SIAM, Philadelphia
Levenberg K (1944) A method for the solution of certain nonlinear problems in least squares. Quart Appl Math 2:164–166
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441
Moré JJ (1978) The Levenberg–Marquardt algorithm: implementation and theory. In: Watson GA (ed) Lecture notes in mathematics: numerical analysis, vol 630. Springer, Berlin, pp 105–116
Moré JJ, Garbow BS, Hillstrom KH (1981) Testing unconstrained optimization software. ACM Trans Math Softw 7:17–41
Powell MJD (1975) Convergence properties of a class of minimization algorithms. In: Mangasarian OL, Meyer RR, Robinson SM (eds) Nonlinear programming, vol 2. Academic Press, New York, pp 1–27
Schnabel RB, Frank PD (1984) Tensor methods for nonlinear equations. SIAM J Numer Anal 21:815–843
Stewart GW, Sun JG (1990) Matrix perturbation theory. Academic Press, San Diego
Yamashita N, Fukushima M (2001) On the rate of convergence of the Levenberg–Marquardt method. Comput Suppl 15:237–249
Yuan YX (1998) Trust region algorithms for nonlinear equations. Information 1:7–20
Yuan YX (2011) Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer Algebra Control Optim 1:15–34