Numerische MathemalJk
Numer. Math. 35, 95-111 (1980)
9 by Springer-Verlag 1980
Starlike Domains of Convergence for Newton's Method at Singularities A.O. Griewank, Department of Computer Science, S.G.S., Australian National University, Canberra A.C.T. 2600
Summary. Given a solution x* of a system of nonlinear equations f with singular Jacobian Vf(x*) we construct an open starlike domain M of initial points, from which Newton's method converges linearly to x*. Under certain conditions the union of those straight lines through x*, that do not intersect with M is shown to form a closed set of measure zero, which is necessarily disjoint from any starlike domain of convergence. The results apply to first and higher order singularities.
Subject Classifications: AMS(MOS): 65H10; CR: 5.15.
1. Introduction Whenever the Jacobian Vf(x) of a vector function f : IR"-~IR" is explicitly available the Newton iteration
x j+ 1 = x j - (Vf(xj))- i f ( x j)
(i)
seems the natural approach to the numerical solution of the system of nonlinear equations f (x) = O. If the Jacobian Vf(x*) at a solution point x * e f - l ( O ) is nonsingular the Newton iteration converges to x* from any initial point x 0 in a sufficiently small ball centered at x*. The radius of such a ball can only be given explicitly if the Jacobian satisfies a known Lipschitz condition in which case the rate of convergence is not only superlinear but quadratic [5]. In the case of a singular Jacobian Vf(x*) the classical theory is not applicable and except for the one dimensional problem only few, comparatively recent, results are available. Singular and nearly singular problems arise in bifurcation theory and the analysis of Newton's method in such cases seems of particular importance as many other numerical schemes are based on iterations similar to (1).
0029-599X/80/0035/009 5/$ 03.40
96
A.O. Griewank
2. Starlike Domains of Invertibility and Convergence Iterative methods of any kind are usually expected to converge from all initial points in some sufficiently small spherical neighbourhood of an isolated solution x*. Whereas Newton's method meets this requirement in the nonsingular case it is in general not satisfied if Vf(x*) is singular. Since the iteration (1) is only defined as long as det(Vf(xj))+O a necessary condition would be that the continuous function det (x) - det(Vf(x)) attains an isolated extremum at x* or n = 1. This strong assumption was made in earlier work by Rall [6] and Cavanagh [1] but in general there is no reason why it should be satisfied. In fact it can be seen easily, that det(x) has no particular structure and could be any scalar function that is as often differentiable as the Jacobian is known to be. Consequently the singular set d e t - 1(0)= {xelR" I det (x)=0} can have a rather complicated structure even i f f is highly differentiable. Any domain of convergence, i.e. any open set of initial points from where Newton's method converges must be a domain of invertibility, i.e. an open set disjoint from d e t - 1(0). Any convergence result entails the nomination of such a domain of convergence. For a local convergence result, which guarantees convergence to a particular solution x* it seems reasonable to impose the condition that the domain of convergence d is starlike with respect to x* in that xe~4~2x+(1-2)x*ed for 2e(0, 1). (2) It should be noted, that in contrast to the usual definition of star-shaped domains in complex analysis the central point x* belongs to ~ only in the special case, where Newton's method has a spherical domain of convergence about x*. With - {telR" [ Ilt]l = 1} the unit sphere of directions in IR" starlike domains are conveniently characterised by boundary~functions as shown in the following Lemma.
Lemma 2.1. A subset ~ c lR" is a starlike domain with respect to x* iff there exists a nonnegative lower semi continuous function a: ~ - - * R w{c~}
such that d = {x= x* + pt l t~S~, O< p < a(t)} which implies that the set of excluded directions a - 1(0) = { t e ~ [ ~ N {x gr -1-pt}o > o =0}
is closed in ~.
(3)
Newton's Method at Singularities
97
Proof. With x * = O and the convention sup(O)=O the definition a(t)=--sup{p]pte~4} provides for any starlike domain d a nonnegative function such that the identity (3) is satisfied. Now suppose a(t) is not lower semi continuous. According to the equivalent definitions of lower semi continuity given on p. 40 in [3] there must be a sequence of directions t j ~ t with a(t~)~o~
x j = 89
j) + a(t)) t j-~89 + a(t)) t ~
belong to ~ , which contradicts the openness of d . Conversely any set d of the form (3) must obviously satisfy (2) so that only the openness remains to be shown. For any converging sequence x j = pjtj--~ x = pt@~4
we find that
pj ~ p < a(t) < lira infa(tj) j~o~
so that all but finitely many of the xj must belong to d , which is therefore open. The closeness of a-~(0) follows from the theorem on page 40 in [3]. Particular examples of starlike domains are balls, cones and their intersections which were used by Reddien [7, 8] and Decker and Kelley [2] as domains of convergence. In constructing starlike domains of invertibility and convergence our main concern is to keep the set of excluded directions as small as possible. The actual values of the boundary functions at included, i.e. not excluded directions, which depend on the magnitude of higher derivatives and technicalities of the mathematical derivation are of lesser importance. Certain directions must always be excluded as we may see from the following corollary of L e m m a 2.1.
Corollary 2.2. I f s~Sf is tangential to the singular set det-a(0) in that there exists a sequence { y j } ~ t = d e t - 1(0) with lira y j = x * j~|
and
lira y j - x * j - ~ I[y~-x*ll
s
then s must be excluded from any starlike domain of invertibility. Proof. Suppose s e 5 P is included in some domain of invertibility J with boundary function a. By L e m m a 2.1 the set of included directions is open so that there must be a neighbourhood ~ll~=--{ t~ Sf [ cos- l (tT s) < 6} ~_~ --a- l (O) with
~ - i n f {a(t) [t6ql~} >0.
98
A.O. Griewank
Consequently we have d e t - 1(0)c~ {x = x * +ptlte~ll~, 0 < p <4} = 0 which excludes the possibility that s is tangential. The term starlike domain of convergence with respect to x* will always indicate convergence to x* even though none of the conditions used in this paper does either assume or imply that x* is an isolated solution. It follows immediately from (2) that the union of infinitely many starlike domains with respect to the same point x* is starlike too so that there must be maximal starlike domains of invertibility and convergence with corresponding minimal sets of excluded directions. Under certain nondegeneracy assumptions these necessarily excluded directions can be characterised in terms of the derivatives of f at x*.
3. The Inverse Jacobian in the Domain With Pa the orthogonal projection from IR" onto the m_>_1 dimensional nullspace ~" of vfT(x *) we define the order of the singularity at x*Ef-l(O) as the maximal index k for which f has a Lipschitz continuous (1 + k)-th derivative and
P~ VJf(x *) = 0
for j = 1... k.
(4)
Since the Newton iteration (1) is invariant with respect to nonsingular linear transformations of f and nonsingular affine transformations of the variables x we assume without loss of generally that x*=0,
Vf(x*)=(l-P~t ) and
J I = I R m x {0} "-m
(5)
which implies in particular that the nullspace JV of Vf(x*) is identical to J/4'. Now the Jacobian can be partitioned as
[B(x), Vf(x)= \O(x),
cr(x)~
E(x) !
where the m x m matrix B(x), the ( n - m ) x m matrices C(x) and D(x) and the (n-m) x (n-m) matrix E(x) are k times differentiable. With/~(x) and C(x) the restrictions of the linear mapping d~
p~ dl~ Vf (l~x)l,:o= P~ vk +l f (x*) x k
(6)
to JV and its orthogonal complement respectively we have because of (4)
B(x)=(k[) -1B(x)+O(pk+')=pk(k!) - ' B(t)+O(pk+'), C(x) = (k !)- ' C(x) + 0 (pk +,)= pk(k !)-, C(t) + O(p k +'), O(x)=O(p) and E(x)=I+O(p).
(7)
Newton's Method at Singularities
99
Clearly E - l ( x ) = I + O(p) so that for p smaller than some rb>O the Jacobian is invertible iff the m x m matrix
G(x) ~ B(x) - C(x) r E(x)- 1 D(x) is nonsingular. This follows directly from the elementary expansion det (x) = det (Vf(x))= det (G(x)) det (E(x)). Because of (7) we have pk
G(x) = B(x) + O(p k + 1) = ~ ~(t) + 0 ( 0 k + ')
(8)
det (x) = (pk]m \k ! ] det (/3(t)) + O(p km+').
(9)
so that
Since all the order terms in powers of p are uniform with respect to the directions tES# there exists a constant 7 such that for all x = p t with p < r b
a(x)-~
B(t) <=~p~+l
(10)
where 1["[1 denotes the spectral n o r m of a matrix. With respect to this n o r m the smallest singular value of/~(t) is given by the continuous function 0
if/~(t) is singular otherwise
a(t)-i1/~_ 1(t)]t _ 1
defined on the c o m p a c t d o m a i n 5:. N o w we can construct a starlike d o m a i n of invertibility, which will be reduced to a d o m a i n of convergence later on. L e m m a 3.1. Let f have a singularity of order k at x ' e f - l ( 0 ) and the conditions (5) be satisfied then: (i) The continuous boundary function f ( t ) - m i n {r b, 37 1 - 1 o(t)/k !}
(1 I)
defines the starlike domain of invertibility ~ - {x=pt[t6Sf, 0 < p < ~-(t)} in which the inverse Jacobian takes the form ( a -I, Vf - ~ -- ~ _ E _ I D G _ I ,
_G-1CrE-1 ] E _ I + E _ I D G _ I C r E _ 1]
(12)
with G-'(x)=k! p-kB-'(t)+o-2(t)O(pt-k)=a-2(t)O(p-k).
(13)
100
A.O. Griewank
(ii) The set of excluded directions F-I(O) is the intersection of 6 r with the solution set of the homogeneous polynomial of degree km
Flo(x ) = det (/3(x)). (iii) The smallest singular value v(x) of Vf (x) satisfies
v(pt)=O(p k)
for all t~Se,
v(pt)=o(p k)
iff tef-l(O).
(iv) Any direction in Y- ~(0) that is not tangential to det-~(0) must be a local minimiser or maximiser of 17o in IR".
Proof (i) Multiplying (10) by k ! p -k B - l ( t ) we find with (11) IIG(x)B-l(t)p-kk!-Ill
1
so that E(x) and G(x) are nonsingular for x e ~ , which implies that V f - l ( x ) must exist in the given form. (13) can be derived from the inequality above by some elementary manipulations. (ii) By definition of/~(x) as the top left m x m submatrix of the n x n matrix given in (6) its entries are homogeneous polynomials of degree k in the components of x, which makes H 0 (x) a h o m o g e n e o u s polynomial of degree mk. It follows immediately from the definitions of f(t) and a(t) that F - ~ ( 0 ) = a - i ( 0 ) = 17o ~(0)c~Se. (iii) Since by (5) all but the smallest m singular values of Vf(x) are close to 1 in a neighbourhood of x* we obtain from (9)
v(pt)=O(det(pt)l/m)=O(pk)
for all tE5 e
and similarly
v(pt)=O(pk+l/"~)=o(pk)
for t~17o1(0).
Conversely we derive from (12) and (13) for p t e ~
v- l(pt) = [rVf - 1(pt)][ = I1G - l (pt)ll (1 + O(p)) = O(p-k) SO that v (p t) = o (pk) implies t e/7 o 1(0). (iv) Suppose 1 7 0 ( 0 = 0 is not a local extremum. Then there must be sequences {tj-}j> 1 and {t+}j>l in 5e such that lim tj- = !im t f =t
and
llo(t~)
for all j.
Because of (9) there must be a sequence of multipliers {pj}~> ~ c l R such that det (p~ tj-) < 0 < det (p~ t +) which implies by the meanvalue theorem the existence of vectors
yj=pj(Tjt~ +(1 - a t ) t~+),
c~j6(0, 1)
Newton's Method at Singularities
101
with det (yj)=O for all j > 1. Since the y/pj are convex combinations of the t i and t ; we must have lim Yj/[lYjl[= t j~oo
so that t is tangential to det-l(0). Consequently any tellol(O) that is not tangential must be a minimiser or maximiser of H o.
4. The Form of a Newton Step from x i e
Using the partitioned form of Vf-1 (x) developed in the previous section and a corresponding expansion of f(x) we obtain an up to O(p 2) terms explicit representation of the single Newton step from some point in ~. Lemma 4.1. Under the assumptions of Lemma 3.1 there is a constant d such that
the Newton step from any xj=pjt36~ with aj:-a(tj) to the new point xi+ 1 satisfies xj+l k+lk g(xi) <:d(p/crj) 2 (14) where the homogeneous vector function g: (IR~
1(0)) +X__IR"
is given by x.
(15)
Proof. In order to obtain an approximate expression for f(x) we use the obvious identity
J'(x)= (i Vf(t~t)dp) t.
(16)
It follows from (7) and (8) that
p pk+ 1 ! B ( # t ) d p - ( ~ - ~ ) ! B(t)+O(Pk+2)=(k+l) p G(x)+O(pk+2)
p
and similarly ~ C(l~t) d# = k - ~ C(x) + O(p k + 2),
p
0
! D(#t)dl~= p D(x)+O(p 3) and
p ~
Substituting these expansions into (16) we find
(k~G(x)-l-o(pk+l),, k~cT(x)-I-O(pk+I)1 f(x)= \
89
)
E(x)+O(p)
Ix.
102
A.O. Griewank
Multiplying from the left by (12) we obtain
Vf - l(x) f(x) / 1 , (x)llO(p k+, ), =[~I+IIG-I_._ \ O(p)+ Ha-X(x)ll O(p k+2) ,
G_a(x) CT(X)+ iia_a(x)ll o(pk+l)~
k+lk
I+O(p)+ Ila-1(x)ll O(pk+l )
) x.
Because of (7), (13) and the boundedness of a we have a - 1 (x) CT(x)~- B - 1 (t)
~r(t ) + a -
2(t) O(p)
which completes the proof as we may use again (13) to bound I[G-l(x)l]. Provided the ratio (po/ao) is small enough the first step from x o = P0 toeS~ is essentially a projection like mapping to the vector g(xo) k/(k + 1) in the nullspace W~. Whenever g(to)4:0 we derive from (14) that the angle ~bl(t0) between g(to) and the exact x I is bounded by s i n ~ l ( t 0 ) = m i n 2x 1 z~* Similarly we obtain for the angles the subspace X
g(x0) < ( k + l ) d p ~ IIg(xo)] = k a 2 [Ig(to)]l"
(17)
Oj+1 between the subsequent iterates xj+ 1 and
sin Oj+ 1 = min Ittj +1 - yll < y~W
d(pjaj)2/Pj +1.
(18)
Using the uniform bound c-=max {IJC(t)]l + a(t) ItcSP} we derive from (14) with (15)
xj+l
kc sin 0j + ~ p j l p j k +k 1 xj =< [ (k +i-)aj
(19)
which implies by the inverse triangular inequality Pj+t
pj
k l < [=i
k+
kc
k ~Ci-) ~j
] ~j Pj
sinO~+ z7 d
(20)
and furthermore sin d ~bj=min ~
(k+l)d lltj-2xj+ll I <=aj c sin O j + - -ka~ PJ
where A ~9j is the angle between two consecutive iterates
(21)
xj and x j+ ~.
5. Linear Convergence in the Proximity of As one might expect intuitively the directions {tj}j>=o associated with a converging Newton sequence {xj=pjtj}~>=o belong for j > 1 essentially to the nullspace ~ . Conversely one can show under certain conditions that the Newton iteration
N e w t o n ' s M e t h o d at Singularities
t03
must converge if p~ and the angle between t I and ./V are sufficiently small. This result was obtained by Decker and Kelley [2] under the assumption that JffC~Hot(0)= {0}, which is never satisfied for odd dim(JV)> 1 and by Reddien [8] under the assumption that det(Vk+af(x*)sk)=~O for some s e X , which is never satisfied if any linear combination of the component functions of f is linear in the variables x. Excluding only those directions in JV along which the smallest singular value of Vf(x) is o(p k) we obtain the following result. Lemma 5.1. Under the assumptions of Lemma 3.1 there are two nonnegative
continuous functions ~:.A/'c~SP~IR
and
p~:JVc~-MR
such that for any se~C'~c~Ar-Hol (O) the Newton iteration converges linearly with common ratio k/(k + I)from all points in the nonempty starlike domain ~ ( s ) - { x = x * +ptl COS-I(tTs)<~(S), 0
7~
{ c o s - l ( t r s) lte,~c~ Ho I (0)} < =4
is obviously a nonnegative continuous function in se~Arc~9~ with 4)--1(0) =,9~ Yc~/7 01 (0). Consequently the two minima d'(s) =- rain {or(t) t te ~, cos- I (t r s) <=4)(s)}, ?(s) - min {f(t) I t e ~, cos- 1(t r s) < 4)(s)} exist and are both nonnegative and continuous on ,9~n,W with ~-1(0)=?-~(0) = 4)- 1(0). Abbreviating q(s)- 88sin 4)(s)_-<88we can now define recursively _q(s) (k+l)dP(s)_~ [kc/~(s) + k - q(s)' ( k - q(s)) ~2 ( S ) J '
sin ~ ( s ) = m i n ~
(22)
p (s) = (k - q (s)) ~2 (s) sin ~(s)
(k+l)d
which ensures p(s)<=~(s)<=f(s). Both functions are nonnegative and continuous on X n 5 P with ~ - 1 ( 0 ) = p - 1 ( 0 ) = 4)-1(0) so that for se,~c~A r "tiT(s)=0 iff H0(s ) ~---0.
Keeping seJr162 z - H o 1(0) fixed we show by induction that the sequence of Newton iterates {xj=pjtj}j>=~ from any initial point Xle~/~-=~(s} maintains the properties pj<~=-p(s), OjNd?=--~)(s), 0 i < 4 ) = 4)(s) (23) where ~j is the angle between s and tj, whose boundedness by 4) implies aj>~=_~(s) which will be used frequently. For the first iterate x~ the three conditions must hold by definition of ~ and because of the inequality sin 01 = min Ilt~ -zH < t[tt - s s r t l II
(24)
104
A.O. Griewank
Assuming that (23) holds for all
kc
d
(k+l)a~ sin0i+ai
i
Pi<
we obtain with (22)
"~
q _ q(s) (kc/~+k-q(s))
which implies by (20)
k-q
for i = l . . . j
(25)
for i=1 ...j.
(26)
[k +q]i jOi+l~Pl \~-~]
Using the left inequality in (25) we obtain from (18) and (22)
sin 0r +1 = ~
_~q~ < sin q~.
(27)
In order to obtain an upper bound on 0j+ 1 we consider the sums
(k + 1 ~ =<~2 (k - q) sin 6
~ PI<=Pl\l-q]
d(1-q)
i=l
(28)
and
~-1
d(k+l) JG1 <(k+l)
i=1 sin 0~+ ~< ~2(k_q) 2.. P i - - sinq~. = i=1 -(1 --q)
(29)
Recalling the definition of A0i in (21) we note J i=1
which implies because of 0 1 - c o s - l ( s r t l ) < ~ J sin0~+l < s i n 4 ; + ~ sinA0i. i=i
Using (28) and (29) we derive from (21)
,
c [ s i n 0 1 + \ (l - ~ !
~sinA0i_-<~
i=1
]
sin~ 4
k(l-q)
Adding to this sum sin ~ and applying the first inequality implicit in (22) we find
sm Oj+~
< q (l-q)
[!2+k-q)c/#+(k2+2k-2kq-q)/k kc/~+k-q ]"
It can be checked easily that the fraction in brackets is always < 3 so that by definition of q <88 sin 0~+ 1
Newton's Method at Singularities
105
Thus we have shown that all iterates stay in the set { x = p t It e ~ , c o s - 1(t r s) < 0(s), 0 < p < ~(s)},
which is by definition of f3(s) a subset of ~. Since (26) and (27) hold for all j > 1 we see from (20) that limp J+lk j - ~ pj (k+t) so that we must have linear convergence at the asserted rate.
6. The M a i n Result
As an immediate consequence of L e m m a 5.1 we note that the union - 0 { ~ ( s ) Is e X ~ -
n o 1(o)}
is a starlike domain of convergence too. In the particular case ~Ar=IR" this is quite a strong result since the only directions s e S e c ~ Y excluded from ~/U are those for which /~(s) is singular, which makes the Jacobian Vf(ps) essentially dependent on the (2 + k)-th and higher derivatives. In general we can expect that m = d i m ( J # ) is small compared to n so that the directions included in represent only a small fraction of the full unit sphere L, 60 in IR". Fortunately we can show under the condition F/o(JV')=I={0} that for most directions t0eS~ the first step from some point x o = P o t o E ~ leads into ~ provided Po is sufficiently small. Recalling the condition (4) for a singularity of order k and the definition of/3(x) as the restriction of (6) to JK we can now state our main result. Theorem 6.1. Let f have a singularity of order k at x * e f - i ( 0 ) and J~(s) be
nonsingular for at least one seJV" then: (i) There is a nonnegative continuous function r: 5P--+IR such that the Newton iteration converges linearly to x* with common ratio k/(k + 1)from any initial point in the starlike domain ~-{x=x*+ptlt~,OO=O }
is given by the intersection of 6P with the solution set 11-l(O) of the nontrivial homogeneous polynomial H(x) =-Ho(Ho(x) g(x)) Ilo(x ). (iii) For any t e r - l ( O ) that is not necessarily excluded from all starlike domains of convergence to x* either of the following conditions must be satisfied -
llo(X ) or H i ( x ) = llo(IIo(x ) g(x)) attains a local extremum at t, dr+ i --g(t)=O and dtlp+l f(x*+l~t)lu=o=O
where p c [ l , k] is the smallest index for which Vp+i f(x*) is nontrivial.
106
A.O. Griewank
(iv) For Newton's method to have a spherical domain of convergence about x* it is sufficient that II o1(0)= {0} and necessary that H o is either nonnegative or nonpositive on N n (Assuming n + 1).
Proof. (i) Since ~3(s) and sin 4~(s) are bounded the function { a2(t)~(g(t)/[lg(t)l[) I,g(t)[,o-2(t)s2~(g(t)/[Ig(t),,) } r ( t ) = m i n g(t), (drb+ca(t)+a2(t)), is well defined and continuous on ~. Now we derive from (17) and (20) that for any x o = p o t o e ~
sin~pl(to)
) and
pl
so that xleff/'(g(to)/[[g(to)[] ) which implies the assertion by Lemma 5.1. (ii) Inspecting the individual terms in the definition of r(t) we see that r(t)=0 iff teg-l(0)=a-l(0)_=Hol(0) or otherwise g(t)eHol(0). It follows directly from the definition of g(x) in (15) that the m nontrivial components of Ho(x ) g(x) are homogeneous polynomials of degree km+ 1 in x so that lI(x) is a homogeneous polynomial of degree (km+2)km in xelR". Clearly t e ~ belongs to the solution set H-1(0) of H iff H0(t)=0
or otherwise Ho(g(t))=0
which shows that r - l ( 0 ) = H - l ( 0 ) n ~ For any x e J V we have g(x)Ho(x ) =xlIo(x ) and consequently H ( x ) = H kin+2 o (x) so that neither Ho(x ) nor H(x) can vanish identically as by assumption Ho(s ) = det (/~(s)) =I=0 for some s c ~ . (iii) We know from Lemma 3.1(iv) that t c H o l ( 0 ) must be tangential to d e t - l ( 0 ) and therefore by Corollary 2.2 necessarily excluded from any starlike domain of invertibility unless H 0 attains an extremum at t. Now suppose H 1 does not attain a local extremum at some te~c~HFl(O)-Flol(O), that is included in a starlike domain of convergence ~r with boundary function a. Since d is open and H o has the same sign in a sufficiently small neighbourhood of t in ~9~ there must be sequences tf--*t and t~-~t of included directions such that F/l(t;) [Ii(t + ) Ho(t;)k~ -- Ho(g(t[ )) < 0 < Ho(g(t+)) = Flo(t; )km . Since a is lower semi continuous and positive at t the Newton steps from y ; -= pj t ; and y~- ---#~ tJ- to z f and z~- respectively are well defined for #j smaller than some fij with fij--,a(t). Combining (14) with (9) we obtain det (z~-) = ~
1 [ kt~j ~kmMo(g(tj))+O(IZ~m) \k + 1]
so that pj < fi~ may be chosen sufficiently small such that det (z~-) < 0 < det (zf). Since V f - l ( x ) f ( x )
is continuous on a domain of invertibility there must be
Newton's Method at Singularities
107
multipIiers ~jE(0, 1) such that the Newton step from each
yj = ]Ij(C(jt j
+
(1
--
Orj)tf)~ d
leads to a point z F d e t - l ( O ) . By assumption ~ is a domain of convergence to x* = 0 so that we must have z j= 0 for all j. Since the yj#j are convex combinations of the tf and tf we find sj-yj/llyjll--+t so that t is tangential to the set of points from which Newton's method converges in one step. Writing y j= zjsj we derive from zj = 0 O=
Vf (yj) y j - f (y)= [zj [Tf(z jSj)-- ~ Vf (IJsj) d p] sj 0
_ PZy+ l Vp+ t f (x,) s~+ t + O(~+ ~). (p+l)! Here we have used the fact that we obtain in the limit
Wf(x*)=O for i = 2 ... p. After division by
Tp+ 1
dp+ 1
dpp+ ~ f(x* +#t)Iu= o = lira Vp+'f(x*) sp+' =0. j~oo
Because of (4) a similar argument applied to the identity shows that
P~t Vk+ if(x*) tk+' =\0, (B(t),
c;(t)) t=0
which implies g ( t ) = 0 as/~(t) is nonsingular. (iv) If H o l ( 0 ) = { 0 } the set ~ - l ( 0 ) = / T g l ( 0 ) m 5 : 6 = m i n {o(t)]teSP} > 0 and furthermore e=min
P~ 17.f(yj)yj=p,,~f(yj)
is
empty
so
that
2 ( k + l ) d ' min {r(t) l t e ~ , Ilg(t)l[ > 1} >0.
Now consider any point xj=pjtj with pj1 then pj<=r(t) so that convergence is guaranteed by (i). If Ilg(t)ll < 1 we obtain from (14)
( kk_ + d p , ~ Pj+t < \ k + 1
2k+1 ~ I P J < 2 k + 2 pj"
Thus we must have in any case at 1east linear convergence to x*. I f / 7 o attains positive and negative values there must be a tES"c~/7o 1(0) that is neither minimiser nor maximiser and therefore necessarily excluded by L e m m a 3.1(iv). Before discussing the theorem in the final section of this paper we illustrate its results on functions in two variables.
7. Examples in Two Dimensions After suitable nonsingular affine transformations any function f~C3(IR 2, IR 2) with a Jacobian of rank 1 at a first order singularity can be written in the form
108
A.O. Griewank
ax2+2y2 ) f(~)=
y+~ y2 O~X2 + f l x y + 2
SO that (
6x
,
ey
2
If 6 = 0 our results are not applicable as /7 0 vanishes identically. Otherwise we can use linear transformations to obtain 6 = 1, c~=0 and ee{ - 1, 0, + 1}. Thus we find /70(X, y)=X,
g(x, y) =((X 2 + gy2)/X, 0) T,
F/l(X ' y) =(X2 + gy2).
Consequently the set of excluded directions is given by 5~c~H_1(0)= {~ J" +(0, 1)} ({ +(0, 1), (_+1, _+ 1)/1/2}
if ee{0, 1} if e = - 1 .
Since H 0 attains positive and negative values in the neighbourhood of (0, 1) the y-axis is necessarily excluded from any starlike domain of invertibility so that 5Pc~H-I(0) is minimal if e > 0 . If e = - I the two straight lines {x=_+y} are mapped by g into the origin but H a attains positive and negetive values in their neighbourhood. Since p = k = 1 we have with 6 = 1 and ~ = 0 d2 d~ 2 f
# 0 p .=o= 2fl+y
(t
( )
and
-- f dp 2
# p .=o =
()
0 -2fl+y
(
)
so that by Theorem 6.1(iii) the directions {(+1, _+1)/1/2} are necessarily excluded whenever lYl + ]2B]. Secondly we consider the case where the Jacobian o f f e C3(IR 2, 11t2) vanishes at a first order singularity with nontrivial polynomial no. After suitable affine transformations we have with ee { - 1, 0, 1}
/ so that
x ,
e,y
We need only consider
17Jo(X,y)=o~X2 + flxy--o:e, y2=(X, y) 89
Newton's Method at Singularities
109
Depending on whether the determinant det(V21Io)=-(4~2E +f12)is positive, negative or zero there is a spherical domain of convergence, a minimal set of two necessarily excluded straight lines or one not necessarily excluded straight line respectively. The last case is particularly interesting as we have for ~ = 1 and fl=O=~ /1X2\
with
If the higher order terms are zero all points on the y-axis are solutions of f and Newton's method converges from all other points linearly to the particular solution x * = 0 . Even though x* is not an isolated solution we obtain from Theorem 6.1 a starlike domain of convergence to x* with only the y-axis excluded. In contrast the result is not applicable to the other solution points as the corresponding polynomials H o vanish identically. If the higher order terms are of the form (_ 88 0)r then both
and the determinant of the Jacobian
-Yx3)=x2+y 4
d e t ( V f ) = d e t (~ I
vanish only at the origin x * = 0 . The Newton iteration is given by
y2+,!
2\ys/
xZ+y 4
4
which yields linear convergence to x* from all points on the y-axis with ratio 88 and from all others with ratio 89 Consequently f has a spherical domain of convergence, which does not follow from Theorem 6.1(iv) as the condition / / o ' ( 0 ) = {0} is not met. If on the other hand f is of the form
f (;) = 89( Xy
'
then the determinant d e t ( x ) = x 2 - y 4 vanishes on the parabolas {x= +_y2} so that the y-axis must be necessarily excluded.
8. Discussion The solution sets of nontrivial real polynomials form a collection of lower dimensional manifolds and are therefore of measure zero. In the case of
110
A.O. Griewank
homogeneous polynomials the same is true for the intersection of the solution sets with the unit sphere 5C Therefore we can conclude that whenever the equivalent conditions det (/~(t)) =#0 . ~ IIvf - ~(p t)]l = O(p -k) are satisfied for at least one teJV" then almost all directions are included in the starlike domain of convergence N. Since nontrivial homogeneous polynomials are unbounded and all their stationary points must have zero value it is quite likely that they have no local extrema besides the origin. If this is the case and the set
{te S e - Hol(O)lg(t)=O , VP+ l f (x*) tP+ l=O} is empty then the set of directions excluded from N is minimal so that the boundary function of the maximal starlike domain of convergence to x* differs from r(t) only in size but not sign. For any ter-l(O) that satisfies either of the two conditions in Theorem 6.1(iii) the question whether it is necessarily excluded can only be decided on the basis of (2+p)-th and higher derivative information. Whenever the assumptions of Theorem 6.1 are satisfied the possibility of faster convergence from some initial point outside N seems rather remote. Unless converging in a finite number of steps the iteration would have to approach x* through the narrowing channels excluded from N, which would imply for the smallest singular value v(xj) of the Jacobian at the iteration points {xj = x* + p~tj} that lim v(x~) pfk=o. j~oo
Provided m < n the conditioning of the Jacobians Vf(x~) would therefore deteriorate rapidly so that rounding errors are likely to lead eventually to a point in N. Inside ~W the smallest singular value v declines by Lemma 3.1(iii) and Lemma 5.1 linearly with the ratio
[k/(k + 1)]ke [l/e, 1/2] which is bounded below by 1/e no matter how high the order of the singularity. This "cautious" approach to the singularity should enable Newton's method to "squeeze" the maximal accuracy out of the routines for the evaluation of f, the Jacobian Vf and the subsequent solution of the linear system VfAx = - f It will be shown in a second paper that the convergence of these slow but highly structured and reasonably robust Newton iterations can be accelerated by a Richardson extrapolation procedure. Whenever a singularity is irregular in that /~(t) is singular for all t ~ Y Newton's method must be expected to behave very irregularly, which was
Newton's Method at Singularities
111
o b s e r v e d b y G r i e w a n k a n d O s b o r n e [ 4 ] for k = 1 = m. I n fact s u c h u n p r e d i c t a b l e b e h a v i o u r s e e m s m o r e likely t h a n l i n e a r c o n v e r g e n c e w i t h r a t i o s o f 2/3, 3 / 4 etc., which only occurs under rather special circumstances.
Acknowledgements. The author wishes to acknowledge the benefit of many discussions with M.R. Osborne during the preparation of this paper.
References 1. Cavanagh R (1970) Difference equations and iterative methods. PhD Diss Univ of Maryland, College Park, MD 2. Decker DW, Kelly CT (1980) Newton's method at, singular points. I. SINUM 17:66-70 3. Goldstein A (1967) Constructive real analysis. Harper and Row New York 4. Griewank A, Osborne MR (1980) On Newton's method for singular problems. SIAM (in press) 5. Ortega JM, Reinboldt WC (1970) Iterative solution of nonlinear equations in several variables. Academic Press, New York 6. Rall L (1966) Convergence of the newton process to multiple solutions. Numer Math 9:23-27 7. Reddien GW (1978) On Newton's method for singular problems. SINUM 15:993-996 8. Reddien GW (1979) Newton's method and high order singularities. Comput Math Appl 5:79-86
Received January 2, 1980