Numer. Math. 59, 561-580 (1991)
Numerische Mathematik
© Springer-Verlag 1991

Quadratically constrained least squares and quadratic problems*

Gene H. Golub¹ and Urs von Matt²
¹ Department of Computer Science, Stanford University, Stanford, CA 94305, USA
² Institut für Wissenschaftliches Rechnen, ETH Zentrum, CH-8092 Zürich, Switzerland

Received May 31, 1990
Summary. We consider the following problem: Compute a vector x such that ||Ax − b||₂ = min, subject to the constraint ||x||₂ = α. A new approach to this problem based on Gauss quadrature is given. The method is especially well suited when the dimensions of A are large and the matrix is sparse. It is also possible to extend this technique to a constrained quadratic form: For a symmetric matrix A we consider the minimization of xᵀAx − 2bᵀx subject to the constraint ||x||₂ = α. Some numerical examples are given.

Mathematics Subject Classification (1991): 65F20
1 Introduction

Let A be a given m-by-n matrix and b a known vector. It is frequently desirable in various statistical applications to determine a vector x such that

(1)  ||Ax − b||₂ = min

and

(2)  ||x||₂ = α.

The parameter α is prescribed, and we assume that α < ||A⁺b||₂. This problem or equivalent ones arise in regularization techniques. The parameter α in (2) may only be known approximately, so it may not be necessary to solve the problem precisely.
* This work was in part supported by the National Science Foundation under Grant DCR-8412314 and by the National Institute of Standards and Technology under Grant 60NANB9D0908.
Offprint requests to: G.H. Golub
A related problem is the following: Let A be a symmetric real n-by-n matrix. We are interested in finding an x such that

(3)  xᵀAx − 2bᵀx = min

subject to the constraint

(4)  ||x||₂ = α.

We assume that the parameter α satisfies the condition α < ||(A − σI)⁺b||₂, where σ denotes a lower bound for the spectrum of A. One technique for solving such problems is to introduce a Lagrange multiplier λ. The determination of this parameter can be quite delicate and may involve the computation of the singular value decomposition (cf. [9]) or the eigenvalue decomposition. A complete theory of solving such problems is given in [3] and [4]. It is often not necessary to determine the Lagrange multiplier exactly; our technique yields a sequence of upper and lower bounds and enables us to compute an approximate solution x without many additional computations.

In Sect. 2 we consider the solution of the constrained least squares problem (1), and Sect. 3 analyses the solution of the constrained quadratic form (3). Since our analysis is strongly based on the theory of Gauss quadrature, we give a complete review of the necessary theory in Sect. 4. In Sects. 5 and 6 we present the postponed proofs of the two theorems in Sects. 2 and 3. Numerical results are presented in Sect. 7.
2 A constrained least squares problem

Let A be a general real m-by-n matrix and b a vector with A⁺b ≠ 0. We consider the problem of finding an x with

(5)  ||Ax − b||₂ = min,

(6)  ||x||₂ = α,

where the parameter α satisfies the condition

(7)  0 < α < ||A⁺b||₂.
2.1 Lagrange equations
In order to calculate the stationary points we set up the Lagrange principal function ~ :
9 (x, ,t):= I[Ax - / , [ I ~ + ,~(If xll 2 - ~2) = x T A T A x _ 2 x T A r b + bTb + 2 ( x T x _ ~2) .
Differentiating 9 with respect to x and 2 yields the Lagrange equations: (8)
ATAx + 2x = A~b , xTI
:
O~2 .
In [3], Gander analysed problems of this kind. By comparing the stationary points he showed that the largest value of λ will minimize (5). In addition, condition (7) ensures that this λ is the unique solution of (8) with λ > 0.
2.2 Bounds for the Lagrange multiplier

By combining the two equations in (8) we get the secular equation

(9)  f(λ) := (Aᵀb)ᵀ(AᵀA + λI)⁻²(Aᵀb) = α²,

that we must solve for the largest λ. Since f(λ) represents a nonlinear function, an iterative method is needed to solve (9). Consequently, the values of f(λ) and f′(λ) have to be evaluated for several different arguments. Each evaluation of f(λ) and f′(λ) can be accomplished by solving a linear least squares system of the form

  ‖ [A; √λ I] x − [b; 0] ‖₂ = min,

where [A; √λ I] denotes the matrix A stacked on top of √λ times the identity, or by computing the singular value decomposition of A. In the latter case we get an explicit expression for f(λ). In [3] these solutions are extensively analysed. Since these methods are applicable only for small matrices A, we have to resort to different techniques in the case of a large sparse matrix. We use the Lanczos algorithm [7] to compute the bidiagonalization

(10)  A = P B Qᵀ,

where P is an m-by-m orthogonal matrix with b = ||b|| p₁, Q is an n-by-n orthogonal matrix with Aᵀb = ||Aᵀb|| q₁, and

(11)  B := [ γ₁          ]
           [ δ₁  γ₂      ]
           [     δ₂   ⋱  ]
           [          ⋱  ]

is an m-by-n lower bidiagonal matrix. We present the procedure recommended by Paige and Saunders [16] as Algorithm 1.

Algorithm 1. Lanczos bidiagonalization of a sparse general m-by-n matrix A
Let b ≠ 0 be given.
w₀ := b
δ₀ := ||w₀||
q₀ := 0
for i := 1, 2, ... do
  k := (i + 1) div 2
  if odd(i) then
    p_k := w_{i−1}/δ_{k−1}
    w_i := Aᵀp_k − δ_{k−1} q_{k−1}
    γ_k := ||w_i||
  else
    q_k := w_{i−1}/γ_k
    w_i := A q_k − γ_k p_k
    δ_k := ||w_i||
  end
end

Now, the secular Eq. (9) has the form

(12)  f(λ) = ||Aᵀb||² e₁ᵀ(BᵀB + λI)⁻² e₁ = α².
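For a small dense matrix, the recurrences of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration only (the variable names mirror the γ_k, δ_k of the text); a production code for large sparse A would use matrix-vector products exclusively and typically needs some form of reorthogonalization, which is omitted here.

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Golub-Kahan bidiagonalization started from b.

    Returns P (m-by-k), Q (n-by-k) with orthonormal columns and the k-by-k
    lower bidiagonal B (diagonal gamma_1..gamma_k, subdiagonal delta_1..),
    so that in exact arithmetic A^T P_k = Q_k B_k^T.
    """
    m, n = A.shape
    P = np.zeros((m, k))
    Q = np.zeros((n, k))
    gamma = np.zeros(k)
    delta = np.zeros(k - 1)
    P[:, 0] = b / np.linalg.norm(b)            # b = delta_0 * p_1
    w = A.T @ P[:, 0]
    gamma[0] = np.linalg.norm(w)
    Q[:, 0] = w / gamma[0]                     # A^T b = ||A^T b|| q_1
    for j in range(1, k):
        w = A @ Q[:, j - 1] - gamma[j - 1] * P[:, j - 1]
        delta[j - 1] = np.linalg.norm(w)
        P[:, j] = w / delta[j - 1]
        w = A.T @ P[:, j] - delta[j - 1] * Q[:, j - 1]
        gamma[j] = np.linalg.norm(w)
        Q[:, j] = w / gamma[j]
    B = np.diag(gamma) + np.diag(delta, -1)
    return P, B, Q

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
P, B, Q = lanczos_bidiag(A, b, 4)
```

In exact arithmetic the identity AᵀP_k = Q_k B_kᵀ holds for every k, which gives a simple check of such a sketch.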
Naturally, it is too costly to compute the complete bidiagonalization (10). Rather, we will try to get by with the first k columns of B and compute an approximation of f(λ). Let us assume that we know a value σ ≥ 0 such that σ² represents a lower bound for all the eigenvalues of AᵀA, i.e. 0 ≤ σ² ≤ λ_min(AᵀA). If m ≥ n this condition is equivalent to 0 ≤ σ ≤ σ_min(A). Conversely, m < n implies σ = 0. In the absence of further information we can always set σ = 0. As the matrix BᵀB − σ²Iₙ is positive semidefinite, we can compute the Cholesky decomposition

(13)  BᵀB − σ²Iₙ = UᵀU,

where

(14)  U := [ φ₁  ψ₁         ]
           [     φ₂  ψ₂     ]
           [          ⋱  ⋱  ]

is an n-by-n upper bidiagonal matrix. Note that only the elements φ₁, …, φ_{min(m,n)} and ψ₁, …, ψ_{min(m,n)−1} are different from zero. Then, we have the following theorem:
Theorem 2.1. Let A be a general m-by-n matrix, and let the parameter σ ≥ 0 satisfy σ² ≤ λᵢ for all the eigenvalues λᵢ of AᵀA. Assume that the matrix A is decomposed into a lower bidiagonal matrix B according to Eq. (10). Furthermore, assume the knowledge of the decomposition (13). Define r to be the first index i for which γ_{i+1}δᵢ = 0, or min(m, n), if no such i exists. If λ > −σ² and 1 ≤ k < r, then the following inequality holds:

(15)  ||Aᵀb||² e₁ᵀ(U_kᵀU_k + (σ² + λ)I)⁻ᵐ e₁ ≤ (Aᵀb)ᵀ(AᵀA + λI)⁻ᵐ(Aᵀb)
        ≤ ||Aᵀb||² e₁ᵀ(Ũ_kᵀŨ_k + (σ² + λ)I)⁻ᵐ e₁.

Here m ≥ 1 denotes an arbitrary natural number, U_k denotes the first k columns of U, and Ũ_k is equal to U_k with the element φ_k replaced by the value of zero. For k = r, the equality

(16)  (Aᵀb)ᵀ(AᵀA + λI)⁻ᵐ(Aᵀb) = ||Aᵀb||² e₁ᵀ(U_rᵀU_r + (σ² + λ)I)⁻ᵐ e₁

applies.
The proof of this theorem will be presented in Sect. 5. In order to make use of Theorem 2.1, it is only necessary to compute P_k, B_k, Q_k, and U_k, where these quantities denote the first k columns of the corresponding matrices. Let us define the two functions

(17)  ℒ_k(λ) := ||Aᵀb||² e₁ᵀ(U_kᵀU_k + (σ² + λ)I)⁻² e₁,

(18)  𝒰_k(λ) := ||Aᵀb||² e₁ᵀ(Ũ_kᵀŨ_k + (σ² + λ)I)⁻² e₁.

Theorem 2.1 shows that

(19)  ℒ_k(λ) ≤ f(λ) ≤ 𝒰_k(λ)

for λ > −σ² and 1 ≤ k < r (see Fig. 1). It is therefore possible to compute the values λ̲ and λ̄ such that ℒ_k(λ̲) = α² and 𝒰_k(λ̄) = α². Obviously, λ̲ and λ̄ denote lower and upper bounds for the zero of (9). Finally, if k = r, we have the equality

(20)  f(λ) = ℒ_r(λ).
2.3 Calculation of the solution

When the value of λ is determined, we can compute the vector x from the linear least squares system

(21)  ‖ [A; √λ I] x − [b; 0] ‖₂ = min.

Fig. 1. Bounds on the secular function f(λ)
This can be accomplished by Algorithm LSQR by Paige and Saunders (cf. [16, 17]).
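The damped least squares system (21) is mathematically equivalent to the regularized normal equations (AᵀA + λI)x = Aᵀb, which is easy to confirm numerically. The following NumPy fragment is an illustrative sketch with a dense solver and an arbitrary fixed λ; it is not the LSQR code of [16, 17]:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
lam = 0.5                               # some fixed multiplier, for illustration

# Damped system (21): minimize || [A; sqrt(lam) I] x - [b; 0] ||_2
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(6)])
b_aug = np.concatenate([b, np.zeros(6)])
x_aug = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# Regularized normal equations: (A^T A + lam I) x = A^T b
x_ne = np.linalg.solve(A.T @ A + lam * np.eye(6), A.T @ b)
```

Both formulations give the same x; the stacked form is preferred numerically because it avoids forming AᵀA.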
3 A constrained quadratic form

Let A be a symmetric real n-by-n matrix. We are interested in finding an x such that

(22)  xᵀAx − 2bᵀx = min,

(23)  ||x||₂ = α.

In order to exclude some disturbing special cases, we assume that the parameter α satisfies the condition

(24)  0 < α < ||(A − σI)⁺b||₂,

where σ denotes a lower bound for the spectrum of A.
3.1 Lagrange equations

In order to calculate the stationary points we set up the Lagrange principal function Φ:

  Φ(x, λ) := xᵀAx − 2bᵀx + λ(xᵀx − α²).

Differentiating Φ with respect to x and λ yields the Lagrange equations:

(25)  Ax + λx = b,  xᵀx = α².

In [4] this problem is extensively analysed. The conclusion is that, under the given assumptions, there is a unique λ > −σ for which (22) is minimal.
3.2 Bounds for the Lagrange multiplier

By combining the two equations in (25) we get the secular equation

(26)  f(λ) := bᵀ(A + λI)⁻²b = α²,

that we must solve for a λ > −σ. Equation (26) must be solved by an iterative method which requires the repeated evaluation of f(λ) for different values of λ. This could be accomplished by either solving a sequence of linear systems or by computing the eigenvalue decomposition of A. In the latter case we get an explicit rational function for f(λ). The details of these two approaches are discussed in [4]. For a large and sparse matrix A these direct approaches become infeasible. What is still possible is the evaluation of the product Ax. In particular, we can use the Lanczos process to compute the tridiagonalization

(27)  A = Q S Qᵀ,
where Q is an orthogonal matrix with b = ||b|| q₁, and

(28)  S := [ γ₁  δ₁                 ]
           [ δ₁  γ₂   ⋱            ]
           [     ⋱    ⋱    δ_{n−1} ]
           [         δ_{n−1}   γₙ  ]

is a tridiagonal matrix. The implementation of the Lanczos process recommended by Paige [15] is presented as Algorithm 2.

Algorithm 2. Lanczos tridiagonalization of a sparse symmetric n-by-n matrix A

Let b ≠ 0 be given.
w₀ := b
δ₀ := ||w₀||
q₀ := 0
for i := 1, 2, ... do
  k := (i + 1) div 2
  if odd(i) then
    q_k := w_{k−1}/δ_{k−1}
    u_k := A q_k − δ_{k−1} q_{k−1}
    γ_k := q_kᵀ u_k
  else
    w_k := u_k − γ_k q_k
    δ_k := ||w_k||
  end
end
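Algorithm 2 admits an equally short NumPy sketch (again without the reorthogonalization a robust Lanczos code would need; the names γ_k, δ_k follow the text):

```python
import numpy as np

def lanczos_tridiag(A, b, k):
    """k steps of the Lanczos process for symmetric A, started from b.

    Returns Q (n-by-k, orthonormal columns, Q[:, 0] = b/||b||) and the
    k-by-k symmetric tridiagonal S with diagonal gamma, off-diagonal delta.
    """
    n = A.shape[0]
    Q = np.zeros((n, k))
    gamma = np.zeros(k)
    delta = np.zeros(k)
    q_prev = np.zeros(n)
    w = b.astype(float)
    d_prev = np.linalg.norm(b)             # delta_0 = ||w_0||
    for j in range(k):
        Q[:, j] = w / d_prev
        u = A @ Q[:, j] - d_prev * q_prev
        gamma[j] = Q[:, j] @ u
        w = u - gamma[j] * Q[:, j]
        delta[j] = np.linalg.norm(w)
        q_prev = Q[:, j]
        d_prev = delta[j]
    S = np.diag(gamma) + np.diag(delta[:k - 1], 1) + np.diag(delta[:k - 1], -1)
    return Q, S

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 10))
A = (M + M.T) / 2                          # a symmetric test matrix
b = rng.standard_normal(10)
Q, S = lanczos_tridiag(A, b, 5)
```

In exact arithmetic Q_kᵀ A Q_k = S_k, which is what the small test below relies on.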
The secular Eq. (26) can then be written as

(29)  f(λ) = ||b||² e₁ᵀ(S + λI)⁻² e₁ = α².

Of course, we do not want to compute the whole n-by-n matrix S, since this would be too costly. Instead, we would like to approximate f(λ) by S_k, the first k-by-k leading principal submatrix of S. Since S − σI is a positive semidefinite matrix there always exists the Cholesky decomposition

(30)  S − σI = UᵀU,

where

(31)  U := [ φ₁  ψ₁                    ]
           [      ⋱    ⋱              ]
           [         φ_{n−1}  ψ_{n−1} ]
           [                    φₙ    ]

is an n-by-n upper bidiagonal matrix. Now, the following theorem holds:
Theorem 3.1. Let A be a symmetric matrix, and let the parameter σ satisfy σ ≤ λᵢ for all the eigenvalues λᵢ of A. Assume that the matrix A is decomposed into a tridiagonal matrix S according to Eq. (27). Furthermore, let the matrix S have the decomposition (30). Define r to be the index i of the first δᵢ = 0, or n, if no such δᵢ exists. If λ > −σ and 1 ≤ k < r, then the following inequality holds:

(32)  ||b||² e₁ᵀ(U_kᵀU_k + (σ + λ)I)⁻ᵐ e₁ ≤ bᵀ(A + λI)⁻ᵐ b
        ≤ ||b||² e₁ᵀ(Ũ_kᵀŨ_k + (σ + λ)I)⁻ᵐ e₁.

Here m ≥ 1 denotes an arbitrary natural number, U_k denotes the first k columns of U, and Ũ_k is equal to U_k with the element φ_k replaced by the value of zero. For k = r, the equality

(33)  bᵀ(A + λI)⁻ᵐ b = ||b||² e₁ᵀ(U_rᵀU_r + (σ + λ)I)⁻ᵐ e₁

applies.

The proof of this theorem is deferred until Sect. 6. In order to make use of Theorem 3.1 it is only necessary to compute S_k, Q_k, and U_k, where these quantities denote the first k columns of the corresponding matrices. Let us define the two functions

(34)  ℒ_k(λ) := ||b||² e₁ᵀ(U_kᵀU_k + (σ + λ)I)⁻² e₁,

(35)  𝒰_k(λ) := ||b||² e₁ᵀ(Ũ_kᵀŨ_k + (σ + λ)I)⁻² e₁.

By Theorem 3.1

(36)  ℒ_k(λ) ≤ f(λ) ≤ 𝒰_k(λ)

for λ > −σ and 1 ≤ k < r. Now, we can compute the values λ̲ and λ̄ such that ℒ_k(λ̲) = α² and 𝒰_k(λ̄) = α². Obviously, λ̲ and λ̄ denote lower and upper bounds for the zero of (26). Finally, if k = r, we have the equality

(37)  f(λ) = ℒ_r(λ).
3.3 Calculation of the solution

If the value of λ is known, the vector x is given as the solution of the linear system

(38)  (A + λI) x = b

with the symmetric and positive definite matrix A + λI. There is an abundant literature about this problem. The reader is referred to [8, 9] for an overview of available methods.
4 Review of Gauss quadrature

In this section, we summarize some facts from the theory of Gauss quadrature which are needed to prove Theorems 2.1 and 3.1. For a more detailed discussion of this topic, see [2].
4.1 Matrix moments

Let us define the symmetric, positive definite n-by-n matrix T as

(39)  T := [ α₁  β₁                 ]
           [ β₁  α₂   ⋱            ]
           [     ⋱    ⋱    β_{n−1} ]
           [         β_{n−1}   αₙ  ]

We assume throughout our discussion that βᵢ ≠ 0. The eigenvalue decomposition of T is given by

(40)  T = Q D Qᵀ,

where D is diagonal and QᵀQ = Iₙ. It is well known [18, p. 124] that T possesses n real, distinct eigenvalues 0 < δ₁ < δ₂ < ⋯ < δₙ. The expression

(41)  μ₋ₘ := e₁ᵀ T⁻ᵐ e₁

with m ≥ 1 is called the (−m)-th matrix moment of T. It can be written as

(42)  μ₋ₘ = e₁ᵀ Q D⁻ᵐ Qᵀ e₁ = Σᵢ₌₁ⁿ q₁ᵢ²/δᵢᵐ = ∫ₐᵇ (1/xᵐ) dω(x),

where we have read the weighted sum as a Stieltjes integral of the "integrand"

(43)  φ(x) := 1/xᵐ.

By the weight function ω(x), we mean a staircase function with steps of size q₁ᵢ² at the eigenvalues δᵢ. The limits a and b in (42) bound the spectrum of T, i.e. 0 < a ≤ δ₁ and δₙ ≤ b.
4.2 Orthogonal polynomials

By means of the matrix T we can immediately define a sequence of polynomials {pᵢ} which satisfy the three term recurrence relationship:

(44)  p₀ = 1,
      α₁p₀ + β₁p₁ = x p₀,
      β_{i−1}p_{i−2} + αᵢp_{i−1} + βᵢpᵢ = x p_{i−1},  i = 2, …, n.

Obviously, pᵢ is of degree i. With the definition

(45)  p(x) := [p₀(x), …, p_{n−1}(x)]ᵀ,

Eq. (44) can be written in matrix terms as

(46)  x p(x) = T p(x) + βₙ pₙ(x) eₙ.
Let qᵢ be the eigenvector of T corresponding to the eigenvalue δᵢ with qᵢᵀqᵢ = 1. We now compare qᵢ with p(δᵢ), the polynomials p₀, …, p_{n−1} evaluated at δᵢ. It is easy to see that the elements of qᵢ and p(δᵢ) satisfy the same three term recurrence relationship, but with a different initial value. We deduce that qᵢ and p(δᵢ) are equal up to a scaling factor:

(47)  qᵢ = q₁ᵢ p(δᵢ).

By introducing the two matrices

(48)  P := [ p(δ₁), …, p(δₙ) ]  and  W := diag(q₁₁, …, q₁ₙ),

equation (47) can be written as

(49)  Q = P W.

By means of Eq. (49) we can evaluate the integral

(50)  ∫ₐᵇ p_k(x) p_l(x) dω(x) = Σᵢ₌₁ⁿ q₁ᵢ² p_k(δᵢ) p_l(δᵢ) = e_{k+1}ᵀ P W² Pᵀ e_{l+1} = δ_{kl},

where δ_{kl} = 1 if k = l, and 0 if k ≠ l. It is obvious from (50) that p₀, …, p_{n−1} are the orthogonal polynomials with respect to the weight function ω(x). Consequently, the matrix T not only defines the weight function ω(x) but it also contains the coefficients of the three term recurrence relationship of the orthogonal polynomials p₀, …, p_{n−1}. This observation represents the key for approximating (41) by means of Gauss quadrature:

(51)  μ₋ₘ = ∫ₐᵇ φ(x) dω(x) ≈ Σᵢ₌₁ᵏ wᵢ φ(xᵢ).

Note that the integration of the constant function φ(x) = 1 yields the value of μ₀ = 1, which means that the weight function ω(x) is already normalized.
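The identity (42) is easy to verify numerically: the weights q₁ᵢ² are read off from the first row of the eigenvector matrix of a small tridiagonal T. The numbers below are arbitrary illustrative data:

```python
import numpy as np

# A small symmetric tridiagonal positive definite T as in (39)
alpha = np.array([4.0, 5.0, 6.0, 7.0])
beta = np.array([1.0, 0.5, 0.25])
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

m = 2
mu = np.linalg.matrix_power(np.linalg.inv(T), m)[0, 0]   # e1^T T^{-m} e1

# Stieltjes sum: eigenvalues delta_i with weights q_{1i}^2
delta, Q = np.linalg.eigh(T)
stieltjes = np.sum(Q[0, :] ** 2 / delta ** m)
```

The normalization μ₀ = Σᵢ q₁ᵢ² = 1 is simply the unit norm of the first row of an orthogonal matrix.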
4.3 Gauss-rule

We now describe how to use a Gauss-rule for computing (51). We already know that the orthogonal polynomials with respect to the weight function ω(x) satisfy a three term recurrence relationship described by the matrix T. The Gauss-rule can therefore be determined according to Algorithm 3 (cf. [10]).
Algorithm 3. Gauss quadrature rule

Let T_k be the k-by-k leading principal submatrix of T.
Compute the eigenvalue decomposition T_k = W X Wᵀ.
The abscissas xᵢ are given by xᵢᵢ.
The weights wᵢ are given by μ₀w₁ᵢ².

The error law states [13, p. 104]

(52)  E[φ] := ∫ₐᵇ φ(x) dω(x) − Σᵢ₌₁ᵏ wᵢφ(xᵢ) = (φ^{(2k)}(ξ)/(2k)!) ∫ₐᵇ Πᵢ₌₁ᵏ (x − xᵢ)² dω(x),

where 0 < δ₁ < ξ < δₙ. Since

      φ^{(2k)}(ξ) = ((m + 2k − 1)!/(m − 1)!) (−1)^{2k}/ξ^{m+2k} > 0,

the error E[φ] is guaranteed to be positive for all m ≥ 1. Therefore, the quadrature rule

(53)  Σᵢ₌₁ᵏ wᵢφ(xᵢ) = e₁ᵀ W X⁻ᵐ Wᵀ e₁ = e₁ᵀ T_k⁻ᵐ e₁

provides a lower bound

(54)  e₁ᵀ T⁻ᵐ e₁ ≥ e₁ᵀ T_k⁻ᵐ e₁

for (41).
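Algorithm 3 and the lower bound (54) can be checked on a small example (arbitrary test data); the Gauss rule over the eigenvalues of T_k reproduces e₁ᵀT_k⁻ᵐe₁ exactly:

```python
import numpy as np

alpha = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
beta = np.array([1.0, 0.8, 0.6, 0.4])
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
m = 3
exact = np.linalg.matrix_power(np.linalg.inv(T), m)[0, 0]   # e1^T T^{-m} e1

k = 2
Tk = T[:k, :k]                       # leading principal submatrix
xs, W = np.linalg.eigh(Tk)           # abscissas x_i and eigenvectors
gauss = np.sum(W[0, :] ** 2 / xs ** m)   # weights mu0 * w_{1i}^2, mu0 = 1
lower = np.linalg.matrix_power(np.linalg.inv(Tk), m)[0, 0]
```

Here `gauss` and `lower` coincide, and both stay below `exact`, as (53)-(54) assert.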
4.4 Gauss-Radau-rule

A second approach to computing a quadrature rule consists in prescribing the abscissa x₁ = a. Then, the corresponding Gauss-Radau-rule can be determined according to Algorithm 4 (cf. [5]). The error law states [13, p. 168]

(55)  E[φ] := ∫ₐᵇ φ(x) dω(x) − Σᵢ₌₁ᵏ wᵢφ(xᵢ)
        = (φ^{(2k−1)}(ξ)/(2k − 1)!) ∫ₐᵇ (x − x₁) Πᵢ₌₂ᵏ (x − xᵢ)² dω(x),

where 0 < x₁ < ξ < δₙ. Since

      φ^{(2k−1)}(ξ) = ((m + 2k − 2)!/(m − 1)!) (−1)^{2k−1}/ξ^{m+2k−1} < 0,

the error E[φ] is guaranteed to be negative for all m ≥ 1. Therefore, the quadrature rule

(56)  Σᵢ₌₁ᵏ wᵢφ(xᵢ) = e₁ᵀ W X⁻ᵐ Wᵀ e₁ = e₁ᵀ T̃_k⁻ᵐ e₁

provides an upper bound

(57)  e₁ᵀ T⁻ᵐ e₁ ≤ e₁ᵀ T̃_k⁻ᵐ e₁

for (41).

Algorithm 4. Gauss-Radau quadrature rule
if k = 1 then
  The abscissa x₁ is prescribed.
  w₁ := μ₀
else
  Let T_{k−1} be the (k − 1)-by-(k − 1) leading principal submatrix of T.
  Solve (T_{k−1} − x₁I) u = e_{k−1} for u_{k−1}.
  α̃_k := x₁ + β²_{k−1} u_{k−1}
  Calculate the eigenvalue decomposition

    T̃_k := [ α₁  β₁                    ]
           [ β₁   ⋱    ⋱              ]
           [      ⋱  α_{k−1}  β_{k−1} ]
           [         β_{k−1}   α̃_k   ]  =  W X Wᵀ

  The abscissas xᵢ are given by xᵢᵢ.
  The weights wᵢ are given by μ₀w₁ᵢ².
end
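A small numerical sketch of Algorithm 4 (equivalently, of formula (63) in Theorem 4.1): the prescribed abscissa x₁ is chosen below the smallest eigenvalue of an arbitrary test matrix, and the resulting T̃_k yields the upper bound (57):

```python
import numpy as np

alpha = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
beta = np.array([1.0, 0.8, 0.6, 0.4])
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
m = 3
exact = np.linalg.matrix_power(np.linalg.inv(T), m)[0, 0]

x1 = 2.0                              # prescribed abscissa, 0 < x1 <= lambda_min(T)
k = 3
Tk1 = T[:k - 1, :k - 1]               # leading (k-1)-by-(k-1) submatrix
e = np.zeros(k - 1)
e[-1] = 1.0
u = np.linalg.solve(Tk1 - x1 * np.eye(k - 1), e)
alpha_tilde = x1 + beta[k - 2] ** 2 * u[-1]

T_tilde = T[:k, :k].copy()
T_tilde[k - 1, k - 1] = alpha_tilde   # modified last diagonal element
upper = np.linalg.matrix_power(np.linalg.inv(T_tilde), m)[0, 0]
```

By construction x₁ is an eigenvalue of T̃_k, i.e. a prescribed abscissa of the rule.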
In order to complete the argument, we could also compute a lower bound by prescribing the abscissa x_k = b. But since the ordinary Gauss-rule already provides a lower bound, we will not pursue this approach any further. Similarly, a Gauss-Lobatto quadrature rule could be used to get an upper bound by setting x₁ = a and x_k = b.
4.5 Summary

We are now able to allow for vanishing off-diagonal elements βᵢ. Let r be the index i of the first βᵢ = 0, or n, if no such βᵢ exists. Obviously, (54) and (57) hold for k < r, and for k = r, we have the equality

(58)  e₁ᵀ T⁻ᵐ e₁ = e₁ᵀ T_r⁻ᵐ e₁.
To make a long story short, we have proved

Theorem 4.1. Let

(59)  T := [ α₁  β₁                 ]
           [ β₁  α₂   ⋱            ]
           [     ⋱    ⋱    β_{n−1} ]
           [         β_{n−1}   αₙ  ]

be a symmetric tridiagonal positive definite matrix. Let r be the index i of the first βᵢ = 0, or n, if no such βᵢ exists. Assume the knowledge of the parameter x₁ satisfying 0 < x₁ ≤ λⱼ for all the eigenvalues λⱼ of T. If 1 ≤ k < r, then the following inequality holds:

(60)  e₁ᵀ T_k⁻ᵐ e₁ ≤ e₁ᵀ T⁻ᵐ e₁ ≤ e₁ᵀ T̃_k⁻ᵐ e₁.

Here m ≥ 1 denotes an arbitrary natural number, T_k is the k-by-k leading principal submatrix of T, and T̃_k is computed by

(61)  T̃_k := [x₁],

if k = 1, or by

(62)  T̃_k := [ α₁  β₁                    ]
             [ β₁   ⋱    ⋱              ]
             [      ⋱  α_{k−1}  β_{k−1} ]
             [         β_{k−1}   α̃_k   ],

(63)  α̃_k := x₁ + β²_{k−1} e_{k−1}ᵀ (T_{k−1} − x₁I)⁻¹ e_{k−1},

if k > 1. Finally, if k = r, the equality

(64)  e₁ᵀ T⁻ᵐ e₁ = e₁ᵀ T_r⁻ᵐ e₁

holds.

Theorem 4.1 represents the main result of this section. We will make use of it in the next two sections in order to give the postponed proofs of Theorems 2.1 and 3.1.
5 Regularized normal equations

We can now present the proof of Theorem 2.1. Let us consider the secular function f(λ) as given in Eq. (12). In a first step, we will identify the quantities T and x₁ of Sect. 4.5. Our assumption on σ implies that BᵀB − σ²I is a symmetric positive semidefinite matrix, and, because of σ² + λ > 0, the matrix BᵀB − σ²I + (σ² + λ)I = BᵀB + λI is positive definite with σ² + λ as a lower bound for all the eigenvalues. This observation leads us to the definitions

(65)  T := BᵀB + λI,

(66)  x₁ := σ² + λ.

It is now clear that the value of r, as it has been defined in Theorem 2.1, denotes the index i of the first vanishing off-diagonal element βᵢ in T. If we take the Cholesky decomposition (13) into account, we can express the matrix T by

(67)  T = UᵀU + (σ² + λ)I.

Thereby, the assumptions of Theorem 4.1 are satisfied, and the first inequality in (60) reads e₁ᵀT_k⁻ᵐe₁ ≤ e₁ᵀT⁻ᵐe₁, or, in the original notation,

  e₁ᵀ(U_kᵀU_k + (σ² + λ)I)⁻ᵐ e₁ ≤ e₁ᵀ(BᵀB + λI)⁻ᵐ e₁
    = e₁ᵀ(QᵀAᵀAQ + λI)⁻ᵐ e₁
    = e₁ᵀQᵀ(AᵀA + λI)⁻ᵐ Q e₁
    = (Aᵀb)ᵀ(AᵀA + λI)⁻ᵐ(Aᵀb) / ||Aᵀb||².

On the other hand, Theorem 4.1 also implies that

  e₁ᵀT⁻ᵐe₁ ≤ e₁ᵀT̃_k⁻ᵐe₁.

The value of α̃_k in (63) can be calculated differently by considering (66) and (67):

  α̃_k = σ² + λ + (ψ_{k−1}φ_{k−1})² e_{k−1}ᵀ(U_{k−1}ᵀU_{k−1} + (σ² + λ)I − (σ² + λ)I)⁻¹ e_{k−1}
      = σ² + λ + (ψ_{k−1}φ_{k−1})² e_{k−1}ᵀ(U_{k−1}ᵀU_{k−1})⁻¹ e_{k−1}
      = σ² + λ + (ψ_{k−1}φ_{k−1})² / φ_{k−1}²
      = σ² + λ + ψ_{k−1}².

We can represent T̃_k in (61) and (62) by

(68)  T̃_k = Ũ_kᵀŨ_k + (σ² + λ)I,

where Ũ_k is the matrix U_k with the element φ_k replaced by φ̃_k. The value of φ̃_k satisfies

  α̃_k = φ̃_k² + ψ²_{k−1} + σ² + λ = σ² + λ + ψ²_{k−1},

from which φ̃_k = 0 follows. Therefore, the second inequality of Theorem 4.1 now reads

  (Aᵀb)ᵀ(AᵀA + λI)⁻ᵐ(Aᵀb) / ||Aᵀb||² ≤ e₁ᵀ(Ũ_kᵀŨ_k + (σ² + λ)I)⁻ᵐ e₁.

Thereby, we have proved Theorem 2.1.
6 Regularized linear system

This section contains the deferred proof of Theorem 3.1. We consider the problem of giving lower and upper bounds to f(λ) in (29).

In a first step, we will identify the quantities T and x₁ of Sect. 4.5. Our assumption on σ implies that S − σI is a positive semidefinite matrix, and, because of σ + λ > 0, the matrix S − σI + (σ + λ)I = S + λI is positive definite with σ + λ > 0 as a lower bound for all the eigenvalues. Therefore let

(69)  T := S + λI,

(70)  x₁ := σ + λ.

By making use of the Cholesky decomposition (30), the matrix T can be expressed by

(71)  T = UᵀU + (σ + λ)I.

Now, Theorem 4.1 states

  e₁ᵀT_k⁻ᵐe₁ ≤ e₁ᵀT⁻ᵐe₁,

or, in the original notation,

  e₁ᵀ(U_kᵀU_k + (σ + λ)I)⁻ᵐ e₁ ≤ e₁ᵀ(S + λI)⁻ᵐ e₁
    = e₁ᵀ(QᵀAQ + λI)⁻ᵐ e₁
    = e₁ᵀQᵀ(A + λI)⁻ᵐ Q e₁
    = bᵀ(A + λI)⁻ᵐ b / ||b||².

On the other hand, Theorem 4.1 also implies that

  e₁ᵀT⁻ᵐe₁ ≤ e₁ᵀT̃_k⁻ᵐe₁.

The value of α̃_k in (63) can be calculated differently by considering (70) and (71):

  α̃_k = σ + λ + (φ_{k−1}ψ_{k−1})² e_{k−1}ᵀ(U_{k−1}ᵀU_{k−1} + (σ + λ)I − (σ + λ)I)⁻¹ e_{k−1}
      = σ + λ + (φ_{k−1}ψ_{k−1})² e_{k−1}ᵀ(U_{k−1}ᵀU_{k−1})⁻¹ e_{k−1}
      = σ + λ + (φ_{k−1}ψ_{k−1})² / φ_{k−1}²
      = σ + λ + ψ²_{k−1}.

We can represent T̃_k in (61) and (62) by

(72)  T̃_k = Ũ_kᵀŨ_k + (σ + λ)I,

where Ũ_k is the matrix U_k with the element φ_k replaced by φ̃_k. The value of φ̃_k satisfies

  α̃_k = φ̃_k² + ψ²_{k−1} + σ + λ = σ + λ + ψ²_{k−1},

from which φ̃_k = 0 follows. Therefore, the second inequality of Theorem 4.1 now reads

  bᵀ(A + λI)⁻ᵐ b / ||b||² ≤ e₁ᵀ(Ũ_kᵀŨ_k + (σ + λ)I)⁻ᵐ e₁.

Thereby, we have proved Theorem 3.1.
7 Numerical results

In this section, we present some numerical examples for the constrained least squares problem (5).

7.1 Zero finder

In Sects. 2 and 3 we encounter the problem of solving rational equations of the form

(73)  f(λ) = Σᵢ₌₁ⁿ cᵢ/(xᵢ + λ)² = α²

with nonnegative coefficients cᵢ. In each case we know a parameter ξ with xᵢ ≥ ξ. Additionally, we assume that there exists a solution λ of (73) with λ > −ξ. We are interested in finding an iteration process that will find the desired zero reliably within a few steps.

First, let us assume that an initial value λ₀ > −ξ with f(λ₀) > α² is available. Then, we can apply the iteration

(74)  λ_{k+1} := λ_k − 2 (f(λ_k)/f′(λ_k)) (√(f(λ_k))/α − 1)

with the numerical termination criterion λ_{k+1} ≤ λ_k. This method is discussed in more detail in [4, 19].

If, however, only an initial value λ₀ > −ξ with f(λ₀) < α² is known, we can derive another iteration process if we approximate f(λ) by the replacement function

(75)  g(λ) := a/(ξ + λ)² − b,

where

(76)  a = −½ (ξ + λ_k)³ f′(λ_k),
(77)  b = −½ (ξ + λ_k) f′(λ_k) − f(λ_k),

are chosen such that g(λ_k) = f(λ_k) and g′(λ_k) = f′(λ_k). It is possible to show that g(λ) ≥ f(λ) for −ξ < λ ≤ λ_k. Again, the zero λ_{k+1} of g(λ) = α² represents an upper bound for the desired zero of (73). Therefore, we get the iteration process

(78)  λ_{k+1} = −ξ + (ξ + λ_k) √( (ξ + λ_k) f′(λ_k) / ((ξ + λ_k) f′(λ_k) + 2(f(λ_k) − α²)) )

with the numerical termination criterion λ_{k+1} ≥ λ_k. This scheme will find the zero λ > −ξ of (73) if it is not too close to −ξ.
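The second iteration can be sketched as follows; the data xᵢ, cᵢ, ξ, and α² below are hypothetical, chosen only to exercise the formulas (75)-(77) on a small secular equation of the form (73):

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0])     # hypothetical x_i, all >= xi
c = np.array([3.0, 1.0, 2.0])     # hypothetical nonnegative coefficients
xi = 1.0
alpha2 = 0.5                      # target value alpha^2

def f(lam):
    return np.sum(c / (x + lam) ** 2)

def fp(lam):
    return -2.0 * np.sum(c / (x + lam) ** 3)

lam = 3.0                         # starting point with f(lam) < alpha^2
for _ in range(100):
    a = -0.5 * (xi + lam) ** 3 * fp(lam)        # (76)
    bb = -0.5 * (xi + lam) * fp(lam) - f(lam)   # (77)
    lam_new = np.sqrt(a / (alpha2 + bb)) - xi   # zero of g(lam) = alpha^2
    if lam_new >= lam:                          # numerical termination criterion
        break
    lam = lam_new
```

Each iterate is an upper bound for the zero, and the sequence decreases monotonically until rounding stops the progress.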
7.2 Construction of test cases

The matrix A was constructed synthetically such that it had a prescribed distribution of its singular values. This was accomplished by setting

(79)  A := U Σ Vᵀ,

where

(80)  U = I_m − 2 uuᵀ/||u||₂²,

(81)  V = I_n − 2 vvᵀ/||v||₂²,

are Householder matrices and

(82)  Σ = diag(σ_k)

is an m-by-n diagonal matrix. The vectors u, v, and b contain pseudorandom numbers:

  u₁ = 13846,  uᵢ = (31416 u_{i−1} + 13846) mod 46261,  i = 2, …, m,
  v₁ = 13846,  vᵢ = (42108 v_{i−1} + 13846) mod 46273,  i = 2, …, n,
  b₁ = 13846,  bᵢ = (45278 b_{i−1} + 13846) mod 46219,  i = 2, …, m.
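The construction (79)-(82) can be sketched directly; because U and V are Householder reflections, products with A = UΣVᵀ never require forming A explicitly. The dimensions below are toy values for illustration, while the congruential recurrences follow the text:

```python
import numpy as np

def lcg(seed, a, c, mod, length):
    """u_1 = seed, u_i = (a*u_{i-1} + c) mod m, as in the text."""
    out = np.empty(length)
    val = seed
    for i in range(length):
        out[i] = val
        val = (a * val + c) % mod
    return out

m, n = 8, 5                                    # toy dimensions
u = lcg(13846, 31416, 13846, 46261, m)
v = lcg(13846, 42108, 13846, 46273, n)
b = lcg(13846, 45278, 13846, 46219, m)         # right-hand side of the test problem

sigma = np.exp(-np.arange(1, n + 1) / 1000.0)  # prescribed singular values

def matvec(xvec):
    """y = A x with A = U Sigma V^T, using only the reflection vectors."""
    y = xvec - 2.0 * (v @ xvec) / (v @ v) * v           # V^T x (V is symmetric)
    y = np.concatenate([sigma * y, np.zeros(m - n)])    # Sigma y (m-by-n diagonal)
    return y - 2.0 * (u @ y) / (u @ u) * u              # U y
```

Since U and V are orthogonal, the singular values of A are exactly the prescribed σ_k.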
7.3 Stopping criterion

The question remains when the values of λ̲ and λ̄ represent a sufficiently accurate approximation of the zero λ of (9). In many applications the constraint (6) need not be satisfied exactly, but it suffices if the norm of the solution x satisfies the inequality

(83)  α̲ ≤ ||x||₂ ≤ ᾱ.

In this case it makes sense to determine the values of λ̲ and λ̄ such that ℒ_k(λ̲) = α̲² and 𝒰_k(λ̄) = ᾱ². Then the condition

(84)  λ̄ ≤ λ̲

represents a sufficient stopping criterion, since this implies that each x corresponding to a λ with λ̄ ≤ λ ≤ λ̲ satisfies constraint (83). The reader is also referred to Fig. 1.

For our test problems we started with a given α and set

  α̲ := 0.999995 · α,  ᾱ := 1.000005 · α.

This ensures that the constraint (6) is satisfied to five significant digits. A summary of the resulting procedure is given in Algorithm 5.
Algorithm 5. Solution of the constrained least squares problem (5)

k := 0
done := false
compute γ₁ according to Algorithm 1
while not done do
  k := k + 1
  compute δ_k according to Algorithm 1
  if δ_k ≠ 0 then
    compute γ_{k+1} according to Algorithm 1
  end
  compute φ_k and ψ_{k−1} of decomposition (13)
  solve ℒ_k(λ) = α̲² for λ̲
  if δ_k = 0 then
    done := true
  elseif γ_{k+1} = 0 then
    done := true
  elseif δ_k γ_{k+1} = 0 then
    done := true
  else
    solve 𝒰_k(λ) = ᾱ² for λ̄
    if λ̄ ≤ λ̲ then
      done := true
    end
  end
end
use Algorithm LSQR to solve (21) for x
7.4 Results

Tables 1-3 present a series of test cases. The calculations have been performed in single precision (32 bit reals, about 7 decimal digits) on a Sun Sparcstation 1+ equipped with a Weitek 3170 floating point unit running at 25 MHz. The value of k indicates the number of iterations in Algorithm 5 until the convergence criterion was met. The column labeled "CPU-time" states the amount of time needed for this process excluding the time of Algorithm LSQR to solve (21) for x. Typically, the computation of x by LSQR requires up to twice as many iterations as the calculation of λ. In each test case the value of σ, the lower bound for the singular values of A, was set to zero.

In Table 1 the effect of an increasing α is shown. As α tends to ||A⁺b||₂ the value of λ decreases to zero. Simultaneously, the number of iterations k goes up as the condition number of the matrix in (21) becomes worse. Table 2 demonstrates the behaviour of Algorithm 5 for different singular value distributions. A rapid decrease of the singular values deteriorates the convergence of the algorithm. Finally, another set of test cases is presented in Table 3, where the singular values tend to cluster at the endpoints of the interval [0, 2]. Because the value of
Table 1. Results for increasing α

  m     | n     | σ_k         | α    | ||A⁺b||₂  | λ         | k   | CPU-time
  20000 | 10000 | e^{−k/1000} | 10⁶  | 1.29·10¹⁰ | 2.70·10⁻¹ | 8   | 2 sec
  20000 | 10000 | e^{−k/1000} | 10⁷  | 1.29·10¹⁰ | 3.47·10⁻³ | 66  | 15 sec
  20000 | 10000 | e^{−k/1000} | 10⁸  | 1.29·10¹⁰ | 3.66·10⁻⁵ | 638 | 194 sec
Table 2. Different singular value distributions

  m     | n     | σ_k          | α     | ||A⁺b||₂  | λ         | k   | CPU-time
  20000 | 10000 | e^{−k/10}    | 4·10⁶ | 2.02·10¹² | 2.34·10⁻⁴ | 138 | 32 sec
  20000 | 10000 | e^{−k/100}   | 4·10⁶ | 5.94·10¹² | 2.22·10⁻³ | 79  | 18 sec
  20000 | 10000 | e^{−k/1000}  | 4·10⁶ | 1.29·10¹⁰ | 2.15·10⁻² | 27  | 6 sec
  20000 | 10000 | e^{−k/10000} | 4·10⁶ | 4.76·10⁶  | 4.79·10⁻² | 11  | 3 sec
Table 3. Results for increasing m and n

  m      | n      | σ_k               | α   | ||A⁺b||₂  | λ         | k   | CPU-time
  2000   | 1000   | 1 + cos(kπ/(n+1)) | 10⁷ | 4.37·10⁹  | 9.54·10⁻⁵ | 782 | 104 sec
  20000  | 10000  | 1 + cos(kπ/(n+1)) | 10⁷ | 8.65·10¹⁰ | 1.90·10⁻³ | 175 | 41 sec
  200000 | 100000 | 1 + cos(kπ/(n+1)) | 10⁷ | 4.65·10¹¹ | 4.81·10⁻² | 35  | 76 sec
α was fixed, the problems became more well-conditioned as the dimension of the matrix A was increased.
8 Conclusions

We have described a technique which will compute the Lagrange multiplier associated with quadratic problems in an accurate and efficient manner. As a side-effect of the computation, we are able to compute the solution vector almost simultaneously! Finally, it should be noted that without much difficulty, these results can be extended to more general situations as long as a proper inner product exists.
References

1. Dahlquist, G., Golub, G.H., Nash, St.G. (1979): Bounds for the Error in Linear Systems. In: Proc. of the Workshop on Semi-Infinite Programming, ed. R. Hettich. Springer, Berlin Heidelberg New York, pp. 154-172
2. Davis, P.J., Rabinowitz, P. (1984): Methods of Numerical Integration. Academic Press, Orlando
3. Gander, W. (1981): Least Squares with a Quadratic Constraint. Numer. Math. 36, 291-307
4. Gander, W., Golub, G.H., von Matt, U. (1989): A Constrained Eigenvalue Problem. Linear Algebra Appl. 114/115, 815-839
5. Golub, G.H. (1973): Some Modified Matrix Eigenvalue Problems. SIAM Review 15, 318-334
6. Golub, G.H. (1974): Bounds for Matrix Moments. Rocky Mountain J. Math. 4, 207-211
7. Golub, G.H., Kahan, W. (1965): Calculating the Singular Values and Pseudo-Inverse of a Matrix. SIAM J. Numer. Anal. 2, 205-224
8. Golub, G.H., O'Leary, D.P. (1989): Some History of the Conjugate Gradient and Lanczos Algorithms: 1948-1976. SIAM Review 31, 50-102
9. Golub, G.H., Van Loan, C.F. (1989): Matrix Computations, 2nd Ed. The Johns Hopkins University Press, Baltimore
10. Golub, G.H., Welsch, J.H. (1969): Calculation of Gauss Quadrature Rules. Math. Comp. 23, 221-230
11. Householder, A.S. (1975): The Theory of Matrices in Numerical Analysis. Dover Publications, New York
12. Karlin, S., Studden, W.J. (1966): Tchebycheff Systems: With Applications in Analysis and Statistics. Interscience Publishers, New York
13. Krylov, V.I. (1962): Approximate Calculation of Integrals. Translated by A.H. Stroud. Macmillan, New York
14. Lawson, C., Hanson, R. (1974): Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, New Jersey
15. Paige, C.C. (1972): Computational Variants of the Lanczos Method for the Eigenproblem. J. Inst. Math. Appl. 10, 373-381
16. Paige, C.C., Saunders, M.A. (1982): LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares. ACM Trans. Math. Softw. 8, 43-71
17. Paige, C.C., Saunders, M.A. (1982): LSQR: Sparse Linear Equations and Least Squares Problems. ACM Trans. Math. Softw. 8, 195-209
18. Parlett, B.N. (1980): The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs
19. Reinsch, C.H. (1971): Smoothing by Spline Functions. II. Numer. Math. 16, 451-454