Mathematical Programming 15 (1978) 77-86. North-Holland Publishing Company
ON SOLVING OPTIMIZATION CONSTRAINTS
PROBLEMS
WITH PROPORTION-
Uriel G. R O T H B L U M * Yale University, New Haven, CT, U.S.A. Received 2 November 1976 Revised manuscript received 21 February 1978
The purpose of this paper is to establish sufficient conditions for the existence of solutions to mathematical programs where the variables of the solution satisfy given proportions. These conditions rely on convergence properties of powers of nonnegative matrices when these powers form a bounded sequence. We assume that ff an arbitrary vector x is premultiplied by elements of this sequence, the limit of the sequence (Which might be a Cesaro (C, 1) limit) gives an improvement of the objective.
Key words: Proportion-Constraints, Symmetry, Converging Matrices, Mathematical Programming, Bounded Matrices.
1. Introduction We consider a mathematical program maximize
f(x),
subject to
x EX
(1)
where f is a real valued function defined on X c_ R". We study the existence of a solution to this problem that satisfies the given constraints a~jxi = b~jxj,
i, j = l .... , n,
i < j.
(2)
O b s e r v e that (2) represents ½(n 2 - n) equations with n variables; so, when n > 31 (2) is feasible only when it is degenerate. In fact, we study a case in which m a n y of the equations in (2) are degenerate, with a~j = b~j = 0, and therefore can be ignored. The important case where aiJ = bo = 1 for all i, j was considered by Tamir [14], w h o s e w o r k motivated this study. In this case (2) represents the constraint that x is symmetric, i.e., x~ is a constant for i = 1. . . . . n. This is a v e r y restrictive requirement. In most economic models partial symmetries is the best one can hope for. For example, in a p r o b l e m of optimal production one could expect that (after normalizing) the production of c o m p l e m e n t a r y products (like cameras and films) in an optimal production scheme equals. H o w e v e r , no such relation can be * This research was supported by NSF Grants ENG 76-15599 and ENG76-12266 and ONR Contract N00014-75-C-0493.
78
U.G. Rothblum/ Optimization with proportion-constraints
anticipated f r o m substituting products (like pens and pencils). Tamir's work generalized previous results of Greenberg and Pierskalla [2], Keilson [5,6], Samuelson [11], Tamir [14] and others.
2. Basic definitions and conventions
The following definitions are similar to those in [14]. L e t A be an n x n matrix. A set X C R n is called A-closed if for e v e r y x @ X, A x and e v e r y finite limit point of {ANx}N=O., .... is in X. A real valued function f defined on an A-closed set X is called A-majorizing if for e v e r y x E X, f(Ax)>-'f(x). If X is also convex, we say that f is average A-majorizing if for e v e r y x @ X , f ( ( N + 1)-l ~,~=oAix>-f(x) for all N that are sufficiently large. Of course, if X is convex and f is concave and A-majorizing, then f is also average A-majorizing. H o w e v e r , simple examples can be constructed of functions that are concave, continuous and average A-majorizing but are not A-majorizing. We next s u m m a r i z e some definitions concerning nonnegative matrices. Let A be an n x n nonnegative matrix. Motivated b y the theory of M a r k o v chains we will refer to the indices 1. . . . . n as states. We say that state i has access to (resp., from) state j if for some integer N >- O, (AN)`.i > 0 (resp., (AN)j,. > 0). T w o states i and j, each having access to the other, are said to communicate. The communication relation is an equivalence relation (e.g., [4, p. 60]). H e n c e we m a y partition the totality of states into equivalence classes. In the sequel a class will always mean a n o n e m p t y equivalence class of communicating states. We will use the relation of having access to (resp., from) a class, if there is access to (resp., from) some, or equivalently every, state in that class. The square submatrix of A corresponding to the states in a class J will be denoted Aj. We say that A is irreducible if the equivalence relation induces only one class, i.e., if all states communicate. If A is irreducible we call A aperiodic if A N is positive for some integer N ~ 0. A class J is called aperiodic if A j, which is irreducible, is aperiodic.
3. Boundedness, convergence and zero-convergence
We say that a square matrix A is bounded (resp., convergent) if the sequence {AN}N=0,, .... is bounded (resp., converging). We say that A is zero-convergent if limNo~ A N --- 0. A class J is called fundamental if A j is not zero-convergent. Before stating the next l e m m a that gives well known characterizations of boundedness, convergence and zero-convergence of a matrix in terms of its spectral properties we need some more definitions. Let A be a square matrix. The spectrum (resp., spectral radius) of A will be denoted by o'(A) (resp., r(A)). Also, if k -> 0 is an integer and A is a complex number, let N ~ ( A ) -- null(A - AI) k
U.G. Rothbluml Optimization with proportion-constraints
79
and let the index of A for A be defined by vA(A) = min{k -> O I N ~ ( A ) = N~+~(A)}. Of course, A E or(A) if and only if u~(A)> O. Lemma 3.1. Let A be a square matrix. Then: (a) A is bounded if and only if r(A) < - 1 and u~(A) < - 1 whenever ]A[ = 1. (b) A is convergent if and only if r(A) <- 1, vn(A) <- 1 and A ~ tr(A) for all A ~ 1 with ]AI = 1. (c) A is zero-convergent if and only if r(A) < 1. Proof. The proof of (a) follows from [8, T h e o r e m V] and the proofs of (b) and (c) are given in [7, Theorems 1 and 2]. It is well known that for e v e r y square matrix A, r ( A ) = r(A*) and u~(A)= v~,(A*) where A* stands for the adjoint matrix of A and A* is the complex conjugate of A. The following corollary is immediate from L e m m a 3.1. Corollary 3.2. A square matrix A is bounded, convergent and zero-convergent if and only if this is, respectively, true for A*. It follows from L e m m a 3.1 that a class J of A is fundamental if and only if r(Aj)-> 1. Thus, when r ( A ) = 1 the fundamental classes of A are precisely the basic classes of A, i.e., classes J with r(Aj) = r(A) (see [9, p. 283]). Unlike the concept of a basic class, the fact that a class is fundamental depends only on the internal structure of J and not on A. When A is nonnegative we can characterize boundedness, convergence and zero-convergence of A in terms of its class-structure as follows: Lemma 3.3. Let A be a square nonnegative matrix. Then (a) A is bounded if and only if r(A )<- I and no fundamental class has access to any other fundamental class. (b) A is convergent if and only if r(A)<- 1 and every fundamental class is aperiodic and has no access to any other fundamental class. (c) A is zero-convergent if and only if A has no fundamental classes. Proof. First notice that by (possibly) permuting rows and corresponding columns of A we may assume, without loss of generality, that A is block triangular, where the blocks on the diagonal of A are{Aj ] J is a class}. It follows that
o-(A) = U{o'(Aj) [ J is a class}.
(3)
To prove (a) it suffices, by L e m m a 3.1, to show that if r(A) <- 1, then uA(A) <- 1 for all A with [AI= 1 if and only if no fundamental class has access to any other fundamental class. If r ( A ) < 1, then uA(A)=0 for all A with [A[ = l and by (3), there exists no fundamental class. If r ( A ) = I, then the fundamental classes are
80
U.G. Rothblum/ Optimization with proportion-constraints
precisely the basic classes and the results follow directly from [12, T h e o r e m 3] or Rothblum [9, Corollary 3.4]. We next prove (b) by using L e m m a 3.1 and showing that if A is bounded, then A~o-(A) for e v e r y complex number A ~ I with IA]= 1 if and only if e v e r y fundamental class is aperiodic. First observe that when IA] = 1 and J is a non-fundamental class then A ~ o ' ( A j ) . So, by (3), it suffices to show that a fundamental class is aperiodic if and only if A ~ ~r(Aj) for e v e r y A ¢ I with [A] = 1. This fact follows from the P e r r o n - F r o b e n i u s T h e o r e m (e.g., [1, p. 53]). Finally, to prove (c) observe that (3) implies that r ( A ) = max{r(Aj)l J is a class}. So r ( A ) < 1 if and only if A has no fundamental classes and (c) follows directly from L e m m a 3.1. The next result shows that boundedness is equivalent to some convergence properties. Lemma 3.4. L e t A be a square matrix. Then A is bounded if and only if ( N + 1)-I ~ = 0 A j converges. Proof. Observe that for A ~ t r ( A ) , u~(A) is the maximal size of a block corresponding to A in the Jordan canonical form (e.g., [4, p. 42]). It follows that ( N + 1)-~ ~ = 0 A j converges if and only if for e v e r y A ~ o-(A) and k < u~(A), the sequence aN=(N+I)-I~
N
/ i X i-k i=k [k ) A '
N = O, I .... ,
converges (cf. [7]). It is easy to verify that the above sequence converges if and only if either IA[
(4)
U.G. Rothblum/ Optirnization with proportion-constraints
81
Proof. As A is bounded, L e m m a 3.1 implies that r(A)--< 1. We first consider the case w h e n r ( A ) < 1. In this case L e m m a s 3.1 and 3.3 assure that A has no fundamental classes, so ~ = 0 satisfies the conditions in (a). N e x t o b s e r v e that (c) presents a vacuous requirement. Finally, part (b) is trivial since in this case N 11(A) = {0}. We next a s s u m e that r ( A ) = 1. Let {J1 . . . . . L} be the (nonempty) set of fundamental c l a s s e s of A, which is precisely the set of its basic classes. By [9, T h e o r e m 3.1] and the fact (see L e m m a 3.1) that u1(A)-< 1, there exist nonnegative vectors ~ .... , ~:r which f o r m a basis of N I ( A ) and ~:k > 0 if and only if i has access to Jk. Obviously ~ ~ ~ [ = l ~k satisfies the requirements of (a). Also, (b) follows f r o m the fact that {~ . . . . . ~r} is a basis of N~t(A) and ~ = 0 for e v e r y i that has no access to Jk. Finally, let y ~ N I ( A ) and let i and ] have access to a fundamental class, say Jp, and have no access to any other fundamental class. So, ~ = ~ = 0 for all k ~ p . As {~1. . . . . ~r} is a basis of N I ( A ) there exist al . . . . . ar such that y = ~ [ = l a ~ k. L e t ~ = ~ , = l ~ k ; then Yl = ap~ e = Otp~i and y~ = a,~ p = Cep~j,which verifies (4) for ~ = ~[=~ ~*. The fact that (4) is satisfied for e v e r y ~ E N I ( A ) follows easily as ~ > 0 and ~j > 0. Finally o b s e r v e that L e m m a 3.3 implies that if i and ] are in the same fundamental class, then they have no access to any other fundamental class, so part (c) applies. W e r e m a r k that a method to construct eigenvectors of a nonnegative matrix is described in [9] and [10]. We conclude this section by giving a simple sufficient condition that a square nonnegative matrix is bounded. Lemma 3.6. Let A be a square nonnegative matrix. If there exists a positive vector x such that A x <- x, then A is bounded.
Proof. For N = 1, 2 . . . . . ANx <--x and therefore (AN)ij _< xi/xj, proving that A is bounded. Using Corollary 3.2, one can consider A x and obtain results corresponding to those in L e m m a s 3.5 and 3.6 with row vectors premultiplying A rather than column vectors postmultiplying A.
4. The main results
Before providing a sufficient condition for the existence of a solution to (1) that satisfies (2), we give a sufficient condition that allows one to add the constraint A x = x to (1) without changing this optimization problem. Theorem 4.1. L e t A be an n x n matrix, let X C R" be an A-closed set and let f be
82
U.G. Rothblum [ Optimization with proportion-constraints
a real valued continuous function on X. Also assume that one of the following two conditions holds: (a) X is convex, A is nonnegative and bounded and f is average A-majorizing, or
(b) A is convergent and f is A-majorizing. Then for every x ~ X, there exists a vector y @ X n N ](A ) with f ( y ) -> f ( x ). Remark. Tamir [14, T h e o r e m 1] showed the a b o v e results for an irreducible substochastic matrix under condition (b). Proof. We first prove the theorem under (a). Since A is bounded and nonnegative, L e m m a 3.4 implies that ( N + 1)-1 ~ = 0 Aix converges to some vector, say y. It is easy to verify that Ay = y; so y E N I ( A ) . We next show that y E X . It is well known (cf. [1, Corollary 1, p. 82]) that for some integer p all the f u n d a m e n tal classes of A p are aperiodic. So, by L e m m a 3.3, A p is convergent. It follows that if yi ~ l i m N ~ ( A , ) N A ~ x , ] = 0 . . . . . p - - 1 , then y = (~=oyJ)/p. As X is Aclosed, yJ E X for j = 0 . . . . . p - 1 and as X is convex this implies that y @X. Next, o b s e r v e that as f is continuous and average A-majorizing N
f(Y) = l~rn f[(N +
1)-l i~_,=oAix J >- f(x).
Finally to p r o v e the theorem under (b), let y = limN-~ ANX. Then y E X ANJ~(A) and the fact that f is continuous and A-majorizing implies that f ( y ) = l i m N ~ f(ANx) >--f(x). We next provide a simple sufficient condition that implies that 0 solves (1). Obviously, 0 also satisfies (2) for e v e r y choice of aij's and bit's. Theorem 4.2. If (b) in Theorem 4.1 is strengthened by assuming that A is zero-convergent, then f(O)>-f(x) for every x @ X . Thus (1) has an optimal solution which satisfies (2) for every choice of {aij [ 1 <- i < j <_ n} and {bij I 1 <- i <
j<-n}. Remark. The a b o v e results for substochastic matrices are shown in [14]. Proof. If A is zero-convergent, then for e v e r y x E X the vector y constructed in the proof of T h e o r e m 4.1 is the zero vector, so f(0)-> f ( x ) for all x E X. Remark. Observe that part (b) in T h e o r e m 1 does not require that A---0. H o w e v e r , in the examples we are concerned with, A is usually nonnegative. In this case one can verify that A is convergent by using L e m m a 3.3. N o t e that the assumptions of T h e o r e m 1 do not suffice to p r o v e the existence of a solution to (1). L e t n = 1, X = (0, 1), A be the identity and f ( x ) = x for all
U.G. Rothblum / Optimization with proportion-constraints
83
x @ (0, 1). Of course, X is A-closed and f is continuous and A-majorizing, though (1) has no optimal solution. We next show that if (1) with the added constraint A x = x has a solution, then this solution solves (1). We then obtain a sufficient condition on a~i, b~j, 1 <- i < j ~ n, that suffices to assure the existence of a solution to (1) that satisfies (2). Following Tamir [14], who considered irreducible aperiodic substochastic matrices, we next combine Theorem 4.1 and L e m m a 3.5 to establish the existence of a solution to (1) under proportional constraints. Theorem 4.3. I f the a s s u m p t i o n s o f T h e o r e m 4.1 hold and in addition there exists a solution to the p r o g r a m
maximize
f(x),
subject to
x E X,
(5) A x = x,
then such a solution also solves (1). M o r e o v e r if ~ is given by part (a) of L e m m a 3.5, and z solves (5), then z also satisfies (2) with ais = ~:1 and bij = ~ l f o r pairs {i,j} that have access to the s a m e single f u n d a m e n t a l class and aij = bij = 0 otherwise.
Proof. The proof follows directly from Theorem 4.1 and L e m m a 3.5.
5. Examples and applications of the main results Let e = (1 . . . . . 1)T where the dimension of e is apparent from the context. A nonnegative matrix A is s u b s t o c h a s t i c if A e <--e, stochastic if A e = e, doubly stochastic if both A and A T are stochastic, and A is a p e r m u t a t i o n matrix if it is doubly stochastic and A~i E {0, 1} for all pairs (i, j). Obviously, every substochastic matrix is bounded (cf., L e m m a 3.6). In the context of our main results, permutation matrices were considered by Greenberg and Pierskalla [2], and doubly stochastic m a t r i c e s - - b y Tamir [13]. These results (and others) were unified by Tamir [14] who analyzed the stochastic and substochastic cases. We next apply our results to the substochastic case and generalize Tamir's study. For a substochastic matrix, the fundamental classes are precisely the ergodic classes, i.e., classes J with A j e = e (cf., [1, p. 83]). We next apply results from Section 3 to obtain some spectral properties of stochastic and substochastic matrices. Lemma 5.1. I f A is substochastic, then every x @ N ] ( A ) satisfies xi = x s f o r all states i and j that are in the s a m e ergodic class. I f , in addition, A is stochastic, then every x @ N ] ( A ) satisfies x~ = xj f o r all states i and j that have access to s o m e ergodic class but have no access to any other ergodic class.
U.G. Rothblum [Optimization with proportion-constraints
84
Proof. Let J be an ergodic class of A. Let L be the set of states i ~ J that have access to J and let M be the set of states having no access to J. By possibly permuting rows and corresponding columns of A we may assume that
( A! A=
ALj Aj 0
~M) AM /
where AL, ALj, ALM, As and AM are the corresponding submatrices of A. Of course, AL has no ergodic classes, so r(AL)< 1 and therefore I - - A L is invertible. It is easy to verify that ~ J = ( ~ , ~ , ~ t ) T= ((I-AL)-~AL~e, e, 0)W~ NI(A). L e m m a 3.5 now shows that xi =xj for all x E N~(A) and i, j E J. N e x t assume that A is stochastic, so Ae = e, i.e., e ~ N~(A). Assume that i and j have access to some ergodic class J and have no access to any other ergodic class. Again, L e m m a 3.5 shows that xi = xj for all x E NI(A).
Theorem 5.2. Let A be an n x n stochastic matrix, X C R ~ an A-closed set and f a real valued continuous function on X. If either (a) or (b) of Theorem 4.1 is satisfied and in addition (5) has an optimal solution, then there exists a solution to (1) such that xi = xj for all states i and j in the same ergodic class. Moreover, if A is stochastic then there exists a solution to (1) such that xi = xj for all states i and j that have access to the same ergodic class and have no access to any other ergodic class. Proof. The p r o o f f o l l o w s directly f r o m T h e o r e m s 4.2 and 4.3 and L e m m a s 5.1 and 3.5. T h e o r e m 5.2 gives results concerning partially symmetric solutions to (1), i.e., solutions x to (1) for which xi coincide for i in certain subsets of {1 . . . . . n}. Symmetric solutions to (1), i.e., solutions x for which all xi coincide, were studied by Tamir [14]. Tamir considered two cases: the first case is when A is stochastic, has a unique ergodic class and possibly one additional non-ergodic class; the second is when A is substochastic and r ( A ) < 1. In the first case N](A) = {ae ] a is real} and in the latter N](A) = {0}. In both cases every element in N](A) is s y m m e t r i c and Tamir's results follow f r o m our development. We r e m a r k that Tamir gives a simple proof to an inequality due to Keilson by using his results for stochastic matrices. We next motivate our results by applying them to a Social-Economic model. We generalize Tamir's study of a more restricted model. Consider a society, consisting of n individuals, which is interested in determining a distribution of wealth that maximizes its welfare function f. L e t X C R~ be the set of possible wealth distribution, with x E X indicating that the wealth given to the i th individual is xi. We assume that X is convex, closed and bounded and f continuous and c o n c a v e on X. We also assume that the society has a linear
U.G. Rothblum/ Optimization with proportion-constraints
85
distribution mechanism, i.e., there exists a nonnegative matrix B such that when the wealth distribution is given by x, it is redistributed and the new wealth distribution is Bx. We assume that the welfare function is not decreased by this redistribution. If i and j are two individuals, we say that i provides wealth to i if i has access to j, i.e., wealth distributed to i yields (after a finite time period) wealth to i. Classes in this context are called social classes and consist of sets of individuals providing wealth to each other. A fundamental class J is called self-supporting. A class is self-supporting if and only if there exists a positive vector x with A1x >-x. Our theory clearly applies to this model. 1 This means that we can predict proportions between the wealth levels in a wealth distribution that maximizes the welfare function. Such proportion can be predicted for individuals in the same self-supporting class or for those that provide wealth to some selfgupporting class but not to any other such class. When X = {x E R~_ [ xe < w}, i.e., w wealth units are distributed among the members of the society, the requirement Bx ~ X implies that B x is substochastic. If in addition B is doubly stochastic, i.e., also Be = e, we obtain subsets of the society which get the same allocation in an optimal wealth distribution. A special case of this model in which B is irreducible, aperiodic and doubly stochastic was considered by Tamir [14].
References [1] F.R. Gantmacher, The theory of matrices, part II, translated from Russian by K.A. Hirsh (Chelsea Publishing Company, New York, 1971). [2] H.J. Greenberg and W.P. Pierskalla, "Symmetric mathematical programs", Management Science 16 (1970) 309-312. [3] S. Karlin and H.M. Taylor, A first course in stochastic processes, 2nd Ed. (Academic Press, New York, 1975). [4] T. Kato, Perturbation theory for linear operators (Springer, New York, 1966). [5] J. Keilson, "A theorem on optimum allocation for a class of symmetric multilinear return functions", Journal of Mathematical Analysis and Applications (1966) 269-272. [6] J. Keilson, "On global extrema for a class of symmetric functions", Journal of Mathematical Analysis and Applications 18 (1967) 218-228. [7] R. Oldenburger, "Infinite powers of matrices and characteristic roots", Duke Mathematics Journal 6 (1940) 357-361. [8] A. Ostrowski, "(lber Normen von Matrizen", Mathematische Zeitschrift 63 (1955) 2-18. [9] U.G. Rothblum, "Algebraic eigenspaces of nonnegative matrices", Linear Algebra and its Applications 12 (1975) 281-292. [10] U.G. Rothblum, "Computation of the eigenprojection of a nonnegative matrix at its spectral radius", Mathematical Programming Study 6 [Stochastic systems: modeling and optimization, II Proceedings of the Lexington Symposium on Stochastic Systems] (1976) 188-201. l Observe that in this model {Bt~x}N=0,1,... is bounded for every x ~ X , but {BN}N=0,L... is not necessarily bounded. Our proofs follow under this weaker condition.
86
U.G. Rothblum/ Optimization with proportion-constraints
[11] P.A. Samuelson, "General proof that diversification pays", Journal of Financial Quarterly Analysis 2 (1967) 66-84. [12] H. Schneider, "The elementary divisors, associated with 0, of a singular M-matrix", Edinburgh Mathematical Proceedings 10 (1956) 108-122. [13] A. Tamir, "Symmetry and diversification of interdependent prospects", Tech. Rept., Northwestern University (1975). [14] A. Tamir, "Ergodicity and symmetric mathematical programs", Mathematical Programming 13 (1977) 81-87.