JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 36, No. 3, MARCH 1982
Conditions for Characterizing the Structure of Optimal Strategies in Infinite-Horizon Dynamic Programs

E. PORTEUS¹
Communicated by G. Nemhauser
Abstract. The study of infinite-horizon nonstationary dynamic programs using the operator approach is continued. The point of view here differs slightly from that taken by others, in that Denardo's local income function is not used as a starting point. Infinite-horizon values are defined as limits of finite-horizon values, as the horizons get long. Two important conditions of an earlier paper are weakened, yet the optimality equations, the optimality criterion, and the existence of optimal "structured" strategies are still obtained.
Key Words. Dynamic programming, operator approach, optimality equations, optimality criterion, structured policies.
1. Introduction
This paper continues the study of infinite-horizon nonstationary dynamic programs using the operator approach. Operators have long been used in the analysis of dynamic programs [see Shapley (Ref. 1), Bellman (Ref. 2), and Blackwell (Refs. 3 and 4), for example]. Usually, a basic model is first presented and the operators are defined in terms of that model (see Refs. 5-12, for example). Denardo (Ref. 13), however, essentially began with the operators as primitive objects of the model, by introducing what is now called the local income function (through which the operators are defined). He showed how many existing models can be encompassed by such an approach. Others have subsequently also started with a local income function. For instance, Fox (Ref. 14) developed finite-state approximations to denumerable-state problems; Porteus first (Ref. 15) derived bounds on the optimal value function and then (Ref. 16) studied nonstationary problems where infinite-horizon values are defined as limits of the finite-horizon values; Kreps and Porteus (Ref. 17) and Bertsekas (Ref. 18) extended the analysis from contracting processes to positive and negative ones; and Whitt (Refs. 19, 20) continued the study of finite-state approximations to large-scale problems. References 16, 18, and 21 add models that can be analyzed by this framework. This paper continues the studies in this tradition, with a slightly different starting point, which is more closely aligned with (but not as general as) that of Schäl (Ref. 22), who studies negative problems. In particular, we assume that the finite-horizon returns are primitive objects of the model and that they can be represented by certain strategy-indexed operators. These operators cannot necessarily be represented by using a local income function, although operators so defined are included in our approach. As in Refs. 16-18 and 22, the infinite-horizon values are defined as limits of the finite-horizon values. The contribution of this paper is to weaken one of the important conditions in Ref. 16 and eliminate one of the others, while still obtaining the optimality equations, the optimality criterion, and the existence of optimal strategies with a certain structural form. The weakenings made in this paper exact a cost: two conditions are strengthened and two new ones are added. However, these new conditions are often easily verified in applications. An interesting anomaly is that, under the basic conditions given, the infinite-horizon values do not necessarily exist for any strategies except the optimal ones. When the appropriate condition from Ref. 16 is reinstated, this incongruity is eliminated.

¹ Professor of Management Science, Graduate School of Business, Stanford University, Stanford, California.

0022-3239/82/0300-0419$03.00/0 © 1982 Plenum Publishing Corporation
2. General Definitions and Assumptions

For the purposes of this paper, the primitive objects of a (nonstationary, denumerable-stage) dynamic program are the state spaces $\Omega_t$, for each $t = 1, 2, \ldots$; the admissible policy spaces $\Delta_t$, for each $t$; the admissible strategy space $\Pi = \Delta_1 \times \Delta_2 \times \cdots$ [so that, if $\pi \in \Pi$, then $\pi = (\delta_1, \delta_2, \ldots)$ and $\delta_t \in \Delta_t$, for each $t$]; and the finite-horizon value functions $v_t^n(\pi)$, for each $\pi \in \Pi$ and all positive integers $t$ and $n$ such that $t \le n$. Each $v_t^n(\pi)$ is an extended real-valued function defined on $\Omega_t$ and is called the value of $\pi$ for stages $t$ through $n$. We assume henceforth that, for each $t$, there exist a metric space $(V_t, \rho_t)$ and an operator $H_t(\pi)$ for each $\pi$ ($\in \Pi$) such that
each $v$ in $V_t$ is an extended real-valued function defined on $\Omega_t$;

$H_t(\pi) : V_{t+1} \to V_t$, for each $\pi$;

$v_t^n(\pi) = H_t(\pi)\,v_{t+1}^n(\pi)$, for each $t \le n$ and $\pi$;

$H_t(\pi)$ depends on $\pi$ only through $\delta_t$ [i.e., $H_t(\pi)$ has the representation $H_{t\delta_t}$]; and $\rho_t$ has the representation

$\rho_t(u, v) = \sup_x \mu_t(x)\,|u(x), v(x)|$, where $\mu_t(x) > 0$, for all $x \in \Omega_t$,

and, for extended real numbers $a$ and $b$,

$|a, b| := 0$, if $a, b = -\infty$ or $a, b = \infty$; $|a, b| := |a - b|$, otherwise.
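To make the metric concrete, the following sketch (with invented data; `rho`, `mu`, and the finite state set are illustrative stand-ins for $\rho_t$, $\mu_t$, and $\Omega_t$) implements the weighted supremum metric together with the extended-real convention for $|a, b|$:

```python
import math

def rho(u, v, mu):
    """Weighted supremum metric: rho(u, v) = sup_x mu(x) * |u(x), v(x)|,
    where |a, b| is 0 when a and b are both -inf or both +inf, and |a - b| otherwise."""
    def gap(a, b):
        if math.isinf(a) and a == b:   # a, b = -inf or a, b = +inf
            return 0.0
        return abs(a - b)
    return max(mu[x] * gap(u[x], v[x]) for x in mu)

u = {0: 1.0, 1: math.inf}
v = {0: 3.0, 1: math.inf}
mu = {0: 0.5, 1: 1.0}    # positive weights mu(x) > 0
print(rho(u, v, mu))     # 0.5 * |1 - 3| = 1.0; the common +inf coordinate contributes 0
```

The weights $\mu_t$ are what let unbounded value functions (growing, say, linearly in an inventory level) remain at finite distance from one another.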
We permanently suppress $t$ from $\rho$ and $\mu$. This metric corresponds to the weighted supremum norm introduced in this context by Wessels (Ref. 23) and used by Van Nunen and Wessels (Ref. 24) and Whitt (Ref. 20). It easily allows the consideration of certain unbounded value functions, which arise in contextual applications such as inventory theory. As indicated by Porteus (Ref. 25), the weighted supremum norm corresponds in certain cases to the similarity transformation introduced in this context by Veinott (Ref. 5) and discussed by Porteus (Ref. 26).

In applications, there usually exists a set $D_t(x)$, called the set of admissible decisions for stage $t$ at state $x$, for each $x \in \Omega_t$ and each $t$, such that $\delta \in \Delta_t$ implies $\delta(x) \in D_t(x)$. There also usually exists a local income function $h_t$ for each $t$ such that

$[H_{t\delta}v](x) = h_t(x, \delta(x), v)$.

However, these assumptions are not formally necessary for the developments here. Indeed, attempting to define a local income function via

$h_t(x, y, v) = [H_{t\delta}v](x)$, for all $\delta$ such that $\delta(x) = y$,

may fail, as we have not assumed that the operator $H_{t\delta}$ can be decomposed to yield an unambiguous definition.

Following Ref. 16, let

$f_t^n := \sup_\pi v_t^n(\pi)$, for all $t \le n$,

called the optimal value for stages $t$ through $n$. All definitions in this paragraph are made pointwise, so that, for instance, for $x \in \Omega_t$,

$f_t^n(x) := \sup_\pi\,[v_t^n(\pi)](x)$.

The optimal value operator $A_t$ is defined by

$A_t v := \sup_{\pi \in \Pi} H_t(\pi)v = \sup_{\delta \in \Delta_t} H_{t\delta}v$, for each $v \in V_{t+1}$.
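As one familiar instance (not required by the development above), the operators of a finite discounted Markov decision process arise from the local income function $h_t(x, y, v) = r(x, y) + \beta \sum_{x'} p(x' \mid x, y)\,v(x')$. The sketch below, with invented two-state data, shows $H_{t\delta}$ and the pointwise supremum defining $A_t$:

```python
# Invented two-state, two-action discounted MDP (stationary, for simplicity).
STATES, ACTIONS, BETA = (0, 1), ('a', 'b'), 0.9
R = {(0, 'a'): 1.0, (0, 'b'): 0.0, (1, 'a'): 0.0, (1, 'b'): 2.0}  # rewards r(x, y)
P = {(0, 'a'): (0.5, 0.5), (0, 'b'): (1.0, 0.0),
     (1, 'a'): (0.0, 1.0), (1, 'b'): (0.5, 0.5)}                  # rows p(. | x, y)

def h(x, y, v):
    """Local income function: h(x, y, v) = r(x, y) + beta * E[v(next state)]."""
    return R[x, y] + BETA * sum(p * v[j] for j, p in enumerate(P[x, y]))

def H(delta, v):
    """Policy operator: (H_delta v)(x) = h(x, delta(x), v)."""
    return [h(x, delta[x], v) for x in STATES]

def A(v):
    """Optimal value operator: (A v)(x) = sup over delta of (H_delta v)(x).
    Since H_delta acts coordinatewise here, this is a pointwise max over actions."""
    return [max(h(x, y, v) for y in ACTIONS) for x in STATES]

print(A([0.0, 0.0]))   # one application to v = 0: [1.0, 2.0]
```

Because the supremum is taken pointwise, the two suprema in the definition of $A_t$ coincide: only the stage-$t$ policy matters.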
Let

$\bar v_t(\pi) := \limsup_{n \to \infty} v_t^n(\pi)$,

$v_t(\pi) := \lim_{n \to \infty} v_t^n(\pi)$ (when it exists),

which is the infinite-horizon value of $\pi$ starting at $t$, and

$f_t := \sup_{\pi \in \Pi} \bar v_t(\pi)$,

which is the optimal infinite-horizon value starting at $t$. A strategy $\pi$ is $\varepsilon$-optimal for stages $t$ through $n$ if

$\rho(v_t^n(\pi), f_t^n) \le \varepsilon$;

$\varepsilon$-optimal for stages $t$ on if

$\rho(v_t(\pi), f_t) \le \varepsilon$;

and $\varepsilon$-optimal if

$\rho(v_t(\pi), f_t) \le \varepsilon$, for all $t$.
We suppress $\varepsilon$ when it equals zero. The use of $\bar v_t(\pi)$ instead of $v_t(\pi)$ to define $f_t$ is intentional: $f_t$ always exists; and, if a strategy $\pi^*$ is optimal, then it satisfies

$v_t(\pi^*) \ge \bar v_t(\pi)$, for all $\pi$ and $t$.

Thus, an optimal strategy $\pi^*$ satisfies the following: its infinite-horizon value exists for all starting stages, and

$v_t(\pi^*) \ge \bar v_t(\pi)$, for all $\pi$

(its infinite-horizon value exceeds the limit superior of the values of all other strategies). Define $A_t^n$ by

$A_t^n v := A_t A_{t+1}^n v$, for $t \le n - 1$;
$A_n^n v := A_n v$;

and define $H_t^n(\pi)$ analogously. We are interested in conditions under which an optimal strategy exists and in characterizing the optimal infinite-horizon value functions. In contextual applications, such as operations management, engineering, economics, finance, and marketing, we are often interested in characterizing the form of the optimal strategies. We may conjecture the form [such as $(s, S)$ policies in inventory theory and control-limit policies in maintenance theory] and seek conditions that guarantee that there exist optimal policies with that form. Thus, a researcher may identify a nonempty subset $\Delta_t^*$ of $\Delta_t$, called the set of structured policies for each $t$, and consider

$\Pi^* := \Delta_1^* \times \Delta_2^* \times \cdots$,

the set of structured strategies. It will be convenient to consider here

$\Pi_t^* := \Delta_1^* \times \cdots \times \Delta_t^* \times \Delta_{t+1} \times \Delta_{t+2} \times \cdots$,

the set of $t$-structured strategies, those whose policies are structured for the first $t$ stages. References 16 and 17 deal directly with conditions under which there exists an optimal strategy in $\Pi^*$. Reference 16 deals roughly with contracting dynamic programs, whereas Ref. 17 deals with the essentially positive and negative cases. This paper primarily weakens the conditions in Ref. 16.
Assumption FH. Finite Horizon.

(a) $H_t(\pi)$ is isotone on $V_{t+1}$ for each $\pi$ and $t$; i.e., $u, v \in V_{t+1}$ and $u \le v$ imply $H_t(\pi)u \le H_t(\pi)v$.

(b) $H_t(\pi)$ is $\alpha_t$-Lipschitzian on $V_{t+1}$ for each $\pi$ and $t$; i.e., $\rho(H_t(\pi)u, H_t(\pi)v) \le \alpha_t\,\rho(u, v)$, for $u, v \in V_{t+1}$.

(c) For each $t$, $\varepsilon > 0$, and $v \in V_{t+1}$, there exists $\delta \in \Delta_t$ such that

$\rho(H_{t\delta}v, A_t v) \le \varepsilon$.

(d) $A_t : V_{t+1} \to V_t$, for each $t$.

(e) There exist sequences $\{V_t^0\}$ and $\{u_t\}$ such that

$\emptyset \ne V_t^0 \subseteq V_t$, $u_t \in V_t^0$, and $v_{n+1}^n(\pi) = u_{n+1}$,

i.e., $v_t^n(\pi) = H_t^n(\pi)u_{n+1}$.

(f) There exists a sequence $\{V_t^*\}$ such that

$\emptyset \ne V_t^* \subseteq V_t^0$;

for each $v \in V_{t+1}^*$ and $\varepsilon > 0$, there exists $\delta \in \Delta_t^*$ such that

$\rho(H_{t\delta}v, A_t v) \le \varepsilon$;

and, for each $t$, there exists $N$ such that $n \ge N$ implies

$A_t^n : V_{n+1}^0 \to V_t^*$.
We will use the following assumption later to establish the existence of structured optimal strategies.
Assumption SA. Structured Attainment. For each $t$ and $v \in V_{t+1}^*$, there exists $\delta \in \Delta_t^*$ such that

$H_{t\delta}v = A_t v$.
$V_t^*$ is called the set of structured value functions, $V_t^0$ is the set of regular value functions, and $u_t$ is a terminal-value function. FH(e) requires that the finite-horizon value functions can be represented as the application of operators to a terminal-value function and that each terminal-value function is regular. The critical property of a regular value function is given in (f): if $v$ is regular and $n$ is large, then $A_t^n v$ must be structured. This is called a preservation property. The critical property of structured value functions is called the $\varepsilon$-attainment property and is also given in (f): if $v$ is structured, then the value of the optimal value operator (applied to $v$) can be $\varepsilon$-attained by an operator associated with a structured policy. One of the important weakenings of Ref. 16 is that we now require only that $u_t \in V_t^0$, rather than that $u_t \in V_t^*$ (in our context, where $V_t^* \subseteq V_t^0$). Because of the $\varepsilon$-attainment property, $V_t^*$ will often be relatively small in applications. For instance, structured value functions may possess convexity or monotonicity properties. If the terminal-value functions used to define the finite-horizon values must have such properties, the question naturally arises as to whether different results would be obtained if the terminal-value functions had less structure. Might the infinite-horizon values be higher? Might there no longer exist optimal policies that are structured? Our weakening here will help to address these questions. This weakening is not made without cost, as FH(c) and FH(d) are new. However, these conditions are often easily satisfied in applications.
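A toy illustration of the preservation and $\varepsilon$-attainment properties (all data invented): take the structured value functions to be the nonincreasing ones and the structured policies to be control-limit replacement rules in a machine-replacement stage. The optimal value operator then maps nonincreasing functions to nonincreasing functions, and its maximizer is control-limit:

```python
BETA = 0.9
STATES = range(5)   # machine condition: 0 = new, ..., 4 = worst (invented data)

def A(v):
    """Optimal value operator for a toy machine-replacement stage; returns
    (A v, a maximizing policy). 'keep' earns 5 - x and the machine degrades;
    'replace' earns 3 and restarts from condition 0."""
    new_v, policy = [], []
    for x in STATES:
        keep = (5 - x) + BETA * v[min(x + 1, 4)]
        replace = 3 + BETA * v[0]
        if keep >= replace:
            new_v.append(keep); policy.append('keep')
        else:
            new_v.append(replace); policy.append('replace')
    return new_v, policy

# Preservation: starting from the (nonincreasing) zero function, every iterate
# of A is again nonincreasing.  Attainment: since the 'keep' value decreases in
# x while the 'replace' value is constant, the maximizer is control-limit
# (keep in good conditions, replace from some threshold condition on).
v = [0.0] * 5
for _ in range(100):
    v, policy = A(v)
assert all(v[x] >= v[x + 1] for x in range(4))   # structured (nonincreasing) again
print(policy)
```

In this example the limiting policy keeps the machine in conditions 0 and 1 and replaces it from condition 2 on, exactly the control-limit form conjectured.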
3. Finite-Horizon Results

Lemma 3.1. If Assumption FH holds, then the following hold:

(a) $A_t$ is isotone and $\alpha_t$-Lipschitzian on $V_{t+1}$ for each $t$.

(b) For each $n$, $v \in V_{n+1}$, and $\varepsilon > 0$, there exists $\pi \in \Pi$ such that

$\rho(H_t^n(\pi)v, A_t^n v) \le \varepsilon$, for all $t \le n$.

(c) For each $t$, there exists $N$ such that, if $n \ge N$ and $v \in V_{n+1}^0$, then

$A_t^n v = \sup_{\pi \in \Pi} H_t^n(\pi)v = \sup_{\pi \in \Pi^*} H_t^n(\pi)v \in V_t^*$;

and, for each $\varepsilon > 0$, there exists $\pi \in \Pi_t^*$ such that

$\rho(H_k^n(\pi)v, A_k^n v) \le \varepsilon$, for all $k \le t$.

Proof. The method of Denardo's (Ref. 13) proof applies to (a). Parts (b) and (c) follow almost directly from Theorem 1 in Ref. 16. □

Theorem 3.1. If Assumption FH holds, then, for each $t$, there exists $N$ such that, for $n \ge N$, the following hold:

(a) $f_k^n$ is structured and equals $A_k^n u_{n+1}$ for all $k \le t$.

(b) For each $\varepsilon > 0$, there exists a $t$-structured strategy that is $\varepsilon$-optimal for stages $k$ through $n$, simultaneously for all $k \le t$.

(c) $f_k^n = A_k f_{k+1}^n$ for all $k \le t$.

Proof. By substituting $u_{n+1}$ for $v$ and using FH(e), parts (a) and (b) follow from (c) of Lemma 3.1. Part (c) then follows from (a). □
4. Additional Assumptions

The infinite-horizon conditions that follow will always be used in conjunction with Assumption FH.

Assumption IH. Infinite Horizon. For each $t$, the following conditions hold:

(a) $V_t^*$ is complete.

(b) There exist real numbers $M_t$ and $N$ ($= N_t$) such that

$\rho(H_t^n(\pi)u, H_t^n(\pi)v) \le M_t$, for all $n \ge N$, $\pi \in \Pi^*$, and $u, v \in V_{n+1}^0$.

(c) There exists a sequence $\{\alpha_t^n;\ n \ge t\}$ of real numbers such that $H_t^n(\pi)$ is $\alpha_t^n$-Lipschitzian on $V_{n+1}$ for all $\pi \in \Pi_t^*$ and $n \ge t$, and

$\alpha_t^n M_{n+1} \to 0$, as $n \to \infty$.
The following assumption was in effect in Ref. 16. Major results will be obtained without it here. However, the additional results that can be obtained with it will also be indicated.
Assumption RP. Regularity Preservation. For each $t$, the following conditions hold:

(a) There exists $N$ such that $H_t^n(\pi) : V_{n+1}^0 \to V_t^0$, for all $n \ge N$ and $\pi \in \Pi^*$.

(b) $V_t^0$ is complete.
Our next assumption is used after establishing the basic results. In applications, it may be convenient to establish basic existence and uniqueness results prior to fully characterizing the form of the optimal strategies. The researcher may identify a nonempty subset $\Delta_t^{**}$ of $\Delta_t^*$, called the set of specially structured policies for each $t$, and consider

$\Pi^{**} := \Delta_1^{**} \times \Delta_2^{**} \times \cdots$,

the set of specially structured strategies. The researcher may speculate that there exists an optimal specially structured strategy. The following assumption will allow that speculation to be verified.
Assumption SS. Specially Structured Strategies. For each $t$, there exists a nonempty set $V_t^{**}$, called the set of specially structured value functions, such that the following conditions hold:

(a) $V_t^{**} \subseteq V_t^*$.

(b) For every $v \in V_{t+1}^{**}$, there exists $\delta \in \Delta_t^{**}$ such that $H_{t\delta}v = A_t v$.

(c) There exists $N$ such that $A_t^n : V_{n+1}^{**} \to V_t^{**}$ for all $n \ge N$.

(d) $V_t^{**}$ is complete.
Assumption S. Stationarity. There exist a state space $\Omega$, an admissible policy space $\Delta$, a metric space $(V, \rho)$, and an operator $H_\delta$ for each $\delta \in \Delta$ such that the following hold:

(a) $\Omega_t = \Omega$, $\Delta_t = \Delta$, $\rho_t = \rho$, and $V_t = V$ for each $t$.

(b) $\Pi = \Delta \times \Delta \times \Delta \times \cdots$.

(c) $H_{t\delta}v = H_\delta v$ for each $t$, each $\delta \in \Delta$, and $v \in V$.
The specializations of the previous assumptions when S holds are obvious and therefore not listed.
5. Infinite-Horizon Results

Lemma 5.1. If Assumptions FH and IH hold, then the following hold:

(a) If $\{v_t\}$ and $\{v_t'\}$ are arbitrary sequences of regular value functions and $\pi \in \Pi^*$, then

$\rho(H_t^n(\pi)v_{n+1}, H_t^n(\pi)v_{n+1}') \to 0$, as $n \to \infty$.

(b) For each $t$, there exist real numbers $M_t$ and $N$ ($= N_t$) such that

$\rho(A_t^n u, A_t^n v) \le M_t$, for all $n \ge N$ and $u, v \in V_{n+1}^0$.

(c) For each $t$, there exists $N$ such that, for $n \ge N$, $A_t^n$ is $\alpha_t^n$-Lipschitzian on $V_{n+1}^0$.

(d) If $\{v_t\}$ and $\{v_t'\}$ are arbitrary sequences of regular value functions, then, for each $t$,

$\rho(A_t^n v_{n+1}, A_t^n v_{n+1}') \to 0$, as $n \to \infty$.

(e) $f_t^\infty := \lim_{n \to \infty} f_t^n$ exists, is structured, and is independent of the choice of regular terminal-value functions (used to define the finite-horizon values), for each $t$.

(f) $f_t^\infty = A_t f_{t+1}^\infty$ for each $t$.

(g) $\{f_t^\infty\}$ is the unique sequence of regular value functions that satisfies (f).

(h) If strategy $\pi \in \Pi^*$ satisfies

$H_t(\pi)f_{t+1}^\infty = A_t f_{t+1}^\infty$, for each $t$,

then

$v_t(\pi) = f_t^\infty$, for each $t$.

Proof. Part (a) follows directly from IH(b) and IH(c). Denardo's method of proof (Ref. 13) applies to (b) and (c), this time using Lemma 3.1(c). Part (d) uses (b) and (c), but is otherwise similar to (a). For (e), we show that, for each $t$, $\{f_t^n;\ n \ge t\}$ forms a Cauchy sequence, using (d). The result follows, since $V_t^*$ is assumed to be complete, and from another application of (d). Part (f) follows from (c) and (e), whereas (g) follows from (d), and (h) from (a). □

Lemma 5.2. If Assumptions FH, IH, and RP hold, then the following hold:

(a) If $\{v_t\}$ and $\{v_t'\}$ are arbitrary sequences of regular value functions and $\pi \in \Pi^*$, then, for each $t$,

$\rho(H_t^n(\pi)v_{n+1}, H_t^m(\pi)v_{m+1}') \to 0$, as $n, m \to \infty$.

(b) $v_t(\pi)$ exists, is regular, and is independent of the choice of regular terminal-value functions, for each $t$ and each $\pi \in \Pi^*$.
Proof. The proofs of (a) and (b) are similar to those of Lemma 5.1(d) and 5.1(e), respectively. □

Theorem 5.1. If Assumptions FH and IH hold, then the following hold:

(a) For each $t$ and $\varepsilon > 0$, there exist $N$ and a strategy $\pi \in \Pi^*$ that is $\varepsilon$-optimal for stages $k$ through $n$ and also satisfies

$\rho(v_k^n(\pi), f_k) \le \varepsilon$,

simultaneously for all $k \le t$ and $n \ge N$.

(b) $f_t = f_t^\infty = \sup_{\pi \in \Pi^*} \bar v_t(\pi)$, for each $t$.

(c) $f_t$ is structured and independent of the sequence of regular terminal-value functions, for each $t$.

(d) The optimality equations hold:

$f_t = A_t f_{t+1}$, for each $t$.

(e) $\{f_t\}$ is the unique sequence of regular value functions that satisfies (d).

(f) The optimality criterion holds: if a strategy $\pi \in \Pi^*$ satisfies

$H_t(\pi)f_{t+1} = A_t f_{t+1}$, for each $t$,

then $\pi$ is optimal.

(g) If, in addition, Assumption SA holds, then there exists an optimal structured strategy.

Proof. Part (a) uses an appropriate selection of $\varepsilon_n$ for $n \ge 1$ and of $\pi \in \Pi^*$ such that

$\rho(H_n(\pi)f_{n+1}^\infty, A_n f_{n+1}^\infty) \le \varepsilon_n$, for $n \ge 1$.

The result then follows from an application of FH(b), IH(c), Lemma 5.1(a), and Lemma 5.1(e). Part (b) follows, in part, from the fact that, for every $\varepsilon > 0$, there exists $\pi \in \Pi^*$ such that $\rho(f_t, v_t^n(\pi)) \le \varepsilon$, for all sufficiently large $n$. Parts (c), (d), (e), and (f) follow directly from Lemma 5.1, since

$f_t = f_t^\infty$, for all $t$.

Part (g) follows from an application of Assumption SA to (f). □
Theorem 5.2. If Assumptions FH, RP, and IH hold, then the following hold:

(a) $v_t(\pi)$ exists, is regular, and is independent of the choice of regular terminal-value functions, for each $t$ and each $\pi \in \Pi^*$.

(b) $v_t(\pi) = H_t(\pi)v_{t+1}(\pi)$ for each $t$ and each $\pi \in \Pi^*$.

(c) $\{v_t(\pi)\}$ is the unique sequence of regular value functions that satisfies (b).

(d) For each $t$ and $\varepsilon > 0$, there exists a structured strategy that is $\varepsilon$-optimal for stages $k$ on, for all $k \le t$.

(e) If $\pi$ is structured and optimal, then

$H_t(\pi)f_{t+1} = A_t f_{t+1}$, for each $t$.

Proof. The proofs are similar to those for Lemma 5.1, except for part (e), which follows from (b) and Theorem 5.1(d). □
Corollary 5.1. If Assumptions FH, IH, and SS hold, then the following hold:

(a) $f_t$ is specially structured for each $t$.

(b) For each $t$ and $\varepsilon > 0$, there exist $N$ and a specially structured strategy that is $\varepsilon$-optimal for stages $k$ through $n$, simultaneously for all $k \le t$ and $n \ge N$.

(c) $f_t = \sup_{\pi \in \Pi^{**}} \bar v_t(\pi)$, for each $t$.

(d) There exists an optimal specially structured strategy.

Proof. We can assume without loss of generality, by Theorem 5.1(c), that the terminal-value functions are specially structured. Each $f_t$ must therefore be in $V_t^{**}$ too. The remaining results are derived as were the analogous ones (for structured policies) in Theorem 5.1. □
Corollary 5.2. If Assumptions FH, IH, and S hold, then the obvious specializations of Theorem 5.1 hold. In particular, the following hold:

(a) There exists $f \in V^*$ such that $f_t = f$ for all $t$.

(b) $f$ is the unique regular fixed point of $A$.

(c) If $\delta \in \Delta^*$ satisfies $H_\delta f = A f$, then $\pi = (\delta, \delta, \ldots)$ is an optimal (stationary) strategy.

(d) If, in addition, Assumption SA holds, then there exists an optimal structured stationary strategy.
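In the stationary contracting case, parts (a)-(c) suggest the familiar computational recipe: iterate $A$ to approximate its fixed point $f$, then read off a policy $\delta$ attaining $H_\delta f = A f$. A sketch with invented two-state data:

```python
# Invented stationary two-state discounted example.
BETA = 0.9
R = {(0, 'a'): 1.0, (0, 'b'): 0.0, (1, 'a'): 0.0, (1, 'b'): 2.0}
P = {(0, 'a'): (0.5, 0.5), (0, 'b'): (1.0, 0.0),
     (1, 'a'): (0.0, 1.0), (1, 'b'): (0.5, 0.5)}

def bellman(v):
    """One application of A, together with a policy delta attaining it
    (so that H_delta v = A v)."""
    new_v, delta = [], []
    for x in (0, 1):
        q = {y: R[x, y] + BETA * sum(p * v[j] for j, p in enumerate(P[x, y]))
             for y in ('a', 'b')}
        best = max(q, key=q.get)
        new_v.append(q[best]); delta.append(best)
    return new_v, delta

# Successive approximation drives v to the unique regular fixed point f = A f;
# the attaining policy delta then yields an optimal stationary strategy
# pi = (delta, delta, ...), as in part (c).
v = [0.0, 0.0]
for _ in range(200):
    v, delta = bellman(v)
print([round(x, 4) for x in v], delta)   # [14.5, 15.5] ['a', 'b']
```

For these data the fixed point can be checked by hand: with $\delta = (a, b)$, $f(0) = 1 + 0.45\,(f(0) + f(1))$ and $f(1) = 2 + 0.45\,(f(0) + f(1))$ give $f = (14.5, 15.5)$, and neither state gains by deviating.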
The other specializations, such as of Theorem 5.2, when S holds, are obvious and not listed.

Corollary 5.3. If Assumption IH(b) is changed to

$\rho(u, v) \le M_t$, for all $u, v \in V_t^0$,

and $V_{n+1}$ is replaced by $V_{n+1}^0$ in Assumption IH(c), then the results in this paper that depend on Assumption IH continue to hold.

Proof. The major change is in the proof of Lemma 5.1(a), which is simplified. Lemma 5.1(b) does not hold, but is unnecessary. The remaining results hold, including Lemma 5.2. □
6. Conclusions

This paper has established the optimality equations, the optimality criterion, and the existence of optimal structured strategies under conditions that are essentially weaker than those in Ref. 16. The two important weakenings are (i) that the terminal-value functions used to define the finite-horizon values need only be regular, rather than structured, and (ii) that the regularity preservation assumption need not hold. Assumptions FH(c) and FH(d) are new, and two assumptions are strengthened: Assumption FH(f) requires that $A_t^n : V_{n+1}^0 \to V_t^*$ for sufficiently large $n$, and Assumption IH(c) uses $V_{n+1}$ in place of $V_{n+1}^0$. Thus, the conditions here are not formally weaker than those in Ref. 16. Some relatively minor assumptions in Ref. 16 are not assumed here: in addition to the fact that we do not assume that there exist decision sets $D_t(x)$ for each $t$ and $x \in \Omega_t$ and local income functions $h_t(x, y, v)$ for each $t$, $x \in \Omega_t$, $y \in D_t(x)$, and $v \in V_{t+1}$, we do not assume that structured value functions are finite, or that $A_t : V_{t+1}^* \to V_t^*$ for each $t$. We have dropped the notion of reduced state spaces, for clarity of presentation and understanding. They could easily be reinstated.

For an immediate application of the results in this paper, consider the financial model in Ref. 27. P. Jackson has pointed out to this author that Assumption RP does not necessarily hold there: the regular value functions are specified as being nonnegative, yet the application of operators corresponding to the structured policies specified therein yields functions that are not necessarily nonnegative. Thus, until now, the results in Ref. 27 have been incompletely verified. Since Assumptions FH, IH, and SA were verified, the results are now solidified, by use of Theorem 5.1.
References

1. SHAPLEY, L., Stochastic Games, Proceedings of the National Academy of Sciences, Vol. 39, pp. 1095-1100, 1953.
2. BELLMAN, R., Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.
3. BLACKWELL, D., Discounted Dynamic Programming, Annals of Mathematical Statistics, Vol. 36, pp. 226-235, 1965.
4. BLACKWELL, D., Positive Dynamic Programming, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California, 1967.
5. VEINOTT, A., JR., Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, Annals of Mathematical Statistics, Vol. 40, pp. 1635-1660, 1969.
6. HINDERER, K., Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, Springer-Verlag, New York, New York, 1970.
7. BLACKWELL, D., FREEDMAN, D., and ORKIN, M., The Optimal Reward Operator in Dynamic Programming, Annals of Probability, Vol. 2, pp. 926-941, 1974.
8. FREEDMAN, D., The Optimal Reward Operator in Special Classes of Dynamic Programming Problems, Annals of Probability, Vol. 2, pp. 942-949, 1974.
9. SCHÄL, M., Conditions for Optimality in Dynamic Programming and for the Limit of n-Stage Optimal Policies to Be Optimal, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, Vol. 32, pp. 179-196, 1975.
10. FEDERGRUEN, A., SCHWEITZER, P., and TIJMS, H., Contraction Mappings Underlying Undiscounted Markov Decision Problems, Journal of Mathematical Analysis and Applications, Vol. 65, pp. 711-730, 1978.
11. SHREVE, S., and BERTSEKAS, D., Universally Measurable Policies in Dynamic Programming, Mathematics of Operations Research, Vol. 4, pp. 15-30, 1979.
12. KLEIN HANEVELD, W., On the Behavior of the Optimal Value Operator of Dynamic Programming, Mathematics of Operations Research, Vol. 5, pp. 308-320, 1980.
13. DENARDO, E., Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Review, Vol. 9, pp. 165-177, 1967.
14. FOX, B., Finite-State Approximations to Denumerable-State Dynamic Programs, Journal of Mathematical Analysis and Applications, Vol. 34, pp. 665-670, 1971.
15. PORTEUS, E., Some Bounds for Discounted Sequential Decision Processes, Management Science, Vol. 18, pp. 7-11, 1971.
16. PORTEUS, E., On the Optimality of Structured Policies in Countable Stage Decision Processes, Management Science, Vol. 22, pp. 148-157, 1975.
17. KREPS, D., and PORTEUS, E., On the Optimality of Structured Policies in Countable Stage Decision Processes, II: Positive and Negative Problems, SIAM Journal on Applied Mathematics, Vol. 32, pp. 457-466, 1977.
18. BERTSEKAS, D., Monotone Mappings with Application in Dynamic Programming, SIAM Journal on Control and Optimization, Vol. 15, pp. 438-464, 1977.
19. WHITT, W., Approximations of Dynamic Programs, I, Mathematics of Operations Research, Vol. 3, pp. 231-243, 1978.
20. WHITT, W., Approximations of Dynamic Programs, II, Mathematics of Operations Research, Vol. 4, pp. 179-185, 1979.
21. KREPS, D., and PORTEUS, E., Dynamic Choice Theory and Dynamic Programming, Econometrica, Vol. 47, pp. 91-100, 1979.
22. SCHÄL, M., An Operator-Theoretical Treatment of Negative Dynamic Programming, Report, University of Bonn, Bonn, West Germany, 1978.
23. WESSELS, J., Markov Programming by Successive Approximations with Respect to Weighted Supremum Norms, Journal of Mathematical Analysis and Applications, Vol. 58, pp. 326-335, 1977.
24. VAN NUNEN, J., and WESSELS, J., Markov Decision Processes with Unbounded Rewards, Markov Decision Theory, Edited by H. Tijms and J. Wessels, Mathematisch Centrum, Amsterdam, Holland, 1977.
25. PORTEUS, E., Overview of Iterative Methods for Discounted Finite Markov and Semi-Markov Decision Chains, Recent Developments in Markov Decision Processes, Edited by R. Hartley, L. Thomas, and D. White, Academic Press, London, England, 1980.
26. PORTEUS, E., Bounds and Transformations for Finite Markov Decision Processes, Operations Research, Vol. 23, pp. 761-784, 1975.
27. PORTEUS, E., On Optimal Dividend, Reinvestment, and Liquidation Policies for the Firm, Operations Research, Vol. 25, pp. 818-834, 1977.