Acta Informatica 7, 197--216 (1976). © by Springer-Verlag 1976
Optimal Multiprogramming*
Peter J. Denning, Kevin C. Kahn, Jacques Leroudier, Dominique Potier and Rajan Suri
Received June 10, 1976
Summary. Three heuristics for controlling the multiprogramming load to maximize system work capacity are studied. Each allows the highest load possible subject to a given constraint. The knee criterion constrains the memory policy so that program resident sets average near the knees of their inter-page-fault lifetime curves. The L = S criterion constrains the memory policy or load so that the system inter-page-fault lifetime L is at least as large as the page swap time S. The 50% criterion constrains the load so that the paging device is busy approximately half the time. Numerical evaluations of queueing networks show that the knee criterion, which is the most difficult to implement, is the most robust, while the easily implemented 50% criterion is the least robust. These evaluations also circumscribe the conditions under which the criteria are expected to work reliably. Examples from practical systems further validate the criteria. Stability problems are examined.

Introduction
We shall discuss here three heuristic approaches to optimal load control in multiprogrammed computer systems with paged virtual memory. There is no difficulty in demonstrating that an optimum exists [3, 7, 8, 20, 29, 34, 35, 38]. The difficulty is in finding some practical adaptive optimal control. The three approaches presented here have strong intuitive bases for being optimal; they are well verified in simulations and in real systems; but they are heuristic because they cannot be rigorously proved. Figure 1 illustrates the behavior of T(n), which is the system work rate or throughput (transactions completed per second) as a function of n, the load (level of multiprogramming). The curve initially rises as the increasing load drives the system toward saturation; the highest possible work rate in a single processor system is given by a_0, where 1/a_0 is the mean program execution time. However, the increasing load eventually reduces the memory space available to each program below the minimum required to sustain its working set, producing excessive paging and reducing the work capacity of the system [20]. The "plateau" of T(n) is an interval (n_A, n_B) of loads in which T(n) is within some specified tolerance (e.g., 5%) of the optimal T(n_0). If the load exceeds n_B, the system is "thrashing", while if it does not reach n_A, it is "underloaded". (Underloading is seldom a problem in practice.) For a system whose user community is of size N, where each user takes a mean time Z to submit a next transaction after the completion of the last, the well known formula for the mean response time W is [13, 30, 31]

W = N/T(n) - Z.   (1)

[Figure 1: throughput T(n) vs. load n, rising through an underloaded region (n < n_A), levelling on a plateau between n_A and n_B around the optimum n_0, and falling into thrashing beyond n_B.]
Fig. 1. A system throughput curve

* This paper is a compendium of two papers and the ensuing discussion at the ACM SIGMETRICS/IFIP WG7.3 International Symposium on Computer Performance Modeling, Measurement, and Evaluation, at Harvard University, on 29-31 March 1976. The two papers were presented, respectively, by the first two pairs of authors [25, 27], and the discussion by the last author. The work of the first two authors was supported in part by National Science Foundation Grant GJ-41289 at Purdue University.
From this it is obvious that maximizing T(n) for a given user community minimizes the response time for a transaction. Since a transaction progresses in virtual time only if executing on the processor, the normalized throughput T(n)/a o is the processor utilization Uo(n), maximized at the optimal load. The objective of a load controller is regulating both the load and the memory policy so that throughput remains near optimal. The most direct control simply hunts, by varying n and observing T(n), for a maximal throughput. Such a direct control has an obvious limitation: an estimate of T(n), which depends on observing a series of recent jobs departing from the system, may not apply to the conditions in the system. A controller capable of reacting to current conditions must use measurements of system variables, whose rates of change are considerably faster than the rate of user interactions with the system. For example, a controller could regulate throughput indirectly by hunting for maximal processor utilization.
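Equation (1) can be sketched numerically; the sample values below are hypothetical, chosen only to illustrate the formula.

```python
def response_time(N, T, Z):
    """Mean response time from Equation (1): W = N/T(n) - Z, where N is the
    size of the user community, T(n) the throughput (transactions per
    second) at load n, and Z the mean think time in seconds."""
    return N / T - Z

# Hypothetical values: 40 users, throughput 0.5 transactions/s, 20 s think time.
W = response_time(40, 0.5, 20.0)   # 40/0.5 - 20 = 60 seconds
```

Raising T(n) at fixed N and Z directly lowers W, which is the sense in which maximizing throughput minimizes response time.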
To be practical, a controller must avoid the high overhead of hunting for a maximum of a control function; it must employ supplemental measures whose values indicate the most desirable direction for an adjustment of load or memory policy parameter. The most useful supplemental measures are related to program behavior, because load changes are strongly correlated with changes in the main memory space available to each program. The supplemental measures considered here range from the lifetime measure of each program (for the knee criterion), to the lifetime measure averaged over all programs (for the L = S criterion), to the effect of page faults on the paging device queue (for the 50% criterion). Our experiments reveal that controlling these measures is so effective in locating optimal loads that a controller which hunts for an optimum of a control function may be totally unnecessary: the "supplemental measures" are in fact adequate control functions in their own right.
System Model

Figure 2 is a diagram of the type of system under consideration: a standard central server network of m + 1 stations, in which station 0 is the processor, station 1 is the paging device, and stations 2, ..., m are the input/output devices. The source supplies transactions to the system from a finite population; it also controls the load n. Each transaction is implemented as an active program which is imagined to circulate through the network, receiving service portions from various stations sequentially until it leaves the system. Let b_i denote the service completion rate at station i (i = 0, ..., m); that is, 1/b_i is the mean service time of a task at station i. In particular, the page swap time is

S = 1/b_1.   (2)
The parameters b_1, ..., b_m are assumed independent of n (e.g., each device contains a single server), but b_0 may increase with n due to page faults. Except that no server is idle when a task is in its queue, no other assumptions are made about stations or their service distributions. Let a_i denote the request rate for station i (i = 1, ..., m); that is, 1/a_i is the mean virtual time (averaged over all transactions) between two requests for station i. When such a request is made, the program departs the processor and joins the queue at station i. The average total execution time of a program is taken as 1/a_0, so that a_0 is the request rate for leaving the system. Under these assumptions the service completion rate at the processor is

b_0 = a_0 + a_1 + ... + a_m.   (3)
We suppose that all request rates, except for the page fault rate a_1, are intrinsic to the programs and independent of n. The mean time between page faults in the system, referred to hereafter as the system lifetime, is

L(n) = 1/a_1.   (4)
L(n) could be estimated at time t as follows. Let L_j denote the length of the interfault interval in the program that generated the jth most recent system
[Figure 2: central server network with a source and load control, a CPU, a paging device, and I/O stations 2, ..., m; processor departures route to these stations or leave the system with frequency q_0 at throughput T(n).]
Fig. 2. System model
page fault. For a sequence of k such faults while the load is n,

L(n) = (1/k) Σ_{j=1}^{k} L_j.   (5)
It is important to note that a program need not have executed continuously for time L_j prior to generating a system page fault, for it may have stopped for some I/O operation; L_j is thus the aggregated processor time of a program since its prior page fault.
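The estimator of Equation (5) might be sketched as follows; the class and its interface are our illustration, not a mechanism specified by the paper. Note that only processor time is accumulated, so I/O waits do not inflate L_j.

```python
class LifetimeEstimator:
    """Estimate system lifetime L(n) per Equation (5): average, over the k
    most recent system page faults, of the faulting program's processor
    (virtual) time L_j aggregated since its own previous fault."""

    def __init__(self, k):
        self.k = k              # window of recent faults to average over
        self.intervals = []     # the most recent inter-fault virtual times
        self.vtime = {}         # accumulated virtual time per program

    def ran(self, prog, cpu_time):
        # Called whenever `prog` holds the processor; I/O waits add nothing.
        self.vtime[prog] = self.vtime.get(prog, 0.0) + cpu_time

    def fault(self, prog):
        # `prog` generated a system page fault: record its aggregated L_j.
        self.intervals.append(self.vtime.pop(prog, 0.0))
        if len(self.intervals) > self.k:
            self.intervals.pop(0)

    def lifetime(self):
        return sum(self.intervals) / len(self.intervals)

# Example: program 'p' runs 4 ms, waits for I/O, runs 2 ms more, then faults:
est = LifetimeEstimator(k=100)
est.ran('p', 4.0)
est.ran('p', 2.0)
est.fault('p')      # records L_j = 6.0 ms of aggregated processor time
```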
We suppose that the main memory capacity available to user programs is fixed at M page frames. It is almost always true that L(n) is strictly decreasing in n: most memory policies have a parameter which can be adjusted to decrease resident set sizes, allowing the load to increase, with consequent increase in paging rates. Working set based memory management is of this type [19, 29] and the PFF policy is approximately so [16]. The transition frequencies q_i of Figure 2 specify the fraction of processor departures that proceed next to station i; q_0 is the frequency with which processor departures leave the system. Thus

q_i = a_i / (a_0 + ... + a_m) = a_i / b_0,   (6)
where b_0 is the processor completion rate of Equation (3). Under the reasonable assumption that the rates of transitions within the system are much faster than the rate of load adjustments by a controller of the system, the long run dynamics will depend primarily on the equilibrium values attained by the system during intervals of constant load [17, 18]. In evaluating load controls, therefore, it is reasonable to approximate the actual behavior as if T(n) were the equilibrium throughput for each load n. Define T_i(n) as the equilibrium work completion rate (tasks per unit time) of station i for i = 0, ..., m. The system service rate is T(n) = T_0(n) q_0. In equilibrium, the output rate must equal the input rate at station i, whereupon T_i(n) = T_0(n) q_i. Combining these facts with Equation (6),
T(n) = T_0(n) a_0/b_0 = T_i(n) a_0/a_i,   i = 1, ..., m.   (7)

Since b_i is the maximum service rate at station i,

T(n) <= a_0 b_i/a_i,   i = 1, ..., m.   (8)
Under our assumption that the I/O stations are load independent, we can define the I/O constant

I = min{1, b_2/a_2, ..., b_m/a_m},   (9)

with which (8) reduces to

T(n) <= a_0 min{I, L(n)/S},   (10)
where b_1/a_1 = L(n)/S according to Equations (2) and (4). Figure 3, illustrating the effect of these bounds on the normalized throughput T(n)/a_0, makes more precise the intuitions underlying Figure 1. In summary, there are only three important assumptions made about the (central server) model of the active part of the computer system: a) The internal system dynamics are such that the equilibrium throughput, computed as if the load were constant, reasonably approximates actual throughput at that load. b) Of all request rates (a_i, i = 0, ..., m), only the paging rate (a_1) varies with load; moreover the system lifetime L(n) = 1/a_1 is decreasing in n. c) Of all the service rates (b_i, i = 0, ..., m), only the processor service rate (b_0) varies with load.
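A minimal sketch of the bound in relation (10), with the I/O constant computed per Equation (9); all parameter values in the example are hypothetical.

```python
def throughput_bound(a0, io_ratios, L, S):
    """Upper bound of relation (10): T(n) <= a0 * min(I, L(n)/S), where
    I = min(1, b_2/a_2, ..., b_m/a_m) per Equation (9).  `io_ratios`
    lists the ratios b_i/a_i for the I/O stations i = 2, ..., m."""
    I = min([1.0] + list(io_ratios))
    return a0 * min(I, L / S)

# Hypothetical parameters: a0 = 0.1 per ms, two I/O stations with
# b/a ratios 2.0 and 0.8, lifetime L(n) = 30 ms, swap time S = 20 ms.
bound = throughput_bound(0.1, [2.0, 0.8], L=30.0, S=20.0)
# Here I = 0.8 dominates L/S = 1.5, so the bound is 0.1 * 0.8.
```

Which of the two bounds is active (I or L(n)/S) is exactly the distinction Figure 3 draws between the loads n_0 and n_1.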
[Figure 3: T(n) vs. load n, bounded above by the constant a_0 I and by the decreasing curve a_0 L(n)/S; the bounds cross between the loads n_0 and n_1.]
Fig. 3. Effects of bounds on system throughput
Three Load Control Rules
There are two principal types of load control methods. A program-driven method specifies a working set for each program, requiring that the load be determined as the number of working sets that can be placed together in main memory. Its free parameter(s) is (are) memory policy parameters, such as working set window size(s); the load is a dependent variable [23]. In contrast, a load-driven method specifies a load, requiring that the memory policy determine a memory partition that accommodates the given load. Its free parameter is the load; the memory partition is a dependent variable. The so-called second chance paging algorithm used in CP-67 and Multics is an example of a load-driven method
[2]. We consider three control rules. Their intuitive bases are presented in the following subsections; validation is presented afterward. Each rule allows the load to rise as high as demand warrants, subject to a simple constraint. Under the knee criterion, the constraint is that the mean resident set size of each program averages near the knee of that program's lifetime-versus-space function. Under the L = S criterion, the constraint is that the system lifetime L(n) does not fall below the value cS, where c is a constant not much larger than 1 and S is the page swap time. Under the 50% criterion, the constraint is that the paging device utilization approximates (50 + d)%, where d is a constant less than 10. These criteria differ in the amount of information about program behavior they use, and therefore in their robustness and implementation costs. The knee criterion employs detailed information about the dynamics of individual programs; consequently its robustness is always good but its implementation may be difficult. It is best suited for a program-driven control method. In contrast, both the L = S and 50% criteria employ partial information about program dynamics, obtained by observing the overall effect of the job mix on the system. Whether this partial information adequately reflects program dynamics may depend on the operating point of the system. Consequently, their robustness is not always good, but their implementation is easier. Though they can be used with program-driven methods, their primary attraction is in systems already committed to a load-driven method. Each criterion should be envisaged as part of a two-level control scheme. The upper level regulates the composition of the job mix in order to achieve a balanced load, that is, one which is neither I/O nor CPU bound. The lower level regulates the memory policy and level of multiprogramming.
The Knee Criterion

Figure 4 illustrates the lifetime function G(x) of a given program under a given memory policy; it specifies the mean virtual time between page faults when the given program's resident set averages x pages. (In terms of Equation (5), the expected contribution L_j to system lifetime is G(x) if the given program's resident set contained x pages prior to a system page fault.) For locality-based memory policies, such as LRU or moving-window working set, the lifetime function is usually observed to comprise an approximately convex region (for small x) followed by an approximately concave region; however, lifetime functions with several convex and concave regions are frequently observed. The knee of the function is the point beyond which the curve tends to flatten out. It is defined geometrically as the highest point of tangency between a ray from the origin and the curve.¹ The flattening results from one or both of two factors: a) The memory policy tracks program locality, whereupon most page faults result from transitions between program phases [24, 28]. In this case the knee lifetime is approximately H/E, H being the mean phase holding time and E the mean number of pages entering the locality set at a phase transition [24]. Flattening occurs because there is less gain in making the resident set larger than the locality set than there is penalty in making it smaller. b) There are various external factors in effect, such as the loss of resident set pages during I/O waiting periods, or the fact that observed lifetimes never exceed the program's total execution time [24, 26, 32]. The important property of the knee is its maximizing the ratio

g(x) = G(x)/x.   (11)
Suppose one page fault induces a mean delay of D in a program's execution, and consider K page faults in a given program. These faults span a real time interval whose expected length is KG(x) + KD. The memory space-time per

¹ In [23, 24], the knee-defining ray emanates from G(0) = 1 and dominates the entire curve. Since G(x*) is typically much larger than 1, the difference between the two definitions is negligible.
[Figure 4: a lifetime curve G(x) vs. resident set size x, with the knee at x* marked by the tangent ray from the origin.]
Fig. 4. A program lifetime function
reference is estimated as

x(KG(x) + KD) / KG(x) = x + xD/G(x) = x + D/g(x).   (12)
Operating at the knee tends to minimize the component of memory space-time due to paging. Since I/O device speeds and request rates are independent of x, it thus tends to minimize total memory space-time per job. Now: over a period of V time units with load n, the total memory space-time in the system is MV, and the total number of departures is estimated as VT(n). Thus the memory space-time per job is MV/(VT(n)) = M/T(n). In other words, throughput is maximized if and only if memory space-time per job is minimized. (See also [13].) This is why the knee criterion tends to define a load at which throughput is optimal. One must be mindful that the foregoing argument merely outlines the intuition of the knee criterion; it is not a proof. In fact, we know of no formal proof. A proof is not required in our context, since the validation experiments support the effectiveness of the knee criterion as a heuristic.
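For a lifetime function known only at sample points, the geometric knee definition of Equation (11) reduces to a one-line search; the curve below is invented for illustration.

```python
def knee(points):
    """Knee of a sampled lifetime function: the sample (x, G(x)) maximizing
    the slope G(x)/x of a ray from the origin (Equation (11)).
    `points` is a list of (x, G_of_x) pairs with x > 0."""
    return max(points, key=lambda p: p[1] / p[0])

# A hypothetical lifetime curve (resident set size in pages, lifetime in ms):
# convex at first, then flattening beyond the knee.
curve = [(10, 2.0), (20, 8.0), (30, 27.0), (40, 30.0), (50, 32.0)]
x_star, G_star = knee(curve)   # ratio 27/30 = 0.9 beats all other samples
```

Operating the memory policy so that mean resident set size stays near x_star is the knee criterion's constraint; by Equation (12), this keeps the paging component of space-time per reference, D/g(x), near its minimum.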
The L = S Criterion

Figure 3 shows the intersection of the two bounds on T(n)/a_0 occurring at the load n_1; that is, L(n_1) = IS. The L = S criterion derives from the intuition that, when I = 1, T(n) ought to be decreasing for n > n_1; that is, n_1 is near but slightly larger than the optimum n_0. In other words, implementing the constraint L(n) >= cIS for some small constant c > 1 should cause T(n) to approximate T(n_0).² Two gross system properties modify, and possibly interfere with, this intuition, namely the existence of execution and I/O bounds in the system. In an execution bound system, programs either of inherently short execution time or subject to a maximum limit on resident set size give rise to a system lifetime function that does not exceed L_max < IS. (Do not confuse this with a "CPU-bound" system.) In this case, there is no crossing point between L(n) and IS. However, if we define n_2 as the largest load for which L(n) = L_max, the intuition is that n > n_2 implies throughput T(n) decreases in n, and that T(n_2) approximates the optimal T(n_0). By combining this and the previous intuition, we can extend the control constraint to

L(n) >= min{L_max, cIS}.   (13)
Let n* be the largest n for which equality holds in (13); the intuition is that T(n*) is approximately optimal, because n > n* ought to imply that T(n) is decreasing (it is dominated by the decreasing bound a_0 L(n)/S). In an I/O bound system, the service completion rate for some I/O station is smaller than the request rate (i.e., b_i < a_i), which implies from (9) that I < 1. The rate of growth of T(n) toward the asymptote a_0 I will be influenced by the ratios b_i/a_i: with all other parameters held fixed, a reduction in any one ratio will cause T(n) to grow more slowly in n [31]. Slow growth in T(n) will be especially apparent in a "multiply-saturating" system, i.e., one in which b_i/a_i <= 1 for more than one station i. If the growth of T(n) is sufficiently slow, the influence of the bound L(n)/S of (10) may not be great enough at n = n* (relation 13) to cause T(n) to be decreasing for n > n*; in other words, the optimal load may be greater than n*, rather than smaller or near as assumed before. If the system is both execution bound and I/O bound, the L = S heuristic may fail completely. The execution bound will tend to define a small value of n*; the I/O bound will cause slow growth in T(n); and the optimal load may be significantly larger than n*. However, this case is not of great practical interest, since the two bounds will severely limit the work rate, forcing it well below its potential maximum (a_0); the system designer then has problems more important than load control to solve. The L = S heuristic is of primary interest in systems with sufficient I/O capacity.

² For a simple model with m = 1 and L(n) = A(M/n)^k, Brandwajn et al. [6] showed n_0 = (A/S)^{1/k}(M/e), where e is the base of the natural logarithm. This gives L(n_0) = e^k S and n_1 = e n_0. The policy criterion for this case is L(n) >= e^k S, and n_1 is close to n_0. However, various practical factors will force n_1 closer to n_0 than this model predicts, such as flattening of L(n) near n_1, the presence of I/O devices in the system, or L(n) being approximately S [27].
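The simple model of footnote 2 can be checked numerically; the parameter values below are hypothetical and serve only to confirm the closed-form relations L(n_0) = e^k S and L(n_1) = S.

```python
import math

def footnote2_model(A, k, M, S):
    """For the simple model of footnote 2, L(n) = A*(M/n)**k, the optimum
    load is n0 = (A/S)**(1/k) * (M/e); return L at n0 and at n1 = e*n0."""
    L = lambda n: A * (M / n) ** k
    n0 = (A / S) ** (1.0 / k) * (M / math.e)
    n1 = math.e * n0
    return L(n0), L(n1)

# Hypothetical parameters: A = 0.01 ms, k = 2, M = 512 pages, S = 10 ms.
L_at_n0, L_at_n1 = footnote2_model(0.01, 2.0, 512, 10.0)
# Expect L(n0) = e**2 * S (about 73.9 ms) and L(n1) = S (10 ms).
```

This illustrates why the criterion L(n) >= e^k S, with c = e^k, places the controlled load n_1 only a factor of e above n_0 in this idealized model.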
An interesting property of systems controlled by a criterion L = cS is that system lifetime will be approximately cS for all memory sizes M larger than that required to permit one program to achieve a lifetime of cS. (Likewise, in the case considered in footnote 2, the lifetime at the optimum is independent of memory size.) This does not contradict Saltzer's observations of linear growth in L as a function of M [36]. In Saltzer's formulation, some of the (execution bound) programs sharing memory could be inactive, belonging to terminals in "think state"; references to the paging device are thus decreased for larger M because larger numbers of programs making transitions into "active state" require no paging. In our formulation, all programs sharing memory are active; increasing M is assumed to permit more such programs to become active. The load balancing principle embodied in the L = S criterion has a long history of intuitive appeal. In 1962 Corbato et al. proposed the multilevel scheduler for CTSS, in which every job was guaranteed a quantum at least as long as the time required to swap it into main memory [15]. In 1967 Belady defined a "parachor" as the space at which the mean lifetime equates the mean page swap time [4; also 5]. In 1968, Denning suggested this principle as one method of determining the working set window, and as a way of balancing job demands against available equipment [19; also 21, 22, 33]. Wulf suggested that good job mixes are generated if their demands match the capacity of the equipment [39; also 27]. Buzen [10] and Brinch Hansen [9] studied this principle for balancing loads within systems.

The 50% Rule

The mean queue of a simple exponential server of utilization U is known to be Q = U/(1 - U) [14, p. 156]; since Q > 1 implies Q - 1 customers are awaiting, but not receiving, service, the condition Q = 1, or U = 1/2, corresponds to the onset of significant queueing.
Since thrashing is characterized by excessive queueing at the paging device, it is plausible that the optimal throughput is achieved just before any significant queue appears there, i.e., when the paging device is utilized approximately half the time. While this intuition may appear weak because the paging device is not a simple exponential queueing system, it is in fact much stronger: the negative feedback inherent in closed networks causes device utilizations in actual systems to be well approximated by simple exponential models [12].³ The paging device utilization U_1(n) is typically an S-shaped function of n with an inflection point near U_1(n) = 1/2, at which dU_1(n)/dn is maximum [25, 27]. This implies that small changes in load will effectively produce large changes in U_1; the load controller will typically not need to vary n by more than ±1 [27]. Because the inverse ratio Δn/ΔU_1(n) is minimum for these n, the likelihood is least that the controller will react needlessly to small fluctuations in the estimator of U_1.

³ Note that the formal statement of the 50% criterion (control U_1 to (50 + d)%) allows the designer some latitude, for the constant d can be "tuned" during system development to cater for the paging device's deviations from exponential or state independent service time.
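The onset-of-queueing argument is easy to see numerically from the exponential-server formula cited above.

```python
def mean_queue(U):
    """Mean queue length Q = U/(1 - U) of a simple exponential server at
    utilization U, 0 <= U < 1 [14]."""
    return U / (1.0 - U)

# Q = 1 exactly at U = 0.5 (one customer present, none waiting on average);
# beyond that point queueing grows rapidly: Q(0.7) is about 2.3, Q(0.9) = 9.
onset = mean_queue(0.5)
```

The steep growth of Q for U > 0.5 is why holding the paging device near half utilization keeps the fault queue, and hence thrashing, at bay.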
Relations Between the L = S and 50% Criteria

There are strong relations between the L = S and 50% criteria. Under our assumption that each of the network stations contains a single server, we may express work rates as

T_i(n) = U_i(n) b_i,   i = 0, ..., m,   (14)
where U_i(n) is the utilization of network station i under system load n [13]. Using this with Equations (7), we deduce

T(n)/a_0 = U_0(n) = U_1(n) L(n)/S,   (15)
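Equation (15) can be checked numerically with Buzen's convolution algorithm for closed product-form networks (the evaluation method used in [11, 25, 27]). The parameter values below are hypothetical, and a_1 is held fixed for simplicity, so the check applies at that load's lifetime.

```python
def buzen_G(demands, n):
    """Normalizing constants G(0..n) of a closed product-form network of
    single-server stations, by Buzen's convolution algorithm; `demands`
    lists each station's total service demand D_i per transaction."""
    G = [1.0] + [0.0] * n
    for D in demands:
        for j in range(1, n + 1):
            G[j] += D * G[j - 1]
    return G

# Hypothetical rates (per ms): a0 = 0.02, a1 = 0.1, a2 = 0.05,
# S = 1/b1 = 5 ms, b2 = 0.08.  Demands: D_0 = 1/a0, D_i = a_i/(a0*b_i).
a0, a1, a2, S, b2 = 0.02, 0.1, 0.05, 5.0, 0.08
demands = [1 / a0, a1 * S / a0, a2 / (a0 * b2)]

n = 4
G = buzen_G(demands, n)
T = G[n - 1] / G[n]                      # throughput T(n), transactions per ms
U0, U1 = T * demands[0], T * demands[1]  # utilizations U_i(n) = T(n) * D_i
L = 1 / a1                               # system lifetime L(n) = 1/a1
assert abs(U0 - U1 * L / S) < 1e-12      # Equation (15): U0 = U1 * L(n)/S
```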
where i = 0 designates the processor and i = 1 the paging device. This equation shows that the throughput is inherently more stable than either of the control functions U_1 or L: at fixed n, an increase in L implies a decrease in U_1, and conversely, whereupon the product U_1 L is more stable than either U_1 or L alone. Equation (15) also suggests that the 50% criterion may not work in an execution bound system, i.e., one in which L(n) = L_max < IS: here U_1(n) > 50% for n > n_2 is possible; thus the optimal value of U_0 can arise when the paging device is utilized considerably in excess of 50%. The same phenomenon can occur if L_max is slightly larger than S and L(n) is reasonably flat for n < n_1 (where L(n_1) = S). However, these difficulties will not arise if L_max is large compared to S and L(n) crosses S with steep slope at n = n_1. Suppose the system is neither execution nor I/O bound, and that the control is achieving L(n) = cS and U_1(n) = 0.5 + d. Equation (15) implies

U_0 = c(0.5 + d)   (16)
at and near the optimum load. Since U_0 cannot exceed 1, c cannot exceed 1/(0.5 + d). This shows that when the two heuristics both work, c will be between 1 and 2. Equations (15) and (16) also show that a system which verifies the 50% criterion will also verify the L = S criterion.

Validation

Using a series of numerical evaluations of queueing networks, we have verified the intuitions underlying the three heuristics for a wide range of choices of system and program parameters [25, 27]. In addition, a review of extant data on real systems yields further validation of the 50% criterion [27]. An overview of these results is given below.
Numerical Evaluations

All the numerical experiments used a queueing network with one I/O station (a file disk); Buzen's method was applied in each case to evaluate utilizations [11]. For a given program lifetime function G(x), a memory limit of Y pages per active program (i.e., x <= Y), and a main memory capacity of M pages, we
used for the system lifetime function
[G (M/n), L ( r - - (t Z m a x = G
n > M/Y (V),
n<
(17)
M/Y.
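Equation (17) translates directly into code; the lifetime curve below follows the LP form G(x) = Ax^k, with A and k chosen hypothetically.

```python
def system_lifetime(G, M, Y, n):
    """System lifetime of Equation (17): each of the n active programs gets
    min(M/n, Y) pages, so L(n) = G(M/n) when n > M/Y, and L(n) = L_max = G(Y)
    otherwise (the memory limit Y flattens the curve at small loads)."""
    return G(M / n) if n > M / Y else G(Y)

# LP-style lifetime curve G(x) = A*x**k with hypothetical A = 0.01, k = 2.
G = lambda x: 0.01 * x ** 2
L_at_8 = system_lifetime(G, M=512, Y=128, n=8)   # 8 > 512/128, so G(64)
L_at_2 = system_lifetime(G, M=512, Y=128, n=2)   # flat region: G(128)
```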
The Leroudier-Potier (LP) experiments were based on the lifetime function G(x) = Ax^k with 1.5 <= k <= 2.5 and Y = M. In this case there is neither a knee nor a flattening of L(n), so that the conditions under which the 50% rule might fail were not observed. These experiments verified the 50% rule in all cases. The Denning-Kahn (DK) experiments were based on a variety of lifetime functions; a few were artificially generated, but most were from real programs [25]. These experiments verified all the intuitions about all the control rules, including the conditions under which they fail. Table 1 summarizes the ranges of parameters used in the experiments. Figure 5 sketches the forms of typical results. The DK experiments showed the knee criterion to be almost always within 5% of peak throughput. Only when S exceeded the knee lifetime by a significant amount (e.g., 80%) did the criterion do worse; however, it was never observed to

Table 1. Parameter ranges for numerical evaluations

Quantity               Units   Denning and Kahn [25]          Leroudier and Potier [27]
M                      pages   200-500                        512
Y                      pages   60-140                         512
G(x)                   ms      various, real and artificial   Ax^k, A = 0.01 ms, 1.5 <= k <= 2.5
L_max                  ms      12                             --
S                      ms      2-18                           5-100
R = b_2/a_2            --      0.8-2.0                        0.3-1.7
Number of experiments          100                            19
[Figure 5: three panels plotting T(n)/a_0, U_1, and L(n)/S against the load n, with the loads selected by the knee, L = S, and 50% criteria marked in each panel.]
Fig. 5a--c. Typical results of numerical evaluations. (a) No execution or I/O bound. (b) Execution bound but not I/O bound. (c) Execution and I/O bound.

deviate by more than 10% from peak. As might be expected, the exact value of the load generated by this criterion depends on S: the load tended larger than optimum with S beyond the knee of L(n), and smaller with S below the knee. With S at the knee, the knee and L = S criteria agreed. Interestingly, the knee criterion was most reliable when the system was both execution and I/O bound, i.e., when the others tended to fail (Fig. 5(c)). The L = S criterion behaved as expected in the DK experiments. With L = 1.3S, a system having neither an execution nor an I/O bound delivered throughput within 5% of peak (Fig. 5(a); note that this figure is drawn for c = 1). When the system was execution bound but not I/O bound, operating with L = L_max
consistently defined a load whose throughput was within 5% of peak (Fig. 5(b)). When the system was execution and I/O bound, operating with L = cIS would have required c < 1 to be optimal (Fig. 5(c)). The 50% criterion seemed very accurate as long as S was less than the knee lifetime; for relatively smooth lifetime curves, this corresponds to operating in the convex region. However, as soon as S was comparable to or larger than the knee lifetime, this criterion tended to grossly underestimate the optimal load (as noted in connection with Equation (15), and suggested in Figures 5(b) and 5(c)). The potential utility of this criterion apparently depends on knowing that the swap time S is always less than the mean knee lifetime of the system. As noted in connection with Equation (15), when the system is neither execution nor I/O bound, both the L = S and 50% criteria worked satisfactorily, and the L = S parameter c fell between 1 and 2. The choice c = 1.3 seemed consistently good; even though it produced U_1 > 50%, U_0 = 1.3 U_1 still corresponded to a load on the plateau (Fig. 5(a)).
Experimental Observations

During discussions with Adams about [1] we learned that, when the EMAS system operates at maximum efficiency, the paging channel is busy about 50%

[Figure: a summary of the applicable criteria in each regime of L_max/S and the I/O constant I: the knee criterion applies in all regimes; the governing lifetime constraint is L = cIS, L = L_max, or L = cS with U_1 = 0.5 + d, depending on capacity and on whether I = 1.]
Table 2. Measurements from Multics [37]. S = 35 ms

n                    1      2      3      4      5      6      7      8
U_0                  0.55   0.64   0.66   0.65   0.64   0.62   0.61   0.60
U_1                  0.20   0.40   0.55   0.66   0.73   0.79   0.83   0.87
c = L/S = U_0/U_1    2.8    1.6    1.2    1.0    0.88   0.78   0.73   0.69
L (ms)               98     56     42     35     31     27     26     24
of the time. The data reported by Rodriguez and Dupuy show that a memory space-time measure was optimized when U_1 = 50% in the Grenoble CP-67 system [34, 35]; to the extent that their measure represents the true memory space-time, this system verifies the 50% criterion directly (and the L = S criterion indirectly). Somewhat more precise results can be obtained from performance predictions for Multics by Sekino [37, Table 5-7, p. 227], which are summarized in Table 2. Table 2 shows the success of both the L = S criterion with c = 1.2 and the 50% criterion. No direct practical experience with the knee criterion is available.
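The internal consistency of Table 2 with Equation (15) is easy to verify: in every row, L/S and U_0/U_1 agree to within the precision of the reported values.

```python
S = 35.0   # page swap time, ms
# Rows of Table 2: (n, U0, U1, L in ms).
table2 = [
    (1, 0.55, 0.20, 98), (2, 0.64, 0.40, 56), (3, 0.66, 0.55, 42),
    (4, 0.65, 0.66, 35), (5, 0.64, 0.73, 31), (6, 0.62, 0.79, 27),
    (7, 0.61, 0.83, 26), (8, 0.60, 0.87, 24),
]
for n, U0, U1, L in table2:
    # Equation (15) predicts U0 = U1 * L/S, i.e., L/S = U0/U1.
    assert abs(L / S - U0 / U1) < 0.06
```

The largest discrepancy (at n = 1, where 98/35 = 2.8 vs. 0.55/0.20 = 2.75) is within rounding of the published figures.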
Implementation Issues

Overview
The implementor of any load controller, be it program or load driven, must face three kinds of problems. 1. Instrumentation. Supplementary hardware or software devices may be needed in the system, to collect measurements of the controller's input parameters, and to transmit control adjustments back to the load or memory policy. 2. Estimation. The controller must process its input data to build estimates of the control function, and then it must act on these estimates. The algorithms used for these purposes should provide stable estimates from which stale information is removed, and they should not generate undue overhead. A compromise has to be struck between the frequency at which these algorithms are activated and the quality of the resulting control. 3. Control and Stability. Given the estimates, there must be a heuristic for performing the desired control without instabilities: this heuristic may be simple, as for the L = S and 50% criteria, or it may be more complex, as for the knee criterion.
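One simple way to meet the estimation requirement (stable estimates, stale samples aged out) is exponential smoothing; this sketch is our illustration, not a mechanism prescribed by the paper.

```python
class EWMA:
    """Exponentially weighted moving average: new estimate =
    old + alpha * (sample - old).  A larger alpha forgets stale data
    faster but tracks noise more closely; the trade-off mirrors the
    frequency/quality compromise noted above."""

    def __init__(self, alpha):
        self.alpha = alpha      # weight of the newest sample, 0 < alpha <= 1
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value

# e.g. smoothing noisy samples of the paging device utilization U_1
# before comparing the estimate against the (50 + d)% set point:
u1 = EWMA(alpha=0.25)
for sample in (0.35, 0.55, 0.60, 0.48):
    estimate = u1.update(sample)
```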
The level of complexity required to build a good controller may not be obvious at the outset. The second two authors contributed to an earlier study of a complex controller that hunted for a maximum in the weighted sum of all device utilizations [6]. The 50% criterion, which controls just the paging device utilization U1, turned out to be far simpler, more responsive, and more stable; this fact was discovered only after exhaustive experimentation [27]. In retrospect we see that hunting for the maximum of a control function was unnecessary, since
controlling a supplemental measure, U1, was equally effective; and that mixing U1 with other device utilizations masked off the effects of dynamic program behavior from the control function, thereby reducing responsiveness and stability. The questions of instrumentation, estimation, and stability in our three heuristics are taken up briefly in the following paragraphs.
Instrumentation and Estimation

The numerical experiments suggest that the knee criterion is the most robust, producing near-optimal throughput over the widest range of conditions. Though one might suspect that implementation would be costly and the overhead in the controller high, the available data suggest otherwise. Typical measurements of individual programs show that the working set policy's memory space-time is within 5% of minimum over a wide range of window sizes, a range often spanning two or three orders of magnitude. Measurements also reveal so much overlap among these ranges for different programs that one could easily specify two system values of window size, of which one would suffice to put a given program in its 5% range [16]. In other words, there appears to be little need for a controller to adjust the working set window dynamically; the controller's main task would be the initial selection of window size for a given program. Since working set knees tend to dominate those of other memory policies [24], the working set memory policy would give the best results with the knee criterion. The hardware to implement a working set policy could be built for approximately $20 per main memory page frame at 1972 prices [29]. The indications are, therefore, that a working set based knee policy can be built cheaply, and that the knee controller would generate little overhead. However, further experimental studies on program behavior are needed to verify these projections.

The numerical experiments suggest that the L = S criterion is next in robustness, working well except when the system is both execution and I/O bound. Instrumentation would consist of a timer associated with each active program, to accumulate virtual time since that program's prior page fault. When a program generates a page fault, its timer value would be entered into the system lifetime estimator.
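The per-program timer just described can be sketched as follows. This is an illustrative model with hypothetical names, not taken from the paper: each program accumulates virtual (execution) time between its page faults, and each completed inter-fault interval becomes one sample for the system lifetime estimator.

```python
class ProgramTimer:
    """Accumulates one program's virtual (execution) time since its
    last page fault, as the L = S instrumentation requires."""
    def __init__(self):
        self.virtual_time = 0.0  # time executed since the last fault

    def run(self, dt):
        """Credit dt units of execution time to the program."""
        self.virtual_time += dt

    def page_fault(self):
        """On a fault, report the inter-fault interval and reset."""
        interval = self.virtual_time
        self.virtual_time = 0.0
        return interval

# Each value returned by page_fault() is one sample of the lifetime L,
# to be fed to the system lifetime estimator.
t = ProgramTimer()
t.run(12.0); t.run(8.0)
print(t.page_fault())  # 20.0
```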
This estimator could be of the moving-window type, as in Equation (5), or of the exponentially smoothed type [6, 27].

The numerical experiments suggest that the 50% criterion is least robust, possibly failing when the system is execution bound or when the L = S point is beyond the system lifetime's convex region. If the system is known to be neither execution bound nor I/O bound, this criterion is as effective as the L = S criterion. Its instrumentation is the simplest among the three. A hardware monitor, or a simple routine in the paging device driver, would report a current estimate of the paging device throughput T1(n); from Equation (15), the desired value of T1(n) is 1/2S.

The control actions for both the L = S and 50% criteria depend on the general method of control. In the context of program-driven control, a single value of the memory policy parameter would be used to measure program working sets; it
would be adjusted from time to time to keep the estimator near its target (L near cS, U1 near 0.5), and the load would rise and fall in consequence. In the context of load-driven control, the load itself would be adjusted from time to time to keep the estimator near its target.
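As one concrete possibility, an exponentially smoothed lifetime estimator folds each newly reported inter-fault interval into the running estimate. The sketch below is an assumption-laden illustration (the smoothing constant alpha and the class interface are not from the paper); larger alpha discounts stale information more aggressively, at the cost of noisier estimates.

```python
class SmoothedLifetime:
    """Exponentially smoothed estimate of the system lifetime L.
    alpha (assumed here) controls how quickly stale samples are
    discounted in favor of recent inter-page-fault intervals."""
    def __init__(self, alpha=0.1, initial=0.0):
        self.alpha = alpha
        self.L = initial

    def sample(self, interval):
        """Fold one inter-page-fault interval into the estimate."""
        self.L = (1.0 - self.alpha) * self.L + self.alpha * interval
        return self.L

est = SmoothedLifetime(alpha=0.5, initial=40.0)
est.sample(60.0)   # the estimate moves halfway toward the new sample
print(est.L)       # 50.0
```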
Stability

It is difficult to be precise about the stability of systems under any of our load controls, since our arguments have been based on transient-hiding equilibrium assumptions. Nonetheless, we can make some general arguments leading to optimism about stability.

Thrashing is the major form of instability we wish to avoid. Practical experience shows that working set memory management reliably prevents thrashing [23, 34, 35], so we know that our heuristics can be stable to this extent.

A second form of instability would be caused by excessive delay in the feedback loop between the measurement of the control function and the corresponding load adjustment. Should the character of the active set of jobs change significantly over the estimation interval, the estimate could be inaccurate at the time it is used to determine a control action. This would be no problem with the knee criterion controlling a working set policy, for the policy itself adjusts rapidly to program dynamics and, as noted, the controller would make window size adjustments only rarely. It is potentially a problem with the L = S and 50% criteria because of the delay in estimating L or U1. However, preliminary experiments indicate that the 50% criterion responds rapidly and accurately to changes in load characteristics [27]. This observation, together with successful experience with working set based memory management, allows for optimism that the feedback delays can be kept suitably short. Nonetheless, further investigation of this question is needed.

A third form of instability would be caused by statistical fluctuations in the actual load, or in control function estimators, about their average values.
This possible instability would affect the knee criterion the least (each program's working set is measured independently using a fixed window), the L = S criterion intermediately (the estimate L is measured over the recent past of active programs), and the 50% criterion the most (a change in load can affect paging device queue length significantly). Leroudier and Potier observed frequent, but minor, fluctuations in actual load, typically ±1, under the 50% criterion [27]. The sensitivity to estimator fluctuations can be controlled by establishing a "no action" range containing the target control function value. In the 50% criterion, for example, the target value of the paging device throughput T1(n) is 1/2S (cf. Eq. (15)). Whenever

    1/2S - e ≤ T1(n) ≤ 1/2S + e        (18)
for some small constant e, no control action would be taken. Load adjustments would be made only when the T1 estimate was outside this range. A similar strategy can be adopted for the L = S criterion, no action being taken unless the L estimate were outside a range containing the target cS.
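A load-driven controller using the no-action range of Equation (18) might look like the following sketch. The interface, the value of e, and the unit-step load adjustment are assumptions for illustration, not the authors' implementation:

```python
def adjust_load(T1, S, n, e=0.02):
    """Return the new load n under the 50% criterion with a
    'no action' range (Eq. (18)): leave n alone while the estimated
    paging throughput T1 is within e of the target 1/(2S); otherwise
    raise or lower the load by one program."""
    target = 1.0 / (2.0 * S)
    if T1 < target - e:
        return n + 1   # paging device below target: admit a program
    if T1 > target + e:
        return n - 1   # paging device above target: suspend a program
    return n           # within the no-action range: do nothing

# With S = 10 ms, the target paging throughput is 0.05 faults/ms.
print(adjust_load(T1=0.02, S=10.0, n=4))  # 5: below range, raise load
print(adjust_load(T1=0.05, S=10.0, n=4))  # 4: on target, no action
```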
Conclusion
Figure 6 summarizes the results of this paper, showing the control rules that will maximize the system's throughput under various conditions. The knee criterion applies universally. Except when the system is both execution and I/O bound, a form of the L = S criterion applies. Except when the system is execution bound or the L = S point is outside the convex region of the system lifetime function, the 50% criterion applies. The shaded area suggests that the L = S and 50% criteria may not apply in systems on the verge of being both execution and I/O bound.

The knee criterion is of primary interest in systems having the hardware for working set memory management. This environment yields a robust, program-driven control. The L = S and 50% criteria are of primary interest in existing systems where a commitment to load-driven memory management has already been made. They allow robust load controls to be added with only simple modifications of the system. The results of this paper can be used to determine whether a system falls in the robustness ranges of these criteria.

The control rules are designed to approximate optimal system throughput (i.e., T(n) ≈ T(n0)) rather than the optimal load itself. This will minimize response time and memory space-time per job; it will also maximize processor utilization. If other factors are taken into account, two loads on the plateau of T(n) may be judged nonequivalent; e.g., there is more paging overhead at the right of the plateau than at the left, and a smaller load would be preferred if this overhead were significant compared to swap time. (However, we can put S(n) = S + o(n), where o(n) is the paging overhead, and set the L = S criterion to L(n) ≥ cS(n); or we can continue applying the 50% rule as long as Lmax is large compared to the maximal value of S(n) [25, 27].) Although stability does not appear to be a problem with these criteria, more investigation of their stability properties is needed.
It is worth reemphasizing that our heuristics are intended as a component of a two-level resource control policy. The lower level regulates the degree of multiprogramming and memory policy according to one of the heuristics. The upper level regulates the job mix itself in order to seek an overall load which is neither I/O nor CPU bound. The principle of balancing mean inter-request intervals against mean service times (balancing a_i against b_i) has been found effective at the upper level. Leroudier and Potier showed that regulating the job mix to cause a_i ≈ b_i (over the mix) for as many I/O stations as possible will tend to further increase throughput [27], corroborating earlier findings by Denning [19, 21, 22], by Wulf [39], and by Buzen [10].
Acknowledgement. We should like to thank Jeffrey Buzen, Domenico Ferrari, and G. Scott Graham for valuable insights and criticisms on earlier drafts of our papers.

References
1. Adams, M. C., Millard, G. E.: Performance measurements on the Edinburgh Multi-Access System (EMAS). Proc. ICS 75, Antibes, June 1975
2. Bard, Y.: Application of the page survival index (PSI) to virtual memory system performance. IBM J. R and D 19, 212-220 (1975)
3. Brandwajn, A., Buzen, J., Gelenbe, E., Potier, D.: A model of performance for virtual memory systems. Proc. ACM SIGMETRICS Symposium, October 1974, p. 9
4. Belady, L. A.: Biased replacement algorithms for multiprogramming. IBM T. J. Watson Research Center, Yorktown Heights (N.Y.), Note NC697, March 1967
5. Belady, L. A., Kuehner, C. J.: Dynamic space sharing in computer systems. Comm. ACM 12, 282-288 (1969)
6. Brandwajn, A., Gelenbe, E., Lenfant, J., Potier, D.: A model of program and system behavior in virtual memory. Technical report, IRIA-Laboria, Rocquencourt, France, October 1973
7. Badel, M., Gelenbe, E., Leroudier, J., Potier, D.: Adaptive optimization of a time sharing system's performance. Proc. IEEE 63, 958-965 (1975)
8. Brandwajn, A.: A model of a time sharing virtual memory system solved using equivalence and decomposition methods. Acta Informatica 4, 11-47 (1974)
9. Brinch Hansen, P.: Operating systems principles. Englewood Cliffs (N.J.): Prentice-Hall 1973
10. Buzen, J. P.: Analysis of system bottlenecks using a queueing network model. Proc. ACM SIGOPS Workshop on System Performance Evaluation, April 1971, p. 82-103
11. Buzen, J. P.: Computational algorithms for closed queueing networks with exponential servers. Comm. ACM 16, 527-531 (1973)
12. Buzen, J. P.: Cost effective analytic tools for computer performance evaluation. Proc. IEEE Compcon Conf., Sept. 1975, p. 293-296
13. Buzen, J. P.: Fundamental operational laws of computer system performance. Acta Informatica 7, 167-182 (1976)
14. Coffman, E. G., Denning, P. J.: Operating system theory. Englewood Cliffs (N.J.): Prentice-Hall 1973
15. Corbato, F. J., Dagget, M. M., Daley, R. C.: An experimental time sharing system. Proc. AFIPS 1962 SJCC 21, p. 279-294. Also in: Rosen, S. (ed.): Programming systems and languages. New York: McGraw-Hill 1967
16. Chu, W. W., Opderbeck, H.: The page fault frequency replacement algorithm. Proc. AFIPS 1972 FJCC 41, p. 597-609
17. Courtois, P. J.: On the near-complete-decomposability of networks of queues and of stochastic models of multiprogrammed computer systems. Sci. Rpt. CMU-CS-71-11, Carnegie-Mellon University, November 1971
18. Courtois, P. J.: Decomposability, instabilities, and saturation in a multiprogrammed computer. Comm. ACM 18, 371-377 (1975)
19. Denning, P. J.: The working set model for program behavior. Comm. ACM 11, 323-333 (1968)
20. Denning, P. J.: Thrashing: Its causes and prevention. Proc. AFIPS 1968 FJCC 33, p. 915-922
21. Denning, P. J.: Equipment configuration in balanced computer systems. IEEE Trans. Computers C-18, 1008-1012 (1969)
22. Denning, P. J.: Third generation computer systems. Computing Surveys 3, 175-216 (1971)
23. Denning, P. J., Graham, G. S.: Multiprogrammed memory management. Proc. IEEE 63, 924-939 (1975)
24. Denning, P. J., Kahn, K.: A study of program locality and lifetime functions. Proc. 5th ACM Symp. on Operating System Principles, November 1975, p. 207-216
25. Denning, P. J., Kahn, K.: An L = S criterion for optimal multiprogramming. Proc. Int'l Symp. on Computer Performance Modeling, Measurement and Evaluation, ACM-SIGMETRICS and IFIP WG7.3, Cambridge (Mass.), March 1976, p. 219-229
26. Easton, M. C., Fagin, R.: Cold start vs. warm start miss ratios and multiprogramming performance. IBM T. J. Watson Research Center, Yorktown Heights (N.Y.), Report RC 5715, November 1975
27. Leroudier, J., Potier, D.: Principles of optimality for multiprogramming. Proc. Int'l Symp. on Computer Performance Modeling, Measurement and Evaluation, ACM-SIGMETRICS and IFIP WG7.3, Cambridge (Mass.), March 1976, p. 211-218
28. Madison, A. W., Batson, A. P.: Characteristics of program localities. Comm. ACM 19, 285-294 (1976)
29. Morris, J. B.: Demand paging through the use of working sets on the Maniac II. Comm. ACM 15, 867-872 (1972)
30. Muntz, R. R.: Analytic modeling of interactive systems. Proc. IEEE 63, 946-953 (1975)
31. Muntz, R. R., Wong, J.: Asymptotic properties of closed queueing network models. Proc. 8th Princeton Conf. on Info. Scis. and Systs., Department of Electrical Engineering, Princeton University, March 1974
32. Parent, M., Potier, D.: A note on the influence of program loading on the page fault rate. Technical Report, IRIA-Laboria, Rocquencourt, France, February 1976
33. Prieve, B. G.: Using page residency to determine the working set parameter. Comm. ACM 16, 619-620 (1973)
34. Rodriguez-Rosell, J., Dupuy, J. P.: The evaluation of a time sharing page demand system. Proc. AFIPS 1972 SJCC 40, p. 759-765
35. Rodriguez-Rosell, J., Dupuy, J. P.: The design, implementation, and evaluation of a working set dispatcher. Comm. ACM 17, 247-253 (1973)
36. Saltzer, J. H.: A simple linear model of demand paging performance. Comm. ACM 17, 181-185 (1974)
37. Sekino, A.: Performance evaluation of a multiprogrammed time shared computer system. MIT Project MAC, Rpt. MAC-TR-103, Ph. D. Thesis, September 1972
38. Weizer, N., Oppenheimer, G.: Virtual memory management in a paging environment. Proc. AFIPS 1969 SJCC 34, p. 234
39. Wulf, W. A.: Performance monitors for multiprogramming systems. Proc. 2nd ACM Symp. on Operating Systems Principles, October 1969, p. 175-181

Peter J. Denning
Kevin C. Kahn
Computer Science Department
Purdue University
West Lafayette, Indiana 47907
USA

Jacques Leroudier
Dominique Potier
IRIA-Laboria
Domaine de Voluceau, Rocquencourt
78150 Le Chesnay
France

Rajan Suri
Division of Engineering
Harvard University
Cambridge, Massachusetts 02138
USA