JOHN D. NORTON

THE THEORY OF RANDOM PROPOSITIONS*
ABSTRACT. The theory of random propositions is a theory of confirmation that contains the Bayesian and Shafer-Dempster theories as special cases, while extending both in ways that resolve many of their outstanding problems. The theory resolves the Bayesian "problem of the priors" and provides an extension of Dempster's rule of combination for partially dependent evidence. The standard probability calculus can be generated from the calculus of frequencies among infinite sequences of outcomes. The theory of random propositions is generated analogously from the calculus of frequencies among pairs of infinite sequences of suitably generalized outcomes and in a way that precludes the inclusion of contrived or ad hoc elements. The theory is also formulated as an uninterpreted calculus.
1. INTRODUCTION
The Bayesian theory of confirmation suffers from two defects:¹

• The Problem of the Priors. All judgments of the import of evidence arise in Bayesianism as posterior probabilities upon updating a prior probability distribution. On pain of infinite regress, this prior probability distribution is immune from any form of scrutiny by evidence, at least within the Bayesian scheme. This immunity can be disastrous if the prior probability distribution imprudently assigns zero probability to some hypothesis, for no weight of favorable evidence can ever lead to anything but a zero posterior probability for the hypothesis. Of course, in many circumstances, the arbitrary differences between possible prior probability distributions will vanish in a limit as the number of stages of evidential updating becomes infinite. This "washing out of the priors" is cold comfort, however, for those mortals who can only ever see a finite stage of updating. For each such stage and any given body of evidence, there will always be some prior probability distribution that allows us to accord an arbitrarily high or low probability to our favorite or least favorite hypothesis. Yet the choice of prior probability is uncontrolled by evidence within the Bayesian scheme. Another way of seeing the difficulty is to notice that there is an asymmetry in the way the Bayesian scheme admits propositional information. It admits it in two forms: as hypothesis and evidence and as the background assumptions that inform the choice of prior probability distribution. However, only the former can be subject to evidential scrutiny within the Bayesian scheme.

• Impossibility of Vacuous Belief. The additivity of Bayesian probability measures makes it impossible to represent a state of suspended belief within a single probability measure. For example, we can assign a near zero probability to a proposition A in order to represent our near complete ignorance over whether A obtains. However, we are then forced by additivity to assign near unit probability (near complete certainty) to the negation of A, although we may be equally unsure of whether it obtains. What we should like to do is assign near zero probability to both proposition A and its negation ~A, and unit belief to their tautologically true disjunction (A ∨ ~A). The Shafer-Dempster theory of belief functions (Shafer 1976) is very attractive in so far as it allows for just such a non-additive distribution of belief. However, the Shafer-Dempster theory suffers from two serious defects:

Erkenntnis 41: 325-352, 1994. © 1994 Kluwer Academic Publishers. Printed in the Netherlands.
• The Problem of Distinguishing Dependent and Independent Evidence. In the Shafer-Dempster system, one's state of belief in hypothesis H is represented by a monadic belief function Bel(H). New evidence is represented by a new belief function Bel′(H) and updating occurs by combining the two belief functions into a new belief function Bel ⊕ Bel′ according to a rule known as "Dempster's rule of combination." This rule can only be applied to combine belief functions based on independent evidence. However, unlike the Bayesian scheme, there is no criterion within the system for distinguishing the case of dependent from independent evidence, so that the proper application of the rule depends on intuitive judgments of independence. There is also no systematic method of combining partially dependent evidence.

• The Problem of Interpreting Belief Functions. What precisely is meant by an assertion that Bel(H) = 0.5? It is rendered as the somewhat opaque assertion that one's degree of belief in H is 0.5 on a scale of 0 to 1. There is no further interpretation of the assertion in terms of relative frequencies or inclinations to accept or reject certain wagers. There is no decision theory that allows these states of belief to be converted into actions.²
The purpose of this paper is to develop the theory of random propositions. The theory incorporates both the Bayesian and Shafer-Dempster schemes as special cases. In extending them, it solves all of the above problems. That is:

• The theory no longer suffers from the problems of the priors and of representing null or vacuous belief. The admissibility of vacuous belief distributions of the type of the Shafer-Dempster theory makes possible prior probability distributions that have no non-trivial propositional content and that make no contribution to the posterior probability distributions, in the simple sense that they give no weight to any outcome over any other. If, however, one employs non-vacuous prior probability distributions, their propositional content is as open to scrutiny by evidence as any other proposition is within the theory. And finally, the assigning of a zero prior probability to a hypothesis no longer forces all its posterior probabilities to be zero.

• While no particular interpretation of probability is urged here, the theory does admit a simple relative frequency interpretation, which can be used in the special case of the Shafer-Dempster theory, if desired. The theory also admits the possibility of a decision theory, which can also be applied in the special case of the Shafer-Dempster theory. The theory replaces the monadic belief functions Bel(·) by dyadic functions P(· | ·), as in the Bayesian scheme. The combination of evidence E and E′ is represented as a transition from probability distributions P(· | E) and P(· | E′) to P(· | E & E′). The transition reduces to Dempster's rule of combination in the case in which E and E′ are independent, and this independence can be expressed within the theory.

• The theory of random propositions contains no arbitrary elements or arbitrary functions that can be adjusted opportunistically to incorporate some designated feature. The entire theory arises from a natural extension of the Bayesian scheme.
Once this extension is made, the entire theory and all the above properties are completely fixed.

2. GENERAL STRATEGY AND DISCLAIMER ON INTERPRETATIONS
When one seeks to generalize the probability calculus, one faces the problem that there is a bewildering array of essentially arbitrary extensions. However, as one considers more complex extensions, it becomes
harder to see which can be combined consistently, let alone naturally, to resolve cleanly a particular defect of the earlier theory. For this reason, this paper employs an explicit strategy. If one associates probabilities with relative frequencies, one can generate the standard calculus very rapidly, at least for finite outcome sets. With the calculus in hand, one can then discard the association with relative frequencies and recover an uninterpreted calculus. The strategy of this paper is to seek a natural extension of the relative frequency interpretation of probability and to repeat the process of generation of a calculus. We shall see in Section 3 that there is a natural extension well adapted to resolving the problems outlined in the introduction. The extension induces an extended probability calculus, which is given in Section 4 in a form dependent on the sequences used in Section 3. The calculus can be developed axiomatically as an uninterpreted calculus and this is done in Section 5. One is free to choose among several attitudes to the frequencies and sequences discussed here. First, one could adopt the frequentist interpretation and understand probabilities to be relative frequencies among sequences of possible outcomes of repeated trials. Second, one could regard the frequencies and sequences as a convenient but in principle dispensable device for generating a new calculus. Because the sequences and frequencies form a model of the calculus, one might still use them as a convenient aid for representing and visualizing results of the theory. The sequences are not thought of as sequences of outcomes of repeated trials. They are just convenient mathematical devices for encoding probabilistic information. (In a similar spirit, one uses Venn diagrams to facilitate calculations within a Boolean algebra.) This second choice will be the attitude adopted in this paper, since the use of sequences greatly simplifies manipulation of the calculus.
Finally, one could expunge any suggestion of the frequentist interpretation by using the calculus of Section 5, without frequentist interpretation.

3. HEURISTIC GENERATION OF AN EXTENSION OF STANDARD PROBABILITY THEORY
Consider a six-sided and not necessarily unbiased die. We can represent the probabilities of throwing different numbered faces by an infinite sequence Bi (where i = 1, 2, ...) such as

Bi = 2, 6, 4, 5, 1, 2, 3, 2, 2, 2, 5, 6, 1, 5, ...
in which the probability of throwing a 1 is the relative frequency of 1s in the sequence, etc. It is natural to interpret this sequence as the outcomes of infinitely many possible throws of the die. We can now recover the probabilities of certain other types of throws by simple operations on this sequence, since Bi encodes the information in the prior probability distribution for the outcome of a die throw. For example, we can determine the probability of an even throw (hypothesis "H") by representing H as the sequence

Hi = {2, 4, 6}, {2, 4, 6}, {2, 4, 6}, {2, 4, 6}, {2, 4, 6}, ...

We then compare the sequences and read off P(H | B), the probability of H on the background of information B encoded in sequence Bi, as the frequency with which Hi ⊇ Bi. Similar operations allow further conditionalization. If we wish to know the probability of an even throw given evidence ("E") that the throw is low, that is, one of 1, 2, or 3, we represent E by the sequence

Ei = {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, ...

We form the new sequence
Ei ∩ Bi = 2, ∅, ∅, ∅, 1, 2, 3, 2, 2, 2, ∅, ∅, 1, ∅, ...

which represents the conjunction E & B, and compute P(H | E & B) as the frequency with which Hi ⊇ (Ei ∩ Bi), excluding those cases in which (Ei ∩ Bi) = ∅. That is, we compute the frequency of even throws among the subsequence of low throws in Bi. In general such devices enable us to translate a computation in the standard probability calculus into set theoretic operations on sequences. In this translated form, the two defects of Bayesianism discussed above adopt more precise expression. The role of the prior probability distribution is played by the sequence Bi. It is the sole source of randomness in the system; without it all the above set theoretic operations would yield probabilities of 0 or 1. (For example, we would have P(H | E) = 0 since Hi ⊇ Ei fails for all i.) Moreover, the operation of conditionalization on further evidence simply amounts to deletion of terms from Bi and their replacement by ∅, such as in the formation of the sequence Ei ∩ Bi above. Therefore the sequence Bi, in effect the prior probability, completely dominates the evidential updating. If a probability
distribution is not implicitly present in Bi as a subsequence that can emerge after deletion of terms in Bi, then no amount of conditionalization on the background B by any evidence E will ever yield it as a posterior probability distribution. Yet the information in Bi is immune to direct evidential scrutiny. This harmful dominance can be removed in two ways. First note that the harmful dominance of Bi derives in part from the fact that only elementary outcomes, the individual outcomes 1, 2, ..., 6, may appear as its members. This forces the sequence to be very rich in information and makes a vacuous prior probability distribution impossible. In this regard, B is quite unlike the hypothesis H and evidence E whose sequences may contain sets of elementary outcomes. We can allow reduction of this prior information content of the background sequence Bi by allowing it to have non-singleton sets of elementary outcomes as terms. For example, B′, where

B′i = {2, 4, 6}, {2, 4, 6}, {2, 4, 6}, {1, 3, 5}, {1, 3, 5}, {2, 4, 6}, ...

only contains information on the probabilities of odd or even throws, widening the possible distributions that may arise on conditionalization. As a limiting case, we could introduce the vacuous belief distribution of the introduction through Bvac, where

Bvac,i = Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, ...
and Ω = {1, 2, 3, 4, 5, 6}. It is vacuous since it tells us only the trivial information that one of the faces 1, 2, ..., 6 appears on each roll, without in any way restricting which that face might be on each roll. Let us now call sequences such as Bi "random propositions" since the distribution of their terms encodes probabilistic information about random outcomes. Sequences such as Ei and Hi are "non-random propositions" since their terms are constant. In this language, the extension needed to allow reduced information prior probability distributions is just:

(i) The terms of random propositions need not be restricted to elementary outcomes but may also contain sets of them.
We can now compute P(H | B′) = 1/2, the probability of an even throw H given the prior assumption that even and odd throws are equally likely.
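These sequence computations are easy to mechanize. The following Python sketch is my own illustration, not from the paper: finite prefixes stand in for the infinite sequences, each term is a Python set, and `P` computes the frequency with which Xi ⊇ Yi among the non-empty Yi.

```python
from fractions import Fraction

# Propositions are (finite prefixes of) sequences of subsets of the
# outcome set Omega = {1, ..., 6}.  P(X | Y) is the frequency with
# which X_i ⊇ Y_i among the positions where Y_i is non-empty.
def P(X, Y):
    trials = [(x, y) for x, y in zip(X, Y) if y]   # keep Y_i ≠ ∅
    hits = sum(1 for x, y in trials if y <= x)     # X_i ⊇ Y_i
    return Fraction(hits, len(trials))

n = 6000
B  = [{i % 6 + 1} for i in range(n)]               # a "fair die" prefix: each face equally often
H  = [{2, 4, 6}] * n                               # hypothesis: an even throw
E  = [{1, 2, 3}] * n                               # evidence: a low throw
EB = [e & b for e, b in zip(E, B)]                 # the conjunction E & B, term by term

print(P(H, B))    # 1/2   P(H | B)
print(P(H, EB))   # 1/3   P(H | E & B): even throws among the low throws

Bp = [{2, 4, 6} if i % 2 == 0 else {1, 3, 5} for i in range(n)]   # B': only even/odd information
print(P(H, Bp))   # 1/2   P(H | B')
```

The prefix length and the particular cycling sequence for B are arbitrary choices of the sketch; any prefix with the right limiting frequencies would serve.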
Second, we can address the problem that the background information in B cannot be subject to direct evidential scrutiny, whereas H and E can. The difference arises merely because H and E are allowed to appear within either argument of P(· | ·), whereas B can only appear in the second. That is, if our evidence E is that a throw is one of 1, 2 or 3, we cannot ask the obvious question of how well our background assumptions B stand up to this datum of evidence. We are not allowed to form P(B | E). The extension needed to remedy this is:

(ii) Random propositions are allowed to appear within either argument of P(· | ·).
and we can now compute

P(B | E) = 1/2

as the frequency with which
Bi ⊆ Ei. At first it may seem an odd device for the outcome of a single trial to be represented by an infinite sequence. It follows naturally, however, if we allow that an infinite sequence can be used to represent what would happen were the same trial to be repeated infinitely often. Under this supposition, the sequence Bi naturally represents the background assumption that the die has equal chance of showing each face. The datum of an even throw can be represented in comparable terms, that is, as what it tells us about what would happen were the trial repeated infinitely often. Note that we seek to represent the datum alone, without any further information, such as our knowledge from elsewhere that dice throwing is a random operation in which any face of the die may show. Since the datum does not contain such extra information, it gives us no reason to expect anything other than an even throw, were the throw to be repeated. That is naturally represented by the sequence Hi, all of whose terms are the same. Of course, we need not banish our knowledge of the random character of dice throwing. Rather this extra information should be introduced explicitly in other random propositions and not hidden in the representation of H. This device allows both random and non-random information to be represented in the same way in the theory. Thus the random information of B and the non-random information of H and E may each enter freely into either position of P(· | ·), enabling extension (ii) to be carried out. In effect we now have the entire theory of random propositions. It consists in the extensions of the standard probability calculus (of finite outcome spaces) by (i) and (ii). We shall now see that these natural
extensions yield a quite definite theory without further assumption. In Section 4, the theory will be formulated in a way that is dependent on the sequence representation of random and non-random propositions. In Section 5 it will be formulated axiomatically, in a way that is free of association with sequences and frequencies.

4. PRIMITIVE TERMS AND NOTIONS OF THE THEORY OF RANDOM PROPOSITIONS
The theory is defined on a finite set of elementary outcomes Ω = {a1, a2, ..., an}.
A proposition, represented by boldface letters, is an infinite sequence of subsets of Ω:

X = X1, X2, X3, ...

where Xi ⊆ Ω. Proposition X is "non-random" if all Xi are the same; otherwise it is "random." Special cases are the vacuous proposition

Ω = Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, Ω, ...

and the impossible proposition

∅ = ∅, ∅, ∅, ∅, ∅, ∅, ...
Proposition X is "elementary" just in case all Xi are ∅ or singleton sets {ai}. Several operations are defined on propositions:

(−X)i = Ω − Xi
(X ∨ Y)i = Xi ∪ Yi
(X & Y)i = Xi ∩ Yi

for i = 1, 2, ..., and where (X ∨ Y)i is the i-th member of the sequence X ∨ Y, etc. These operations −, ∨, & correspond to the familiar logical connectives "not," "or," "and." The operations −, ∨, & induce a Boolean algebra on the set of propositions in the familiar way that the corresponding set theoretic operations −, ∪, ∩ induce a Boolean algebra on sets. It will also be convenient to define an "inclusive conjunction" operation
(X &inc Y)i = Xi ∩ Yi = Yi   if Xi ⊇ Yi
(X &inc Y)i = ∅              otherwise
The operation &inc is badly behaved in so far as it is neither commutative nor associative. However, it will greatly simplify the statement of the generalized version of Bayes' theorem. Note that if Y is an elementary proposition, then (X &inc Y) = (X & Y).
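The term-wise behaviour of &inc, including its failure of commutativity and its agreement with & for elementary Y, can be sketched as follows (my own Python rendering on finite prefixes; the names are mine):

```python
# Term-wise "inclusive conjunction": (X &inc Y)_i = Y_i if X_i ⊇ Y_i, else ∅.
def inc(X, Y):
    return [y if y <= x else set() for x, y in zip(X, Y)]

def conj(X, Y):                     # ordinary conjunction: (X & Y)_i = X_i ∩ Y_i
    return [x & y for x, y in zip(X, Y)]

X = [{2, 4, 6}, {1, 3, 5}]
Y = [{2, 4},    {2, 4}]

print(inc(X, Y))   # [{2, 4}, set()]   X_1 ⊇ Y_1 holds, X_2 ⊇ Y_2 fails
print(inc(Y, X))   # [set(), set()]    neither Y_i ⊇ X_i, so &inc is not commutative

# For an elementary Y (singleton terms), &inc coincides with &:
Ye = [{2}, {5}]
assert inc(X, Ye) == conj(X, Ye)
```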
Following the model of Section 3, we can now define the probability P(X | Y) as the frequency with which Xi ⊇ Yi among the Yi that are not ∅. That is

(1)  P(X | Y) = lim(n→∞) [ Σ(i=1 to n) I(Xi ⊇ Yi) I(Yi ≠ ∅) ] / [ Σ(i=1 to n) I(Yi ≠ ∅) ]

where the indicator function I is defined by

I(logical expression) = 1 if logical expression is true
I(logical expression) = 0 if logical expression is false.
P(X | Y) is not defined for all X, Y. The theory must be restricted to sets of propositions in which the limit of (1) is always defined, excepting, of course, P(X | ∅). To pick out suitable sets of propositions, we define the frequency distributions of terms in the propositions. The frequency with which A appears in X is

(2)  fX(A) = lim(n→∞) (1/n) Σ(i=1 to n) I(Xi = A)

for A ⊆ Ω. Joint frequency distributions for sets of propositions X, Y, ... are defined as

(3)  fX,Y,...(A, B, ...) = lim(n→∞) (1/n) Σ(i=1 to n) I(Xi = A) I(Yi = B) ⋯

for A, B, ... ⊆ Ω. These joint distributions record the frequency with which outcomes A, B, ... occur at the same positions in the sequences of X, Y, ...
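Definitions (1)-(3) can be sketched on finite prefixes (an illustration under the assumption that prefixes approximate the limits; not code from the paper):

```python
from fractions import Fraction
from collections import Counter

# Finite-prefix versions of definitions (1)-(3): P(X | Y) via the indicator
# of X_i ⊇ Y_i, and the (joint) frequency distributions of sequence terms.
def P(X, Y):
    num = sum(1 for x, y in zip(X, Y) if y and y <= x)   # I(X_i ⊇ Y_i) I(Y_i ≠ ∅)
    den = sum(1 for y in Y if y)                         # I(Y_i ≠ ∅)
    return Fraction(num, den)

def f(*props):
    """Joint frequency (3) of the terms appearing at the same positions."""
    n = len(props[0])
    cnt = Counter(tuple(frozenset(p[i]) for p in props) for i in range(n))
    return {key: Fraction(c, n) for key, c in cnt.items()}

X = [{2}, {4}, {6}, {1}] * 100   # a background sequence of elementary outcomes
H = [{2, 4, 6}] * 400            # the non-random hypothesis "even"

print(P(H, X))                   # 3/4: three of X's four cycling faces are even
fHX = f(H, X)
assert sum(fHX.values()) == 1    # the additivity requirement (4) holds
```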
The theory is restricted to a finite set of propositions ℜ = {P1, ..., PM}, where ℜ contains all the non-random propositions of Ω and where the joint distribution fP1,...,PM = fℜ exists. Several results follow immediately from the existence of this joint distribution. The joint distribution of any subset of propositions of ℜ exists. In particular the distribution fX,Y will exist for any X, Y ∈ ℜ, so that the limit (1) defining P(X | Y) always exists, unless Y = ∅. Finally, if X ∈ ℜ and Y ∈ ℜ, then it is always possible to extend ℜ to ℜ′ by adding −X, X ∨ Y, X & Y, X &inc Y to ℜ and still retain existence of the extended distribution fℜ′. It follows from the definition (3) that f𝔖, for any finite set 𝔖 of propositions, satisfies an additivity requirement

(4)  Σ(A,B,...⊆Ω) f𝔖(A, B, ...) = 1

when it is defined. The relative frequency interpretation of the theory consists in the following. The sequence E encodes information on what might be the outcomes of indefinitely many repeated trials. H encodes an hypothesis on what these same outcomes might be. P(H | E) reports the frequency with which H succeeds relative to the information E.³

5. UNINTERPRETED AXIOMATIZATION OF THE THEORY OF RANDOM PROPOSITIONS
We can now excise from the theory the explicit connection between probability and frequency and between propositions and sequences of outcomes. To do this we need only see that the crucial probabilistic information encoded in the sequences of a set of propositions ℜ is fully captured by the joint distribution function fℜ. Therefore an uninterpreted axiomatization of the theory can be given with propositions as primitive terms and the joint distribution fℜ posited as the fundamental quantity. To erase suggestion of any dependence on a relative frequency interpretation, the joint distribution functions f will be renamed the "probability generating functions," since that is their role in the theory.
5.1. The Axioms
I. Existence of outcomes and propositions

The primitive terms of the theory are a finite set of elementary outcomes Ω = {a1, a2, ..., an} and a finite set of propositions ℜ = {P1, ..., PM}. Unary operation − and binary operations ∨ and & are defined on ℜ and induce a Boolean algebra on it. That is, following standard definitions (Iyanaga and Kawada, 1980, p. 158), ∨ and &, respectively join and meet, are commutative and associative and obey absorption and distributive laws. − forms the complement. There is a least element, the impossible proposition ∅, and a greatest element, the vacuous proposition Ω. A further binary operation &inc is also defined, such that X &inc Y = X & Y for the special case in which Y is elementary (defined below). It will turn out below that consistency will prevent &inc from being associative or commutative in the general case.

II. Existence of a probability generating function

There is a real valued probability generating function fP1,...,PM = fℜ defined on ℜ that is additive

(4)  Σ(A1,A2,...⊆Ω) fℜ(A1, ..., AM) = 1

where

(5)  0 ≤ fℜ(A1, ..., AM) ≤ 1

for all A1, ..., AM ⊆ Ω. Generating functions f𝔖 on non-empty subsets 𝔖 of ℜ can be defined by successive elimination of propositions from ℜ according to the scheme

fP1,...,PN−1,PN+1,...,PR(A1, ..., AN−1, AN+1, ..., AR) = Σ(AN⊆Ω) fP1,...,PN,...,PR(A1, ..., AN, ..., AR)

for all A1, ..., AN−1, AN+1, ..., AR ⊆ Ω. It follows that any such reduced generating function will obey correspondingly reduced versions of (5) and additivity (4).

III. Adaptation of generating function to propositional operations

The function fℜ is such that for any X, Y ∈ ℜ, and any A ⊆ Ω, if −X, X & Y, X ∨ Y, X &inc Y are in ℜ also, then⁴

(6)  f...,−X,...(..., A, ...) = f...,X,...(..., Ω − A, ...)

(7)  f...,X&Y,...(..., A, ...) = Σ(B∩C=A; B,C⊆Ω) f...,X,Y,...(..., B, C, ...)

(8)  f...,X∨Y,...(..., A, ...) = Σ(B∪C=A; B,C⊆Ω) f...,X,Y,...(..., B, C, ...)

(9)  f...,X&incY,...(..., A, ...) = Σ(B⊇A; B⊆Ω) f...,X,Y,...(..., B, A, ...)   if A ≠ ∅
     f...,X&incY,...(..., A, ...) = Σ(not(B⊇C); B,C⊆Ω) f...,X,Y,...(..., B, C, ...) + Σ(B⊆Ω) f...,X,Y,...(..., B, ∅, ...)   if A = ∅
It follows from (9) that fX&incY = fX&Y when Y is elementary (defined below). However, the asymmetric entry of X and Y in (9) rules out the possibility that &inc is associative and commutative in the general case. We can now define special cases of propositions. A proposition A is non-random just in case fA(A) = 1 for some A ⊆ Ω. Special cases of non-random propositions are the vacuous proposition Ω and the impossible proposition ∅, for which we must have fΩ(Ω) = f∅(∅) = 1. All other propositions are random. A proposition X is elementary just in case fX(A) is non-zero only for singleton A, that is, for A = {a}, any a ∈ Ω. One might want to posit somehow that "all" non-random propositions associated with fℜ must be in ℜ or that ℜ is closed under the propositional operations. Such postulates are superfluous and the content of ℜ can in principle be specified with each application according to its particular needs. It will be tacitly supposed that any proposition named is in ℜ. Finally we can define
(10)  P(X | Y) = [ Σ(B⊇C, C≠∅; B,C⊆Ω) fX,Y(B, C) ] / [ 1 − fY(∅) ]
for any X, Y ∈ ℜ, excluding those Y for which fY(∅) = 1. In practical applications, the following is a useful lemma. It is derived directly from the additivity of the generating function.

LEMMA. If A is a non-random proposition in ℜ for which fA(A) = 1 and X any proposition in ℜ, then the only terms in their joint generating function that can be non-zero are

(11)
fX,A(C, A) = fA,X(A, C) = fX(C)
for any C ⊆ Ω.

5.2. Comparison of Interpreted and Uninterpreted Formulations

The advantage of the uninterpreted formulation is (of course!) that it is free from interpretation. Its disadvantage is that it is harder to manipulate and it seems somewhat arbitrary. Both problems are solved by the interpretation supplied through the representation of propositions as sequences. Moreover the interpretation enables a relative consistency proof of the axiom system: the system is as consistent as the corresponding theory of infinite sequences. Finally it should be noted that the formulations of Section 4 and of Section 5.1 differ in a set of cases of measure zero. These are propositions that, as sequences, contain a finite number of some term. For example, take

X = A, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ...

and

Y = A, A, ∅, ∅, ∅, ∅, ∅, ∅, ∅, ...

In the interpreted system, according to (1), P(X | Y) = 0.5. Since P(X | A) = 0, where A = A, A, A, ..., in effect what we see here is the admissibility of infinitesimal probabilities. These probabilities escape the axiom system of Section 5.1. For X and Y as above, we have fX,Y(∅, ∅) = 1 and therefore from additivity that all other terms in fX,Y are zero. Therefore we have from (10) that P(X | Y) is undefined. In cases in which this difference arises, we shall follow the axiom system of Section 5.1. Equivalently, we could augment Definition (1) by the requirement that P(X | Y) be undefined when fY(∅) = 1.
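The contrast between the limit definition (1) and the generating-function definition (10) in this measure-zero case can be sketched as follows (my Python rendering; representing fX,Y as a dict from pairs of frozensets to weights is an assumption of the sketch):

```python
from fractions import Fraction

# Definition (10): P(X | Y) from the joint generating function f_{X,Y},
# given as a dict mapping pairs of frozensets (B, C) to weights.
def P_from_f(fxy):
    fy_empty = sum(p for (_, c), p in fxy.items() if not c)       # f_Y(∅)
    if fy_empty == 1:
        raise ValueError("undefined: f_Y(emptyset) = 1")
    num = sum(p for (b, c), p in fxy.items() if c and c <= b)     # B ⊇ C, C ≠ ∅
    return num / (1 - fy_empty)

# A well-behaved case (weights chosen for illustration): all joint weight
# on (even, even) and (even, Omega).
even = frozenset({2, 4, 6})
Omega = frozenset({1, 2, 3, 4, 5, 6})
fxy = {(even, even): Fraction(9, 10), (even, Omega): Fraction(1, 10)}
print(P_from_f(fxy))          # 9/10, matching the limit definition (1)

# The measure-zero case above: in the limit all weight sits on (∅, ∅),
# so (10) leaves P(X | Y) undefined even though (1) gives 0.5 on the sequences.
try:
    P_from_f({(frozenset(), frozenset()): Fraction(1)})
except ValueError as e:
    print("P(X | Y)", e)
```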
6. RELATION TO THE SHAFER-DEMPSTER THEORY
6.1. Shafer-Dempster Belief Functions as Special Cases

A Shafer-Dempster belief function Bel is defined on a finite set Ω = {a1, a2, ...} (called the "frame of discernment"). It is not defined as an additive measure on Ω, but through an adjunct, additive measure m ("basic probability assignment", bpa) on the power set of Ω:

(12)  Σ(A⊆Ω, A≠∅) m(A) = 1

where all m(A) lie in [0, 1] and m(∅) = 0. We then define

(13)  Bel(A) = Σ(B⊆A) m(B)

It is convenient to introduce an improper bpa μ for which

(14)  Σ(A⊆Ω) μ(A) = 1

where all μ(A) lie in [0, 1]. Unlike m, μ may assign non-zero weight μ(∅) to ∅. A proper bpa is recovered from μ by normalization

(15)  m(A) = μ(A)/(1 − μ(∅))

for all A ⊆ Ω and A ≠ ∅.
The connection between the theories of random propositions and Shafer-Dempster belief functions depends on the simple result that any random or non-random proposition X induces a Shafer-Dempster belief function on the set of outcomes Ω according to

(16)  BelX(A) = P(A | X)

where A ⊆ Ω and A = A, A, A, A, .... To see this, note that the function fX satisfies (14) so that we may set

(17)  μX(A) = fX(A)

for all A ⊆ Ω. If BelX is defined from μX by (13) and (15), the relation (16) follows.⁵ A non-random proposition B = B, B, B, B, ... induces a special case of a Shafer-Dempster belief function
BelB(A) = 1   if A ⊇ B
BelB(A) = 0   otherwise
6.2. Non-Additivity and Ignorance

The possibility of non-additive probabilities is very important for the theory of random propositions (and for the Shafer-Dempster theory), for it is this that allows a simple representation of ignorance. Imagine, for example, that we have incomplete information on the construction and throwing of an extremely elaborate 6-sided die. We are able to determine only that the die will show an even face on 90% of throws in the long run and we know nothing about the remaining 10%. Taking Ω = {1, 2, 3, 4, 5, 6} and writing even = {2, 4, 6}, we can represent this information as the random proposition

B = even, even, even, even, Ω, even, even, even, even, even, ...

where the frequency of even amongst the two terms even and Ω is 0.9. Writing 1 = 1, 1, 1, 1, 1, ...; 2 = 2, 2, 2, 2, 2, ...; etc.; even = even, even, even, even, ...; and odd = odd, odd, odd, odd, ..., where odd = {1, 3, 5}, we compare the sequences to recover the non-additive probability

P(1 | B) = P(2 | B) = P(3 | B) = ... = P(6 | B) = 0
P(even | B) = 0.9
P(odd | B) = 0
P(Ω | B) = 1

For those used to additive probabilities, it seems strange that P(2 | B) = 0, since we have strong belief that some even face will result on a throw. This latter information, however, is fully contained in P(even | B) = 0.9. What P(2 | B) = 0 tells us is that we have no information on whether the even throw expected will be a 2, i.e., it represents ignorance and not disbelief. Notice also that this background proposition B is subject to evidential scrutiny. If a 2 results on an actual throw, then we can ask for the degree of support that B accrues from this evidence 2 and find P(B | 2) = 1, so that B is fully vindicated. If, however, a result of 1 occurs, then we have P(B | 1) = 0.1, which gives little support to B.⁶
6.3. Dempster's Rule of Combination

In the Shafer-Dempster theory of belief functions, evidence from different sources is combined by Dempster's rule of combination. In its most perspicacious form (Norton, 1988), the rule is expressed in terms of improper bpa's. To combine two such assignments μ1 and μ2, we form the sum

(18)  μ1 ⊕ μ2(A) = Σ(B∩C=A; B,C⊆Ω) μ1(B) · μ2(C)
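Rule (18) is a small computation over pairs of focal sets; a hedged Python sketch (the dict-of-frozensets representation and the example weights are mine):

```python
from fractions import Fraction
from itertools import product

# Dempster's rule (18) on improper bpa's: the mass mu1(B)·mu2(C) flows to B ∩ C.
# Improper bpa's may accumulate weight on the empty set.
def combine(mu1, mu2):
    out = {}
    for (B, p), (C, q) in product(mu1.items(), mu2.items()):
        A = B & C
        out[A] = out.get(A, Fraction(0)) + p * q
    return out

even  = frozenset({2, 4, 6})
low   = frozenset({1, 2, 3})
Omega = frozenset(range(1, 7))

mu1 = {even: Fraction(9, 10), Omega: Fraction(1, 10)}
mu2 = {low: Fraction(1)}
for A, p in sorted(combine(mu1, mu2).items(), key=lambda kv: -kv[1]):
    print(sorted(A), p)
# [2] 9/10          even ∩ low
# [1, 2, 3] 1/10    Omega ∩ low
```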
If μ1, μ2 and μ1 ⊕ μ2 correspond to the belief functions Bel1, Bel2 and Bel1 ⊕ Bel2 according to (15) and (13), then the application of Dempster's rule corresponds to the transition

(19)  Bel1, Bel2 → Bel1 ⊕ Bel2
In the light of (16), this transition is analogous to a combination of the evidence in propositions X and Y according to

(20)  P(· | X), P(· | Y) → P(· | X & Y)
However transition (20) fails in general to correspond to (19) in the sense that we do not in general have

(21)  BelX ⊕ BelY(·) = P(· | X & Y)
Condition (21) will obtain, however, if X and Y are independent in the natural sense that their joint generating function can be factored as

(22)  fX,Y(A, B) = fX(A) fY(B)

for all A, B ⊆ Ω.⁷ One now sees immediately that the independence condition (22) sanctions the identification (21), so that updating via Dempster's rule (19) corresponds to the combining of propositions X and Y according to (20) in this special case. For, using (22), we have
(23)  fX&Y(A) = Σ(B∩C=A; B,C⊆Ω) fX(B) fY(C)
If the fX, fY and fX&Y of (23) are taken as improper bpa's according to (17), then relation (23) simply becomes Dempster's rule (18). Thus the theory of random propositions contains Dempster's rule of combination. Moreover it supplies the crucial component missing from the Shafer-Dempster theory necessary for the rule's consistent application. The rule can only be applied to combine belief functions based on independent evidence. But the Shafer-Dempster theory provides no internal criterion of independence. That criterion is now supplied as the independence condition (22). The scheme (20) handles many more cases than the Shafer-Dempster scheme (19), since the former allows for evidence that is not independent. In fact, in forming X & Y, one automatically takes into account their degree of dependence. For example, at the opposite extreme, scheme (20) properly handles the case in which X and Y are fully dependent. In this case we have

(24)
fx,y(A, A) = fx(A)
for all A ⊆ Ω.⁸ Therefore for this case we have

P(· | X), P(· | Y) → P(· | X & Y) = P(· | X) = P(· | Y)

Thus there is no alteration in our distribution of belief, as we should expect, since X & Y contains the same information as either of X and Y individually. There is no similar automatic mechanism for allowing for the full or partial dependence of evidence in the Shafer-Dempster theory. If one uses Dempster's rule to combine two belief functions based on fully dependent evidence, one improperly alters the distribution of belief.
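The combination (23) is easy to mechanize. The sketch below (ours, not the paper's; the set names and mass values are invented for illustration) computes the unnormalized combination of two generating functions taken as improper bpa's, i.e. Dempster's rule (18) without the usual renormalization:

```python
from itertools import product

def combine(f1, f2):
    """Eq. (23): the mass of A is the sum of f1(B)*f2(C) over all
    pairs with B ∩ C = A.  Mass landing on the empty set is kept,
    matching the 'improper bpa' reading of (17); standard Dempster's
    rule would renormalize by 1 minus that empty-set mass."""
    out = {}
    for (B, m1), (C, m2) in product(f1.items(), f2.items()):
        A = B & C
        out[A] = out.get(A, 0.0) + m1 * m2
    return out

# Illustrative bpa's on Ω = {a, b, c}; the numbers are ours.
omega = frozenset('abc')
fX = {frozenset('ab'): 0.6, omega: 0.4}
fY = {frozenset('bc'): 0.5, omega: 0.5}
fXY = combine(fX, fY)   # e.g. mass({b}) = 0.6 * 0.5 = 0.3
```

Because (23) presupposes the independence condition (22), such a combination is only licensed when the joint generating function factors; for fully dependent evidence, (24) applies instead.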
7. RELATION TO THE BAYESIAN THEORY
7.1. Standard Probability Theory as a Special Case

We recover the standard theory as a special case of the theory of random propositions as follows. To begin, any elementary proposition E induces an additive measure p on Ω according to

p(A) = P(A | E)

for all non-random A = A, A, A, A, ... and A ⊆ Ω. However, to recover the full Bayesian theory, including its conditional probabilities, we must undo the two-part extension of Section 3. We can undo this extension by restricting our set of propositions to the non-random propositions allowed by Ω and to a single, elementary, random proposition B that will form the background information of the prior probability distribution. We now allow the set to close under the operations ~, ∨, &, &inc. In the closure, we let random propositions appear in ∨, & and &inc only as their second argument. (Thus all random propositions remain elementary and the distinction between & and &inc vanishes.) Finally, we allow only non-random propositions as the first argument of P(· | ·) and random propositions as the second. The resulting system is isomorphic to the Bayesian theory on a finite outcome set, under the association of non-random propositions A with outcomes A and of P(· | B) with the prior probability p(· | B).
7.2. Generalized Product Rule and Generalized Bayes' Theorem

In the standard probability calculus, we have the product rule
(25)  p(A ∩ B | C) = p(A | B ∩ C) p(B | C)
where lower-case p is used to represent a standard probability measure. One derives directly from this rule two forms of Bayes' theorem. Representing hypothesis, evidence and background by H, E and B, and using (25) to expand p(H ∩ E | B) in the two ways allowed by H ∩ E = E ∩ H, we arrive at
(26)  p(H | E ∩ B) = [p(E | H ∩ B) / p(E | B)] · p(H | B)
and the relative form of the theorem for hypotheses H and H'

(27)  p(H | E ∩ B) / p(H' | E ∩ B) = [p(E | H ∩ B) / p(E | H' ∩ B)] · [p(H | B) / p(H' | B)]
In the theory of random propositions, as can be confirmed by a brief calculation, the rule analogous to (25) is the generalized product rule

(28)  P(A & B | C) = P(A | B &inc C) P(B | C)
It holds for all propositions A, B, C if the relevant probabilities are defined. If C is an elementary proposition, the operator &inc reduces to & and the generalized rule (28) reduces to the same form as the standard rule (25).⁹ If we have hypothesis H, evidence E and background B, then we can use the generalized product rule to generate corresponding forms of Bayes' theorem by the usual method of using
(28) to expand P(H & E | B) in the two ways allowed by H & E = E & H. We recover

(29)  P(H | E &inc B) = [P(E | H &inc B) / P(E | B)] · P(H | B)
and the relative form of the theorem for hypotheses H and H'

(30)  P(H | E &inc B) / P(H' | E &inc B) = [P(E | H &inc B) / P(E | H' &inc B)] · [P(H | B) / P(H' | B)]
As before, whatever the nature of H and E, if B is an elementary proposition, the operator &inc reduces to & and these versions of Bayes' theorem reduce to the corresponding forms of the standard theory, (26) and (27).

7.3. The Problem of the Priors Revisited

Since the theory of random propositions reduces to the standard Bayesian theory in special cases, the prior probability P(H | B) can exert just as powerful an influence on the posterior probability P(H | E & B) as it does in standard Bayesianism. However, in the present theory, this influence need not always be so significant. One can reduce the influence of the prior probability distribution by reducing the information content of the background B. If propositions are interpreted as sequences, one does this by replacing the singleton sets that are terms of the sequence B by larger sets of outcomes. In the limiting case, one replaces them all by Ω and reverts to the vacuous background

B = Ω = Ω, Ω, Ω, Ω, Ω, Ω, ...

for which P(H | Ω) = 0 where fH(Ω) > 0. This background has no influence on the posterior probability at all, since E & Ω = E, so that¹⁰

P(H | E & Ω) = P(H | E)
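Under the sequence interpretation, the claim E & Ω = E is immediate if & is read as termwise intersection of the sequences (our assumed gloss on the conjunction; the outcome set and the terms of E below are invented for illustration):

```python
# A vacuous background whose every term is the full outcome set Ω.
# Reading '&' as termwise intersection (our assumption), conjunction
# with it changes nothing, so P(H | E & Ω) = P(H | E) for any H.
omega = frozenset('abcd')
E = [frozenset('ac'), frozenset('ab'), frozenset('ac')]
vacuous = [omega] * len(E)
E_and_vacuous = [x & y for x, y in zip(E, vacuous)]
assert E_and_vacuous == E
```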
The assignment of a zero-valued prior probability p(H | B) in standard Bayesianism is potentially disastrous, for it follows from Bayes' theorem (26) that no possible evidence E - no matter how favorable to the hypothesis H - can ever lead to a non-zero posterior probability
p(H | E ∩ B). This unfortunate situation no longer necessarily obtains in the present theory. A zero-valued prior probability P(H | B) is compatible with a non-zero posterior probability P(H | E & B). A simple example suffices to show it: on Ω = {a, b, c, d}, set

B = {a, b}, {a, c}, {a, b}, {a, c}, {a, b}, {a, c}, ...
H = {a}, {a}, {a}, {a}, {a}, {a}, {a}, {a}, ...
E = {a, c}, {a, b}, {a, c}, {a, b}, {a, c}, {a, b}, ...

Then, even though P(H | B) = P(H | E) = 0, we have P(H | E & B) = 1.¹¹
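The example can be checked mechanically. The sketch below reads P(X | Y) as the limiting frequency with which corresponding terms of the two sequences coincide, and & as termwise intersection (our gloss on definition (10), consistent with the values reported in note 11). The sequences are period-two, so one period suffices:

```python
from fractions import Fraction

# One period of each (periodic) sequence on Ω = {a, b, c, d}.
B = [frozenset('ab'), frozenset('ac')]
H = [frozenset('a'),  frozenset('a')]
E = [frozenset('ac'), frozenset('ab')]

def conj(X, Y):
    # '&' read as termwise intersection (our assumed gloss)
    return [x & y for x, y in zip(X, Y)]

def P(X, Y):
    # P(X | Y) read as the limiting frequency with which the terms
    # of X and Y coincide (our assumed gloss on definition (10))
    return Fraction(sum(x == y for x, y in zip(X, Y)), len(X))

assert P(H, B) == 0 and P(H, E) == 0
assert P(H, conj(E, B)) == 1   # zero prior, unit posterior
```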
This example is possible since, unlike their standard forms (26) and (27), the generalized Bayes' theorems (29) and (30) are unable to force a zero-valued posterior probability from a zero-valued prior probability. The crucial difference is the presence of the operator &inc instead of & in the theorem. A zero-valued prior probability P(H | B) = 0 now forces fH&incB(∅) = 1, so that P(E | H &inc B) is undefined according to (10). In this case, the generalized Bayes' theorem is unable to force any result at all.

8. APPLICATION TO THE CASE OF BERNOULLI TRIALS
The simple case of Bernoulli trials illustrates the differences between the Bayesian and the present theory. For concreteness, let us say we have a coin and that we have no idea, to begin with, whether the coin is fair or not. The coin is tossed ten times and yields five heads and five tails. We shall write this result as "5H5T." We now seek the degree to which this outcome supports the various members of the family of hypotheses h(r), where h(r) asserts that the coin will yield a head on a fraction r of tosses and a tail on the remainder.
8.1. The Bayesian Analysis

We seek to apply Bayes' theorem (26) or (27) (in which we now write hypothesis H as h(r)). We seek to determine the posterior probability p(h(r) | E ∩ B), where E is the evidence 5H5T and B the background assumptions. If h(r) is true, the number of heads in ten tosses will be binomially distributed, so that the likelihood is

L(r) = p(E | h(r) ∩ B) = (10 choose 5) r⁵(1 − r)⁵ = 252 r⁵(1 − r)⁵

Since we have no credible way of establishing p(E | B), we cannot use the form (26) of Bayes' theorem. We can only use the relative form (27) and recover for hypotheses h(r) and h(r')
(31)  p(h(r) | E ∩ B) / p(h(r') | E ∩ B) = [L(r) / L(r')] · [p(h(r) | B) / p(h(r') | B)]
It is easy to show that the likelihood ratio L(r)/L(r') is greater than 1 for any r' ≠ r if r = 0.5. In this sense, the Bayesian analysis tells us that the evidence E = 5H5T favors h(0.5) over all other hypotheses. However, this analysis is unsatisfactory for two closely related reasons. First, while the likelihood ratio favors h(0.5) over all others, this does not mean that the ratio of posterior probabilities will always favor h(0.5). The effect of a favorable likelihood ratio can always be swamped by an unfavorable ratio of prior probabilities. Since these prior probabilities can be chosen freely and independently of the evidence, the possibility of an unfavorable ratio cannot be ruled out.¹² Second, without artificial hypotheses, the analysis can give no numerical value for the quantity that interests us most, p(h(0.5) | E ∩ B), the actual degree of support of the most favored hypothesis on the evidence. To do so, one would need to supply a value for p(E | B) for the form (26) of Bayes' theorem. Or one must do something that amounts to the same thing: one must supplement the system of Equations (31) with the summation

Σ_{r∈R} p(h(r) | E ∩ B) = 1

in which it must be assumed that the set {h(r) | r ∈ R} exhausts all viable hypotheses. (This assumption will be incorporated into the prior probability distribution.) The problem is that the value of the posterior probability p(h(0.5) | E ∩ B) will be very sensitive to the size of the
freely stipulated R. In particular, if R is continuous, the posterior probability will become zero, since it is the value at a single point in a continuum of possible r values. Non-zero values will arise only for probability densities.

8.2. Analysis within the Theory of Random Propositions

The outcome set is Ω = {0H10T, 1H9T, ..., 10H0T}. The evidence E is given as the non-random proposition
E = 5H5T, 5H5T, 5H5T, 5H5T, 5H5T, 5H5T, ...

Since h(r) entails that the outcomes iH(10 − i)T, for i = 0, 1, 2, ..., 10, will be binomially distributed, it is most natural to represent the hypothesis by the random proposition

h(r) = ..., iH(10 − i)T, ...

where each term iH(10 − i)T occurs with the frequency

(10 choose i) rⁱ(1 − r)¹⁰⁻ⁱ

as dictated by the binomial distribution. In language uninterpreted by sequences, this means that

fE(5H5T) = 1
fh(r)(iH(10 − i)T) = (10 choose i) rⁱ(1 − r)¹⁰⁻ⁱ
From the lemma (11) we have

fh(r),E(iH(10 − i)T, 5H5T) = (10 choose i) rⁱ(1 − r)¹⁰⁻ⁱ

as the only non-zero terms. We now have immediately that

(32)  P(h(r) | E) = (10 choose 5) r⁵(1 − r)⁵
This expression has a maximum value when r = 0.5, so that h(0.5) is again the hypothesis most favored by the evidence. For r = 0.5 we recover

P(h(0.5) | E) ≈ 0.246

This analysis is considerably simpler than the Bayesian one. It also clearly resolves the problem of the unwanted intrusion of arbitrary prior probabilities, for a background B plays no role in the analysis. We have also recovered a definite numerical value for P(h(0.5) | E). However, we should not read too much significance into the glory of a single number, for this number really only has meaning within the fairly restricted confines of this particular outcome set Ω.¹³

The case of Bernoulli trials lends itself well to further analysis within the theory of random propositions. For example, we could introduce prior probabilities through a background proposition B. If we chose B = Ω, the vacuous proposition, it would not alter the above analysis or affect the recovery of (32). For, as we saw in Section 7.3, we would have P(h(r) | E) = P(h(r) | E & B) and, since we would have P(h(r) | B) = 0, Bayes' theorems (29) and (30) would be degenerate. We could also introduce non-trivial prior probabilities via a non-vacuous background proposition B, which would then exert an influence on the posterior probabilities P(h(r) | E & B). However, that influence would not be unaccountable, for B would always be subject to evidential scrutiny through the formation of P(B | E).
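The posterior (32) and the quoted value are a one-line computation to verify. The check below evaluates the paper's own formula; the grid search over r is just our way of confirming the maximum:

```python
from math import comb

def support(r):
    # Eq. (32): P(h(r) | E) for evidence 5H5T in ten tosses
    return comb(10, 5) * r**5 * (1 - r)**5

# The support peaks at r = 0.5, where it equals 252/1024 ≈ 0.246.
values = [support(i / 100) for i in range(101)]
assert max(values) == support(0.5)
assert round(support(0.5), 3) == 0.246
```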
9. POSSIBILITY OF A DECISION THEORY
The purpose of this section is to make plausible the idea that some sort of decision theory could be coupled with the theory of random propositions. The basic idea is that the theory, with its relative frequency interpretation, gives us information on the long-term behavior of repeated trials. We can then use this information to make decisions aimed at maximizing utility over the long term. If we wish, we can then divorce the decision theory from the relative frequency interpretation and use its methods and rules independently, assured at least of its consistency, since an interpretation exists for it. A simple example illustrates the possibility. We are invited to play a simple game in which we are to guess the outcome of a coin toss. If
we guess correctly, we gain a unit of utility. There is no penalty for an incorrect guess. The outcome set is Ω = {H, T}. H = H, H, H, ... and T = T, T, T, ... are non-random propositions representing the outcomes H and T respectively. On assembling our best evidence E for the outcome of a coin toss, let us say that we have

P(H | E) = 0.2    P(T | E) = 0.5    P(Ω | E) = 0.3
As a sequence, E would have a form such as

E = H, T, T, Ω, T, Ω, ...

with the frequencies of H, T and Ω at 0.2, 0.5 and 0.3 respectively. It represents our best information on the outcomes of repeated tosses. The presence of Ω as a term of the sequence represents a lack of knowledge on our part: it stands for an outcome of either H or T. Thus we can re-encode the information in E in an α-parametrized family of additive probability measures on Ω. Each measure Pα arises from replacing a fraction α (0 ≤ α ≤ 1) of the terms Ω in the sequence E by H and the remaining fraction (1 − α) by T. The additive measure Pα is generated by setting Pα(H) equal to the relative frequency of H in the series, and similarly for Pα(T).¹⁴ We thereby arrive at

Pα(H) = 0.2 + 0.3α
Pα(T) = 0.5 + 0.3(1 − α)

Thus the expected utilities of choosing H or T are, for each α,

Eα(H) = 0.2 + 0.3α
Eα(T) = 0.5 + 0.3(1 − α)
We see immediately that, for all α, the choice of T dominates H in the sense that

Eα(T) ≥ Eα(H)

Therefore we choose T. This example was easy in so far as one action dominated the other. In general this will not happen, and the real burden of designing a decision theory will lie in selecting principles to direct decision in these more complicated cases.
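The dominance claim reduces to the algebraic fact that Eα(T) − Eα(H) = 0.6(1 − α) ≥ 0 on [0, 1]. A quick numerical sweep (ours, using the paper's numbers) confirms it:

```python
# Expected utilities of guessing H or T, as functions of α in [0, 1]
def utility_H(a):
    return 0.2 + 0.3 * a

def utility_T(a):
    return 0.5 + 0.3 * (1 - a)

# T weakly dominates H: the gap utility_T - utility_H equals
# 0.6 * (1 - a), which is non-negative throughout [0, 1].
gaps = [utility_T(a / 100) - utility_H(a / 100) for a in range(101)]
assert all(g >= -1e-12 for g in gaps)
assert utility_T(0.0) > utility_H(0.0)
```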
10. CONCLUSION

The theory of random propositions provides a theory of confirmation which contains the Bayesian and the Shafer-Dempster theories as special cases, while extending both in ways that resolve some of their outstanding problems. The theory of random propositions itself suffers some technical limitations. For simplicity, it is formulated only on finite outcome sets and finite sets of propositions. However, this simple formulation at least suffices to illustrate the viability of the theory and to make clear its connection to other theories. More serious limitations are shared with most other calculi of confirmation. The theory is based on the assumptions that confirmation is a binary relation between hypothesis and evidence and that degrees of confirmation can be compared on a simple linear scale and measured by numbers. A still more general theory might well need to dispense even with these assumptions.

NOTES

* I am grateful to Frank Arntzenius, John Earman, Allan Franklin, Teddy Seidenfeld, Brian Skyrms, Sandy Zabell and an anonymous referee for helpful discussion.
¹ For a recent examination of the Bayesian theory, see Earman (1992).
² I am grateful to Teddy Seidenfeld for emphasizing the importance of this last deficiency.
³ The theory at this stage could be qualified in two ways. First, the representation of a random proposition as a single sequence of outcomes is too rich. A single sequence contains more information than is relevant to the theory. For example, let us say the sequences

P1 = A, B, A, B, A, B, A, B, A, B, A, B, A, B, A, B, A, B, A, B, ...
P2 = B, A, B, B, A, A, B, A, A, B, B, A, B, B, B, A, B, B, B, A, ...

both have the same limiting frequency of A and B, which, in this case, is 0.5 for both A and B. Then either P1 or P2 could represent a random proposition P for which P(P | A) = P(P | B) = 0.5, where A = A, A, ... and B = B, B, ....
The ways in which P1 and P2 differ - P1 begins with A, B where P2 begins with B, A, for example - are irrelevant. All that counts is the frequencies of A and B. To ensure that the irrelevant extra information of such differences plays no role in the theory, we could represent the random proposition P not by a single sequence, but by the set of all sequences whose terms have the same limiting frequencies. We would require that the properties relevant to the theory are just those common to each member of the set. This revision must actually be made a little more cautiously, to ensure that relations between different random propositions are preserved. For example, we may want P1 and P2 to represent different random propositions for which P(P1 | P2) ≠ 1. Therefore we would require the following. Let the joint frequency distributions of a set of sequences {P, Q, ...} represent the probabilistic relations intended between some intuitively represented set of propositions. Then we define the set of propositions to be the set of all such sets of sequences with the same joint frequency distribution.

The second qualification pertains to the difference between P1 and P2. P1 is non-random in so far as it is simply an alternating sequence of A and B. P2 is random in the sense that it was generated by tosses of an unbiased coin. One might prefer to restrict the sequences representing random propositions to the latter type, which is intuitively random. We would then need to provide a formal criterion of the randomness of a sequence, such as von Mises' (1957, pp. 24-25) requirement of independence of limiting frequencies from place selection. While both modifications are of importance to the interpretation of the theory, we forgo them here in the interests of simplicity. Neither affects the calculus actually generated, as long as the rules of the calculus depend only on the various frequency distributions.

⁴ The ellipses of these schemes are to be read as follows. In (6), the ellipses to the left of ~X and to the left of X can be replaced by any finite list of propositions, provided that the two lists are the same. Similarly, the ellipses to the right of ~X and to the right of X can be replaced by any finite list of propositions, provided the two lists are the same. For example, the scheme allows

fU,V,~X,Y,Z(B, C, Ā, D, E) = fU,V,X,Y,Z(B, C, A, D, E)

for any propositions U, V, Y, Z and any B, C, D, E ⊆ Ω. The remaining schemes are treated analogously.

⁵ The random proposition X induces a Shafer-Dempster belief function in exactly the same way as a random set, as shown by Nguyen (1978). Each random proposition corresponds to a random set, since each random proposition determines an additive measure over the power set of Ω and this measure determines a random set.
However, random propositions are richer structures in so far as many distinct random propositions can determine the same random set, but not conversely. That is, if random propositions X and Y determine the same random set, then there will be as many distinct pairs of random propositions X and Y as there are distinct functions fX,Y. This extra structure is essential to the theory of random propositions, since it allows formation of the probability P(X | Y) and of random propositions such as X & Y.

⁶ This example can be carried through without explicit mention of sequences, in the axiomatic spirit of Section 5.1. The relevant information in the sequence B is fully captured by

fB(even) = 0.9    fB(Ω) = 0.1

We can then invoke the lemma of (11) to conclude that the only non-zero terms of f1,B are

f1,B(1, even) = 0.9    f1,B(1, Ω) = 0.1

That P(1 | B) = 0 follows from the definition (10). Similar arguments yield the remaining probabilities.

⁷ If one interprets fX,Y as a joint frequency distribution, it is immediately clear that (22) expresses the statistical independence of the distribution of terms in the sequences X and Y. This is compatible with Shafer's (1990, p. 341) description of the independence required by Dempster's rule as "simply probabilistic independence...". Shafer, however, prefers to treat dependent evidence by altering the frame of discernment so that
independence is recovered. Shafer (1992, §4.8) describes an approach by Ruspini in which dependence of evidence is expressed as a probabilistic dependence similar to the one advocated here.

⁸ In the frequency interpretation, (24) just tells us that the terms of the sequences of X and Y are the same and have the same statistical distribution, being perfectly correlated. This is a natural sense of full dependence.

⁹ Under the association p ↔ P, ∩ ↔ & and H ↔ H, etc.

¹⁰ In the uninterpreted calculus, we would choose B = Ω, so that fΩ(Ω) = 1. It then follows from a variant of the lemma (11) that fH,E,Ω(A, C, Ω) = fH,E(A, C), for all A, C ⊆ Ω, are the only non-zero terms of fH,E,Ω. That P(H | E & Ω) = P(H | E) now follows from the formation of fH,E&Ω.

¹¹ If the propositions are uninterpreted, we have

fB,H,E({a, b}, {a}, {a, c}) = 0.5 = fB,H,E({a, c}, {a}, {a, b})

as the only non-zero terms of fB,H,E, so that P(H | B) = P(H | E) = 0. But fH,E&B({a}, {a}) = 1, so that P(H | E & B) = 1.

¹² Of course, in the limit of very many throws, the effect of a poor choice of prior probabilities will "wash out". However, we do not have that limiting case at hand. We have the case of ten throws and hope for a more definite judgment now.

¹³ One way to see this is to compare with the corresponding case of two tosses, where Ω' = {2H0T, 1H1T, 0H2T}. On analogous evidence of 1H1T, we would recover P(h(0.5) | E) = 0.5. Thus, if we compare - illegitimately - numerical values between Ω and Ω', we find that the transition from evidence 1H1T to the intuitively stronger 5H5T actually reduces the probability P(h(0.5) | E).

¹⁴ The procedure generalizes naturally to arbitrary E on an arbitrary Ω. If E contains the non-singleton term X = {b1, ..., bm}, where b1, ..., bm ∈ Ω, then a fraction αi of the occurrences of X in E is replaced by bi, for i = 1, ..., m, where α1 + ... + αm = 1. Thus X contributes indices α1, ..., αm to the family of additive measures P...,α1,...,αm,..., which are generated by setting P...,α1,...,αm,...(A) equal to the frequency of A in the modified sequence E, for each singleton subset A of Ω.
REFERENCES

Earman, J.: 1992, Bayes or Bust: A Critical Examination of Bayesian Confirmation Theory, MIT Press, Cambridge, MA.
Iyanaga, S. and Y. Kawada: 1980, Encyclopedic Dictionary of Mathematics, trans. K. O. May, MIT Press, Cambridge, MA.
Nguyen, H. T.: 1978, 'On Random Sets and Belief Functions', Journal of Mathematical Analysis and Applications 65, 531-542.
Norton, J. D.: 1988, 'Limit Theorems for Dempster's Rule of Combination', Theory and Decision 25, 287-313.
Shafer, G.: 1976, A Mathematical Theory of Evidence, Princeton University Press, Princeton.
Shafer, G.: 1990, 'Perspectives on the Theory and Practice of Belief Functions', International Journal of Approximate Reasoning 4, 323-362.
Shafer, G.: 1992, 'Rejoinders to Comments on "Perspectives on the Theory and Practice of Belief Functions"', International Journal of Approximate Reasoning 6, 445-480.
von Mises, R.: 1957, Probability, Statistics and Truth, 2nd ed., Dover, New York, 1981.

Manuscript submitted August 5, 1993
Final version received July 28, 1994

Dept. of H.P.S.
University of Pittsburgh
1017 Cathedral of Learning
Pittsburgh, PA 15260
U.S.A.