Journal of Intelligent Information Systems, 4, 231-260 (1995) @ 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Combining Databases with Prioritized Information* SHEKHAR PRADHAN
pradhan @cs.umd.edu
Department of Computer Science, Universityof Maryland, CollegePark, MD 20742, U.S.A.; and Department of Philosophy, CentralMissouri State University, Warrensburg,MO 64093, U.S.A. JACK MINKER V.S. SUBRAHMANIAN
[email protected] vs @cs.umd.edu
Institatefor Advanced ComputerStudies, Universityof Maryland, CollegePark, MD 20742, U.S.A.;and Department of Computer Science, Universityof Maryland, CollegePark, MD 20742, U.S.A.
Abstract. To solve a problem one may need to combine the knowledge of several different experts. It can happen that some of the claims of one or more experts may be in conflict with the claims of other experts. There may be several such points of conflict and any claim may be involved in several different such points of conflict. In that case, the user of the knowledge of experts may prefer a certain claim to another in one conflict-point without necessarily preferring that statement in another conflict-point. Our work constructs a framework within which the consequences of a set of such preferences (expressed as priorities among sets of statements) can be computed. We give four types of semantics for priorities, three of which are shown to be equivalent to one another. The fourth type of semantics for priorities is shown to be more cautious than the other three. In terms of these semantics for priorities, we give a function for combining knowledge from different sources suchthat the combined knowledge is conflict-free and satisfies all the priorities.
Keywords: combining databases, databases, prioritized data
1.
Introduction
To solve a problem one may need to draw on the knowledge of several different experts. It can happen that some of the claims of one or more experts may be in conflict with the claims of other experts. In that case one cannot simply pool together the knowledge of the different experts. Since the person consulting the various experts is most likely not an expert in these domains (or why would he/she be consulting experts?), he/she is often in no position to adjudicate between the conflicting statements. But then how should he/she proceed? Consider a motivating example. Suppose the personnel officer of a large company has to determine whether there are any medical reasons for not hiring a certain applicant for a high stress job. It is the standard practice of the company to consult both a cardiologist who examines the applicant and the patient's personal physician. The cardiologist's report says, among other things, that the applicant suffers from a certain heart irregularity that leads to a heart attack under great stress, and therefore the applicant may well suffer a heart attack *Jack Minker and Shekhar Pradhan were supported in part by the National Science Foundation grant IRI-8916059 and Air Force Office of Scientific Research grant 91-0350. V.S. Subrahmanian was supported in part by Army Research Office grant DAAL-03-92-G-0225, Air Force Office of Scientific Research Grant F49620-93-I0065, and NSF grant IRI-9109755.
232
PRADHAN, MINKERAND SUBRAHMANIAN
due to the stress of the job. It also says that the applicant's diet, if continued over a long period of time, will worsen the heart irregularity. The personal physician testifies that over the many years that the applicant has been his patient he has remained very healthy even in times of great stress. And the patient's generally robust health and healthy habits will enable him to handle very stressful situations in spite of his heart irregularity. The physician further adds that the applicant has been following a diet over the years prescribed by the physician which will reduce the heart irregularity. The physician notes that indeed the heart irregularity has somewhat diminished over the years. The rest of the physician's report is not in conflict with the cardiologist's report. Clearly, these two reports are in conflict. Furthermore, they are in conflict over two points: 1) whether the patient's heart irregularity will make him unable to handle great stress, and 2) the effect of his diet on his heart irregularity. There are several different more or less extreme policies that can be adopted in the face of such conflict: 1. 2. 3. 4.
Reject all the conflicting theories. Accept only one of the conflicting theories. Accept only that part of the theories where they are not in conflict. Accept all of one of the conflicting theories and those parts of the other theories that are not in conflict with the accepted theory. 5. Accept the disjunction of the conflicting parts and the parts of each theory that does not conflict with any of the other theories. None of the above policies are entirely satisfactory. The policy of rejecting all conflicting theories is clearly very wasteful. The second policy is unwise because the person consulting the experts may lack the expertise to choose between the two theories and it requires us to throw out even those parts of the other theories that are not in conflict with the chosen theory. The fourth policy, although not as wasteful as the second policy, is still unwise because a theory may be correct on some points of conflict and wrong on other points. For instance, in the above example, the physician may be right about the applicant's capacity to handle stress, but the cardiologist may be right about the applicant's diet. Hence, the policy of choosing one theory in toto is not a wise policy. The fifth policy is better than the other four policies because it does not throw out any of the conflicting information and does not choose among the theories. But, even though having disjunctive information is better than having no information, the more disjuncts there are in the disjunction, the weaker the information content. So a combination policy that uses the fifth approach, but finds a way of reducing the disjunctive information would be useful. Such a policy would not choose one theory over another in toto, but it would choose among the conflicting statements wherever possible. We investigate such a policy in this paper. We assume that the knowledge of experts can be encoded in the form of databases, and, thus, pooling together the knowledge of different experts can be regarded as combining databases. When two or more statements in the combined database conflict, this conflict can be eliminated or reduced by introducing priorities among the conflicting statements. We envisage these priorities as provided by a user of the combined database. These priorities are intended to express arbitrary preferences of a given user. A priority, written as x > Y, where x is an atom and Y is a non-empty set of atoms, says the user prefers x to be in the combination over Y. This does not mean that x will be in the combination, but only that in the conflict between x and Y, x is chosen over Y. It could happen that x is in conflict with
COMBININGDATABASESWITH PRIORITIZEDINFORMATION
233
some other statement, z, in the combined database which is preferred over {x} and, thus, x may not end up being in the combination. Also, we shall not assume that x > Y means that x is more reliable than Y or, even, considered more reliable by the user. That is, in our formal treatment of priorities, we shall not make any assumptions about the user's reasons for preferring x to Y. Thus, in our example, the personnel manager can give priority to the physician's claim that the applicant can handle the stress of the job over the cardiologist's claim that the stress of the job will cause a heart attack in the applicant. The personnel manager may give preference to this claim of the physician because he thinks it is more reliable, or because it is the company's policy to give an applicant the benefit of the doubt in these matters, or because the personnel manager favors the applicant, or whatever. The preferences of an ideally rational agent are transitive. But Tversky and Kahneman (1981) and others have observed that real agents (or users) are not always this rational in their preference structures. So we shall not assume that the user-supplied priorities are transitive. But, if for a given user they happen to be rational, our approach applies to such priorities without any modification. In section 2 we introduce the basic definitions, terminology and some preliminary lemmas; in section 3 we introduce two semantics of priorities, prove their equivalence, and also introduce a function for combining theories based on a semantics of priorities; in section 4 we introduce a third semantics of priorities and prove that it is equivalent to the first two; in section 5 we introduce a cautious semantics of priorities and prove that the previous three semantics are not cautious; in section 6 we discuss related work. Sections 9 and 10 are appendices. In section 9 we give an algorithm for computing the combination of theories based on the cautious as well as on the non-cautious semantics; in section 10 we give the proofs of the lemmas that are stated in the previous sections, but not proved there.
2.
Preliminaries
In this paper we shall assume that the theories to be combined consist of propositional atoms, and so, in particular, they do not contain negative statements. Although such theories do not contain negative statements, two or more statements from these theories may be in conflict because they violate integrity constraints. In this paper we shall assume that the integrity constraints are of the form +-- al, a2 . . . . . an, where al, a2 . . . . . an are all propositional atoms. The intended meaning of such an integrity constraint is that not all of {as, a2 . . . . . an } can be true. Thus, going back to our motivating example, let 9 al = applicant can handle the stress of the job. 9 a2 = the stress of the job will cause a heart attack in the applicant. Then, we can have the integrity constraint +- al, a2. The union of the cardiologist's report with the physician's report will violate this integrity constraint. We make this idea precise next. DEFINITION 1. Given an integrity constraint 1Ci = + - al, a2 . . . . . an, let {al, a2 . . . . . an}. We shall call
Obj(ICi) the objective part
of ICi.
Obj(ICi) denote
234
PRADHAN, MINKER AND SUBRAHMANIAN
In this paper we shall assume what is commonly referred to as the consistency interpretation of the satisfaction of integrity constraints. More precisely: DEFINITION 2. Let ICi be an integrity constraint. Let T be a set of propositional atoms. Then, T satisfies ICi if Obj(ICi) ~ T. That is, if T U ICi is consistent. T satisfies a set of integrity constraints if it satisfies each member of the set. T violates an integrity constraint if it does not satisfy that integrity constraint. T violates a set of integrity constraints if it violates any member of the set. DEFINITION3. Givenasetoftheories,{T1, T2 . . . . . T~ }, consisting entirely ofpropositional atoms, each satisfying a set of integrity constraints, IC, the union of the theories will be called the base set. DEFINITION 4. A n option associated with {T1, T2 . . . . . Tn } is a maximal subset of the base set consistent with IC. EXAMPLE 1. Let the base set, B = {a, b, c}. Let IC = { +- a, b; +- a, c}. In this case, the options are {a} and {b, c}. DEFINITION 5. Let B be a set of atoms. Let ICi be an integrity constraint. We say that ICi is active in B, if B violates ICi. DEFINITION 6. Let IC be a set of integrity constraints. We say that IC is canonical, if no member of IC subsumes any other member of IC. If X = {Xl, x2 . . . . . xn } and Y = {Yl, Y2. . . . . ym }, then the priority X > Y is understood as an abbreviation of {Xl > Y, x2 > Y. . . . . xn > Y}. But {Xl > {yl}, xl > {Y2}. . . . . Xl > {ym}} should not be understood as Xl > Y. This is because in general it is not the case that if a statement (or a choice) is preferred to several other statements (or choices) taken individually that that statement (or choice) is preferred to those other statements (or choices) taken jointly. However, we shall assume that Xl > Y implies that Xl > Y', where Y' is any subset of Y. W h e n Y is a singleton set we shall write x > Y as x > y, where Y = {y}. DEFINITION 7. returns Y.
Let P/ = x > Y be a priority. Then greater(Pi) returns x and lesser(Pi)
REMARK 1. In this paper we shall assume that all priorities are such that the greater part of the priority is in the base set. Having a priority whose greater part is not in the base set would amount to giving preference to a statement which is not in any of the theories being combined. In this paper we are considering only preferences among statements that are already in one or more of the theories being combined. We shall also assume that we have no priorities of the form x > Y where x 6 Y, since this would imply that x > x, which makes little sense.
DEFINITION 8. If a is a propositional atom, then an option oi contains an a-block iff there is a subset X of the option such that there is an ICi E IC such that Obj(ICi) = {a} U X.
COMBINING DATABASES WITH PRIORITIZED INFORMATION
235
EXAMPLE 2. In Example 1, the option {b, c} contains an a-block because Obj(IC2) = {a, c}; here X = {c}. DEFINITION 9.
A set of atoms oi satisfies a priority x > Y iff
~. X E Oi, or
2. Y f q o i = O , or 3. (oi - Y) contains an x-block
EXAMPLE 3. In Example 1, {b, c} satisfies a > b by condition (3) above as {b, c} contains an a-block, but {a} does not satisfy c > a. A base set which violates an integrity constraint is split up into subsets (what we call 'options') each of which maximally satisfies that integrity constraint. So each of these options contains a maximal strict subset of the objective part of that integrity constraint. Lemma 1 below states that this property holds regardless of how many integrity constraints are involved in splitting up the base set into the set of options: each option contains a maximal strict subset of the objective part of each integrity constraint. LEMMA l. Let T1, T2 . . . . . Tn be a set o f theories each consisting only o f propositional atoms. Let l C be a set o f canonical integrity constraints. Let 0 be the set o f options induced on B = 1"1 U T2 U . . . U Tn by IC. Then, f o r each atom x ~ Obj(ICi) such that ICi E IC and ICi is active in B, there exists an oi E 0 such that Obj(ICi) - {x} __coi.
3.
Semantics of priorities
When an option oi ~ 0 does not satisfy the priority x > Y, then we envision three ways in which O can be altered so as to satisfy the priority. 1. Remove oi from O because it does not satisfy the priority. 2. Impose a ranking on members of O which gives oi a lower status. 3. Transform oi so that it satisfies the priority. These three possible responses suggest three different types of semantics for priorities, described below in this section and the next. Option elimination, defined below, is based on the first way of satisfying priorities. DEFINITION 10. S e m i : O P T I O N ELIMINATION Let TI, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P = {P1 . . . . . Pn} be a set of priorities. Let O be the set of options induced on B = TT U/'2 U 9 99 U Tn by IC. Then, Seml(T1, T2 . . . . . Tn; IC; P ) = {oi E O [ oisatisfies eachPi ~ P}
EXAMPLE4. Let Tl = {a, b} and Tz = {c, d}; l e t l C = {+-- a, b, c; +-- a, d; +- b, c, d}; and let P = {a > b , b > c , c > d}. So B = { a , b , c , d } , and O = {{a, b}, {a, c},
236
PRADHAN,MINKERAND SUBRAHMANIAN
{b, c}, {b, d}, {c, d}}. a > b eliminates {b, c}; b > c eliminates {a, c} and {c, d}; c > d eliminates {b, d}. Hence, Seml(T1, T2 . . . . . T,; IC; P ) = {{a, b}}. DEFINITION 1 I. Let oi and oj be two options and let P be a set of priorities, then o i ~_p Oj iff there exists a p r i o r i t y x > Y 6 P such t h a t x ~ oi, Y N o j # 0 a n d o j Y c oi
-
{x}.
Option ordering, defined below, is based on the second way of satisfying priorities.
DEFINITION 12. Sem2: OPTION ORDERING Let T1, T2 . . . . . T~ be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P = {PI . . . . , P, } be a set of priorities. Let O be the set of options induced on B = TI U T2 U ... U T, by IC. Then, Sem2(T1, T2 . . . . . Tn;1C; P ) = {oi c 0 I ~ o j c 0 such that oj >-p oi}.
EXAMPLE 5. As in Example 4, let TI = {a, b} and T2 = a , d ; + - - b , c , d } ; and l e t P = {a > b , b > c , c > d}. {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}}. Then, {a, c} >-_e {b, {c, d}, {b, c} ___ {b, d}. So, Sem2(T1, T2 . . . . . Tn; IC; P ) =
{c, d}; let IC = {+-- a, b, c; +-S o B = {a, b, c, d}, and O = c}, {a, b} >-e {a, c}, {b, d} ___p {{a, b}}.
We prove below that S e m l and Sem2 are equivalent. THEOREM 1. Let T1, T2 . . . . . T, be a set o f theories each consisting only o f propositional atoms and each satisfying a set o f integrity constraints IC. Let P = { P1 . . . . . Pn } be a set o f priorities. Let 0 be the set o f options induced on B = T1 U T2 U . . . U Tn by IC. Then, S e m l (T1, T2 . . . . . Tn; IC; P ) = Sem2(T1, Tz . . . . . Tn; IC; P ) .
PROOF. Suppose by way of contradiction that Seml(T1, Tz . . . . . Tn; IC; P ) # Sem2(Tl, T2 . . . . . 7,,; IC; P ) . So either there is an oi ~ Seml(T1, T2 . . . . , Tn; IC; P ) such that oi Sem2(T1, T2 . . . . . Tn; IC; P ) , or there is an oi ~ Sem2(T1, T2 . . . . . T,; IC; P ) such that oi q~ Seml(T1, T2 . . . . . Tn; IC; P ) . First Case: Suppose that there is an oi ~. Seml(T1, 7"2. . . . , T,; IC; P ) such that oi Sem2(Ta, T2 . . . . . T,; IC; P ) . So there is an oj ~ 0 and a priority Pk = x > Y ~ P such that oj ___e~oi. That is, there is apriority x > Y 6 P such that Y N oi # 0 and x ~ oj and oi - Y c oj - {x}. But, since, oi c Seml(T1, 7"2. . . . . T~; IC; P ) , x > Y does not eliminate oi. So either i) x ~ oi, or ii) Y f) oi = 0, or iii) oi - Y contains an x-block. I f x 6 oi then oi - Y ~= oj - {x}. So oj ~ze~ oi. If Y N oi = 0 then by definition of >-, oj ~Pk Oi" Suppose oi - Y contains an x-block. Let oi = (oi N Y) U X , where X is a set of statements containing an x-block. If oi - Y c_ oj - {x}, then X ___ oj. But then x cannot belong to oj since X is an x-block and every option is consistent with IC. So by definition of ___, oy ~p~ oi.
COMBININGDATABASESWITH PRIORITIZEDINFORMATION
237
Hence, oi must belong to Sem2(T1, T2, .. 9 Tn; IC; P). Thus, a contradiction. Second Case: Suppose that there is an oi r Seml(Tl, T2 . . . . . T~; IC; P) such that oi Sem2(T1, T2 . . . . . T,; IC; P). Since oi f{ Seml(T1, T2 . . . . . T,,; IC; P) there must be a priority Pk = x > Y 6 P such that Y f3 oi ~ 13 and x q{ oi and oi - Y does not contain an x-block. This priority would eliminate oi from Seml(T1, T2 . . . . . Tn; IC; P). Let oi = Y U X, where X is a set of statements not containing an x-block. Since X does not contain an x-block, {x} U X is a possible combination, and since 0 contains all maximal options consistent with IC, there must be an option oj such that {x} t_J X c_ oj. But in that case oi - Y c o.i - {x}. So Oj ~Pk Oi. Hence oi r T2 . . . . . Tn; IC; P). Thus, a contradiction. [] DEFINITION 13. Let T1, T2,. 9 9 Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let O be the set of options induced on B = T1 U T2 U . . . U Tn by IC. Let Sem(T1, T2 . . . . . Tn; P; IC) be a semantics for combining theories with priorities, such as Seml, or Sere2 above. Then, Comb(Sem (TI, T2 . . . . . Tn; P; IC)) = CNF(Voj~Sem(r~,r2,...,r, jc;p) (Aasoj a)). Since the combination of theories each consisting of propositional atoms will be a disjunctive theory, we need to decide what it means for a disjunctive theory to satisfy an integrity constraint. As before, we shall adopt what is commonly referred to as the consistency interpretation of the satisfaction of integrity constraints. DEFINITION 14. LetlCbeasetofintegrityconstraintssuchthatforeachlCi c IC, Obj(ICi) consists only of propositional atoms. Let T be a theory consisting of clauses which are disjunctions of propositional atoms. Let M be the set of minimal models of T. Then,
Tsatisfies IC iff every
mj
E M satisfies IC in the sense of Definition 2.
Further, we need to decide what it means for a disjunctive theory to satisfy a priority. DEFINITION 15. A disjunctive theory T satisfies a priority Pi iff every minimal model mi of T satisfies Pi. LEMMA 2. Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let 0 be the set o f options induced on B = T~ U T2 U ... U T, by IC. Then,
Comb(Seml(T~, T2 . . . . . Tn; P; IC)) satisfies IC. Comb(Sem2(T1, T2 . . . . . T~; P; IC)) satisfies IC. LEMMA 3. Let T1, T2. . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints 1C. Let P be a set of priorities. Let 0 be the set o f options induced on B = I"1 U T2 U . . . U Tn by IC. Then,
Comb(Semi(T1, T2 . . . . . Tn; P; 1C))satisfies P. Comb(Sem2(Tl, T2 . . . . . Tn; P; 1C)) satisfies P.
238
4.
PRADHAN, MINKER AND SUBRAHMANIAN
Option transformation
In the previous section we considered two semantics based on the first two ways in which a set of options O can be changed when an oi ~ 0 does not satisfy a priority. In this section we consider a third way of changing O such that oi is itself transformed so that it satisfies the priority. An option oi that does not satisfy a priority can be transformed in at least two ways. 9 The first type of transformation takes x > Y to mean that in any situation in which Y (or some non-empty subset of Y) is believed and x is not, one should instead believe x, unless there are reasons independent of g in the situation which preclude belief in x. That is, in the first type of transformation we replace oi N Y by x in oi unless x-block EO i -- y. 9 The second type of transformation takes x > Y to mean that in any situation in which Y (or some non-empty subset of Y) is believed and x is not, one should also believe x, unless there are reasons independent of Y in the situation which preclude belief in x. That is, in the second type of transformation if Y n oi ~ 0, but x r then we amplify oi by adding x to it unless x-block c o i - Y. In this section we introduce a semantics based on the first way of transforming an option. We call this semantics 'option transformation'. In the next section we will introduce a semantics based on the second way of transforming an option. We will call this semantics 'option amplification'. In the option transformation semantics, an option or option transformation, oi, which does not satisfy a priority x > Y, should be transformed by replacing oi n Y by x. Note that an option, oi, satisfies x > Y if oi - Y already contains an x-block. The transformation of an option by a priority necessarily results in a set which satisfies the priority. But the transformation of an option to satisfy one priority may result in a set which may no longer satisfy a priority it had satisfied before the transformation. Thus, with a sequence of priorities ( P1, P2 . . . . . P n ) , if an option oi which satisfies ( P1, P2 . . . . . Pn-1) is transformed to satisfy Pn, it may no longer satisfy (P~, P2 . . . . , Pn-1). To satisfy all the priorities we may have to iterate the process of re-transforming the option until all the priorities are satisfied. If the set of priorities contains a cycle, then this process may never terminate. We formalize this idea below: DEFINITION 16. If there is a priority x > Y which is not satisfied by an option oi, then replacing o i n Y by x in oi is a transformation of oi by x > Y. We shall write this as oi (x I Y). This is intended to mean oi with the statement x replacing the statements o i n Y. We shall call an o i ( x I Y) an 'option-transform'. REMARK 2. The transformation of an option by a priority need not itself be an option. Consider the following example. As in Example 1, let the base set be B = {a, b, c}. Let IC = {~--- a, b; +-- a, c}. In this case, the options are {a} and {b, c}. The priority b > a will transform option {a} into {b}; but {b} is not an option (maximally consistent with IC) because adding c to it would not make it inconsistent. DEFINITION 17. Let B be a base set, let IC be a set of integrity constraints, let Pj = x > Y be a priority, and let oi be one of the options defined by IC on B. Then, we define an
COMBINING DATABASES WITH PRIORITIZED INFORMATION
239
operator I" such that
FB'Ic(PJ' ~
=
0 i satisfies Pj
0i oi(x l Y)
otherwise
Whenever possible we shall omit explicit reference to B and IC in the statement of the I~ operator. Next we define the operator I" over a sequence of priorities. DEFINITION 18. Let P = {P1 . . . . . Pk } be a set of priorities. Let P/~) be any permutation of the members of P, which will represent a sequence of priorities. Then we denote the i-th member of this sequence by
DEFINITION 19. Let oi be defined as in Definition 17. Let P = {P1 . . . . . Pk} be a set of priorities. Let P(~> be an arbitrary sequence over P. Then define,
r(e(cr>, oi) = I~(P{o-)(k), F(e(rr)(k-1) . . . . . l~(e(o-}(l), oi)) Next we define what it means to iterate the operator F.
DEFINITION 20.
Let P(~) be a sequence of priorities. Let oi be an option. Then,
oi) = oi F1 (P(~I, oi) = F(P(~}, oi) F~
Fn+l(P(a), oi) = F(P(cr),
I'"(P{~/, oe)).
Next we define what it means for an option or the transformation of an option to satisfy a set of priorities. DEFINITION 21. Let P = {P1 . . . . . P~} be a set of priorities. Let P(~} be an arbitrary sequence over P. Let oi be an option. Then,
I'n( P(~), oi) satisfiesPiff it satisfies every member of P. REMARK 3. If I"*(P(~>, oi) satisfies P then, clearly, no transformation will take place in r'n(P~l, oi) in the next iteration of P(~/, and, so, I~n+l(P{~/, oi) = Fn(P/~>, oi). But the converse of this does not hold. That is, it could be the case that l'n+l(P(~/, oi) = I'n(P(~I, oi), but Fn(p(~I, oi) does not satisfy P. And, so, the concept of satisfaction of a sequence of priorities cannot be captured in terms of a fix-point of the P operator. The example below establishes this point. Consider the case where, the base set B = {a, b}, IC = {+- a, b}. Let the set of priorities P = {a > b; b > a}. Let P(~) = (a > b; b > a). Let {b} be Oi. In this case,
240
PRADHAN, MINKER AND SUBRAHMANIAN
I'~ {b}) = {b} FI(P(c~), {b}) : {b}, since a > b transforms {b} into {a} and b > a transforms {a} into {b}. Similarly, I'2(P(~), {b}) = {b}. And, Fn(P(~), {b}) = Fn+I(P(~>, {b}) = {b}, for any n > 0. But, clearly, F"(P(~), {b}) does not satisfy a > b, for any n > 0. Hence, the concept of an option satisfying a sequence of priorities cannot be captured in terms of a fix-point of the P operator.
DEFINITION 22. Let P = {P1 . . . . . Pk} be a set of priorities. Let P(~) be an arbitrary sequence over P. Let O be a set of options. Then, I'"(P(~), O), satisfies P iff r'n(Pl~), Oi) satisfies P, for each oi ~ O. DEFINITION 23. Sem3: OPTION TRANSFORMATION Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let P(~I be any arbitrary sequence over P (that is, any permutation of P). Let O be the set of options induced on B = T1 U T2 U . . - U T~ by IC. Then, Sem3(T1, Tz . . . . . Tn; IC; P) = MAX{oi I oi E I'n(Pl~), O) and I'n(P(~, O) satisfies P}, where MAX(S) is the set of C-maximal elements of S.
(This definition implicitly assumes that the value of Sem3(T1, T2 . . . . . Tn; IC; P) is the same regardless of which permutation of P is chosen as P(~. This claim will be proven below in Theorem 4.) EXAMPLE 6. Let 7"1 = {a, b} and T2 = {c, d}. So B = {a, b, c, d}. Let IC = { +-a, b, c; +- a, d; <-- b, c, d}. So O = {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}}. Let P = {a > b , b > c , c > d}. Let P<~ = (a > b , b > c , c > d). So F((a > b , b > c , c > d), O) --- {{a, b}, {b, c}}. Hence, F2((a > b, b > c, c > d), O) = {{a, b}} {{a, b}} satisfies P = {a > b, b > c, c > d}. Hence, Sem3(T1, T2; IC; P) = {{a, b}}. In the rest of this section we establish several properties of the option transformation semantics, the most important of which is that this semantics is equivalent to the Option Elimination semantics and the Option Ordering semantics which were defined in the previous section. LEMMA 4. Let B be a base set, let oi be one of the options induced on B by a set of integrity constraints IC, and let x > Y be a priority. Then, the result o f transforming oi by x > Y satisfies IC. The next lemma says that the result of transforming an option by a priority (or sequence of priorities) is either one of the original options or a subset of one of the original options. LEMMA 5. Let 0 be the set of options defined on a base set B by a set of integrity constraints IC. Let P = { P1 . . . . , Pk } be a set of priorities. Let P(~) be any arbitrary permutation over P. Then
241
COMBINING DATABASES WITH PRIORITIZED INFORMATION
f o r any oi c O, there exists an oj E 0 such that I TM ( P(,I, oi) ~ o i, f o r any n > O.
This result holds regardless of which permutation of the priorities is chosen, since, as shown in the appendix, the proof does not use any information about the structure of the permutation. DEFINITION 24. Let P = {P~ . . . . . Pc} be a set of priorities. Let P(r be any arbitrary permutation of P. Let O be a set of options. Then, Fn(P(r O) is stalled iff 1. it does not satisfy P, and 2. there is an r > n such that F~(P/~, O) = Fr(P(cr), O). We shall show below in Theorem 2 that if there exists a permutation a of a set of priorities P and an integer n such that I'n(P(~), O) is stalled then P contains a cycle. The next few definitions and lemmas are needed to prove Theorem 2. DEFINITION 25. The result of transforming an option or an option-transform oi by k iterations of a sequence of priorities P(~), that is, I"k (P(~), oi) will be called the k-th transform ofoi with respect to PIll" We shall denote this as o~ 'e(~ . Informally, we shall say oi ~ 0 k , P(~)
is transformed into u i REMARK4.
k, P(,x)
in I'k(P/~/, O). We shall also write this as oi "~ o i kP( )
I f o / = Fn(P(~/, oi) then oj' ~ = Fk(P(~l, I'"(P(~), oi)) = Fk+n(P(~/, oi) =
o/k+"',<~>. Hence, if oi ~ 0 then okj"P'~ 6 rk+"(P<~>, O). EXAMPLE 7.
Let F~(P/~/, oj) = oi = {a, b, c}. Let F r ( P / ~ , Oj) = {a, b, d}, where r--n, P(~)
r > n. Then oi = {a, b, c} ..~ o i
= [,r (p<~>, o i) = {a, b, d}.
L e m m a 6 and L e m m a 7 below are needed in the proof of Theorem 2. Let Ir'n( P ~ , O) be stalled. Then there exists an integer r > n such that I'n(P/~5, O) = Fr(p/~I, O). So by r - n iterations of the F operator on I'~(P(~/, O) we get Fr(P(~, 0 ) . In L e m m a 6 and L e m m a 7 below we look at some of the properties of the transformation of a member of Fn(PI~I, O) into a member of Fr (P(~/, O). L e m m a 6 says that any oi c I'~(PI~, O) which does not satisfy P will get transformed into some ok ~ I'~(P(~/, O) and nothing else in I'"(P(~/, O) will get transformed into ok and ok will not satisfy P. LEMMA 6. Let P be a set o f priorities. Let P(~) be any arbitrary permutation o f P. Let F~(P(~), O) be stalled. Let r > n be such that F~(P(~/, O) = Fr(P(~), 0). I f oi C F~(P(~/, O) and oi does not satisfy P then: r --n, P(~I
1~ There exists an ok ~ Fr(P/~/, O) such that o i = ok. That is, ok can be obtained from oi by r - n iterations o f the sequence of priorities, and 2. ok does not satisfy P, and 3. Nothing other than oi is transformed into ok in r,r (p(~), O).
L e m m a 7 below builds on L e m m a 6. If an oi ~ Fn(PI~I, O) gets transformed into an ok E Fr(P/~/, O), then, since Fn(P(~/, O) = Fr(P(~/, O), ok must also belong to
242
PRADHAN, MINKER AND SUBRAHMANIAN
F~(P(~), O). Since ok cannot satisfy P by L e m m a 6, it must be transformed into an oj ~ I'r(P/~>, O), and so on. L e m m a 7 says that this chain of transformations forms a cycle. LEMMA 7. Let P be a set of priorities and let P(~) be any arbitrary permutation of P. Let F"(P(~), O) be stalled. Let r > n be such that F~ ( P(~) , O) = W ( PI~), 0). Then,
there exists an o i
E
Fn(p(cr), O) such that oi does not satisfy P and such that there r - - n , P(~)
is a cycle o f transformations in which oi "~ o i 0i2 ~
r-n, 0i2
DEFINITION 26.
P(~) ~
0i3 , 9 . . ~ Oi m ~
r-n, Oim
= oil, oil
~
r-n, 0il
P(~) =
Oi2,
P(~) ~
0 i.
Let P(~) be an arbitrary permutation of a set of priorities P. Then,
r-n, P(~) atraceofthetransformationoi ..~ o i ,whereoi ~ Fn(P/~), O),isthecollection of all changes caused in oi by P(~) through r - n iterations of P~).
We shall denote the above trace by trace(oi, n, r). The changes are stored as a collection of pairs such that the first member of each pair is the set of atoms being replaced and the second member of the pair is the atom replacing it. DEFINITION 27. A set of priorities, P, forms a cycle if there exist priorities/'1, P2 . . . . . Pn in P such that greater(P2) E lesser(P1),greater(P3) ~ lesser(P2) . . . . . greater(P1) lesser(Pn). THEOREM 2. Let 0 be a set of options. Let P be a set of priorities. If P has a permutation P(~) such that there exists an integer n such that I'n(P(~), O) is stalled then P contains a cycle. PROOF. Assume Fn(P(~), O) is stalled. So Fn(P(~), O) does not satisfy P and there is an r > n such that Fn(P(~), O) = Fr(P(~/, O). Let r be the smallest such r. Since Fn(p(~), O) does not satisfy P, there must be an Oil E I'n(P(~), O) such that it does not satisfy P. By L e m m a 7, there must be a cycle of transformations through oi. Let Cycle = {oj [ oj ~ cycle through oi }. We shall create the following digraph G. Let the vertices of G be V = {a [ a c base set}. Let GreenNodes = (-JojeCycleoj. Let RedNodes = V - GreenNodes. Let TotalTrace = ~-Joj~Cycletrace(o j, n, r). Let the edge of the graph be E = {(a, b) a E first member of e, b = second member of e, e ~ TotalTrace}. With these specifications, the graph has the following properties: 1. A path through the graph cannot terminate in a RedNode, otherwise some oj c Cycle will be transformed into an ok r Cycle. But, by L e m m a 7, this is not possible. 2. Every GreenNode in the graph that has out-degree at least one (i.e., it is being replaced) must have in-degree at least one (i.e., it is being restored). Otherwise some oj ~ F~(P(,), O) will not be in Fr(p(~), 0 ) . But that contradicts our assumption that r n ( P ~ ) , O) = rr(p(,~), 0 ) . 3. Clearly, the graph is finite. 4. At least one GreenNode has out-degree one since oi does not satisfy P.
COMBININGDATABASESWITH PRIORITIZEDINFORMATION
243
Under these circumstances there must be a cycle in the graph. Take any GreenNode that has outdegree at least one. Follow the path from that node. The path cannot terminate in a RedNode (by 1 above). But the path cannot end in a GreenNode (by 2 above). But given that the graph is finite, this means the path must be part of a cycle. Since each edge in the graph corresponds to a priority, there must be a cycle in P, the set of priorities. [] DEFINITION 28. Let IC be a set of integrity constraints, let O be a set of options, let P be a set of priorities. Then, P is unsatisfiable with respect to IC and O if, for each permutation, P/a/, of P, there exists no n such that I'n(P/~/, O) satisfies P. LEMMA 8. Let IC be a set of integrity constraints, let 0 be a set of options induced on a base set B by IC. Let P be a set of priorities. Then,
if P is unsatisfiable with respect to IC and 0 then, for each permutation, P(~>, of P, there exists an n such that I~n ( P(~, O) is stalled. THEOREM 3. Let IC be a set o f integrity constraints, 0 be a set of options, and P be a set of priorities. If P is unsatisfiable with respect to IC and 0 then P contains a cycle. PROOF.
Follows directly from Lemma 8 and Theorem 2.
[]
This raises the question whether the converse of Theorem 3 holds. It turns out that it does not hold. REMARK 5. Even if P = {P1 . . . . . Pk} contains a cycle, P can be satisfiable with respect to some IC and some O. Consider the following example. Let B = {al, a2, a3, a4}. Let IC = {+-- al, a3, a4; +-- a2, a4; +- al, a2, a3}. So O = {{al, a2}, {al, a3}, {al, a4}, {a2, a3}, {a3, a4}}. Let P = (al > a2, a2 > a3, a3 > a4, a4 > al }. Thus, P contains a cycle. Let P(~/ = (a~ > a2, a2 > a3, a3 > a4, a4 > al). al > a2 transforms O into O1 = {{al, a2}, {al, a3}, {al, a4}, {a3, a4}}. a2 > a3 transforms O1 into 02 = {{ai, a2}, {al, a4}, {a3, a4}}. (Note that a2 > a3 does not transform {a3, a4} into {a2, a4} because of the integrity constraint ~ a2, a4.) a3 > a4 transforms 02 into 03 = {{al, a2}, {al, a3}, {a3, a4}}. a4 > at transforms 03 into 04 = {{al, a2}, {a3, a4)}. (Note, again, that a4 > al does not transform {al, a2} into {a2, a4} because of the integrity constraint ~ a2, a4.) So Pl((al > a2, a2 > a3, a3 > a4, a4 > al), O) = 0 4. SO 1-q ((al > a2, a2 > a3, a3 > a4, a4 > al), O) satisfies all the priorities. Note that in the above example every member of P causes a transformation before P is satisfied. That is, no member of P is redundant. However, there are cases where a cycle in the set of priorities P can result in P being unsatisfiable with respect to some IC and O. An example is given below.
244
PRADHAN,MINKERAND SUBRAHMANIAN
EXAMPLE8. L e t l C -----{+-- a, b, c}. L e t O = {{a, b}, {a, c}, {b, c}}. L e t P = {a > b, b > c, c > a}. Thus, P contains a cycle. In this example, regardless of which permutation of P is chosen, P cannot be satisfied by IC and O. This raises the question: what sorts of priorities containing cycles are satisfiable with respect to what sorts of integrity constraints and what set of options? This turns out to be a very difficult question and one that we have not been able to answer. The next lemma says that any permutation of an acyclical set of priorities can transform a set of options so that it satisfies that set of priorities. This lemma is needed for the proof of Theorem 4. LEMMA 9. Let P be a set of acyclical priorities. Let 0 be the set of options induced on a base set B by some set of integrity constraints. Then,
for any permutation P(alof P, there exists an integer nsuch that F n (PIa), O) satisfies P. The above lemma assures us that if we want to transform a set of options O so that it satisfies an acyclical set of priorities by iterating the priorities over O in terms of the r operator, then it does not matter which permutation of P we choose. Regardless of which permutation we choose, we wilt end up with a transformation of O that satisfies all the priorities. But this does not mean that we will end up with the same transformation of O if we choose different permutations of the priorities. Consider the following example. EXAMPLE 9. Let base set, B = {a, b, c}. Let IC = {+-- a, b; <--- a, c}. In this case, the options are {a} and {b,c}. Let P = {b > a , c > a}. Let P(~/ = (b > a , c > a}. Let P/p) = (c > a, b > a). In this case, FI(P(~), {{a}, {b, c}} = {{b}, {b, c}} satisfies P. Also, I'I(Po>, {{a}, {b, c}} = {{c}, {b, c}} satisfies P. But FI(P(~/, {{a}, {b, c}} 7~
r'~(P(p/, {{a}, {b, c}}. Nevertheless, it will turn out that the value of Sem3(Tl, T2. . . . . Tn; IC; P) will not depend on which permutation of P is chosen in the computation of Sem3(T1, I"2. . . . . Tn; IC; P). This is proven below in Theorem 4. Earlier, in Lemma 5 we had shown that the result of transforming an option oi ~ 0 by a sequence of priorities must be a subset of at least one of the original options in O. In the next lemma we shall go further and show that any option or option-transform satisfying a set of acyclical priorities is a subset of one of the original options which also satisfies the priorities. Note that this further claim is not a corollary of Lemma 5, since a set of statements could satisfy a priority without a superset of it satisfying that priority. We shall use this lemma in the proof of Theorem 5. LEMMA 10. Let 0 be the set of options defined on a base set B by a set of integrity constraints IC. Let P be a set of priorities containing no cycles. Let PI~) be an arbitrary permutation of P. Then,
if, for some n > 0, I~n(P/~ ), oi) satisfies P, where oi ~ O, then there exists an o i ~ 0 such that Fn(P(~r>, oi) CC_oj andoj satisfies P.
COMBININGDATABASESWITHPRIORITIZEDINFORMATION
245
REMARK 6. Let P be a set of acyclical priorities. Let P(~/be some permutation of P. Let O be a set of options. Let Fn(P(~), O) be such that every member of it satisfies P. Then, it follows straight forwardly from the above lemma that the only members of 1-'~ (P(,~, O) that are not members of O are precisely those which are strict subsets of some members of O. Equivalently, all members of Fn(p(cr), O) that are not strict subsets of any member of O are members of O. THEOREM 4. Let TI, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set o f integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = T1 tO T2 U 9 9 U T~ by IC. Then, Regardless of which permutation of Pis used in the calculation, Sem3(T1, T2 . . . . . Tn; IC; P) has the same value. PROOF. Let P(~/ and P(pl be two distinct permutations of P. Since P is acyclical, by Lemma 9, we know there are integers n and r such that F"(P(~/, O) and Pr(P(p~, O) both satisfy P. In calculating Sem3 from I'"(P/~/, O) and Pr(p(pl, O) we take their maximal elements (see Definition 23). So the theorem is proved if we prove that the maximal elements of F'(P(~), O) and I'r(P{p), O) are the same. AssumebywayofcontradictionthatthemaximalelementsofF"(P/~>, O) and I"r(P(p}, O) are not the same. So either r n (P/~, o ) contains a maximal element that Fr(p(p>, O) does not, or the other way around. Suppose, without loss of generality, that there is a maximal element oi ~ F~(P(~), O) such that oi r Fr(P~p), 0 ) . Since, by hypothesis, F~(P{~/, O) satisfies P, oi satisfies P. Since oi satisfies P, it follows by Lemma 10 that there exists an oj E 0 such that oi ~ oi. Since oi is maximal, it follows that oi 92 o i. Hence oi = oj. Hence, it follows that oi E O. Since oi satisfies P, no member of P can transform it. So no member of P(pl can transform it. So oi must be in 1-'r(P(p), O), since it is in O and it is not transformed by any member of P(pl. This contradicts our earlier assumption that oi r F r (P(p), 0 ) . Hence ITM (P(~}, O) and Pr(P(p), O) must have the same maximal elements. Hence Sem3(T1, T2 . . . . . Tn; IC; P) has the same value regardless of which permutation of P is used in the calculation of Sem3(T1, 7'2. . . . . Tn; IC; P). [] We prove below that the three semantics for priorities, Seml, Sem2, and Sem3, are equivalent if the set of priorities is acyclical. If the set of priorities contains a cycle, we have no assurance that Sere3 is well-defined. THEOREM 5. Let T1, T2 . . . . . T~ be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P = {['1 . . . . . Pn} be an acyclicaI set of priorities. Let 0 be the set of options induced on B = 7"1 U T2 U .. 9 U Tn by IC. Then, Seml(T1, T9 . . . . . Tn; IC; P) =Sem2(T1, T~ . . . . . Tn; IC; P) =Sem3(T1, T2 . . . . . Tn; Ic; e ) . PROOF. In light of Theorem 1, it will suffice to prove that Sem 1(T1, T2. . . . . Tn; IC; P) = Sem3(Tt, T2 . . . . . Tn; IC; P).
246
PRADHAN, MINKER AND SUBRAHMANIAN
Seml(T1, 7"2. . . . . T,; IC; P) consists of all those members of O which satisfy all members of P. So, to prove the equivalence we need to show that Sem3(T1, T2 . . . . . T~; IC; P) consists of all those members of O which satisfy all members of P, and nothing else. Clearly, all members of O that satisfy P will be in Sem3(T~, T2 . . . . . Tn; IC; P) since no priority will transform any of them. By definition of Sem3, every oi 6 Sem3(T1, T2 . . . . . Tn; IC; P) must satisfy P. But, then, by Lemma 10 we know that for each such oi there exists an oj 6 0 such that oi ___oj and oj satisfies P. So each such oj will be in Sem3(T1, T2 . . . . . T~; IC; P). But, since, Sem3 takes the maximal elements, oi will not be in Sem3(T1, T2. . . . . Tn; IC; P), ifoi ~ oj. So there cannot be a member of Sem3(T1, T2 . . . . . Tn; IC; P) such that it does not belong to O. So Sem3(T1, T2 . . . . . T,; IC; P) consists of nothing else than all those members of O which satisfy all members of P. Hence, Seml(Tl, 7"2. . . . . Tn; IC; P) = Sem3(Tl, T2 . . . . . T,,; IC; P). And, hence, the three semantics are equivalent. [] Now that we have Theorem 5, we can extend the results of Lemma 2 and Lemma 3 to Sem3. LEMMA 11. Let 7"1, 7"2. . . . . T, be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclic priorities.
Then, Comb(Sem3(T1, T2, . . . , Tn; P; IC) ) satisfies IC. PROOF. Follows straightforwardly from Theorem 5 and Lemma 2.
[]
LEMMA 12. Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclic priorities. Then, Comb(Sem3(T1, T2 . . . . . Tn; P; IC)) satisfies P. PROOF. Follows straightforwardly from Theorem 5 and Lemma 3.
[]
In the next lemma we show that the Comb function will not return an empty set if the theories being combined are non-empty. This lemma is needed to prove Lemma 14, which says that for any acyclical set of priorities P, there is at least one option oi c 0 which will satisfy P. LEMMA 13. Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = T1 t J T2 U .. 9tA Tn by IC. Then, if Bis non - empty, then Comb(Semi(T1, T2 . . . . . Tn; P; IC)), Comb(Sem2(Tl, T2, . . . . Tn; P; IC)), and Comb(Sem3(T1, T2 . . . . . Tn; P; IC)) are non-empty.
COMBININGDATABASESWITH PRIORITIZEDINFORMATION [,EMMA 14.
247
Let 0 be the set o f options induced on B = T1 U T2 U . . . U Tn by IC. Then,
f o r any acyclical set o f priorities P, there is at least one option oi c Owhich will satisfy P.
5.
Cautious semantics
Earlier we had noted that an option oi ~ 0 which does not satisfy a priority x > Y can be transformed in two ways so that it satisfies the priority. In the previous section we explored the Option Transformation semantics based on the first way in which a priority can transform an option. In this section we introduce a new semantics based on the second way in which a priority can transform an option. The second type of transformation takes a priority x > Y to mean that in any situation in which a non-empty subset of Y is believed and x is not, one should also believe x, unless there are reasons independent of Y in the situation which preclude belief in x. That is, in the second type of transformation if Y A oi ~ 13, but x ~( oi, then we amplify oi by adding x to it unless x-block c oi - Y. Amplification of an option gives rise to a new semantics, called 'Option Amplification', which is more cautious than Seml, Sem2 and Sere3. Next we introduce the property of being cautious. Consider the following example: Let there be three theories T1 = {a},/'2 = {b}, and 7"3 = {c}, encoding the knowledge of three experts. Let IC = {+-- a, b; +-- a, c} and let P = {b > a}. Then B = {a, b, c} and O = {{a}, {b, c}}. Clearly, Sem3(Ta, T2 . . . . . T~; P; IC) = Seml(T1, T2 . . . . . T~; P; IC) = Sem2(T1, T2. . . . . T~; P; 1C) = {{b, c}}. So, Comb(Sem3(T~, T2, I C , P ) ) = Comb (Seml (T1, T2, IC, P ) ) = Comb(Sem2(T1, T2, IC, P ) ) = {b, c}. So, although the claim of the third expert (i.e., c) is in contention with the claim of the first expert (i.e., a) and although c has no priority over a, yet in the final combination c is included. A cautious policy would not include c in the final combination without first resolving the conflict between c and a. A proponent of such a cautious policy would argue that giving priority to the claim of the second expert (i.e., b) over the claim of the first expert does not mean that the claim of the first expert is now discredited, and, so, it should retain its force in the conflict between the first expert and the third expert (i.e., the conflict between c and a). Whereas a proponent of the less cautious policy that Seml, Sem2, and Sem3 embody, would argue that since a cannot be included in the final combination, we might as well include c, since it is after all the claim of an expert and we should include as many claims of experts as we can in the final combination. Clearly there is merit in both points of view. Below we formalize the intuition behind the cautious policy. DEFINITION 29. Let T1, T2, . . . , Tn be a set of theories consisting of propositional atoms, each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let B = T1 U T2 U . . . U T~. Then, aground a t o m x ~ B is free from lCi ~ IC if either x r Obj(ICi) or, ifx 6 0 b j ( I C i ) , then there exists a priority Pj c P such that x = greater(Pj) and lesser(pi) v1 O b j ( I G ) # 0.
248
PRADHAN, MINKER AND SUBRAHMANIAN
DEFINITION 30. Let 7"1, T2 . . . . . T~ be a set of theories consisting of propositional atoms, each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let B = T1 U T2 U . . . U Tn. Let Sem(T1, T2 . . . . , T,; P; IC) be a semantics for combining theories with priorities, such as Seml, or Sem2, or Sem3 above. Then, Sem(T1, T2 . . . . . T~; P; IC) iscautious if a ground atom x is a member of Comb(Sere(T1, T2 . . . . . Tn; P; IC)) only i f x is free from every active I C i C IC.
Informally, this definition says that a semantics is cautious if the combined theory it gives includes some atom x only if for each active integrity constraint in which x appears, x has a higher priority than some atom which appears in that integrity constraint. REMARK 7. Seml, Sem2, and Sem3 are not cautious. In the example above we saw that c ~ Comb(Sem(T1, T2, T3; {b > a}; {+-- a, b; +-- a, c})), where Sem can be Seml, orSem2, or Sere3. But c has no priority over a. 5.1.
Option Amplification
Next we develop the Option Amplification semantics, which is cautious. DEFINITION 31. Let 7"1, T2 . . . . . T, be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let O be the set of options induced on B = T1 U T2 U . . . U Tn by IC. Let oi ~ O. Then amplification(o/) = oi U {x I x c B, x > Y E P, Y A oi ~ 13, oi does not satisfy x>Y}. EXAMPLE 10. Let T1 = {a, b} and T2 = {c, d}. So B = {a, b, c, d}. Let IC = {+-a, b, c; ~ a, d; +-- b, c, d}. So O = {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}}. Let P = {b > c, a > d}. Then, amplification({c, d}) = {a, b, c, d}, amplification({b, d}) = {a, b, d}, amplification({b, c}) = {b, c}. DEFINITION 32. Option Amplification Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints 1(7. Let P be a set of acyclical priorities. Let O be the set of options induced on B = T1UT2U. 9 9UTn bylC. Then, Sem4(T1, T2 . . . . . Tn; IC; P ) = {amplification(oi) I oi c 0}. EXAMPLE 11. As in Examples 4, 5, 6, let T1 = {a, b} and T2 = {c, d}. So B = {a, b, c, d} Let IC = {+- a, b, c; +- a, d; +- b, c, d}. So O = {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}}. LetP={a>b,b>c,c>d}.Then, Amplification({a, b}) = {a, b} Amplification({a, c}) = {a, b, c} Amplification({b, c}) = {a, b, c} Amplification({b, d}) = {b, c, d} Amplification({c, d}) = {b, c, d}
COMBINING DATABASES WITH PRIORITIZED INFORMATION
249
Hence Sem4 (T1, T2; IC; P) = {{a, b}, {a, b, c}, {b, c, d}}. REMARK 8. Note that if T1, T2, IC, and P are as in Example 11 above, then, as seen in Examples 4, 5, 6, Sem(T1, T2 . . . . . Tn; IC; P) = {a, b}, where Sem can be any of Seml, or Sere2, or Sem3. So, although there is no priority a > d and there is an integrity constraint +-- a, d, but it is the case that a ~ Comb(Sere(T1, ~ . . . . . T~; IC; P)). That is, Semi, Sem2, and Sere3 are not cautious. But a r Comb(Sem4(T~, T2 . . . . . Tn; IC; P)) precisely because {b, c, d} E Sem4(T1, T2; IC; P). That is, whereas Seml, Sem2, and Sere3 get rid of options which do not satisfy the priorities, Sere4 retains their amplified versions. Hence, in general, fewer atomic statements can be derived from Comb(Sere4) than Comb(Sere). THEOREM 6.
Sem4 (Option Amplification) is cautious.
PROOF. Let T1, Tz, .. 9 Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let O be the set of options induced on B = T1 U T2 U ... U T~ by IC. Suppose x c Comb(Sem4(T1, T2 . . . . . Tn; P; IC)), where x 6 B. Since x is a propositional atom, if x ~ Comb(Sem4(T1, 7"2. . . . . Tn; P; IC)) then x must belong to every amplification(o/) 6 Sem4(T1, T2, ... , Tn; P; IC). Now we have two cases: either x belongs to every oi ~ 0 or there exists an oi ~ 0 such that x r Case 1: x belongs to every oi ~ O. Then there can be no active ICj ~ IC such that x ~ Obj(ICj). Assume to the contrary that there exists an active ICj E IC such that x c Obj(ICj). So, by the definition of an active integrity constraint, Obj(ICj) c B. So, by Lemma 1, there must exist an ok 6 0 such that Obj(ICj) - {x} __ ok. So, i f x belongs to every oi ~ O, ok would violate ICj. Hence there cannot be an active ICj ~ IC and a substitution such that x E Obj(ICj). So, trivially, x is free from every active ICj E IC. Hence Sem4(T1, 7"2. . . . . Tn; P; IC) is cautious. Case 2: It is not the case that x belongs to every oi ~ O. For every active ICj such that x E Obj(ICj), there is at least one option ok 6 0 such that x r ok. x can be in amplification(ok) only if there exists a priority Pr ~ P and such that x = greater(Pr) and such that lesser(Pr) f'l ok ~ 0 and x-block f~ ok - lesser(Pr). But it must be the case that lesser(Pr) ~ Obj(ICj), otherwise ok - lesser(Pr) would contain an x-block (because otherwise Obj(ICj) - {x} would be in ok). So, again, we have that x is free from every active ICj ~ I C. Hence Sem4(T1, 7"2. . . . . T~; P; IC) is cautious. [] Since each option is maximally consistent with IC, amplifying an option necessarily results in a set that no longer satisfies IC. Hence, Comb(Sem4(Tl, T2 . . . . . Tn; P; IC)) will not satisfy IC in the sense of Definition 2. On the other hand, if those sets in SEM4 that do not satisfy IC are discarded then SEM4 will not be different from the non-cautious semantics. Below we introduce a weaker notion of satisfaction of IC, and in this sense of satisfaction, Comb(Sem4(T~, T2 . . . . . T~; P; IC)) will satisfy IC.
250
PRADHAN,MINKERAND SUBRAHMANIAN
DEFINITION 33. Let IC be a set of integrity constraints. Let T be a theory consisting of clauses which are disjunctions of propositional atoms. Then, T weakly satisfies IC if at least one minimal model of T satisfies all members of IC. The definition of weak satisfaction implies that the theory is consistent with the integrity constraints. However, we do not delete models of the new theory that are not consistent with the integrity constraints. LEMMA 15. Let 7"1, Ta . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = 7"1 U T2 U . . . U Tn by IC. Then,
Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) weakly satisfies IC. LEMMA 16. Let T1, T2 . . . . . T, be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = T1 U T2 U. 99U Tn by IC. Then,
Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) satisfies all Pi ~ P. The next theorem states that the set of consequences of the cautious semantics is a subset of the set of consequences of the non-cautious semantics, thus justifying its name. THEOREM 7. Let T1, T2. . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyciical priorities. Let Sem stand for Seml, or Sem2, or Sem3, defined above. Then,
for any sentence x, if Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) ~ x, then Comb(Sem(T1, T2 . . . . . Tn; P; IC)) ~ x. PROOF. From Lemma 13 we know that Sem(T1, T2 . . . . . Tn; P; IC) is non-empty. Any oi ~ Sere(T1, T2 . . . . . Tn; P; IC) satisfies all the priorities and hence will be in Sem4(T1, T2, . . . . Tn; P; IC) since Sem4 amplifies an option only if it does not satisfy a priority. So, Sem(T1, T2 . . . . . Tn; P; IC) C Sem4(T1, T2 . . . . . Tn; P; IC). Suppose Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) ~ x. Let x be of the form al va2 v - . . van, where n > 1. Then, let disjuncts(x) denote {al, a2 . . . . , an}. That is, x VaiEdis]uncts(x)ai" Then for each oj ~ Sem4 (T1, T2 . . . . . Tn; P; IC) there exists an ai E disjuncts(x) such that ai ~ oj. But, given that Sere(T1, T2 . . . . . Tn; P; IC) c Sem4(T1, T2 . . . . . Tn; P; IC), there exists a subset X ofdisjuncts(x) such that for each ok 6 Sem(Tl, T2 . . . . . Tn; P; IC) there exists an ar ~ X such that a r c ok. That is, Sem(T1, T2 . . . . . Tn; P; IC) ~ V,r~x ar. However, since X is a subset of disjuncts(x), it follows that Varex ar ~ x. Hence, for any x, if Comb(Sem4(T1, T2, . . . , T,; P; IC)) ~ x, then Comb(Sere(T1, T2, . . . . Tn; P; IC)) ~ x. [] =
6.
Related work
(Baral, et al., 1991) gives methods of combining theories each of which satisfies a set of integrity constraints, where the naive union of the theories fails to satisfy the integrity
COMBININGDATABASESWITHPRIORITIZEDINFORMATION
251
constraints. They do not however consider adding priorities among arbitrary statements or even priorities among theories. (Baral, et al., 1992) gives methods for combining first order theories with priorities among theories. (Fagin, et al., 1983) present an account of updates with priorities among sets of statements. However, none of these papers consider the problem of combining databases with priorities among arbitrary statements. Ryan's work on Ordered Theory Presentations (Ryan, 1991, 1992a, b) gives a semantics for first order sentences with arbitrary priorities among statements. It is based on the idea of ordering all possible interpretations of the sets of sentences in terms of which interpretations most nearly satisfy the set of sentences and satisfy the priorities. Interpretations maximal in the ordering are taken to be the models of the set of sentences with the priorities. Although Ryan has not worked on the problem of combining databases, his approach is closely related to what we have called Sem2. Each option can be regarded as an interpretation that most nearly (even if not completely) satisfies the base set. Sem2 basically introduces an ordering among the options in terms of the priorities and chooses the maximal options in terms of this ordering. Our approach is different than Ryan's in several respects. First, Ryan's priorities are required to be transitive; we do not require priorities to be transitive. Second, in our system i r a > b a n d a > c, then it does not follow t h a t a > {b,c}. This seems natural to us. Just because some one prefers a to b and a to c, does not mean that he prefers a to both b and c together. But in Ryan's treatment of priorities this would hold. Third, Ryan's treatment of priorities restricts itself to priorities among sentences in the theory, but does not consider priorities among any two sentences in the base set, regardless of whether they are part of the theory. But our approach allows this in principle. This is obscured in the case where the theories consist entirely of atoms, but this will become clear in extensions of this work to non-atomic theories. Fourth, Ryan's semantics is non-cautious, whereas we develop both a cautious and a non-cautious semantics. An alternative approach to combining multiple databases has been developed by Subrahmanian (Subrahmanian 1992) who develops a language for expressing supervisory databases. Intuitively, a supervisory database contains conflict resolution information. The advantage of (Subrahmanian 1992) is that it provides a very general framework for combining multiple knowledge bases, even when uncertainty, time, inconsistency and nonmonotonicity (both in terms of stable and well-founded semantics (Gelfond and Lifchitz, 1988; Van Gelder, et al., 1988) are present. In contrast, in this paper, we deal only with deduction-free relational databases. What (Subrahmanian, 1992) lacks is an explicit articulation of what preference means and this topic is the main focus of this paper. An interesting topic for future work would be to see whether the notion of preference developed in this paper can be encoded in the supervisory language of (Subrahmanian, 1992).
7.
Conclusions and future work
In this paper we present a cautious and three equivalent non-cautious semantics for combining theories (consisting entirely of propositional atoms) with priorities among statements, which express user-preferences among conflicting statements. We have proved several interesting properties of these semantics and presented an algorithm for combining theories which can be used in either the cautious mode or the non-cautious mode.
252
PRADHAN, MINKER AND SUBRAHMANIAN
In terms of our work on priorities among arbitrary statements, we can express priorities among theories. Thus, T1 > T2, understood to mean T1 = {al, a2 . . . . . an} has priority over T2, can be expressed as {al > T2, a2 > T2 . . . . . an > T2}. Our framework can also be used to express priorities among theories in terms of topics. Thus, let T1 >t T2 mean that theory T1 is to be preferred over theory T2 on topic t. In our framework we can express this as, {al > {bl, b2 . . . . . bj}, a2 > {bl, b2 . . . . . bj} . . . . . ak > {bl, b2 . . . . . b j}}, where {al, a2 . . . . . ak} are the statements in T1 on topic t and {bl, b2 . . . . . bj } are the statements in T2 on topic t. In future work we shall consider the problem of combining Horn theories, normal theories, disjunctive normal theories, and first-order theories, with priorities amongst arbitrary statements.
8. Algorithm Algorithm f o r combining theories INPUT: I C = {ICa, IC2 . . . . . ICs }, a set of integrity constraints; T1, 7"2. . . . . Tn, set of theories consisting of propositional atoms, each satisfying IC;
P, an acyclical set of priorities; t y p e - - h a s the value Cautious or NonCautious indicating whether the combination should be Cautious or not. OUTPUT: A maximal combination of T1, T2 . . . . . T, that satisfies I C and P. STEP1: Find Active_lC the set of integrity constraints that violate the union of the theories. Active_IC +-- {}; B +- Tl U T2 U . . . U T,;
F o r i = 1 t o s do If Obj(ICi ) cc_ B then Active_lC = Active_IC tA ICi ; Active_IC = {ICa . . . . . ICk } is the set of integrity constraints which violate the combined theory B. If Active_lC is empty then the combined theory is consistent with the integrity constraints and will satisfy all the priorities in P and the algorithm terminates. I f A c t i v e _ I C is not empty continue with STEP2. STEP2: Calculate the set of options. O +-- {B}; Call Fragment(O, Active_IC), where Fragment(O, X) is the following reeursive procedure: recursive procedure Fragment(O, X) If X 5& {} then
begin temp_O = {{}}; Choose any member of X; call it X j; for each oi ~ 0 do if O b j ( X j ) c oi then for each atom ak ~ O b j ( X i ) do temp_O = temp_O U {oi - {ak}};
else temp_O = temp_O tA {oi};
COMBININGDATABASESWITHPRIORITIZEDINFORMATION
253
x = x - { x j}
0 = temp_O; Fragment(O, X);
end. STEP3: Compute the semantics If type = Cautious then for each oi ~ 0 do amp(o/) = {}; ( . amp(o/) is a local variable which stores the atoms which will be added to oi in amplifying it *) For each P i ~ P do ( . check if 0 i satisfies Pj *) for each oi c 0 do if greater(Pj) r then if lesser(Pj) f'l oi ~ 13 then if CheckBlock(oi, Pj, Active_IC) then ( . if there's no block . ) if type r Cautious then 0 +-- 0 - oi ( . delete the offending option .) else amp(oi) +-- amp(oi) U greater(Pj); ( . store greater(Pj) in amp(oi)*) If type = Cautious then for each oi ~ 0 do oi +-- oi U amp(oi)
(* amplify each option with its amp .)
STEP4: Compute the combination
Return CNF(VojeSem(rl,r2,...,r,;iC;e ) (Aasoj A)). Sub-Procedure: CheckBlock INPUT: oi, an option; Pi, a priority; S, a set of integrity constraints;
OUTPUT: returns true if greater(Pj)-block r constraints in S, false otherwise.
Oi --
lesser(Pj), relative to the integrity
STEPI: Compute the subset of S that might generate a block to greater(pi) IC' = {ICi c S I greater(Pj) 60bj(ICi)}
STEP2: Check for a block to greater(Pj) in oi - lesser(Pj) Found +- false; k+-l;
254
PRADHAN, MINKERAND SUBRAHMANIAN
While not Found and IU ~ {} do if for 1Ck ~ 1U, Obj(IC~) c_ oi - lesser(Pj) then Found +-- true else k +-- k + 1 Return Found
8.1. Complexity of the algorithm Let the input to the algorithm be T~, T2. . . . , Tn, the theories to be combined, IC, the set of integrity constraints, and P, the set of priorities. We can take the size of the input to be m = IT11+ IT21+ - - " + IT~I. For the sake of simplicity, we are assuming that the contribution of the size of IC and the size of P to the size of the input is negligible compared to the contribution of the size of the theories to be combined. Now it is easy to show that because of Step 4 of the above algorithm, in the worst case, the algorithm is essentially exponential in the size of the input, i.e., m. In the worst case, the cardinality of the base set, B, will be m, i.e., when each member of {T1, T2. . . . . T~} is disjoint from every other member. In the worst case, we, also, have the maximum possible number of options, given the size of B. So if m is the cardinality of the base set, then the maximum number of options in O will be
(/2)'m Each option will contain at most m/2 atoms. If the type of semantics is non-cautious, then in the worst case the options satisfy all the priorities; else, if the type is cautious, then the worst case is when each option fails to satisfy the maximal number of priorities and, thus, each option is maximally amplified. We shall only consider the former case since it is easy to see that if the former case is exponential, then so is the latter. In the former case, Sem(T1, T2. . . . . Tn; IC; P) = O. So in the worst case, the number of options in Sem(Tl, T2 . . . . . Tn; IC; P) = O will be
Since each option will contain at most m/2 atoms, CNF will generate
~)
m/2
formulae. Thus, Step 4 of the algorithm is exponential in m, the size of the input. Steps 1, 2 and 3 of the algorithm consist in unioning two sets or intersecting two sets or checking whether a set is a subset of another set. These sets are always a subset of the base set, so these operations are polynomial in m. Furthermore, the number of times these operations are done in these steps is either independent of m or polynomial in m. Hence Steps 1, 2, and 3 are polynomial in m.
COMBININGDATABASESWITH PRIORITIZEDINFORMATION 9.
255
Proofs of Lemmas
LEMMA 1. Let T1, T2, . . . , Tn be a set of theories each consisting only of propositional atoms. Let IC be a set of canonical integrity constraints. Let 0 be the set of options induced on B = T~ U T2 U . . . U Tn by IC. Then,
f o r each atom x ~ Obj(ICi) such that ICi E IC and ICi is active in B, there exists an oi E 0 such that Obj(ICi) - {x} c o i . PROOF. For each atom x c Obj(ICi), Obj(ICi) - {x} satisfies ICi. Since IC is canonical, there is no ICj ~ IC such that IC/ subsumes ICj. So there is no ICi which would be violated by Obj(ICi) - {x}. Hence, Obj(ICi) - {x} violates no integrity constraints in IC. Given that IC/ is active, Obj(ICi) c B, so Obj(ICi) - {x} _c B. Since O contains all the maximally consistent subsets of B, it follows that there exists an oi E 0 such that Obj(ICi) - {x} __coi. [] LEMMA 2. Let I"1, T2 . . . . . T, be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let 0 be the set o f options induced on B = T1 U T2 U . . . t3 Tn by IC. Then,
Comb(Semi(T1, T2 . . . . . Tn; P; IC)) satisfies IC. Comb(Sem2(T~, T2 . . . . . Tn; P; IC) ) satisfies lC. PROOF. Since Semi=Sere2, Comb(Sem 1) = Comb(Sem2). So proving the lemma for any one will suffice to prove it for both. We choose to prove it for Seml. Comb(Semi) can violate an ICi c IC only if Obj(ICi) c_ oi, for some oi ~ (Seml (T1, T2, . . . . T,; P; IC). But, since, Seml(T1, T2 . . . . . T,; P; IC) c_ O, this would mean that some of the original options violate IC/. But that is not possible by the definition of an option. [] LEMMA 3. Let TI, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of priorities. Let~ 0 be the set of options induced on B = T1 U T2 U 99 9 U Tn by IC. Then,
Comb(Semi(T1, T2 . . . . . Tn; P; IC) )satisfies P. Comb(Sem2(T1, T2 . . . . . T~; P; IC))satisfies P. PROOF. Since S e m l = Sem2, Comb(Sere 1) = Comb(Sere2). So proving the lemma for any one will suffice to prove it for both. We choose to prove it for Seml. Given the nature of Comb and Seml it is clear that the minimal models of Comb (Semi(T1, T2 . . . . . Tn; P; IC)) are precisely the options in Seml(T1, T2 . . . . . Tn; P; IC). According to Definition 15, a disjunctive theory satisfies P iff all its minimal models satisfy P. Since each member of (Semi(T1, T2 . . . . . Tn; P; IC) satisfies P it follows that all the minimal models of Comb(Seml(T~, T2 . . . . . Tn; P; IC)) satisfy P. Hence, it follows that Comb(Seml (T~, T2 . . . . . Tn; P; IC) ) satisfies P. []
256
PRADHAN, MINKER AND SUBRAHMANIAN
LEMMA 4.
Let B be a base set, let oi be one o f the options induced on B by a set o f integrity constraints IC, and let x > Y be a priority. Then, the result o f transforming oi by x > Y satisfies IC.
PROOF. Either o i satisfies x > Y or it does not. If it does then x > Y does not transform oi and so there is nothing to prove. So let us assume that oi does not satisfy x > Y. Definition 9 implies that x-block r - Y. But the definiton of x-block (Definition 8) implies that if x-block f~ oi - Y then there is no subset X of oi - Y such that x U X violates IC. That is, the result of replacing oi f'l Y by x does not violate IC. [] LEMMA 5. L e t O b e t h e s e t o f o p t i o n s d e f i n e d o n a b a s e s e t B b y a s e t o f i n t e g r i t y c o n s t r a i n t s IC. Let P = { P1 . . . . . Pk } be a set o f priorities. Let P(a) be any arbitrary permutation over P. Then f o r any o i E O , there exists an oj E 0 such that I"n (P(~), oi) c_ o j , f o r any n > O.
PROOF. O contains all the options defined on B by IC. By definition, each option is a subset of B that is maximally consistent with IC. Either F ~ (PI~), oi ) is maximally consistent with I C or it is not. (It follows from L e m m a 4 that it is consistent with IC.) If it is maximally consistent then it must be equal to some oj 6 0 . If it is not maximally consistent then it must be a subset of such a maximally consistent set. In either case it is a subset of some member of O since O contains all subsets of B maximally consistent with IC. [] LEMMA 6. Let P be a set o f priorities. Let P~I be any arbitrary permutation o f P. Let F"(P(~), O) be stalled. Let r > n be such that I'n(P(~), O) = I'r(P(~), 0 ) . I f oi F"(P(~), O) and oi does not satisfy P then: r --n, P(~)
1. There exists an ok ~ I'r(P(a), O) such that o i = ok. That is, ok can be obtained f o r m oi by r - n iterations o f the sequence o f priorities, and 2. ok does not satisfy P, and 3. Nothing other than oi is transformed into ok in Fr ( p(cr), 0).
PROOF. r--n P(,O
I. B y t h e d e f i n i t i o n o f ( r - n ) t h t r a n s f o r m , o i
'
= Fr-n(P(~), oi).Sinceoi E I'n(P(~), O), r --n, P(~)
it follows that F r-n ( P(cr), Oi) E I ~r-n+n (P(tr), O), (See Remark 4.) That is, o i r-n,P(a)
F r (P(~), O). Hence, F r (P(~), O) must contain an ok such that o i 2. Suppose to the contrary ok satisfies P.
c
= ok.
Then o2-n'el~) = ok. Since o~-n'P~ = ok
and o2-"'pI~ = ok (that is, both oi and Ok are transformed into ok in I ' r ( P l ~ , O)), I'r(P(~), O) would contain at least one member less than Fn(P(~), 0 ) . But this is not possible since I'n(P(~), O) = Fr(P(~), 0 ) . 3. Suppose to the contrary, there exists a j 5k i such that o} -"' p~o~ = ok. Then by the argument in 2 above, we get a contradiction. So o~-n'~/~ r Ok.
[]
LEMMA 7. Let P be a set o f priorities and let PI~) be any arbitrary permutation o f P. Let 17n(P(~), O) be stalled. Let r > n be such that I'n(P(~), O) = Fr(P(~), O). Then,
COMBINING
DATABASES
WITH PRIORITIZED
257
INFORMATION
there exists an o i E F"(P<~>, O) such that o i does not satisfy P and such that there r - n , P(~}
is a cycle o f transformations in which oi ~
oi
r - n , P(~> Oim ~
0 i.
r - n , P<~ 0i2 ~
0i2 ~
0i3 , . . . , Oi m ~
r--n, P{~)
= oil, oit ~'~ oi~
= oi2,
PROOF. SinceFn(P(~), O) d o e s n o t s a t i s f y P, t h e r e m u s t b e a t l e a s t o n e o i c Fn(P(,), O) r - n , P{o)
r - n , P(o)
such that oi does not satisfy P. Either oi "~ o i = oi or oi "* o i = Oil, such that oi ~ oi,. If the former holds, we are done. So, let us suppose the latter holds. Then, by Lemma 6, part 2, oii does not satisfy P. Since oi~ c Fn(P(a), 0 ) , it must be transformed r - n , P(~)
into something in F r (P(~/, O). So oil "~ oil = % . Since oi ",~ oi,, by Lemma 6, part 3, oil # % . So either oi2 = oi or % # oi. In the former case we have the desired cycle of transformations. So assume the latter case. Hence, suppose by the argument above using r--n, Ply) Lemma 6, part 2, that oi2 "-+ oi: = 0 6. Again, by the argument above using Lemma 6, part 3, 0 6 r oi~ and 0 6 # oil. But, as we saw before, i f o i 3 = oi, we have the desired cycle. But, if 0 6 # oi, then it must be transformed into oi4, and so on. But, since the cardinality of F~(P(~), O) = Fr(p(~), O) is finite, this chain of transformations must end somewhere. Suppose it ends with Oim. But by Lemma 6, part 2, oi,,, must be transformed by the priorities. r - n , P(q)
Om
r - n , P(~)
# oil, Om
r - - n , P(,~)
So Om
r - n , P(,,)
By Lemma 6, part 3, Om
#
Oim,
# 0i2, and so on for every member of the chain except the first one.
"-* oi. Thus, we have the desired cycle.
[]
LEMMA 8.
Let IC be a set o f integrity constraints, let 0 be a set o f options induced on a base set B by IC. Let P be a set o f priorities. Then, i f P is unsatisfiable with respect to IC and 0 then, f o r each permutation, P(uI, o f P, there exists an n such that Fn(p(,), O) is stalled.
PROOF. Assume that P is unsatisfiable with respect to IC and O. Let P/~/be any arbitrary permutation of P. By the definition of unsatisfiable, there exists no n such that Fn(P(~), O) satisfies P. By Lemma 5 we know that each iteration of I'(P/~), O) produces some subset of the power set of B. Since the base set B from which O is generated is finite, there are only a finite number of such subsets. So there must be an integer r such that after r iterations of I'(P(~/, O) some subset of the power set of B gets repeated. Hence, there must be an n, r > n, such that I'~(P(~), O) = Fr(p(a), 0 ) . It follows that pn(p(~), O) is stalled. [] LEMMA 9.
Let P be a set o f acyclical priorities. Let 0 be the set o f options induced on a base set B by some set o f integrity constraints. Then, f o r any permutation Pl~)of P , there exists an integer nsuch that r'n( P(,), O )satisfies P.
PROOF. Since P is acyclical, by Theorem 2 no permutation of P is stalled. Let P{~/ be any arbitrary permutation of P. If P/~} is not stalled then either there is an integer s such that I'~'(PI~I, O) satisfies P or even though there is no s such that I's(P/~), O) satisfies P, there are no integers n and r such that n > r and Fn(P(~/, O) = Fr (P(~, O). If the former then we are done. So let us suppose the latter.
258
PRADHAN,MINKERAND SUBRAHMANIAN
Suppose that even though there is no s such that Fs(Pig ), O) satisfies P, there are no integers n and r such that n > r and I'n(P(~), O) = I'r(P(~), O). But, by an argument similar to that used in Lemma 8, this is not possible because sooner or later every subset of the power set of B will be repeated if there is no s such that F'~(P(~), O) satisfies P. So, there must be an s such that Us (P(~>, O) satisfies P. And this must be true for every permutation of P. [] LEMMA 10. Let 0 be the set of options defined on a base set B by a set of integrity constraints IC. Let P be a set of priorities containing no cycles. Let P(c~) be an arbitrary permutation o f P. Then,
if, f o r some n > O, Fn(P(gl, oi) satisfies P, where oi ~ O, then there exists an o.j c 0 such that I'n(P(g), Oi) C. Oj and oj satisfies P. PROOF. Assume there is an nl such that Okl = I"n~ (PIgl, oil) satisfies P. Either Okl is maximally consistent with IC or it is not. Phase 1: If Okl = I~nl (P(~/, oil) is maximally consistent with IC, then, as explained in the proof of Lemma 5, it must be equal to some oj c O. In this case, oj also satisfies P, and we are done. Phase 2: If Okl = I~nl(P/g/, Oit) is not maximally consistent with IC (but it must be consistent with IC by Lemma 4), then, by Lemma 5, there must be at least one option % such that Okl C oi2. Let O S = {oi ~ 0 [ okl C oi}. (We shall show below that some member of O S satisfies P, given that Okl satisfies P). Let oi2 be any member of OS. Let Pr be the first member of P that oi~ fails to satisfy. Since % does not satisfy Pr and Okl = Fn(P(g}, Oil ) does, it follows that:
1. greater(Pr) E Okl, but greater(Pr) r oi2. Or, 2. Okl -- lesser(Pr) contains a greater(Pr)-block, but oi2 - lesser(Pr) does not contain a greater( Pr )-block. Or, 3. lesser( Pr ) N Okl = 0, but lesser( Pr ) A oi2 ~: 0. Since Ok~ C oi2, (1) and (2) above cannot be true. So (3) must be true. Given this, Pr transforms oi2 by replacing lesser(Pr) N oi: with greater( Pr ). In other words, it changes only that part of % which does not belong to ok~. And this is true for any priority satisfied by o~l , but not by a superset of ok~ : the transformation changes only that part of the superset which does not belong to ok~. So, any such sequence of transformations of % must still be a superset of ok1. Phase 3: By Lemma 9 we know that there must be an n2 such that F~2(P(g~, % ) satisfies P, since P contains no cycles. Either OkL = I~"~(P(~), oiL) is maximally consistent with IC or it is not. If the former then we can reapply the argument of Phase 1 to get the desired o j, which proves the lemma. If % = I'"~(Pig ~, oiL) is not maximally consistent with IC, then Okl C ok2, and then we can reapply the argument of Phase 2. But each time we apply the argument of Phase 2 we end up with an ok~ which is a superset of Ok1 = I"~1 (P(~I, oil) and which is of at least one greater cardinality than %_~. But since all members of O S are of finite cardinality, there must be an i such that o~ is equal to some member of OS, and, so, maximally consistent with IC. (Recall that all members of O S are maximally consistent with IC and are strict supersets of Ok~ = I~nl(P/g), 0il))" So by the argument of Phase 1, such an ok~ is the desired o.j, and the lemma is proved. []
COMBININGDATABASESWITHPRIORITIZEDINFORMATION
259
LEMMA 13. Let I"1, 1"2. . . . . T~ be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = T1 U T~ U ... U T, by IC. Then, ifB is non-empty then Comb(Sem 1(T1, T2. . . . . Tn ; P; IC) ), Comb(Sem2(T1, 1"2. . . . . Tn; P; IC)), and, Comb(Sem3(T~, T2 . . . . . T,; P; IC)) are non-empty. PROOF. Since S e m l = Sem2= Sem3, it will suffice to prove the lemma regarding any one of them. We shall prove that Comb(Sem3(T1, 7"2. . . . . Tn; P; 163) is non-empty, if B is nonempty. Clearly, Comb(Sem3( T1, T2 . . . . . In; P; IC) ) is non-empty, if Sem3( T1, Tz . . . . . Tn; P; IC) is non-empty. Hence, we shall show that Sem3(T1, T2 . . . . . Tn; P; IC) is non-empty if B is non-empty. Assume that B is non-empty. Hence O is non-empty. Let P(~/ be an arbitrary permutation of P. Since P is acyclical, from Lemma 9 we know that there exists an n such that Pn(P{~/, O) satisfies P. But if O is non-empty, then r'n(P/~/, O) cannot be empty since the transformation of an option can make that option disappear only if it has a duplicate. So Sem3(Ti, T2 . . . . . T~; P; IC) cannot be empty since it only takes the maximal elements of ['~(P(~/, O). (Clearly, the set of maximal elements of a non-empty set is non-empty.) Hence, given that B is non-empty, Sem3(Tl, 1"2. . . . . In; P; IC) is also non-empty. But then Comb(Sem3(T1, 1"2. . . . . T~; P; IC) ) must also be non-empty. [] LEMMA 14.
Let 0 be the set of options induced on B = T1 U T2 U . . . U T, by IC. Then,
f o r any acyclical set of priorities P, there is at least one option oi ~ 0 which will satisfy P. PROOF. From Lemma 13 we know that Seml(T1, T2 . . . . . Tn; P; IC) is non-empty (assuming that B is non-empty). But, clearly, any member of Semi(T1, T2 . . . . . T~; P; IC) is one of the original options. Hence, at least one of the original options must satisfy any acyclical set of priorities. [] LEMMA 15. Let TI, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = T1 tO T2 U . . . U T~ by IC. Then, Comb(Sem4(T1, 1"2. . . . . In; P; I C ) ) weakly satisfies IC. PROOF. Comb(Sem4(T1, T2, . . . , T~; P; IC)) weakly satisfies IC only if there is at least one minimal model of it which satisfies all members of IC. By the definition of an option, all oi ~ 0 satisfy IC. So, to prove the lemma it will be sufficient to prove that at least one of the original options in O is a minimal model of Comb(Sem4(T1, T2, . . . , Tn; P; IC)). By the nature of the Comb function, any minimal member ofSem4(T1, T2 . . . . . Tn; P; IC) would be a minimal model of Comb(Sem4(T1, 7"2. . . . . Tn; P; IC)). Sere4 amplifies an option only if it does not satisfy the priorities; otherwise, it simply includes the option without amplifying it. From Lemma 14 we know that since P is acyclical at least one of the options satisfies P. That option must be a minimal member ofSem4(T1, T2 . . . . . Tn ; P; IC). Hence, it must be a minimal model of Comb(Sem4(Tl, T2 . . . . . Tn; P; IC)). Since it must satisfy IC, the lemma is proved. []
260
PRADHAN, MINKER AND SUBRAHMANIAN
LEMMA 16. Let T1, T2 . . . . . Tn be a set of theories each consisting only of propositional atoms and each satisfying a set of integrity constraints IC. Let P be a set of acyclical priorities. Let 0 be the set of options induced on B = 7"1 U T2 U . . . U Tn by IC. Then, Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) satisfies allPi ~ P.
PROOF. Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) will be a disjunctive theory. A disjunctive theory satisfies a priority if every minimal model of the theory satisfies the priority. Given the nature of the Comb function, it is clear that the minimal models of Comb(Sem4(T~, T2, . . . . Tn; P; IC)) are precisely the minimal members ofSem4(T1, 1"2. . . . . Tn; P; IC). Clearly, all members of Sem4(T1, 1"2. . . . , Tn; P; IC) satisfy the priorities. Hence, Comb(Sem4(T1, T2 . . . . . Tn; P; IC)) satisfies all Pi E P. []
Acknowledgment We gratefully acknowledge the help of Jose Alberto Fern~indez, Parke Godfrey, Don Perlis, and Carolina Ruiz.
References BarN, C., Kraus, S., and Minker, J. (1991). Combining multiple knowledge bases. IEEE Transactions on Data and Knowledge Engineering, 3,208-220. BarN, C.r S., Minker, J., and Subrahmanian, V.S. (1992). Combining knowledge bases consisting of first order theories. Computational Intelligence, 8, 45-71. Fagin, R., Ullman, J.D., and Vardi, M.Y. (1983). On the semantics of updates in databases. In Proc. 7th ACM SIGACT/SIGMOD Symposium on Principles of Database Systems (pp. 352-365). Gelfond, M. and Lifschitz, V. (1988). The Stable Model Semantics for Logic Programming. In R.A. Kowalski and K.A. Bowen, (eds.). Proc. 5th International Conference and Symposium on Logic Programming (pp. 10701080). Seattle, Washington. Ryan, M. ( 1991). Belief revision and ordered theory presentation. In P. Dekker and M. Stokhof, (eds.) Proc. Eighth Amsterdam Colloquium on Logic. Ryan, M. (1992). Ordered Presentation of Theories: Default Reasoning and Belief Revision. PhD thesis, Department of Computing, Imperial College. Ryan, M. (1992). Representing defaults as sentences with reduced priority. In B. Nebel and W. Swartout, (eds.) Proc. KR'92. Morgan Kaufmann. Subrahmanian, V.S. (1992). Amalgamating knowledge bases, Technical Report Univ. of Maryland CS-TR-2949, Department of Computer Science, University of Maryland, College Park, Md 20742. Currently being revised for publication in ACM Trans. on Database Systems. Tversky, A. and Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453-458. Van Gelder, A. Ross, K., and Schlipf, J.S. (1988). Unfounded Sets and Well-founded Semantics for General Logic Programs. In Proc. 7th Symposium on Principles of Database Systems (pp. 221-230).