Acta Informatica 22, 475-498 (1985) © Springer-Verlag 1985
Optimum Decision Trees - An Optimal Variable Theorem and its Related Applications

Masahiro Miyakawa

Computer Science Division, Electrotechnical Laboratory, 1-1-4 Umezono, Sakura-mura, Niihari-gun, Ibaraki-ken, Japan
Summary. The problem of translating a decision table into an optimal sequential testing procedure (STP) under two cost criteria is considered. We formulate the STP as a decision tree and introduce its description and execution costs. As usual, optimal STPs are defined as minimum cost decision trees. We call a variable optimal when it is tested first in an optimal STP; in other words, an optimal variable is one which is placed (and hence tested) at the root of an optimum decision tree. We introduce the notion of quasi-decisive variables and prove that, under the strong equivalence assumption on quasi-decisive variables (SEA), they are optimal, and conversely, whenever quasi-decisive variables exist, only they are optimal, in both costs (the optimal variable theorem). SEA holds in the general description cost and in the uniform execution cost. We show that SEA in the theorem cannot be weakened in the case of the general execution cost. We describe an algorithm for the construction of an optimum decision tree which requires O(N^{log 3} log N) comparison operations, where N is the size of the input table. Then we show that we can speed up this algorithm by applying the optimal variable theorem in the description cost case. In fact the number of executions of the inner-most loop of the algorithm is reduced greatly under fairly reasonable assumptions on the test data tables, resulting in a considerable reduction in the total execution time of the algorithm. As a related topic we discuss an optimization problem of quasi-decisive chains under execution cost. Our optimization algorithm requires O(N log N) operations.
1. Introduction
Let us be given a set of L symbols called properties, denoted by the set {1, 2, ..., L}. For simplicity assume that each property takes the binary value 0 or 1. Let i, possibly with a suffix, denote a property. Let x_i, i = 1, ..., L, denote
the binary value of the property i. We call an element x of the cartesian product {0, 1}^L a vector and denote it by x = x_1 ... x_L. Let I = {x | x ∈ {0, 1}^L}. We denote by {a_1, ..., a_K} a set of K different symbols called actions. Each action is often denoted by a, b, c, possibly with a suffix to distinguish them from each other. A mapping f from I onto (into) the set of actions {a_1, ..., a_K} is called an L arity K action decision table, or simply a table. The mapping f: I → {a_1, ..., a_K} is expressed by f(x) = f(x_1 ... x_L). Geometrically interpreting, an L arity K action table is an L-dimensional cube I, with each of its 2^L vertices associated with an action from the set {a_1, ..., a_K}. A pair consisting of a vector x and the corresponding action f(x) is called a rule, and is denoted by G_x = (x, f(x)). We treat a table f as the set of all rules G_x; thus rules are the units of which the table consists. We execute a given table f when we actually decide a corresponding action f(x) for an arbitrarily given vector x. If a vector x is explicitly given and the table f is stored in the memory of a computer, to execute the table is merely to table-search with the given key x. However the situation considered in this paper is different. We are given an object, and not the vector itself; x_i is the value of its property i. It often happens that the value x_i is not easy to know. Consider, for example, the diagnosis of a patient. It is impractical to have the results of all tests (the vector x consists of all such results). There may be dangerous or expensive tests, and hence it is preferable to avoid unnecessary tests. Thus the diagnosis problem is to determine the disease with as small a number of tests as possible. In these situations the vector x for a given object is initially not known, and we investigate the values x_i sequentially one by one as we need them until the resulting action f(x) becomes uniquely determined. We use the word test for investigating the value x_i, and call these procedures sequential testing procedures (STPs) [16, 17]. Although an STP is a simple model (for example, one is not allowed to test several variables simultaneously), it is a basic model of sequential decision. Interpreting properties and actions in various concrete domains, a table can be regarded as a functional representation of various deterministic decision procedures. Some examples are: diagnoses (properties correspond to tests; actions correspond to faults or diseases), pattern recognition (properties correspond to features; actions correspond to names of patterns), programs (properties correspond to conditions; actions correspond to operations or sequences of operations), database searching (properties correspond to attributes; actions correspond to records to be searched), mental association (properties correspond to attributes; actions correspond to objects mentally associated with given attributes). Thus a table provides a functional representation for various important decision procedures, and hence it is a fundamental structure commonly underlying these important decision mechanisms. The simplicity and usefulness of decision tables are largely due to the fact that a strict order of their execution is not required. Hence their automatic translation into optimal or at least "reasonable" STPs is of significance and has long been studied by many researchers [14-18]. It is important that in an STP it is not always necessary to test x_i for all i.
From this property the request for designing an efficient STP arises. Namely,
the problem is stated as follows: for each step of an STP assign a test variable so that the decisions become as efficient as possible overall. To solve this optimal design problem we formulate the STPs as decision trees and introduce the notion of subtables in Sect. 2. The efficiency associated with an STP is introduced through a rigorous definition of the costs of a decision tree in Sect. 3. Section 4 is the central part of this paper. We introduce the notions of optimal variables and quasi-decisive variables, and investigate the conditions for optimal or nonoptimal variables. We prove that the quasi-decisive variables are optimal under the description cost, and under the execution cost when a special condition, namely the strong equivalence assumption of quasi-decisive variables (SEA), is assumed. We also prove that, conversely, whenever there exists a quasi-decisive variable, non-quasi-decisive variables cannot be optimal under the description cost and under the execution cost with the SEA restriction (these two statements are combined into an optimal variable theorem). We show that the SEA restriction cannot be removed for the general execution cost, thus showing that a direct extension of the theorem is difficult and that very little can be said of optimal variables in the execution cost without SEA. In Sect. 5 we describe an optimum tree construction algorithm which requires O(N^{log 3} log N) operations. We show that the optimal variable theorem greatly reduces this computation for most practical input tables in the description cost case. We also present an optimization problem of quasi-decisive chains which is characteristic of the execution cost. Our algorithm solves it in O(N log N) operations.
2. Decision Trees

In this section we introduce a decision tree which represents an STP and provide a notation for it.
2.1. Subtables and Prefix Notation

It is assumed in this paper that we are given an L arity K action table f as an initial table. A subtable is a table which is derived from f by assigning h of its variables x_{i_m} to fixed constants s_m, s_m ∈ {0, 1} (m = 1, ..., h, 0 ≤ h ≤ L). That is, a subtable is a table corresponding to an (L−h arity) function f(x_1 ... s_1 ... x_i ... s_h ... x_L). The variables x_{i_m}, m = 1, ..., h, are called fixed (to s_m, m = 1, ..., h), and the remaining variables are free. In other words, a subtable consists of all rules having vectors with some of their elements equal to the fixed constants. The number of free variables is the arity of the table. Note that the initial table is the highest arity subtable, arity L, and a rule is the lowest arity subtable, arity 0. Two subtables are called equivalent if they have the same set of free variables and they coincide when considered as functions of these variables. Geometrically speaking, a subtable with arity L−h is an (L−h)-dimensional subcube of an L-dimensional cube. Since the initial table consists of 2^L rules, we have 2^{2^L} different subsets, among them 3^L subsets are subtables.
Example 2.1. A subtable in which the fixed variables are x_1 = 0, x_3 = 1 and x_4 = 0 corresponds to the function f(0 x_2 1 0 x_5 ... x_L).
The notion of subtable plays a central role in this paper, and hence we prepare several notations for it. It is sufficient to represent a variable x_i simply by the index i. Hereafter we call i a variable, and x_i its value. As we already saw, a subtable f(x_1 ... s_1 ... x_i ... s_h ... x_L) is uniquely characterized by its fixed variables. Thus it is convenient to provide a simple notation to represent them. We call a symbol i_m^{s_m} a fixation. The set of h fixations α = {i_1^{s_1}, ..., i_h^{s_h}}, 1 ≤ i_m ≤ L, s_m ∈ {0, 1}, 0 ≤ h ≤ L, is called a fixation set of the subtable. A subtable consists of all rules that have the same fixation set. A fixation is treated as a "mask" operator on x; i.e. i_m^{s_m} x = x_1 ... x_{i_m − 1} s_m x_{i_m + 1} ... x_L. Thus a fixation set is represented by the concatenation (product) of the fixation operators and is called a prefix. Let α' be a permutation of the prefix α. Then by the definitions of a fixation set and a prefix we have αx = α'x, i.e. fixation operators are commutative. Thus the subtable can be represented by h! different prefixes. The null prefix λ corresponds to the initial table f. The subtable corresponding to a prefix α is denoted by fα, which is the restriction of the table f to the domain αI = {αx | x ∈ I}. As a useful identity, we have fα(αx) = f(αx). Various properties of decision trees are conveniently studied through fixation operators and the prefix notation.

Example 2.2. f(1^0 4^0 3^1 x) = f(4^0 1^0 3^1 x) = f 3^1 4^0 1^0 (3^1 4^0 1^0 x) = f(0 x_2 1 0 x_5 ... x_L); all these prefixes denote the subtable of Example 2.1, regardless of the order of the fixations.
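To make the prefix notation concrete, the following minimal Python sketch (ours, not from the paper) represents a table as a mapping from binary vectors to actions, applies a fixation as the mask operator, and extracts a subtable from a fixation set; the function names and the example action rule are illustrative assumptions.

```python
from itertools import product

def make_table(L, rule):
    """The table f: a mapping {x -> f(x)} over all L-bit vectors."""
    return {x: rule(x) for x in product((0, 1), repeat=L)}

def apply_fixation(x, fixation):
    """The 'mask' operator i^s x = x_1 ... x_{i-1} s x_{i+1} ... x_L
    (variables are numbered 1..L as in the text)."""
    i, s = fixation
    return x[:i - 1] + (s,) + x[i:]

def subtable(f, alpha):
    """The subtable f*alpha: all rules whose vectors carry the fixation set alpha."""
    return {x: a for x, a in f.items()
            if all(x[i - 1] == s for i, s in alpha)}

# Illustrative table with L = 5: the action depends on x1, x3 and x4 only.
f = make_table(5, lambda x: "a" if x[0] == 0 else ("b" if x[2] == x[3] else "c"))
alpha = [(1, 0), (3, 1), (4, 0)]                       # the prefix 1^0 3^1 4^0
assert subtable(f, alpha) == subtable(f, alpha[::-1])  # fixations commute
```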
2.2. Dividing a Table
By dividing an L−h arity table fα by one of its free variables i, we mean separating the rules αx of fα according to x_i = 0 or 1 into the two L−h−1 arity subtables f i^0 α and f i^1 α respectively. Thus a prefix represents a successive dividing of the initial table.
2.3. An Algorithm to Construct a Decision Tree
We assume a binary tree is directed from the root. In fact every tree we deal with is an extended binary tree [3, p. 399]. Each node of a binary tree has at most one in-edge and at most two out-edges. Leaves are nodes with no out-edges. The remaining nodes of the tree are called internal. The root of a tree is denoted by R. The level of a node r, denoted by lev(r), is the number of edges from the root R to the node r (lev(R) = 0). The path of a node r is the set of nodes which are connected by the edges from the root R to the node r. For two nodes r_1 and r_2 on a path, the path from r_1 to r_2, denoted by [r_1, r_2], is the set of nodes which are connected by the edges from r_1 to r_2. A decision tree for f is a binary tree which associates with each internal node a subtable and a test variable, and with each leaf a decision action, according to the following algorithm.
The corresponding subtable is defined inductively for each node, beginning from the root, then its sons, and so on.

D1. [Initialize.] Set LEV = 0. Make the root R and let the initial table f correspond to the root.

D2. [Finish?] If there is no node at level LEV, finish; else apply D3 to all nodes having level LEV.

D3. [Extend or terminate the nodes.] We denote the subtable corresponding to the current node by fα.

Case 1. fα is not a constant table. Select a variable i arbitrarily from the set of its free variables and divide fα by i. Place i in the node as the test variable. Make left and right sons at level LEV + 1, and let the subtables f i^0 α and f i^1 α correspond to them respectively. The two out-edges leading to the left and right sons are labeled 0 and 1 respectively.

Case 2. fα is a constant table. Place the action a_j in the node (fα(αx) = a_j for any vector αx).

D4. [Next level.] Set LEV = LEV + 1 and go to D2.
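A breadth-first rendering of steps D1-D4, continuing the dictionary representation of the previous sketch; this is our illustration, not the paper's program, and the rule for selecting the test variable (here simply the smallest free variable) is deliberately arbitrary, as in D3.

```python
from collections import deque

def is_constant(sub):
    """A subtable is constant when all of its rules carry the same action."""
    return len(set(sub.values())) <= 1

def build_tree(f, L):
    """Steps D1-D4: grow the tree level by level, dividing every non-constant
    subtable by an arbitrarily chosen free variable."""
    root = {"subtable": f, "free": list(range(1, L + 1))}
    queue = deque([root])                       # D1: only the root at level 0
    while queue:                                # D2: any nodes left to process?
        node = queue.popleft()
        sub, free = node["subtable"], node["free"]
        if is_constant(sub):                    # D3, Case 2: place the action
            node["action"] = next(iter(sub.values()), None)
            continue
        i = free[0]                             # D3, Case 1: pick a free variable
        node["test"] = i
        for s in (0, 1):                        # divide by i; edge labels 0 and 1
            child = {"subtable": {x: a for x, a in sub.items() if x[i - 1] == s},
                     "free": [j for j in free if j != i]}
            node[s] = child
            queue.append(child)                 # D4: processed at the next level
    return root
```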
We assume that each node contains a corresponding subtable. Note that any subtable can be taken as an initial table. We call a decision tree simply a tree and denote it by T, or T[f] when it is necessary to indicate the initial table, possibly with a suffix. The number of non-constant subtables which appear in T is at most 2^L − 1.
2.4. Notation for Nodes of a Tree

We denote an internal node having a subtable fα and a test variable i by (r, i, fα). Note that when r is the root we replace r by R. Nodes may be suffixed, such as r_1, r_2, .... When we need not indicate either the test variable or the subtable of a node, we place the symbol * in the corresponding place. A leaf having a decision action fα is denoted by t(fα). We denote T's subtree which has a subroot (r, i, fα) by T(r, i, fα). The variable i and the subtable fα are called the test variable and the subtable of the subtree respectively. We call subtrees equivalent when they have the same subtable. Usually a path is denoted by a sequence of nodes. In this situation, to indicate the edge label leading from a node r_m to a node r_{m+1}, we denote the node r_m by r_m(i_m^{s_m}). Here s_m indicates the value of the test variable i_m when the edge is traversed in the execution. Given a tree T(R, i_1, f), assuming that a node r = r_{h+1} is located on the path R = r_1(i_1^{s_1}), r_2(i_2^{s_2}), ..., r_h(i_h^{s_h}), r = r_{h+1}(i_{h+1}), the prefix of fixations α = i_h^{s_h} ... i_1^{s_1} represents the path [R, r_{h+1}] uniquely. The prefix α is called a path prefix of r. The subtable of a node having a path prefix α is represented by fα.
Example 2.3. Figure 2.1 shows a path and path prefixes of the nodes.
Fig. 2.1. A path and path prefixes of the nodes
3. Description and Execution Costs

For an L arity table f, there can be at most ∏_{i=1}^{L} i^{2^{L−i}} equivalent trees [18]. To construct "efficient trees" we need a precise definition of efficiency which adequately reflects the actual situation of an STP. To do this we associate with a tree various quantities that reflect the amount of resources consumed when the corresponding STP is actually executed. Among these we include the storage required to describe the whole procedure and its execution time. These quantities are called "costs". For this purpose two different costs are associated with each variable i, 1 ≤ i ≤ L: the description cost C_i^d and the execution cost C_i^e. The description cost, abbreviated d-cost, represents the amount of any resources needed for storing the testing procedure of variable i. The execution cost, abbreviated e-cost, represents the amount of any resources required to execute the testing procedure for variable i; e.g. time, labor, error rate and anything that is required for the execution.
3.1. Description Cost (d-Cost) of a Tree

We define the description cost (d-cost) of a tree T by

|T|_d = Σ_{(r,i,*) ∈ T} C_i^d.
Note that |T|_d is defined through the test variables i of all internal nodes. A variable i appearing several times in T is counted separately. Since no test is required at leaves, |T|_d is the cost for representing the whole test procedure of T. (We do not assume that we prepare only a single test procedure for each i and then at each internal node place a "branch" command to the corresponding test procedure.)
3.2. Execution Cost (e-Cost) of a Tree

The "average" cost related to an execution of a tree depends on the execution frequencies of the rules {G_x = (x, f(x))}. We assume that we are given a priori a
distribution function p(x) of execution probability over the set I. The function p(x) is the normalized execution frequencies of rules, i.e. p(I) = Σ_{x ∈ I} p(x) = 1 and p(x) ≥ 0 for all x. The execution probability of a set of rules is the sum of p(x) over all elements x of the set. Hence for any subtable fα, its execution probability, i.e. the probability that one of the rules in the subtable is executed, is given by

p(fα) = p(αI) = Σ_{αx ∈ αI} p(αx).
Obviously the execution probability for the initial table f is p(f) = 1. Every rule αx of the subtable fα of a node (r, i, fα) passes the node with the probability p(αx), and no other rule passes it. Hence the execution probability of the node (r, i, fα) equals p(fα). Since every node (r, i, fα) is executed with the cost C_i^e, the summation Σ p(fα) C_i^e is the average execution cost for a decision, where the sum is taken over all internal nodes. Thus we define the execution cost (e-cost) of a tree T by

|T|_e = Σ_{(r,i,fα) ∈ T} p(fα) C_i^e.
This is an internal node expansion form of the e-cost. An alternative expansion form, for which the meaning of the e-cost becomes clearer, is given as follows. For a subtable fα = f i_h^{s_h} i_{h−1}^{s_{h−1}} ... i_1^{s_1}, its prefix cost, denoted by pcost(fα), is defined as the sum of the e-costs of its fixed variables; namely

pcost(fα) = Σ_{m=1}^{h} C_{i_m}^e.

A leaf expansion form of the e-cost of the tree T is:

|T|_e = Σ_{t(fα) ∈ T} p(fα) pcost(fα).
This is easily seen since the prefix cost coincides with the weighted path length of the leaf t(fα), which is defined as the sum of the e-costs of the test variables on the path [R, t]. The leaf expansion form represents the expectation of the e-cost for a decision action. It is expressed through the subtables of leaves, and not through the paths.
Fig. 3.1. An example tree T(R, i, f)
Example 3.1. The internal node expansion form for the example of Fig. 3.1 is

p(f) C_i^e + p(f i^0) C_j^e + p(f j^0 i^0) C_k^e + p(f i^1) C_k^e.

Meanwhile the leaf expansion form is

(p(f k^0 j^0 i^0) + p(f k^1 j^0 i^0))(C_i^e + C_j^e + C_k^e) + p(f j^1 i^0)(C_i^e + C_j^e) + (p(f k^0 i^1) + p(f k^1 i^1))(C_i^e + C_k^e).

From the conservation law of probability we have p(fα) = p(f i^0 α) + p(f i^1 α), and the above two expansion forms are equivalent.
3.3. Optimum Decision Trees and Their Costs as Invariants Associated with a Decision Table

We call a tree optimum, and denote it by T_opt[f], when its cost is minimum among all trees corresponding to the table f. When it is necessary to distinguish between d-cost and e-cost we write T_opt^d[f] and T_opt^e[f] respectively. In the special case when C_i^d = 1 for all i, |T|_d represents the total number of internal nodes (tests) of T. Also in the case C_i^e = 1 for all i, |T|_e is the average number of internal nodes (tests) necessary to make a decision. A test corresponds to a dividing of a subtable. Hence when C_i^d = C_i^e = 1, the d-cost and e-cost correspond respectively to the total number and the average number of dividings of subtables necessary to decompose an initial table into constant subtables. In particular the costs |T_opt^d| and |T_opt^e| of an optimum tree are respectively the lower bounds of the above quantities, and hence they can be considered as structural invariants of the initial table.
3.4. Cost of a Subtree

For any subtree T(r, i, fα) there is an isomorphic tree T[fα]. As for d-cost these trees have the same costs. However, as for e-cost we have |T(r, i, fα)|_e = p(fα) |T[fα]|_e. A minimum cost subtree for a subtable fα is called an optimum subtree for fα. Thus an optimum subtree for a subtable fα is isomorphic to an optimum tree for the initial table f = fα. To describe an optimum tree construction algorithm, it is convenient to provide expressions which relate the cost of a tree to the costs of its subtrees.
3.5. Recursive Representation of Costs of a Tree and Their Uniform Notation

The cost of a subtree T(r, i, fα) in Fig. 3.2 can be represented by the costs of its two subtrees as follows:

|T(r, i, fα)|_d = C_i^d + |T(r_0, j, f i^0 α)|_d + |T(r_1, k, f i^1 α)|_d,   (3.1)
Fig. 3.2. A subtree T(r, i, fα)

p(fα) = p(f i^0 α) + p(f i^1 α), and
|T(r, i, fα)|_e = p(fα) C_i^e + |T(r_0, j, f i^0 α)|_e + |T(r_1, k, f i^1 α)|_e.   (3.2)
This is easily seen from the internal node expansion form. To represent these two formulas in a unified notation, let the cost of the node (r, i, fα) be represented by C_i(fα), where

C_i(fα) = C_i^d in case of d-cost, and C_i(fα) = p(fα) C_i^e in case of e-cost.   (3.3)
By this notation we represent the cost of a subtree T(r, i, fα) for both cases by the cost of the node (r, i, fα) and those of its two subtrees as follows:

|T(r, i, fα)| = C_i(fα) + |T(r_0, j, f i^0 α)| + |T(r_1, k, f i^1 α)|.   (3.4)
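The recursive formula (3.4) translates directly into code. The following sketch (ours, an assumption about representation rather than the paper's program) evaluates the d-cost and e-cost of a tree produced by `build_tree` above, with the node cost C_i(fα) taken from (3.3) and the execution probability of a subtable obtained by summing p(x) over its rules.

```python
def prob(sub, p):
    """Execution probability of a subtable: the sum of p(x) over its rules."""
    return sum(p[x] for x in sub)

def tree_cost(node, Cd, Ce, p, kind):
    """Formula (3.4): |T(r, i, f alpha)| = C_i(f alpha) + |left| + |right|,
    with the node cost C_i(f alpha) given by (3.3); kind is 'd' or 'e'."""
    if "test" not in node:                     # a leaf carries no test cost
        return 0.0
    i = node["test"]
    node_cost = Cd[i] if kind == "d" else prob(node["subtable"], p) * Ce[i]
    return (node_cost
            + tree_cost(node[0], Cd, Ce, p, kind)
            + tree_cost(node[1], Cd, Ce, p, kind))
```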
3.6. Strict Monotonicity of Costs of Subtrees

Since C_i^d > 0, C_i^e > 0 and p(fα) ≥ 0, we have p(fα) C_i^e > 0 except when p(fα) = 0, in which case we may identify the subtree corresponding to the subtable fα with a null tree. Then from the formulas (3.3) and (3.4) we have:

Lemma 3.1. The cost of a tree is greater than the cost of any of its proper subtrees.

In this section we have introduced the notions of description and execution costs of a tree and provided a unified notation to represent them. Compared to the d-cost case, where the cost can be described by "pure" combinatorial notions, the e-cost case is more complex, because of the probabilistic structure superposed, by the introduction of the "execution probability", upon the combinatorial structure of decision trees. Under uniform costs (C_i^d = C_i^e = const. for all i) and uniform execution probability (p(x) = const. for all x) the optimum d-cost trees and the optimum e-cost trees are closely related [10]; nevertheless the two cost criteria are not equivalent. Before investigating construction algorithms of optimum trees, we prove a theorem which is important in the theory of optimum decision trees. It also clarifies the delicate difference between the d-cost and e-cost structures. This is done in the next section.
4. Quasi-Decisive Variables and the Optimal Variable Theorem

We call a variable i an optimal variable of f if there is an optimum tree T_opt(R, i, f) having i at its root. Also the variable i is called essential if f(i^0 x) ≠ f(i^1 x) for some x, i.e. if the two subtables f i^0 and f i^1 are not equivalent. Otherwise we call i nonessential; f(i^0 x) = f(i^1 x) for all x. In this paper two subtables which differ from each other only in nonessential variables are identified. Again, i is called decisive if each "branch" of i is a constant table, i.e. f(i^0 x) = a_0 and f(i^1 x) = a_1 for all x.
Lemma 4.1. Assume i is decisive. Then f is a constant table or i is a unique essential variable of f (in the last case all the other variables are nonessential).

Proof. f contains only two actions a_0 and a_1. Hence, if a_0 = a_1 then f is a constant. Otherwise we prove that all the other variables of f are nonessential. For j (j ≠ i), we show that f j^0 and f j^1 are equivalent, i.e. f j^0(j^0 x) = f j^1(j^1 x). Consider x = i^s y for each s = 0 and 1. f j^0(j^0 x) = f(i^s j^0 y) = a_s, while f j^1(j^1 x) = f(i^s j^1 y) = a_s = f j^0(j^0 x) for all x. □
Lemma 4.2. A nonessential variable i cannot be an optimal variable. In other words, no optimum tree exists having a nonessential variable at its root.

Proof. Since i is nonessential, the subtable f i^0 is identified with f. Hence any tree having i at the root R is equivalent to R's left subtree, because this subtree corresponds to f i^0. Hence from the monotonicity of subtrees (Lemma 3.1), T(R, i, f) is not optimum. □

Corollary 4.2.1. Nonessential variables do not appear as test variables in any nodes of an optimum tree.

Proof. A nonessential variable i of f cannot be essential in any subtable fα, since fα(i^s αx) = f(i^s αx) = f(αx) = fα(αx) for s = 0 and 1. Hence by applying Lemma 4.2 repeatedly the corollary is proved. □
Corollary 4.2.2. A unique essential variable of a table f is optimal.

Proof. The table f has a single essential variable. Since f is not a constant, an optimum tree of f has at least one test variable (we assume hereafter the essential variable is intrinsic, that is, the execution probability of each of its branches is greater than zero). Thus the unique essential variable is the test variable. □

Note that if i is decisive and f is not constant, then i is optimal and no other variable is optimal. A variable i of f is quasi-decisive if it is decisive except in a single branch f i^s, where s = 0 or 1, i.e. if f(i^0 x) = a for all x and f(i^1 x) is not a constant, or dually. We abbreviate a quasi-decisive variable as a q-d variable. The action a is called the decision action of the q-d variable. Each branch of a q-d variable is called decisive or indecisive according to whether it is a constant or not. Again we assume hereafter that q-d variables are intrinsic, that is, the execution probability of the decision action of any q-d variable is greater than zero.
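As a concrete reading of these definitions, the following Python sketch (ours; the helper names are assumptions) classifies a variable of a table, in the dictionary representation of the earlier sketches, as nonessential, decisive, quasi-decisive or merely essential.

```python
def set_bit(x, i, s):
    """The mask operator i^s applied to a vector x (variables numbered from 1)."""
    return x[:i - 1] + (s,) + x[i:]

def classify_variable(f, i):
    """Return 'nonessential', 'decisive', 'quasi-decisive' or 'essential'
    for variable i of the table f, following the definitions of Sect. 4."""
    const0 = len({f[set_bit(x, i, 0)] for x in f}) == 1   # is the branch f i^0 constant?
    const1 = len({f[set_bit(x, i, 1)] for x in f}) == 1   # is the branch f i^1 constant?
    essential = any(f[set_bit(x, i, 0)] != f[set_bit(x, i, 1)] for x in f)
    if not essential:
        return "nonessential"
    if const0 and const1:
        return "decisive"
    if const0 or const1:
        return "quasi-decisive"
    return "essential"
```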
Note 4.1. A q-d variable is an essential variable.

Note 4.2. Both the arity and the number of different actions of a table having a q-d variable are at least 2.

Lemma 4.3. If both i and j are q-d variables, then the decision actions of i and j are identical.

Proof. Assume that i and j are both 0-branch decisive and that their decision actions are a and b respectively. We have f(i^0 x) = a for all x, and f(j^0 y) = b for all y. Substituting j^0 y for x, and i^0 x for y, we have f(i^0 j^0 y) = a for all y and f(j^0 i^0 x) = b for all x. Thus we have a = b. □
Corollary 4.3.1. A table all of whose variables are q-d variables contains only two different actions; one is the decision action of the q-d variables and the other is a single action.

We call this table a quasi-decisive table, or a q-d table. A tree corresponding to a q-d table is a chain, which we call a quasi-decisive chain or a q-d chain (cf. Corollary 4.4.1). The d-cost of any q-d chain is Σ_{i=1}^{L} C_i^d = const., hence for the d-cost any q-d chain is optimum. The situation is different for the e-cost case. We see later that the problem of q-d chains characterizes the difference between the optimal variables for d-cost and e-cost and makes the optimization for e-cost more complex than that for d-cost. We also note that q-d chains appear frequently in practice as "yes" and "no" actions. This is discussed in Sect. 5.2 below. The next lemma is important for proving an optimal variable theorem, Theorem 4.5.
Lemma 4.4. Let i be a q-d variable of f. Then i is q-d or decisive in f j^s for any variable j of f (j ≠ i), for s = 0 and 1.

Proof. Assume i is 0-branch decisive (the other case can be proved in a similar way). Then f j^s(i^0 j^s x) = a for all x. If f j^s(i^1 j^s x) ≠ const., then i is q-d in f j^s. Otherwise i is decisive in f j^s since f j^s(i^1 j^s x) = b for all x. □
Corollary 4.4.1. Assume further that j is a q-d variable of f (assume f i^0 = f j^0 = a). Then i is a q-d variable or a unique essential variable in the indecisive branch f j^1 (the last case occurs iff f has only two essential variables, i and j).

Proof. If f j^1(i^1 j^1 x) = const., then i is decisive in f j^1. In this case f j^1(i^1 j^1 x) = b ≠ a, since otherwise j is not q-d. Thus i is a unique essential variable in f j^1 and f contains only two q-d variables, i and j. □

Corollary 4.4.2. Again assume alternatively that j is not a q-d variable of f (this implies that j is not decisive). Then i is a q-d variable or a unique essential variable in f j^s for s = 0 and 1 (if i is a unique essential variable both in f j^0 and f j^1 then f contains only two essential variables, i and j).

Proof. If f j^s(i^1 j^s x) ≠ const., then i is a q-d variable in f j^s. Otherwise, if f j^s(i^1 j^s x) = b for all x, then b ≠ a, since otherwise j would be at least a q-d variable, which contradicts the assumption. Thus i is a unique essential variable in f j^s. □
Before going to the theorem we explain an important condition on q-d variables.

The Strong Equivalence Assumption of q-d Variables (SEA)

We call all the q-d variables of f equivalent if the costs of any q-d chains constructed from them are identical. For d-cost the set of all q-d variables of any f is equivalent, while for e-cost this is not always true. However, in the subsequent part of this paper it is often convenient to adopt the following assumption.
The strong equivalence assumption of q-d variables (SEA): all q-d variables are equivalent for each subtable of f.

The equivalence condition of q-d variables can be expressed in a slightly more formal way. Let f have m q-d variables; for simplicity let us represent them by the numbers 1, ..., m, and assume that they are all 0-branch decisive. We may assume m ≥ 2 (for m = 1 a single q-d variable is trivially equivalent). Let an arbitrary q-d chain T_1 be R = r_1(i_1), ..., r_{t−1}(i_{t−1}), r_t(i), r_{t+1}(j), r_{t+2}, ..., r_m. Let T_2 be the q-d chain which results from T_1 by interchanging the adjacent test variables i and j; T_2: R = r_1(i_1), ..., r_{t−1}(i_{t−1}), r_t(j), r_{t+1}(i), r_{t+2}, ..., r_m (see Fig. 4.1). If all q-d variables are equivalent, then |T_1| = |T_2| for all t (1 ≤ t ≤ m−1) and for any i and j. This is equivalent to
C_i^e / p(f i^0 α) = C_j^e / p(f j^0 α)   (4.1)

for any fα, where α = i_1^1 ... i_{t−1}^1 is the prefix for the test variable i in T_1, i_1, ..., i_{t−1} ∈ {1, ..., m}, 1 ≤ t ≤ m−1, and for any i, j ∉ {i_1, ..., i_{t−1}}. Conversely, if (4.1) holds for any adjacent test variables i and j of the chain and for any such α, then the costs of all m! q-d chains are identical, since any two permutations of q-d variables on the chain can be transformed into each other by successively interchanging adjacent variables.
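Since (4.1) is reconstructed here from a damaged formula, we record the short computation behind it (our derivation, consistent with the interchange of Fig. 4.1): only the nodes r_t and r_{t+1} change cost, so, writing the e-costs of the two chains,

```latex
\begin{aligned}
|T_1|-|T_2| &= \bigl(p(f\alpha)\,C_i^{e}+p(f i^{1}\alpha)\,C_j^{e}\bigr)
              -\bigl(p(f\alpha)\,C_j^{e}+p(f j^{1}\alpha)\,C_i^{e}\bigr)\\
            &= C_i^{e}\,p(f j^{0}\alpha)-C_j^{e}\,p(f i^{0}\alpha),
\end{aligned}
```

using p(fα) = p(f j^0 α) + p(f j^1 α); hence |T_1| = |T_2| for every such interchange exactly when C_i^e / p(f i^0 α) = C_j^e / p(f j^0 α), which is (4.1).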
Fig. 4.1. Interchanging adjacent q-d variables i and j in a q-d chain

A special e-cost defined by C_i^e = const. for all i and p(x) = const. for all x is called the uniform e-cost. It is easy to see that SEA holds under the uniform e-cost. Note that under SEA the costs of q-d chains having any q-d variable at the root are identical.
We call a tree quasi-optimum or q-optimum if both the left and right subtrees of T(R, i, f) are optimum, i.e. T_opt[f i^0] and T_opt[f i^1] respectively. We denote a q-optimum tree by T_quasi(R, i, f). Certainly optimum trees are quasi-optimum ones.

Theorem 4.5 (The optimal variable theorem). Q-d variables are optimal under the strong equivalence assumption of q-d variables (SEA). Conversely, if there are q-d variables, any optimal variables are q-d variables under SEA. That is, if there are q-d variables, then only the q-d variables, and any of them, are optimal under SEA.
Proof. I. Proof of the first part. We use induction on n, the arity of the table. We may assume that f has n essential variables. Assume i is a q-d variable (for convenience assume it is 0-branch decisive and its decision action is a).

1. Induction Base (n = 2). The initial table f can be written, using the "don't care" symbol "-", as follows:

  x_i  x_j  actions
  0    -    a
  1    0    b
  1    1    c

where b ≠ c, since i is a q-d variable with decisive 0-branch. The two q-optimum trees corresponding to this table, having the test variables i and j at their roots, are indicated in Fig. 4.2. We have two cases. Case 1: a ≠ b and a ≠ c. The costs of T_2 are greater than those of T_1 by C_i^d or by p(a) C_j^e, where p(a) is the execution probability of the decision action a. Case 2: a = b and a ≠ c, or its dual a ≠ b and a = c. We prove only the former case since the latter is its dual. In this case j is also a q-d variable, and hence by SEA both i and j are optimal.
2. Induction Step. Assuming that the theorem holds for arity n−1, we show that it holds for arity n (n ≥ 3), i.e. assuming i is a q-d variable, we show that a q-optimum tree T_1(R, i, f) having test variable i at the root is optimum under SEA. We denote the left and right sons of R by r_0 and r_1 respectively. Since i is 0-branch decisive, the corresponding optimum subtree for the subtable f i^1 has at least one internal node, and hence has a root r_1. We prove that the cost of a q-optimum tree T_4(R, j, f) having an arbitrary variable j of f at R is greater than or equal to that of the q-optimum tree T_1(R, i, f) (Fig. 4.3). To show this we consider an auxiliary tree T_2(R, i, f) having the q-d variable i at R and having, as R's right subtree, a q-optimum subtree T_2(r_1, j, f i^1) whose test variable is j (see Fig. 4.4). Since T_2(r_1, j, f i^1) is q-optimum, the left and right subtrees of r_1 are equivalent to T_opt[f j^0 i^1] and T_opt[f j^1 i^1] respectively. Also, since the subtree T_1(r_1, *, f i^1) is an optimum subtree, by comparing T_1 and T_2 we have |T_2| ≥ |T_1|. We have the situation of Lemma 4.4 concerning i and j, and from this lemma and n ≥ 3 we have the following three cases.
Fig. 4.2. All q-optimum trees for n = 2 (Case 1 and Case 2)

Fig. 4.3. Q-optimum trees T_1(R, i, f) and T_4(R, j, f)
Case 1. i is a q-d variable in both f j^0 and f j^1. Consider the q-optimum tree T_4(R, j, f) for this case. By the inductive assumption we can assume that the left and right optimum subtrees T_4(r_0, *, f j^0) and T_4(r_1, *, f j^1) have i as their test variables (the cost of an arbitrary q-optimum tree T_4(R, j, f) coincides with that of this q-optimum tree). Let this tree be T_3. The right subtrees of the nodes r_0 and r_1 in T_3 are respectively T_opt[f i^1 j^0] and T_opt[f i^1 j^1] (Fig. 4.4, Case 1). Obviously, we have |T_opt[f j^0 i^1]| = |T_opt[f i^1 j^0]| and |T_opt[f j^1 i^1]| = |T_opt[f i^1 j^1]|. Hence |T_3| ≥ |T_2|. More precisely we have

|T_3|_d = |T_2|_d + C_i^d,   (4.2)
and

|T_3|_e = |T_2|_e + p(a) C_j^e,   (4.3)

where p(a) is the execution probability for the decision action a of i in f.
Case 2. i is q-d in either f j^0 or f j^1 (we assume i is q-d in f j^1, and hence its decision action coincides with a).
Case 2.1. i is a unique essential variable in the other subtable (in our case in f j^0). From Corollary 4.2.2 and from the induction hypothesis, we may assume that the left and right optimum subtrees T_4(r_0, *, f j^0) and T_4(r_1, *, f j^1) of R have i as their test variables. Hence this case can be included in Case 1.
Case 2.2. The other subtable (in our case f j^0) is a constant subtable (with action a). In this case j is also a q-d variable in f (see Fig. 4.4, Case 2.2). Hence from SEA the costs of T_2(R, i, f) and T_3(R, j, f) coincide, i.e. |T_2| = |T_3|.

Thus in all cases we have |T_3| ≥ |T_2|. Since |T_4| = |T_3|, we conclude that |T_4| ≥ |T_1|.

II. Proof of the Converse Part. We prove that any non-q-d variable is not optimal under SEA whenever at least one q-d variable exists. Let i and j be any q-d and non-q-d variables of f respectively. Assume j is an optimal variable and let T_3(R, j, f) be an optimum tree. Since the left and right subtrees of T_3(R, j, f) are optimum trees for f j^0 and f j^1, and since i is a q-d variable or a unique essential variable both in f j^0 and f j^1 by Corollary 4.4.2, we may assume that both optimum subtrees T_3(r_0, *, f j^0) and T_3(r_1, *, f j^1) have i as their test variables, from the first part of this theorem (which we have just proved) or from Corollary 4.2.2 (see Fig. 4.4, Case 1 and Case 2.1). However, consider T_2(R, i, f) which is equivalent to T_3(R, j, f) and is shown in the same figure. As indicated in (4.2) and (4.3) the description and execution costs of T_3 are greater than those of T_2. This contradicts the optimality of T_3. (Note that we used the intrinsities of q-d variables and unique essential variables in an essential way to show that j is actually not optimal, while in the first part of this proof the intrinsities are not required in a strict sense.) □

We give several illustrative examples for the theorem. Since SEA holds for the general d-cost, hereafter we are mainly concerned with e-cost. For simplicity we omit the superscript e which indicates e-cost. Let m and n denote the numbers of q-d and non-q-d variables of f respectively.

Example 4.1. In Fig. 4.5 we give an example in which a q-d variable is optimal and a non-q-d variable is not optimal in both costs (m = 1, n = 2). |T_i| − |T_{j_1}| = (p(f i^1) − 1) C_{j_1} < 0, since p(f i^1) ≠ 1 by the intrinsity of the q-d variable. Note that i is the unique q-d variable in f, f j_1^0, f j_1^1, f j_2^0 and f j_2^1 (j_1 and j_2 are symmetric), hence SEA holds in f. Thus T_{j_1} is a q-optimum tree. □

If f has a single q-d variable i and a single non-q-d variable j (m = n = 1), then i is optimal and j is nonoptimal, as is indicated in Fig. 4.2, Case 1; this optimality is guaranteed by the optimal variable theorem (SEA is trivially satisfied). Judging from the d-cost case, one may expect that a single q-d variable among n non-q-d variables is most likely to be optimal, and also that a single non-q-d
Fig. 4.4. Auxiliary trees T_2 and T_3 for the proof of the optimal variable theorem (Case 1 and Case 2.1; Case 2.2)
Fig. 4.5. An example in which a q-d variable i is optimal and non-q-d variables are nonoptimal under both general costs
variable among m q-d variables is most likely to be nonoptimal without SEA. Examples 4.2 and 4.4 below indicate that these expectations do not hold, and thus that a direct weakening of SEA in the optimal variable theorem, in either the necessary or the sufficient direction, is not successful for the general e-cost. We show a simple example which works as a counterexample simultaneously for both directions.
Example 4.2. In Fig. 4.6 we give a "minimum" example (in the sense of the number of actions as well as the number of variables) in which even a single q-d variable i among otherwise non-q-d variables (m = 1, n = 2) cannot be optimal under the general e-cost (consequently a non-q-d variable j_1 (j_2) becomes optimal). We have |T_{j_1}| < |T_i|, since, for instance, p(f j_2^0 j_1^0) = 1/4 > p(f j_2^0 i^0) (cf. (4.1)). In fact, we have |T_i| = (2.5 + 2ε) C_i > |T_{j_1}| = 2.5 C_i. □
Fig. 4.6. An example in which a q-d variable i cannot be optimal and a non-q-d variable j_1 can be optimal under the general e-cost

SEA seems to be a very severe restriction upon the e-costs of variables and the execution probabilities. However, it is easy to construct a non-uniform distribution example which satisfies SEA.
Example 4.3. We give an example table f_1 which has 2 q-d variables and a single non-q-d variable in Table 4.1 (m = 2, n = 1). Let p_b and p_c be the execution probabilities for the actions b and c. The other probabilities, for the action a, are given by: p(f i_1^0 i_2^0 j^0) = p(f i_1^0 i_2^0 j^1) = qε, p(f i_1^0 i_2^1 j^0) = p(f i_1^0 i_2^1 j^1) = p(f i_1^1 i_2^0 j^0) = p(f i_1^1 i_2^0 j^1) = ε, where q and ε are values which satisfy 2(q + 2)ε = 1 − p_b − p_c.
The costs of the variables are: C_{i_1} = C_{i_2} (denoted C_i), arbitrary; C_j arbitrary. All the typical trees for f_1 are illustrated in Fig. 4.7, in which one can check that SEA holds. In fact, we have

|T_{i_1}| = |T_{i_2}| = (1 + 2ε + p_b + p_c) C_i + (p_b + p_c) C_j,
|T_{j_1}| = |T_{j_2}| = |T_j| = (1 + 2ε + p_b + p_c) C_i + C_j. □

Table 4.1. Examples f_1 and f_2

  i_1  i_2  j  | actions | probability distribution
               |         |   f_1   |   f_2
  0    0    0  |    a    |   qε    |   p_0
  0    0    1  |    a    |   qε    |   p_1
  0    1    0  |    a    |   ε     |   p_2
  0    1    1  |    a    |   ε     |   p_3
  1    0    0  |    a    |   ε     |   p_4
  1    0    1  |    a    |   ε     |   p_5
  1    1    0  |    b    |   p_b   |   p_b
  1    1    1  |    c    |   p_c   |   p_c
Our last example shows that even a single non-q-d variable among otherwise q-d variables (m = 2, n = 1) can be optimal without SEA, thus again indicating that SEA is essential in the e-cost for the optimal variable theorem.

Example 4.4. The table f_2 is given in Table 4.1 and all its typical trees are illustrated in Fig. 4.7. We have |T_{i_1}| < |T_{j_1}| and |T_{i_2}| < |T_{j_2}|, and

|T_j| = C_j + p(f j^0 + f j^1 i_2^1) C_{i_1} + p(f j^1 + f j^0 i_1^1) C_{i_2},
|T_{i_1}| = C_{i_1} + p(f i_1^1) C_{i_2} + p(f i_1^1 i_2^1) C_j,
|T_{i_2}| = C_{i_2} + p(f i_2^1) C_{i_1} + p(f i_2^1 i_1^1) C_j.

We give p(x) and C_{i_1}, C_{i_2}, C_j such that |T_j| < |T_{i_1}| and |T_j| < |T_{i_2}|; this is equivalent to
((p_1 + p_3)/(p_1 + p_5)) C_{i_2} + (p/(p_1 + p_5)) C_j < C_{i_1} < ((p_0 + p_2)/(p_0 + p_4)) C_{i_2} − (p/(p_0 + p_4)) C_j   (4.4)

and

0 < p = 1 − (p_b + p_c) < 1.   (4.5)
Thus it is sufficient that

((p_1 + p_3)/(p_1 + p_5) − (p_0 + p_2)/(p_0 + p_4)) C_{i_2} < −p(1/(p_1 + p_5) + 1/(p_0 + p_4)) C_j < 0.   (4.6)
Since from (4.5) and from the second inequality of (4.6) it follows that C_j may be chosen arbitrarily (sufficiently small), the condition that C_{i_1}, C_{i_2} and C_j exist is expressed by

(p_1 + p_3)/(p_1 + p_5) < (p_0 + p_2)/(p_0 + p_4).   (4.7)
Fig. 4.7. All the typical trees for f_1 and f_2

These conditions (4.5) and (4.7) are actually satisfied, for example, by taking p_0 = p_4 = 0.01, p_1 = p_3 = p_5 = 0.1, p_2 = 0.08 (p = 1 − p_b − p_c = 0.4). Substituting these probabilities in (4.4) we have C_{i_2} + 2 C_j < C_{i_1} < 4.5 C_{i_2} − 20 C_j. This is satisfied, for example, by C_{i_1} = 1.15 C_{i_2}, C_j = 0.05 C_{i_2}. In fact, we have |T_j| = 1.88 C_{i_2}, |T_{i_1}| = 1.89 C_{i_2} and |T_{i_2}| = 1.927 C_{i_2}. □
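As a sanity check on the reconstructed formulas and figures of Example 4.4, the following small Python computation (ours) evaluates |T_j|, |T_{i_1}| and |T_{i_2}| for f_2 with the probabilities and costs just given; it reproduces 1.88, 1.89 and 1.927 in units of C_{i_2}.

```python
# Probabilities of Table 4.1 for f2, indexed by the rows (x_i1, x_i2, x_j).
p0, p1, p2, p3, p4, p5 = 0.01, 0.1, 0.08, 0.1, 0.01, 0.1
pbc = 1 - (p0 + p1 + p2 + p3 + p4 + p5)   # p_b + p_c = 0.6, so p = 0.4
Ci2 = 1.0
Ci1, Cj = 1.15 * Ci2, 0.05 * Ci2

# |T_i1| = C_i1 + p(f i1^1) C_i2 + p(f i1^1 i2^1) C_j
T_i1 = Ci1 + (p4 + p5 + pbc) * Ci2 + pbc * Cj
# |T_i2| = C_i2 + p(f i2^1) C_i1 + p(f i2^1 i1^1) C_j
T_i2 = Ci2 + (p2 + p3 + pbc) * Ci1 + pbc * Cj
# |T_j| = C_j + p(f j^0 + f j^1 i2^1) C_i1 + p(f j^1 + f j^0 i1^1) C_i2
T_j = Cj + (p0 + p2 + p4 + p3 + pbc) * Ci1 + (p1 + p3 + p5 + p4 + pbc) * Ci2

print(round(T_j, 3), round(T_i1, 3), round(T_i2, 3))   # 1.88 1.89 1.927
```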
An Extension to m-ary Trees

We remark that the theorem holds for any m-ary decision trees, in which a variable i is a q-d variable when f i^0 = a_0, f i^1 = a_1, ..., f i^{m−2} = a_{m−2} and
f i^{m−1} ≠ const. Note that if i and j are q-d variables, then their decision actions degenerate to a single action. Thus a q-d table for n (> 1) q-d variables has two actions for m-ary trees, too.
5. An Algorithm to Construct an Optimum Decision Tree - Related Applications of the Optimal Variable Theorem

In this section we describe an algorithm to construct an optimum decision tree. It is based on the dynamic programming principle. We show that the optimal variable theorem can be successfully used to "accelerate" the optimum decision tree construction algorithm. Then we present an optimization algorithm for quasi-decisive chains. So far we have represented subtables by their prefixes. However, to describe the optimum decision tree construction algorithm we need a more detailed notation for subtables; namely, we supplement it with a notation representing the free variables.
Variable Notation for Subtables

Recall that we are given L symbols called variables, represented by the numbers {1, ..., L}. Let n denote the set of all length n ordered strings defined by n = {i_1 ... i_n | i_j ∈ {1, ..., L}, 1 ≤ j ≤ n, i_1 < i_2 < ... < i_n}.
Example 5.1. 0 = {λ}, 1 = {1, ..., L}, 2 = {12, 13, ..., 1L, 23, ..., 2L, ..., (L−1)L}, ..., L = {1 2 3 ... L}.
Let L* be the set of the unions of n for n = 0, ..., L. We call an element of L* a sequence. We represent the free variables of a subtable by a sequence. Thus the initial table is represented by the sequence fL = f 12...L.

Symbols. We denote sequences by u and v, and the elements of these sequences by u(1), ..., u(L−n) and v(1), ..., v(n) (the two sequences u and v appearing in a formula "cover" the whole set {1, ..., L}). Let x denote an element of the cartesian product {0, 1}^m, where 1 ≤ m ≤ L, and let us write x = x(1) ... x(m). We abbreviate a subtable corresponding to a fixation set u(1)^{x(1)} u(2)^{x(2)} ... u(L−n)^{x(L−n)} by f u^x (prefix notation). If, instead, we represent all the variables of the subtable, discriminating free variables from fixed ones, we write f u^x v. Thus we have f u^x = f u^x v.

Example 5.2. The set of vectors I = {x | x ∈ {0, 1}^L} is represented by {L^x | x ∈ I}. Similarly, the distribution function of execution probability p(x) can be represented by {p(L^x) | x ∈ I}. The probability conservation law concerning the dividing of a subtable by i is represented by
p(f u^x i) = p(f u^x i^0) + p(f u^x i^1).
Note that the execution probabilities for all but the arity 0 subtables of f, i.e. {p(f u^x v) | x ∈ {0, 1}^m, 1 ≤ m ≤ L, v ∈ L*}, can be computed systematically (by using
an analogue of the dynamic programming technique) by the above conservation law, in 3^L − 2^L steps. As we see below, this computation is not greater than that for the optimization, so we assume the execution probabilities for all subtables are always available. Now we can state the construction algorithm for an optimum decision tree. Let |fα| denote the cost (either d-cost or e-cost) of an optimum subtree corresponding to a subtable fα. Then from the dynamic programming principle, which states that "any subtree of an optimum (sub)tree is optimum", and from formula (3.4) we have, for any subtable f u^x v of arity n (v ∈ n):
|f u^x v| = 0 if f u^x v = const., and otherwise

|f u^x v| = min_{1 ≤ i ≤ n} [ C_{v(i)}(f u^x v) + |f u^x v(i)^0 v'| + |f u^x v(i)^1 v'| ],   (5.1)
where v = v(1) ... v(n) and v' = v(1) ... v(i−1) v(i+1) ... v(n). Thus from (5.1) the cost of an optimum subtree for any subtable f u^x v of arity n, for each x, can be computed (there are 2^{L−n} such x for each v), if we know C_{v(i)}(f u^x v) and the costs |f u^x v(i)^0 v'| and |f u^x v(i)^1 v'| of the arity n−1 subtables for every free variable v(i) of the subtable f u^x v. Since the cost of an arity 0 subtable is 0, this can be done for all arity 1 subtables, then for all arity 2 subtables and so on until we reach fL = f 12...L. Thus we can compute |f 12...L| by repeatedly applying (5.1) for v ∈ 1, v ∈ 2, ..., v ∈ L in this order.
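A Python sketch of the recurrence (5.1) over all 3^L subtables follows; it is our illustration of the principle rather than the paper's program: it memoizes top-down instead of sweeping bottom-up by arity, re-sums the subtable probabilities instead of tabulating them, and identifies a subtable by a tuple fixing each variable to 0, 1 or None (free).

```python
from functools import lru_cache

def optimum_cost(L, f, p, Cd, Ce, kind="d"):
    """Dynamic programming by formula (5.1).
    f: {vector -> action}, p: {vector -> probability},
    Cd, Ce: {variable -> cost}, kind: 'd' or 'e'."""

    def rules(key):
        # All rules whose vector agrees with the fixed positions of `key`.
        return [x for x in f
                if all(k is None or x[m] == k for m, k in enumerate(key))]

    @lru_cache(maxsize=None)
    def cost(key):
        rs = rules(key)
        if len({f[x] for x in rs}) <= 1:                  # constant subtable
            return 0.0
        best = float("inf")
        for m, k in enumerate(key):
            if k is not None:                             # only free variables
                continue
            node = Cd[m + 1] if kind == "d" else sum(p[x] for x in rs) * Ce[m + 1]
            best = min(best,
                       node + cost(key[:m] + (0,) + key[m + 1:])
                            + cost(key[:m] + (1,) + key[m + 1:]))
        return best

    return cost((None,) * L)
```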
Amount of Computation

We measure the amount of computation needed for the algorithm by the number of min's applied in (5.1). Usually a min is done through a comparison of two costs. By a simple calculation the number of min's needed to construct an optimum decision tree is given by

Σ_{n=1}^{L} n·C(L, n)·2^{L−n} = L·3^{L−1},

where C(L, n) denotes the binomial coefficient.
Since the size of the initial input table is about N = 2^L, the number of operations needed for this algorithm is O(N^{log 3} log N), where the log has base 2. The dynamic programming approach for constructing optimum decision trees has been reported by many authors [5, 15, 18], possibly independently.
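A one-line check of this operation count (our arithmetic; logs are to base 2): summing the n candidate min's over the C(L, n)·2^{L−n} subtables of arity n gives

```latex
\sum_{n=1}^{L} n\binom{L}{n}2^{L-n}
  = L\sum_{n=1}^{L}\binom{L-1}{n-1}2^{L-n}
  = L\,3^{L-1}
  = \tfrac{1}{3}\,(\log_2 N)\,N^{\log_2 3},
```

since N = 2^L implies 3^L = N^{log_2 3}; this is the O(N^{log 3} log N) bound stated above.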
5.1. Computation Acceleration by the Optimal Variable Theorem: Case of d-Cost

Assume that we are constructing an optimum d-cost tree. Suppose the condition |f u^x v(i)^0 v'|_d = 0 or |f u^x v(i)^1 v'|_d = 0 holds. These conditions are easily checked, as the values for the optimum subtrees are usually stored in an array and can simply be "read". Then we know that only one of the following three cases holds: v(i) is a q-d variable or a unique essential variable of the subtable f u^x v, or the subtable is a constant. Since the strong equivalence
assumption of q-d variables (SEA) holds in the present d-cost case, the optimal variable theorem and Corollary 4.2.2 indicate that, whether the variable v(i) is a q-d variable or a unique essential variable, it is optimal. So we want to distinguish the last case from the first two. It is equivalent to |f u^x v(i)^0 v'|_d = |f u^x v(i)^1 v'|_d = 0 and f(u^x v(i)^0 v') = f(u^x v(i)^1 v') (the same single action). And this last condition is also easily checked by reading two rules of the initial table. Hence in both of the first two cases, excluding the last case, we can "stop" the execution of the min operators in (5.1) for the other values of i, because the min value of (5.1) is given by the current variable i. This leads to an acceleration of the construction of optimum decision trees. The number of min's in (5.1) needed for the algorithm is taken as a measure of the required computation. We have 3^L − 2^L ≤ computation ≤ (L/3)·3^L. The lower bound is attained when min is performed once for each subtable except the arity 0 subtables, and the upper bound when min is computed for all i. Thus the upper bound is the amount of computation usually required. The acceleration mentioned above greatly reduces this computation, though the computational order cannot be lowered below O(N^{log 3}). For example, for 10 arity, 4 action tables, where the occurrences of each action obey the 80-20 rule [4, p. 397], the computation decreases to 42% of its value without acceleration, while the total execution time decreases to about 80%.
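In code, the acceleration is an early exit from the minimization loop of (5.1). The sketch below is ours: `cost` is the memoized function of the previous sketch and `same_single_action` is a hypothetical helper that reads the two rules mentioned above; the function returns a variable that can be placed at the node without examining the remaining candidates, or None if the full min must be taken.

```python
def accelerated_choice(key, cost, same_single_action):
    """d-cost case: if one branch of a free variable has optimum-subtree cost 0
    and the variable is not nonessential, it is a q-d or unique essential
    variable, hence optimal (Theorem 4.5, Corollary 4.2.2): stop searching."""
    for m, k in enumerate(key):
        if k is not None:
            continue
        zero0 = cost(key[:m] + (0,) + key[m + 1:]) == 0
        zero1 = cost(key[:m] + (1,) + key[m + 1:]) == 0
        nonessential = zero0 and zero1 and same_single_action(key, m + 1)
        if (zero0 or zero1) and not nonessential:
            return m + 1              # variable m+1 is optimal here
    return None                       # no shortcut; evaluate the min of (5.1) fully
```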
5.2. An Optimization Problem of Quasi-Decisive Chains: Case of e-Cost

The q-d table seems a very special one. However there are several applications using the q-d table. For example, it is used in the handling of AND queries in database systems [1] or in the unification of arguments in logic programming [6]. In these applications the two actions typically correspond to "success" and "failure". So an efficient optimization algorithm for this problem is important. Although the problem seems very special, no such algorithm has been devised for it so far. Let us modify the general algorithm of (5.1). For convenience, let us assume that the q-d table is uniformly 0-branch decisive. Hence only the subtables having all fixed values uniformly 1 survive; if any variable is fixed to 0, the subtable degenerates to a decision action. Thus we denote the subtables having all fixed values uniformly 1 by f u^1 v. Formula (5.1) reduces to
|f u^1 v|_e = min_{1 ≤ i ≤ n} [ p(f u^1 v) C_{v(i)}^e + |f u^1 v(i)^1 v'|_e ],   (5.2)
where v = v(1) ... v(n) and v' = v(1) ... v(i−1) v(i+1) ... v(n). We can compute the desired |f 12...L|_e by repeated application of (5.2) for v ∈ 1, v ∈ 2, ..., v ∈ L in this order.
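A direct rendering of (5.2) in Python follows (our sketch, memoized top-down rather than swept bottom-up as in the text): because only the all-ones fixations survive, a subtable is identified simply by the set u of variables already fixed to 1.

```python
from functools import lru_cache

def qd_chain_cost(L, p, Ce):
    """Optimum e-cost of a 0-branch-decisive q-d chain by formula (5.2).
    p: {vector -> probability}, Ce: {variable -> e-cost}, variables 1..L."""

    def prob(u):
        # p(f u^1 v): total probability of the rules with x_i = 1 for all i in u.
        return sum(q for x, q in p.items() if all(x[i - 1] == 1 for i in u))

    @lru_cache(maxsize=None)
    def cost(u):
        free = [i for i in range(1, L + 1) if i not in u]
        if not free:                         # arity 0: a single rule, cost 0
            return 0.0
        pu = prob(u)
        return min(pu * Ce[i] + cost(u | frozenset([i])) for i in free)

    return cost(frozenset())
```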
Amount of Computation

Again we need 2^L steps for the calculation of the probabilities p(f u^1 v) for all v. It is easy to show that the number of min's executed during this algorithm is
Σ_{n=1}^{L} n·C(L, n) = L·2^{L−1}.
Hence the order of this algorithm is
O(N log N).
6. Conclusions

We considered the problem of translating a decision table into optimal sequential testing procedures (STPs) under two cost criteria. We formulated an STP as a decision tree and introduced its description and execution costs. We defined optimal STPs as minimum cost decision trees. A variable is called optimal when it is tested first in an optimal STP; that is, an optimal variable is one which is placed, and hence tested, at the root of an optimum decision tree. We introduced quasi-decisive (q-d) variables and proved the optimal variable theorem: under the strong equivalence assumption of q-d variables (SEA), q-d variables are optimal and, conversely, if there are q-d variables, only the q-d variables are optimal. SEA holds for the description cost and for the uniform execution cost. We showed that SEA in the theorem cannot be weakened in the general execution cost. A construction algorithm was described for optimum decision trees which requires O(N^{log 3} log N) comparison operations. One can speed up this algorithm by applying the optimal variable theorem in the description cost case. The number of executions of the inner-most loop of the algorithm is reduced to 42% under fairly reasonable assumptions on the test data tables. The total execution time of the algorithm is reduced to about 80%. As a related topic we discussed optimization problems of quasi-decisive chains under execution cost. Our optimization algorithm requires O(N log N) operations. Parts of these results have been published in [6-10].

Construction problems of optimum decision trees involve NP-complete problems, especially in the more practical situation when the given initial tables are not in completely expanded form [2]. Hence various algorithms constructing near-optimum decision trees are of significance. The optimal variable theorem provides a theoretical basis to "prove" that such "heuristic" algorithms have "sound" behaviors [11-13].
Acknowledgement. The author is indebted to Tetsuya Morioka and Nobuyuki Otsu and to his other colleagues of the Mathematical Engineering Section for many helpful and stimulating discussions. He expresses his gratitude to Drs. K. Tamura, A. Tojo, O. Ishii, K. Fuchi, H. Kawai and H. Nishino for their encouragement to continue the project. The earliest stage of this work was begun while the author was on leave at the Computing Center of the Siberian Division, the Academy of Sciences of the USSR, in Novosibirsk. His hearty thanks go to Profs. A.P. Ershov, G.I. Marchuk and Drs. V.E. Kotov, V.A. Nepomnyashchii, V.E. Itkin, V.K. Sabelfeld and O.N. Ochakovskaya. His thanks also go to Dr. N.N. Abdelmalek for carefully reading the manuscript and giving many helpful comments. The author wishes to dedicate Theorem 4.5 to the late Mrs. Michiko Kinjo, his aunt.
References

1. Hanani, M.Z.: An optimal evaluation of boolean expressions in an online query system. Commun. ACM 20, 344-347 (May 1977)
2. Hyafil, L., Rivest, R.L.: Constructing optimal binary decision trees is NP-complete. Inf. Process. Lett. 5, 15-17 (May 1976)
3. Knuth, D.E.: The art of computer programming, vol. 1 (2nd ed.). Reading, MA: Addison-Wesley 1973
4. Knuth, D.E.: The art of computer programming, vol. 3. Reading, MA: Addison-Wesley 1973
5. Miyakawa, M., Sabelfeld, V.K.: On minimizations of size of logical schemes (in Russian). In: Theoretical basis of compiling (A.P. Ershov, ed.), pp. 49-58. Novosibirsk: Novosibirsk State University 1980
6. Miyakawa, M.: Transforming nondeterministic programs into optimal sequential programs - The optimal variable theorem and optimal ordering of arguments in unification procedures (in Japanese). Proceedings of the Logic Programming Conference '83, Tokyo, 1983
7. Miyakawa, M.: Optimum decision trees, I - On the optimal variable theorems (in Japanese). TGEC IECE Japan¹, EC83-6, May 1983
8. Miyakawa, M.: Optimization problems of semi-decisive chains (in Japanese). 1984 Nat. Conv. Rec. of IECE Japan², 1502, April 1984
9. Miyakawa, M.: A nonoptimal variable theorem in decision tables - An extension of the optimal variable theorem (in Japanese). 1985 Nat. Conv. Rec. of IECE Japan, 1510, March 1985
10. Miyakawa, M., Morioka, T.: On efficiency of decision tree optimization (in Japanese). 1983 Nat. Conv. Rec. of IECE Japan², 1335, 1983
11. Miyakawa, M., Otsu, N.: On variable selection criteria in the minimum decision tree construction problems (in Japanese). 1982 Nat. Conv. Rec. of IECE Japan, 1249, 1982
12. Miyakawa, M., Otsu, N.: Algorithms constructing near-minimum total nodes decision trees from expanded decision tables (in Japanese). TGEC IECE Japan, EC82-83, July 1983
13. Miyakawa, M., Otsu, N.: Adequacy of the variable selection criteria in constructing near-optimum decision trees (in Japanese). 1983 Nat. Conf. Rec.³ on information science and technology, IECE Japan, S4-8, September 1983
14. Moret, B.M.E., Thomason, M.G., Gonzalez, R.C.: The activity of a variable and its relation to decision tables. ACM Trans. Program. Lang. Syst. 2, 580-595 (October 1980)
15. Moret, B.M.E.: Decision trees and diagrams. Comput. Surv. 14, 593-623 (December 1982)
16. Reinwald, L.T., Soland, R.M.: Conversion of limited-entry decision tables to optimal computer programs, I: Minimum average processing time. J. ACM 13, 339-358 (July 1966)
17. Reinwald, L.T., Soland, R.M.: Conversion of limited-entry decision tables to optimal computer programs, II: Minimum storage requirement. J. ACM 14, 742-755 (October 1967)
18. Schumacher, H., Sevcik, K.: The synthetic approach to decision table conversion. Commun. ACM 19, 343-351 (June 1976)
Received April 22, 1985/June 26, 1985
¹ Paper of the technical group on electronic computers, Institute of Electronics and Communication Engineers of Japan
² National Convention Records of the Institute of Electronics and Communication Engineers of Japan
³ National Conference Records