Annals of Mathematics and Artificial Intelligence 3 (1991) 331-360

THE INPUT/OUTPUT COMPLEXITY OF TRANSITIVE CLOSURE *

Jeffrey D. ULLMAN
Stanford University, Stanford, CA 94305, USA

Mihalis YANNAKAKIS
AT&T Bell Laboratories, Murray Hill, NJ 07974, USA
Abstract

Suppose a directed graph has its arcs stored in secondary memory, and we wish to compute its transitive closure, also storing the result in secondary memory. We assume that an amount of main memory capable of holding s "values" is available, and that s lies between n, the number of nodes of the graph, and e, the number of arcs. The cost measure we use for algorithms is the I/O complexity of Kung and Hong, where we count 1 every time a value is moved into main memory from secondary memory, or vice versa. In the dense case, where e is close to n², we show that I/O equal to O(n³/√s) is sufficient to compute the transitive closure of an n-node graph, using main memory of size s. Moreover, it is necessary for any algorithm that is "standard," in a sense to be defined precisely in the paper. Roughly, "standard" means that paths are constructed only by concatenating arcs and previously discovered paths. For the sparse case, we show that I/O equal to O(n²√(e/s)) is sufficient, although the algorithm we propose meets our definition of "standard" only if the underlying graph is acyclic. We also show that Ω(n²√(e/s)) is necessary for any standard algorithm in the sparse case. That settles the I/O complexity of the sparse/acyclic case, for standard algorithms. It is unknown whether this complexity can be achieved in the sparse, cyclic case, by a standard algorithm, and it is unknown whether the bound can be beaten by nonstandard algorithms. We then consider a special kind of standard algorithm, in which paths are constructed only by concatenating arcs and old paths, never by concatenating two old paths. This restriction seems essential if we are to take advantage of sparseness. Unfortunately, we show that almost another factor of n I/O is necessary. That is, there is an algorithm in this class using I/O O(n³√(e/s)) for arbitrary sparse graphs, including cyclic ones. Moreover, every algorithm in the restricted class must use Ω(n³√(e/s)/log³n) I/O, on some cyclic graphs.
1. The problem

Let us assume that data is stored in secondary memory, and computation must be performed in main memory. We suppose that the cost of moving a datum from

* The work of this author was partially supported by NSF grant IRI-87-22886, IBM contract 476816, Air Force grant AFOSR-88-0266 and a Guggenheim fellowship.

© J.C. Baltzer A.G. Scientific Publishing Company
332
J.D. Ullman, M. Yannakakis / Input/output complexity
secondary to main memory or back is very large, so much so that we shall consider the cost of data movement to the exclusion of computation cost. For us, a datum is a value of small size, such as an integer whose value is polynomial in the size of the input [and thus can be represented in O(log n) bits]. We shall use s throughout as the capacity of main memory. The quantity s measures the number of data items that can be stored in main memory at one time. Because of our imprecision in the size of a datum, we can only talk about s to within an order of magnitude. Since we generally buy machines with, say, exactly 4 megabytes of main memory, not "on the order of 4 megabytes," all our results should, strictly speaking, be revised to lower, by a constant factor, the problem size that we can handle with a given amount of memory in a given time. Nevertheless, the notion that main memory is expandable by a constant factor is a convenient fiction.

We are primarily interested in transitive closure problems and their generalization to closed semiring problems as described in Aho, Hopcroft, and Ullman [2]. In the basic problem, we are given a directed graph of n nodes and e arcs, and we wish to find its transitive closure, that is, the set of pairs of nodes (v, w) such that there is a path of length one or more from v to w. In the generalized problem, we are given a label on each arc, chosen from a closed semiring, which is a certain algebraic structure defined in Aho et al. [2], having multiplication and addition operators. We wish to compute the sum over all paths from v to w, of the product of the labels along that path, in order.

1.1. THE KUNG-HONG MODEL

This model was studied by Kung and Hong [5], who proved some basic results about a variety of problems. Their notion of an algorithm is a DAG (directed acyclic graph), in which the nodes correspond to values; the predecessors of a value v are the values used to compute v.
For example, they considered the standard algorithm for multiplication of n × n matrices, and showed that the I/O required is Ω(n³/√s). That amount of I/O is sufficient, as well, by an algorithm of McKellar and Coffman [9]. They could equally as well have shown the same lower bound for the usual algorithm, due to Warshall [13], for taking the transitive closure. Indeed, our lower bound for a family of transitive closure algorithms makes heavy use of two important combinatorial ideas from Kung and Hong [5].

1. If we examine the computation during a time slice in which s I/O operations are performed, then only 2s data items are available during that entire time slice: the s items in main memory before the slice began, and the up to s items read into main memory during that slice.

2. With O(s) "facts" available, and with each fact concerning a pair of nodes (e.g., a "fact" might say there is a path from v to w), the number of triangles for which we have a fact for each of the three sides is O(s^{3/2}).
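To make the blocked upper bound for matrix multiplication concrete, here is a rough tally of the data movement performed when the product is computed in b × b blocks, with b chosen so that three blocks fit in main memory at once; the function and its counting policy are our own illustration, not code from [5] or [9].

```python
import math

def blocked_matmul_io(n, s):
    # I/O tally for C = A x B computed in b x b blocks, where b is
    # chosen so that one block each of A, B, and C fits in s cells.
    b = max(1, math.isqrt(s // 3))
    t = -(-n // b)                    # number of blocks per dimension
    io = 0
    for i in range(t):
        for j in range(t):
            io += b * b               # bring in C_ij
            for k in range(t):
                io += 2 * b * b       # bring in A_ik and B_kj
            io += b * b               # write C_ij back
    return io
```

For n = 64 and s = 48 (so b = 4) this counts 2·64³/4 + 2·64² = 139264 transfers; in general the dominant term is 2n³/b = Θ(n³/√s).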
1.2. OUR MODEL
Our model is somewhat different. We focus only on transitive closure and related problems, so it is natural to view steps of the algorithm not as nodes of a DAG, but as operations in which paths are discovered. That is, each operation involves some number of facts that must simultaneously be in main memory. A fact is a variable that we shall designate either arc(v, w) ("there is an arc from node v to node w") or path(v, w) ("there is a path from v to w"). In the basic transitive closure problem, all these facts are Boolean-valued; we are given the arc facts, and need to compute the path facts. For example, an operation might take data items representing the facts path(v, x) and path(x, w) and assign a new value to the fact path(v, w). In the generalized, closed-semiring case, the values of arc facts are labels of arcs. What we call fact path(v, w) is intended, after the algorithm is complete, to be the sum over all paths from v to w, of the product of the labels along each path.

In our model, we do not assume a precedence in the order of operations, as is assumed by the DAG model of Kung and Hong. Rather, when proving lower bounds, we characterize an algorithm by the set of sets of facts that must be in main memory simultaneously. We should note, however, that when proving upper bounds, that is, exhibiting algorithms, we sometimes perform operations involving data other than the arc and path facts. If so, we shall account for the main memory space used by these facts, under the assumption that integers whose number of bits is logarithmic in the input size may be stored in one unit of main memory. Domain values from a closed semiring are also assumed to take one memory unit, even though their nature and representation is left unspecified.

This model is both more and less restrictive than the Kung-Hong model. It is more restrictive in that we assume a new value of a "fact," such as path(v, x), always replaces its previous value in main memory.
The Kung-Hong, or DAG, model allows executions in which two different DAG nodes are in memory simultaneously, even though they conceptually represent different versions of the same variable. On the other hand, the Kung-Hong model only allows us to discuss one DAG at a time, which implies that the order of operations is fixed, except for operations that are independent. We, on the other hand, can consider whole families of algorithms at once. For example, we speak of a family that includes all n! versions of Warshall's algorithm, where the order in which we pivot on the various nodes is allowed to vary. Each of these versions would have a different DAG, although the DAGs would differ only by renaming of nodes.

1.3. STANDARD ALGORITHMS
When proving lower bounds for the dense case, we shall focus on standard algorithms. These have the property that (for at least one input) for every triangle of nodes {v, w, x}, there is at some time simultaneously in main memory a fact
about v and w, a fact about w and x, and a fact about v and x. What algorithms are "standard"? 1)

    for all i and j do
        path(i,j) := arc(i,j);
    for k := 1 to n do
        for all i and j do
            path(i,j) := path(i,j) OR (path(i,k) AND path(k,j));

Fig. 1. Warshall's algorithm.

1. Warshall's algorithm, as described in fig. 1. We assume the nodes are numbered 1, 2, ..., n. Each node, in turn, becomes k, the pivot node. After k has had its turn as pivot, path(i, j) is 1 if and only if there is a path from i to j that goes through no node numbered above k (although i or j may exceed k). Notice that at the last line of fig. 1, three facts involving the triangle of nodes i, j, and k must be in main memory at once. In fact, that would be true even if we counted ordered triangles.

2. More generally, Kleene's algorithm, as described in Aho et al. [2], applied to an arbitrary closed semiring, is a standard algorithm.

3. The improvement of Warshall's algorithm due to Warren [12] is also a standard algorithm.

4. Seminaive evaluation 2) of the logical rules
    path(V, W) :- arc(V, W).
    path(V, W) :- path(V, X) & path(X, W).          (1.2)
Here, we use Prolog notation, and the two rules are read "there is a path from V to W if there is an arc from V to W," and "there is a path from V to W if there exists a node X such that there is a path from V to X and a path from X to W." Seminaive evaluation applies the rules repeatedly, starting from the data (the arc facts), and getting successive approximations to the path facts. In seminaive (as opposed to naive) evaluation, care is taken that when applying a rule, at least one of the facts in the body (right side) is "new," in the sense that it was just discovered on the previous round. Seminaive evaluation might not look at facts for every triangle; for example, it would not if the set of arcs were empty. However, if the set of arcs is a complete graph, then on the second round, the recursive rule finds all path facts "new," and attempts to apply the recursive rule in all possible ways. Then, all triangles {V, X, W} of path facts will appear simultaneously in memory, when the recursive rule is applied to that triple of nodes (in any of the six orders).

1) We do not care whether triangles are regarded as ordered or unordered, since all our results are order-of-magnitude anyway and order only introduces a factor of 6 in the count of triangles.
2) See Ullman [10] for a discussion of seminaive evaluation.
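In executable form, seminaive evaluation of the rules (1.2) might be sketched as follows; the set-based representation and the names are our own illustration, not code from the paper.

```python
def seminaive_tc(arcs):
    # path(V,W) :- arc(V,W).
    path = set(arcs)
    delta = set(arcs)            # facts discovered on the previous round
    while delta:
        new = set()
        # path(V,W) :- path(V,X) & path(X,W), with at least one body
        # fact drawn from the "new" facts of the previous round
        for (v, x) in delta:
            for (y, w) in path:
                if y == x and (v, w) not in path:
                    new.add((v, w))
        for (v, x) in path:
            for (y, w) in delta:
                if y == x and (v, w) not in path:
                    new.add((v, w))
        path |= new
        delta = new
    return path
```

The while loop terminates when a round discovers no new path facts, which is exactly the stopping rule of seminaive evaluation.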
5. Similarly, we might use either of the two linear forms of the recursion for transitive closure. Seminaive evaluation of the rules
    path(V, W) :- arc(V, W).
    path(V, W) :- arc(V, X) & path(X, W).          (1.3)

or the rules

    path(V, W) :- arc(V, W).
    path(V, W) :- path(V, X) & arc(X, W).          (1.4)
are standard algorithms. Here, the triangles each involve two path facts and an arc fact, but there is nothing in the definition of "standard" that requires the facts all be path facts.

Are there algorithms in use that are not "standard" in our sense? Yes; there are several different approaches that do not use triangles of nodes.

1. Algorithms for transitive closure can be based on a fast matrix-multiplication algorithm. For example, if we square the adjacency matrix log n times, using the algorithm of Coppersmith and Winograd [4] to square matrices, we get an O(n^{2.37} log n) time algorithm, that uses I/O equal to O((n^{2.37}/s^{0.19}) log n).

2. Algorithms can be based on depth-first search. As we shall see, for graphs with cycles, the discovery of strongly connected components allows us to infer the existence of paths without actually constructing them.

There is also a subtlety regarding algorithms that do not behave in an "oblivious" way, but base their action on the values computed. For example, we could modify Warshall's algorithm in two ways. First, if we find
path(i, k) AND path(k, j) to be false, then we need not modify path(i, j). Hence, it appears we do not need to have the triangle {i, j, k} in memory at one time. However, this modified version of Warshall's algorithm is still standard, because when the given graph is complete, it must assign to path(i, j) all the time. Another modification to Warshall's algorithm is to write a 1 into the copy of path(i, j) in secondary memory whenever path(i, k) AND path(k, j) is true, regardless of its current value. Again, it appears we do not need path(i, j) in main memory when we consider the triple (i, j, k). However, at the expense of doubling the I/O and increasing the memory by half, we could insist that path(i, j) be brought into main memory anyway, and then written out. Since our results are all order-of-magnitude anyway, this modification cannot be better than a standard algorithm. Thus, any lower bound for standard algorithms applies to modifications that write new values for variables into secondary storage without examining their current value first. Moreover, if we are performing a generalized transitive closure, then the analog of Warshall's algorithm (Kleene's algorithm) cannot write a new value for path(i, j) without having the old value in main memory.
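The first modification just described (skipping the update when the conjunction is false) can be rendered in executable form; this is a sketch of ours, not code from the paper.

```python
def warshall(arc):
    # arc: n x n boolean matrix; computes path(i,j) for paths of length >= 1
    n = len(arc)
    path = [row[:] for row in arc]
    for k in range(n):
        for i in range(n):
            if path[i][k]:               # skip the triangle when false ...
                for j in range(n):
                    if path[k][j]:       # ... yet on a complete graph every
                        path[i][j] = True  # triple is still examined
    return path
```

On a complete graph the two inner tests never fail, so all triangles of facts are brought together, which is why the modified algorithm is still standard.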
1.4. EXTENSION TO BLOCK-ORIENTED I/O

Usually, we do not want to count only the volume of traffic between main and secondary memory, but rather count the number of blocks or pages that must move between the two memories. As in the model of Ullman [10], blocks carry some fixed, but large number of facts. It turns out that, under reasonable assumptions about the size of blocks, each of the algorithms we propose allows complete blocking; that is, if b facts fit on one block, then the amount of I/O can be reduced by a factor O(b). Since the lower bounds cannot be reduced by a greater factor, our results will carry over to the model with blocked I/O, simply by dividing expressions by b.
2. An algorithm for the dense case

We begin with some relatively easy results; these cover the dense case, where the number of arcs in an n-node graph can be as high as n². A recent paper by Agrawal and Jagadish [1] offers an algorithm for this case. By partitioning the adjacency matrix of the graph into stripes, they use I/O equal to O(n⁴/s). As usual, n is the number of nodes and s the amount of main memory. However, the correct way to partition matrices for transitive closure when I/O is at a premium was known to McKellar and Coffman [9]. We break an n × n matrix M into square submatrices of size √s × √s, there being n²/s submatrices, which we designate M_{i,j}, for 1 ≤ i, j ≤ n/√s. Figure 2 illustrates this partition.
[Figure 2 shows M partitioned into an (n/√s) × (n/√s) array of submatrices M_{1,1}, M_{1,2}, ..., M_{n/√s,n/√s}.]

Fig. 2. Partition of a matrix.
J.D. Ullman, M. Yannakakis / Input/output complexity
337
    (1) for k := 1 to n/√s do begin
    (2)     M_{k,k} := M*_{k,k};
    (3)     for i := 1 to n/√s do
    (4)         for j := 1 to n/√s do
    (5)             M_{i,j} := M_{i,j} + M_{i,k} × M_{k,k} × M_{k,j}
        end

Fig. 3. Kleene's algorithm.

If M is a Boolean matrix, or in fact a matrix whose elements come from any closed semiring, then the √s × √s submatrices of M are elements of a closed semiring, with the usual matrix multiplication and addition as × and + in the closed semiring (see Exercise 15.28 of Ullman [11] for some details). Thus, we can apply Kleene's algorithm, as shown in fig. 3, to compute the transitive closure of M. There, note that M*_{k,k} is the reflexive and transitive closure of M_{k,k}.
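Fig. 3 can be rendered on Boolean matrices as follows; this sketch of ours shows the block structure of the loops only, and elides the movement of submatrices between main and secondary memory.

```python
def kleene_blocked(M, b):
    # M: n x n boolean matrix; b: block side, playing the role of sqrt(s)
    n = len(M)
    for k0 in range(0, n, b):
        K = range(k0, min(k0 + b, n))
        # line (2): M_kk := M*_kk, its reflexive-transitive closure
        for p in K:
            M[p][p] = True
        for p in K:
            for q in K:
                if M[q][p]:
                    for r in K:
                        if M[p][r]:
                            M[q][r] = True
        # lines (3)-(5): M_ij := M_ij + M_ik x M_kk x M_kj for all blocks i, j
        for v in range(n):
            # nodes of zone k reachable from v via M_ik and the closed M_kk
            mid = {y for x in K if M[v][x] for y in K if M[x][y]}
            for y in mid:
                for w in range(n):
                    if M[y][w]:
                        M[v][w] = True
    return M
```

The result is the reflexive and transitive closure, since line (2) sets the diagonal of each pivot block.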
THEOREM 2.3

The algorithm of fig. 3 correctly computes the transitive closure of the graph with adjacency matrix M.
Proof

Divide the nodes of the graph into zones as follows. Nodes 1 through n/√s are in zone 1, nodes n/√s + 1 through 2n/√s are in zone 2, and so on. Thus, the submatrix M_{i,j} tells about arcs, and later paths, from nodes in zone i to nodes in zone j. An easy induction on k shows that, after executing lines (2) through (5) with a fixed value of k, the value in row v and column w is 1 if and only if there is a path from node v to node w that goes through no zone higher than k. Note that v and/or w may themselves be in higher zones.

The key to the proof is an examination of line (5). A path from v in zone i to w in zone j that goes through no zone higher than k either does not go through zone k at all, in which case M_{i,j} already had a 1 in the position for the pair (v, w), or it goes through zone k one or more times. In the latter case, M_{i,k} has a 1 that represents a path from v to the first node in zone k along the path, by the inductive hypothesis. The transitive closure step at line (2) assures that M_{k,k} has a 1 representing any path from one node in zone k to another in the same zone that goes through no zone higher than k. In particular, there is a 1 representing the portion of the path from v to w that goes from the first node in zone k on that path to the last node in zone k on that path 3). Finally, M_{k,j} has a 1 representing the portion of the path from the last node in zone k to node w, in zone j. □

3) Note this claim is true even when there is only one node in zone k on the path.
THEOREM 2.4

The algorithm of fig. 3 requires I/O equal to O(n³/√s), provided s ≤ n².
Proof

Note that any submatrix M_{i,j} will fit in main memory at once. Since we assume we have O(s) memory, we can hold any finite number of submatrices, in particular, M_{i,j}, M_{i,k}, M_{k,k}, and M_{k,j} at once. When we read M_{k,k} at line (2), we can take its transitive closure by Warshall's algorithm. That algorithm is executed "in place," and needs no additional main memory. To execute line (5), we surely need no more than O(s) I/O, since we read and write a finite number of O(s)-sized submatrices. Step (2) requires only O(n√s) I/O, since it is executed n/√s times and uses O(s) I/O each time. Step (5) dominates, since it is executed n³/s^{3/2} times, with O(s) I/O each time. The I/O cost is thus O(n³/√s). □

Note that n³/√s is always less than the Agrawal and Jagadish [1] upper bound of n⁴/s, as long as s < n². When s ≥ n², the upper bound on I/O is O(n²), since that is required just to read the data. Once read, we can compute the transitive closure in main memory with no further I/O, except the n² steps needed to write the answer.

COROLLARY 2.5

If for some closed semiring, the domain elements require O(1) memory space, and the +, ×, and * operations can be performed with at most O(s) scratch space, then the generalized transitive closure can be taken for this closed semiring with O(n³/√s) I/O. □
3. A lower bound for dense graphs

We shall now show that the upper bound n³/√s for dense graphs is also a lower bound on the I/O needed by any standard algorithm. As we mentioned, it is possible to beat the upper bound, in principle, by using a fast matrix-multiplication algorithm, although it is unclear whether these schemes are practical for problems of realistic size. The following combinatorial lemma is the key.

LEMMA 3.1

With m edges, we can make no more than O(m^{3/2}) triangles.
Proof

This result is well known; see Lovász [8], Exercise 13.31, for example. However, a proof will serve as an introduction to a more complicated lemma that we
need for the sparse case. The technique is to assign to each edge "charge" for O(√m) triangles, in such a way that every triangle is charged to some edge. Since there are m edges, the bound of O(m^{3/2}) triangles follows.

Consider each node v in turn. If there are at most √m edges incident upon v, then the number of triangles including some particular edge (v, w) is at most √m. Charge the edge (v, w) one unit for each such triangle. If there are more than √m edges incident upon v, we observe that v cannot be part of more than m triangles, because there can be no more than m opposite edges. Divide the charge for these m triangles equally among the edges incident upon v, resulting in a charge of no more than √m to each. Thus, no matter how many incident edges v has, those edges are charged no more than √m each, for triangles containing v.

Now, note that an edge can only receive a charge from its two endpoints, so its total charge is no more than 2√m. Also, a charge for any triangle is made for each of its three vertices, so the total charge, of at most 2m^{3/2}, is actually three times the number of triangles. Thus, 2m^{3/2}/3 is an upper bound on the number of triangles. □

Now we can prove the lower bound result for dense graphs.
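The lemma is easy to check by brute force on small graphs; the counter below is our own illustration.

```python
from itertools import combinations

def count_triangles(edges):
    # count triangles {a, b, c} with all three (undirected) edges present
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])
```

On the complete graph K5 (m = 10 edges) it returns 10 triangles, comfortably below the bound 2·10^{3/2}/3 ≈ 21.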
THEOREM 3.2

Every standard algorithm for computing the transitive closure of a dense graph with n nodes, using main memory of size s, requires Ω(n³/√s) I/O in the worst case.
Proof

Although we deal with triangles and they deal with DAGs, the argument is really the same as the one found in Kung and Hong [5] for matrix multiplication.
[Figure 4 shows the computation divided along the time axis into slices, each slice comprising s I/O operations.]

Fig. 4. Dividing computation into slices.
By definition, for every standard algorithm there is some graph for which facts involving all (n choose 3) = Ω(n³) triangles are present in memory at some time. Consider the action of the algorithm divided into slices, during which s I/O steps are performed, as suggested in fig. 4. During any slice, there are at most 2s facts ever in main memory: the s that were there originally and up to s that were read from secondary storage. By lemma 3.1, at most O(s^{3/2}) triangles are formed during one slice. Thus, there must be Ω(n³/s^{3/2}) slices. As each slice uses s I/O, the amount of I/O is Ω(n³/√s). □
4. An upper bound for sparse, acyclic graphs

We shall now introduce e, the number of arcs, as a separate parameter. It is useful to study the case of acyclic graphs first, because they appear to be able to take advantage of sparseness in a way that cyclic graphs cannot. In what follows, we shall assume that e ≥ n, and s << e 4), since if e main memory space is available, we can read the entire graph, and then compute its transitive closure by searching from each node, in main memory. Then, n² I/O suffices to write the answer.

A standard algorithm for sparse graphs must combine arc facts with path facts, or it cannot take advantage of sparseness. The reason is that O(n) arcs can yield paths between any pair of nodes. Thus, a standard algorithm that combines paths (probably) will have to consider all triangles, even if e is much less than n². Thus, it is reasonable to consider algorithms that only combine arc and path facts, such as seminaive evaluation of the linear rules (1.3) or (1.4) or the basic algorithm of Ioannidis and Ramakrishnan [7] for acyclic graphs.

4.1. SPARSE-STANDARD ALGORITHMS

We shall call an algorithm sparse-standard if it has in memory at some time every triangle (v, x, w) consisting of an arc fact arc(v, x), and path facts path(v, w) and path(x, w). We call such a triangle an app-triangle. Note that there are e(n − 2), or about en app-triangles, since each arc can have any of the n − 2 nodes that are not its endpoints as the opposite vertex.

It turns out that there is a sparse-standard algorithm to compute the transitive closure of a sparse, acyclic graph in O(n²√(e/s)) I/O 5). Moreover, any sparse-standard algorithm must take at least this amount, even on acyclic graphs. Interestingly, the upper bound also holds for sparse graphs with cycles, but the algorithm is not even standard. We show in section 8 that for sparse-standard

4) That is, s grows asymptotically more slowly, as a function of n, than e.
5) Note that this formula reduces to the formula for dense graphs when e = n².
algorithms on cyclic graphs, almost a full factor of n more is needed, that is, Ω(n³√(e/s)/log³n).

4.2. THE "BIG" AND "SMALL" PARAMETERS
In what follows, it is useful to define the shorthands
    β = n√(s/e),    σ = √(es)/n.
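These definitions can be sanity-checked numerically; the function below is our own illustration.

```python
import math

def big_small(n, e, s):
    beta = n * math.sqrt(s / e)      # size of a "big" zone of nodes
    sigma = math.sqrt(e * s) / n     # size of a "small" zone of nodes
    return beta, sigma
```

The checks confirm that β²e/n² = s and βσ = s, and that in the dense case e = n² both parameters collapse to √s.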
We think of β as the size of a "big" set of nodes and σ as the size of a "small" set. Notice that in the dense case, where e = n², we have β = σ = √s.

The intuition regarding β and σ is as follows. In lemma 3.1, we observed that s edges make at most s^{3/2} triangles. To attain that ratio of triangles to edges, we must select about √s nodes and all edges between these nodes. Now, suppose we are only interested in app-triangles. If the graph is random, that is, each arc is chosen with probability e/n², then, approximately, the greatest number of app-triangles will be obtained if we have two "big" sets of β nodes each, and a "small" set of size σ. For our s facts, we select all the arcs of the given graph that run from one big set to the other, and we also select path facts connecting each node in either of the big sets with each node in the small set. The expected number of arcs from one big set to another is β²e/n² = s, and the number of pairs of nodes, one from a big set and one from a small set, is βσ = s. Thus, there are an expected O(s) facts selected, and by constant-factor adjustments in the size of the big and small sets, we can make that be exactly s.

Like the algorithm of section 2, we shall "pivot" on sets of nodes. However, instead of using "zones" of size √s, when we pivot on zone k, and establish paths from zone i to zone j that go through zone k, zones i and k will be big sets, and zone j will be a small set. Moreover, we use only arcs from zone i to zone k, not paths. That is why the algorithm only works for acyclic graphs, and then only if the pivot zones are chosen in reverse topological order.

4.3. A SPARSE-STANDARD ALGORITHM FOR ACYCLIC GRAPHS
The algorithm is best explained as doing something more general than computing transitive closure, which we shall call extended transitive closure. We are given m nodes, topologically sorted. That is, the nodes are numbered 1, 2, ..., m, and any arc i → j must have i < j. We are given e arcs among the first n ≤ m nodes and any number of paths from nodes among the first n to any of the m nodes (including the first n). Of course, all paths go from lower- to higher-numbered nodes, as suggested by fig. 5. The problem is to find which nodes among the first n have paths to which of the m nodes; each such path will follow zero or more arcs among the first n, and then an (optional) path fact. The case of ordinary
[Figure 5 shows the e arcs among the first n nodes, and up to nm given paths from the first n nodes to any of the m nodes.]
Fig. 5. Given data for algorithm 4.2.

sparse transitive closure is extended transitive closure with m = n, and no given paths. We now give an algorithm that computes the extended transitive closure with I/O equal to O(mn(1 + √(e/s))).
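The e = O(s) case of the method below (all arcs held resident, one target node processed at a time) might be sketched as follows; the names are our own choosing, and the actual I/O bookkeeping is elided.

```python
def extended_tc_small(m, n, arcs, paths):
    # arcs: set of (i, j) with i < j <= n; paths: given facts (i, v) with
    # i <= n; a zero-length path (i, i) is assumed for each i <= n
    succ = {}
    for (i, j) in arcs:
        succ.setdefault(i, []).append(j)
    out = set()
    for v in range(1, m + 1):            # one target node at a time
        into_v = {i for (i, w) in paths if w == v}
        if v <= n:
            into_v.add(v)                # the zero-length path
        # back-propagate along arcs: i reaches v if some arc i -> j
        # leads to a node j already known to reach v
        for i in range(n, 0, -1):        # arcs run low -> high
            if any(j in into_v for j in succ.get(i, ())):
                into_v.add(i)
        out |= {(i, v) for i in into_v}  # write up-to-n facts for v
    return out
```

The reverse sweep over i works because the nodes are topologically sorted, so every successor of i has already been decided when i is examined.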
Algorithm 4.2

SPARSE-STANDARD ALGORITHM FOR EXTENDED TRANSITIVE CLOSURE OF ACYCLIC GRAPHS

INPUT: We are given m nodes, numbered 1, 2, ..., m, e arcs among the first n of these, and an amount of main memory s ≥ n. We are also given some number of path facts from the first n nodes to any nodes. All paths and arcs are acyclic, in the sense that if they run from node i to node j, then i < j 6). For convenience, we assume that there is a path (of zero length) from every node among the first n to itself.

OUTPUT: The set of pairs of nodes (i, j) such that there is a path from i to j consisting of zero or more given arcs followed by a given path.

METHOD: If e = O(s), then read all the given edges into main memory. For each of the m nodes, say node v, do the following.

1. Read all given paths from one of the first n nodes to v into main memory. There are at most n such paths, so they fit, since s ≥ n is assumed.

2. In main memory, find all the nodes among the first n that can reach v following arcs and then a path directly to v. No I/O is needed.

3. Write the up-to-n path facts telling which of the first n nodes reach v along an allowable path.

Evidently, O(n) I/O per node is needed in steps (1) and (3), so the total I/O is mn. That cannot exceed our predicted upper bound of mn(1 + √(e/s)). Note also that the e I/O steps needed to bring in the e given arcs cannot exceed mn, since
m ≥ n, and e ≤ n². Now, let us consider the case where e grows more rapidly than s. Define β = n√(s/e) and σ = √(es)/n as before. Note that β grows more slowly than n, since s
6) If the reader is concerned that our assumption that the nodes are initially topologically sorted hides I/O charges, note that shortly we shall show how depth-first search may be performed with O(e) I/O, provided only that s is as large as the number of nodes involved.
[Figure 6 shows the n × m path matrix partitioned into big zones of β rows and small zones of σ columns, with the n × n arc matrix at its left end.]
Fig. 6. Partitions into big and small zones.

grows more slowly than e. We shall partition the first n nodes into n/β "big zones" of size β each. All nodes are partitioned into "small zones" of size σ; there are m/σ such zones. In each case, the zones consist of consecutive nodes, as suggested in fig. 6. In that figure, we have made the tacit assumption that n/β = 3, and β = 2σ. Technically, we should see fig. 6 as two Boolean matrices. One, of size n × m, represents path information; the second, of size n × n, and located at the left end, represents arc information.

We execute the algorithm sketched in fig. 7. This algorithm is similar to the pivoting algorithms seen so far. Here we pivot on a square of the matrix whose side is β; that is, we use a set of β consecutive nodes as the "pivot." In order to take what serves as the transitive closure of a pivot block, we must call algorithm

    (1) if n is sufficiently small that n < β + 1 then
    (2)     compute the extended transitive closure in any manner
        else
    (3)     for k := n/β downto 1 do begin
    (4)         apply Algorithm 4.2 to the nodes of big zone k (playing the
                role of the first n nodes) and all higher nodes (playing the
                role of the remaining m − n nodes);
    (5)         for i := 1 to k−1 do
    (6)             for j := 1 to m/σ do
    (7)                 for each node v in big zone i, each node x in big zone k,
                        and each node w in small zone j do
    (8)                     path(v,w) := path(v,w) OR (arc(v,x) AND path(x,w))
            end

Fig. 7. Algorithm 4.2 for the case s << e.
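Stripped of the recursion and of all memory blocking, the propagation pattern of fig. 7 can be sketched as follows; the flat within-zone pass stands in for the recursive call of line (4), and all names are illustrative.

```python
def sparse_standard_tc(n, arcs, beta):
    # nodes 1..n topologically sorted (every arc (a, b) has a < b);
    # zones of beta consecutive nodes are pivoted in reverse order
    path = {v: {v} for v in range(1, n + 1)}       # zero-length paths
    zones = [range(k0, min(k0 + beta, n + 1))
             for k0 in range(1, n + 1, beta)]
    for K in reversed(zones):
        # stands in for line (4): close zone K against higher nodes
        for x in reversed(K):
            for (a, b) in arcs:
                if a == x and b in K:
                    path[x] |= path[b]
        # lines (5)-(8): path(v,w) := path(v,w) OR (arc(v,x) AND path(x,w))
        for (v, x) in arcs:
            if x in K and v < K[0]:
                path[v] |= path[x]
    return path
```

Because zones are processed in reverse topological order, the path facts out of a pivot zone are already complete when arcs from lower zones are combined with them, which is exactly the arc-path discipline of a sparse-standard algorithm.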
4.2 recursively. In the recursive call, the pivot block plays the role of the n × n matrix at the left, as in fig. 7, while that block and everything to its left plays the role of the entire matrix in fig. 7.

Note that, as we assume s << e, there are only a finite number of n's for which n < β + 1. Therefore, the basis case at line (2) applies only to problems of limited size. Thus, all the arcs can be stored in O(1) memory, and can be kept in main memory no matter what s is. As a result, an O(mn) I/O algorithm for line (2) is easy.

One last detail concerns the loop of line (5). If there are O(s) arcs from big zone i to big zone k, then for each j we can execute the loop of lines (6) to (8) with O(s) I/O. As we mentioned, in the average case, there are no more than s arcs between two big zones. However, sometimes there will be more, and if so, we must break up zone i into sets of nodes that each have O(s) arcs to zone k. Since n ≤ s is assumed, we can always make the partition. When we perform the analysis of the I/O in theorem 4.5, we shall see that the effect of big zones with more than s arcs to the pivot zone can be neglected. □

THEOREM 4.5

Algorithm 4.2 requires I/O equal to O(mn(1 + √(e/s))).
Proof
The proof proceeds by induction on n. As was just mentioned, the basis case, where n < β + 1, trivially satisfies the upper bound of the theorem. In the general case, there are two places where I/O is needed: the recursive call of line (4), and the loop of lines (5) through (8). The recursive call is made with β in place of n and a value of m that does not increase. Since β is strictly less than n, or else the test of line (1) succeeds, we can apply the inductive hypothesis and assert that each of the n/β calls at line (4) takes I/O at most O(mβ(1 + √(e_k/s))), where e_k is the number of arcs between nodes of the kth big zone. The total I/O due to line (4) can therefore be expressed as
Σ_{k=1}^{n/β} mβ(1 + √(e_k/s)).    (4.6)
The convexity of the square-root function tells us that (4.6) is maximized, subject to the constraint Σ_{k=1}^{n/β} e_k ≤ e, when all the e_k's are equal. Thus, we may take each e_k to be e/(n/β) = βe/n = √(se). Hence, (4.6) is upper bounded by
Σ_{k=1}^{n/β} mβ(1 + (e/s)^{1/4}).
Now, the summand no longer depends on k, so we may take the sum by simply multiplying by n/β, which says (4.6) is upper bounded by mn(1 + (e/s)^{1/4}).
Since s << e is assumed, we have shown that the I/O due to recursive calls is asymptotically less than mn(1 + √(e/s)), the claimed upper bound on I/O. Next, we must turn to the loop of lines (5) through (8). Let e_ik be the number of arcs from big zone i to big zone k. As mentioned, we shall partition big zone i into at most 1 + e_ik/s groups of s arcs each, so that we may bring to main memory these s arcs, along with the 2βσ = 2s path facts from nodes in big zones i and k to small zone j. We can then do line (8) for each of these combinations of nodes in main memory. Our conclusion is that we can perform the loop of lines (7) and (8) for fixed i, j, and k, with I/O equal to O(s(1 + e_ik/s)) = O(s + e_ik), if we introduce another loop around lines (6) to (8) that iterates over each group of s arcs from big zone i to big zone k 7). The total I/O of lines (5) through (8) can thus be written

Σ_{k=1}^{n/β} Σ_{i=1}^{n/β} Σ_{j=1}^{m/σ} (s + e_ik).

Since the sum of the e_ik's cannot exceed e, the above is upper bounded by

(n/β)²(m/σ)s + me/σ.

It is easy to check that each of the above two terms simplifies to mn√(e/s). Thus, the induction is proved. []
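To make the recurrence of lines (3) through (8) concrete, here is a minimal in-memory sketch (Python, with names of our own choosing) of the degenerate case in which every big zone is a single node (β = 1) and m = n, so the algorithm reduces to a backward dynamic program over a topologically numbered DAG. All of the I/O management that is the point of Algorithm 4.2 is omitted:

```python
def closure_dag(n, arcs):
    # Nodes are numbered 1..n and every arc goes from a lower to a higher
    # number, i.e., the numbering is topological (the acyclic case).
    arc = [[False] * (n + 1) for _ in range(n + 1)]
    for u, v in arcs:
        arc[u][v] = True
    path = [[False] * (n + 1) for _ in range(n + 1)]
    # Process "big zones" (here single nodes) from the highest down, as in
    # the loop of line (3); line (8)'s rule becomes the inner disjunction.
    for k in range(n, 0, -1):
        for w in range(k + 1, n + 1):
            path[k][w] = arc[k][w] or any(
                arc[k][x] and path[x][w] for x in range(k + 1, n + 1))
    return path
```

Because zones later in the numbering are finished before zone k is processed, every path fact needed on the right-hand side is already available, which is exactly the invariant theorem 4.7 establishes for the general zone size.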
THEOREM 4.7 Algorithm 4.2 correctly computes the extended transitive closure of an acyclic graph.
Proof
We perform a double induction, first on the value of n, and second on the number of big zones (values of k) that have so far been considered during the loop of lines (3) to (8) in fig. 7. The first inductive hypothesis says simply that the extended transitive closure is correctly computed for n. The second says that after finishing the loop of lines (3) through (8) for a particular value of k, if there is a path from v to w whose second node is in big zone k or higher, then the fact that there is a path from v to w has been discovered. For the basis of the first induction, we shall take all those values of n for which the test of line (1) holds. There, the claim is trivial, since we use a standard method for computing the extended transitive closure.

7) When the new loop iterates more than once, the bound on the I/O for lines (7) and (8) with fixed i, j and k does not change; that is, it applies to the sum of the I/O used by lines (7) and (8) over all iterations of those lines with the same i, j and k.
Now, consider the inductive step in the first induction, a value of n for which the test of line (1) fails. We proceed with the second induction, and take as a basis the case where we have not executed the outer loop even once. Then, the second hypothesis holds vacuously. For the second inductive step, consider a value of k. We invoke the first inductive hypothesis to claim that after step (4), we know the existence of any path whose first node is in big zone k. Then, the loop of lines (5) to (8) makes us aware of the existence of any path whose second node is in zone k, but whose first node is in a lower-numbered zone. Of course, before starting the outer loop iteration for k, we were aware of any path whose second node is in a zone higher than k, by the second inductive hypothesis. When we reach k = 1 in the second induction, we are aware of all paths from the first n nodes to all nodes, so for this value of n, the extended transitive closure has been computed correctly. That completes the inductive step of the first induction, and we have proved our claim 8). []

Now, we can apply theorems 4.5 and 4.7 to the case we are really interested in, when m = n, n ≤ s ≤ e ≤ n², and there are no given paths.

COROLLARY 4.8
If n ≤ s ≤ e ≤ n², algorithm 4.2 computes the transitive closure of an n-node, e-arc acyclic graph using s main memory space with I/O equal to O(n²√(e/s)). []
5. A lower bound for sparse-standard algorithms

We shall now show that every sparse-standard algorithm for computing transitive closure requires Ω(n²√(e/s)) I/O. The key is the following combinatorial lemma, which tells us that for every sufficiently large n and e, we can find a directed graph of n nodes and e arcs, such that no set of i nodes, where i is between σ and β, has more than O(σi) arcs among them. Recall that β = n√(s/e), and σ = √(es)/n. The argument is "Hungarian"; that is, we show the graph exists by counting the graphs that fail to meet our conditions, and then claiming that there is at least one graph left.

LEMMA 5.1
There is a constant α with the property that no matter what n, e, and s are, provided that n is sufficiently large, that n ≤ e ≤ (n choose 2), and that σ is Ω(log n), there is an acyclic graph of n nodes and e arcs, such that no set of i nodes, where σ ≤ i ≤ β, has more than ασi arcs in its generated subgraph.

8) Notice that the fact the graph is acyclic is essential. Otherwise, the second induction could not be proved for paths whose first arc goes from some zone to a lower-numbered zone.
Proof
Assume that nodes are numbered 1, 2, ..., n, and we only consider acyclic graphs in which all arcs go from a node to a higher-numbered node. The number of directed acyclic graphs of n nodes and e arcs is therefore ((n choose 2) choose e).
We must show that the fraction of graphs failing to meet our condition is smaller than 1. A graph fails to meet our condition whenever one of the (n choose i) subsets of size i has more than q = ασi arcs. For a given set of i nodes, we can upper bound the number of graphs with this property by choosing q out of the (i choose 2) possible pairs of nodes to have arcs, and then choosing the remaining e − q arcs from among the remaining (n choose 2) − q pairs of nodes, which will include pairs that are contained in the given set of i nodes. There is considerable double-counting of graphs, and some of them will only have q arcs in the set, rather than q + 1 or more, but those discrepancies are in the right direction. Thus, we claim it is sufficient to show the inequality

Σ_{i=σ}^{β} (n choose i) ((i choose 2) choose q) (((n choose 2) − q) choose (e − q)) / ((n choose 2) choose e) < 1.    (5.2)
We begin by upper bounding ((i choose 2) choose q) by (3i²/2q)^q. In justification, we may bound (i choose 2) by i²/2 and invoke Stirling's approximation 9). If we substitute ασi for one occurrence of q, this quantity becomes (3i/2ασ)^q. Next, we turn our attention to the quantity (((n choose 2) − q) choose (e − q)) / ((n choose 2) choose e), which, expanded out in factorials, becomes

((n choose 2) − q)! e! / ((e − q)! ((n choose 2))!).

9) The constant 3 could be the base of the natural logarithms, but we are using e to mean something else, so what the heck.
The above simplifies to

e(e − 1) ··· (e − q + 1) / ((n choose 2)((n choose 2) − 1) ··· ((n choose 2) − q + 1)).

Since e ≤ (n choose 2), the largest of the q ratios is e/(n choose 2). It follows that

(((n choose 2) − q) choose (e − q)) / ((n choose 2) choose e) ≤ (e/(n choose 2))^q ≤ (8e/3n²)^q.

The second inequality comes from the fact that (n choose 2) ≥ 3n²/8 for sufficiently large n, in particular, for n ≥ 4. If we combine our two upper bounds, and substitute ασi for q in the exponent, we see that to show (5.2), it suffices to show

Σ_{i=σ}^{β} (n choose i)(4ie/ασn²)^{ασi} < 1.

When we note that i ≤ β within the summation, and that β/σ = n²/e, we see it is sufficient to show

Σ_{i=σ}^{β} (n choose i)(4/α)^{ασi} < 1.    (5.3)
Since (n choose i) ≤ n^i, we may let k be the small constant (4/α)^α, and observe that to show (5.3), it suffices to show

Σ_{i=σ}^{β} (nk^σ)^i < 1.    (5.4)

Since σ is Ω(log n), and k can be made as small as we like by picking α sufficiently large, it follows that nk^σ can be made as small as 1/n. Since the largest term in the sum (5.4) is the first term, where i = σ, it suffices to show that n(1/n)^σ < 1. But that relationship is immediate for sufficiently large n, proving the lemma. []

Now, we can prove an analog of lemma 3.1, at least for the graph whose existence was shown in lemma 5.1.

LEMMA 5.5
Under the same conditions as lemma 5.1, there is an acyclic graph G such that any set of s arc and path facts forms no more than O(σs) app-triangles with respect to the arcs of G.
Proof
By lemma 5.1, there is an acyclic graph G such that no subset of i nodes, σ ≤ i ≤ β, has more than O(σi) arcs of G among them. We shall account for all
the app-triangles in such a way that no path fact is charged more than O(σ). Consider any node v. We account for the app-triangles containing v and an arc of G as the opposite side, as follows; there are three cases.

1. v appears in no more than σ path facts. Then none of these facts can be involved in more than σ app-triangles. Charge each path fact for the triangles of which it is a side, and we see that the charge is not greater than σ.

2. v has i incident path facts, where σ ≤ i ≤ β. Consider the set of neighbors of v, that is, the nodes at the other ends of the path facts. Since this set is between σ and β in size, we know that there are no more than O(σi) arcs of G among these nodes. Thus we can form only O(σi) app-triangles including v and some opposite side that is an arc of G. Since there are i path facts incident upon v, we may divide the charge for these triangles among those i facts in such a way that each is charged no more than O(σ).

3. v has at least β incident path facts. The total number of arc facts is no more than s, so v can be involved in no more than s app-triangles. Charge each of its incident path facts at most s/β = σ for these triangles.

As in lemma 3.1, we argue that each path fact is charged only through its two endpoints, so it receives O(σ) charge. Thus, the number of app-triangles must be O(σs). Notice that in the dense case, e = n², we have σ = β = √s, so case (2) is vacuous, and the proof here is essentially that of lemma 3.1. []

THEOREM 5.6
If n ≤ s ≤ e ≤ n² and σ is Ω(log n), then any sparse-standard algorithm for computing the transitive closure requires Ω(n²√(e/s)) I/O.
Proof
The argument repeats that of theorem 3.2. Now, however, we must consider the action of the sparse-standard algorithm on the graph of lemma 5.5, selected with m there equal to 2s here. Divide time into slices, during which s I/O steps are made. Then during a slice there are at most 2s facts with which to build app-triangles. By lemma 5.5, we can make no more than O(σs) app-triangles during a slice. As we need to make Ω(en) app-triangles in any sparse-standard algorithm (the e arcs can have any of n − 2 nodes opposite), there must be Ω(en/σs) slices. Each slice causes s I/O, so the total I/O is Ω(en/σ). When we substitute √(es)/n for σ, the theorem follows by algebraic simplification. []

We should observe that there is a gap in our knowledge, in that we do not know whether our lemmas, or theorem 5.6, are true when σ << log n. That condition can only occur when e and s are both very close to n, however. Note that when e and s are both O(n), then theorem 5.6 does hold, just because Ω(n²) I/O is necessary to write the answer. Also, even if σ << log n, we cannot improve on the upper bound by more than an O(log n) factor.
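The zone parameters used throughout sections 4 and 5 are tied together by the identities βσ = s and β/σ = n²/e, which the arguments above use repeatedly. A quick numeric check, with hypothetical values of n, s, and e satisfying n ≤ s ≤ e ≤ n²:

```python
import math

n, s, e = 1000, 10_000, 100_000   # hypothetical, with n <= s <= e <= n^2

sigma = math.sqrt(e * s) / n      # small-zone size
beta = n * math.sqrt(s / e)       # big-zone size

assert abs(beta * sigma - s) < 1e-6 * s       # beta * sigma = s
assert abs(beta / sigma - n * n / e) < 1e-9   # beta / sigma = n^2 / e

bound = n * n * math.sqrt(e / s)  # the matching bound n^2 sqrt(e/s)
```

In the dense case e = n² both zone sizes collapse to √s, recovering the parameters of the dense argument.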
6. An algorithm for general sparse graphs

With very little I/O, we can reduce the general case to the acyclic case. We find strong components, by the depth-first search technique of Hopcroft and Tarjan [6]. As we shall see, O(e) I/O suffices. Once we have strong components, we can find the intercomponent arcs, which form an acyclic graph, and apply algorithm 4.2 to it. Finally, we translate the reachability information about strong components into information about the nodes in those components. Notice that this algorithm is not sparse-standard; in fact it is not even standard, because we infer the existence of certain paths by the fact that two nodes are in the same strong component, even though we may not have constructed a path from one node to the other. As a consequence, this algorithm will not generalize to arbitrary closed semirings.

6.1. I/O-EFFICIENT DEPTH-FIRST SEARCH

Suppose that we have main memory s at least n, so we can build in main memory tables indexed by the nodes. Then we can perform a depth-first search during which we read the edges out of each node exactly once. The "trick" is to keep information regarding all the successors of a node in main memory, even if they have not yet been placed in the depth-first spanning tree. Then, the second and subsequent times we need to find a successor of a given node v, those of its successors that have not already been placed in the tree remain in a linked list, and we simply take the first, with no I/O needed, and place it in the tree as a child of v. The main-memory data structure we need is an array, indexed by nodes. The elements of the array are structures capable of holding the following.

1. The parent of the node in the tree being formed.
2. Leftmost-child and right-sibling pointers, so we can construct a list, for each node, of its children in the tree, and its other successors that have not yet been placed in the tree.
3. A bit telling whether the node is in the tree.
4.
The depth-first number of the node (the order in which the depth-first search retreated from each node).
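In Python terms, one element of the array might look like the following sketch; the field names are ours, not the paper's:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeEntry:
    # One element of the main-memory array, holding items (1) through (4).
    parent: Optional[int] = None          # (1) parent in the tree being formed
    leftmost_child: Optional[int] = None  # (2) head of the child/successor list
    right_sibling: Optional[int] = None   # (2) next node on that list
    in_tree: bool = False                 # (3) already placed in the tree?
    df_number: Optional[int] = None       # (4) depth-first (retreat) number
```

An array of n such entries fits in main memory whenever s ≥ n, which is exactly the assumption of this section.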
Fig. 8. Example graph.
[Fig. 9 is a table with one row per node 1 to 6 and columns PARENT, LEFTMOST CHILD, RIGHT SIBLING, IN TREE, and DF#, showing the data structure at the moment described in example 6.1.]
Fig. 9. Data structure representing tree during depth-first search.

In practice, the lists of item (2) should be doubly linked. However, we need not put the links in explicitly; in principle, we can find predecessors by scanning the entire array, which takes no I/O.
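The whole search can be sketched in memory as follows (Python, with names of our own choosing; ordinary lists stand in for the child/sibling pointers, and the buffering and I/O accounting are omitted). Each node's successor list is read exactly once, on the advance that first reaches it:

```python
def dfs_min_io(n, succ):
    # succ maps node -> list of successors; nodes are 1..n.
    parent = {u: None for u in range(1, n + 1)}
    pending = {u: [] for u in range(1, n + 1)}  # successors not yet in the tree
    holder = {}                                 # whose pending list a node is on
    in_tree = [False] * (n + 1)
    read = set()                                # nodes whose successors were read
    dfn = {}
    counter = 0
    for root in range(1, n + 1):
        if in_tree[root]:
            continue
        in_tree[root] = True
        cur = root
        while cur is not None:
            if cur not in read:                 # advance: one read of succ[cur]
                read.add(cur)
                for w in succ.get(cur, []):
                    if in_tree[w]:
                        continue
                    if w in holder:             # move w off its old pending list
                        pending[holder[w]].remove(w)
                    holder[w] = cur
                    pending[cur].append(w)
            if pending[cur]:                    # advance to first pending child
                child = pending[cur].pop(0)
                in_tree[child] = True
                parent[child] = cur
                cur = child
            else:                               # retreat, awarding the DF number
                counter += 1
                dfn[cur] = counter
                cur = parent[cur]
    return parent, dfn
```

A node sits on at most one pending list at a time, mirroring the rule below that no node is ever on the child list of two different nodes.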
Example 6.1
The graph of fig. 8 is represented by the data structure of fig. 9 at the time when we retreat from node 2, assuming we started our tree construction at node 1. Notice that nodes 3 and 5, which are successors of 1, are still shown as its children. When we explore the successors of node 3, we shall find 5 among them, and make 5 a child of 3 in the tree, just as we previously discovered 4 is a child of 2 and moved 4 out of the list of children of 1. []

Algorithm 6.4
DEPTH-FIRST SEARCH WITH MINIMAL I/O
INPUT: A directed graph of n nodes and e arcs.
OUTPUT: A depth-first spanning forest for the graph, and a depth-first ordering for the nodes.
METHOD: Start with any node, which becomes the root of the current tree and also becomes the current node. In what follows we can reach the current node by "advancing," which is the first time we visit the node, or by "retreating." Each time we advance to a new current node, which includes the case where we have just made the current node the root of its tree, we do the following. Bring the successors of the current node into main memory, using a buffer of size n. Each successor is examined, in turn. If it is already in the tree, it is deleted from the list. If not already in the tree, it is added to the list of children of the
current node. If it is on the list of children for some other node (but not yet placed in the tree), we remove it from the list of children for that other node. Thus, no node is ever in the list of children for two different nodes. None of the children of the current node are placed in the tree, if they were not already in the tree. Now, we move to the first node on the list of children of the current node that is not yet in the tree. If there are none, we retreat from the current node, and make its parent the current node. Before we do, we give the current node its depth-first number, which is one more than the previously-awarded depth-first number. If we find a child of the current node that is not yet in the tree, then we place that node in the tree and make it the current node. Whether the new current node was determined by retreating or by advancing to a child of the current node, we repeat the above loop, with the new current node. If we retreat from the root of the current tree, then we are done. However, there may be other nodes not yet inserted into any tree, because they were not reachable from any root selected so far. If so, select one as a new root, and repeat the entire process above. []

THEOREM 6.5
Algorithm 6.4 constructs a depth-first spanning forest and uses I/O equal to O(e), provided only that e and s are at least n.
Proof
Clearly, each node's successors are brought into main memory only once, when that node becomes the current node for the first time (by "advancing"). Thus, the input cost is O(e). The output cost is O(n), which is no greater, since we need only write two pieces of information, the parent and depth-first number, for each node. The algorithm is correct because it mimics the standard algorithm, as in Aho et al. [2]. The only difference is the way unplaced nodes are stored as "children" of their most recently explored predecessor, until they are placed in the tree. []

COROLLARY 6.6
With O(e) I/O, we can find a topological order of an acyclic graph, provided only that e and s are each at least n.
Proof
Perform a depth-first search, and take the topological order to be the reverse of the depth-first numbering of the nodes. []

THEOREM 6.7
With O(e) I/O, we can partition nodes into strong components, provided e and s are at least n.
Proof
After performing algorithm 6.4, we can do a depth-first search of the forest, implementing the standard strong-components algorithm based on depth-first search, as described in Hopcroft and Tarjan [6]. Recall this algorithm works by finding, for each node, the ancestor closest to the root that it can reach (the "lowpoint"). When we first reach a node, we need to read in all its successors, to get lowpoint information from its ancestors and nodes to its left, to which it has arcs. We also need to examine the successors of a node when we retreat from it, to get lowpoint information from its successors in the tree. However, these successors can be read from the tree itself, which may be stored in main memory. Thus, O(e) additional I/O suffices to find strong components, once we have the depth-first spanning forest. []

6.2. THE GENERAL ALGORITHM

We can now describe an algorithm for computing the transitive closure of general graphs.
Algorithm 6.8
TRANSITIVE CLOSURE FOR SPARSE, CYCLIC GRAPHS
INPUT: A directed graph of n nodes and e arcs, and an amount, s, of main memory. We assume n ≤ s ≤ e ≤ n².
OUTPUT: The transitive closure of the given graph.
METHOD: Perform the following steps.
1. Find strong components of the graph by the algorithm described in theorem 6.7.
2. Storing the strong component for each node in main memory, examine the arcs of the graph to create a list of arcs that run between two strong components; these become the arcs of an acyclic graph for the next step.
3. Perform algorithm 4.2 on the acyclic graph from (2).
4. For each pair of strong components a and b such that it is determined in step (3) that there is an arc from a to b, output the fact that there is an arc from each node in a to each node in b. []

THEOREM 6.9
Algorithm 6.8 correctly determines the transitive closure and uses I/O equal to O(n²√(e/s)).
Proof
Correctness is well known. The I/O requirement of steps (1) and (2) is clearly O(e), and step (3) requires O(n²√(e/s)) by theorem 4.5. Step (4) needs O(n²) I/O. By our assumptions in algorithm 6.8 that n ≤ s ≤ e ≤ n², step (3) dominates, proving our claim. []
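A compact in-memory sketch of the four steps of Algorithm 6.8 follows (Python, our names). The recursive Tarjan lowpoint search merely stands in for the I/O-efficient search of section 6.1, and a memoized DFS over the condensation stands in for algorithm 4.2:

```python
import sys

def strong_components(n, succ):
    # Step (1): standard recursive lowpoint algorithm for strong components.
    sys.setrecursionlimit(10000)
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    state = {"counter": 0, "ncomp": 0}

    def visit(u):
        index[u] = low[u] = state["counter"]
        state["counter"] += 1
        stack.append(u)
        on_stack.add(u)
        for w in succ[u]:
            if w not in index:
                visit(w)
                low[u] = min(low[u], low[w])
            elif w in on_stack:
                low[u] = min(low[u], index[w])
        if low[u] == index[u]:              # u is the root of a component
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = state["ncomp"]
                if w == u:
                    break
            state["ncomp"] += 1

    for u in range(1, n + 1):
        if u not in index:
            visit(u)
    return comp, state["ncomp"]

def transitive_closure_cyclic(n, arcs):
    succ = {u: [] for u in range(1, n + 1)}
    for u, v in arcs:
        succ[u].append(v)
    comp, ncomp = strong_components(n, succ)
    dag = {c: set() for c in range(ncomp)}  # step (2): intercomponent arcs
    for u, v in arcs:
        if comp[u] != comp[v]:
            dag[comp[u]].add(comp[v])
    reach = {}

    def close(c):                           # step (3): closure of the DAG
        if c not in reach:
            reach[c] = set()
            for d in dag[c]:
                reach[c] |= {d} | close(d)
        return reach[c]

    members = {c: [] for c in range(ncomp)}  # step (4): expand to node pairs
    for u in range(1, n + 1):
        members[comp[u]].append(u)
    closure = set()
    for u in range(1, n + 1):
        if len(members[comp[u]]) > 1:        # nontrivial component: all pairs
            closure |= {(u, v) for v in members[comp[u]]}
        for d in close(comp[u]):
            closure |= {(u, v) for v in members[d]}
    return closure
```

Note that step (4) emits closure arcs between nodes in the same nontrivial component without ever having constructed a path between them, which is exactly why the algorithm is not standard.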
7. Sparse-standard algorithms for cyclic graphs

It can be shown that it is impossible to match the performance of algorithm 6.8 on cyclic graphs with a sparse-standard algorithm. In fact, about another factor of n in I/O is needed. First, observe that we can run algorithm 4.2 n times 10), and each time, we make progress of at least one arc along any path whatsoever, even if the next arc on the path goes from a high-numbered node to a lower-numbered node. Thus, the following is immediate.

THEOREM 7.1
There is a sparse-standard algorithm to compute the transitive closure of a general graph with I/O equal to O(n³√(e/s)), under our usual assumptions that n ≤ s ≤ e ≤ n². []

7.1. ALGORITHMS THAT FOLLOW ALL PATHS
It is not known whether theorem 7.1 can be beaten by an algorithm that only combines arc facts with path facts (rather than combining two path facts, as the algorithm of fig. 2 does). However, if we add the additional restriction that the algorithm must follow each acyclic path, then we can show that the bound of theorem 7.1 is tight. The requirement that all paths be followed is very stringent. For example, when computing the ordinary transitive closure, we only require that at least one path from node v to node w be discovered, not that all acyclic paths be discovered. An algorithm like seminaive evaluation will not normally follow all paths either, since it stops when no new path facts are discovered on a round. On the other hand, there are closed semirings where it appears all acyclic paths must be followed, if we are constrained only to combine arc and path facts, never two path facts. The simplest example is the Boolean closed semiring, where arcs are labeled 0 or 1, and we ask whether there is a path from each node to each other node having only arcs of label 1. That is, some arcs of the graph are "real" (those labeled 1), and others are "fake" (those labeled 0). We want to know what paths of real arcs exist. Evidently, our graph could have exactly one real path between v and w. If an algorithm that combines paths with arcs does not start with the trivial path from v to v, and append each arc along the path in turn, then we shall not get the right answer about the existence of a real path from v to w.
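For concreteness, the Boolean closed-semiring question above is ordinary reachability restricted to the label-1 subgraph. A small sketch (Python, our names) of the relation being computed, without regard to how paths are followed:

```python
from collections import deque

def real_reachability(n, labeled_arcs):
    # labeled_arcs holds triples (u, v, label); label 1 means "real",
    # label 0 means "fake". We keep only the real arcs and compute
    # reachability in the resulting subgraph by breadth-first search.
    succ = {u: [] for u in range(1, n + 1)}
    for u, v, label in labeled_arcs:
        if label == 1:
            succ[u].append(v)
    reach = set()
    for s in range(1, n + 1):
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for w in succ[u]:
                if w not in seen:
                    seen.add(w)
                    reach.add((s, w))
                    queue.append(w)
    return reach
```

The point of the discussion above is not that this relation is hard to compute in memory, but that an algorithm restricted to combining arc facts with path facts apparently must trace each real path arc by arc.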
Thus, let us define an all-paths standard algorithm to be one in which, for each acyclic path v_1 → v_2 → ··· → v_k, there is a sequence of times t_2, ..., t_k, with t_2 ≤ t_3 ≤ ··· ≤ t_k, such that at time t_i, the facts arc(v_{i−1}, v_i), path(v_1, v_{i−1}), and path(v_1, v_i) are in memory simultaneously. That is, the sequence of triangles,

10) But in line (5), make i range from 1 to n/β, rather than 1 to k − 1.
each of which has v_1 as a vertex and has, as opposite side, one of the arcs along the path (other than the first), must appear in main memory in the same order as the arcs appear on the path.

7.2. A LOWER BOUND FOR ALL-PATHS STANDARD ALGORITHMS

We shall now show that Ω(n⁴/√s) I/O is needed for any all-paths standard algorithm on dense graphs. The result almost generalizes to all graphs, but we shall give the result for dense graphs first because it is simpler and introduces most of the key ideas. Intuitively, we look at the complete graph on n nodes, and we select a path such that even-numbered arcs each appear as late as possible in main memory, after the previous even-numbered arc. Further, we wish to focus on an initial node for the path, such that the number of triangles with that node as vertex opposite the arc fact is no greater than average.

THEOREM 7.2
If s << n², then the minimum amount of I/O needed for an all-paths standard algorithm on graphs of n nodes with main memory of size s is Ω(n⁴/√s).
Proof
Consider the action of the algorithm on the complete directed graph of n nodes. We can divide the steps into slices, during each of which s I/O operations are performed. Suppose that there are m slices (ignore a fractional slice at the end of the algorithm). Let a v-triangle be a triangle with vertex v opposite an arc fact, say arc(w, x), along with the path facts path(v, w) and path(v, x). Select as v_1 some vertex such that the number of v_1-triangles in main memory, summed over all the slices, is no greater than 1/n-th of the total number of triangles in main memory, again summed over all slices. Surely, v_1 can be found, because all n nodes cannot have their triangles appear more than an average number of times. Let a_i be the number of v_1-triangles among those in main memory for the ith slice, i = 1, 2, ..., m. In the following, we restrict our attention in each slice to the arc facts that form an app-triangle with path facts from v_1. That is, if an arc w → x is in main memory, but the path fact path(v_1, w) or the path fact path(v_1, x) is missing from main memory, then this triangle is not available during the slice. Now, select the arc v_2 → v_3 to be an arc satisfying the following.

1. Neither v_2 nor v_3 is v_1.
2. No arc satisfying (1) appears for the first time in a later slice than that in which v_2 → v_3 appears.

If we call the slice in which v_2 → v_3 first appears i_1, then we know that

a_1 + a_2 + ··· + a_{i_1} ≥ n² − 2n.

The reason is that the left side is the number of v_1-triangles appearing among the first i_1 slices, and the right side is the number of arcs that do not involve v_1.
Now, we select an arc v_4 → v_5 according to similar criteria. We want

1. Neither v_4 nor v_5 is any of v_1, v_2, or v_3.
2. No arc satisfying (1) appears for the first time, among slices i_1 and later, after v_4 → v_5 appears among those slices.

Let the slice in which v_4 → v_5 appears be i_2; note that i_2 ≥ i_1, and under reasonable conditions we can show i_2 > i_1. We may conclude

a_{i_1} + a_{i_1+1} + ··· + a_{i_2} ≥ n² − 6n.

Again, the reasoning is that the left side is the total number of v_1-triangles available, and the right side is the number of arcs that do not involve the three forbidden nodes. We may continue this process indefinitely, until we either run out of candidate arcs because all nodes have been used, or we get to a point where the number of candidate arcs is O(s), and therefore, we cannot claim that successive triangles are made from distinct slices. Formally, let i_j be the slice i_{j−1} or later that includes the last of the arcs that involve no node previously chosen, if we start our search for arcs at slice i_{j−1}. Let v_{2j} → v_{2j+1} be the arc in question. Then for all j = 1, 2, ..., we have (taking i_0 to be 1)

a_{i_{j−1}} + a_{i_{j−1}+1} + ··· + a_{i_j} ≥ n² − (2j + 2)n.    (7.3)
Since s << n² is assumed, we can let j ≤ n/5 and know that the right side of (7.3) is, for sufficiently large n, greater than both n²/2 and any function that is O(s). The point of the latter is that then i_j must be strictly greater than i_{j−1}, because not all candidate arcs can fit in main memory during slice i_{j−1}.

Thus, let us sum (7.3) for j = 1, 2, ..., n/5. No a_k appears more than twice on the left, because the i_j's are an increasing sequence, as we just argued. Thus, the sum of the left is no more than 2Σ_{k=1}^{m} a_k. The sum of the right sides is at least n³/10. Thus, we have

Σ_{k=1}^{m} a_k = Ω(n³).    (7.4)

Since the sum of the a_k's is no greater than 1/n-th the total number of triangles appearing among all the slices, we conclude from (7.4) that the number of triangles among the slices is Ω(n⁴). Now, we remember from lemma 3.1 that a slice can make only O(s^{3/2}) triangles, so the number of slices is Ω(n⁴/s^{3/2}). Since each slice involves s I/O steps, we conclude that the amount of I/O is Ω(n⁴/√s).
[]
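The ordering condition in the definition of section 7.1 can be made concrete as a check over a trace of main-memory contents. In this sketch (Python, our names), `trace[t]` is the set of facts resident at time t, facts being tuples like ('arc', u, v) and ('path', u, v):

```python
def follows_path(trace, path):
    # For the acyclic path v1 -> v2 -> ... -> vk, look for nondecreasing
    # times at which arc(v_{i-1}, v_i), path(v1, v_{i-1}), and path(v1, v_i)
    # are simultaneously in memory, in the order the arcs occur on the path.
    v1 = path[0]
    t = 0
    for i in range(1, len(path)):
        u, w = path[i - 1], path[i]
        needed = {('arc', u, w), ('path', v1, u), ('path', v1, w)}
        while t < len(trace) and not needed <= trace[t]:
            t += 1
        if t == len(trace):
            return False
    return True
```

An all-paths standard algorithm is one whose trace satisfies this check for every acyclic path of the input graph; the proof of theorem 7.2 builds a path for which satisfying the check forces many distinct slices.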
8. A lower bound for all-paths standard algorithms in the sparse case

Let us now take up the differences in proof technique necessary to get a lower bound for all-paths standard algorithms on sparse graphs. We would like to get
the lower bound Ω(n³√(e/s)), and although we cannot do so, we only miss some logarithmic factors. If we consider the proof of theorem 7.2, we notice the problem with applying the technique to sparse graphs: the construction of a path that took many slices to cover was done in a complete graph. Thus, we were able to choose any eligible arc v_{2j} → v_{2j+1}, sure that the arcs connecting our chosen arcs, such as v_{2j−1} → v_{2j}, exist in the graph. In the sparse case, we must pick a graph like that constructed in lemma 5.5, which has the proper number of edges and the property that no slice can have more than O(σs) = O(s^{3/2}√e/n) app-triangles. However, this graph was only shown to exist, so we do not know anything about the degrees of its nodes. In particular, if we happened to select arcs v_{2j} → v_{2j+1} whose nodes had degree about n, we would lose about 2n eligible arcs at each step, and thus might not be able to continue our "game" of picking arcs that did not appear in early slices for long enough. Fortunately, in most cases we can generalize lemma 5.1 to show that the graph of lemma 5.5 may be assumed to have no nodes of degree much higher than average. Thus, the "game" of selecting arcs may be continued until Ω(e/n) arcs have been picked for our path. A more serious problem is that if we select v_1 so that the number of v_1-triangles is no more than average, and we select even-numbered arcs on the path to consume as many slices as we can with each arc, we have no reason to believe the selected start node and arcs can be connected by other arcs or paths in the graph. We solve this problem by adding a routing network to the graph. Then, after selecting our arcs, we know we can find node-disjoint paths through the routing network from the head of each selected arc to the tail of the next. We first address the problem of limiting the degrees of nodes.
LEMMA 8.1
For any n, e, and s such that
1. n log n << e,
2. s << e, and
3. σ = Ω(log n), 11)
there is a graph of n nodes and e arcs such that
a) no set of i arc and path facts has more than O(σi) app-triangles, and
b) no node has in- or out-degree more than 2e/n.
Proof
The proof is essentially that of lemma 5.1. First, we have no need to make the graph acyclic, so we can modify the count to consider all directed graphs on nodes 1, 2, ..., n. The effect is to replace (n choose 2) by n² throughout the proof, but the reasoning is unchanged. We next need to modify the counting argument by throwing out those graphs with a node of in- or out-degree higher than 2e/n. The

11) Recall σ = √(es)/n and β = n√(s/e).
number of such graphs is surely bounded above by

2n (n choose 2e/n) ((n² − 2e/n) choose (e − 2e/n)).

As long as e grows faster than n log n, it is easy to show that the above is small compared with the total number of graphs, (n² choose e); we omit the proof. Then, the same argument as lemma 5.1 shows that we can find a graph for which sets of i nodes, where σ ≤ i ≤ β, have no more than O(σi) arcs among them. The remainder of the proof follows that of lemma 5.5. []

THEOREM 8.2
If n ≤ s << e ≤ n², n log n << e, and σ is Ω(log n), then any all-paths standard algorithm requires I/O at least Ω(n³√(e/s)/log³ n).
Proof
First, pick the graph G that lemma 8.1 claims exists, but with m = n/log n nodes. Parameters e and s do not change, so the number of arcs of G, which is e, must satisfy m log m << e; thus lemma 8.1 applies to the parameters m, s, and e. Then, create a duplicate set of nodes, v' for every v in G, and create a graph H with the nodes of G and the duplicate set. There is an arc v → w' in H for every arc v → w in G. Finally, add a "split butterfly" or "Benes network," as shown in fig. 10. This network consists of two butterfly networks, the first with the small butterflies at the left and the largest at the right; the second has the largest at the left and smallest at the right. Moreover, the largest butterflies are not duplicated
Fig. 10. Graph for theorem 8.2.
Table 1
Summary of upper and lower bounds

Graph class     | Algorithm class    | Lower bound      | Upper bound | Restrictions for lower bound | Upper bound generalizes?
----------------|--------------------|------------------|-------------|------------------------------|-------------------------
Dense           | Standard           | n³/√s            | n³/√s       | —                            | Yes
Sparse, acyclic | Standard           | n²√(e/s)         | n²√(e/s)    | α = Ω(log n)                 | Yes
Sparse, cyclic  | Any                | ?                | n²√(e/s)    | —                            | ?
Dense           | All-paths standard | n⁴/√s            | n⁴/√s       | —                            | Yes
Sparse, cyclic  | All-paths standard | n³√e/(√s log³n)  | n³√(e/s)    | α = Ω(log n), s << e         | Yes
but shared by the two networks. The nodes at the left are the duplicate set of n/log n nodes, and the nodes on the right are identified with the original, unprimed nodes. The total number of nodes in the split butterfly, which has n/log n rows and about 2 log n columns, is O(n). The number of arcs is also O(n). Thus, the graph of fig. 10 has O(n) nodes and O(e) arcs. The important property of the split butterfly is that we can match nodes on the left with nodes on the right however we wish, and there will exist node-disjoint paths from each node on the left to its mate on the right. See Benes [3].

Now, the proof can follow that of theorem 7.2. We assume the all-paths standard algorithm has its steps divided into slices with s I/O steps each. We pick v1 to be a primed node with at most twice the average number of v1-triangles, among all the nodes of H.12) Then, we can pick a sequence of arcs, each from an unprimed node to a primed node. Each time we pick an arc, we eliminate from eligibility arcs that enter or leave either end of the selected arc, and by lemma 8.1(b) the number of such arcs is at most 4e/m = 4e log n/n. Thus, the arcs are selected in turn from among e − 2e log n/n, e − 6e log n/n, e − 10e log n/n, and so on, eligible arcs. Consequently, it is possible to select Ω(n/log n) arcs in turn, with at least half the arcs of H eligible at each turn. Since s << e is assumed, we know that we can, by picking the last eligible arc that we encounter when we start our search at the "current" slice, consume at least one slice per arc. Moreover, if a_k is the number of v1-triangles in the kth slice, we can show as in theorem 7.2 that the sum of the a_k's is Ω(ne/log n). Since the total number of app-triangles among the slices is at least Ω(n/log n) times the number of v1-triangles, we conclude that the total number of app-triangles among all the slices is at least Ω(n²e/log²n).
12) The factor of two comes from the possibility that all the app-triangles formed from nodes of H may have primed nodes as the vertex opposite the arc fact.
Finally, we argue that the number of app-triangles involving nodes of H in each slice is O(s^{3/2}√e log n / n). Technically, that result applies to G, not H, but it is easy to see that should there be more app-triangles of H in a slice, we could easily construct a set of O(s) facts that represent more than that number of app-triangles of G. It follows that the number of slices is at least Ω(n³√e / (s^{3/2} log³n)). Since each slice performs s I/O steps, we have our lower bound of Ω(n³√e / (√s log³n)) on the I/O. □
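To make the split-butterfly ingredient of this proof concrete, the sketch below builds a Benes-style network of two back-to-back butterflies on m = 2^k rows and checks the two properties the argument relies on: the network has only O(m log m) nodes and arcs, and every input row can reach every output row (the connectivity underlying the left-to-right matching). The wiring details (which bit is swapped at each column) are an assumption for illustration; fig. 10 defines the actual graph.

```python
# Sketch of a Benes network: columns 0..2k, m = 2^k rows.
# Column c lets row r go straight or flip one address bit.
from collections import deque

def benes(m):
    """Return the arc set of a Benes network with m = 2^k rows."""
    k = m.bit_length() - 1
    arcs = set()
    for c in range(2 * k):
        # butterflies shrink in the first half, grow again in the second
        b = (k - 1 - c) if c < k else (c - k)
        for r in range(m):
            arcs.add(((c, r), (c + 1, r)))              # straight arc
            arcs.add(((c, r), (c + 1, r ^ (1 << b))))   # cross arc
    return arcs, k

m = 16
arcs, k = benes(m)
nodes = {u for a in arcs for u in a}
assert len(nodes) == m * (2 * k + 1)   # O(m log m) nodes
assert len(arcs) == 4 * m * k          # O(m log m) arcs

def reaches(src):
    """Rows of the last column reachable from src, by BFS."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    seen, q = {src}, deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return {r for (c, r) in seen if c == 2 * k}

# Every input row reaches every output row.
assert all(reaches((0, r)) == set(range(m)) for r in range(m))
```

The stronger rearrangeability property (node-disjoint paths realizing any matching) is the classical result cited from Benes [3]; the check above only exercises the reachability it implies.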
9. Summary
We summarize the results of this paper in table 1. For all these results, the assumption n ≤ s ≤ e applies.
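To illustrate the relative magnitudes of the bounds in table 1, the snippet below evaluates them for one illustrative sparse parameter setting (the formulas are those of the table; the particular n, e, s values are arbitrary choices, not from the paper):

```python
import math

def io_bounds(n, e, s):
    """The I/O bound formulas summarized in table 1 (constants omitted)."""
    return {
        "dense, standard": n**3 / math.sqrt(s),
        "sparse, standard (acyclic)": n**2 * math.sqrt(e / s),
        "dense, all-paths standard": n**4 / math.sqrt(s),
        "sparse, all-paths standard": n**3 * math.sqrt(e / s),
    }

n = 10**4
s = n                        # the smallest memory size the model allows
e = n * math.isqrt(n)        # a sparse graph: e = n^(3/2) << n^2
b = io_bounds(n, e, s)

# Exploiting sparseness pays: the sparse standard bound beats the dense one.
assert b["sparse, standard (acyclic)"] < b["dense, standard"]
# The all-paths restriction costs a factor of n in both the dense
# and the sparse columns, as the paper's last section shows.
assert b["dense, all-paths standard"] / b["dense, standard"] == n
assert b["sparse, all-paths standard"] / b["sparse, standard (acyclic)"] == n
```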
References
[1] A. Agrawal and H.V. Jagadish, Direct algorithms for computing the transitive closure of database relations, Proc. Int. Conf. on Very Large Data Bases (1987) pp. 255-266.
[2] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms (Addison-Wesley, Reading, MA, 1974).
[3] V.E. Benes, Mathematical Theory of Connecting Networks and Telephone Traffic (Academic Press, New York, 1965).
[4] D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, Proc. 19th Annual ACM Symp. on the Theory of Computing (1987) pp. 1-6.
[5] H.-T. Kung and J.-W. Hong, The red-blue pebble game, Proc. 13th Annual ACM Symp. on the Theory of Computing (1981) pp. 326-333.
[6] J.E. Hopcroft and R.E. Tarjan, Efficient algorithms for graph manipulation, Comm. ACM 16:6 (1973) 372-378.
[7] Y.E. Ioannidis and R. Ramakrishnan, Efficient transitive closure algorithms, CSTR-765, Dept. of CS, Univ. of Wisconsin (1988).
[8] L. Lovász, Combinatorial Problems and Exercises (North-Holland, Amsterdam, 1979).
[9] A.C. McKellar and E.G. Coffman, Organizing matrices and matrix operations for paged memory systems, Comm. ACM 12:3 (1969) 153-165.
[10] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. 1: Classical Database Systems (Computer Science Press, New York, 1989).
[11] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. 2: The New Technologies (Computer Science Press, New York, 1989).
[12] H.S. Warren, A modification of Warshall's algorithm for the transitive closure of binary relations, Comm. ACM 18:4 (1975) 218-220.
[13] S. Warshall, A theorem on Boolean matrices, J. ACM 9:1 (1962) 11-12.