ISSN 0965-5425, Computational Mathematics and Mathematical Physics, 2015, Vol. 55, No. 5, pp. 891–905. © Pleiades Publishing, Ltd., 2015. Original Russian Text © E.V. Djukova, P.A. Prokofjev, 2015, published in Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki, 2015, Vol. 55, No. 5, pp. 895–910.
Asymptotically Optimal Dualization Algorithms

E. V. Djukova and P. A. Prokofjev

Dorodnicyn Computing Center, Russian Academy of Sciences, ul. Vavilova 40, Moscow, 119333 Russia
e-mail: [email protected], [email protected]

Received October 29, 2014
Abstract—The design of efficient-on-average algorithms for discrete enumeration problems is studied. The dualization problem, which is a central enumeration problem, is considered. New asymptotically optimal dualization algorithms are constructed. It is shown that they are superior in time costs to earlier constructed asymptotically optimal dualization algorithms and to other available dualization algorithms with different design features.

DOI: 10.1134/S0965542515050103

Keywords: dualization, Boolean matrix, asymptotically optimal algorithm, irreducible covering, enumeration with a polynomial time delay.
We consider the problem of searching for irreducible coverings of a Boolean matrix. Let L = ||aij||m × n be a Boolean matrix and H be a set of columns of L. The set H is called a covering of L if each row of L has at least one unit element in the columns of H. A covering H is said to be irreducible if no proper subset of H is a covering of L. Let ᏼ(L) denote the set of all irreducible coverings of L. The task is to construct ᏼ(L).

In the theory of algorithmic complexity, the enumeration of the elements of ᏼ(L) is regarded as the main enumeration problem, which is known as dualization. There are other formulations of dualization, specifically, based on concepts of the theory of Boolean functions and of graph and hypergraph theory. Let us present these formulations.

(1) Given a conjunctive normal form of m different clauses that implements a monotone Boolean function F(x1, …, xn), construct a reduced disjunctive normal form of F.

(2) Given a hypergraph Ᏼ on n vertices with m edges, find all minimal vertex coverings of Ᏼ.

The efficiency of enumeration algorithms is estimated by the complexity of a single step (see [1]). An algorithm is said to perform with a (quasi-)polynomial time delay if, for any individual problem, every step of the algorithm (the construction of the current solution) is executed in (quasi-)polynomial time in the input size of the problem. As applied to the search for irreducible coverings, this means that, for any m × n Boolean matrix, the time required for the construction of the current irreducible covering is bounded by a polynomial or quasi-polynomial in m and n.

In the general case, no dualization algorithm with a (quasi-)polynomial time delay has been constructed, and it is not known whether one exists. There are examples of such algorithms for some special cases of dualization [1, 2]. For example, in [1] an algorithm with an O(n^3) time delay was constructed for the case when each row of L has at most two unit elements, which, in formulation (2), means that Ᏼ is a graph.

Studies concerning the complexity of enumeration problems basically address the possibility of constructing incremental (quasi-)polynomial-time algorithms. In this case, the incremental property means that, at every step (in the construction of the current solution), an algorithm searches through the set of solutions constructed at the preceding steps, and this search takes time that is (quasi-)polynomial in the problem size and the number of previously found solutions. An incremental quasi-polynomial-time dualization algorithm was constructed in [3, 4]. For several special cases of dualization, incremental polynomial-time algorithms were constructed in [5, 6].

Another approach to the solution of the considered problem is based on the concept of an asymptotically optimal algorithm with a polynomial time delay. This approach was first proposed in [7] and deals with the average case. According to this approach, under certain conditions, the original enumeration
problem Z can be replaced by a "simpler" one Z1 that has the same input and is solved with a polynomial time delay. Moreover, first, the solution set of the problem Z1 contains the solution set of the problem Z and, second, with increasing input size, the number of solutions of Z1 is almost always asymptotically equal to the number of solutions of Z. This approach is substantiated by obtaining asymptotics for the average number of solutions to each of the problems Z and Z1.

Thus, in contrast to an "exact" algorithm with a polynomial time delay, an asymptotically optimal algorithm can execute "extra" polynomial-time steps. An extra step means the construction of a solution of Z1 that was either found previously or is constructed for the first time but is not a solution of the problem Z. For almost all problems of a given size, the number of extra steps has to have a lower order of growth than the number of all steps of the algorithm as the problem size increases. Whether a step is extra has to be verified in time polynomial in the problem size.

A number of asymptotically optimal algorithms for searching for irreducible coverings of a Boolean matrix have been constructed for the case log m ≤ (1 – ε) log n, 0 < ε < 1 (see [7–16]). The following criterion is used to construct ᏼ(L) in these algorithms. A set H of r columns of the matrix L is an irreducible covering if and only if the following two conditions hold:

(1) the submatrix LH of L made up of the columns of H does not contain a row of the form (0, 0, …, 0); and

(2) LH contains each of the rows of the form (1, 0, 0, …, 0, 0), (0, 1, 0, …, 0, 0), …, (0, 0, 0, …, 0, 1), i.e., up to a row permutation, contains the identity submatrix of order r.

A set of columns satisfying condition (2) is called compatible.

In the asymptotically optimal dualization algorithm AO1 (see [7]), Z1 is the problem of constructing a collection of sets of columns of a matrix L satisfying condition (2) in which each set of length r occurs as many times as there are identity submatrices of order r in it. In fact, all identity submatrices of L are enumerated with a polynomial time delay. Clearly, an irreducible covering can be generated only by a maximal identity submatrix, i.e., an identity submatrix that is not contained in any other one. A maximal identity submatrix generates a maximal compatible set of columns, i.e., a compatible set of columns that is not contained in any other one. According to the algorithm AO1, the maximal identity submatrices (the maximal compatible sets of columns) can be enumerated (enumerated with repeats) with step complexity O(qmn), where q = min{m, n}.

As a result of enumerating the identity submatrices, some sets of columns are constructed repeatedly. When obtaining the current maximal identity submatrix Q in time O(mn), the algorithm AO1 checks condition (1) for the set H of columns of L generated by the submatrix Q. If condition (1) holds, then AO1 checks in time O(mn) whether H was constructed at previous steps.

The algorithm AO2 [12], which is a modification of AO1, enumerates (with a polynomial time delay O(qm^2 n)) only the identity submatrices of L that generate coverings. At every step, AO2 constructs an irreducible covering. However, as in AO1, the solutions found can repeat. This algorithm takes fewer extra steps than AO1. Based on AO2, algorithms AO2K and AO2M with a reduced running time were constructed in [16].
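For illustration, conditions (1) and (2) can be checked directly for a given set of columns H. The following C++ sketch (the matrix representation and all names here are ours and are not taken from the algorithms of this paper) returns true exactly when H is an irreducible covering of L.

#include <cstddef>
#include <vector>

// L is an m x n Boolean matrix stored row-wise; H lists column indices.
// Condition (1): every row of L has a unit element in the columns of H.
// Condition (2): for every j in H there is a supporting row, i.e. a row with
// a 1 in column j and 0 in all other columns of H (so L_H contains, up to a
// row permutation, the identity submatrix of order |H|).
bool isIrreducibleCovering(const std::vector<std::vector<char>>& L,
                           const std::vector<std::size_t>& H) {
    for (const auto& row : L) {                       // condition (1)
        bool covered = false;
        for (std::size_t j : H) covered = covered || row[j] == 1;
        if (!covered) return false;
    }
    for (std::size_t j : H) {                          // condition (2)
        bool hasSupportingRow = false;
        for (const auto& row : L) {
            if (row[j] != 1) continue;
            bool onlyJ = true;
            for (std::size_t l : H) if (l != j && row[l] == 1) onlyJ = false;
            if (onlyJ) { hasSupportingRow = true; break; }
        }
        if (!hasSupportingRow) return false;
    }
    return true;
}

Such a direct check costs O(mn) per set and is only meant to make the criterion concrete; the algorithms discussed below maintain the supporting rows incrementally instead of recomputing them.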
The number of extra steps is minimal in the asymptotically optimal algorithm OPT [14], in which the sets of columns of L satisfying condition (2) and some additional conditions, including the maximality one, are enumerated with a polynomial time delay O(qm^2 n). Extra steps in OPT arise due to the construction of maximal compatible sets of columns that are not coverings (do not satisfy condition (1)).

The substantiation of asymptotically optimal dualization algorithms is based on the following assertion.

Proposition 1. If log m ≤ (1 – ε) log n, where 0 < ε < 1, then the number of identity submatrices of the matrix L is asymptotically equal to |ᏼ(L)| as n → ∞ for almost all m × n Boolean matrices L.

The dualization algorithms RS and MMCS were proposed in [17, 18]. Their description makes use of concepts of hypergraph theory. These algorithms are based on constructing sets of vertices S of a hypergraph Ᏺ satisfying the "crit" condition, which is equivalent to compatibility condition (2) for the corresponding set of columns of the incidence matrix of Ᏺ. Thus, the approach proposed in [17, 18] for the construction of dualization algorithms is not new (in fact, RS and MMCS are asymptotically optimal algorithms). It was shown in [17, 18] that RS and MMCS are much faster than other algorithms, for example, the incremental quasi-polynomial algorithm from [4].
In this paper, we construct the asymptotically optimal algorithms RUNC, RUNCM, and PUNC. Like OPT, each of them is based on the enumeration of compatible sets of columns satisfying additional conditions (which are different for different algorithms). It is shown that the algorithms proposed have smaller running times than the asymptotically optimal algorithms constructed in [7–16] and are usually faster than the algorithms from [17, 18].

1. UNDERLYING PRINCIPLES OF PREVIOUSLY CONSTRUCTED AND NEW ASYMPTOTICALLY OPTIMAL DUALIZATION ALGORITHMS

Asymptotically optimal dualization algorithms can be formally divided into two types. In algorithms of the first type, maximal identity submatrices (not necessarily all) of a matrix L are enumerated with a polynomial time delay. Such algorithms include extra steps associated with the repeated construction of maximal compatible sets of columns, since each compatible set of columns can be generated by several identity submatrices. Examples are the algorithms AO1, AO2, AO2K, and AO2M constructed in [7, 12, 16]. In algorithms of the second type, maximal compatible sets of columns (not necessarily all) are enumerated with a polynomial time delay without repeats. Examples are the OPT algorithm from [14] and the algorithms MMCS and RS from [17, 18]. The dualization algorithms RUNC, RUNCM, and PUNC constructed in this paper are of the second type.

Let us describe the general scheme for the performance of an algorithm of the second type. First, we introduce the definitions and notation used below.

The column j (indexed by j) is said to cover the row i (indexed by i) of a matrix L if aij = 1. Let H be a set of columns of L. The set H is said to cover the row i if there exists j ∈ H covering i. A row i of L is called supporting for a pair (H, j), j ∈ H, if aij = 1 and ail = 0 for all l ≠ j, l ∈ H. Obviously, the set H is compatible if and only if, for every (H, j), j ∈ H, there exists at least one supporting row. The set of all supporting rows for (H, j) is denoted by S(H, j). A column j of L is called forbidden for a set H of columns if there exists a column l ∈ H such that j covers all the supporting rows for (H, l). Otherwise, the column j is said to be compatible with H. Obviously, H ∪ {j} is a compatible set if and only if the column j is compatible with H.

General Scheme for an Algorithm of the Second Type

The work of an asymptotically optimal dualization algorithm of the second type can be regarded as a unidirectional traverse of the branches of a decision tree, all of whose vertices, except for the root, are compatible sets of columns. The root of the tree is the empty set of columns. The terminal vertices either are irreducible coverings or correspond to extra steps of the algorithm. Every step of the algorithm represents an iterative procedure that constructs a single tree branch starting at the root or at some previously constructed internal vertex.

Let C[H] denote the set of columns from which ones to be added to H are chosen in the construction of a child vertex of H. At the first iteration of Step 1, a certain rule specific to each algorithm is used to form a set of columns C[∅] of the matrix L, the root becomes the current vertex, and the algorithm goes to the second iteration. Suppose that, at Step s, s ≥ 1, at iteration t, t ≥ 1, the set H is the current vertex. Then the following procedures are executed at iteration t + 1.

1. If C[H] = ∅, then go to the next step (if H is a terminal vertex, the step of the algorithm is considered extra).
Otherwise, the first (in order) column j ∈ C[H] is eliminated from C[H].

2. If j is forbidden for H, then leave the current vertex unchanged and go to the next iteration. Otherwise, construct the vertex H' = H ∪ {j}.

3. If j covers all rows uncovered by H, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, construct a set of columns C[H'] according to some rule, make H' the current vertex, and go to the next iteration.

Suppose that a set H is produced at Step s, s ≥ 1. Then, at Step s + 1 at the first iteration, among the vertices of the tree branch joining the root to the vertex H, we search for a vertex H' nearest to H such that
C[H'] ≠ ∅. If H' is found, it is made the current vertex and the algorithm goes to the next iteration. Otherwise, the algorithm terminates.

Algorithms of the second type differ in the rules used to construct C[∅] at the first iteration of Step 1 and to construct C[H] in the creation of a new vertex H. Let us describe these differences. First, we introduce additional definitions and notation.

Let R be a set of rows and C be a set of columns of L. Denote by L(R, C) the submatrix made up of the rows from R and the columns from C. The row i ∈ R is said to envelop the row t ∈ R in L(R, C) if aij ≥ atj for all j ∈ C. It is well known that, if a row i ∈ R enveloping another row t ∈ R is deleted from the submatrix L(R, C), then its set of irreducible coverings remains unchanged. The number vj(R) = ∑i∈R aij, j ∈ C, is called the weight of the column j in L(R, C). The number wi(C) = ∑j∈C aij, i ∈ R, is called the weight of the row i in L(R, C). If wi(C) = 0 (respectively, vj(R) = 0), the row i ∈ R (the column j ∈ C) is called zero in L(R, C). Obviously, if the row i envelops the row t in L(R, C), then wi(C) ≥ wt(C).
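The deletion of enveloping rows and zero columns used below in the initialization of OPT and RUNC (RUNCM) can be sketched in C++ as follows (a single reduction pass; the matrix representation and names are ours, and the authors' implementations may organize this differently).

#include <cstddef>
#include <vector>

// One pass of the reduction applied to the submatrix L(R, C): delete every row
// that envelops another row of R (its removal does not change the set of
// irreducible coverings) and every zero column of C. rows and cols hold the
// indices of R and C; L is the full m x n matrix.
void deleteEnvelopingRowsAndZeroColumns(const std::vector<std::vector<char>>& L,
                                        std::vector<std::size_t>& rows,
                                        std::vector<std::size_t>& cols) {
    std::vector<std::size_t> keptRows;
    for (std::size_t a = 0; a < rows.size(); ++a) {
        bool envelopsAnother = false;
        for (std::size_t b = 0; b < rows.size() && !envelopsAnother; ++b) {
            if (a == b) continue;
            bool ge = true, strict = false;
            for (std::size_t j : cols) {
                if (L[rows[a]][j] < L[rows[b]][j]) { ge = false; break; }
                if (L[rows[a]][j] > L[rows[b]][j]) strict = true;
            }
            // drop a strictly enveloping row; for equal rows keep the first copy
            if (ge && (strict || a > b)) envelopsAnother = true;
        }
        if (!envelopsAnother) keptRows.push_back(rows[a]);
    }
    rows.swap(keptRows);
    std::vector<std::size_t> keptCols;                 // drop zero columns:
    for (std::size_t j : cols) {                       // columns with v_j(R) = 0
        std::size_t weight = 0;
        for (std::size_t i : rows) weight += L[i][j];
        if (weight > 0) keptCols.push_back(j);
    }
    cols.swap(keptCols);
}

Since removing a column may create new enveloping relations between the remaining rows, the pass can be repeated until nothing changes; the single pass above is only meant to make the reduction concrete.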
Let R[H] denote the set of rows associated with the vertex H. Note that R[H] contains only rows uncovered by H.

Algorithm OPT

At the first iteration of Step 1, the submatrix L(R[∅], C[∅]) is constructed by sequentially deleting the enveloping rows and the zero columns from the matrix L. Next, the root becomes the current vertex and the algorithm goes to the next iteration.

Suppose that H is the current vertex at iteration t, t ≥ 1, of Step s, s ≥ 1. Then the following actions are executed at iteration t + 1.

1. [Similar to the general scheme for a second-type algorithm.] If C[H] = ∅, then go to the next step. Otherwise, delete the first (in order) column j ∈ C[H] from C[H]. Construct the vertex H' = H ∪ {j}.

2. If the column j covers all the rows uncovered by H, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, the submatrix L(R[H'], C[H']) is constructed by deleting the rows covered by the column j and the columns forbidden for H' from the submatrix L(R[H], C[H]). Next, delete the enveloping rows and the zero columns sequentially from L(R[H'], C[H']), make H' the current vertex, and go to the next iteration.

At Step s, s > 1, at the first iteration, OPT executes the same operations as described in the general performance scheme for a second-type algorithm.

Let D[H] denote the set of columns associated with the current vertex H. Note that this set is different from C[H] and serves a different purpose.

Algorithm MMCS

At the first iteration of Step 1, the algorithm constructs the set of columns C[∅] covering the row i = 1 of the matrix L. Next, the columns not included in C[∅] are used to construct the submatrix L(R[∅], D[∅]), the root becomes the current vertex, and the algorithm goes to the next iteration.

Suppose that, at Step s, s ≥ 1, at iteration t, t ≥ 1, the current vertex is H. Then the following operations are executed at iteration t + 1.

1. [Similar to the general scheme for a second-type algorithm.] If C[H] = ∅, then go to the next step. Otherwise, delete the first (in order) column j ∈ C[H] from C[H]. If j is forbidden for H, then leave the current vertex unchanged and go to the next iteration. Otherwise, construct the vertex H' = H ∪ {j}.

2. Add the column j to D[H]. If j covers all the rows uncovered by H, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, choose the least-index row i uncovered by j in the submatrix L(R[H], D[H]), form the set C[H'] of columns of L(R[H], D[H]) that cover the row i, and construct the submatrix L(R[H'], D[H']) by deleting from L(R[H], D[H]) the rows covered by j and the columns of C[H']. Make H' the current vertex and go to the next iteration.
At Step s, s > 1, at the first iteration, the algorithm MMCS executes the same operations as described in the general performance scheme for a second-type algorithm.

The algorithms RUNC (Remove Unallowable Columns) and RUNCM (RUNC with a Minimum-weight row) proposed below combine the ideas of OPT and MMCS.

Algorithm RUNC (RUNCM)

At the first iteration of Step 1, as in MMCS, choose a row i of the matrix L (i = 1 in RUNC, while the row i of L with a minimum weight is sought in RUNCM), construct the set of columns C[∅] that cover i, and construct the submatrix L(R[∅], D[∅]) by sequentially deleting the enveloping rows and the zero columns from L. Next, the root becomes the current vertex and the algorithm goes to the next iteration.

Suppose that, at Step s, s ≥ 1, at iteration t, t ≥ 1, the current vertex is H. Then the following operations are executed at iteration t + 1.

1. [Similar to the general scheme for a second-type algorithm.] If C[H] = ∅, then go to the next step. Otherwise, delete the first (in order) column j ∈ C[H] from C[H]. Construct the vertex H' = H ∪ {j}.

2. Delete the column j from D[H]. If j covers all the rows uncovered by H, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, in the submatrix L(R[H], D[H]), choose a row i uncovered by j (RUNC uses the row with the least index, while RUNCM searches for the row i with the least weight wi(D[H])), form the set C[H'] of columns of L(R[H], D[H]) that cover i, and construct the submatrix L(R[H'], D[H']) by deleting from L(R[H], D[H]) the rows covered by the column j and, as in OPT, the columns forbidden for H'. Next, make H' the current vertex and go to the next iteration.

At Step s, s > 1, at the first iteration, RUNC (RUNCM) executes the same operations as described in the general performance scheme for second-type algorithms.

To describe the algorithms RS and PUNC, we need additional definitions and notation. Let H be a set of columns of the matrix L, and let i ∈ {1, …, m}. Denote by S^i(H, j) the set of rows supporting for (H, j), j ∈ H, that lie no lower than the row i. A set H of columns is said to be i-compatible if H covers the rows 1, 2, …, i and S^i(H, j) ≠ ∅ for all j ∈ H. The empty set of columns is called 0-compatible. Note that an m-compatible set of columns is an irreducible covering of the matrix L.

Suppose that i ≥ 1 and a set H of columns does not cover the row i and is (i – 1)-compatible. A column j is called i-forbidden for H if j does not cover i or if i > 1 and there is a column l in H such that the column j covers all the rows from S^(i – 1)(H, l). Note that, if the column j is not i-forbidden for H, the set H ∪ {j} is i-compatible and, hence, compatible. Otherwise, H ∪ {j} is not i-compatible and, depending on the rows i + 1, i + 2, …, m, can be incompatible. Moreover, for an i-forbidden column j, there can exist a row t > i such that j is not t-forbidden.

Algorithm RS

At the first iteration of Step 1, construct the submatrix L(R[∅], C[∅]) consisting of the columns of L that cover the first row. Make the root the current vertex and go to the next iteration.

Suppose that, at Step s, s ≥ 1, at iteration t, t ≥ 1, the current vertex is H. Then the following actions are executed at iteration t + 1.

1. [Similar to the general scheme for a second-type algorithm.] If C[H] = ∅, then go to the next step. Otherwise, delete the first (in order) column j ∈ C[H] from C[H]. Construct the vertex H' = H ∪ {j}.

2. If the column j covers all the rows uncovered by H, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, in the submatrix L(R[H], C[H]), choose the first (in order) row i uncovered by j, form the set C[H'] of columns that are not i-forbidden for H', and construct the submatrix L(R[H'], C[H']) containing the rows of L(R[H], C[H]) that are not covered by j. Next, make H' the current vertex and go to the next iteration.

At the first iteration of Step s, s > 1, the algorithm RS executes the same operations as described in the general performance scheme for second-type algorithms.
The algorithm PUNC (Pending Unallowable Columns) proposed in this work relies on the idea of the algorithm RS.

Let i[H] denote the row index associated with the current vertex H. All rows lying above i[H] are covered by the set of columns H.

Algorithm PUNC

At the first iteration of Step 1, the row i[∅] = 1 is associated with the root, and the set of columns C[∅] of the matrix L that cover row 1 is constructed. The root becomes the current vertex, and the algorithm goes to the next iteration.

Suppose that, at Step s, s ≥ 1, at iteration t, t ≥ 1, the current vertex is H. Then the following operations are executed at iteration t + 1.

1. [Similar to the general scheme for a second-type algorithm.] If C[H] = ∅, then go to the next step. Otherwise, delete the first (in order) column j ∈ C[H] from C[H]. Construct the vertex H' = H ∪ {j}.

2. Search for the first (in order) row i', i[H] < i' ≤ m, that is not covered by the column j. If such a row cannot be found, then the result of this step is the irreducible covering H', and the algorithm goes to the next step. Otherwise, form the set C[H'] of columns that are not i'-forbidden for H', associate the row i[H'] = i' with the vertex H', make H' the current vertex, and go to the next iteration.

At the first iteration of Step s, s > 1, the algorithm PUNC executes the same operations as described in the general performance scheme for second-type algorithms.

2. DETAILED DESCRIPTION OF THE ALGORITHMS RUNC AND RUNCM

In this section, we give a detailed description of RUNC and RUNCM and analyze the complexity of a single step of these algorithms.

As was noted in Section 1, RUNC and RUNCM differ only slightly. Accordingly, we give their common description and then point out their structural features and experimental efficiency estimates.

For illustrative purposes, we distinguish two procedures, BuildSubtreeRUNC and CreateNodeRUNC, in the description of RUNC and RUNCM. BuildSubtreeRUNC is a recursive procedure that builds a decision subtree. CreateNodeRUNC builds a new vertex H of the decision tree and updates the state of the algorithm, namely, the set of rows uncovered by the set H, the set of columns compatible with H, and the sets of rows supporting for (H, j), j ∈ H.

We introduce the following global variables: H is the current compatible set of columns; L(R, D) is the current submatrix; and Sj, j ∈ {1, …, n}, is the current set of supporting rows for (H, j).

The dualization of L by RUNC (RUNCM) starts with initializing the global variables and calling the basic recursive procedure BuildSubtreeRUNC (see Algorithm 1). In fact, RUNC and RUNCM differ only in the method used to choose a row i at Step 2 of BuildSubtreeRUNC. The choice of i determines the composition of the child nodes of the current set H in the decision tree. It has been shown experimentally that RUNCM is usually more efficient than RUNC, since it involves a substantially smaller number of recursive calls of BuildSubtreeRUNC (i.e., a smaller number of internal vertices of the decision tree), which compensates for the computational cost of the search for a minimum-weight row.

CreateNodeRUNC has a single parameter j, which is the column added to the current set of columns H. At Step 2 of CreateNodeRUNC, the rows from R covered by the column j become the current set of supporting rows Sj (these rows are supporting for (H ∪ {j}, j), since they are not covered by H). Next, the rows covered by j are deleted from R. Steps 5–11 are executed for each l ∈ H such that the current set of supporting rows for (H, l) is not empty. At Step 5, the rows covered by j are deleted from Sl, since these rows are not supporting for (H ∪ {j}, l). If the current set Sl, l ∈ H ∪ {j}, contains at least one row uncovered by the set of columns D, then Sl does not influence the forbidden columns in D; accordingly, Sl is replaced by ∅ at Step 7. At Steps 9–11, the columns forbidden for H ∪ {j} are deleted from D.
Algorithm 1: RUNC (RUNCM)

Input: L is an m × n Boolean matrix.
Output: ᏼ(L), the set of irreducible coverings of the matrix L.
(1) global variables: H is the current set of columns; L(R, D) is the current submatrix; Sl, l ∈ {1, …, n}, is the current set of supporting rows for (H, l);
(2) H := ∅;
(3) R := {1, …, m};
(4) D := {1, …, n};
(5) for all l = 1, …, n
(6)   Sl := ∅;
(7) call BuildSubtreeRUNC; // construct a decision tree with root ∅

(1) Procedure BuildSubtreeRUNC
(2) choose a row i ∈ R; // RUNC: i is the first (in order) row in R; RUNCM: i is a row with a minimum weight wi(D)
(3) for all (j ∈ D : aij = 1)
(4)   delete j from D;
(5)   call CreateNodeRUNC(j);
(6)   if (R = ∅) then
(7)     output the irreducible covering H;
(8)   otherwise
(9)     call BuildSubtreeRUNC; // recursive construction of a tree branch
(10)  undo the changes introduced into the global variables at Step 5.

(1) Procedure CreateNodeRUNC(j)
(2) Sj := {i ∈ R : aij = 1}; // add the new supporting rows
(3) delete the rows covered by j from R;
(4) for all (l ∈ H : Sl ≠ ∅) // update the supporting rows
(5)   delete the rows covered by j from Sl;
(6)   if (∃i ∈ Sl : aip = 0 ∀p ∈ D) then
(7)     Sl := ∅;
(8)   otherwise // delete the forbidden columns
(9)     for all (p ∈ D)
(10)      if (aip = 1 ∀i ∈ Sl) then
(11)        delete p from D;
(12) add j to H;
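As an illustration of Steps 2–12 of CreateNodeRUNC, here is a C++ sketch of the same update (our transcription of the pseudocode above, not the authors' released code; the state is kept in an explicit struct rather than in global variables, and all names are ours).

#include <algorithm>
#include <cstddef>
#include <vector>

// R: rows still uncovered; D: allowed columns; H: current compatible set;
// S[l]: supporting rows for (H, l), one entry per column of L.
struct State {
    std::vector<std::size_t> R, D, H;
    std::vector<std::vector<std::size_t>> S;
};

// Assumes the caller has already removed j from s.D (Step 4 of BuildSubtreeRUNC).
void createNodeRUNC(const std::vector<std::vector<char>>& L, State& s, std::size_t j) {
    s.S[j].clear();                                   // Step 2: rows of R covered by j
    for (std::size_t i : s.R) if (L[i][j]) s.S[j].push_back(i);
    s.R.erase(std::remove_if(s.R.begin(), s.R.end(),  // Step 3: shrink R
              [&](std::size_t i) { return L[i][j] == 1; }), s.R.end());
    for (std::size_t l : s.H) {                       // Steps 4-11
        if (s.S[l].empty()) continue;
        s.S[l].erase(std::remove_if(s.S[l].begin(), s.S[l].end(),   // Step 5
                     [&](std::size_t i) { return L[i][j] == 1; }), s.S[l].end());
        bool hasRowUncoveredByD = false;              // Steps 6-7
        for (std::size_t i : s.S[l]) {
            bool covered = false;
            for (std::size_t p : s.D) if (L[i][p]) { covered = true; break; }
            if (!covered) { hasRowUncoveredByD = true; break; }
        }
        if (hasRowUncoveredByD) { s.S[l].clear(); continue; }
        // Steps 9-11: a column p covering all rows of S_l is forbidden for H ∪ {j}
        s.D.erase(std::remove_if(s.D.begin(), s.D.end(), [&](std::size_t p) {
            if (s.S[l].empty()) return false;
            for (std::size_t i : s.S[l]) if (!L[i][p]) return false;
            return true;
        }), s.D.end());
    }
    s.H.push_back(j);                                 // Step 12
}

Undoing these changes on return from the recursive call (Step 10 of BuildSubtreeRUNC) can then be implemented either by restoring saved copies of R, D, and the affected Sl or by passing the struct by value.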
Let us estimate the complexity of a step of RUNC (RUNCM). Without taking into account the recursive calls, the complexity of BuildSubtreeRUNC and CreateNodeRUNC is O(mn). The depth of the decision tree does not exceed q, where q = min{m, n}. Thus, the complexity of a step of RUNC (RUNCM) is O(mnq). Additionally, the algorithm requires O(m + n) memory.

3. DETAILED DESCRIPTION OF THE ALGORITHM PUNC

In this section, we give a detailed description of PUNC and analyze the complexity of its step.

For illustrative purposes, two procedures, BuildSubtreePUNC and CreateNodePUNC, are distinguished in the description of PUNC. BuildSubtreePUNC is a recursive procedure that builds a decision subtree. CreateNodePUNC builds a new vertex H of the decision tree and updates the current state of the algorithm, namely, the number of the first row not covered by the set H and the sets of supporting rows for (H, j), j ∈ H.
Let us introduce the following global variables: H is the current compatible set of columns; i is the current row uncovered by H; and Sj, j ∈ {1, …, n}, is the current set of supporting rows for (H, j).

The dualization of the matrix L by PUNC begins with initializing the global variables and calling the basic recursive procedure BuildSubtreePUNC (see Algorithm 2).

Among the columns covering the current row i, BuildSubtreePUNC enumerates those that are i-compatible with the current set H. At Step 3, i-compatibility is verified with the use of the current sets of supporting rows Sl, l ∈ H. At Step 4, for each i-compatible column j, CreateNodePUNC is called to construct the vertex H ∪ {j}. At Step 7, BuildSubtreePUNC is called recursively to construct the subtree with the root H ∪ {j}.

When CreateNodePUNC(j) is called, H' = H ∪ {j} becomes the current vertex, and the current row i uncovered by H' and the sets of supporting rows for (H', l), l ∈ H', are updated. After calling CreateNodePUNC, it holds that Sl = S^i(H, l), l ∈ H.

Comparing the description of RS from [17] with the detailed description of PUNC, we see that, after constructing the current set H, the algorithm RS forms (modifies) the set of all rows uncovered by H and the sets of rows S(H, l), l ∈ H, while PUNC modifies only the minimal number i of a row uncovered by H and the sets of rows S^i(H, l), l ∈ H. The sets S^i(H, l), l ∈ H, are sufficient for verifying the i-compatibility of columns. As a rule, the time costs for RS to construct and modify S(H, l), l ∈ H, are higher than those for PUNC to construct and modify S^i(H, l), l ∈ H.

The complexity of a single step of PUNC is bounded by O(mnq), where q = min{m, n}. However, it should be noted that a step of PUNC is executed on average faster than a step of RUNC or RUNCM since, in the former case, for each row i of L, the fact that i is covered by H is verified at most once.

Algorithm 2: PUNC

Input: L is an m × n Boolean matrix.
Output: ᏼ(L), the set of irreducible coverings of L.
(1) global variables: H is the current compatible set of columns; i is the current row uncovered by H; Sl, l ∈ {1, …, n}, is the current set of supporting rows for (H, l);
(2) H := ∅;
(3) i := 1;
(4) for all l = 1, …, n
(5)   Sl := ∅;
(6) call BuildSubtreePUNC; // construct a decision tree with root ∅

(1) Procedure BuildSubtreePUNC
(2) for all (j : aij = 1)
(3)   if (∀l ∈ H, ∃t ∈ Sl : atj = 0) then
(4)     call CreateNodePUNC(j);
(5)     if (i > m) then
(6)       output the irreducible covering H;
(7)     otherwise call BuildSubtreePUNC; // recursive call
(8)     undo the changes introduced into the global variables at Step 4;

(1) Procedure CreateNodePUNC(j)
(2) for all l ∈ H
(3)   delete the rows covered by the column j from Sl;
(4) add j to H;
(5) while (i ≤ m and the set H covers the row i)
(6)   if (∃l ∈ H : the row i is supporting for (H, l)) then
(7)     add i to Sl;
(8)   i := i + 1;
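The test at Step 3 of BuildSubtreePUNC, i.e., whether a column j covering the current row i is i-compatible with H, can be sketched in C++ as follows (our code, not the authors' implementation; the supporting-row sets Sl are assumed to be stored as lists of row indices).

#include <cstddef>
#include <vector>

// Column j (already known to cover the current row i) may be added to H only
// if, for every l in H, some row of S[l] is not covered by j; otherwise j
// would leave (H, l) without a supporting row, i.e., j is i-forbidden for H.
bool isICompatible(const std::vector<std::vector<char>>& L,
                   const std::vector<std::size_t>& H,
                   const std::vector<std::vector<std::size_t>>& S,
                   std::size_t j) {
    for (std::size_t l : H) {
        bool keepsSupport = false;
        for (std::size_t t : S[l]) if (L[t][j] == 0) { keepsSupport = true; break; }
        if (!keepsSupport) return false;
    }
    return true;
}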
Table 1. Random matrices, the case m = 10, 50 ≤ n ≤ 300

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
10 × 50    0.007   0.008   0.008   0.01    0.011   0.007   8361         3.8
10 × 100   0.06    0.1     0.09    0.06    0.06    0.06    146248       4.1
10 × 150   0.31    0.6     0.5     0.3     0.3     0.3     836107       4.3
10 × 200   1.2     2.3     2.2     1.5     1.4     1.2     3257326      4.5
10 × 250   3.2     6.4     6       4.3     4       3.2     8666765      4.6
10 × 300   8       15.4    14.4    10.8    11      7.8     20658771     4.7
Table 2. Random matrices, the case m = 20, 50 < n < 150

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
20 × 50    0.03    0.04    0.04    0.029   0.021   0.03    44552        4.6
20 × 75    0.17    0.28    0.27    0.16    0.14    0.16    314154       4.8
20 × 100   0.7     1.1     1.1     0.6     0.6     0.6     1269495      4.9
20 × 125   2.2     4.1     3.8     2.2     2       2       4254740      5
20 × 150   5.2     10.2    9.2     5.7     5.2     5       10321112     5.1
Table 3. Random matrices, the case m = 30, 50 ≤ n ≤ 110

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
30 × 50    0.08    0.12    0.12    0.07    0.06    0.08    113307       5
30 × 70    0.4     0.7     0.6     0.4     0.3     0.4     608535       5.1
30 × 90    1.8     3       2.9     1.7     1.4     1.6     2772442      5.3
30 × 110   4.8     9.1     8.2     4.7     4.3     4.7     7917863      5.4
4. EXPERIMENTAL RESULTS

To test the efficiency of the algorithms proposed, we performed a series of experiments with random Boolean matrices, model data, and real-world problems.

The algorithms OPT, RUNC, RUNCM, and PUNC were implemented in C++. A feature of their implementation is that the elements of each row of a Boolean matrix with n columns were represented by a sequence of bits in an array of n/32 double computer words. As a result, some operations with finite sets of rows and columns were replaced by bit operations over double computer words (a sketch of such a representation is given below). The source codes of the constructed algorithms can be found at http://sourceforge.net/p/logicalanalyze/code/HEAD/tree/trunk/dualization/. The source codes of MMCS and RS were taken from http://research.nii.ac.jp/~uno/dualization.html.

A set of random matrices was formed following [14]. Each m × n random matrix L was obtained with the help of a random-number generator such that each element aij took the value 0 or 1 with equal probability. For each pair (m, n), we generated 20 random matrices L1, …, L20 and computed the average number of coverings and the average length of a covering.

The numerical results for random matrices are given in Tables 1–7. The algorithms were compared in the following cases:
1. n > m (see Tables 1–3);
2. n < m (see Tables 4–6);
3. n = m (see Table 7).

The "m × n" column in Tables 1–7 contains the input size of a problem. The |ᏼ(L)|* and |H|* columns indicate the average number of irreducible coverings and the average length of an irreducible covering for matrices of the given size, respectively. The other columns present the average running time (in seconds) of each of the tested algorithms on problems of the corresponding size.
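A minimal C++ sketch of a similar bit-packed representation is given below (our illustration using 64-bit words, not the authors' code); with it, the test of whether a set of columns covers a given row reduces to word-wise AND operations.

#include <cstddef>
#include <cstdint>
#include <vector>

// Each row of an m x n Boolean matrix (or a set of columns) is stored in
// ceil(n / 64) 64-bit words, so set operations become bit operations on words.
struct BitRow {
    std::vector<std::uint64_t> w;
    explicit BitRow(std::size_t n) : w((n + 63) / 64, 0) {}
    void set(std::size_t j) { w[j / 64] |= std::uint64_t(1) << (j % 64); }
    bool intersects(const BitRow& other) const {      // is some common bit set?
        for (std::size_t k = 0; k < w.size(); ++k)
            if (w[k] & other.w[k]) return true;
        return false;
    }
};

// rowMask holds row i of L; hMask holds the columns of H as a mask.
// H covers row i iff the two masks intersect.
bool covers(const BitRow& rowMask, const BitRow& hMask) {
    return rowMask.intersects(hMask);
}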
Table 4. Random matrices, the case 50 ≤ m ≤ 300, n = 40

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
50 × 40    0.09    0.12    0.13    0.07    0.06    0.09    94389        5.5
100 × 40   0.5     0.5     0.6     0.3     0.24    0.4     333177       6.3
150 × 40   1.1     1.2     1.3     0.8     0.5     1.1     663155       6.7
200 × 40   2.2     2.8     2.8     1.8     1       2.6     1239948      7.2
250 × 40   3.2     3.6     3.6     2.3     1.5     3.7     1593748      7.3
300 × 40   5       5.2     5.4     3.5     2.2     5.8     2183946      7.6
Table 5. Random matrices, the case 70 ≤ m ≤ 230, n = 50

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
70 × 50    0.8     1       1       0.6     0.5     0.7     705711       6
110 × 50   2.4     3.1     3       1.8     1.4     2.4     1813930      6.5
150 × 50   4.6     6       5.5     3.4     2.6     5.1     3145715      6.8
190 × 50   9.7     11.8    11.2    7       5.1     10.6    5682139      7.1
230 × 50   15.9    19.1    17.9    11.3    8       18.1    8450430      7.4
Table 6. Random matrices, the case 90 ≤ m ≤ 150, n = 60

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
90 × 60    4.8     7       6.5     3.7     2.9     4.8     4213716      6.3
110 × 60   8.4     12.6    10.7    6.4     5.1     8.8     6907119      6.6
150 × 60   22.6    30.3    27.7    16.4    12.9    23.3    15378069     7
Table 7. Random matrices, the case 30 ≤ m = n ≤ 90

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|*      |H|*
30 × 30    0.015   0.006   0.012   0.009   0.007   0.01    8113         4.8
45 × 45    0.16    0.2     0.2     0.12    0.1     0.13    160566       5.5
60 × 60    1.5     2.3     2.2     1.2     1       1.5     1618553      5.9
75 × 75    10.4    17.5    15.7    8.9     7.6     10.8    10921285     6.2
90 × 90    66.7    112.4   99.7    56.5    48.2    70      64273167     6.5
When the number of rows is not large (m = 10, 20), there is no absolute leader among the algorithms. The three fastest algorithms are OPT, PUNC, and RUNCM (see Tables 1 and 2). In the remaining cases, RUNCM outperforms the other algorithms. Note that the superiority of RUNCM in speed grows with the size of the input matrix (see Tables 3–7). The speed of OPT and RS is lower than that of RUNCM because the time required for executing every step of these two algorithms depends strongly on the number m of rows in the input matrix.

The speed of the algorithms considered is highly affected by the number of vertices in the decision tree. In terms of this characteristic, the most efficient algorithm is RUNCM. Interestingly, the decision tree of OPT, which takes, as a rule, the smallest number of redundant steps, almost always has the largest number of vertices. This circumstance suggests that the number of redundant steps correlates weakly with the speed of the algorithm.

The algorithms were also tested using the model data from [17]. We used the incidence matrices of the following hypergraphs:

1. M(n) is a combination graph on n vertices with m = n/2 edges of the form {2i – 1, 2i}, i ∈ {1, …, m} (the number of irreducible coverings of the incidence matrix of M(n) is 2^m);
Table 8. Model data: the combination graph M(n)

m × n      OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|       |H|*
10 × 21    0.001   0.001   0.016   0.001   0.001   0.001   1024         10
12 × 25    0.031   0.001   0.05    0.016   0.001   0.001   4096         12
14 × 29    0.001   0.016   0.05    0.001   0.001   0.016   16384        14
15 × 31    0.031   0.05    0.06    0.016   0.016   0.031   32768        15
16 × 33    0.05    0.11    0.11    0.016   0.031   0.06    65536        16
17 × 35    0.09    0.16    0.2     0.05    0.06    0.17    131072       17
18 × 37    0.17    0.3     0.5     0.12    0.14    0.31    262144       18
19 × 39    0.3     0.7     0.9     0.23    0.27    0.6     524288       19
20 × 41    0.7     1.6     1.7     0.5     0.6     1.4     1048576      20
21 × 43    1.4     3.1     4.2     1.1     1.1     2.8     2097152      21
22 × 45    2.9     6.3     8.6     2.3     2.4     5.8     4194304      22
23 × 47    5.8     13.2    16.8    4.6     7.1     12.2    8388608      23
Table 9. Model data: the hypergraph DM(n) dual to M(n)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|   |H|*
1024 × 21     0.06    0.001   0.016   0.016   0.001   0.016   10       2
4096 × 25     0.8     0.11    0.06    0.05    0.031   0.031   12       2
16384 × 29    11.2    0.9     0.6     0.27    0.28    0.17    14       2
32768 × 31    46.9    3.1     2.2     0.8     0.8     0.5     15       2
65536 × 33    184.5   10      6       2.5     2.7     1.6     16       2
131072 × 35   –       35      20.3    8.9     9.5     5.6     17       2
262144 × 37   –       122.3   70      28.7    31.7    17.3    18       2
524288 × 39   –       399.9   283.6   93.6    109.2   51.7    19       2
1048576 × 41  –       –       –       569.9   309.2   315.4   20       2
2. DM(n) is the hypergraph dual to M(n) (a hypergraph Ᏼd is called dual to a hypergraph Ᏼ if the edges of Ᏼd are the minimal vertex coverings of Ᏼ);

3. SDFP(n) is a self-dual hypergraph on n = 7k + 2 vertices with edges of the form {{n} ∪ E : E ∈ (FP(k))d} ∪ {{n – 1} ∪ E : E ∈ FP(k)} ∪ {{n – 1, n}}, where the hypergraph FP(1) = {{1, 2, 3}, {1, 5, 6}, {1, 7, 4}, {2, 4, 5}, {2, 6, 7}, {3, 4, 6}, {3, 5, 7}} defines a projective Fano plane, while the hypergraph FP(k), k > 1, has edges of the form {E ∪ (E + 7) ∪ … ∪ (E + 7(k – 1)) : E ∈ FP(1)} (E + b denotes the set {e + b : e ∈ E}; a hypergraph Ᏼ is called self-dual if Ᏼd = Ᏼ);

4. TH(n) is a threshold graph on an even number n of vertices with m = n^2/4 edges of the form {i, 2j}, 1 ≤ i < 2j ≤ n (the number of irreducible coverings of the incidence matrix of TH(n) is n/2 + 1);

5. SDTH(n) is a self-dual threshold hypergraph on n vertices with m = (n – 2)^2/4 + n/2 + 1 edges of the form {{n} ∪ E : E ∈ (TH(n – 2))d} ∪ {{n – 1} ∪ E : E ∈ TH(n – 2)} ∪ {{n – 1, n}}.

The algorithms were also tested on the real-world problems from [17]. We used the incidence matrices of the hypergraphs arising in the following real-world problems.

(1) Data on the moves made by players in the game Connect-4 were used to make up two hypergraphs: WIN and LOSE. The data were taken from the UCI Machine Learning Repository [19]. Each edge of WIN (LOSE) describes the first player's state when he or she wins (loses). A minimal vertex covering of WIN (LOSE) determines an optimal winning (losing) strategy of the first player. The hypergraphs WIN(m) and LOSE(m) consist of the first m edges of WIN and LOSE, respectively.

(2) The hypergraphs BMS and ACC were generated in connection with the search for frequently co-occurring item sets. We used the problems BMS-WebView-2 and accidents from the FIMI Repository (see [20]).
Table 10. Model data: the self-dual hypergraph SDFP(n)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|    |H|*
15 × 10       0.001   0.001   0.001   0.001   0.001   0.016   15        3.9
64 × 17       0.001   0.016   0.001   0.001   0.001   0.016   64        6.3
365 × 24      0.05    0.016   0.016   0.016   0.016   0.016   365       9.6
2430 × 31     2.7     0.5     0.4     0.5     0.3     0.25    2430      12.9
16843 × 38    200.9   25.3    15      20.4    9.1     10.4    16843     16
117692 × 45   –       –       –       –       440.2   329.5   117692    19
Table 11. Model data: the threshold graph TH(n)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|   |H|*
400 × 41      0.016   0.031   0.001   0.001   0.001   0.001   21       29
900 × 61      0.14    0.031   0.016   0.001   0.016   0.016   31       44
1600 × 81     1       0.016   0.016   0.016   0.001   0.016   41       59
2500 × 101    4.9     0.001   0.016   0.016   0.016   0.05    51       74
3600 × 121    16.5    0.001   0.001   0.031   0.05    0.08    61       89
4900 × 141    42.5    0.016   0.016   0.05    0.05    0.14    71       104
6400 × 161    100.2   0.016   0.016   0.08    0.08    0.19    81       119
8100 × 181    214.5   0.05    0.05    0.12    0.14    0.31    91       134
10000 × 201   412.7   0.05    0.06    0.19    0.2     0.5     101      149
Table 12. Model data: the self-dual hypergraph SDTH(n)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|   |H|*
422 × 43      0.031   0.016   0.016   0.031   0.001   0.031   422      4.3
932 × 63      0.17    0.001   0.001   0.016   0.001   0.031   932      4.4
1642 × 83     1.1     0.001   0.016   0.016   0.031   0.031   1642     4.4
2552 × 103    4.8     0.016   0.016   0.016   0.031   0.08    2552     4.4
3662 × 123    15.8    0.031   0.031   0.031   0.031   0.17    3662     4.4
4972 × 143    42.2    0.031   0.016   0.05    0.05    0.3     4972     4.5
6482 × 163    103.2   0.05    0.05    0.08    0.09    0.5     6482     4.5
8192 × 183    217.8   0.08    0.06    0.11    0.12    0.7     8192     4.5
10102 × 203   411.1   0.09    0.09    0.19    0.2     1.7     10102    4.5
14522 × 243   –       0.14    0.16    0.4     0.4     2.2     14522    4.5
19742 × 283   –       0.23    0.23    0.7     0.9     5.7     19742    4.5
25762 × 323   –       0.4     0.3     1.4     1.3     20.9    25762    4.5
32582 × 363   –       0.5     0.6     2       2       12.7    32582    4.5
40202 × 403   –       0.7     0.6     3.1     3.2     27.6    40202    4.5
The hypergraphs BMS(m) and ACC(m) consist of the first m edges of BMS and ACC, respectively. A feature of these problems is that each edge of a hypergraph contains almost all of its vertices.

The numerical results obtained with the model data and the real-world problems are presented in Tables 8–12 and Tables 13–16, respectively. The types of the problems are indicated in Tables 8–16. The "m × n" column in Tables 8–16 contains the size of the input matrix. The |ᏼ(L)| and |H|* columns indicate the number of irreducible coverings and the average length of an irreducible covering of a matrix, respectively. The other columns contain the running time in seconds for each of the tested algorithms.
Table 13. Real-world problems: the hypergraph WIN(m)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|      |H|*
100 × 76      0.016   0.001   0.001   0.001   0.001   0.001   287         10.8
200 × 78      0.05    0.001   0.016   0.001   0.001   0.031   1145        12.1
400 × 78      0.5     0.016   0.016   0.016   0.031   0.08    6069        14.3
800 × 78      1.5     0.06    0.08    0.05    0.031   0.22    11675       15.1
1600 × 80     27.8    0.3     0.5     0.23    0.17    1.7     71840       16.7
3200 × 82     517.7   2.9     5.3     2.3     1.3     20.2    459502      17.9
6400 × 82     –       12      21.4    13      5.1     104     1277933     18.9
12800 × 84    –       50.5    84.7    57.9    20.3    530.9   4587967     19.9
25600 × 84    –       212.8   –       511     75.5    –       11614885    20.8
44473 × 85    –       –       –       –       306.5   –       31111249    21.9
Table 14. Real-world problems: the hypergraph LOSE(m)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|      |H|*
100 × 77      0.031   0.001   0.016   0.016   0.001   0.016   2341        11.2
200 × 77      0.22    0.05    0.06    0.031   0.016   0.11    22760       12.5
400 × 79      1.3     0.11    0.16    0.09    0.05    0.4     33087       13.7
800 × 81      4.2     0.3     0.5     0.5     0.17    1.5     79632       14.8
1600 × 81     66.5    1.1     1.7     2.2     0.5     5.8     212761      15.9
3200 × 81     –       12.9    17.1    10.3    4.3     71.1    2396735     17.2
6400 × 81     –       41.8    107.1   58.5    12.5    501.2   4707877     17.6
12800 × 85    –       185.8   449.7   278.6   49.2    –       16405082    19.3
16635 × 85    –       519.5   –       –       137.7   –       39180611    20.1
Table 15. Real-world problems: the hypergraph ACC(m)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|   |H|*
81 × 64       0.001   0.001   0.001   0.016   0.001   0.001   253      2.6
447 × 64      0.001   0.016   0.001   0.031   0.001   0.001   1039     3.8
2000 × 81     0.05    0.06    0.05    0.031   0.016   0.031   3547     4.7
4322 × 336    0.23    0.27    0.3     0.09    0.08    0.27    7617     5.1
10968 × 336   1.1     0.8     0.8     0.25    0.25    1.4     17486    5.7
32207 × 336   8.2     3.7     3.4     0.9     0.9     9.9     47137    6.5
135439 × 442  136.3   33      35.3    7.6     7.3     208.3   185218   7.3
Table 16. Real-world problems: the hypergraph BMS(m)

m × n         OPT     MMCS    RS      RUNC    RUNCM   PUNC    |ᏼ(L)|    |H|*
62 × 3340     0.031   0.031   0.06    0.05    0.06    0.031   4616      1.3
237 × 3340    0.16    0.17    0.16    0.19    0.2     0.3     15993     1.8
823 × 3340    0.7     1.3     1.3     0.8     0.8     3.5     89448     2
2591 × 3340   3.9     15.7    14.2    3.8     4       34.8    438867    2
6946 × 3340   21.1    117.4   98.2    24.7    25.1    217.4   1289303   2
17315 × 3340  105.3   510.8   446.7   94.2    113.6   –       2297560   2
30405 × 3340  310.6   –       –       261.7   315.6   –       3064937   2.1
If an algorithm failed to complete its work within 600 s, its execution was terminated. In the tables, this situation is marked with a minus.

As in the case of random matrices, the numerical results obtained for the model data fail to reveal an absolute leader among the compared algorithms.

For the incidence matrix of the combination graph M(n), the best results were produced by RUNC, while OPT and RUNCM were only slightly inferior. Apparently, the high speed is achieved by deleting the forbidden columns, which is a procedure common to these algorithms. Note that the weight of each row of L is initially equal to 2. Therefore, the removal of enveloping rows in OPT and the search for a minimum-weight row in RUNCM incur extra computational costs but do not reduce the number of vertices in the decision tree.

For the incidence matrix of the hypergraph DM(n), the best results were produced by PUNC, which is the least sensitive to an increase in the number of rows in the input matrix. Recall that DM(n) has m = 2^(n/2) edges, and the search through them at every iteration of a step of the algorithm requires a rather large amount of time even for fairly small n. The most sensitive to the number of rows in the input matrix is OPT; the theoretical complexity of its step is proportional to m^2.

The best results in the dualization of the self-dual hypergraph SDFP(n) were again produced by PUNC, which was in most cases only slightly superior to RUNCM.

The algorithms RS and MMCS were the fastest in the dualization of the threshold graph TH(n) and the self-dual hypergraph SDTH(n). The incidence matrices of TH(n) and SDTH(n) are highly sparse, which, according to the authors of RS and MMCS, is a condition required for these algorithms to perform efficiently.

The tests on real-world problems showed that the best algorithm is RUNCM. Its superiority is especially pronounced for large input matrices.

5. CONCLUSIONS

The approach introduced in [7] and developed in [8–16] was used to address the design and experimental substantiation of asymptotically optimal dualization algorithms. Two types of algorithms were considered:

(1) ones in which the sets of columns generated by maximal identity submatrices are enumerated with a polynomial time delay with repeats (see [7, 12, 16]);

(2) ones in which the maximal compatible sets of columns are enumerated with a polynomial time delay without repeats (see [17, 18, 14]).

New dualization algorithms of the second type, namely, RUNC, RUNCM, and PUNC, were constructed. The general scheme for asymptotically optimal dualization algorithms of the second type was presented. Previously constructed algorithms, namely, OPT [14] and MMCS and RS [17, 18], were described in terms of this scheme. The new algorithms were described in detail with the use of recursive procedures, and the time complexity of their step was analyzed.

The experiments performed showed that the efficiency of the algorithms depends on the type of the problem to be solved and that there is no absolute leader among the algorithms tested. Nevertheless, RUNCM was found to be the fastest algorithm in most cases. The computations performed in this paper and in [17, 18] demonstrated that asymptotically optimal dualization algorithms surpass algorithms with a different structure, specifically, incremental quasi-polynomial-time algorithms.

ACKNOWLEDGMENTS

This work was supported in part by the Russian Foundation for Basic Research (project nos. 13-01-00787a and 14-07-00819a) and by the program "Leading Scientific Schools" (project no. NSh-4908.2014.1).

REFERENCES

1. D. Johnson, M. Yannakakis, and C. Papadimitriou, "On generating all maximal independent sets," Inf. Process. Lett. 27 (3), 119–123 (1988).
2. T. Eiter, G. Gottlob, and K. Makino, "New results on monotone dualization and generating hypergraph transversals," SIAM J. Comput. 32 (2), 514–537 (2003).
3. M. L. Fredman and L. Khachiyan, "On the complexity of dualization of monotone disjunctive normal forms," J. Algorithms 21, 618–628 (1996).
4. L. Khachiyan, E. Boros, K. Elbassioni, and V. Gurvich, "An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation," Discrete Appl. Math. 154 (16), 2350–2372 (2006).
5. E. Boros and K. Elbassioni, "Generating maximal independent sets for hypergraphs with bounded edge-intersections," LATIN 2004: Theoretical Informatics (Springer, Berlin, 2004), pp. 488–498.
6. E. Boros, V. Gurvich, K. Elbassioni, and L. Khachiyan, "An efficient incremental algorithm for generating all maximal independent sets in hypergraphs of bounded dimension," Parallel Process. Lett. 10 (4), 253–266 (2000).
7. E. V. Djukova, "Asymptotically optimal algorithm for constructing irredundant tests," Dokl. Akad. Nauk SSSR 233 (4), 527–530 (1977).
8. E. V. Djukova, "Discrete recognition procedures: The complexity of realization," Pattern Recogn. Image Anal. 13 (1), 8–10 (2003).
9. E. V. Djukova and R. M. Sotnezov, "On the complexity of discrete generation problems," Dokl. Math. 82 (3), 847–849 (2010).
10. E. V. Djukova and Yu. I. Zhuravlev, "Discrete methods of information analysis in recognition and algorithm synthesis," Pattern Recogn. Image Anal. 7 (2), 192–207 (1997).
11. E. V. Djukova, "The complexity of the realization of certain recognition procedures," USSR Comput. Math. Math. Phys. 27 (1), 74–83 (1987).
12. E. V. Djukova, "On the implementation complexity of discrete (logical) recognition procedures," Comput. Math. Math. Phys. 44 (3), 532–541 (2004).
13. E. V. Djukova and Yu. I. Zhuravlev, "Discrete analysis of feature descriptions in recognition problems of high dimensionality," Comput. Math. Math. Phys. 40 (8), 1214–1227 (2000).
14. E. V. Djukova and A. S. Inyakin, "Asymptotically optimal construction of irredundant coverings of an integer matrix," Mat. Voprosy Kibern. 17, 235–246 (2008).
15. E. V. Djukova and P. M. Sotnezov, "On the complexity of the dualization problem," Comput. Math. Math. Phys. 52 (10), 1472–1481 (2012).
16. E. V. Djukova and P. A. Prokofjev, "On asymptotically optimal enumeration for irreducible coverings of a Boolean matrix," Prikl. Diskret. Mat., No. 1, 96–105 (2014).
17. K. Murakami and T. Uno, "Efficient algorithms for dualizing large-scale hypergraphs," Proc. 15th Workshop on Algorithm Engineering and Experiments (ALENEX 2013).
18. K. Murakami and T. Uno, "Efficient algorithms for dualizing large-scale hypergraphs," Discrete Appl. Math. 170, 83–94 (2014).
19. UCI Machine Learning Repository; http://archive.ics.uci.edu/ml/
20. Frequent Itemset Mining Dataset Repository; http://fimi.ua.ac.be/data/
Translated by I. Ruzanova