A Graph-Based Ant Colony Optimization for Association Rule Mining

This study is aimed at proposing a graph-based ant colony optimization (ACO) approach for association rule mining (ARM). The ACO-ARM is a two-phase ap...

3 downloads 141 Views 1MB Size

Download PDF

Arab J Sci Eng (2014) 39:4651–4665 DOI 10.1007/s13369-014-1096-5

RESEARCH ARTICLE - COMPUTER ENGINEERING AND COMPUTER SCIENCE

A Graph-Based Ant Colony Optimization for Association Rule Mining Ghassan Saleh Al-Dharhani · Zulaiha Ali Othman · Azuraliza Abu Bakar

Received: 8 August 2012 / Accepted: 31 May 2013 / Published online: 7 May 2014 © King Fahd University of Petroleum and Minerals 2014

Abstract This study is aimed at proposing a graph-based ant colony optimization (ACO) approach for association rule mining (ARM). The ACO-ARM is a two-phase approach comprising a Boolean transactional data representation scheme and the graph-based ACO. The first phase enhances the normal Apriori algorithm and engages in a data representation scheme. The data representation involves an adapted Boolean matrix representation of the transactional data. A standard Apriori algorithm is applied to the represented data, and n-frequent itemsets are generated. The second phase embellishes the ACO-ARM, which relies on the graph of 2-frequent items to generate the final frequent itemset. We have conducted two experiments. The outcomes of these tests reveal that the graph-based ACO-ARM enhances execution time compared to the standard Apriori algorithm. In addition, ACO-ARM improves the process of data representation in the Apriori algorithm. Keywords Ant colony optimization · Association rule mining · Graph-based · Data representation · Frequent items

G. S. Al-Dharhani (B) · Z. A. Othman · A. A. Bakar Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia e-mail: [email protected] Z. A. Othman e-mail: [email protected] A. A. Bakar e-mail: [email protected]

1 Introduction Association rule mining (ARM) was introduced by Agarwal et al. [1]. After the advent of the Apriori algorithm, ARM has rapidly gained popularity. Creating association rules from frequent itemsets is the main focus of this study [2–4]. This research was inspired by market basket analysis. ARM is used in many of the fields, including risk analysis in business organizations, the medical field, market basket data analysis, cross marketing, catalog design, clustering, and classification. The Apriori [2] algorithm is useful in computer science for data mining and learning association rules [5]. It is intended to function on business databases that hold records, such as the types of goods bought by people or the frequency of website visits. Initially, frequent itemsets that are produced hold the criteria of minimum support [6]. In consequent steps, the confidence measures are validated. In turn, the confidence measures induce the generation of rules.

123

4652

Arab J Sci Eng (2014) 39:4651–4665

In graph theory, usually linked nodes are represented using complete graphs [7] on n vertices, which are denoted Kn. The graph is simple, and an edge exists between every set of different vertices. A path is a series of unique vertices that are linked by edges. The important phase for connected nodes in a graph will have a pair of vertices and edges between these vertices in the graph. A graph-based data mining approach is used to identify frequent itemsets based on frequent pattern graphs. These patterns can be found in frequent itemsets [8– 11]. Kuo et al. [12] introduced the ant colony optimization (ACO) reduction technique [13] and the ant colony system (ACS) algorithm. These approaches offer extensive condensed rules compared to the Apriori method. Nevertheless, the authors have stated that these approaches have a few drawbacks in the preparation stages that need to be addressed. For example, Huang et al. [11] proposed the graph-based algorithm that builds a graph and then assigns items to several subgroups to mine association rules. The literature about ACO-ARM in [14–16] describe a variety of approaches and related applications. As graph-based ARM techniques have a variety of approaches and structures, they are not compared with other graph-based approaches, except for the standard Apriori algorithm. Hence, we have carried out experiments to compare ARM techniques with the general Apriori algorithm because it is the common benchmark algorithm. Apart from the introduction above, the paper consists of the following sections: Sect. 2 covers the fundamental concept of the graph-based ACO for ARM, Sect. 3 demonstrates the main methodology for the proposed approach and explains experimental design, Sect. 4 explains the experiment and result, and Sect. 5 concludes our study.

2 Association Rule Mining Naturally, a business-related database grows rapidly. Hence, data mining has become vital in today’s world. In data mining, Apriori [1] and FP-growth (frequent pattern growth) [17] are the core techniques utilized in ARM to obtain sequence patterns for items in a database. Mining frequent pattern is believed to be one of the vital approaches in data mining. This approach is also known as market basket analysis from the ARM task [18]. Patterns (sets of items, sub-sequences, or substructures) occur frequently in datasets. A formal definition of support(x) is based on Aggrawal and Srikant’s work [2]: Let I = i 1 , i 2 , . . . , i m be a pair of m literals known as items. Let database D = t1 , t2 , . . . , tn be a pair of n transactions, both with a pair of items from I . A transaction t ∈ D consists of itemset x if X ⊆ t. Therefore, the support of item-

123

set x is defined as shown in Eq. (2) below, where support (x) = t ∈ D, X ⊆ t/t ∈ D. Mining for frequent patterns in huge datasets is associated with the Apriori algorithm and has many applications; for example, market basket data analysis, cross marketing, catalog design, sale campaign analysis, web log (frequency count) analysis, and DNA sequence analysis. Each transaction t ∈ D is an itemset where t is a proper subset of I. A unique identifier called TID is associated with each transaction. We claim that transaction t supports x, a set of items in I , if x is a proper subset of t. We assume that the items in a transaction or an itemset are sorted in lexicographic order. He et al. [19] stated as shown in Eqs. (1, 2) that an itemset X has support s if s% of the transactions support X . Support of X is denoted as Supp(X). An association rule is an implication of the form X => Y , where X and Y are subsets of I and X ∩ Y = Ø. We emphasize that the rule X => Y applies to database D with confidence C which is denoted as Conf(X→ Y ) if confidence (X → Y ) = P(Y |X ) =

n(X ∪ Y ) ≥C n(X )

(1)

In addition, the rule X => Y has support S if support (X → Y ) = P(X ∪ Y ) =

n(X ∪ Y ) ≥S N

(2)

where N represents the number of transactions in D. Here, support is a measure of the frequency of a rule, and confidence is a measure of the strength of relationships between sets of items. 2.1 Ant Colony Association Rule Mining Graph-Based Data Mining is a fundamental approach based on frequent pattern graphs that can locate frequent itemsets from a database [9,20]. This approach, which was introduced by Huang et al. [11], is superior to the FP-growth algorithm in execution time for synthetic transaction databases. The algorithm represents large itemsets as a graph, constructs the graph based on two frequent itemsets, and divides items into several subgroups for ARM. Swarm Intelligence (SI) is a regulation for artificial, natural, and self-organized systems. SI is inspired by collective behavior concepts. This approach is specifically employed in artificial intelligence. In 1989, Beni and Wang proposed this expression for distributed problem-solving machines also known as robotic systems [21]. Ant Colony behavior is one of the most popular techniques for swarm behavior. The ACO algorithm is helpful in solving problems that require finding paths to destinations. Artificial ‘ants’ ascertain the optimal solutions by moving through a parameter space (e.g., locations, cities, nodes of a graph) to find superior paths through graphs. Real ants

Arab J Sci Eng (2014) 39:4651–4665

deposit pheromones on a path between a food source and their nest to indicate the shortest path from source to destination [22,23]. The ant system (AS) algorithm is applied in various combinatorial optimization problems. The traveling salesman problem (TSP) is the most popular example [24–26]. However, AS has also been used successfully in other combinatorial problems: the AS-QAP algorithm has been used for quadratic assignment problems (QAP), AS-JSP for the jobshop scheduling problem (JSP), AS-VRP for the vehicle routing problem (VRP), AS-SCS for the shortest common super sequence (SCS) problem, and MMAS-TT for the university course timetable problem (TTP). Recently, ARM has been one of the most important applications of ACS that was proposed by Kuo and Shih [10,14]. ARM provides more condensed association rules than the Apriori method. In addition, Kuo and Shih presented ARM through the integration of clustering analysis and an ant colony system. Meanwhile, Shang et al. [13] proposed the ant colony optimization algorithm for solving the traveling salesman problem with association rules. Several studies on ACO for ARM also can be observed in [11,15].

2.2 Graph-Based Ant Colony Algorithms ACO is a part of SI [1]. In nature, ants have the capability to discover the shortest distance to and from a food source and their nest through the deposition of pheromones. Expanding on this phenomenon, let us assume that a static, connected graph G = (N , A) [23], where N is the set of n = |N | nodes, and A is the set of undirected arcs linking them. The set of points between which we seek the minimum-cost path are the source and destination nodes. Cost is calculated as in other minimum-cost path problems (when the cost of arcs is given by their length, the minimum-cost path problem is the same as the shortest-path problem). Occasionally, in reference to the shortest-path-finding behavior of real ants, we call the source the “nest” and the food source the “destination.” The description of a graph, sub-graph, and graph isomorphism (similarity in form) are given. A graph G = (V, E), where V symbolizes a set of nodes (vertices) and E represents the set of edges that connects the nodes. It is easy to extend the graph-based scheme. The graph is measurable and symbolizes itemsets of transactions. We can likely capture frequent itemsets from a large database of customer transactions using a graph-based approach. Computational problems can be reduced by identifying acceptable paths through graphs (ACS). Keeping the intrinsic structural information of the original document intact is the primary advantage of graphbased techniques. The graph-based approach with association rules has the following common steps with ACO:

4653

1. Creation of completely connected graphs from 2-frequent itemsets 2. Initialization of ACO-ARM parameter settings 3. Initialization of ant colony 4. Initialization of pheromones 5. Construction of the pheromones matrix 6. Computation of heuristic information for all nodes with their (weighted) edges in the connected graph 7. Construction of the heuristic matrix 8. Evaluation of pheromones and ants 9. Updating of the quantities of pheromone and evaporation of graph tracing 10. Acquisition of frequent patterns 11. Creation of the new ant colony 12. Obtain optimal last frequent pattern While implementing the ACO in mining association rules, many vital principles are accepted, such as computing heuristic information for all nodes of the graph, obtaining the weight for each edge of vertices, and initializing the parameters (alpha and beta) for cardinality. The preliminary number of ants in a colony is determined by initializing the number of ants and the number of iterations (maximum cycles). These parameters are selected for the graph with the most nodes. Thus, the number of ants will be equal to the number of nodes. The ants pass the construction graph and make a probabilistic decision at each vertex. The process of updating pheromones will continue until the termination criterion is met. Then, frequent patterns are acquired. 3 Proposed Framework In the graph-based ACO for ARM, two main phases are involved: in Phase I, a Boolean-based transactional data representation scheme in Boolean matrix form is proposed to generate n-frequent itemsets and association rules. The data representation scheme tends to enhance the basic Apriori algorithm. The flowchart and schematic diagram of (Phase I) are illustrated in Fig. 1. In the second phase, a graph-based ACO algorithm for ARM is proposed based on the completed graph that was created from the 2-frequent itemsets in Phase I. Figure 2 illustrates the two phases of the framework for the proposed approach. The flowchart and schematic diagram of (Phase II) are illustrated in Fig. 3. 3.1 Phase I: Preliminary Data Representation to Boolean Matrix The first phase is concerned with preliminary data representation. This process, along with similar representations, will be fed into the second phase. The Boolean data representation scheme is implemented to represent items in the

123

4654

Arab J Sci Eng (2014) 39:4651–4665

Fig. 1 Schematic diagram: Phase (I) based on two stages

Dataset

Data Representation

2-frequent itemset for Phase II

Apply Apriori

Candidate is not empty

Yes

Capture once Mining association rules

n-frequent itemset (i=i+1)

transaction database as a Boolean matrix. Then, the standard Apriori algorithm is applied on this new Boolean item representation, where n-frequent itemsets and association rules are generated. This enhances the execution time of the standard Apriori algorithm. A novel transactional data representation scheme in the form of a Boolean matrix is proposed to help us to generate an n-frequent itemset. Later, the proposed B_Apriori algorithm is performed to obtain association rules because of the improved efficiency of detecting frequent itemsets. The construction of an n × m Boolean matrix of items M B (i, j) is performed after loading the data from a sparse dataset, where n is the number of items (i = 1, 2, . . ., n) and m is the number of transactions ( j = 1, 2, . . ., m) as shown in Eq. 3. 1, i j (3) M B (i, j) = 0, i j

4. Use regular expression to replace items by item Vector’s index. 5. Replacement process will reduce the capacity of Boolaen matrix. 6. The columns of Boolean matrix are discritized data while the number of rows is equivalent to the number of transactions. 7. Apply Bin-Apriori on Boolean matrix, to generate nCandidates and get m-frequent itemsets. 8. Occupy 2-frequent itemsets that will be passed to Phase (II) for applying ACO algorithm. 9. B_Apriori that will complete all frequent itemsets and save them with their minSupport values in AVL tree. 10. Capture the last frequent and set confidence value. 11. Obtain all possible subsets with their weight by set theory. 12. Generate all possible association rules which depend on weight and confidence value.

Thus, all possible association rules can also be generated based on their confidence value by the prime possible subsets obtained with their weights via set theory at the end of processing for Phase I. In Phase (I), there are two main stages involved: firstly, a new transactional data representation scheme in the Boolean matrix form is proposed to help us to generate n-frequent itemset, and secondly the improvement to get association rules for proposed Apriori algorithm, because of the efficiency to detect the last frequent itemsets. Automated Boolean matrix with association rules involves several common steps in Apriori algorithm as follows:

The conceptual framework involves two main phases: the initial Phase (I) is executed by transferring data once from the transaction table to the database. This phase is called the data representation scheme because it automatically converts the data from a transaction database into a Boolean matrix. Next, the N-frequent itemsets are generated from the Apriori algorithm using the Boolean matrix from the previous stage. Table 1 shows transaction items, and Table 2 shows an example data discretization. This discretization is part of the data reduction for a given continuous number of values, particularly for numerical data. The discretization process is performed by swapping indices of arrays, as shown in Table 3. These indices were generated from an AVL tree that produces unduplicated sorted frequent patterns. Tables 1 and 2 represent the Phase I. Suppose that we have synthetic sparse data from dataset TI D ∈ TID, where TID is the original transactional database with multiscattered records. Every record holds many data items. The items are sorted and tagged into ordered discrete values.

1. Compute unduplicated sorted n items and m transactions during loading process by AVL tree. 2. Save all unduplicated n items in itemVector. 3. Get the value of parameters: number_of_items, and number_of_rows automatically, then initialize settings for minSupport and minConfident.

123

Arab J Sci Eng (2014) 39:4651–4665 Fig. 2 The graph-based ACO for ARM framework

4655

Phase (I): Read data to AVL-Tree

Sorted unduplicated data (Database) data sets

Traverse DB Once

1 2 3 4 5 7 8 9 10 55

Sorted data

Apply data discretization

Data Representation Scheme

Represent

Retrieve

Discrete Data

Get 1 to n-frequent Itemsets 1-fq-it

2-fq-it

n-fq-it

Capture

Apply Apriori with threshold minSupport

------

With threshold minConfidence

Phase (II):

Set

Create a complete connected graph from 2-frequent itemsets

Association Rules

Obtain

τ 0 = threshold minSupport

Employ ACO

Heuristic Information

Pheromone Matrix

Examine Transition Rule Statement

Tour of ant (1) Tour of ant (2) Tour of ant (n)

Update Local Pheromone

Construction solution Tour of ant (0)

Update global Pheromone

Best tour from best ant

The initial step is to set the value of minimum support number Sto 0.22. The minimum confidence C needed is 0.7. Table 2 shows the new item representation. This is performed by swapping the indices of arrays in Table 3. The indices were generated from an AVL tree that produces unduplicated sorted frequent patterns. Then, the n × m Boolean matrix of items is constructed based on Eq. (1), where n is the number of distinct items (n = 5) and m is the number of transaction (in this case, m = 9). Table 4 shows the Boolean matrix for items obtained from the item database in Table 3. The numbers of 1’s in all of the transactions are computed and denoted as (count of 1’s

Last frequent itemsets

transactions) ct. We compute the probability via confidence of an item i occurring in transactions. This probability is a computed weight. The computed weight is calculated by dividing the number of counts for 1, ct, over the number of transaction n, as shown in Eq. 5. ct ≥S m where ct is a (count of 1’s transactions)

computed _weight =

(4)

The final step of (Phase I) generates rules based on the equivalence subset theory, which states that itemsets can be represented as a set of specific rules. In this example, the

123

4656

Arab J Sci Eng (2014) 39:4651–4665

Fig. 3 Flowchart of proposed graph-based ACO algorithm

Initialization Read once 2-frq items Capture 2-freq itemset to construct Completed Connected Graph

Compute

Output of Phase I with n-frequent itemset

ηij (t )

Create trail of ants and set tour for each ant

Examine Transition Rule Statement

NO

q q 0

YES

Apply the act of Biased Exploration

Ant(1)

Tour

Apply the act of Eploitation

Construction Solution

Local Pheromone is updated

Tour

Ant(2) …. Ant(n)

Tour

Global Pheromone is updated

Termination Iterations

NO

YES Last-frequent Itemsets

last frequent itemset is {1, 2, 5}. Therefore, several rules generated are Rule 1: 1 ˆ 2 → 5; Rule 2: 1 ˆ 5 → 2; Rule 3: 2 ˆ 5→ 1; Rule 4: 1 → 2 ˆ 5, 2 → Rule 5: 1 ˆ 5; and Rule 6: 5 → 1 ˆ 2. The performance of Phase I is compared with the standard Apriori algorithm [1]. 3.2 Phase II: The Graph-Based ACO for Association Rule Mining Phase II develops a graph-based ACO algorithm for ARM using the data representation in Phase I. Phase II also

123

improves the performance of the approach that was introduced in the previous phase. Phase II is initiated by capturing the 2-frequent itemsets from Phase I for creating a completely connected, weighted graph that is suitable for artificial ants. Thus, the 2-frequent itemsets are captured with itemsets that have minSupport values. These minSupport values coordinate the weights of the graph. Then, the ACO algorithm is employed to obtain the last frequent itemset. The following steps illustrate the main algorithm: Stage 1: Initialization:

Arab J Sci Eng (2014) 39:4651–4665 Table 1 Transactions items

4657 TID

T100

T101

T102

T103

T104

T105

T106

T107

T108

List of items

7, 8, 55

8, 10

8, 9

7, 8, 10

7, 9

8, 9

7, 9

7, 8, 9, 55

7, 8, 9

1

2

3

4

5

1

1

1

0

0

1

2

0

1

0

1

0

3

0

1

1

0

0

4

1

1

0

1

0

5

1

0

1

0

0

6

0

1

1

0

0

7

1

0

1

0

0

8

1

1

1

0

1

9

1

1

1

0

0

ct

6

7

6

2

2

ct/m

0.66

0.77

0.66

0.22

0.22

10. Compute ηi j (tc) = d1i j , where di j = min Supporti j is the minimum support value of each 2-frequent item. 11. If q ≤ q0 , then choose the next node (item) j with the transition rule. 12. Apply Eq. (8), otherwise apply Eq. (7). 13. Choose next edge ij if q ≤ q0 is not true, then apply the transition probability in Eq. (7). Afterward, the construction of the solution is performed by moving the k th ant on the graph from node i to node j. Each ant has added items (nodes) to its tour. 14. The local pheromone update is performed by all of the ants after each construction step. Each ant is applied by Eq. (9) only to the last edge traversed. 15. When the cycle of construction solution has finished, then 1 = L best if the best ant used edge (i, j) in compute τibest j its tour, otherwiseτibest = 0. j 16. The global pheromone is applied by Eq. (5) at the end of each the iteration by only one ant. This global pheromone update represents the best so far. 17. Set MaxIteration = MaxIteration + 1 and tc = tc + 1. Then, repeat. 18. Repeat steps 8–12 until MaxIteration is terminated. 19. Each run generates its solution of the last frequent itemset during mining results.

1. Load the data and compute the unduplicated, sorted n items and m transactions during the loading process for the AVL tree. 2. Initialize n as the number of items and m as the number of transactions. Then, set minSupport and minConfidence in Phase I. 3. Set MaxIteration = the maximum number of cycles or iterations. 4. Set tc = 0 as the (time counter) tc which is a counter loop. 5. Set m = the number of ants (trail of ants). 6. Set n = the number of tour locations for each ant. 7. Set τ0 = the minimum support that is used in Phase I and τi j = 0. 8. Set q = random_value, q0 = 0.9, β = c, α = c ∈ (0, 1], ρ ∈ (0, 1) and tour (s) = φ. Stage 2: Construction of the completed, connected graph for ACO: 9. Read the 2-frequent itemsets from the output data of Phase I once, and save all data in di j = the weights of edges between nodes of the graph. Stage 3: Graph-based with ACS: (generation of the new ant colony)

The ants move on the completely connected graph that was generated by snapping all possible candidates of 2-frequent items together. These candidates have three columns: the frequent pattern, count, and computed weight for each frequent itemset. The frequent pattern has two parts, source and destination (e.g., 1 − 2 = 1 is the source, and 2 is the destination), that comprise the nodes of the graph. In addition, the computed weight is the weight of the edge that connects two nodes, as shown in Table 5. A graph-based ACO method has been proposed based on the 2-frequent itemsets that are generated in Phase I to construct a complete weighted graph. The data representation scheme is the base of Phase I. The generation of a Boolean matrix from the transactional database is used for mining association rules with n-frequent itemsets and then applied to the ACO algorithm with a graph-based approach to obtain the last frequent itemset. The generation of 2-frequent itemset is also explored in Phase I. Therefore, it indicates that the ant colony environmental analysis is related to Phase II and illustrates the steps of the proposed graph-based ACO algorithm. In real life, natural ants choose a path based on the density of pheromones. The transformation of real ants to artificial ants includes two main operators: heuristic data (ηi ) and a

Table 2 Transactions items TID

T1

T2 T3 T4

T5 T6 T7 T8

T9

List of items

1, 2, 5 2, 4 2, 3 1, 2, 4 1, 3 2, 3 1, 3 1, 2, 3, 5 1, 2, 3

Table 3 Sorted frequent pattern array 1 7

2 8

3 9

4 10

5 55

Table 4 Boolean matrix of the itemsets

123

4658

Arab J Sci Eng (2014) 39:4651–4665

Table 5 All possible candidates of 2-frequent itemsets from a Boolean matrix Nodes

1–2

1–3

1–4

1–5

2–3

Count

4

4

1

2

4

Weight

0.44

0.44

0.11

0.22

0.44

Nodes

2–4

2–5

3–4

3–5

4–5

Count

2

2

0

1

0

Weight

0.22

0.22

0.00

0.11

0.00

Table 6 Adjacency/heuristic matrix for graph 1

2

3

4

5

1

0

0.44

0.44

0.11

0.22

2

0.44

0

0.44

0.22

0.22

3

0.44

0.44

0

0.0

0.11

4

0.11

0.22

0.0

0

0.0

5

0.22

0.22

0.11

0.0

0

pheromone trail (τi ). The heuristic data are computed using ηi = 1 di , where d is the distance or weight between two nodes in the completed graph. In this algorithm, each artificial ant is inspired by the behavior of real ants. The fundamental idea is to represent objects of nodes in the adjacency matrix that have the same number of rows (source), columns (destination), and data (weight). Therefore, the heuristic matrix is equivalent to the adjacency matrix in Table 6. The adjacency matrix is representative of the completed graph. The pheromone matrix will be updated during the movement of artificial ants on the completed graph when applying the ACO algorithm. The main property of ACO is that pheromone values are updated by all ants that have completed the tour. The graph components vi j are the nodes of the graph, and the pheromone update for τi j (that is, for the pheromone associated with the edge joining nodes i and j) is performed by pheromone evaporation, as shown in Eqs. (6–8) are adapted from Dorigo et al. [23–25,27]. τi j ← τi j .(1 − ρ) +

n

τikj

(5)

k=1

where ρ ∈ (0, 1] is the evaporation rate, n is the number of ants, and τikj s the quantity of pheromone laid on edge (i, j) by the kth ant:

τikj

=

⎧ ⎨ ⎩

1 Lk

if ant(k) used edge(i, j) in its tour

0

otherwise

p

decision at each vertex. The transition probability p(vi j |sk ) of the kth ant moving from node i to node j is given in Eq. (7). ⎧ β τiαj .ηi j p ⎪ ⎪ ⎨ ∈N (s p )τ α .ηβ if j ∈ N (sk ), V p k ij ij ij (7) p(vi j |sk ) = ⎪ ⎪ ⎩ 0 otherwise p

where N (sk ) is the set of vertices that do not yet belong to p the partial solution sk of ant k, α and β are parameters that control the relative importanceof the pheromone versus the heuristic information ηi j = 1 di j , and di j is the length of vertex vi j (i.e., of edge (i,j)). For example, if the initial values of α = 1, β = 1, pheromone = 1, and d1 = 0.44, the result of the heuristic data (ηi ) is η1 = 1/d1 = 2.2727. The ants in the AS pass through the construction graph while constructing solutions. At each location, the ants p make probabilistic decisions. The shift prospect p(vi j |sk ) of the k-th ant moving from node i to node j is given by Eq. (7). The first vital difference between ACS and AS is the form of the decision rule used by ants during the construction process. Ants in ACS use the so-called pseudorandom proportional rule: the prospect of an ant moving from i node to j node is based on a random variable q that is evenly dispersed over [0, 1] and q0 . If q ≤ q0 , then q is distributed β between the feasible components τil ηil that maximize the product, as shown in Eq. (8). Otherwise, the AS equation is the same as Eq. (7). ⎧ β ⎪ ⎨ arg maxvi j ∈ N (s p )τi j .ηi j if q ≤ q (8) j= ⎪ ⎩ p(v |s p ) otherwise ij k This overambitious rule favors the exploitation or utilization of pheromone information. This overambitious exploitation is compensated by the inclusion of an expanded component, such as a local pheromone update. After each construction step, the local pheromone update is performed by all ants, and the local pheromone update is applied only to the final edge crossed: τi j ← τi j .(1 − ϕ) + ϕ.τ0

(9)

where φ ∈ (0, 1] is the pheromone decay coefficient, and τ0 is the initial value of the pheromone, as described by Dorigo and Gambardella [24].

(6)

where L k is the tour length of the k-th ant. In an ant system, when constructing solutions, the ants traverse the construction graph and make a probabilistic

123

4 Experiments This section explains the results of the framework implementation. The framework has two phases that apply to the graphbased ARM with ACO algorithm compared to the normal

Arab J Sci Eng (2014) 39:4651–4665

Apriori algorithm. In addition, a new transactional data representation scheme tailored for graph-based ARM is proposed. Dataset T10I4D100K is chosen based on the number of transactions, average length and type (Sparse). We run our experiments on a laptop with the following specifications: Intel(R) Core(TM) Duo CPU 2.20 GHz, 2 GB RAM, and a 32-bit version of Windows 7. We also use the concept of object-oriented programming and the Java programming language to implement our code. 4.1 Experiment I To validate the performance of the Boolean matrix data representation proposed in Phase I, experiment I was carried out. A new transactional data representation scheme in the automated Boolean matrix form is proposed to produce the Nfrequent itemsets and association rules. Three main processes involved are as follows: first, the AVL tree representation phase (AVLT), where data are read into the AVL tree to obtain unduplicated items and return the number of items and transactions automatically. Second, the Boolean matrix is generated. A value of 1 indicated that an item is available in the original dataset; a value of 0 is used for all other scenarios. Finally, the Apriori algorithm is applied on this matrix to obtain the N-frequent itemset. Table 7 shows the execution time taken in all three processes for AVLT, BLMATRIX the Boolean matrix representation process, and APRIORI_B for applying Apriori algorithms on the Boolean matrix. The experiments used various threshold minimum support values ranging from 0.5 to 4 (Agrawal and Srikant [2]; Yen et al. [8,9]). The time taken by AVLT for each threshold min_sup was the shortest compared to BLMATRIX and APRIORI_B because the representation process was simple and straightforward. The variation of time in AVLT depends on the size of transaction items in the database and is not significant enough for further analysis. Similarly, various minimum supports in the BLMATRIX process do not give any information about time trends because they depend on the distribution of items and the difficulty of transferring those items from the AVL tree to a matrix. However, the APRIORI_B process indicates that as the min_sup threshold values increase, the time for generating items is reduced significantly. This reduction is due to the higher minimum support threshold, which removes items from consideration for rule generation. Table 8 shows the time taken by processes in Phase I (AVLT, BLMATRIX and APRIORI_B) with a minimum threshold support of 0.5 and 5 different sizes of data. It is clearly shown that the processes will take more time as the amount of data increases. The experiment for the Apriori algorithm, based on the Boolean matrix (APRIORI_B), is carried out with five sets of data from the main T10I4D100K.dat dataset. Each set of

4659 Table 7 Execution time of processes in Phase I (with various minimum support threshold values) Threshold minSup (%) AVLT

0.5

1

1.5

2

1.235

1.176

1.214

1.275

BLMATRIX

1,120.316

1,120.46

1,148.002

1,148.525

APRIORI_B

3,282.526

1,600.445

648.525

295.037

2.5

3

3.5

4

Threshold minSup (%) AVLT

1.029

1.029

1.029

1.107

BLMATRIX

1,197.881

1,620.454

1,635.314

1,660.728

APRIORI_B

160.493

79.261

60.076

47.845

Table 8 Experimental results of Phase I with 5 random subdatasets (minSup threshold (0.5 %)) Num_data (thousands)

10

20

30

40

50

Number of items 868

869

870

870

870

AVLT

0.172

0.265

0.343

0.469

0.578

BLMATRIX

111.409 345.775 516.6

437.682

864.351

APRIORI_B

329.563 656.995 1,052.143 1,332.232 1,668.905

data is different from the previous set of data by 10,000 transactions. The first dataset contains 10,000 transactions, the second contains 20,000 transactions, the third dataset contains 30,000 transactions, and so forth. Moreover, the numbers of items are 868, 869, 870, 870, and 870. The five sets are randomly collected from the main dataset by choosing 10,000 transactions from the main dataset for experiments. The same mechanism is repeated with 20,000 transactions, 30,000 transactions, 40,000 transactions, and 50,000 transactions, with the same minSupport threshold of 0.5 % to represent the efficiency of APRIORI_B for generating association rules with an increasing number of transactions. During data representation, with the help of the AVL tree, the system calculates the number of unduplicated items. The Apriori algorithm is a common ARM algorithm. This algorithm is compared to APRIORI_B. The comparisons are performed because the proposed APRIORI_B applies the same mechanism as the standard Apriori algorithm, but in a different manner for computing 1’s in the Boolean matrix in all rows to generate all possible candidates with N-frequent items. We have adopted the same approach that other researchers have carried out for these comparisons. Similar minimum support threshold settings are used for the number of items mentioned earlier. Table 9 presents the execution time taken for both APRIORI_B and the Apriori algorithm (APRIORI) for 868 items and different minimum supports (minSup threshold). The results clearly show that the execution time is faster for APRIORI_B. The results also showed differences in the execution times of AVL tree and Boolean matrix. The proposed method shortens the search

123

4660

Arab J Sci Eng (2014) 39:4651–4665

Table 9 Execution time of Phase I in seconds (868 items) minSup threshold (%) Apriori Apriori_B minSup threshold (%) Apriori Apriori_B

0.5

1

1.5

2

10,387.57

8,630.668

4,868.884

2,111.979

369.706

153.899

62.525

n-frequent itemset

minSupport ≤ 0.005

33.072

32,192,245,298,446

0.00732

184,250,393,773,846

0.00693

192,474,573,806,828

0.00559

302,354,494,501,696

0.00504

302,354,494,501,732

0.00503

302,354,494,696,732

0.00509

302,354,501,696,732

0.00509

302,494,501,696,732

0.00503

354,494,501,696,732

0.00507

2.5

3

3.5

4

1,062.788

446.751

255.013

180.398

15.382

8.412

6.131

5.101

process in two ways. With the five sets of data from the main dataset in Table 8, the average data representation execution time is (156.9873) s, and the average read time from the AVL tree is (0.187286) s. This clearly reveals that the Apriori algorithm with a Boolean matrix is faster than the Apriori algorithm. The execution time is shortened by the transformation of ‘sparse data’ that are related to the average records, that have no “similar” length in multidimensional space in the Boolean matrix, and that have “similar” length in the multidimensional matrix. This mechanism has n-frequent itemsets that are better than the Apriori algorithm in the normal case. Figure 3 illustrates the results of APRIORI and APRIORI_B. The threshold value of minSupport is greater than or equal to 0.005 for the dataset that has 100,000 transactions. The execution time for data representation is 1,331.46 s. The execution time to read data from the AVL tree is 1.136 s, and the execution time of APRIORI_B is 3,382.959 s. This proves that the Apriori algorithm with a Boolean matrix is faster than the Apriori algorithm that is shown in Table 7. The last stage is to validate the quality of the association rules that are generated from APRIORI_B and APRIORI. It is obvious that reduced APRIORI_B will generate fewer rules than the standard Apriori algorithm. Hence, a measure for quality of rules is needed to ensure that the generated rules have a high confidence level in comparison with rules of smaller sizes. A measure of quality is also needed to ensure that important knowledge is preserved while conducting the proposed method. We compute nfrequent itemsets with their threshold minSupport >= 0.005 in our experiments. Table 10 shows n-frequent itemsets with their threshold minSupport values for the T10I4D100K.dat dataset. The rule generation process starts by finding all subsets of set {354, 494, 501, 696, 732} that are contained in the last frequent itemset of dataset. We choose (354–494–501–696– 732) with its threshold minSupport equal to 0.00507 and a confidence value of 0.7 (70 %) to measure the acceptance or rejection of the association rules. In addition, switching this value from 1 to 100 will affect the number of rules. The confidence is the controller in Table 11, which shows the

123

Table 10 n-frequent itemset with Threshold minSupport (T10I4D 100K.dat dataset)

Table 11 Selected and rejected rules with minSup≥0.005 and min_confidence = 70 % n-frequent itemset

No. of rules

Selected rules

Rejected rules

32,192,245,298,446

24

18

6

184,250,393,773,846

24

20

4

192,474,573,806,828

24

18

6

302,354,494,501,696

24

18

6

302,354,494,501,732

24

19

5

302,354,494,696,732

24

18

6

302,354,501,696,732

24

18

6

302,494,501,696,732

24

19

5

354,494,501,696,732

24

18

6

216

166

50

Total

selected rules. The rules are selected based on these confidence values, leading to 6 rules being rejected and 18 rules being selected. Table 11 summarizes the total number of rules generated and the total number of selected and rejected rules with a confidence level of 70 %. The total number of rules is divided into 166 accepted rules and 50 rejected rules. Thus, the threshold minSupport is greater than 0.005 for each n-frequent itemset. In addition, a confidence of 70 % indicates that if the confidence value is different, then the number of accepted and rejected rules will vary because the rules are based on the weight of the last minSupport, which is related to the last frequent pattern. 4.2 Experiment II: Graph-Based ACO The second elementary phase in this study is to construct a completely connected, weighted graph. This graph is the most significant stage of work for applying the ant colony algorithm. Therefore, the process begins by extracting the 2-frequent itemsets from the APRIORI_B algorithm. Each row in the 2-frequent itemset has three values (columns): the

Arab J Sci Eng (2014) 39:4651–4665

4661

Table 12 Average results of ACO with last frequent itemsets and min and max best path Alpha α = 0.1 rate

Max

Min

Execution time ACO

0.1

0.024144

0.01022

5.59805

0.2

0.027757

0.01027

5.6308

0.3

0.028021

0.01003

5.47245

0.4

0.022527

0.01052

5.4756

0.5

0.026556

0.01064

5.43115

0.6

0.025382

0.0107

5.4593

0.7

0.026499

0.01068

5.42335

0.8

0.024918

0.01004

5.46935

0.9

0.024339

0.01053

5.4148

Average

0.025571

0.010403

5.486094

source node, destination node, and the weight. These three values are the core components for the graph. During data configuration, the system captures the 2-frequent itemsets. The experiments for the second phase start by setting the following initial values: 1. 2. 3. 4. 5.

MaxIteration = maximum number of cycles or iterations. tc = 0 as time counter. m = number of ants (trail of ants). n = number of tour locations for each ant. τ0 = minimum support that is used in Phase I, and τi j = 0. 6. q = random_value, q0 = 0.9, β = c, α = c ∈(0, 1], ρ ∈ (0, 1), and tour(s) = φ.

In each experiment, the value of α = c ∈(0,1] starts at 0.1 and increases to 1 over 20 runs. Each value of α runs for MaxIterations times. In addition, β = 2 and 0 < ρ <1. The value of ρ starts at 0.1 and increases to 0.9. During each experiment, ρ remains constant for MaxIterations times. The main idea is to fix a value for α and change the value for the evaporation rate from 0 < ρ < 1. Table 12 presents the execution time of the ACO with the last frequent itemsets and the min and max best paths. The value of q = random_value, q0 = 0.9, β = 2, α = 0.1 and ρ = 0.3, and min and max are the extreme paths on the graph. Setting the number of ants to 10 (m = 10) and the number of run times to 20, each ant has 100 iterations in each run with different values for α = c and ρ = c. Table 13 provides the complete execution time for the experiment using q = rand_value, q0 = 0.9, β = 2, α = 0.1 and ρ = 0.3. These parameter values give a satisfactory solution for the minimum path. In short, there are two factors that have a strong control over the initial values for ACO parameters. Problem definitions with the dataset are specific to each problem.

Table 13 below depicts the results of the comparison between the Apriori algorithm with a Boolean matrix (APRIORI_B) and the proposed graph-based ACO algorithm. The experiment operated on the T10I4D100K dataset. The run time index is shown in the first column. The fourth column presents the execution time of APRIORI_B for n-frequent itemsets with minimum support Ms = 0.005. Columns five and six present the execution time of the graph-based ACO algorithm with different parameters alpha α = 0.1, rate ρ = 0.3, beta β = 2, and initial pheromone τ0 . These values have the same minimum support, Ms = 0.005, as in Phase I. The last column indicates the total execution time for the graphbased ACO. For every run time index, the minimum path indicates the value of parameters that describe a tour of an ant.This experiment shows the comparisons for the graphbased ACO is faster than APRIORI_B. The experimental results show that the graph-based ACO depends on 2-frequent itemsets that significantly reduce the execution time compared with APRIORI_B. The main purpose of showing the execution time of the 2-frequent itemset is to show the additional time of its execution in combination with the execution of the ACO. The execution time required to generate n-frequent itemsets is more than the execution time required to generate 2-frequent itemsets in Phase I. Hence, capturing the 2-frequent itemsets plus ACO is faster than APRIORI_B. In addition, the number of nfrequent itemsets is dependent on the value of minimum support, Ms. This value will (determinate?) the weight for each frequent itemset. On the contrary, moving ants through a completely connected graph probabilistically does not take a long time (refer to Fig. 4). As the ants have a suitable environment for moving through a completely connected graph in the proposed graphbased ACO, this algorithm is faster than the Apriori algorithm with a Boolean matrix. In turn, the Apriori algorithm with a Boolean matrix is better than the Apriori algorithm. The ants examine the transition rule statement and make a decision to proceed from the source to the destination node. In addition, each ant evaluates its pheromone and updates to produce new quantities of pheromone and the evaporation of graph tracing. Therefore, graph-based ACO is better than an Apriori algorithm with a Boolean matrix that contains 1’s and 0’s to indicate the existence of items in the same rows of the matrix. The algorithm computes the number of 1’s for each row and then determines the weight for each frequent item. On the other hand, the ants use a heuristic mechanism to select adequate paths through the graph in a shorter time period than a Boolean matrix scan. This speed advantage depends on the existence of items in the transaction table. The graph-based ACO constructs a graph from 2-frequent itemsets, and ants move through this graph by applying their transition rule statement.

123

4662

Arab J Sci Eng (2014) 39:4651–4665

Table 13 Experimental results for APRIORI_B and graph-based ACO

ACO = (alpha α = 0.1 rate ρ = 0.3) initial pheromone τ0 = minimum support Ms = 0.005 Run time no.

Min best tour

Path length

Exec. time APRIORI_B (s)

Exec. time 2frequent itemsets (s)

Exec. time graph-based ACO (s)

Total exec. time for graph-ACO

1

0.01

16

3,382.96

3216.38

5.54

3,221.92

2

0.01

11

3,382.96

3216.38

5.52

3,221.90

3

0.01

8

3,382.96

3216.38

5.43

3,221.81

4

0.01

22

3,382.96

3216.38

5.41

3,221.79

5

0.01

18

3,382.96

3216.38

5.40

3,221.78

6

0.01

12

3,382.96

3216.38

5.38

3,221.76

7

0.01

23

3,382.96

3216.38

5.46

3,221.84

8

0.01

17

3,382.96

3216.38

5.49

3,221.87

9

0.01

21

3,382.96

3216.38

5.43

3,221.81

10

0.01

23

3,382.96

3216.38

5.35

3,221.73

11

0.01

28

3,382.96

3216.38

5.37

3,221.74

12

0.01

17

3,382.96

3216.38

5.35

3,221.73

13

0.01

30

3,382.96

3216.38

5.46

3,221.84

14

0.01

16

3,382.96

3216.38

5.46

3,221.84

15

0.01

26

3,382.96

3216.38

5.43

3,221.81

16

0.01

24

3,382.96

3216.38

5.35

3,221.73

17

0.01

38

3,382.96

3216.38

5.40

3,221.78

18

0.01

18

3,382.96

3216.38

5.77

3,222.15

19

0.01

27

3,382.96

3216.38

5.66

3,222.04

20

0.01

27

3,382.96

3216.38

5.79

3,222.17

Average

0.01

3,382.96

3,216.38

5.47

3,221.85

Fig. 4 Comparisons between the Apriori algorithm with a Boolean matrix and the normal Apriori algorithm with 10,000 transactions and 868 items from T10I4D100K.dat

5 Discussion and Contribution This study focuses on graph-based ARM using an ant colony algorithm. Two main phases are developed for ARM: a data

123

representation scheme and a graph-based approach using ACO. The data representation scheme falls under the area of improving efficiency. The scheme discovers association rules using a new data representation from the transaction of data. The scheme also generates N-frequent itemsets to build mining association rules. The graph-based approach with ACO focuses on the existing ACO algorithm to reduce the time required to obtain the last frequent itemsets. The graph-based approach can capture frequent itemsets from a database of customer transactions. An appropriate data representation scheme is used to transfer the ‘sparse dataset’ to a Boolean matrix. Then, the scheme constructs an undirected, completely connected and weighted graph that is suitable for ACO. The construction of the graph depends on extracting the 2-frequent itemsets during the first phase of the implementation. The significance of this research comes from the need to reduce time and increase the quality of rules. One of the main issues in data mining is the discovery of association rules from database transactions, where each transaction consists of a set of items. The most time-consuming operation in this discovery procedure is the computation of the frequency of occurrences of interesting subsets (called candidates). In addition, synthetic data are transformed into a Boolean matrix, and then the Apriori

Arab J Sci Eng (2014) 39:4651–4665

4663

approach is applied to rapidly acquire the n-frequent itemset. A completely connected graph is built after 2-frequent itemsets are created. The ACO algorithm can be applied to this graph. The enhanced ACO algorithm obtains the last frequent itemsets. As soon as the database has been scanned, a representation of the data will stimulate the ACS to attain a good result. The ACS system infrastructure utilizes a probabilistic technique for solving computational problems based on graph theory. This research has explained that a completely connected graph is the suitable environment for moving ants. The graph is created by capturing all possible candidates of 2-frequent itemsets that are related to Phase I. The process starts by initializing an ant colony, where a number of ants have their own route. Next, the initialization of pheromones and a computation of heuristic data for all nodes are set to weight edges in the connected graph.

Fig. 5 The discretized table and Boolean matrix that are created automatically

AVL tree

List of Items 7, 8, 55 8, 10 8, 9 7, 8, 10 7, 9 8, 9 7, 9 7, 8, 9, 55 7, 8, 9

Data Discretization 1 2 3 4 5 7 8 9 10 55 Data replacement by index using Regular Expression in java Represent

Discretize

TID T100 T101 T102 T103 T104 T105 T106 T107 T108

Each ant evaluates its pheromones and produces new quantities of pheromones and the evaporation of graph tracing by computing the local pheromone left by all other ants after the constructed solution has finished. Each ant has its own tour. Each ant applies a probabilistic state transition rule for movement to obtain a satisfactory path and to acquire knowledge that exists in the nodes of the graph. This state transition rule mainly depends on the state of the pheromones. The solution is constructed by the movement of the kth ant on the graph from node i to node j. Each ant adds items (nodes) to its tour. In an ant colony system, there is a local pheromone update that is performed by all ants after each construction step. In addition, a global pheromone is applied at the end of each iteration by a single ant. This global pheromone could be the best so far with an adequate path tour. Each ant generates its own solution and obtains an adequate last frequent pattern (last frequent itemset) during mining results. Each

T1 is =row1 and 1is = column1 T1 is =row1 and 2 is = column2 T1 is =row1 and 5 is = column5

HashMap Table TID

List of Items

T1

1 , 2 , 5

1

2

3

4

5

1

1

1

0

0

1

T2

2, 4

2

0

1

0

1

0

T3

2, 3

3

0

1

1

0

0

T4

1, 2, 4

4

1

1

0

1

0

5

1

0

1

0

0

T5

1, 3

6

0

1

1

0

0

T6

2, 3

7

1

0

1

0

0

T7

1, 3

8

1

1

1

0

1

9

1

1

1

0

0

T8

1, 2, 3, 5

T9

1 , 2 , 3

Generate Boolean Matrix automatically

T9 is =row9 and is = column1 T9 is =row9 and 2 is = column2 T9 is =row9 and 3 is = column3

123

4664

Arab J Sci Eng (2014) 39:4651–4665

Fig. 6 Construction of a completed graph from 2-frequent itemsets

All possible candidates of 2- frequent itemsets from Boolean Matrix

Nodes Count Weight

1-2 4 0.44

1-3 4 0.44

1-4 1 0.11

1-5 2 0.22

1

2-4 2 0.22

Destination

weight of edge is 0.44

2-5 2 0.22

3-4 0 0.00

3-5 1 0.11

4-5 0 0.00

Represent 2_frequent itemsets to completed connected graph

Source

2-3 4 0.44

2

0.44

2

1 0.44

Ant Colony

0.22

0.22

0.44

0.22

0.11 3 0.0

4

0.11

0.0 5

6 Conclusion This paper has investigated an area of data mining that is related to mining association rules. The proposed technique has two main phases that are developed for ARM, obtains all frequent itemsets in the first phase, and gain association rules, then captures 2-frequent itemsets to create a completely connected, weighted graph. This graph can be used by the ACO algorithm. Thus, a completely connected, weighted graph is a suitable place for ant movements. This graph is also a suitable tool for acquiring the last frequent itemsets during a mining process and to obtain an adequate solution with the last frequent pattern. The consequence of these experiments proposes B- Apriori algorithm that is based on transactional data representation scheme and reveals the graphbased ACO-ARM enhances execution time compared to the standard Apriori algorithm. Fig. 7 Illustration of the execution time of the graph-based ACO algorithm; this algorithm is faster than the Apriori algorithm with a Boolean matrix

7 Future Works ant then tells other ants to smell and follow its pheromones by spreading globally (Figs. 5, 6, 7).

123

This investigation has identified several interesting directions for future work:

Arab J Sci Eng (2014) 39:4651–4665

Develop a multitasking, multithreaded framework to apply the ACO algorithm in many fields. This system has many threads, and each has a task to complete in its own process and in a specific time. Currently, there are many IT facilities and large hardware capacities, hence, we use them in a multithreading system. An automatic frequent pattern miner, based on data mining techniques with artificial intelligence can be used, to obtain last frequent patterns without any initial values from the user. Future studies can also enhance a data representation phase for a new representation layer with multifunctional purposes, which gives different styles of data types (such as a numerical matrix, graph, Boolean values, or classified table) that will be used in various environments and research fields. The ACO (ant colony optimization) algorithm has many parameters, such as alpha α, beta β, evaporation rate ρ, initial value of pheromone τ0 , and the number of ants that start their jobs to obtain the best result; however, there is no system available to analyze the relationships between these parameters and their initial values. Therefore, the user must create initial values when using existing systems. In addition, many experiments have proved that correct initial values for each parameter are required to acquire adequate results and solutions. There is a need for an expert system that can automatically determine these data-dependent parameter values. Future researchers can develop a hybrid algorithm, which has an ant colony and an artificial bee colony (ABC) with fewer parameters than the ACO algorithm. The ABC has objective and fitness functions that will give a new system that can be combined with the ACO’s deposited pheromone attributes.

References 1. Agrawal, R.; Imielinski, T.; Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207–216, Washington D.C. (1993) 2. Agrawal, R.; Srikant, R.: Fast algorithms for mining association rules in large databases. In: Bocca, J.B.; Jarke, M.; Zaniolo, C. (eds.) Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pp. 487–499, Santiago, Chile (1994) 3. Agrawal, R.; Srikant, R.: Mining sequential patterns. In: Proc. Internal Conf. Data Engineering, pp. 3–14 (1995) 4. Agrawal, R.; Mehta, M.; Shafer, J.; Srikant, R.; Arning, A.; Bollinger, T.: The quest data mining system. In: Proc. of KDD’96, pp. 244–249, USA (1996) 5. Han, J.; Kamber, M.: Data mining concepts and techniques, 2nd edn. Elsevier Inc., San Francisco (2006) 6. Mannila, H.; Toivonen, H.; Verkamo, A.I.: Efficient algorithms for discovering association rules. AAAI Workshop on Knowledge Discovery in Databases (SIGKDD). Seattle, pp. 181–192, July 1994 7. David, Gries; Schneider, F.B.: A Logical Approach to Discrete Math., p. 436. Springer, Berlin (1993)

4665 8. Yen, S.J.; Chen, A.L.P.: An efficient approach to discovering knowledge from large databases. In: Proc. of the IEEE/ACM International Conference on Parallel and Distributed Information Systems, pp. 8–18 (1996) 9. Yen, S.J.; Chen, A.L.P.: A Graph-Based Approach for Discovering Various Types of Association Rule, vol. 13, pp. 839–845 (2001) 10. Kuo, R.J.; Shih, C.W.: Association Rule Mining Through the Ant Colony System for National Health Insurance Research Database in Taiwan. Elsevier Inc., Taipei (2007) 11. Huang, L.W.; Chang, Y.I.: Efficient graph-based approach to mining association rules for large databases. Int. J. Intell. Inf. Datab. Syst. 3(3) (2009) 12. Liu, B.; Pan, J.: A graph-based algorithm for mining maximal frequent itemsets. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery FSKD (2007), IEEE, China 13. Shang, G.; Lei, Z.; Fengting, Z.; Chunxian, Z.: Solving traveling salesman problem by Ant colony optimization algorithm with association rule. In: Third International Conference on Natural Computation (ICNC 2007), China (2007) 14. Kuo, R.J.; Lin, S.Y.; Shih, C.W.: Mining Association Rules Through Integration of Clustering Analysis and Ant Colony System for Health Insurance Database in Taiwan, vol. 33, pp. 794–808. Elsevier Inc., Taipei (2007) 15. Zhu, W.; Wang, J.; Cao, H.; Zhang, Y.: A novel association rule decision algorithm based on ant colony optimization algorithm for ball mill pulverizing system. In: International Conference on Computer Science and Software Engineering, IEEE (2008) 16. Fimi Frequent Itemset Mining Dataset Repository retrieved 2 February 2012 from the World Wide Web: http://fimi.cs.helsinki. fi/data/ 17. Jiawei, H.; Jian, P.; Yiwen, Y.; Runying, M.: Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, 8: pp. 53–87 (2004) 18. Han, J.; Kamber, M.: Data Mining Concepts and Techniques. Morgan Kaufman, Burlington (2001) 19. He, Z.; Huang, J.Z.; Xu, X.; Shengchun, D.: A Frequent Pattern Discovery Method for Outlier Detection. In: SpringerLink (ed.) Lecture Notes Computer Science, pp. 726–732 (3129/2004). Springer, Berlin (2004) 20. Kotsiantis, S.; Kanellopoulos, D.: Association rules mining: a recent overview. GESTS Int. Trans. Comput. Sci. Eng. 32(1), 71– 82 (2006) 21. Bonabeau, E.; Dorigo, M.; Theraulez, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, NY (1999) 22. Dorigo, M.; Stützle, T.: The ant colony optimization metaheuristic: algorithms, applications and advances. In: Glover, F.; Kochenberger, G. (eds.) Handbook of Metaheuristics. Kluwer Academic Publishers, Norwell, pp. 251–285 (2002) 23. Dorigo, M.; Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004) 24. Dorigo, M.; Maniezzo, V.; Colorni, A.: The ant system: an autocatalytic optimizing process. Technical Report TR91-016, Politecnico di Milano (1991) 25. Dorigo, M.: Optimization, learning and natural algorithms. Ph.D. Thesis, Politecnico di Milano, Milano (1992) 26. Dorigo, M.; Gambardella, L.M.: Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1(1), 53–66 (1997) 27. Dorigo M.; Birattari, M.; Stützle, T.: Ant colony optimization: artificial ants as a computational intelligence technique. IEEE Comput. Intell. Mag. 1(4), 28–39 (2006)

123

A Graph-Based Ant Colony Optimization for Association Rule Mining

Recommend Documents