CSIT (December 2016) 4(2–4):249–254 DOI 10.1007/s40012-016-0119-7
SPECIAL ISSUE REDSET 2016 OF CSIT
A fuzzy association rule mining approach using movie lens dataset
Sumana Ghosh1 • Navjot Kaur Walia1 • Parul Kalra1 • Deepti Mehrotra1
Published online: 16 December 2016 © CSI Publications 2016
Abstract In the modern era of e-commerce and social networking, people want the best of everything and want to do better than the others in their group. There is always an urge to find products, services and opportunities better than everyone else's. Businesses now grow by advertising online, because people are always online and find it more interesting to review products on a social platform. Association rule mining is an active research area in data mining, and much work is currently under way in this domain. Completeness is one of the key strengths of fuzzy association rule mining. However, fuzzy methods do not scale well to large datasets: the massive number of candidate item sets can make it impractical for a data mining system to analyze them, and the process ends up producing a very large number of fuzzy associations. This research focuses on helping users find the best-suited movies based on previous ratings. Rules have been generated using fuzzy association rule mining.
Keywords Association rule mining • Fuzzy logic • Movie lens • KDD • FARM • Prediction
Sumana Ghosh (corresponding author) • Navjot Kaur Walia • Parul Kalra • Deepti Mehrotra
1 Department of Information Technology, Amity University, Noida, India
1 Introduction
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of finding hidden, interesting, previously unknown and potentially useful patterns in large databases [1, 2]. The patterns discovered in this way have enabled better decision making in many areas. One widely discussed topic is the discovery of association rules. Fuzzy association rule mining (FARM) arose from the need for information discovery in fuzzy expert systems. A fuzzy system [3] makes use of a collection of fuzzy membership functions and rules. These rules [4] take the following form (Table 1): "If it is raining then put up your umbrella". In such a statement the "if" part is the antecedent and the "then" part is the consequent. Rules of this kind can be used in a recommender system to generate results or to give suggestions to users. The use of fuzzy logic also broadens the range of interpretations open to users. The dataset used for fuzzy association rule mining in this research is the MovieLens dataset. It contains user IDs, their ratings, gender, movie genres and movie names. Such a dataset can be used to build a recommender system that gives suggestions to users based on their age, the genre or the name of a movie.
2 Background
Association rule mining is a leading research area in the field of data mining. It facilitates the extraction of hidden patterns based on their
Table 1 Market Basket transactions
Tid   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Egg
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
frequencies in the dataset, in the context of association rules, so as to represent the most frequent trends in the given dataset [5]. These extracted patterns are used for the analysis of physical data or for mining tasks such as collection and categorization, which enables experts to automate decision making.

Association rule mining algorithms are broadly divided into two classes: breadth-first search (BFS) and depth-first search (DFS) [6]. In BFS, the support of all item sets at a given depth level is computed before moving on to the next level. In DFS, the computation descends the structure recursively through several depth levels. Each of these two classes can be further divided into two subclasses, counting and intersecting. An example of a counting BFS algorithm is the Apriori algorithm, which mines frequent patterns from large datasets and derives association rules from them. An example of a counting DFS algorithm is the FP-Growth algorithm. Both are examples of classic association rule mining.

To better understand the concept of association rule mining [7], consider the example of market basket analysis. Table 1 shows a dataset of market basket transactions. By analyzing the transaction patterns, association rules such as the following can be derived:
{Diaper} → {Beer}
{Bread, Milk} → {Egg, Coke}
{Bread, Beer} → {Milk}
The symbol → denotes co-occurrence. The rules above have the form X → Y, where X and Y are item sets. Some terms that are frequently used when discussing association rule mining are described below:
1. Support: the fraction of transactions that contain both X and Y.
2. Confidence: a measure of how often the items in Y appear in transactions that contain X.
3. Item set: a collection of one or more items in the dataset. Example: {Bread, Milk, Diaper, Beer}.
4. Frequent item sets: item sets whose support is greater than or equal to a minimum support threshold.
For a set of transactions T, the main aim of association rule mining is to discover all rules that have Support ≥ min_sup threshold and Confidence ≥ min_conf threshold.

Fig. 1 Pre-processing table
Ratings   Category
0-2       Bad
2-3       Average
4-5       Good

Fig. 2 Dataset
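As an illustration of the definitions above, the following minimal Python sketch computes support and confidence for the transactions of Table 1 and enumerates frequent item sets level by level, in the spirit of the counting BFS class. The function names, the min_sup value of 0.6 used in the example and the simplified candidate generation (without the subset pruning of the full Apriori algorithm) are assumptions made for this sketch and are not taken from the paper.

```python
# Transactions of Table 1 (market basket example).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X); 0.0 if X never occurs."""
    x = set(antecedent)
    xy = x | set(consequent)
    sup_x = support(x, transactions)
    return support(xy, transactions) / sup_x if sup_x > 0 else 0.0


def frequent_itemsets(transactions, min_sup):
    """Level-wise (breadth-first) search for frequent item sets.

    Simplified sketch: candidates of size k are unions of frequent
    item sets of size k - 1; no further pruning is applied.
    """
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_sup]
    frequent = {}
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if support(c, transactions) >= min_sup]
    return frequent


def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """True if the rule X -> Y meets both thresholds."""
    xy = set(antecedent) | set(consequent)
    return (support(xy, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)
```

With these transactions, the rule {Diaper} → {Beer} has support 3/5 = 0.6 and confidence 3/4 = 0.75, so it would be retained for min_sup = 0.6 and min_conf = 0.7.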
Fuzzy association rule mining algorithm
The concept of fuzzy association rule mining arose from the need to mine quantitative data efficiently. Mangalampalli and Pudi [8] have described the issues associated with rule mining based on sharp partitioning. First, the use of sharp ranges introduces uncertainty: information is lost at the boundaries of these ranges, which may sometimes produce wrong results. Second, the partitions do not have appropriate semantics associated with them. In fuzzy association rule mining, numerical attributes are transformed into fuzzy attributes using fuzzy logic. Attribute values are not represented by just 0 or 1; instead, they are represented by a value in the range between 0 and 1 [10]. Fuzzy association rules use fuzzy logic to convert numerical attributes to fuzzy attributes,
Fig. 3 Complete KNIME workflow
Fig. 4 Scatter plot
like "Income = High", thus maintaining the integrity of the information conveyed by such numerical attributes. Crisp association rules, on the other hand, use sharp partitioning to transform numerical attributes into binary ones like "Income = [100 K and above]", and can lose information because of these sharp ranges [11]. Fuzzy logic was introduced into association rule mining to overcome these issues. FARM transforms numerical attributes into fuzzy attributes whose values lie in the range 0–1. With this approach, binary values can be converted into fuzzy attributes, which addresses the problem stated above [9]. Fuzzy association rules thus produce results that draw on both fuzzy logic and association rule mining. The results obtained from FARM are clearer and more accurate.
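The difference between sharp partitioning and fuzzy attributes can be sketched in a few lines of Python. The example below contrasts a crisp "Income = [100 K and above]" test with a fuzzy "Income = High" membership and then computes a fuzzy support value. The ramp breakpoints (80 K and 120 K), the function names and the use of the minimum operator to combine memberships are assumptions chosen for illustration; the paper does not state which membership functions or which fuzzy support definition were used.

```python
def crisp_high(income, threshold=100_000):
    """Sharp partition: 1 if income is at or above the threshold, else 0."""
    return 1.0 if income >= threshold else 0.0


def fuzzy_high(income, low=80_000, high=120_000):
    """Ramp membership for "Income = High" (hypothetical breakpoints).

    0 at or below `low`, 1 at or above `high`, linear in between.
    """
    if income <= low:
        return 0.0
    if income >= high:
        return 1.0
    return (income - low) / (high - low)


def fuzzy_support(itemset, records):
    """Average, over all records, of the minimum membership in `itemset`.

    Each record maps a fuzzy attribute name to a membership in [0, 1].
    Using min as the combining operator is one common convention,
    assumed here for the sketch.
    """
    total = sum(min(r.get(attr, 0.0) for attr in itemset) for r in records)
    return total / len(records)


# Hypothetical records, already converted to fuzzy memberships.
records = [
    {"Income=High": 0.5, "Age=Young": 0.8},
    {"Income=High": 1.0, "Age=Young": 0.2},
]
```

Two incomes just below and just at 100 K fall on opposite sides of the crisp partition, while their fuzzy memberships stay close to each other, which is the boundary effect described above.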
3 Experiment
In this paper the data has been taken from MovieLens [10], an open forum created by the University of Minnesota for research in information retrieval, recommender systems and human-computer interaction. Several datasets are available, of which the one with about 100,000 movie ratings was selected. The first step was to pre-process the dataset. The data is in raw form and needs to be changed as required through pre-processing; because fuzzy techniques are used, the dataset needs an attribute that carries a class label. The dataset consists of 4 attributes: User ID, Movie ID, Ratings and Timestamp. The Timestamp attribute has been eliminated from the dataset as it was not required. Each user has given ratings to at least 50 movies. The ratings column has been labelled and the ratings have been classified as Good, Average and Bad (Fig. 1). The attributes used in the dataset are shown in Fig. 2.

KNIME, an open-source data mining tool, has been used for generating fuzzy rules. It is a modular platform for building and executing workflows using predefined components, called nodes. It offers functionality for tasks such as standard data mining, data analysis and data manipulation. In KNIME, workflows are generated which calculate the overall score of the dataset used.

Fig. 5 Interactive table
Fig. 6 Learner statistic

4 Results
A complete KNIME workflow was created in the tool to generate the rules for fuzzy association rule mining. The tool gives the predicted rules, scatter plots and an interactive table for manipulating the results (Fig. 3). In the scatter plot, the X-axis displays the User ID and the Y-axis displays the Ratings, which have been further classified into Good, Average and Bad. Figure 5 shows the interactive table for the rules generated. It divides the ratings into a set of three columns, Good, Average and Bad, by setting their values to 1 or 0 for generating the association rules (Figs. 4, 5). Number of epochs = 6, which means that a total of 6 iterations were made over the dataset. Number of classes = 3. A total of 1287 rules have been generated per class (Fig. 6). The Fuzzy Rule Predictor gives a predicted rating class based on the fuzzy learner rules thus generated. For example: User ID = 1, Movie ID = 2355 and Rating = Good, but KNIME Fuzzy Rule Predictor = Average (Fig. 7).

Fig. 7 Fuzzy rule prediction
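The labelling of Fig. 1 and the three 0/1 columns of the interactive table can be sketched in Python as follows. This is only an illustration of the pre-processing idea, not the KNIME workflow itself. The sample rows are hypothetical and do not come from the dataset, and because Fig. 1 lists ranges that overlap at 2, the sketch assumes that a rating of 2 or less is Bad, a rating above 2 and up to 3 is Average, and any higher rating is Good.

```python
CATEGORIES = ("Good", "Average", "Bad")


def label_rating(rating):
    """Map a numeric rating to a class label, following Fig. 1.

    Assumption: the ranges 0-2 and 2-3 overlap in Fig. 1, so a rating
    of exactly 2 is treated as Bad here.
    """
    if rating <= 2:
        return "Bad"
    if rating <= 3:
        return "Average"
    return "Good"


def one_hot(label):
    """Return the three 0/1 columns for one label."""
    return {category: int(category == label) for category in CATEGORIES}


def preprocess(rows):
    """Drop the timestamp, add the class label and the 0/1 columns.

    `rows` are (user_id, movie_id, rating, timestamp) tuples.
    """
    processed = []
    for user_id, movie_id, rating, _timestamp in rows:
        label = label_rating(rating)
        record = {"user_id": user_id, "movie_id": movie_id, "label": label}
        record.update(one_hot(label))
        processed.append(record)
    return processed


# Hypothetical rows for illustration only (not taken from MovieLens).
sample_rows = [
    (1, 10, 5, 880000001),
    (1, 20, 3, 880000002),
    (2, 10, 1, 880000003),
]
table = preprocess(sample_rows)
```

Each resulting record keeps User ID and Movie ID, carries one class label, and has exactly one of the Good, Average and Bad columns set to 1.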
5 Conclusion
Association rule mining is an active research area in data mining. Fuzzy association rules, expressed in natural language, are well suited to human reasoning. Fuzzy association rules will therefore be useful in giving users more flexibility when making decisions or designing fuzzy systems. When a mining system creates a huge number of rules, it becomes difficult for a user to analyze them. If such a large number of rules really exists in the data, it would not be appropriate to discard any of them arbitrarily or to generate only a small subset of them; it is more desirable to summarize them. In this paper fuzzy association rule mining has been applied to the MovieLens dataset, and a large number of rules were generated. These rules help users to learn about various factors of a movie, such as its previous ratings, which helps them decide whether they want to see the movie or not. People generally read reviews before going to see a movie. This system can help them judge based on the past experiences of other users. This paper has shown how the ratings of a movie can be predicted using a fuzzy rule learning approach and how the resulting fuzzy rules can be manually inspected by a user.
References
1. Turksen IB, Tian Y (1993) Combination of rules or their consequences in fuzzy expert systems. Fuzzy Sets Syst 58(1):3–40
2. Han J, Cai Y, Cercone N (1992) Knowledge discovery in databases: an attribute-oriented approach. In: VLDB, vol 92, pp 24–27
3. Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996) Knowledge discovery and data mining: towards a unifying framework. In: KDD, vol 96, pp 82–88
4. Zadeh LA (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans Syst Man Cybern 1:28–44
5. Saurkar AV, Gode SA (2015) Association rule mining with fuzzy logic: an overview. Int J Sci Res (IJSR) 4(6):823–827
6. Hipp J, Güntzer U, Nakhaeizadeh G (2000) Algorithms for association rule mining - a general survey and comparison. ACM Sigkdd Explor Newsl 2(1):58–64
7. Kapila D, Chopra V (2015) A survey on different fuzzy association rule mining techniques. Int J Techno Res Eng 2(9)
8. Mangalampalli A, Pudi V (2009) Fuzzy association rule mining algorithm for fast and efficient performance on very large datasets. In: Fuzzy systems, 2009. FUZZ-IEEE 2009. IEEE international conference on, pp 1163–1168
9. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
10. www.grouplens.org/movielens
11. Roy A, Chatterjee R (2013) A survey on fuzzy association rule mining methodologies. IOSR J Comput Eng (IOSR-JCE) 15(6):1–8
12. Mangalampalli A, Pudi V (2009) Fuzzy association rule mining algorithm for fast and efficient performance on very large datasets. In: Fuzzy systems, 2009. FUZZ-IEEE 2009. IEEE international conference on. IEEE, pp 1163–1168