Arab J Sci Eng (2018) 43:935–947 https://doi.org/10.1007/s13369-017-2800-z
RESEARCH ARTICLE - COMPUTER ENGINEERING AND COMPUTER SCIENCE
Interest-Based Clustering Approach for Social Networks Lulwah AlSuwaidan1,2 · Mourad Ykhlef1
Received: 13 March 2017 / Accepted: 6 August 2017 / Published online: 23 August 2017 © King Fahd University of Petroleum & Minerals 2017
Abstract Recently, the applications of community detection have increased because of their effectiveness in identifying communities correctly. Many methods and algorithms have been introduced to bring new insights that will improve community detection in social networks. While such algorithms can find useful communities, they tend to focus on network structure and ignore node interests and interconnections. However, accurate community detection requires the consideration of both network structure and node interests. The best method to achieve this is by utilizing unsupervised models. In this article, we introduce a new approach for social network clustering, termed Interest-based Clustering, which clusters nodes in social networks based on a measure of interest similarity. It considers structure, interaction, and node interest along with nodes friends’ interests. The empirical evaluation of this new approach was done using real dataset crawled from Twitter. The approach outperforms well-known community detections algorithms, SCAN, Fast Modularity, Zhao et al., in terms of modularity, connectivity, and overlapping. Keywords Clustering algorithms · Data mining · Social computing · Social network · Twitter
B
Lulwah AlSuwaidan
[email protected] Mourad Ykhlef
[email protected]
1
Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
2
Department of Information Management, College of Computer and Information Sciences, Al-IMAM Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
1 Introduction Emerging media platforms, such as social media, have changed the way of how information is shared. Applications such as marketing, business, politics, and health use social media to spread information or influence other individuals/communities. The majority of these applications depend on targeting the right community in social media that matches the targeted campaign. The proposed approach, termed Interest-based Clustering (IntClus), acquires this aim by identifying certain interests; then, IntClus shows a graph representing people who have the same interests. This is considered as the first step toward successful information diffusion because it starts with the most relevant community where users can share the information quickly and spread it to nearest community. The problem of community detection has received maximum attention recently [1–4]. The majority of existing papers [5–8] do not consider overlapping properties. However, the problem of information diffusion requires overlapping community detection to simplify the diffusion process because each node could belong to multiple communities at the same time. On a normal basis, applications targeting a certain community in a social network require knowledge of that community’s interests, yet many existing community detection methods cover only node structure or node attributes [5,7,9]. In [5], a method called scan statistics was proposed by Wang and Phoa to cluster social networks based on network structure and attributes. In [7], Zhang et al. have introduced a joint probabilistic model to detect communities, which combines node attributes and topological structure. Identifying network communities is considered as a problem of clustering a set of nodes into communities, but a particular node can belong to multiple communities at once. Therefore, covering aforementioned issues requires the introduction of a
123
936
Arab J Sci Eng (2018) 43:935–947
new approach based on density clustering and node interests. In this article, a proposed method for community detection, IntClus, is presented. The purpose of IntClus is to address the problem of detecting a community based on common interests. It also ensures the overlapping property between communities because it is targeted to enhance the diffusion process afterward. The problem with the majority of interest clustering methods is to consider each user profile as a text document and use information retrieval methods to handle it [10]. Existing works done on the web and social media have utilized the Latent Dirichlet Allocation (LDA) [11] method, which has limited effectiveness for social networks. The LDA method treats each node in the network as document, which is time-consuming and inaccurate. It is inaccurate because handling each user’s tweets or posts in social network as a document and then compute the similarity between users will give an inaccurate results. Moreover, LDA fails to achieve accurate and consistent results in terms of big data collections or large dataset. It is also affected by “order effect” which shows different results for the same topic. LDA treats the same topic with different processing and generates different results. The proposed IntClus method has overcome the limitation of LDA by utilizing a list-based approach that can infer user interest according to its expert lists. The crucial property of IntClus is its ability to cluster (or group) nodes based on how users share common interests with their neighbors. It introduces a new interest similarity measure, and it is inspired by the well-known density approach, SCAN [12], which is simple and effective in identifying core, hub, and outlier nodes. IntClus considers the effectiveness of SCAN and introduces new interest similarity measures while improving the drawbacks of SCAN. Whereas SCAN only computes the similarity between two nodes based on network structure, IntClus covers both network structure and user interest. To quantify the performance of IntClus, we measure modularity [13], density [14], connectivity [15], and overlapping [16] in real dataset collected from Twitter. IntClus can produce reasonable network division, produce sufficient connectivity and density, and provide communities with levels of overlapping so they can be used for information diffusion. We have included more community detection method for experiment, and we have considered in our experiment both topological such as Fast modularity and topical community detection such as Zhao et al. approach. The rest of the article is organized as follows. Section 2 briefly reviews related works. In Sect. 3, a description of the IntClus approach is presented. The experimental evaluation is provided in Sect. 4, and Sect. 5 concludes the article.
123
2 Related Works Recently, online social network becomes part of human daily activities. Therefore, companies and organizations tend to make use of the rapidness of social networks. Viral marketing is one of the marketing methods by using this platform. It is based on people who have influence over plenty of relatives such as family and friends throughout their connections. Those people are the target of companies since marketing via those people will have a great impact. The most crucial part in viral marketing is choosing the right influential nodes called seeds which affects the diffusion process afterward. In specific, targeting the community instead of one influencer is beneficial in terms of contagion spreading and coverage. This section is going to cover the most related subjects to community detection and interest-based community detection. It also presents a brief background of interest similarity measures. 2.1 Community Detection Among current trends in social networks, the study of community detection has received much attention recently. Its basic premise is finding similar entities that share the same interests. The two possible ways of detecting communities are focusing on network structure or node features [9]. In this paper, we will consider the density property of nodes which reflects the real-world representation where nodes are densely connected within one community with limited links between communities. Also, we will include the topic interests of each community. Before describing IntClus approach, we will review the existing comparative studies of community detection. Leskovec et al. [1] provided a detailed comparison of community detection methods. Their comparison concentrated on two aspects: examining the quality of the clusters and quantifying the structural properties of the resulting clusters. Flow and spectral-partitioning methods were the first aspects discussed to compare the local spectral-partitioning algorithm [17] with the flow-based Metis+MQI algorithm [18]. The comparison found that the Metis+MQI algorithm is better than the local spectralpartitioning algorithm at the nominal task of finding cuts with low conductance. In the other aspect of the study, heuristic algorithms were covered. The findings showed that this kind of algorithm better correspond with small-scale network and a considerable quality of clusters. Ahn et al. [19] have evaluated community detection based on node attributes, and Yang and Leskovec [20] performed an evaluation of network communities based on ground-truth. The evaluation was based on comparing the different structural definitions of network communities with ground-truth, and it covered 13 different structural definitions concentrating on sensitivity, robustness, and performance in identifying ground-truth.
Arab J Sci Eng (2018) 43:935–947
Yang and Leskovec findings showed that conductance and the triad-participation-ratio performed best in detecting groundtruth communities. The majority of community detection methods are assuming a standard division in which the node is assigned to only one community [2–4]. However, in real graphs, the nodes belong to different communities (or clusters). For example, one node can belong to science, lifestyle, and health communities simultaneously. Detecting overlapping communities was examined in a proportion of studies in the field of graph clustering and community detection. Xie et al. [21] revealed the need for more investigations and improvements in detecting networks with high overlapping density and high overlapping diversity. Moreover, they suggested that over-detection and under-detection be considered when developing detection algorithms. In terms of realworld social networks, the results in [21] showed that the sparsity of a social network affects the performance of detection algorithms. Yang et al. [9] established a connection between both edge structure and node interest by introducing an Edge Structure and Node Attributes algorithm to detect overlapping communities in networks. Yang and Leskovec [22] introduced the Community-Affiliation Graph Model (AGM) to capture dense and overlapping communities; AlSuwaidan et al. [23] have proposed the spreading framework to detect overlapping clusters that share common interests. This framework, which is based on incremental clustering to ensure the evolving property of social networks, also covers the problem of information diffusion and predicts the social graph. In addition, AlSuwaidan and Ykhlef [24] have introduced a new Viral Marketing Information Diffusion (VMID) model that is constructed upon three main components: social analysis, structure modeling, and diffusion modeling. Structure modeling utilizes incremental clustering to detect overlapping clusters. In last, overlapping property is crucial in community detection specifically with applications of information diffusion and spreading. Table 1 presents a comparison between state-of-art community detection methods in terms of overlapping and type of community detection approach. These dimensions have included in the comparison because the proposed approach solves the limitations in the existing community detection approaches. Overlapping property of community detection mostly concerns about the topical interest of each community. Therefore, overlapping is crucial property to be considered because it effects the overall diffusion process between communities. Tacking into account topic and topology while detecting community will affect the community detection results. According to Table 1, the majority of existing approaches have focused on topology. Since IntClus approach concentrates on topological and topical community detection approach, the type of community detection has included to measure the dominant of current approaches.
937 Table 1 Community detection methods comparison Approach
Year
Overlap
Type
Fast modularity [25]
2004
No
Topology
SCAN [12]
2007
No
Topology
EAGLE [27]
2009
Yes
Topology
Zhao et al. [28]
2011
No
Both
Lim and Datta [29]
2012
No
Both
CESNA [9]
2013
Yes
Topology
LinkSCAN [30]
2014
Yes
Topology
OCDDP [26]
2017
Yes
Topology
The methods are listed in a chronological order starting from 2004 where the Fast Modularity method proposed by Clauset et al. [25]. It finds disjoint communities in dense graph without determining the role of each node in the community. The majority of community detection algorithms are handling the problem of disjoint ones. Recently, limited researches have considered overlapping in community detection. Moreover, high proportion of current approaches have concentrated on topological structure of networks without considering the topic or interest of each users. In a meanwhile, the dense network has considered in SCAN algorithm [12] where core, hub, and outliers have indicted. This comparison indicates the limitations of exiting community detection algorithms where this research handles them by IntClus which concentrates on detecting overlapping communities based on users interests and their interactions to build the network structure. 2.2 Interest-Based Community Detection Only recently have researchers begun to examine systematically the effect of topic-based community detection or clustering users based on their interest. Ding [31] presented two types of community detection approaches: the first is topology-based, and the second is topic-based. The detection contains different topics within each community in terms of topology-based community detection while includes topological diverse sub-communities within each community. Lim and Datta [32] used a link analysis in predicting Twitter users’ interest by analyzing the following of celebrities’ links. It detects all communities before determining the interest of these communities. Most of the literature suggests that there is some relationship between community detection and clustering. So, Zhao et al. [28] addressed the problem of topical community detection by classifying Twitter users based on topics of interests, using clustering algorithms and link analysis. They focused on the interested topics shared between entities in social networks. At first, the approach begins by grouping all the social objects into topics using subspace clustering algorithm. Then it divides the members that are
123
938
Arab J Sci Eng (2018) 43:935–947
involved in those social objects into topical clusters, each corresponding to a distinct topic. It applies a link analysis on each topical cluster to detect the topical communities which differentiates the strength of connections. In addition, Zhang et al. [33] identified interest-based communities using classical clustering, textual contents, and social structure. In order to accomplish this, the approach requires computing user similarity including both textual contents and social structure. The limitation behind this approach is the number of features considered for computation which requires time and space and not scalable for large data. For example, it includes tweet text, URLs, hashtags, following relationship and retweeting relationship which considers them all related to interests. On the other hand, Goel et al. [34] introduced a machine-learning-based method termed the SIMILAR-TO framework to discover similar users with shared interests. Similarities among users are based on the content they share. Graph mining has also been used to partition Twitter users based on their interest [35–37]. An opinion-based community is a recent concept that has introduced by Chen et al. [38]. They proposed a generative graphic model called People Opinion Topic (POT) model. It detects social communities, associated hot discussed topics, and performs sentiment analysis simultaneously by modeling user’s social connections, common interests, and opinions in a unified way. 2.3 Interest Similarity Background As discussed in Sect. 2.1, research in community detection has passed through various stages of development. However, the most crucial stage involves computing how two members are similar and determining how they should be assigned to the same community. The greatest research efforts have concentrated on link analysis, prediction, and network structure without considering the user interests specific to social networks. Limited attention has been devoted to combining both network structure and interest analysis. The majority of experiments on similarity have focused on profile similarity. For instance, Akcora et al. [39] proposed an original user similarity measure that includes both network and profile in its calculations. In [40], Mazhari et al. used a mining model to determine the level of friendship. This was a pre-step to find similarity based on the information available in a user’s profile. Dougnon et al. [41] identified the problem of profile similarity by presenting an algorithm named partial graph profile inference (PGPI+). The PGPI+ algorithm has the ability to estimate user profiles by knowing the partial social graph [41]. The user similarity measures have been used widely in different domains, and recommender system was one of the dominant by using collaborative filtering [42,43]. It was effective because it can easily finds relative items based on their past behavior. Because of limited space, we will focus
123
our discussion on the most well-known distance measures that have been used in the context of community detection and data mining. Cosine similarity measure has been used intensively in information retrieval and data mining because of its efficiency in high-dimensional positive spaces. It concentrates on binary similarity in text and document vector distance measurement [44]. However, it fails to figure out the similarity between users in social networks because this type of similarity has to handle each user profile as a text document which has high time and cost complexity. In a meanwhile, Minkowski distance family [44] has a crucial limitation that effects its suitability for social networks since it requires a knowledge about the relevant features which increase dimensionality. Lp-norm distance measure, Euclidean ( p = 2) and Manhattan ( p = 1), is dominantly applied in transportation to compute the distance between two points. However, it has limited capability when the data have high sparsity, distribution, and noise. In this context, Lp-norm distance measure may not work well with social networks since they have high sparsity and users have different profile features [44]. Recently, more attention has been given to find the interest similarity between two nodes. However, profile similarity suffers from one or more of the following limitations: (1) not all users provide complete and accurate information, and (2) most profile similarity methods are based on social graphs and predicted interests. Therefore, recent studies have attempted to handle these limitations [45–47]. Han et al. [47] measured interest similarity according to the user’s social features, such as location or gender, by discovering the relation between interests and social features [45]; accordingly, they determined how two users are alike or unalike. In addition, Wu et al. [46] applied topic modeling on tags, the technique used known as LDA [11], to infer user interest. Then, they proposed the friend recommendation algorithm by user similarity graph (FRUG). In [45], Han et al. introduced the community similarity degree (CSD) to compute the interest similarity among multiple users in a community.
3 Interest-Based Clustering Research investigating community detection and user clustering based on attributes has revealed the need to introduce a new improvement in the way of community detection. In this section, we address the problem of community detection using the density clustering method while considering user interest. This section contains two parts: the first is a definition of the proposed similarity measure, and the second is the definition of the IntClus approach. Our proposed method, developed here, will consider network structure and interest analysis, which passes through three main stages: (1) defining the network structure by building interaction graphs between network nodes, (2) proposing new interest similarities, and
Arab J Sci Eng (2018) 43:935–947
939
(3) proposing a new interest-based community detection by utilizing density graph clustering.
Table 2 Interest similarity example
User
Interest Technology (65)
v
3.1 Interest Similarity Definition
Medical (10) Art (5)
The interest similarity σ between two nodes v and w σ (v, w) mainly concentrates on finding node similarity by computing node interests based on how many of the node’s friends are sharing the same interest. Interest similarity consists of four main components: (1) for each interest, determine the number of friends in the node’s graph who are sharing the same interest; (2) for each interest, compute the interest occurrence I nt Occ(v j , wk ) between two interests (i.e., j for node v and k for node w); (3) identify interest combinations for each interest in node v and w, respectively; and (4) choose the maximum interest occurrence combination for each interest and normalize the summation of maximum interest occurrence for each interest by the number of interests. We perform the unification of the number of interests for both nodes v and w, so they will have the same number of interests. The unification considers the top interests, which have a higher number of node friends who share the same interest. Definition 1 (Interest Rate) Let assume A is the total number of v friends and B is the total number of w friends. Also, let assume the number of v friends who have interest item j is r (v j ), and the number of w friends and interest item k is r (wk ). Then X and Y are the interest rates of v and w, respectively, which represents the rate of the number of friends with the same interest j for user v and the same interest k for user w to the total node friends: B A , Y = log (1) X = log r (v j ) r (wk ) Definition 2 (Interest occurrence) We define the interest occurrence (IntOcc) between two interest nodes v j and wk as an application from + × + into [0,1]. The interest rate X and Y defined in Eq. 1 will be used in the definition of interest occurrence to compute how specific node interest is high or low with respect to his/her friends: I nt Occ(v j , wk ) =
1 (1 + X + Y )−1
i f v j = wk i f v j = wk
(2)
Definition 3 (Interest combinations) Let I be a set of node interests for each interest of v and w ; an interest combinations (I ntComb( j)) of two nodes, v and w, is computed as an application from I × I . It requires computing interest occurrence (I nt Occ( j, k)) defined in Eq. 2 between interest j and k. I ntComb( j) = {I nt Occ( j, k)| j ∈ I (v) ∧ k ∈ I (w)}
(3)
Medical (40) w
Engineering (75) Law (2)
Definition 4 (Interest similarity) The interest similarity σ (v, w) between two nodes v and w is as follow: σ (v, w) =
1 max(I ntComb( j)) |I (v)|
(4)
j∈I (v)
If the two nodes v and w share all interests then σ (v, w) = 1. Several components have been presented in the proposed interest similarity. An example can be deduced to understand the computations behind it. Let assume there are two nodes v and w need to compute the interest similarity between them. Suppose each node has three interests as shown in Table 2. The v node has Technology, Medical, and Art as his interests. The numbers in parenthesis represent the number of node v friends who have the same interest r (v j ). For example, r (vT echnology ) = 65 indicates that 65 of v friends are interested in technology. Table 2 shows the complete required information to calculate the similarity. Let A = 100, which represents the total v node friends, and Let B = 150, which represents total friends of w node friends. The similarity needs to compute the interest combinations I ntComb( j) between two nodes, v and w. To illustrate, v1 = T echnology, v2 = Medical, v3 = Ar t, w1 = Medical, w2 = Engineering, w3 = Law. 1. I ntComb(technology) = I nt Occ(v1 , w1 ) = 0.32 , I nt Occ(v1 , w2 ) = 0.45, I nt Occ(v1 , w3 ) = 0.26, 2. I ntComb(medical) = I nt Occ(v2 , w1 ) = 1, I nt Occ(v2 , w2 ) = 0.43 , I nt Occ(v2 , w3 ) = 0.26, 3. I ntComb(ar t) = I nt Occ(v3 , w1 ) = 0.29 , I nt Occ(v3 , w2 ) = 0.38 , I nt Occ(v3 , w3 ) = 0.24 Then, apply the similarity shown in Eq. 4 as follow: σ (v, w) =
1 (1 + 0.45 + 0.43) = 0.6 3
In the next paragraph, we will show that similarity σ between nodes v and w defined in Eq. 4 verifies the similarity properties:
123
940
Arab J Sci Eng (2018) 43:935–947
Let P be a set of nods as an application P × P into [0, 1]. Our similarity σ (v, w) in Eq. 4 has defined upon I nt Occ(v j , wk ) in Eq. 2. It has to verify the following wellknown mathematical properties [48] 1. ∀v, w ∈ P : σ (v, w) ≥ 0 2. ∀v, w ∈ P : σ (v, v) = σ (w, w) ≥ σ (v, w) 3. ∀v, w ∈ P : σ (v, w) = σ (w, v) Property 1 It is obvious from the definitions of σ (v, w) in Eq. 4 which is mainly based on I nt Occ(v j , wk ) defined in Eq. 2 that if v j = wk , then the value will be equal 1 which makes σ (v, w) equal 1 too, then property 1 hold. On the other hand, if v j = wk , then the ratio (1 + X + Y )−1 as defined in Eq. 2 is applied. Since all X and Y take their values from R + , then σ (v, w) ≥ 0. The ratio satisfies the condition σ (v, w) ≥ 0. Generally, the similarity σ (v, w) verifies Property (1). Property 2 Refer to the definition of I nt Occ(v j , wk ) (Eq. 4), when v j = wk then the similarity σ (v, w) will be 1 since all interest items are the same. Therefore, it satisfies the condition σ (v, v) = σ (w, w) ≥ σ (v, w). I nt Occ(v j , wk ) = (1 + X + Y )−1 ≤ 1 = I nt Occ(v j , v j ), I nt Occ(wk , wk ) Since the user similarity σ (v, w) in Eq. 4 aggregates the interests to find the whole similarity between two users, we assume user should have the same value for interests to satisfy the condition. Thus, the Property (2) is verified. Property 3 The definition of third property states that σ (v, w) = σ (w, v). Accordingly, the values of X + Y in I nt Occ(vi , wi ) (Eq. 2) is affected by the values of A = v friends and B = w friends. Therefore, B A + log )−1 = I nt Occ(v j , wk ) = (1 + log (r (v j )) (r (wk )) A B + log )−1 = I nt Occ(w j , vk ) (1 + log (r (wk )) (r (v j )) Then, I nt Occ(v j , wk ) = I nt Occ(wk , v j ) → σ (v, w) = σ (w, v) , so property (3) is verified. 3.2 Interest Clustering Approach In this section, we will present our new IntClus algorithm which is inspired by the well-known density clustering algorithm Structural Clustering Algorithm for Network (SCAN)[12]. It is simple, efficient, able to identify clusters, core, hubs, and outliers, and it is useful for detecting communities in graphs. The last property is the most crucial one because the problem to be solved here is concentrating on
123
identifying communities on social networks which is represented as graph. However, SCAN algorithm clusters nodes based on structural similarity where it is not sufficient to detect community according to a specific topic as it is the demand by majority of application. The similarity measure considers the user’s interest to cluster the nodes. Application such as viral marketing concerns about nodes interests in order to carefully reach the right customers. In the following subsections, we will provide the description of IntClus approach. 3.2.1 IntClus Approach Description In this section, we describe the IntClus approach which improves the SCAN algorithm. It has the main property in identifying core, hub, and outlier. SCAN algorithm computes the similarity according to node structure and its neighborhood where IntClus computes the similarity based on users interests. The IntClus approach detects communities that share similar interests. The approach has four important features: (1) connectivity, nodes assigned to one community are likely to be connected to each other, (2) overlapping, nodes can be assigned to more than one community, (3) sharing, nodes within one community share similar interests, and (4) density, nodes belong to one community are having more edges between each other. A mathematical formulation is required to describe the process of IntClus approach. It handles the problem of community detection as graph G contains a collection of nodes P and each node has J interests. The network G initiates a C communities each of which has specific interest and may overlap with each other. In general, G denotes the network, and J denotes the node interests (X vj is the J th interest of node v). Meanwhile, IntClus requires a preprocessing step to build the interaction graph G. It is needed because IntClus requires a Graph G in order to compute the similarity. To model how two nodes are interacted with each other to build the network structure, we applied the distance function that is based on how often two nodes or users are interacted with each other. This method proposed by Falkowski et al. [49] as shown in Eq. 5: I nteract(v, w) ⎧ ⎨0 = min(N oIv,w , N oIw,v )−1 ⎩ 1
⎫ if v = w ⎬ if N oIv,w > 1 ∧ N oIw,v > 1 ⎭ Otherwise
(5) where N oIv.w and N oIw.v is the number of interactions between nodes v and w initiated by v and w, respectively.
Arab J Sci Eng (2018) 43:935–947
Algorithm 1: Preprocessing Interest Clustering (PreIntClus)
1 2 3 4 5 6 7 8 9 10 11 12 13
Input: v Output: G(V,E) Fill user list; add user v to user list; Calculate user interaction graph; for each user v in user list do calculate user v interaction with friend user w ; if v = w then I nteract (v, w) = 0; else if N oIv,w > 1 ∧ N oIw,v > 1 then I nteract (v, w) = min(N oIv,w , N oIw,v )−1 ; else I nteract (v, w) = 1; end end
The previous function (Eq. 5) has been adopted because the proposed approach, IntClus, is based on interest and it is required to build and model the input graph as interaction graph. It differs than structural graph because it is based on how nodes interacted with each other rather than how they normally connected to each other. Since nodes interacted with other nodes who share common interest which meets the IntClus approach aim. It is normalized by the number of reciprocal interaction in order to keep the value range between [0–1]. The philosophy of interaction graph is that two nodes with a high number of reciprocal interactions are closer and the distance for two nodes that do not interact is 1. A complete description of Preprocessing Interest Clustering (PreIntClus) algorithm is shown in Algorithm 1. A crucial step after creating the graph is inferring node interests in order to be able to apply the proposed interest similarity. IntClus has applied a novel interest-based inferring method using node’s expert list proposed by Bhattacharya et al. [50]. This method has chosen because it achieves better results than the traditional topic modeling approaches such as labeled LDA [51] and topical recommendations [52]. In addition, social annotation to infer topical expertise has proposed in [53,54]. The list-based method has been used in our IntClus to infer a node’s interests depending on the experts the node follows. Mapping between the node and his or her interests was recorded in the node interest vector. The vector was initiated if and only if node participated to at least 3 experts in interest i. It is also sorted according to the number of experts on a specific topic. While the interests have been associated with each node, the similarity can be computed directly. Specifically, similarity measure proposed in this article represents the important step in IntClus approach. The process of detecting communities starts by receiving the interaction graph along with the parameters ε and μ then stating each node’s vector interest. The interest similarity
941
Algorithm 2: Interest Clustering (IntClus)
1 2 3 4 5 6 7 8 9 10 11 12
Input: ε,μ,G(V,E) Output: Clustrted Graph Infer user v interest list; Prepare user v interest; Check weather topic exist and friends have the same interest; Map user v interest with number of friends sharing the same interest; Calculate user similarity; for each user v in user list do if user v shares interests with his friend w then σ (v, w) = 1; else 1 σ (v, w) = |I (v)| j∈I (v) max(I ntComb( j)) end end
requires identifying node friends’ interest. Once the above steps completed, IntClus will take the interaction graph and the parameter ε and μ. Afterward, IntClus will be processed with the new interest similarity, and identify whether the node is core, hub, or outlier in order to show the resulted clustered graph. Detailed steps of IntClus approach are presented in Algorithm 2. 3.3 Time Complexity of the IntClus Algorithm Denote the number of nodes as n, the average number of friends in the interaction graph which represents the number of edges as m. In each iteration of IntClus, we infer and prepare node v interests which cost O(n). Since we have m edges as friends, then the complexity will cost O(mn). In most cases the time complexity is far less than this. Computation of interest similarity is pure mathematics within for loop which normally cost O(n). The time complexity of determining core, hub, and outliers is O(n). The overall complexity of IntClus algorithm is O(mn 3 ) which is very effective comparable with the features included in the algorithm which covers graph interaction, overlapping, and users’ interests.
4 Experiment Evaluation The experiment will cover two parts: (1) evaluating the proposed interest similarity measures by comparing it with state-of-art methods, and (2) evaluating the proposed IntClus algorithm with community detection approaches. The main objective of the IntClus approach is inferring communities based on node interaction and interests. The approach considers an undirected and weighted social graph G(V, E) with list-based node interests. Some parameters have to be set correctly to reach reasonable results; ε and μ are the most important parameters. The evaluation of IntClus considered four aspects. First, we measured the quality
123
942
Arab J Sci Eng (2018) 43:935–947
of division by using a modularity metric that aims to measure how well IntClus is dividing social networks into communities, each of which consists of nodes having similar interests. Second, we quantified the connectivity of the community by utilizing density; communities that are dense mean that they can diffuse information smoothly. Third, we assessed how nodes are likely to be connected to each other by applying clustering coefficients. Fourth, we measured community overlapping by utilizing node betweenness centrality. Each of the four evaluation measures will be discussed in the next Sect. 4.3.1. Upon conducting the aforementioned measures, a comparison was made between the original SCAN [12] algorithm and the IntClus approach on a real dataset. In addition, we compare IntClus with Fast Modularity proposed by Clauset et al. [25] and Zhao et al. approach [28]. We have considered these two algorithms because they are a very well known in the community detection domain where one considered as topological algorithm and the other take into consideration the network topology and users topics. The implementation of the IntClus approach has been done in Java with JDK 8. All experiments were conducted on a PC with a 2.3 GHz Intel Core i5 processor and 6 GB of RAM. 4.1 Dataset Description The dataset utilized in this work was collected using the Twitter Search API. Since the IntClus approach aims to include users with common interests (i.e., those who tweet about similar topics), we retrieved an expert list for each user and applied the list-based methodology discussed in Sect. 3.2.1 to infer user interest. Since the proposed similarity measure requires knowing friends’ interests, we mapped between each topic, including the number of friends who shared interest in the same topic. Each user is associated with a vector to store the inferred interests. Each interest is described by only one word and 25 topic interests is the maximum number of topic associated. Interests are sorted in ascending way in the vector which more frequent topic is at the top. It is ordered by this manner to easily compute the interest similarity since the number of frequent topics is crucial in the computation. The dataset is considered a real dataset because it is collecting real Twitter users with their topic expert lists. It is collected using twitter4j using Java JDK to integrate Java applications with Twitter. The dataset size contains 9081 Twitter users with 17,573 edges. Even though the size is relatively small, the overall dataset size reaches more than 5 GB; this is due to the problem of interest clustering, which uses friends’ interests and expert lists, along with the number of a user’s friends who share common interests, to infer user interest. The required data needed for IntClus approach is focusing on inferring node interests by knowing node expert list. This minimizes the required space and time for processing. Other
123
information such as number of followers, number of following, the posts content, or number of posts is not part of IntClus approach processing which guarantee the approach objective in minimizing the space and time. The graph topology for IntClus is constructed based on the interaction between nodes in the networks as described in Sect. 3.2.1. This way is more effective than following–follower relationship because people in social network may have huge number of followings but they are not interest about what they usually post than the ones who rapidly interact with.
4.2 Interest Similarity Evaluation We will verify the effectiveness and efficiency of proposed interest similarity discussed in Sect. 3.1. The experiment focuses on measuring the performance in respect to time. We take into consideration the variation of number of users and number of topics per user. In a meanwhile, a comparison has done between the proposed similarity and interest similarity ISCoDe proposed by Jaho et al. [55] and Han et al. approach [45,47,56]. In Fig. 1, we have assigned for each user different interests after inferring them using expert-list method. Then, we compute the performance at different number of interests as shown in Fig. 2. This comparison has been considered to guarantee the effectiveness of the proposed similarity at different level of interests. In addition, we include different sizes of number of users to test the performance. This has considered to measure the quality of similarity measure in respect to dataset size.
4.3 IntClus Approach Evaluation In previous Sect. 4.2, we have evaluated the proposed interest similarity with other well-known interest-based similarity which it is part of IntClus approach. In this subsection, we will evaluate throughout experiment the IntClus with stateof-art interest-based community detection approaches.
Fig. 1 Interest similarity measure performance
Arab J Sci Eng (2018) 43:935–947
943
- Density The denser community the more likely suitable for viral marketing because the nodes within community become more connected and likely to share the marketing message. The density measure has firstly introduced by Fortunato [14]. The density has two types either intra-cluster density or intercluster density. What important to note is that intra-cluster density should be larger than average link density whereas the inter-cluster density be smaller than the average link density. Fortunato has defined intra-cluster density δint as [14]: Fig. 2 Interest similarity measure performance
# internal edge of C n c (n c − 1)/2 where C is community and n c is the nodes within C. Similarly, the inter-cluster density δext as:
δint =
4.3.1 Evaluation Measures The goodness measures used in the evaluation of IntClus approach have covered internal measures, density and clustering coefficient, in addition to network measures, modularity, along with measure examining the overlapping between communities.
δext =
# inter-cluster edges of C n c (n c − 1)
(8)
(9)
Yang and Leskove have indicated that density is a measure for assessing community connectivity [20]. They also generalize the density as follow:
- Modularity g(P) =
mp n c (n c−1 )/2
(10)
It is the most well-known and effective measure for assessing the quality of community division. It is proposed by Leicht and Newman as an evaluation measure for community detection [13]. It defined as:
m p = |{(v, w) ∈ E : v ∈ P, w ∈ P}| , and n p = |P|
Q = (fraction of edges within communities)
Which is similar to average link density stated by Fortunato [14].
−(expected fraction of such edges)
(6)
Where P is the set of nodes, (11)
- Clustering coefficient When the edges fall within the range of communities, it is an indicator for higher modularity [57]. Generally, modularity is an NP-complete problem needs for high computational processing for optimization. It considers node degree over all modularity and the probability of two nodes to be connected. It can then be written: kv kw 1 Avw − δcv cw Q= 2 vw 2m
(7)
where Avw is an element of the adjacency matrix, δcv cw is the Kronecker delta symbol, and cv is the label of the community to which node v is assigned. The probability of an v kw , where kv is the edge between two vertices v and w is k2m degree of vertex v and m is the total number of edges in the network. A notable point regarding modularity has sated by Newman and Girvan [58] as the natural community division is normally below 0.7.
It measures how two nodes are likely to be connected with each other. It is a simple evaluation measure concerns about the probability of node connectivity. The premise concentrates on the possibility of pair of nodes that share common neighbors to be connected with each other. In this article, we have selected clustering coefficient measure because it matches the problem of IntClus in cluster similar nodes with their neighbors. Local clustering coefficient for each node is defined as [15]: Cv =
no. of triangles connected to node v no. of triples centered around node v
(12)
The clustering coefficient for the whole graph is the average of the local values Cv :
C=
n 1 ci n
(13)
i−1
123
944
Arab J Sci Eng (2018) 43:935–947
Fig. 4 Comparison of the density measure
Fig. 3 Comparison of the modularity measure
- Betweenness centrality Node betweenness defined by Tang and Liu [16] as the number of shortest paths that pass one node in the network. Once the node has high betweenness centrality, it would have a better communication and diffusion. This feature makes betweenness centrality suitable measure in identifying the overlapped nodes that have high betweenness values. This fact has proven by Bhat and Abulaish [59]. They stated that the nodes with high betweenness centrality are more suitable for viral marketing diffusion since they have higher overlapping. It is computed by the following Eq. 14: C B (u i ) =
v=u=w∈P,v
αvw (u) αvw
(14)
where αvw is the number of shortest paths between v and w, αvw (u) is the number of shortest paths between v and w via u, and P is the set nodes. Social network is a scale-free network that follows a power law distribution. The crucial property behind scale-free network is its ability to generate short paths across the network by utilizing connectivity of network hubs. In this case, it is important to consider node betweenness centrality because it affects the process of viral marketing. For each cluster, IntClus has computed the betweenness centrality values of each node. If the node has value of 1 as betweenness centrality, then it considered as best match for viral marketing. 4.4 Experiment Result The hypothesis behind introducing the IntClus algorithm was to cluster a group of people that share similar interests into one group, which may overlap with other groups if they have multiple interests. For this purpose, the list-based approach was utilized to infer user interest-based on expert interests. In order to find the best results of IntClus, the input parameters were tuned to ε = 0.8 and μ = 2. To illustrate the results, Fig. 3 shows the modularity measure at different ε = 0.8 val-
123
Fig. 5 Comparison of the clustering coefficient measure
ues. It is clearly notable that the highest values range between 0.7 and 0.9 in IntClus, while the values range between 0.3 and 0.5 with SCAN. It is clear that IntClus outperforms all other methods in modularity. This indicates that the IntClus has reasonable quality of division with some overlapping which facilitates the diffusion of any contagion within the network. In addition, Fig. 4 illustrates the results shown by density measure. The IntClus algorithm reached a value of 3 at ε = 1 which SCAN also reaches it at the same value of ε. The notable is that the Fast Modularity and Zhao et al approach have less values at all ε values. The results of the last measure, the clustering coefficient, are shown in Fig. 5. Table 3 summarizes the resulted values. It clearly shows that IntClus outperforms SCAN, Fast Modularity, and Zhao et al approach at quality of division and connectivity. As it is shown in Fig. 3 the IntClus has higher modularity than other approaches especially at ε = 0.7 and ε = 0.8. In addition, it demonstrates the clustering coefficient is nearly the same for IntClus, SCAN and Zhao et al., but it has much variation for Fast modularity. What’s important to note; even though IntClus gives a higher values, it also has additional features since it considers structure, interaction, and node interest along with node friends’ interest. In terms of density, IntClus and SCAN are nearly have the same density while other two approaches have less values. This has happened
Arab J Sci Eng (2018) 43:935–947 Table 3 Evaluation measure comparison
945
Measure
IntClus
SCAN
Fast modularity
Zhao et al.
Modularity
0.42
0.2
0.1
0.28
Density
3
3
2.3
2.7
clustering coefficient
0.72
0.7
0.52
0.68
Fig. 6 Twitter community detection of SCAN algorithm approach Fig. 7 Twitter community detection of IntClus algorithm approach
because of the twitter users have more than one common interests. This makes the cluster has more density between its nodes. Figure 6 illustrates SCAN results using a Twitter dataset, and it shows how SCAN failed in identifying clusters. It is obvious that SCAN resulted in a low quality of division because the Figure shows one big cluster and a few clusters interleaving. This indicates that SCAN grouped most Twitter users in one cluster. In contrast, Fig. 7 demonstrates the results of the IntClus approach. It is clear that there are a number of clusters with overlapping; this is evident from the diversity of resulting clusters, which indicates clusters of a specific interest. The interleaving between cluster is because of common interests between the users. To be able to identify certain interests, the seeker can enter any keyword, and IntClus can show a graph representing the entered interest. For instance, if the seeker is looking for a group of people who are interested in business on social media, the seeker can enter the keywords ‘social media’ and ‘business’. Figure 8 shows the resulting graph of the query entered.
Fig. 8 Graph represents group of people who are interested in social media business
123
946
Arab J Sci Eng (2018) 43:935–947
The results show that 36 clusters out of 118 (31%) have at least one node with betweenness centrality equal to 1. This indicates that IntClus is relatively sufficient to handle the problem of information diffusion to clearly identify the correct community.
5 Conclusion Network community detection is an essential task in many applications especially for social networks. Several methods have introduced in different disciplines including computer science, physics, and business. This article has presented a new approach to tackle the problem of community detection for viral marketing by considering user’s interests along with modeling network interaction as a graph representing nodes and their interactions. IntClus groups similar nodes into one cluster by using interest similarity measure. In addition, the approach takes into considerations list-based approach in inferring user’s interests along with friends’ common interests. The approach has successfully evaluated on real datasets crawling from Twitter. It has shown larger performance comparable with other community detection methods in terms of division, connectivity, and overlapping. The future improvements of IntClus should cover additional datasets such as Facebook, Instagram, and Snapchat with considering specifications of each in the similarity definition. Acknowledgements This work was supported by Deanship of Scientific Research and Research Center of College of Computer and Information Sciences, King Saud University. The authors are grateful for this support.
References 1. Leskovec, J.; Lang, K.J.; Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, pp. 631– 640. ACM (2010) 2. Newman, M.E.: Community detection and graph partitioning. EPL (Europhys. Lett.) 103(2), 28003 (2013) 3. Newman, M.E.: Spectral methods for community detection and graph partitioning. Phys. Rev. E 88(4), 042822 (2013) 4. Zhang, P.; Moore, C.; Newman, M.: Community detection in networks with unequal groups. Phys. Rev. E 93(1), 012303 (2016) 5. Wang, T.-C.; Phoa, F.K.H.: A scanning method for detecting clustering pattern of both attribute and structure in social networks. Phys. A Stat. Mech. Appl. 445, 295–309 (2016) 6. Liu, X.; Liu, W.; Murata, T.; Wakita, K.: Community detection in multi-partite multi-relational networks based on information compression. New Gener. Comput. 34(1–2), 153–176 (2016) 7. Zhang, F.; Li, J.; Li, F.; Xu, M.; Xu, R.; He, X.: Community detection based on links and node features in social networks. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds.) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol. 8935. Springer, Cham (2015)
123
8. Kim, J.; Lee, J.-G.: Community detection in multi-layer graphs: a survey. ACM SIGMOD Rec. 44(3), 37–48 (2015) 9. Yang, J.; McAuley, J.; Leskovec, J.: Community detection in networks with node attributes. In: IEEE 13th International Conference on Data mining (ICDM), pp. 1151–1156. IEEE (2013) 10. Bouadjenek, M.R.; Hacid, H.; Bouzeghoub, M.: Social networks and information retrieval, how are they converging? A survey, a taxonomy and an analysis of social information retrieval approaches and platforms. Inf. Syst. 56, 1–18 (2016) 11. Blei, D.M.; Ng, A.Y.; Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003) 12. Xu, X.; Yuruk, N.; Feng, Z.; Schweiger, T.A.: Scan: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824–833. ACM (2007) 13. Leicht, E.A.; Newman, M.E.: Community structure in directed networks. Phys. Rev. Lett. 100(11), 118703 (2008) 14. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010) 15. Watts, D.J.; Strogatz, S.H.: Collective dynamics of small-world networks. Nature 393(6684), 440–442 (1998) 16. Tang, L.; Liu, H.: Community detection and mining in social media. Synth. Lect. Data Min. Knowl. Discov. 2(1), 1–137 (2010) 17. Andersen, R.; Chung, F.; Lang, K.: Local graph partitioning using pagerank vectors. In: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS’06, pp. 475–486. IEEE (2006) 18. Lang, K.; Rao, S.: A flow-based method for improving the expansion or conductance of graph cuts. In: Bienstock, D., Nemhauser, G. (eds.) Integer Programming and Combinatorial Optimization. IPCO 2004. Lecture Notes in Computer Science, vol. 3064. Springer, Berlin, Heidelberg (2004) 19. Ahn, Y.-Y.; Bagrow, J.P.; Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature 466(7307), 761–764 (2010) 20. Yang, J.; Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015) 21. Xie, J.; Kelley, S.; Szymanski, B.K.: Overlapping community detection in networks: the state-of-the-art and comparative study. ACM Comput. Surv. 45(4), 1–35 (2013). doi:10.1145/2501654. 2501657 22. Yang, J.; Leskovec, J.: Structure and overlaps of ground-truth communities in networks. ACM Trans. Intell. Syst. Technol. 5(2), 1–35 (2014). doi:10.1145/2594454 23. AlSuwaidan, L.; Ykhlef, M.; Alnuem, M.A.: A novel spreading framework using incremental clustering for viral marketing. In: 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), 10–13 Nov 2014, pp. 78–83 24. AlSuwaidan, L.; Ykhlef, M.: Toward information diffusion model for viral marketing in business. Int. J. Adv. Comput. Sci. Appl. 1(7), 637–646 (2016) 25. Clauset, A.; Newman, M.E.; Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004) 26. Bai, X.; Yang, P.; Shi, X.: An overlapping community detection algorithm based on density peaks. Neurocomputing 226, 7–15 (2017). doi:10.1016/j.neucom.2016.11.019 27. Shen, H.; Cheng, X.; Cai, K.; Hu, M.-B.: Detect overlapping and hierarchical community structure in networks. Phys. A Stat. Mech. Appl. 388(8), 1706–1712 (2009). doi:10.1016/j.physa.2008. 12.021 28. Zhao, Z.; Feng, S.; Wang, Q.; Huang, J.Z.; Williams, G.J.; Fan, J.: Topic oriented community detection through social objects and link analysis in social networks. Knowl. Syst. 26, 164–173 (2012). doi:10.1016/j.knosys.2011.07.017 29. Lim, K.H.; Datta, A.: Finding twitter communities with common interests using following links of celebrities. Paper Presented at the
Arab J Sci Eng (2018) 43:935–947
30.
31. 32.
33. 34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44. 45.
Proceedings of the 3rd International Workshop on Modeling Social Media, Milwaukee, Wisconsin, USA Lim, S.; Ryu, S.; Kwon, S.; Jung, K.; Lee, J.G.: LinkSCAN*: Overlapping community detection using the link-space transformation. In: 2014 IEEE 30th International Conference on Data Engineering, 31 March 2014–4 April 2014, pp. 292–303 Ding, Y.: Community detection: topological versus topical. J. Inf. 5(4), 498–514 (2011). doi:10.1016/j.joi.2011.02.006 Lim, K.H.; Datta, A.: Following the follower: detecting communities with common interests on twitter. In: Proceedings of the 23rd ACM Conference on Hypertext and Social Media, pp. 317–318. ACM (2012) Zhang, Y.; Wu, Y.; Yang, Q.: Community discovery in Twitter based on user interests. J. Comput. Inf. Syst. 8, 991–1000 (2012) Goel, A., Sharma, A.; Wang, D.; Yin, Z.: Discovering similar users on twitter. In: 11th Workshop on Mining and Learning with Graphs 2013 Petkos, G.; Schinas, M.; Papadopoulos, S.; Kompatsiaris, Y.: Graph-based multimodal clustering for social multimedia. Multimed. Tools Appl. (2016). doi:10.1007/s11042-016-3378-2 Vathi, E.; Siolas, G.; Stafylopatis, A.: Mining Interesting Topics in Twitter Communities. In: Nez, M.; Nguyen, T.N.; Camacho, D.; Trawiski, B. (eds.) Computational Collective Intelligence: 7th International Conference, ICCCI 2015, Madrid, Spain, 21–23 Sept 2015, Proceedings, Part I, pp. 123–132. Springer International Publishing, Cham (2015) Hromic, H.; Prangnawarat, N.; Hulpu, I.; Karnstedt, M.; Hayes, C.: Graph-based methods for clustering topics of interest in Twitter. In: Cimiano, P.; Frasincar, F.; Houben, G.-J.; Schwabe, D. (eds.) Engineering the Web in the Big Data Era: 15th International Conference, ICWE 2015, Rotterdam, The Netherlands, 23–26 June 2015, Proceedings, pp. 701–704. Springer International Publishing, Cham (2015) Chen, H. et al.: People opinion topic model: opinion based user clustering in social networks. In: Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017. Akcora, C.G.; Carminati, B.; Ferrari, E.: User similarities on social networks. Soc. Netw. Anal. Min. 3(3), 475–495 (2013). doi:10. 1007/s13278-012-0090-8 Mazhari, S.; Fakhrahmad, S.M.; Sadeghbeygi, H.: A user-profilebased friendship recommendation solution in social networks. J. Inf. Sci. (2015). doi:10.1177/0165551515569651 Dougnon, R.Y.; Fournier-Viger, P.; Lin, J.C.-W.; Nkambou, R.: Inferring social network user profiles using a partial social graph. J. Intell. Inf. Syst. (2016). doi:10.1007/s10844-016-0402-y Schafer, J.B.; Frankowski, D.; Herlocker, J.; Sen, S.: Collaborative filtering recommender systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web. Lecture Notes in Computer Science, vol. 4321. Springer, Berlin, Heidelberg (2007) Sun, H.-F.; Chen, J.-L.; Yu, G.; Liu, C.-C.; Peng, Y.; Chen, G.; Cheng, B.: JacUOD: a new similarity measurement for collaborative filtering. J. Comput. Sci. Technol. 27(6), 1252–1260 (2012). doi:10.1007/s11390-012-1301-5 Aggarwal, C.C.: Data Mining: The Textbook. Springer, Berlin (2015) Han, X.; Wang, L.; Farahbakhsh, R.; Cuevas, Á.; Cuevas, R.; Crespi, N.; He, L.: CSD: a multi-user similarity metric for community recommendation in online social networks. Expert Syst. Appl. 53, 14–26 (2016). doi:10.1016/j.eswa.2016.01.003
947 46. Wu, B.-X.; Xiao, J.; Chen, J.-M.: Friend recommendation by user similarity graph based on interest in social tagging systems. In: Huang, D.-S.; Han, K. (eds.) Advanced Intelligent Computing Theories and Applications: 11th International Conference, ICIC 2015, Fuzhou, China, 20–23 August 2015. Proceedings, Part III. pp. 375– 386. Springer International Publishing, Cham (2015) 47. Han, X.; Wang, L.; Crespi, N.; Park, S.; Cuevas, Á.: Alike people, alike interests? Inferring interest. Decis. Support Syst. 69, 92–106 (2015). doi:10.1016/j.dss.2014.11.008 48. Belkhirat, A.; Belkhir, A.; Bouras, A.: A new similarity measure for the profiles management. In: 2011 UKSim 13th International Conference on Modelling and Simulation, pp. 255–259. IEEE (2011) 49. Falkowski, T.; Barth, A.; Spiliopoulou, M.: DENGRAPH: A density-based community detection algorithm. In: IEEE/WIC/ACM International Conference on Web Intelligence, vol. 2–5, pp. 112–115. (2007) 50. Bhattacharya, P.; Zafar, M.B.; Ganguly, N.; Ghosh, S.; Gummadi, K.P.: Inferring user interests in the Twitter social network. Paper Presented at the Proceedings of the 8th ACM Conference on Recommender systems, Foster City, Silicon Valley, California, USA 51. Ramage, D.; Hall, D.; Nallapati, R.; Manning, C.D.: Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. Paper Presented at the Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: vol. 1–vol. 1, Singapore 52. Michelson, M.; Macskassy, S.A.: Discovering users’ topics of interest on twitter: a first look. Paper Presented at the Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data, Toronto, ON, Canada 53. Sharma, N.K.; Ghosh, S.; Benevenuto, F.; Ganguly, N.; Gummadi, K.: Inferring who-is-who in the Twitter social network. Paper Presented at the Proceedings of the 2012 ACM Workshop on Workshop on Online Social Networks, Helsinki, Finland 54. Ghosh, S.; Sharma, N.; Benevenuto, F.; Ganguly, N.; Gummadi, K.: Cognos: crowdsourcing search for topic experts in microblogs. Paper Presented at the Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, Oregon, USA 55. Jaho, E.; Karaliopoulos, M.; Stavrakakis, I.: Iscode: a framework for interest similarity-based community detection in social networks. In: 2011 IEEE Conference on 2011 Computer Communications Workshops (INFOCOM WKSHPS), pp. 912–917. IEEE 56. Han, X.; Wang, L.; Park, S.; Cuevas, A.; Crespi, N.: Alike people, alike interests? a large-scale study on interest similarity in social networks. In: 2014 IEEE/ACM International Conference on 2014 Advances in Social Networks Analysis and Mining (ASONAM), pp. 491–496. IEEE 57. Newman, M.E.: Mixing patterns in networks. Phys. Rev. E 67(2), 026126 (2003) 58. Newman, M.E.; Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004) 59. Bhat, S.Y.; Abulaish, M.: Overlapping social network communities and viral marketing. In: 2013 International Symposium on Computational and Business Intelligence (ISCBI), pp. 243–246. IEEE (2013)
123