Predictability of evolving contacts and triadic closure in human face-to-face proximity networks

The analysis of link structures and particularly their dynamics is important for enhancing our understanding of the underlying (social) processes. Thi...

0 downloads 187 Views 1MB Size

Download PDF

Soc. Netw. Anal. Min. (2014) 4:217 DOI 10.1007/s13278-014-0217-1

ORIGINAL ARTICLE

Predictability of evolving contacts and triadic closure in human face-to-face proximity networks Christoph Scholz • Martin Atzmueller Mark Kibanov • Gerd Stumme

•

Received: 15 December 2013 / Revised: 18 June 2014 / Accepted: 7 July 2014 Ó Springer-Verlag Wien 2014

Abstract The analysis of link structures and particularly their dynamics is important for enhancing our understanding of the underlying (social) processes. This paper analyzes such structures in networks of face-to-face spatial proximity: we focus on evolving contacts and triadic closure and present new insights on the dynamic and static contact behavior in real-world networks, where we utilize face-to-face contact networks collected at three different conferences using the social conference guidance system Conferator [Atzmueller et al. 2011, 2014]. We analyze network dynamics and the predictability of all, new and recurring links. Furthermore, we especially investigate the strength of ties, their connection to triadic closure, and examine influence factors for predicting triadic closure in face-to-face proximity networks.

1 Introduction In recent years, online and offline social networks have attracted significant interest. Analyzing how networks evolve over time, for example, is of special interest in the field of social network analysis. Especially the prediction

C. Scholz (&) M. Atzmueller M. Kibanov G. Stumme Knowledge and Data Engineering Group, University of Kassel, Kassel, Germany e-mail: [email protected] M. Atzmueller e-mail: [email protected] M. Kibanov e-mail: [email protected] G. Stumme e-mail: [email protected]

of links between users is an important and challenging task in the fields of network science. In particular, the goal of link prediction is to infer the existence of future links in a graph Gtþ1 , based on the observations of a given graph Gt . A detailed study of this problem is not only essential from a theoretical point of view. In most social network platforms, for example, in Facebook, LinkedIn, etc., friendship recommenders are responsible for a large number of links. Therefore, accurate link predictions are very important to increase the quality of such friendship recommenders. Understanding the processes on the creation of links between actors is one foundational issue for high quality of link predictions. Different factors contribute to the creation of links, for example, existing connections (Scholz et al. 2012), common neighbors (Backstrom and Leskovec 2011), and their respective tie strengths. Identifying connections among evolutionary processes, link creation, and factors for strengthening existing ties is also of significant interest in this context. In general, all those features are relevant for the analysis of a wide range of social networks. In this paper, we consider dynamic and static aspects of contact structures in networks of face-to-face spatial proximity. Our application context is given by academic conferences. The human face-to-face contact networks are collected during a set of conferences using the social conference guidance system Conferator (Atzmueller et al. 2011, 2014). The Conferator system is based on an RFIDbased proximity sensing system developed by the SocioPatterns collaboration (http://www.sociopatterns.org) which enables the convenient detection of face-to-face contacts. In this work, we particularly study two aspects in the context of analyzing the contact behavior of actors at conferences. First, we consider the link prediction problem in evolving face-to-face contact networks. Second, we

123

217 Page 2 of 17

analyze triadic closure in such networks. In particular, our contribution can be summarized as follows 1.

2.

3.

4.

To the best of the authors’ knowledge, we present the first link prediction analysis for different types of links [all, new (Liben-Nowell and Kleinberg 2003) and recurring links (Scholz et al. 2012; Backstrom et al. 2013)] on evolving face-to-face contact networks. We analyze the predictability of the next k new links that arise after a given point of time in face-to-face contact networks, introduce a temporal weighting model for this link prediction task and show that it outperforms baseline models. We define the concept of a reversed triadic closure and analyze its properties in networks of face-to-face proximity. We present the first work of analyzing the strength of ties and its connection to triadic closures in face-toface proximity networks. Furthermore, we study features of nodes in the context of triadic closure, which include different (academic) roles of participants, as well as demographic features, and network proximity measures between pairs of nodes.

The rest of this paper is structured as follows: Sect. 2 discusses related work. After that, Sect. 3 describes the RFID hardware setting and gives a detailed overview on the collected real-world datasets. In Sect. 4, we present the analysis: first, we consider the predictability of links in evolving face-to-face proximity networks. This means that we evaluate the predictability of links at different points of time during a conference. In addition, we analyze the predictability of the next k new links in such networks. Next, we analyze the strength of ties and its connection to triadic closure in face-to-face proximity networks. Furthermore, we define and analyze the concept of reversed triadic closure and investigate influence factors in the context of triadic closure. Finally, Sect. 5 summarizes our results and discusses future work.

2 Related work This section discusses related work concerning the analysis of human contact behavior, link prediction, and triadic closure, and puts our work in context. 2.1 Human contact behavior The analysis of human contact patterns is an interesting and challenging task. Eagle et al. (2009) presented an analysis using proximity information collected by bluetooth devices. However, with bluetooth technology it is (up to now) not possible to reliably detect face-to-face contacts. For the

123

Soc. Netw. Anal. Min. (2014) 4:217

detection of face-to-face proximity we use a new generation of active RFID tags that have been developed by the SocioPatterns collaboration. In this context, Alani et al. (2009) first presented an application based on this RFID technology at the ESWC 2009 conference. Furthermore, Barrat et al. (2010) compared the attendees’ contact patterns with their co-authorship and activity in social web platforms. Using the same technology, Isella et al. (2011a, b) and Stehle´ et al. (2011) studied the dynamics of human contacts in healthcare environments, schools and museums. Macek et al. (2012) analyzed the dynamics of participants’ contact patterns at conferences, and also the connection between academic jobs and roles at a conference. Focusing on the dynamics of such structures, Atzmueller et al. (2012) described the dynamics of community structures and roles. 2.2 Link prediction A first comprehensive analysis of link prediction as well as the fundamental problem definition was done by LibenNowell and Kleinberg (2003), where the authors defined the link prediction problem and analyzed the predictability of several unsupervised methods. Weighted variants of different network proximity measures are presented and analyzed in Murata and Moriyasu (2007). Wang et al. examined the impact of human mobility on link prediction in Wang et al. (2011). Lichtenwalter et al. (2010) presented a new unsupervised (a restricted variant of rooted PageRank) and a new supervised method for the prediction of new links. Backstrom and Leskovec (2011) introduced a novel supervised method, based on supervised random walks, for the prediction of new links. However, most of these works analyzed the predictability of new links in online social networks like Facebook or DBLP. The prediction of new links in offline social networks has been largely neglected. In Scholz et al. (2012), we presented a first analysis concerning the predictability of new and recurring links in real-world face-to-face contact networks. In Scholz et al. (2013), we showed that the predictability of new links can be further improved by data from online networks. We also proposed a new unsupervised link prediction method that combines the information of different networks. In Tsugawa and Ohsaki (2013) the authors analyzed also the quality of unsupervised methods in the context of link prediction in face-to-face proximity networks. They also compared the predictability of links in face-toface contact networks and other types of social networks, supporting our earlier work in Scholz et al. (2012). 2.3 Triadic closure The concept of triads was introduced by Simmel (1950). In Rapoport (1953), the authors presented the definition of

Soc. Netw. Anal. Min. (2014) 4:217

Page 3 of 17

217

triadic closure. Romero et al. (2011) developed a framework for analyzing the social theories balance (Heider 1946, 1958), exchange (Emerson 1962; Walker et al. (2000) and betweenness.Dong et al. (2013) studied social balance on a call network. Romero and Kleinberg (2010) studied the directed triadic closure process and provided evidence for its importance in link formation on Twitter. Golder and Yardi (2010) investigated structural predictors for tie formation in Twitter focusing on transitivity and mutuality. Lou et al. (2013) presented a factor graph model for reciprocal relationship prediction in Twitter and showed the relationship to triadic closure. In Kossinets and Watts (2006) the authors analyzed influence factors of triadic closure on an evolving e-mail network.

60 s. In Barrat et al. (2008) the authors chose 20 seconds for the purpose of minimizing the statistical error of face-toface contact detection. A proximity tag sends out proximity signals and tracking signals. Proximity signals are used to detect proximity between conference participants; tracking signals are used to determine the current position (at room level basis) of a conference participant. In addition tracking signals can be used to find out whether a participant is present at a specific point of time or not. For more information about the proximity sensing technology, we refer to the web site of the SocioPatterns (http://www.sociopatterns. org) collaboration.

2.4 Discussion

At the three conferences introduced above, we collected networks of face-to-face proximity. Each link in the network indicates physical proximity and can be weighted by the cumulated duration of all face-to-face proximity contacts between the respective pair of persons.

The fundamental difference between our work and existing literature is that (to the best of our knowledge) we present the first link prediction analysis for different types of links (all, new and recurring links) on evolving face-to-face contact networks. Furthermore, we consider network dynamics for the prediction of new links and introduce the first work analyzing the strength of ties and its connection to triadic closures in face-to-face proximity networks. Specifically, we investigate different types of triadic closure link settings and discuss their implications. In this way, we present new insights into the communication behavior of participants during a conference. We expect that these results can help to improve the quality of link prediction algorithms in social networks, e.g., for recommendation methods and in further future applications.

3 Face-to-face contact data In this section, we describe the framework used for collecting the face-to-face contact networks. After that, we summarize the characteristics of the collected real-world datasets. 3.1 RFID setup At the LWA 2010, 2012 and Hypertext 2011 conferences we asked each participant to wear proximity tags that can detect face-to-face contacts (1-1.5 meters) of other participants wearing them (Cattuto et al. 2010). Due to the fact that the human body blocks RFID signals, face-to-face contacts can then be detected. As in Szomszor et al. (2010), Cattuto et al. (2010), Scholz et al. (2012, 2013), Macek et al. (2012), Barrat et al. (2008), we record a face-to-face contact when the length of a contact is at least 20 s. A contact ends when the proximity tags do not detect each other for more than

3.2 Datasets

Table 1 provides a detailed overview of the three collected face-to-face proximity datasets. As already observed in many other contexts (Cattuto et al. 2010; Isella et al. 2011b; Macek et al. 2012) the distributions of all aggregated face-to-face contacts’ lengths between conference participants are heavy-tailed (see Fig. 1). More than the half of all aggregated face-to-face contacts are less than 200 s and the average contact duration is less than 1 min, Table 1 General statistics for the collected datasets LWA 2010

HT 2011

LWA 2012

#days

3

3

3

#Reg. participants

109

95

66

jVj

77

68

42

jEj

1,004

698

478

Average degree (G)

26.07

20.53

22.76

APL ( G )

1.7

1.76

1.45

d(G)

3

4

3

AACD

797

529

1,023

jVday1 j

63

60

38

jVday2 j

68

47

33

jVday3 j

52

29

23

jEday1 j

426

481

261

jEday2 j

775

244

314

jEday3 j

161

68

57

Given the social network graph G ¼ ðV; EÞ, where V is the set of participants and an edge e ¼ ðu; vÞ 2 E represents a face-to-face contact between two participants u and v. Furthermore APL is the average path length, d the diameter and AACD the average aggregated contact-duration (in seconds). In addition VdayX is the set of participants at day X and an edge e ¼ ðu; vÞ 2 EdayX represents a faceto-face contact at day X

123

217 Page 4 of 17 5e−01

triadic closures of participants: we investigate the correlation between link strength and triadic closure, and the influence of roles of participants. Furthermore, we define and analyze reversed triadic closure. Finally, we analyze triadic closure on evolving face-to-face proximity networks. Thus, we focus on contact structures from ranging from dyadic to triadic relations to investigate the corresponding contact and interaction dynamics in the given networks of spatial face-to-face proximity. Such analyses yield insights into the underlying social processes and can as well contribute to methodological improvements, for example, for recommendation approaches.

1e−02 5e−04

Probability

(a)

Soc. Netw. Anal. Min. (2014) 4:217

LWA 2010 HT 2011 LWA 2012

20

50

200

1000

5000

0.100 0.001

Probability

4.1.1 Applied network proximity measures

LWA 2010 HT 2011 LWA 2012

20

50

200

1000

5000

Aggregated Contact Length [in Seconds] Fig. 1 Cumulated contact length distribution of all face-to-face contacts (a) and all aggregated face-to-face contacts for all conferences. The x axis displays the minimum length of a contact in seconds, the y axis the number of contacts having at least this contact length, respectively. The axes are scaled logarithmically

but very long contacts are also observed. The diameter, average degree and average path lengths are similar to results presented in Isella et al. (2011b), Atzmueller et al. (2012). In addition we present an overview about how many conference participants joined our experiments. We see that 70.64 % joined our experiment at the LWA 2010, 71.58 % at the HT 2011 and 63.64 % at the LWA 2012.

4 Analysis Finding underlying structural patterns in complex networks helps to understand and to model the behavior of the users. In this section, we analyze some fundamental social network issues that offer explanations in these directions. First, we analyze the predictability of all, new and recurring links throughout the conference and consider network dynamics for the prediction of new links. Then, we analyze

123

4.1 Link prediction in evolving face-to-face contact networks In this section, we discuss the predictability of links in evolving face-to-face contact networks. We start with a brief overview of the applied network proximity measures, before we continue with the definition of the research problem.

0.010

(b)

1.000

Contact Length [in Seconds]

In the following, we first discuss and define the network proximity measures used in our analysis. Focussing on unsupervised methods, most of the predictor scores are based on either node neighborhoods or path information. For our analysis we use the network proximity measures weighted Resource Allocation (Lu¨ and Zhou 2009), weighted Jaccard’s Coefficient (Scholz et al. 2012), weighted Adamic Adar (Adamic and Adar 2003; Murata and Moriyasu 2007) and weighted rooted Page Rank (Liben-Nowell and Kleinberg 2003), since these network proximity measures have shown very good predictor results in previous works, cf. (Liben-Nowell and Kleinberg 2003; Lu¨ and Zhou 2009; Scholz et al. 2012). All of the presented proximity measures are based on the assumption that two nodes have a higher probability to become connected if these two nodes are close in the graph. We model the social network as an undirected multigraph G ¼ ðV; EÞ, where V is the set of participants and an edge e ¼ ðu; vÞ 2 E with weight wðeÞ represents a face-toface contact between two participants u and v with contact duration wðeÞ. The weight wR ðeÞ represents the sum of all face-to-face contact durations between participants u and v. For participant x 2 V the set of neighbors NðxÞ is defined as NðxÞ ¼ fyjy 2 V; ðx; yÞ 2 Eg: As in Lu¨ and Zhou (2009) the network proximity measure weighted resource allocation, based on the neighborhood information of the considered nodes, is defined as

Soc. Netw. Anal. Min. (2014) 4:217

WRAðx; yÞ ¼

Page 5 of 17

X

wR ðx; zÞ þ wR ðy; zÞ P ; 0 z0 2NðzÞ wR ðz ; zÞ z2NðxÞ\NðyÞ

where x and y are conference participants. The weighted variant of Adamic Adar (Adamic and Adar 2003; Murata and Moriyasu 2007) is defined as WAAðx; yÞ ¼

X

wðz; xÞ þ wðz; yÞ P ; log ð z0 2NðzÞ wðz; z0 ÞÞ z2NðxÞ\NðyÞ

where x and y are conference participants. Considering Jaccard’s Coefficient it is more likely that two nodes are connected if these two nodes share a high fraction of their respective neighborhood. The weighted Jaccard’s Coefficient is defined as P z2NðxÞ\NðyÞ wðx; zÞ þ wðy; zÞ P WJCðx; yÞ ¼ P 0 0 ; x0 2NðxÞ wðx; x Þ þ y0 2NðyÞ wðy; y Þ where x and y are conference participants. The rooted PageRank predictor (RPR) (Liben-Nowell and Kleinberg 2003) is an adaption of the PageRank algorithm (Brin and Page 1998) for the link prediction task. The rooted PageRank predictor score between participants r and y is defined by the stationary probability distribution of participant y under the following random walk (LibenNowell and Kleinberg 2003): – –

With probability a, jump to r. With probability 1 a, jump to a random neighbor of the current node.

For the weighted rooted PageRank (WRPR) predictor, the random walk selects a random neighbor n of the current node c with probability P wðc;nÞ , where wðc; dÞ is the c!d

wðc;dÞ

weight of the edge ðc; dÞ. Assume that we want to determine the weighted Rooted PageRank predictor score for participants r and x. In this case, we use participant r as root node and execute the algorithm. As a result, the algorithm computes the stationary probability distribution of all nodes. The predictor score of participant r with respect to participant x is then given by the stationary probability distribution of participant x.

217

½ts ; t as training data. More precisely, for a given t 2 ½ts ; te , we define E\t ¼ fe 2 E j tðeÞ\tg: The training data are thus the edges of the undirected graph G\t ¼ ðV; E\t Þ where two participants u; v 2 V are connected by an edge e :¼ ðu; vÞ in E\t if they had at least one face-to-face contact before t. In analogy to E\t , we define E t ¼ fe 2 E j tðeÞ tg: This means, two participants u; v 2 V are connected by an edge e :¼ ðu; vÞ in E t if they had at least one face-to-face contact after t. For our analysis we consider three different link prediction tasks—to predict 1. 2. 3.

all links, i.e., all links in E t . new links, i.e., all links in E t n E\t . recurring Links, i.e., all links in E t \ E\t .

4.1.3 Prediction approach Let M be a network proximity measure, for instance, 1 of Sect. 4.1.1. For the prediction task we compute a predictor score for each possible (new or recurring) link ðu; vÞ 2 V V using the network proximity measure M. In a realworld application, the predicted links are then all links, where the predictor score is greater than a defined threshold. 4.1.4 Evaluation method For the evaluation process, we use the standard approach of determining the area under a receiver operating characteristic (AUC) value (Hanley and McNeil 1982) directly based on the predictor scores. AUC is a measure that considers the performance of all thresholds. Receiver operating characteristic (ROC) graphs plot the true-positive rate on the y-axis and the false-positive rate on the x-axis, concerning the set of predictions. The AUC value represents the probability that the prediction score of a randomly chosen existing link is higher than the prediction score of a non-existing link (Zhou et al. 2009). Here we note that the AUC value of a random predictor is 0.5.

4.1.2 Problem definition We model the social network as an undirected multi-graph G ¼ ðV; EÞ, where V is the set of participants and an edge e ¼ ðx; yÞ 2 E represents a face-to-face contact between two participants x and y. Each edge e 2 E is labeled with the start time tðeÞ of the respective face-to-face contact. Let ts be the starting time of the conference, te its end time, and t 2 ½ts ; te . We consider all face-to-face contacts during

4.1.5 General link prediction in evolving face-to-face contact networks In the following, we discuss the predictability of links in evolving face-to-face contact networks. Exemplary applications of such an analysis include the integration of the findings into recommendation and personalization approaches. As predictors we choose the described state-

123

217 Page 6 of 17

Soc. Netw. Anal. Min. (2014) 4:217 LWA 2012, WRPR

0.4

0.6

0.8

1.0

AUC 0.7 0.0

0.2

0.4

0.3 0.6

0.8

1.0

0.8

1.0

0.2

0.4

0.3 0.6

0.8

1.0

0.0

0.2

0.4

0.6

Normalized Time

Normalized Time

Normalized Time

LWA 2010, WJC

HT 2011, WJC

LWA 2012, WJC

0.8

1.0

0.8

1.0

0.5 0.0

0.2

0.4

All Links Reccuring Links New Links

0.3

All Links Reccuring Links New Links

0.3 0.6

1.0

AUC 0.7

0.9 AUC 0.7 0.5 0.4

Normalized Time

0.8

AUC 0.7 0.0

All Links Reccuring Links New Links

0.9

0.3

All Links Reccuring Links New Links

0.3 0.6

1.0

0.5

AUC 0.7 0.5

AUC 0.7 0.5

0.4

0.8

0.9

LWA 2012, WRA

AUC 0.7 0.3

0.6

HT 2011, WRA

0.5

0.2

0.4

LWA 2010, WRA

All Links Reccuring Links New Links

0.0

0.2

Normalized Time

0.9

0.2

0.0

Normalized Time

All Links Reccuring Links New Links

0.0

All Links Reccuring Links New Links

Normalized Time

0.9

0.2

0.9

0.0

All Links Reccuring Links New Links

0.3

0.3

All Links Reccuring Links New Links

0.5

AUC 0.7 0.5

0.5

AUC 0.7

0.9

0.9

HT 2011, WRPR

0.9

LWA 2010, WRPR

0.6

Normalized Time

0.8

1.0

0.0

0.2

0.4

0.6

Normalized Time

Fig. 2 AUC values for all, new and recurring links at different points of time during the conference. The x axis represents the normalized conference time, the y axis displays the AUC value. The first row

shows the results with weighted Rooted PageRank, the second row depicts the results for weighted resource allocation and the third row shows the results for weighted Jaccard’s Coefficient

of-the-art predictors. We note that for calculating a predictor score for a (possible) recurring link between two nodes x and y a proximity measure based on node neighborhoods does not take into account a direct connection between x and y. In Fig. 2, we plot the results exemplarily for the network proximity measures weighted rooted PageRank, weighted Resource Allocation and weighted Jaccard’s Coefficient.

We note here that the results using the the network proximity measure weighted Adamic Adar are very similar to the results presented in Fig. 2. In this figure we observe the predictor scores at different points of time during the conference. We considered the problem that given a snapshot of a face-to-face contact network at time t, we analyze the predictability of all future links (that arise after time t) till the end of conference. As expected, Fig. 2

123

Soc. Netw. Anal. Min. (2014) 4:217

wtemp ðeÞ ¼

217

wðeÞ ; c0

0

where c ¼ ð1 þ cÞni . In our evaluation we analyze the predictability for the next k ¼ 4; 8; 16; 32 new links. For analyzing the impact of 0.75

LWA 2010

0.60 0.45

0.50

0.55

AUC

0.65

0.70

0 0.005 0.01 0.02 0.04

4

8

16

32

k 0.70

HT 2011

0.60

AUC

0.65

0 0.005 0.01 0.02 0.04

0.55

shows that recurring links are easier to predict than new links. For all three conferences we observe that the predictor score for recurring links is close to 1 at the beginning of the respective conference. This means that there is an increased probability that all participants who had a faceto-face contact together at the beginning of the conference will have a face-to-face contact again at some point of time later at the conference. In addition, we also observe that recurring links are harder to predict towards the end of the conference. This observation can be explained by the fact that in the course of the conference more and more recurring links are formed, which makes it harder to predict them. Furthermore, we observe that the predictability of all and new links at the beginning of the conference is always random (the respective AUC values are 0.5). The predictability of all and new links increases monotonically for all datasets until half of the conference duration is reached. This increase can be explained by new participants joining the conference, which implies a better predictability of new links. The scores of both predictors for all and new links decrease after half of the conference, which is partially caused by participants leaving the conference. We also see that at the end of the conferences new links are predicted between participants, who are not close in the graph. To our surprise the proximity all used network proximity measure show an equal performance for the prediction of recurring links on all datasets. Here we expected that the rooted PageRank outperforms the neighborhood-based measures, because rooted PageRank is able to use information of the direct neighborhood.

Page 7 of 17

0.50

4.1.6 Next-k link prediction in evolving face-to-face contact networks

4

8

16

32

k 0.65

LWA 2012

0.55 0.50

AUC

0.60

0 0.005 0.01 0.02 0.04

0.45

In a social network it is important to understand the dynamic contact structures, i.e., why and when the next links are formed, to analyze the evolution of the network. Therefore, we consider an adaption of the general link prediction problem, the next k link prediction problem. This means that we analyze the predictability of the next k new links that arise after time t. In addition, we introduce a temporal weighting model for analyzing the impact of older and newer links: the model gives newer links a higher weight than older links. For this purpose, we damp the weight of all links of the current contact graph by a factor c at each time an edge is added to the contact graph. We say that an edge is temporal weighted if its weight gets damped by a factor c [ 0 when new edges are added to the graph. More specifically, when n edges have already been added to the contact graph and edge e was the ith such added edge, then the weight of edge e is given by

4

8

16

32

k

Fig. 3 AUC values for the next k ¼ 4; 8; 16; 32 new links. For the analysis we used the damping factors c ¼ 0; 0:005; 0:01; 0:02; 0:04 (the data are displayed to the legend)

123

217 Page 8 of 17

4.2.1 Problem definition The concept of triadic closure (see Fig. 4) was first introduced by Rapoport (1953). The basic principle of triadic closure is that there is an increased likelihood that two people become friends in the future if they have a friend in common. This means, given that participant A is connected to participant B with weight wR ðA; BÞ [ 0 and participant A is connected to participant C with weight wR ðA; CÞ [ 0, we calculate the probability that there exists an edge from B to C. In this context triadic closure means that an edge from B to C closes the triangle. In the following, we analyze the probability that B and C are connected, depending on the edge weights and the status of node A, given that there exists an edge from A to B and from A to C. As in Easley and Kleinberg (2010), we need to categorize all links in the network as weak or strong links. Given a threshold T 2 N we define a link ðA; BÞ between participants A and B as strong; if wR ðA; BÞ T typeðA; BÞ ¼ ; weak; if wR ðA; BÞ\T

123

C

1.0

A

0.2

0.4

0.6

0.8

LWA 2010 HT 2011 LWA 2012

0

In this section, we analyze the correlation between link strength and triadic closure, and study the influence of different factors concerning triadic closure in face-to-face proximity networks. In addition, we introduce and analyze the reversed triadic closure problem in these networks.

?

B

0.0

4.2 Triadic closure in face-to-face contact networks

Fig. 4 Triadic closure concept: does there exists a link between person B and person C, given that person A is connected to person B and person C

Probability to be connected

older and newer links we choose in addition different damping factors. Here, we refer to the damping factor c ¼ 0 as the baseline method, because links are not temporally weighted. For the analysis we use the network proximity measures discussed in Sect. 4.1.1 as predictor and choose the transition probabilities with respect to the weightings of the temporal weighting model for c 2 f0; 0:005; 0:01; 0:02; 0:04g. In addition, we considered only pairs of participants who were present within the time interval the next k new links were formed. For the justification of our results we used a t test. We say that our results are significant if the corresponding confidence intervals do not cross each other. In Fig. 3, we plot exemplarily the results using the weighted rooted PageRank as predictor. We observe that using the temporal weighting model performs significantly better for the next k new links than the baseline method (baseline is c ¼ 0). This indicates that new links are better predictable, considering information of recently generated links. We here note that this trend could also be observed when we use the network proximity measures weighted Resource Allocation, weighted Jaccard’s Coefficient and weighted Adamic Adar.

Soc. Netw. Anal. Min. (2014) 4:217

5

10

15

20

Common Neighbors

Fig. 5 Probability for a link between two participants x and y as a function of common neighbors between x and y

where wðA; BÞ is the weight of edge ðA; BÞ. In our analysis we consider three cases. (1) The links ðA; BÞ and ðA; CÞ are both strong links (strong). (2) Exactly one of the links ðA; BÞ or ðA; CÞ is a strong link (medium). (3) The links ðA; BÞ and ðA; CÞ are both weak (weak). 4.2.2 General triadic closure in face-to-face contact networks Online social networks have attracted great interest in the last decade. In this domain understanding the strength and importance of relationships between individuals is very important. The triadic closure principle indicates that persons who have many friends in common are also likely to be friends. When we consider weak and strong in this context ties, it is unlikely that two individuals have a strong connection with a common friend and do not know each other. In this work we consider this issue and analyze triadic closure in face-to-face contact networks. In Fig. 5, we plot the probability for a link in the face-to-face contact networks as a function of common neighbors. As already

Soc. Netw. Anal. Min. (2014) 4:217

Page 9 of 17

b Fig. 6 Probability for a link between two participants B and C, when

LWA 2010

0.9 0.8 0.7 0.6 0.4

0.5

Probability for closed triangle

1.0

participant A is connected to B and C. We differentiate among three types of connection strengths (strong, medium and weak). The x-axis represents the threshold to define the link strength, the y-axis the probability that B and C are connected

weak medium strong mean

0.01

0.02

0.05

0.10

0.20

0.50

Threshold

0.6 0.4

0.02

0.05

0.10

0.20

0.50

0.20

0.50

Threshold

0.5

0.6

0.7

0.8

0.9

weak medium strong mean

0.4

Probability for closed triangle

1.0

LWA 2012

0.01

0.02

observed in online networks (Backstrom and Leskovec 2011) the probability to be connected increases linearly with the number of common neighbors on all datasets. On the other hand, people without many common neighbors are very unlikely to be connected in the face-to-face contact graph. In the context of triadic closure, this means that there is an increased probability for a friendship if they have a friend in common (Rapoport 1953). In Fig. 6, we analyze the probability (for each case strong, medium and weak) that participants B and C close the triangle. Here, the x-axis represents the used time threshold and the y-axis the probability to close the triangle. We note that each contact duration of participant A is here normalized by the maximal contact duration of A itself. Given participant X, this means that we set the weight of the link ðA; XÞ to w0R ðA; XÞ ¼

0.7

0.8

0.9

weak medium strong mean

0.5

Probability for closed triangle

1.0

HT 2011

0.01

217

0.05

0.10

Threshold

wR ðA; XÞ ; maxðwR ðA; X1 Þ; . . .; wR ðA; XjNðAÞj Þ

where X1 ; . . .; XjNðAÞj are the face-to-face contacts of participant A. The time threshold categorizes each link as strong or weak. For the justification of our results we also used a t test. As expected considering the strong case we observe that an increasing time threshold leads to a significant higher probability for closing the triangle than in the medium and weak case. In addition, even one strong link (medium case) has a higher probability to close the triangle than a triangle without a strong link. However the confidence intervals of the medium and the weak case cross each other, which means that these results (medium case vs. weak case) have to be read very carefully. The mean-line (null hypothesis) in this figure is the probability that the triangle is closed independent of the link strength. In Fig. 7 we distinguish whether node A is a professor or not. We observe no significant differences between nodes labeled as professor and those not labeled as professor at the LWA 2010 and LWA 2012 conference datasets, because the corresponding confidence intervals cross each other. At Hypertext 2011, we observe the interesting trend that triangles are rather closed (for greater thresholds) if node A is not a professor. We note here that due to data sparseness the corresponding x axes of Figs. 6 and 7 are different. A potential explanation concerns the fact that the LWA is a German conference that is regularly visited by a rather

123

217 Page 10 of 17

Soc. Netw. Anal. Min. (2014) 4:217 b Fig. 7 Probability for a link between participants B and C when

LWA 2010

0.8 0.7 0.6 0.5 0.4

strong(prof) strong(not prof)

0.3

Probability for closed triangle

0.9

1.0

(A,B) and (A,C) are both strong links (the x axis defines the threshold). We differentiate between two cases: the red-line (circle) indicates that participant A is a professor, the black line (triangle) that A is not a professor

0.002

0.005

0.010

0.020

0.050

0.100

Threshold

HT 2011

0.5

0.6

0.7

0.8

0.9

1.0

In this section we introduce and analyze the concept of reversed triadic closure in face-to-face contact networks, complementing our analysis on triadic closure by considering the direction of the respective interactions. In contrast to the triadic closure described above, we basically consider the ‘reversed’ directions of the links, i.e., considering the reversed weights, i.e., wR ðX; AÞ instead of wR ðA; XÞ, with X 2 fB; Cg. Given participant X, this means that we set the weight of the link ðA; XÞ to

0.4

strong(prof) strong(not prof)

0.3

Probability for closed triangle

4.2.3 Reversed triadic closure in face-to-face contact networks

w0R ðA; XÞ ¼

0.002

0.005

0.010

0.020

0.050

0.100

Threshold

0.4

0.5

0.6

0.7

0.8

0.9

1.0

LWA 2012

strong(prof) strong(not prof)

0.3

Probability for closed triangle

stable community, where most of the participants know each other. In contrast, Hypertext is an international conference, where in particular the professors know many attendees from different communities. This does not mean, however, that the attendees the professors talk to necessarily know each other, e.g., due to the international scope of the conference and to changing sets of participants from year to year.

0.002

0.005

0.010

0.020

Threshold

123

0.050

0.100

wR ðA; XÞ ; maxðwR ðX; X1 Þ; . . .; wR ðX; XjNðBÞj Þ

where X1 ; . . .; XjNðXÞj are the face-to-face contacts of participant X. Therefore, given that participant B is connected to participant A with weight wR ðB; AÞ [ 0 and participant C is connected to participant A with weight wR ðC; AÞ [ 0, we calculate the probability that there exists an edge from B to C. This means that we shift the perspective from one originating node and two reference nodes towards two originating nodes and one reference node. For the analysis it is very important to note that it is not necessarily true that wR ðA; BÞ ¼ wR ðB; AÞ. As in the last section we consider the three connection types: strong, medium and weak. In Fig. 8, we analyze the probability that the participant B and C close the triangle. To our surprise we see that the clear structure of Fig. 6 is missing for the reversed triadic closure. In Fig. 9 we observe a possible explanation of this phenomenon. Given that wR ðB; AÞ [ 0 and wR ðC; AÞ [ 0 we classify the case (probability that B and C close the triangle) with respect to the quotient of the maximal contact durations of B and C. To ensure that the quotient is between 0 and 1 we calculate the minimal quotient of the maximal contact durations. We notice that a strong imbalance between the maximal contact durations of the

Soc. Netw. Anal. Min. (2014) 4:217

Page 11 of 17

b Fig. 8 Probability for a link between two participants B and C, when

LWA 2010

0.8

two corresponding participants implies a small probability that these participants are connected.

●

0.6

●

●

●

● ● ● ● ●●●●●●●● ● ●●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0.4

4.3 Triadic closure in evolving face-to-face contact networks

0.2

●

weak medium strong mean

0.0

Probability for closed triangle

1.0

participant A is connected to B and C. This figure is similar to Fig. 6, but in this case we consider the (normalized) link weights wR ðB; AÞ and wR ðC; AÞ

0.01

0.02

0.05

0.10

0.20

0.50

Threshold

0.8

●

0.6

● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0.4

●

0.2

●

0.01

weak medium strong mean

0.02

0.05

0.10

0.20

0.50

Threshold

●

●

●

● ● ● ● ●●●●●●● ●●●●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0.4

0.6

0.8

1.0

LWA 2012

●

weak medium strong mean

0.0

0.2

●

0.01

In Sect. 4.2 we studied triadic closure on aggregated conference face-to-face proximity networks. This means that we do not take into account the dynamics of social networks. Over time, links between individuals form and vanish. Understanding the complex process why ties form or why ties vanish is an important research topic. Therefore in this subsection we analyze triadic closure on evolving face-to-face proximity networks. For this we first define active open, opened, censored and closed triangles. 4.3.1 Definition: active open, opened, censored and closed triangles

●

0.0

Probability for closed triangle

1.0

HT 2011

Probability for closed triangle

217

0.02

0.05

0.10

Threshold

0.20

0.50

We construct an edge eT ¼ ðA; BÞT between two participants A and B in the face-to-face contact network when the contact duration durðA; BÞ between participants A and B is at least T 20 seconds (here we note that 20 s is the shortest possible contact duration). The timestamp tðeT Þ for edge eT ¼ ðA; BÞT is defined as the point of time (during the conference) when the contact duration between A and B is exactly T seconds. Given two time thresholds T1 and T2 we define similar to Romero et al. (2011) an open triad OðA; B; CÞ as graph of three nodes A, B and C with edges ðA; BÞT1 , ðA; CÞT1 and no edge ðB; CÞT2 . We define the timestamp of the open triad OðA; B; CÞ as Ot ðA; B; CÞ ¼ maxðtððA; BÞT1 Þ; tððA; CÞT1 ÞÞ. An open triad OðA; B; CÞ is defined as closed with closing time tððB; CÞT2 Þ when the edge e ¼ ðB; CÞT2 is formed at time tððB; CÞT2 Þ. When participant B or participant C finally leave the conference, all still opened triangles OðA; B; CÞ cannot be closed till the end of the conference and we mark this triangles as censored. We call all not censored and still opened triangles as active triangles. 4.3.2 Development of opened, closed, censored and active triangles Figure 10 depicts the development of opened, closed, censored and active triangles (for T1 ¼ 20 and T2 ¼ 20): the number of active open triangles reaches its maximum at the end of the second day at each conference. At day 3

123

217 Page 12 of 17

Soc. Netw. Anal. Min. (2014) 4:217 b Fig. 9 Probability for a link between two participants B and C, when

0.2

0.4

0.6

0.8

(0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1] Null−Mod.

0.0

Probability for closed triangle

1.0

LWA 2010

0.01

0.02

0.05

0.10

0.20

0.50

Threshold

0.4

0.5

0.6

0.7

(0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1] Null−Mod.

0.3

Probability for closed triangle

0.8

HT 2011

both participants B and C have a strong connection to participant A. The x-axis represents the normalized time threshold for the strong link, the y-axis the probability that B and C are connected. Each line is plotted depending on the quotient of the maximal contact durations of participants B and C. The line labeled with ð0:2; 0:4 for example means that the quotient of the maximal contact durations is between 0.2 and 0.4

the number of censored triangles increases rapidly and hence the number of active triangles decreases rapidly. Here, most of the conference participants left the conference at the end of day 2 or at the beginning of the third day. Towards the end of day 2 we observe a strong increase in the number of closed triangles at each conference. This observation can be explained by the fact that at this point of time the conference poster-sessions took place. Consider the open triad OðA; B; CÞ. From the definition of an open triad it follows that durðA; BÞ T1 and durðA; CÞ T1 . The open triad OðA; B; CÞ closes if the contact duration between participants B and C is equal to T2 seconds. Given the time thresholds T1 and T2 we analyze in the following the probability that the open triad OðA; B; CÞ will be active till time t. We define the connection between two participants as weak if the contact duration between these two participants is at least 20 seconds. The connection between two participants is strong, if the contact duration between these two participants is at least 180 seconds. We here note that every strong connection also satisfies the property for a weak connection. For the analysis we consider four cases: 1.

0.01

0.02

0.05

0.10

0.20

0.50

Threshold

2.

0.9

(0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1] Null−Mod.

3.

0.6

0.7

0.8

4.

0.5

Probability for closed triangle

1.0

LWA 2012

0.01

0.02

0.05

0.10

Threshold

123

0.20

0.50

The links ðA; BÞ, ðA; CÞ satisfy a weak connection and for closing the triangle we require also a weak connection between B and C. The links ðA; BÞ, ðA; CÞ satisfy a weak connection and for closing the triangle we require a strong connection between B and C. The links ðA; BÞ, ðA; CÞ satisfy a strong connection and for closing the triangle we require a weak connection between B and C. The links ðA; BÞ, ðA; CÞ satisfy a strong connection and for closing the triangle we require also a strong connection between B and C.

Considering case (3) it is possible that a weak connection between B and C could have been established before the triad opens. In our analysis we consider this case and differentiate between two sub-cases: (3.1) we do not take into account the triads where the weak connection between B and C has already been established before the triad OðA; B; CÞ opens. (3.2) we set the closing time of the corresponding triads to the opening time of these triads.

Soc. Netw. Anal. Min. (2014) 4:217

Page 13 of 17

b Fig. 10 Development of opened, closed, censored and active trian-

LWA 2010 40000

gles. The ith dotted line symbolizes the end of the ith conference day

Count

0

10000

20000

30000

cummulated open closed censored active open

0

20000

40000

60000

80000

100000

120000

Time [in Seconds]

20000

HT 2011

10000

15000

cummulated open closed censored active open

5000

Count

217

In Fig. 11, we plot the Kaplan–Meier curves for all defined cases. First, we observe that the probability is significantly decreased to exceed time t when we just require a weak B–C connection to close the triad. However this case is trivial and can be expected. In addition, we observe that (comparing case 2 and case 4) the probability is significantly decreased to exceed time t for triads that require a strong connection to close the triad if participants A and B and participants A and C have a strong connection. This emphasizes also the role of strong ties in this context. Considering the cases (1) and (3.1) we observe the interesting phenomenon that there exists no significant difference in probability to exceed time t (that hold on all datasets). Considering instead the cases (1) and (3.2) we notice that a strong connection between participants A and B and participants A and C has a significant influence on closing the triad with at least a weak connection. We notice that at each conference more than 40 % of all triads has already a weak connection between B and C before the triad opens. This additionally emphasizes the role of strong ties. In general it is interesting to see that a strong connection between participants A and B and participants A and C has significant influence on the survival time of one triad.

0

4.3.3 Influence factors of triadic closure on evolving faceto-face contact networks

100000

120000

140000

160000

180000

Time [in Seconds]

12000

LWA 2012

0

2000

4000

Count

6000

8000

10000

cummulated open closed censored active open

0

20000

40000

60000

Time [in Seconds]

80000

In the following we analyze a number of factors concerning triadic closure in evolving face-to-face proximity networks. We study different factors that are based on the nodes’ attributes (like for example academic status) as well as on network features (for example common neighbors). Additionally, we consider the similarities of papers of the different actors. Given the open triad OðA; B; CÞ, let fðA1 ; B1 ; C1 Þ; . . .; ðAm ; Bm ; Cm Þg be the set of all open triangles at time Ot ðA; B; CÞ; the function shortestPathDBLP ðB; CÞ ¼ n defines the shortest path distance between participants B and C in the coauthorshipnetwork DBLP, while paperSimðB; CÞ obtains the paper similarity between B and C. To determine the paper similarities we crawled all papers from all conference participants that are listed in DBLP (since 2006). For each paper we created a bag-of-words representation and weighted the terms with a simple term frequency (TF) weighting. The paper similarity between two participants is then defined as the cosine similarity of the corresponding bag-of-word representations. Furthermore, the parameter b defines the

123

217 Page 14 of 17

Soc. Netw. Anal. Min. (2014) 4:217 b Fig. 11 Kaplan–Meier curve for all conferences. An event is the

Kaplan Meier Curve for LWA 2010

0.8

1.0

closing of a triangle. The y axis shows the probability that a triangle will be active till time t (given by the x axis)

0.6

(C2) T1=20, T2=180 (C4) T1=180, T2=180 0.4

(C3.1) T1=180, T2=20

0.2

(C1)T1=20, T2=20

0.0

(C3.2) T1=180, T2=20

0

500

1000

1500

Time (in minutes)

0.8

(C2) T1=20,T2=180

0.6

1.0

Kaplan Meier Curve for HT 2011

(C4) T1=180, T2=180

0.4

(C1)T1=20, T2=20

0.2

(C3.1)T1=180, T2=20

0.0

(C3.2)T1=180, T2=20

0

200

400

600

800

1000

Time (in minutes)

median of all possible pairs of paper similarities. Given a network proximity measure M we define MedianM ðB; CÞ as MedianðMðB1 ; C1 Þ; . . .; MðBm ; Cm ÞÞ. The network proximity measures Common Neighbors (CN) and Resource Allocation (RA) are defined as CNðx; yÞ ¼ jNðxÞ \ NðyÞj and X 1 ; RAðx; yÞ ¼ jNðzÞj z2NðxÞ\NðyÞ where x and y are conference participants. We then consider the following binary variables: ( 1 if CNðB; CÞ MedianCN ðB; CÞ CNðB; CÞ ¼ 0 otherwise ( 1 if RAðB; CÞ MedianRA ðB; CÞ RAðB; CÞ ¼ 0 otherwise ( 1 if B and C are both professors Prof ðB; CÞ ¼ 0 otherwise ( 1 if B and C are both PhDs PhDðB; CÞ ¼ 0 otherwise ( 1 if B and C are both PhD-C. PhDCðB; CÞ ¼ 0 otherwise ( 1 if shortestPathDBLP ðB; CÞ ¼ n DBLPðn; B; CÞ ¼ 0 otherwise ( 1 if paperSimðB; CÞ b PaperSimðB; CÞ ¼ 0 otherwise

0.6

0.8

1.0

Kaplan Meier Curve for LWA 2012

0.4

(C2) T1=20, T2=180

(C4) T1=180, T2=180 0.2

(C1) T1=20, T2=20 (C3.1) T1=180, T2=20

0.0

(C3.2) T1=180, T2=20 0

200

400

600

800

Time (in minutes)

123

1000

As in Kossinets and Watts (2006) we use multivariate survival analysis for the study of influence factors in the context of triadic closure. In Fig. 12 we plot the hazard ratios and their 95% confidence intervals for all conferences. A hazard ratio of g indicates that the instantaneous probability of triadic closure increases ðg [ 1Þ or decreases ðg\1Þ by a factor of g relative to the reference category. A hazard ratio of g ¼ 1 means that the predictor has no effect in predictability of triadic closure. We say that a covariance is significant if the corresponding confidence interval does not cross the hazard ratio g ¼ 1. In our analysis we focus on triads with time threshold T1 ¼ T2 ¼ 20 and T1 ¼ T2 ¼ 180 seconds. The results show that the probability of triadic closure is significantly increased if the distance between participants B and C is 1 in the co-authorship network

Soc. Netw. Anal. Min. (2014) 4:217

Page 15 of 17

LWA2010

HT2011

DBLP(1)−180 DBLP(1)−20 DBLP(2)−180 DBLP(2)−20 RA−180 CN−180 CN−20 RA−20 DBLP(3)−20 Prof−20 PhD−20 PaperSim−180 PhD−180 PaperSim−20 PhD_C−180 DBLP(3)−180 Prof−180 PhD_C−20 DBLP(>3)−20 DBLP(>3)−180

LWA2012

DBLP(1)−180 RA−180 CN−180 CN−20 DBLP(1)−20 RA−20 Prof−20 Prof−180 PhD_candidate−180 DBLP(>3)−20 PhD_C−20 PaperSim−180 PaperSim−20 DBLP(>3)−180 DBLP(3)−180 PhD−20 DBLP(2)−180 DBLP(3)−20 DBLP(2)−20 PhD−180 1

2

3

4

5

Hazard Ratio ± 95% CI

217

DBLP(1)−180 DBLP(1)−20 Prof−180 DBLP(2)−180 DBLP(2)−20 RA−180 PhD_C−180 PhD_C−20 Prof−20 CN−180 DBLP(>3)−180 PaperSim−180 DBLP(3)−20 RA−20 Paper−20 CN−20 DBLP(>3)−20 DBLP(3)−180 PhD−180 PhD−20 2

4

6

Hazard Ratio ± 95% CI

8

5

10

15

Hazard Ratio ± 95% CI

Fig. 12 Hazard Ratios for different predictors. The predictors are sorted by hazard ratios. In this figure, 20 indicates that we use T1 ¼ T2 ¼ 20, 180 indicates T1 ¼ T2 ¼ 180

DBLP. However this result is not surprising, because it could be expected that co-authors have in general a face-toface contact and tend to have longer face-to-face contacts during a conference. At the LWA 2010 and LWA 2012 there is also an increased likelihood for triadic closure if the distance in the coauthorship network DBLP is two. To our surprise, this result does not hold for the HT 2011 dataset. Here the probability for triadic closure decreases if the DBLP distance between B and C is two. A possible explanation concerns the fact that the LWA is a German conference that is regularly visited by a rather stable community, where most of the participants know each other. In contrast, the Hypertext is an international conference, where most of the participants change from year to year. Considering a DBLP distance above 3 we see that the probability for triadic closure is significantly decreased at the LWA 2010 and HT 2011 conference. However, this result does not hold for the LWA 2012 conference. It is interesting to see that on the LWA 2012 and HT 2011 datasets the Ph.D.’s have a significantly decreased probability for triadic closure. The professors show a significantly increased likelihood to form at least a short tie on all datasets. For stronger triangles (T ¼ 180) this result is not significant on the LWA 2010 dataset. The role that Ph.D. candidate plays has no significant influence on triadic closure. Only on the HT2012 dataset we found a weakly increased likelihood for triadic closure. For our surprise the paper similarity has no significant effect on the HT 2011 and LWA 2012 dataset. Only on the LWA 2010 dataset there is a small increased probability for triadic closure if the paper similarity between participants B and C is above the median of all possible paper similarities.

In general, the predictors Common Neighbors and Resource Allocation perform better on stronger triangles (T ¼ 180) than on weaker triangles (T ¼ 20). In addition we see that for all network proximity measures we find a significant effect on predicting triadic closure. In addition we see that Resource Allocation performs better than Common Neighbors. We also tested the performance of other proximity measures like rooted PageRank, Katz (Katz 1953), Preferential Attachment Barabasi (2002) and Jaccard’s Coefficient. Also here we found that Resource Allocation performs best. Overall we observe that all conferences have different characteristics and it is hard to find significant conclusions that hold on all datasets. In addition we observe that academic roles play an important role for the prediction of triadic closure.

5 Conclusions In this paper, we investigated dynamics of contact structures in evolving networks of face-to-face proximity. Overall, such analyses yield insights into the underlying social processes and can as well contribute to methodological improvements, for example, for recommendation approaches. We first presented an analysis of the predictability of links, considering all, new, recurring links, and the k next new links. Our results indicate that recurring links are very well predictable at the beginning of the conference. However, the predictability of recurring links decreases towards the end of the conference. The predictability of new links decreases after about half of the conference. Also, newer links are better suited to predict new links.

123

217 Page 16 of 17

In addition, we studied triadic closure and reversed triadic closures in face-to-face contact networks. We showed that the link strength and academic status of participants have potential influence for closing triangles (depending on the type of the conference), and complemented the analyzes by considering reversed triadic closures. Furthermore, we analyzed influence factors of triadic closure in evolving face-to-face proximity networks. We found that academic roles have a significant influence in the context of triadic closure. Here for example professors have a significant increased likelihood to form at least a short tie. In addition we found that coauthors have a significantly increased probability to close the triangle. For future work, we aim to extend our analysis to additional influence factors concerning the prediction and the triadic closures, and more diverse closure structures. Acknowledgments This work has been supported by the VENUS research cluster at the interdisciplinary Research Center for Information System Design (ITeG) at Kassel University. We thank the SocioPatterns collaboration for providing privileged access to the SocioPatterns sensing platform that was used in collecting the contact data.

References Adamic LA, Adar E (2003) Friends and neighbors on the Web. Soc Netw 25(3):211–230 Alani H, Szomszor M, Cattuto C, den Broeck WV, Correndo G, Barrat A (2009) Live social semantics. In ISWC, pp 698–714 Atzmueller M, Becker M, Kibanov M, Scholz C, Doerfel S, Hotho A, Macek B-E, Mitzlaff F, Mueller J, Stumme G (2014) Ubicon and its Applications for Ubiquitous Social Computing. New Rev Hypermedia Multimedia 20(1):53–77 Atzmueller M, Benz D, Doerfel S, Hotho A, Ja¨schke R, Macek BE, Mitzlaff F, Scholz C, Stumme G (2011) Enhancing Social Interactions at Conferences. IT Inf Technol 53(3):101–107 Atzmueller M, Doerfel S, Hotho A, Mitzlaff F, Stumme G (2012) Face-to-face contacts at a conference: dynamics of communities and roles. In Modeling and Mining Ubiquitous Social Media, volume 7472 of LNAI Backstrom L, Kleinberg JM, Lee L, Danescu-Niculescu-Mizil C (2013) Characterizing and curating conversation threads: expansion, focus, volume, Re-entry. CoRR, abs/1304.4602 Backstrom L and Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In WSDM Barabasi A-L (2002) Linked the New Science of Networks. Perseus Pub., Cambridge, Mass Barrat A, Cattuto C, Colizza V, Pinton J-F, den Broeck WV, Vespignani A (2008) High resolution dynamical mapping of social interactions with active RFID. CoRR, abs/0811.4170 Barrat A, Cattuto C, Szomszor M, den Broeck WV, Alani H (2010) Social dynamics in conferences: analyses of data from the live social semantics application. In ISWC, pp 17–33 Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 30(1–7):107–117 Cattuto C, Van den Broeck W, Barrat A, Colizza V, Pinton J-F, Vespignani A (2010) Dynamics of person-to-person interactions

123

Soc. Netw. Anal. Min. (2014) 4:217 from distributed RFID sensor networks. PLoS One, 5(7):e11596, 07 Dong Y, Tang J, Lou T, Wu B, Chawla NV (2013) How long will she call me? Distribution, social theory and duration prediction. ECML/PKDD 2:16–31 Eagle N, Pentland A, Lazer D (2009) From the Cover: Inferring Friendship Network Structure by using Mobile Phone Data. Proc Natl Acad Sci 106:15274–15278 Easley D and Kleinberg J (2010) Networks, crowds, and markets: reasoning about a highly connected world. Cambridge University Press, Cambridge, UK Emerson RM (1962) Power-dependence relations. Am Soc Rev 27(1):31–41 Golder S, Yardi S (2010) Structural predictors of tie formation in Twitter: transitivity and mutuality. SocialCom 2010:88–95 Hanley JA, McNeil BJ (Apr. 1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36 Heider F (1946) Attitudes and cognitive organization. J Psychol 21:107–112 Heider F (1958) The psychology of interpersonal relations. Wiley, New York Isella L, Romano M, Barrat A, Cattuto C, Colizza V, Van den Broeck W, Gesualdo F, Pandolfi E, Rava` L, Rizzo C, Tozzi A (2011a) Close encounters in a pediatric ward: measuring face-to-face proximity and mixing patterns with wearable sensors. PLoS One 6:e17144 Isella L, Stehle´ J, Barrat A, Cattuto C, Pinton J-F, Broeck WVD (2011b) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271:166–180 Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43 Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757):88–90 Liben-Nowell D and Kleinberg JM (2003) The link prediction problem for social networks. In CIKM, pp 556–559 Lichtenwalter R, Lussier JT, Chawla NV (2010) New perspectives and methods in link prediction. In KDD, pp 243–252 Lou T, Tang J, Hopcroft J, Fang Z, Ding X (2013) Learning to predict reciprocity and triadic closure in social networks. ACM Trans Discov Knowl Data 7(2):5 Lu¨ L, Zhou T (2010) Link prediction in weighted networks: the role of weak ties. EPL Europhys Lett 89:18001 Macek BE, Scholz C, Atzmueller M, Stumme G (2012) Anatomy of a Conference. 23rd ACM Conference on Hypertext and Social Media., HT ’12Milwaukee, WI, USA, pp 245–254 Murata T and Moriyasu S (2007) Link prediction of social networks based on weighted proximity measures. In Web Intelligence Rapoport A (1953) Spread of information through a population with socio-structural bias: I. assumption of transitivity. Bull Math Biol 15(4):523–533 Romero DM and Kleinberg JM (2010) The directed closure process in hybrid social-information networks, with an analysis of link formation on Twitter. In ICWSM Romero DM, Meeder B, Barash V, Kleinberg JM (2011) Maintaining Ties on Social Media Sites: The Competing Effects of Balance, Exchange, and Betweenness. In Proc. 5th Intl. AAAI Conference on Weblogs and Social Media. The AAAI Press Scholz C, Atzmueller M, Stumme G (2012) On the predictability of human contacts: influence factors and the strength of stronger ties. IEEE Computer Society, In Proc. of SocialCom Scholz C, Atzmueller M, Barrat A, Cattuto C, Stumme G (2013) New insights and methods for predicting face-to-face contacts. In: Kiciman E, Ellison NB, Hogan B, Resnick P, Soboroff I (eds) Proc. 7th Intl. AAAI Conference on Weblogs and Social Media, Palo Alto, CA, USA, AAAI Press

Soc. Netw. Anal. Min. (2014) 4:217 Simmel G (1950) The Sociology of Georg Simmel. Free Press Stehle´ J, Voirin N, Barrat A, Cattuto C, Isella L, Pinton J-F, Quaggiotto M, Van den Broeck W, Re´gis C, Lina B, Vanhems P (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS One, 6(8):e23176, 08 Szomszor M, Cattuto C, den Broeck WV, Barrat A, Alani H (2010) Semantics, sensors, and the social web: the live social semantics experiments. ESWC 2:196–210 Tsugawa S and Ohsaki H (2013) Effectiveness of link prediction for face-to-face behavioral networks. PLoS One, 8(12):e81727, 12

Page 17 of 17

217

Walker HA, Thye SR, Simpson B, Lovaglia MJ, Willer D, Markovsky B (2000) Network exchange theory: recent developments and new directions. Soc Psychol Quart 63(4):324–337 Wang D, Pedreschi D, Song C, Giannotti F, Baraba´si A-L (2011) Human mobility, social ties, and link prediction. In KDD, pp 1100–1108 Zhou T, Lu L, Zhang Y-C (2009) Predicting missing links via local information. Eur Phys J B 71(4):623–630

123

Predictability of evolving contacts and triadic closure in human face-to-face proximity networks

Recommend Documents