Jointly published by Elsevier Science Ltd, Oxford and Akadémiai Kiadó, Budapest
Scientometrics, Vol. 43, No. 3 (1998) 455-462
Short Communication
COLLABORATION PATTERNS IN THEORETICAL POPULATION GENETICS
HILDRUN KRETSCHMER,¹ B. M. GUPTA²
¹ISSI, Borgsdorfer Str. 5, D-16540 Hohen Neuendorf (Germany)
²Scientometrics and Informetrics Group, National Institute of Science, Technology and Development Studies (NISTADS), Dr. K. S. Krishnan Marg, New Delhi 110 012 (India)
(Received September 11, 1998)
The paper points out that the characteristic properties of general social networks are reflected in the co-authorship patterns of theoretical population genetics as studied from 1900 to 1980. The results are consistent with earlier analyses of bibliographies, in which co-authorship networks in invisible colleges appeared to show the same behavioural patterns as non-scientific populations. The patterns of behaviour are portrayed in two-dimensional as well as three-dimensional representations of co-authorship data in theoretical population genetics.
Introduction
A number of social-psychological and sociological studies have shown that social relations emerge between similar persons more frequently than chance would predict. The similarity refers to a wide range of characteristics. Studies have been conducted, for instance, on friendships between groups of persons classified according to their level of education, or on marriages between persons of different religious affiliations.¹ The basis of such a classification may also be age, gender, general approach, etc. This phenomenon is well known as "Birds of a feather flock together". According to Blau,² a graduated structural parameter is a variable which hierarchically classifies the persons of a population in accordance with their preference. Preference was accorded to insiders rather than to outsiders and, additionally, there was an inverse relationship between status-distance (S=X-Y) and the contact-preferences between persons (Z).
0138-9130/98/US $15.00 Copyright © 1998 Akadémiai Kiadó, Budapest. All rights reserved
H. KRETSCHMER, B. M. GUPTA: COLLABORATION PATTERNS IN GENETICS
$Z = 1/f(|S|)$, with X and Y being the status. This representation of the phenomenon is two-dimensional. There is, however, another property of the characteristic structure underlying interpersonal relations in social networks,³ the so-called edge effect. The term denotes that pairs of persons with status distance S=0 are observable, and especially noticeable, at the edges of the status scale (i.e., among persons showing the lowest or the highest educational level). A three-dimensional representation is suitable in this situation, i.e., Z's = f(X, Y).
The aim of a former study⁴ was to raise questions about "how and to what extent this structure with its varying aspects of social relations that are generally valid for humanity" is reflected in scientific communities and, in consequence, exerts a sustained influence on processes of scientific knowledge. Evidence in this regard was provided through a study of the international co-authorship network in physics.⁴ In the present paper, the co-authorship data from the international bibliography from 1900 to 1980 have been analysed to study the two- and three-dimensional structures of the scientific community in theoretical population genetics.
Database
For this study we have used a comprehensive database on the speciality of theoretical population genetics, available in the form of a printed bibliography entitled "Bibliography of Theoretical Population Genetics", compiled by Felsenstein⁵ in 1981. This database covers all forms of literature and represents different phases of the development of theoretical population genetics from 1870 to 1980. The database consists of 7877 documents contributed by 3209 authors. Of the total documents, 65.72% are single-authored, 23.05% two-authored, 5.61% three-authored, 1.37% four-authored, and the rest have more than four authors.
Methodology
Firstly, all contributing authors of this database were classified into groups according to the number of documents per author. For this purpose, the "normal count procedure" was used for counting the number of documents per author (NP=i). Every appearance of an author's name is counted. Out of 3209 total authors, we obtained 1906 authors having one document per author, i.e., NP=i=1, 465 authors having two documents per author, i.e., NP=i=2, 218 authors having three documents per author, i.e., NP=i=3, 123 authors having four documents per author, i.e., NP=i=4, and so on, as shown in Table 1.

Table 1 Number of authors and collaborators in theoretical population genetics

 NP     NA     NC       NP     NA     NC
  1   1906   2350       32      2     21
  2    465    727       33      1      6
  3    218    380       34      1     10
  4    123    282       35      2     31
  5    112    276       36      1      9
  6     54    150       37      1      7
  7     38     95       38      3     46
  8     37    157       40      2     33
  9     22     91       41      1     14
 10     30    120       44      3     23
 11     19     88       45      1      0
 12     21     88       46      2     38
 13     16     80       47      2     18
 14     25    122       51      1     19
 15      7     60       52      1     12
 16      5     13       53      2     19
 17      8     45       54      1     20
 18      8     38       56      1     21
 19      8     54       58      1      7
 20      9     61       59      1     30
 21      6     48       62      2     35
 22      5     59       70      1     30
 23      2      4       72      1     25
 24      3     34       75      1      5
 25      3     25       79      1     22
 26      4     31       84      2     29
 27      2     22       98      2     44
 28      3     17      119      1      9
 29      1      2      120      1     21
 30      2     14      123      1      4
 31      3     12      199      1      9

NP=Number of papers; NA=Number of authors; NC=Number of collaborators.
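As a minimal sketch of the normal count procedure described above, the following Python fragment counts one document for every appearance of an author's name and then tallies how many authors share each document count (the NA column of Table 1). The toy bibliography and the function names are illustrative assumptions, not data or code from the study.

```python
from collections import Counter

# Hypothetical toy bibliography (not from the study): one author list per document.
documents = [
    ["A"],
    ["A", "B"],
    ["C"],
    ["B", "C", "D"],
    ["D"],
    ["D", "E"],
]


def papers_per_author(docs):
    """Normal count: each appearance of an author's name counts as one document."""
    counts = Counter()
    for authors in docs:
        # Assumption: a name repeated on the same document is counted only once.
        counts.update(set(authors))
    return counts


def authors_per_paper_count(np_by_author):
    """Return NA for each NP, i.e. how many authors have exactly NP documents."""
    return Counter(np_by_author.values())


np_by_author = papers_per_author(documents)
na_by_np = authors_per_paper_count(np_by_author)

for n_papers in sorted(na_by_np):
    print(f"NP={n_papers}: NA={na_by_np[n_papers]}")
```

Under full counting of this kind, a co-authored document is credited once to each of its authors, so the NP values summed over all authors can exceed the number of documents.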
In the second step, each group of authors with the same number of documents per author, as identified above, was taken in turn, and the collaborating authors of its members were identified and counted. For example, the 1906 authors having one document per author were observed to collaborate with 2350 authors. Similarly, the 465 authors having two documents per author were observed to collaborate with 727 authors, and so on, as shown in Table 1. For convenience, we will call the first set of authors collaborating authors and the second set of authors collaborators.

In the third step, since the data set generated through co-authorship relations was large, and in order to generate a manageable matrix of collaborators, we followed the suggestion of de Solla Price. According to him, the logarithm of the number of publications or documents is of higher importance than the number of publications per se. As a result, the collaborating authors with i=1 document per author were grouped together (status X=1). The status X=2 is attached to the group of authors with i=2 or 3, the status X=3 to the group with i=4, 5, 6, 7, the status X=4 to the group with i=8, 9, ..., 15, X=5 to the group with i=16, 17, ..., 31, X=6 to the group with i=32, 33, ..., 63, X=7 to the group with i=64, 65, ..., 127, and so on. The same holds for the collaborators: the status Y=1 is attached to the collaborators with j=1 document per author, the status Y=2 to the group of collaborators with j=2 or 3, etc.

The collaborators of each group of collaborating authors (with the same number of documents per author) were then further classified according to their own number of documents per author, as shown in Table 2. For example, the 1906 collaborating authors having one document per author (i.e., i=1 and X=1) were observed to collaborate with 2350 collaborators.
Of these collaborators, 1209 have one document per author (i.e., j=1 and Y=1), 325 have two or three documents per author (i.e., j=2 or 3 and Y=2), 233 have four to seven documents per author (Y=3), and so on. In this way a symmetrical matrix was obtained. This matrix, shown in Table 2, represents the observed numbers of collaborators Cxy. The total number of collaborators (Σx Σy Cxy) came out to be 6163.

In some sociological studies of interpersonal relations in social networks,³ a new type of index called the "homophily index" is used. This index gives the factor by which the observed frequency in a cell of a matrix deviates from the frequency that would be expected in that cell if the characteristics were statistically independent. In order to calculate this index, the present matrix (Table 2) has to be converted into a new matrix (Table 3) using geometric means.
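The grouping into status classes amounts to taking the integer part of the base-2 logarithm of the number of documents and adding one. A minimal sketch in Python, assuming a hypothetical helper named status; the cap at 7 is an assumption that mirrors Table 2, where authors with 64 to 199 documents form a single class. The NC values for NP = 1 to 31 are copied from Table 1, and summing them per class gives the totals that Table 2 reports for the classes X=1 to X=5.

```python
# NC (number of collaborators) for NP = 1..31, copied from Table 1.
NC_BY_NP = [2350, 727, 380, 282, 276, 150, 95, 157, 91, 120, 88, 88, 80, 122, 60,
            13, 45, 38, 54, 61, 48, 59, 4, 34, 25, 31, 22, 17, 2, 14, 12]

MAX_STATUS = 7  # assumption: the top class is open-ended, as in Table 2 (64-199)


def status(n_docs):
    """Status class of an author: 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ..."""
    if n_docs < 1:
        raise ValueError("an author needs at least one document")
    # For positive integers, int.bit_length() equals floor(log2(n)) + 1.
    return min(n_docs.bit_length(), MAX_STATUS)


def collaborators_per_status(nc_by_np):
    """Sum NC over all document counts NP that fall into the same status class."""
    totals = {}
    for n_docs, n_collab in enumerate(nc_by_np, start=1):
        x = status(n_docs)
        totals[x] = totals.get(x, 0) + n_collab
    return totals


totals = collaborators_per_status(NC_BY_NP)
print(totals)
```

Using the integer bit length instead of a floating-point logarithm avoids rounding problems at the class boundaries (2, 4, 8, 16, ...).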
Table 2 Numbers of collaborators Cxy by number of documents per author

X/Y                 1      2      3      4      5      6       7
       i/j          1    2-3    4-7   8-15  16-31  32-63  64-199    Sum
1      1         1209    325    233    237    165    110      71   2350
2      2-3        325    274    171    156     81     74      26   1107
3      4-7        233    171    105    128     70     79      17    803
4      8-15       237    156    128    119     73     65      28    806
5      16-31      165     81     70     73     31     36      23    479
6      32-63      110     74     79     65     36     36      20    420
7      64-199      71     26     17     28     23     20      13    198
Sum              2350   1107    803    806    479    420     198   6163
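How a matrix such as Table 2 can be assembled from raw co-authorship data is sketched below in Python. This is an illustration under stated assumptions, not the authors' procedure: the toy documents are hypothetical, each ordered pair (author, distinct co-author) is counted once regardless of how many documents the pair shares, and the names status and collaborator_matrix are invented for the example. Counting ordered pairs is what makes the resulting matrix symmetrical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy bibliography (not from the study): one author list per document.
documents = [["A"], ["A", "B"], ["C"], ["B", "C", "D"], ["D"], ["D", "E"], ["D"]]


def status(n_docs, max_status=7):
    """Status class from the document count: 1 -> 1, 2-3 -> 2, 4-7 -> 3, ..."""
    return min(n_docs.bit_length(), max_status)


def collaborator_matrix(docs, n_status=7):
    """Count collaborators by status of the author (row X) and of the collaborator (column Y)."""
    np_by_author = Counter()
    partners = defaultdict(set)
    for authors in docs:
        unique = set(authors)
        np_by_author.update(unique)           # normal count of documents per author
        for a, b in permutations(unique, 2):  # ordered pairs of distinct co-authors
            partners[a].add(b)
    matrix = [[0] * n_status for _ in range(n_status)]
    for author, others in partners.items():
        x = status(np_by_author[author], n_status)
        for other in others:
            y = status(np_by_author[other], n_status)
            matrix[x - 1][y - 1] += 1
    return matrix


matrix = collaborator_matrix(documents)
for row in matrix[:3]:
    print(row[:3])
```

In the toy data, author D has four documents (status 3), E has one (status 1), and A, B and C have two each (status 2), so only the upper-left 3 x 3 block of the matrix is populated.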
Table 3 Homophily indices Z'xy in theoretical population genetics

X/Y        1       2       3       4       5       6       7
1      1.875   0.935   0.891   0.857   1.020   0.728   1.006
2      0.935   1.462   1.213   1.046   0.928   0.909   0.683
3      0.891   1.213   0.991   1.141   1.067   1.290   0.594
4      0.857   1.046   1.141   1.003   1.051   1.003   0.925
5      1.020   0.928   1.067   1.051   0.763   0.950   1.299
6      0.728   0.909   1.290   1.003   0.950   1.017   1.210
7      1.006   0.683   0.594   0.925   1.299   1.210   1.684

The homophily indices Z'xy are defined as:

$$Z'_{xy} = \frac{C_{xy} \times G}{G_x \times G_y}$$

where
G - geometric mean of the matrix data.
Gx - geometric mean of the data in row X.
Gy - geometric mean of the data in column Y.
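The definition can be checked against the published tables. The sketch below, written in Python with illustrative function names, applies the formula to the observed counts of Table 2. It assumes that all cells are positive, since the geometric mean is computed through logarithms. For the cells checked, the results agree with Table 3 to about three decimals, for example Z'11 ≈ 1.875.

```python
import math

# Observed numbers of collaborators Cxy, copied from Table 2 (without the sums).
C = [
    [1209, 325, 233, 237, 165, 110, 71],
    [325, 274, 171, 156, 81, 74, 26],
    [233, 171, 105, 128, 70, 79, 17],
    [237, 156, 128, 119, 73, 65, 28],
    [165, 81, 70, 73, 31, 36, 23],
    [110, 74, 79, 65, 36, 36, 20],
    [71, 26, 17, 28, 23, 20, 13],
]


def geometric_mean(values):
    """Geometric mean via logarithms; every value must be strictly positive."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def homophily_indices(counts):
    """Z'xy = Cxy * G / (Gx * Gy) for every cell of the matrix."""
    g = geometric_mean(v for row in counts for v in row)  # whole matrix
    g_rows = [geometric_mean(row) for row in counts]      # Gx, one per row
    g_cols = [geometric_mean(col) for col in zip(*counts)]  # Gy, one per column
    return [
        [value * g / (g_rows[x] * g_cols[y]) for y, value in enumerate(row)]
        for x, row in enumerate(counts)
    ]


Z = homophily_indices(C)
for row in Z:
    print("  ".join(f"{value:.3f}" for value in row))
```

An index above 1 means that a cell is occupied more heavily than its row and column geometric means would suggest, and an index below 1 means the opposite.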
Fig. 1. Relation between status-distance and contact-preference [figure not reproduced; horizontal axis: Status-Distance (S)]
Fig. 2. Edge-effect [figure not reproduced; horizontal axis: Status (x+y)]
Results and discussion
Table 3 presents the homophily indices Z'xy of the collaborator network in theoretical population genetics. On average, the homophily indices on the main diagonal (S=0) are higher than those elsewhere in the matrix. This corresponds to the saying "Birds of a feather flock together". The averages Z's of the homophily indices Z'xy with S = constant are shown as a function of S in Fig. 1. The figure demonstrates the inverse relationship between status-distance |S| and the preference of contacts reflected in Z's (with one exception). The edge effect is observable along the main diagonal, as shown in Fig. 2. The values of Z'xy with X=Y are in accordance with a U-curve.
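The two summaries discussed here can be recomputed from Table 3. The following Python sketch (function and variable names are illustrative) averages the homophily indices over all cells with the same status-distance S = X - Y, which is the quantity described for Fig. 1, and extracts the main diagonal, which is the quantity described for Fig. 2. How the published figures were actually drawn is not stated in the text, so this is a reconstruction under that assumption.

```python
from collections import defaultdict

# Homophily indices Z'xy copied from Table 3 (rows X = 1..7, columns Y = 1..7).
Z = [
    [1.875, 0.935, 0.891, 0.857, 1.020, 0.728, 1.006],
    [0.935, 1.462, 1.213, 1.046, 0.928, 0.909, 0.683],
    [0.891, 1.213, 0.991, 1.141, 1.067, 1.290, 0.594],
    [0.857, 1.046, 1.141, 1.003, 1.051, 1.003, 0.925],
    [1.020, 0.928, 1.067, 1.051, 0.763, 0.950, 1.299],
    [0.728, 0.909, 1.290, 1.003, 0.950, 1.017, 1.210],
    [1.006, 0.683, 0.594, 0.925, 1.299, 1.210, 1.684],
]


def mean_by_status_distance(z):
    """Average Z'xy over all cells that share the status-distance S = X - Y."""
    groups = defaultdict(list)
    for x, row in enumerate(z, start=1):
        for y, value in enumerate(row, start=1):
            groups[x - y].append(value)
    return {s: sum(vals) / len(vals) for s, vals in sorted(groups.items())}


z_s = mean_by_status_distance(Z)

# Main diagonal (S = 0), keyed by x + y as on the horizontal axis of Fig. 2.
diagonal = {2 * k: Z[k - 1][k - 1] for k in range(1, len(Z) + 1)}

for s, mean in z_s.items():
    print(f"S={s:+d}: mean Z' = {mean:.3f}")
print(diagonal)
```

With the Table 3 values, the averages fall from about 1.26 at S=0 to about 0.71 at |S|=5 and rise again to 1.006 at |S|=6, the single exception noted above. The diagonal runs 1.875, 1.462, 0.991, 1.003, 0.763, 1.017, 1.684, which is high at both ends and lowest at X=Y=5.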
Fig. 3. Three-dimensional representation of homophily indices
The three characteristic properties of the structure named above ("Birds of a feather flock together", edge-effect, and inverse relationship) can be observed together in the three-dimensional representation of all homophily indices in Fig. 3.

The pattern identified in co-authorship networks in subject fields or research specialities such as theoretical population genetics from 1900 to 1980 refers to the second stable phase of the irreversible process of a structural development. Structural development is, in a way, subdivided into two stable phases with an unstable transitional phase in between, the system being more structured in the second stable phase than in the first. The period from the onset of science up to about 1800 could be designated as the first stable phase, characterized by more or less individual activities with single authorship.

The trend towards collaboration and cooperation has become so dominant that it is necessary to study such processes within scientific research in an effort to gain fundamental knowledge on the intensification of research that will be required because of the slowing growth rate of science in the future. For this reason, activities dealing with this subject have been increasing internationally during the last few years. The present study should therefore be an encouragement to continue such analyses in various scientific fields beyond theoretical population genetics and physics.
References
1. P. V. MARSDEN, Models and methods for characterizing the structural parameters of groups, Social Networks, 3 (1981) 1-27.
2. P. M. BLAU, Presidential address: Parameters of social structure, American Sociological Review, 39 (1974) 615-635.
3. C. WOLF, Ein Simulationsmodell sozialer Netzwerke und einige unerwartete Einsichten zur Ähnlichkeit sozialer Beziehungen. In: Tagung 'Netzwerkanalyse' der Deutschen Gesellschaft für Soziologie, Köln, 1996 (unpublished).
4. H. KRETSCHMER, Patterns of behaviour in co-authorship networks of invisible colleges, Scientometrics, 40 (1997) 579-591.
5. J. FELSENSTEIN, Bibliography of Theoretical Population Genetics, Dowden, Hutchinson & Ross, Inc., Pennsylvania, 1981.