Scientometrics https://doi.org/10.1007/s11192-018-2816-5
Assessing the interdependencies between scientific disciplinary profiles
Cinzia Daraio1 • Francesco Fabbri1 • Giulia Gavazzi1 • Maria Grazia Izzo1,2 • Luca Leuzzi3,4 • Giammarco Quaglia1 • Giancarlo Ruocco2,4
Received: 15 January 2018 / Akadémiai Kiadó, Budapest, Hungary 2018
Abstract
The investigation of the dynamics of national disciplinary profiles is at the forefront in quantitative investigations of science. We propose a new approach to investigate the complex interactions among scientific disciplinary profiles. The approach is based on recent pseudo-likelihood techniques introduced in the framework of machine learning and complex systems. We infer, in a Bayesian framework, the network topology and the related interdependencies among national disciplinary profiles. We analyse data extracted from the InCites database which relate to the national scientific production of the most productive world countries at disciplinary level over the period 1992–2016.
Keywords: Disciplinary profiles · Country-level studies · Pseudo-likelihood estimation · InCites
Introduction and method
The dynamics of national research systems is a topical issue in quantitative science and technology research. The number of works on this issue has increased considerably since the work of King (2004). Country-level studies of the evolution of disciplinary profiles include (Glänzel 2000; Leydesdorff and Zhou 2005; Zhou and Leydesdorff 2006; Glänzel and Schlemmer 2007; Glänzel et al. 2006, 2008; Hu and Rousseau 2009; Tian et al. 2008; Wong 2013; Wong and Goh 2012; Yang et al. 2012; Radosevic and Yoruk 2014; Bongioanni et al. 2014, 2015; Shen et al. 2016; Li 2017). In particular, Shen et al. (2016) extend the input-output model of the economist Leontief to investigate the interrelations between the scientific subfields of physics.

Giancarlo Ruocco, [email protected]
1 DIAG, Sapienza University of Rome, Rome, Italy
2 Present Address: Center for Life Nano Science, Fondazione Istituto Italiano di Tecnologia (IIT), Viale Regina Elena 291, 00161 Rome, Italy
3 Soft and Living Matter Lab, Institute of Nanotechnology, CNR-NANOTEC, Rome, Italy
4 Department of Physics, Sapienza University of Rome, Rome, Italy
We consider research systems as complex systems. Once the analogy is well defined, the mathematical tools developed for complex systems can be exploited for studying research systems. Science is considered a complex system also according to the sociological perspective (see e.g. Shi et al. 2015) based on the actor network theory (Latour 2005). Networks are general models, which can represent the relationships within or between given systems. The structure and function of complex networks are widely studied in statistical mechanics: see e.g. the classical reviews by Albert and Barabási (2002) and Newman (2003) and the recent book by Barabási (2016). Network approaches are widely applied in scholarly evaluation. West and Vilhena (2014) provide a clear introduction to the topic, which has become a "cornerstone of bibliometric research" since the seminal work of Price (1965). The most studied networks include paper-level citation networks (in which the nodes are the papers and the links represent the citations between the papers) and coauthorship networks (in which the nodes are the authors and the links represent how frequently a pair of authors has coauthored). West and Vilhena (2014) conclude their overview by stating that: "Network-based measures are more complicated than non-network measures, but the richness gained with such a measure is worth the extra effort." The prediction of future links or the reconstruction of missing links from an incomplete network is a related, interesting stream of literature (see Guns 2014). In this paper, which is based on Daraio (2017), we adopt a different level of analysis. In our modelling, nodes represent the disciplinary profiles (or the scientific production in a given research area) of countries and the links represent the interdependencies existing between them. In addition, we consider these links as unknown parameters.
We infer these unknown parameters by applying recently introduced tools to solve inverse problems in graphical models with wide applications to complex systems. The approach that we propose in the present paper is based on the similarities between statistical physics models for complex systems and research systems (see Table 1). We exploit these analogies to model the interdependencies within the world research system as interactions. Networks represent relationships (links) between structures (entities). They can be formalized according to graph theory as follows (Mezard and Montanari 2009). A network G, also called a graph, is a set m of nodes (or vertices) together with a set e of links (or edges) connecting them: $G = (m, e)$. The set of nodes (m) can be any finite set and the links (e) are unordered pairs of distinct nodes, $e \subseteq m \times m$.

Table 1 Analogies between physics of complex systems and research systems
Concept in the framework of statistical physics | Concept in the framework of scientific system
Node vectorial variable: spin $s_i$ | Country's disciplinary profile
Node variable component: $s_{i,c}$ | Normalized country's production on a single discipline
Node interactions or couplings: $J_{ij}$ | Country to country disciplinary profile interdependencies or interrelations
Hamiltonian, H: $H = -\frac{1}{2}\beta \sum_{i,j}^{1,N} J_{ij} s_i^{\mu} s_j^{\mu} - \sum_{i=1}^{N} s_i^{\mu} h_i$ | Generalized cost (social energy) function
$\beta$: inverse of the temperature | $\beta$: external global parameter
$\mu = 1, \ldots, M$ | Set of data acquired at time $t = 1, \ldots, T$
$i = 1, \ldots, N$; N: total number of nodes | N (= 50): largest science producer countries
$h_i$: external magnetic field on i | Contextual variables of country i

Interaction in physics is considered as a direct and reciprocal effect of one entity on one or more entities. The nature and the strength of this effect can be measured by applying tools developed by the statistical physics of complex systems. The concept of interaction in physics can find its correspondence in that of interdependency or interrelation for research systems. The latter means the existence of a mutual influence among countries' scientific activity: all countries are to some degree affected by the research activity of all other countries. This influence can be considered discipline by discipline or on the basis of the overall scientific production (disciplinary profile). The term interrelation here encompasses all the channels of contact, exchange and so on, between two countries, which affect the convergence of the disciplinary profiles of the two countries. The interaction parameters of the generalized multicomponent spin model adopted here are thus effective parameters embedding several effects. Given the Hamiltonian (H) related to this model (see its formula in Table 1), a positive interaction between two countries would lead, in the zero-temperature limit, to convergence of their disciplinary profiles in order to satisfy the principle of minimum energy. The proposed model is able to handle complex situations. Since we are dealing with a disordered many-body system, whose pairwise interactions $J_{ij}$ depend on the pair of nodes i and j, competition can arise between them, giving rise to so-called frustration. This is the case of a spin blocked between two opposite configurations, which cannot choose which profile to follow on the basis of the principle of minimum energy. The ground state in this case is thus strongly degenerate.
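The notion of frustration can be illustrated with a minimal sketch. It assumes binary spins $s_i = \pm 1$ rather than the multicomponent profiles used in this paper, the pairwise Hamiltonian of Table 1 with $\beta = 1$ and $h_i = 0$, and three hypothetical nodes; the function names and coupling values are illustrative only and are not estimated from data.

```python
import itertools
import numpy as np

def energy(J, s):
    """Pairwise energy H = -1/2 * sum_{i != j} J_ij s_i s_j (beta = 1, h_i = 0)."""
    return -0.5 * s @ J @ s  # J has a zero diagonal, so i == j terms vanish

def ground_states(J):
    """Enumerate all +/-1 configurations; return (minimum energy, list of minimisers)."""
    n = J.shape[0]
    configs = [np.array(c) for c in itertools.product([-1, 1], repeat=n)]
    energies = np.array([energy(J, c) for c in configs])
    e_min = energies.min()
    return e_min, [c for c, e in zip(configs, energies) if np.isclose(e, e_min)]

# Hypothetical three-node examples (illustrative couplings only).
off_diagonal = np.ones((3, 3)) - np.eye(3)
J_aligned = +1.0 * off_diagonal     # every pair prefers to align
J_frustrated = -1.0 * off_diagonal  # every pair prefers to anti-align: impossible for 3 nodes

e_a, gs_a = ground_states(J_aligned)
e_f, gs_f = ground_states(J_frustrated)
print(e_a, len(gs_a))
print(e_f, len(gs_f))
```

With all-positive couplings only the two fully aligned configurations minimise the energy, whereas with all-negative couplings six configurations share the minimum. In the latter case some pair is always aligned even though every coupling is negative, so the alignment observed for a given pair does not by itself reveal the sign of its coupling.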
The observation of alignment hence does not necessarily imply a positive interaction, just as the observation of misalignment does not exclude it. Our choice of the generalized multicomponent spin model is also motivated by its simplicity and by the possibility of borrowing the rigorous methodology developed for Boltzmann machine learning. The aim of the present analysis is to derive the level and structure of these interactions. It is an inverse problem because the inference of the interactions is drawn from a set of data on the disciplinary profiles of countries. According to Judge and Mittelhammer (2011), inverse problems arise when one wants to recover information on model parameters, i.e., coupling constants, by means of measurements of observable data. The solution of an inverse problem offers a connection between the data directly observed and the unknown information on model parameters (Neal 1993). In an inverse problem, the model which generates the observed data is an input of the theory. The model can be obtained from a priori knowledge or hypothesized on the basis of general insights. In the latter case its strength can be verified a posteriori by testing its predictions. We propose a new approach to make inference on the network topology and the related interdependencies between country-disciplinary profiles. The approach is developed in a Bayesian framework and relies on some recent pseudo-likelihood techniques introduced in the physics of complex systems (Ravikumar et al. 2010; Aurell and Ekeberg 2012; Marruzzo et al. 2017; Tyagi et al. 2016). In the following we briefly outline the proposed approach. For more details the reader is referred to the "Appendix". According to Bayes' theorem (see e.g. Barber 2012):

$$p(\{J\}|\{s\}) = \frac{p(\{s\}|\{J\})\,p(\{J\})}{p(\{s\})} = \frac{p(\{s\}|\{J\})\,p(\{J\})}{\int_{\{J\}} p(\{s\}|\{J\})\,p(\{J\})} \tag{1}$$

where $\{s\}$ is the data set and $\{J\}$ are the unknown parameters of a given model, $p(\cdot)$ stands for a probability distribution and $p(\cdot|\cdot)$ for a conditional probability distribution.
We observe that $\int_{\{J\}} p(\{s\}|\{J\})\,p(\{J\})$ is constant with respect to J. If one assumes a uniform prior belief about the model parameters, $p(J)$, the maximum of the conditional probability $p(J|s)$ coincides with the maximum of $p(s|J)$, the so-called likelihood function. We consider our system as a disordered system at equilibrium, described by a generalized multicomponent spin model. To this aim, our variables, which are the equivalent of the spins, are defined as normalized shares of publications, as follows:

$$s_i^{(c)}(t) = \frac{D_i^{(c)}(t)}{\sqrt{\sum_{j=1}^{N} \left(D_j^{(c)}(t)\right)^2}}, \qquad D_i^{(c)}(t) = n_i^{(c)}(t) - \bar{n}^{(c)}(t), \qquad \bar{n}^{(c)}(t) = \frac{1}{N}\sum_{i=1}^{N} n_i^{(c)}(t), \qquad c = 1, \ldots, D;\; t = 1, \ldots, T, \tag{2}$$

where $c = 1, \ldots, D$, $t = 1, \ldots, T$, and $n_i^{(c)}$ are the shares of articles published in a subject category c for a given country i, with $i = 1, \ldots, N$, over the period $t = 1, \ldots, T$ (here 1980–2016). They have the property that $\overline{s^{(c)}} = 0$ and $\overline{(s^{(c)})^2} = 1/N$, where the overbar denotes the average over countries. In this way we account for the recent trend of increasing scientific production all over the world. By the normalization reported above we define as variables only instantaneous fluctuations around the world average production in each given discipline at one data sample recording. The variables $s_i$ thus do not depend on the average scientific production trend over time. The assumption of equilibrium underlies a Boltzmann-Gibbs distribution,

$$p(\{s\}|\{J\}) = \frac{e^{-H(\{s\}|\{J\})}}{Z(\{J\})}, \tag{3}$$
where H is the Hamiltonian of the generalized Ising model (defined in Table 1) and Z is the partition function (normalization factor). It is also possible to draw a link between equilibrium and symmetry of pairwise interactions. Symmetric couplings lead to a steady state described by the Boltzmann distribution and asymmetric ones to a non-equilibrium state (Krapivsky et al. 2010). It is also possible to assign to the system a particular dynamics, which leads it to a given steady state distribution. Recent developments achieved in dynamical inverse Ising models (Decelle and Ricci-Tersenghi 2016; Nguyen et al. 2017) could represent an interesting extension of the present work. The Hamiltonian is obtained by a generalization of the Ising model, originally introduced to describe the behavior of ferromagnetic systems. In its more general formalism, the Ising model can also account for a (node-independent) weight, $\beta$, and external biases, $h_i$. When referred to magnetic systems, $\beta$ is the inverse of temperature and $h_i$ the external magnetic field. For the sake of simplicity we fix here $\beta = 1$ and $h_i = 0$, $\forall i \in (1, N)$, with $J_{ii} = 0$ and $J_{ij} = J_{ji}$. The Ising model has been widely applied in different fields, such as modelling the behaviour of magnets in statistical physics (Brush 1967), image processing and spatial statistics (Besag 1986; Geman and Geman 1984; Greig et al. 1989), and modelling of social networks (Banerjee et al. 2008). Our choice is thus mainly justified by the possibility of recovering and generalizing tools developed and already tested in such different contexts. The Ising model, furthermore, is a classical example of a graphical model in exponential form, to which Boltzmann machine learning (or some of its approximation methods) can be applied, as discussed in the "Appendix". To solve the inverse problem we have to maximize the log-likelihood with respect to the set of parameters $\{J\}$:
$$l(\{J\}) = \log(L(\{J\})) = -\sum_{t=1}^{T} H(\{s\}_t|\{J\}) - T \log(Z(\{J\})). \tag{4}$$
This is computationally hard to solve and for this reason a pseudo-likelihood approach is applied. The solution to the inverse problem consists in finding the optimal values of the set of parameters $\{J\}$, which are supposed to generate the observable set of data. We can obtain the optimal values of the set of parameters $\{J\}$ representing the interrelations between a pair of countries either discipline by discipline, by maximizing the discipline-dependent log-likelihood function, or for the full disciplinary profile, by maximizing the log-likelihood function related to the vectorial variables s whose generic element refers to a given discipline (see Table 1 and the "Appendix" for details). Given a set of parameters $\{J\}$, a zero value of the parameter $J_{ij}$ between the pair ij means that the two countries are not interacting, a positive value indicates a tendency to align towards the same disciplinary profile, and a negative value, instead, shows a tendency towards 'opposite' disciplinary profiles. We furthermore analyze the interrelations between a pair of countries across two different disciplines. This leads to the definition of a set of cross-discipline coupling parameters $\{J^{cd}\}$, whose generic element $J_{ij}^{cd}$ represents the interrelations of the two disciplines c and d between the pair of countries i, j. The optimal values of the set of parameters $\{J^{cd}\}$ in this case maximize the log-likelihood function related to the Hamiltonian $H = -\frac{1}{2}\sum_{i,j} J_{ij}^{cd} (s_i^c s_j^d + s_i^d s_j^c)$. We notice that in this case the Hamiltonian also contains terms with $i = j$. The parameters $J_{ii}^{cd}$ indeed provide useful information about the interrelations between the pair of disciplines c, d within the same country. We finally emphasize that, given the limited temporal range of the data available, enlarging the number of free parameters in the log-likelihood optimization routine, e.g. by considering a Hamiltonian of the kind $H = -\sum_{cd}\sum_{ij} J_{ij}^{cd} s_i^c s_j^d$, may hamper the convergence of the routine.
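The pseudo-likelihood idea can be sketched on a toy problem. The sketch below is not the routine used in this paper: it assumes binary spins $s_i = \pm 1$ instead of the generalized multicomponent model, fixes $\beta = 1$ and $h_i = 0$, and uses plain gradient ascent as a stand-in optimizer. The function names, the three-node ground-truth couplings and the step size are hypothetical. Each node contributes the conditional probability of its spin given all the others, which needs only a local normalization and not the full partition function Z.

```python
import itertools
import numpy as np

def pseudo_log_likelihood(J, S):
    """Sum over samples t and nodes i of log p(s_i | all other spins), for +/-1 spins."""
    F = S @ J  # local fields F[t, i] = sum_j J_ij s_j (J symmetric, zero diagonal)
    return np.sum(S * F - np.logaddexp(F, -F))

def fit_couplings(S, steps=1000, lr=0.2):
    """Plain gradient ascent on the pseudo-log-likelihood with J_ij = J_ji and J_ii = 0."""
    T, N = S.shape
    J = np.zeros((N, N))
    for _ in range(steps):
        R = S - np.tanh(S @ J)    # residuals s_i - E[s_i | other spins]
        G = R.T @ S / T           # per-sample gradient of the node-wise terms
        G = G + G.T               # J_ij enters the terms of both node i and node j
        np.fill_diagonal(G, 0.0)  # keep J_ii = 0
        J += lr * G
    return J

def sample_exact(J, T, rng):
    """Draw T configurations from exp(-H)/Z by enumerating all 2^N states (small N only)."""
    configs = np.array(list(itertools.product([-1, 1], repeat=J.shape[0])))
    energies = -0.5 * np.einsum("ti,ij,tj->t", configs, J, configs)
    p = np.exp(-energies)
    p /= p.sum()
    return configs[rng.choice(len(configs), size=T, p=p)]

# Hypothetical ground-truth couplings for three nodes (illustrative values).
J_true = np.array([[0.0, 0.8, -0.8],
                   [0.8, 0.0, 0.0],
                   [-0.8, 0.0, 0.0]])
rng = np.random.default_rng(0)
S = sample_exact(J_true, T=4000, rng=rng).astype(float)
J_hat = fit_couplings(S)
print(np.round(J_hat, 2))
```

In this toy setting the signs of the inferred couplings should match those of `J_true`, with the absent link estimated close to zero. Exact sampling by enumeration is used only to generate synthetic data and is feasible only for very small N.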
The following sections describe the data available and the main results of the analysis. The final section concludes the paper and outlines directions for further research.
Data
The data elaborated in this paper were extracted from InCites1, a web-based tool which contains bibliometric indicators about scientific production and citations of institutions and countries. The indicators are calculated on the Web of Science (WoS) documents.2 InCites includes many indicators at the country level. InCites indicators are especially used to analyse scientific production over a long period of time (see e.g. Bornmann and Leydesdorff 2013). We analyse the number of Web of Science Documents (P), a measure of total publications for each country. In this study, we use two subject area schemes. The first is a very broad categorization, the GIPP schema. It covers all fields of scholarly research and is divided into six disciplines, namely (1) Arts and Humanities, (2) Clinical, Pre-Clinical and Health, (3) Engineering and Technology, (4) Life Sciences, (5) Physical Sciences, (6) Social Sciences. In addition, we investigate the scientific production according to the Essential Science Indicators (ESI) schema, which comprises 22 subject areas in science and social sciences and is based on journal assignments. In the ESI schema Arts & Humanities journals are not included and each journal is found in only one of the 22 subject areas.

1 It is a product of Clarivate Analytics. Further information is available at https://clarivate.com/products/incites/.
2 The elaborations reported in this paper are based on indicators exported on 2018-02-26 from the InCites dataset updated on 2018-02-10, which includes Web of Science content indexed through 2017-12-31.

Figure 1 provides an explorative descriptive analysis of the total scientific documents indexed in the WoS (P) in the GIPP areas analysed. In the following, social sciences and humanities were ignored in order to save space and due to their lower research output coverage in WoS. Data problems in bibliometric studies are well known. A common way to reduce them is to analyse macro-level bibliometric data. According to Nederhof (1988), comparative analyses are more reliable when the unit of analysis is more aggregated because, in a larger sample, micro random errors mutually compensate. Another issue of concern is given by changes of coverage, i.e. the inclusion or exclusion of journals. Small countries with a low number of scholarly outputs are obviously more affected by these changes. This may lead to unreliable values when a country only has a small number of scholarly outputs (see e.g. Schubert et al. 1989). To avoid this problem, we investigate the disciplinary profiles of the 50 most productive countries3. Overall, the scientific production of these countries, in terms of total documents in the WoS database, represents around 98% of the total scientific production as indexed in the WoS database. To increase the number of available data we transformed yearly data into weekly data by means of a linear interpolation. The final number of observations considered refers to the weekly number of WoS documents (P) over the period 1992–2016.
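The two data-preparation steps, the yearly-to-weekly linear interpolation and the normalization of Eq. (2), can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, 52 weeks per year and consecutive years are assumed, and the input values are made up and merely stand in for the shares $n_i^{(c)}(t)$ of one discipline.

```python
import numpy as np

def yearly_to_weekly(years, values, weeks_per_year=52):
    """Linearly interpolate one yearly series onto an evenly spaced weekly grid.

    Assumes consecutive years; weeks_per_year = 52 is an assumption of this sketch.
    """
    years = np.asarray(years, dtype=float)
    n_points = (len(years) - 1) * weeks_per_year + 1
    t = np.linspace(years[0], years[-1], n_points)
    return t, np.interp(t, years, np.asarray(values, dtype=float))

def normalize_profiles(n):
    """Eq. (2) for one discipline: n has shape (T, N) = (time points, countries)."""
    d = n - n.mean(axis=1, keepdims=True)                    # deviation from the cross-country mean
    return d / np.sqrt((d ** 2).sum(axis=1, keepdims=True))  # unit norm at each time point

# Made-up yearly values for three hypothetical countries (not InCites data).
years = [1992, 1993, 1994]
yearly = np.array([[10.0, 62.0, 114.0],
                   [20.0, 20.0, 46.0],
                   [30.0, 56.0, 30.0]])
weekly = np.stack([yearly_to_weekly(years, row)[1] for row in yearly], axis=1)
s = normalize_profiles(weekly)
print(weekly.shape, s.sum(axis=1)[:3], (s ** 2).sum(axis=1)[:3])
```

At every time point the normalized variables sum to zero across countries and their squares sum to one, which corresponds to the properties $\overline{s^{(c)}} = 0$ and $\overline{(s^{(c)})^2} = 1/N$ stated after Eq. (2). Note that interpolated weekly points are not independent observations, a limitation acknowledged in the Conclusions.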
Results and discussion
In this section we summarize the outcome of our analyses. In particular, we illustrate:
1. the estimated interactions (or coupling parameters $J_{ij}$ described in the "Introduction and method" section) between countries' overall disciplinary profiles and the inferred network topology, with the overall disciplinary profile including five GIPP disciplines (without Arts and Humanities), in Fig. 2;
2. the estimated interactions $J_{ij}$ between countries' overall disciplinary profiles including all 22 ESI disciplines, in Panel I of Fig. 3;
3. the estimated interactions $J_{ij}$ between the profiles of selected ESI disciplines (Physical Sciences, Computer Science and Medicine) of countries i and j (respectively in Panels II, III and IV of Fig. 3);
4. the estimated cross-discipline interactions $J_{ij}^{cd}$ (which represent the interrelations of the two disciplines c and d between the pair of countries i, j) between selected pairs of ESI disciplines (Medicine and Physical Sciences, and Physics and Computer Science) (see respectively Panels V and VI of Fig. 3);
3 The analysed countries are: Argentina (ARG), Australia (AUS), Austria (AUT), Belgium (BEL), Brazil (BRA), Bulgaria (BGR), Canada (CAN), Chile (CHL), China Mainland (CHN), Colombia (COL), Croatia (HRV), Denmark (DNK), Egypt (EGY), Finland (FIN), France (FRA), Germany (DEU), Greece (GRC), Hong Kong (HKG), Hungary (HUN), India (IND), Iran (IRN), Ireland (IRL), Israel (ISR), Italy (ITA), Japan (JPN), Malaysia (MYS), Mexico (MEX), Netherlands (NLD), New Zealand (NZL), Norway (NOR), Pakistan (PAK), Poland (POL), Portugal (PRT), Romania (ROU), Russia (RUS), Saudi Arabia (SAU), Singapore (SGP), Slovenia (SVN), South Africa (ZAF), South Korea (KOR), Spain (ESP), Sweden (SWE), Switzerland (CHE), Taiwan (TWN), Thailand (THA), Turkey (TUR), Ukraine (UKR), United Kingdom (GBR), USA (USA).
Fig. 1 Ten most productive countries by Total WoS documents (P), period (1980–2017), in three GIPP and three ESI disciplines. From the top left we have: (i) Clinical, Pre-Clinical and Health (GIPP) (ii) Engineering and Technology (GIPP), (iii) Physical Sciences (GIPP), (iv) Physics (ESI), (v) Computer Science (ESI), (vi) Clinical Medicine (ESI)
5. the inferred network topology derived from the estimated interactions J (illustrated in the previous points), in Fig. 4.
Figure 2 (left panel) shows the estimated interactions obtained by applying the methodology described in the previous section at a broad disciplinary classification. It considers the interactions between the overall disciplinary profile of countries based on five GIPP disciplines, namely (1) Clinical, Pre-Clinical and Health, (2) Engineering and Technology, (3) Life Sciences, (4) Physical Sciences, and (5) Social Sciences. We observe a trend to clustering among countries belonging to a given geo-political or cultural area. There is a strong positive interaction between USA, Great Britain, Canada and Australia; a weak positive interaction of USA with Germany, Netherlands and Japan, and negative interactions with all other countries. France and Germany show positive (although small) interactions between them and with Japan. China, on the other hand, has a strong positive interaction with India and Korea, weaker positive interaction with Iran and Taiwan; weak negative with all other countries.
Fig. 2 Estimated interactions $J_{ij}$ (left panel) and inferred network topology. LEFT PANEL: Interactions $J_{ij}$ between the disciplinary profiles of countries i and j. The disciplinary classification is based on the GIPP scheme; five subjects included (Arts and Humanities were excluded). The cost function used in the optimization to obtain the $J_{ij}$ interactions is $H = -\frac{1}{2}\sum_{i \neq j} J_{ij} s_i s_j$ (see "Introduction and method" section). Only the interactions $J_{ij}$ with $|J_{ij}| > 0.1\,\max_{ij}|J_{ij}|$ are shown. RIGHT PANEL: Graphical representation of the network topology inferred from the values of the $J_{ij}$ parameters reported in the left panel. The graph shows the most relevant interactions. The diameter of the node representing the i-th country is proportional to the number of interactions $J_{ij}$. The thickness of the edge depends on the intensity of the related interaction (thicker edge means larger intensity)
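The selection rule stated in the caption, keeping only couplings with $|J_{ij}| > 0.1\,\max_{ij}|J_{ij}|$ and sizing each node by its number of retained interactions, can be sketched as below. The function name, the example matrix and the choice of taking the maximum over off-diagonal entries only are assumptions of this sketch, not details confirmed by the paper.

```python
import numpy as np

def relevant_network(J, frac=0.1):
    """Keep couplings with |J_ij| > frac * max |J_ij| (off-diagonal entries only).

    Returns the retained edges as (i, j, J_ij) with i < j, and the number of
    retained interactions per node (used here as the node size).
    """
    J = np.asarray(J, dtype=float)
    n = J.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    threshold = frac * np.abs(J[off_diagonal]).max()
    keep = (np.abs(J) > threshold) & off_diagonal
    edges = [(i, j, J[i, j]) for i in range(n) for j in range(i + 1, n) if keep[i, j]]
    degree = keep.sum(axis=1)
    return edges, degree

# Hypothetical symmetric coupling matrix for three countries (illustrative values).
J_example = np.array([[0.0, 1.0, 0.05],
                      [1.0, 0.0, -0.5],
                      [0.05, -0.5, 0.0]])
edges, degree = relevant_network(J_example)
print(edges)
print(degree)
```

Here the weak coupling of 0.05 falls below the threshold of 0.1 and is dropped, so the second node, with two retained interactions, would be drawn largest. Edge thickness could then be set from the absolute value of the retained $J_{ij}$.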
Figure 2 (right panel) shows the inferred network topology derived from the estimated J (left panel). It highlights the strong connections between USA, Great Britain, Canada and Australia. The dominant effect of the countries with the largest production, such as China or USA, is due to a bias related to the proposed definition of the spins. In this case, the value of a country's spin corresponds to the volume of the country's scientific production. Since the Hamiltonian is a sum of terms proportional to the single spin, the contribution arising from the terms containing the largest spin values will be dominant. The optimization routine leading to the maximization of the pseudo-likelihood function will thus be more 'sensitive' to the largest spin variables than to the smaller ones.

Figure 3 illustrates the interactions at a more granular level by considering the ESI classification scheme based on 22 disciplines. A careful inspection of the different panels provides interesting insights about countries' production of knowledge and the existing interrelations. Panel I of Fig. 3 shows, consistently with what was observed for the GIPP-based analysis (Fig. 2), the existence of a kind of 'Anglo-Saxon' group (positive interactions between USA, Canada and Great Britain), with in addition some quite intense interactions between USA and Israel. On the other hand, USA shows negative interactions with all other countries, i.e. Europe, Latin America, Asia and Russia. These interactions seem to confirm the general trend observed for the coupling parameters $\{J\}$ inferred from the Scopus data analysed in Daraio (2017). Panel I of Fig. 3 shows that Great Britain has a positive and intense interaction with USA, a less intense but positive one with Australia, and a positive and weak one with Germany and the Netherlands. Nevertheless, Great Britain interacts little with other countries, especially with other Central European countries. Russia shows a positive interaction with Ukraine and with Germany and France, and a positive but weak interaction with Poland but not with other Eastern European countries. As observed in Fig. 2 at a broader disciplinary classification level, China, again, has an intense and negative interaction with almost all other countries, including Japan, with the exception of Korea and Iran, and a positive interaction with Poland. Japan instead presents a negative interaction with China and the USA, and a positive one with Korea, Taiwan, Iran and European countries like Germany and Sweden. Also to be noted are the positive interactions between Singapore, Taiwan and Korea. A Central Europe group seems to be defined by positive interactions between Germany, France, Italy and Spain. Finally, Brazil shows a positive interaction with Argentina and Spain.

Panel II of Fig. 3 shows the interactions of countries in Physics. We observe a general 'opening', 'homologation' or 'convergence' between the US, China and Russia. The interactions between USA and Russia, USA and Japan, and USA and Italy are now positive (whereas they were negative for the overall disciplinary profile). Positive interactions now exist (in Physics) also between Russia, Great Britain and Japan. Finally, China shows positive interactions with India and some European countries like Spain.

Panel III of Fig. 3 shows countries' interactions in Computer Science. The situation is similar to the one observed for Physics but with a more marked US and China 'dominance'. USA shows positive interactions with France, Germany and Italy, but negative ones with Russia; the positive interactions previously observed with Japan remain. For China, the situation is similar to the one observed for Physics.

In Panel IV of Fig. 3, where countries' interactions in Medicine are illustrated, a clear 'dominance' of the USA appears (see also Panel IV of Fig. 4), with a silent China (despite the size effect mentioned before). The US interactions are similar to those observed for Physics and Computer Science.
The last two panels of Fig. 3 are dedicated to cross-discipline interactions between countries. Regarding the cross-discipline interactions between Medicine and Physics (Panel V), as can be seen from the values along the diagonal, the two subjects interact positively in countries such as USA, Great Britain, Korea, Russia (although weakly) and Germany, and negatively in Japan and China. There are positive interactions between China and different European countries such as Italy, Spain, Netherlands, Switzerland and Poland. Positive interactions exist between China and the USA and between China and Canada. There are positive interactions between USA, Ukraine, Russia, China, Japan and European countries like Germany and France, but some negative ones, for example with Italy. Finally, Japan has many positive interactions, but some negative ones, for example with China.

Regarding the cross-discipline interactions between Computer Science and Physics (Panel VI), again, as can be seen from the values along the diagonal, the two subjects interact negatively (are misaligned) for USA, Russia and Japan, and positively (are aligned) for Korea, China, Germany and Italy (to a weaker extent). The interactions between the two subjects within China are very strong and positive; they remain positive, but less intense, between China and other countries (for example Iran, Russia and some European countries like Switzerland). Negative interactions between China and the USA appear. The US has positive interactions with Japan, Canada and Australia. With European countries, the US interaction is positive for Germany and France but negative with the others; it is also negative with Russia. Russia shows positive interactions with Great Britain, Australia and Asian countries like China, Hong Kong and Taiwan. It interacts little with European countries (or weakly negatively, as with France and Germany).
[Fig. 3 panels: I Overall; II Physical Sciences; III Computer Science; IV Medicine; V Medicine-Physics; VI Physics-Computer Science]
Finally, Fig. 4 shows the inferred network topology derived from the estimated interactions illustrated in Fig. 3. Summing up, moving from a high level of aggregation (GIPP) to a more granular and detailed disciplinary classification (ESI) allows us to obtain many interesting results about knowledge production and interdependencies between countries. Furthermore, the wealth of the analyses summarized and illustrated in this section shows the great potential of the approach proposed in this paper.

Fig. 3 Panel I: overall disciplinary profiles. Interactions $J_{ij}$ between the overall disciplinary profiles of countries i and j. The cost function used in the optimization to estimate the $J_{ij}$ is $H = -\frac{1}{2}\sum_{i \neq j} J_{ij} s_i s_j$ (see "Introduction and method" section). Panels II–IV: selected disciplines. Interactions $J_{ij}^{c}$ between the profiles of selected disciplines (c: Physical Sciences, Computer Science and Medicine) of countries i and j. The cost function used in the optimization is $H = -\frac{1}{2}\sum_{i \neq j} J_{ij}^{c} s_i^c s_j^c$. Panels V–VI: cross-discipline interactions. Interactions $J_{ij}^{cd}$ (which represent the interrelations of the two disciplines c and d between the pair of countries i, j) between selected pairs of ESI disciplines (Medicine and Physical Sciences, and Physics and Computer Science). The Hamiltonian in this case is $H = -\frac{1}{2}\sum_{i,j} J_{ij}^{cd} (s_i^c s_j^d + s_i^d s_j^c)$. Note that in all panels (i) the ESI classification scheme is used, (ii) each country is identified by a 3-letter code (see list of countries in footnote 3), (iii) the intensity and sign of each interaction are reported by following the colormap shown on the right of the graph, and (iv) only the interactions $J_{ij}$ with $|J_{ij}| > 0.1\,\max_{ij}|J_{ij}|$ are shown
Conclusions
The results summarized in this paper show the great potential of our proposed approach to infer the network of interactions between scientific disciplinary profiles at the macro level. Even if the results obtained may be affected by possible biases and distortions in the analyzed data (as reported, for example, recently in Gingras and Khelfaoui 2017), we think that our approach can be very useful to support research policy analysis. The requirements and the disciplinary orientation of research funding programs are incentives for creativity as well as for the scientific production of scientists (Azoulay et al. 2011). Recent studies on the economics and organization of science have highlighted the lack of tools and analysis to determine how to allocate funds among different disciplines (Antonelli et al. 2011). Of course, it is research policy that has to choose how to distribute resources across different scientific areas. The approach we propose in this paper may provide a useful tool to support policy decisions. Once the reliability of the proposed approach is established, it is possible to exploit the additional features of the generalized multicomponent spin model, e.g. to include in the analysis some measures of a country's contextual variables through the external field ($h_i$). For instance, an interesting extension of the study (left for future work) may be the inclusion of an external field given by a vector of research funding whose elements are country- and discipline-dependent, which, together with the mutual interactions between countries, can steer the disciplinary profile in one direction or another.
Once the inverse problem is solved, the obtained results could be used to simulate the impact of different levels of funding (through the external field) on the disciplinary profiles emerging as outputs of the generalized multicomponent spin model, keeping the interactions fixed to the inferred ones and changing the external funding vector. This could provide a larger and more solid predictive capability than the simple extrapolation of observed trends. Other interesting developments of this paper, which are left to future research, include:
• development of the methodology to eliminate the distortion due to the effect of big countries;
• application of the approach to other indicators, such as citations, highly cited publications and so on, to compare the estimated interdependencies and the network topologies obtained;
• refinements of the decimation (Marruzzo et al. 2017) to infer the network structure;
Fig. 4 Panels: I Overall; II Physical Sciences; III Computer Science; IV Medicine; V Medicine-Physics; VI Physics-Computer Science. Graphical representation of the network topology inferred from the values of the $J$ parameters obtained by the maximization of the pseudo-likelihood function and illustrated in Fig. 3. The graph shows the most relevant interactions. The diameter of the node representing the $i$th country is proportional to the number of its interactions $J_{ij}$. The thickness of an edge depends on the intensity of the related interaction (a thicker edge means a larger intensity)
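The construction described in the caption can be sketched in a few lines of Python. This is only an illustration and not the code used to draw the figure: it assumes a symmetric matrix of inferred couplings stored as nested lists, it reuses the $0.1\,\max_{ij}|J_{ij}|$ threshold of Fig. 3 as the criterion for the "most relevant" interactions (an assumption), and the function name `build_network`, the country codes and the numerical values are all hypothetical.

```python
def build_network(J, labels, rel_threshold=0.1):
    """Return (edges, degree) keeping couplings with |J_ij| > rel_threshold * max|J_ij|."""
    n = len(J)
    # Largest off-diagonal coupling in absolute value.
    j_max = max(abs(J[i][j]) for i in range(n) for j in range(n) if i != j)
    edges = []
    degree = {name: 0 for name in labels}
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            if abs(J[i][j]) > rel_threshold * j_max:
                edges.append((labels[i], labels[j], J[i][j]))
                degree[labels[i]] += 1  # node diameter scales with this count
                degree[labels[j]] += 1
    return edges, degree


# Toy, made-up couplings for three hypothetical country codes.
labels = ["AAA", "BBB", "CCC"]
J = [[0.0, 0.8, 0.05],
     [0.8, 0.0, -0.3],
     [0.05, -0.3, 0.0]]
edges, degree = build_network(J, labels)
for a, b, w in edges:
    print(a, b, w)  # edge thickness would scale with abs(w), colour with its sign
print(degree)
```

In a plot, `degree` would set the node diameters and the third element of each edge tuple would set the edge thickness.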
• overcoming the limitations of the interpolation of data (used to increase the number of observations) by analyzing alternative data, such as downloads, which were not available for the present study;
• and finally, the inclusion of the intensity of the auto-interdependence of countries (through the $J_{ii}$), the so-called chemical potentials of countries.
Acknowledgements The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Daraio et al. 2017). This work was supported by the projects Sapienza Awards No. 6H15XNFS and No. PH11715C8239C105.
Appendix In this appendix we discuss the methodology used in this work to obtain the parameters that maximize the Log-Likelihood function introduced in the paper. First, we discuss the general grounds for the validity of the method used. Second, we deal with its application to the specific case. Given the set of data $\{\mathbf{s}^l;\ l = 1, 2, \ldots, M\}$, assuming that the observed data are independent, and once the generative model is defined, the Log-Likelihood function $l(\{J\})$ becomes

$$ l(\{J\}) = \log L(\{J\}) = -\sum_{l=1}^{M} H(\{\mathbf{s}^l\}|\{J\}) - M \log Z(\{J\}), \qquad (5) $$
where $l = 1, \ldots, M$ is the label for a set of data. The inference problem consists in determining the set of parameters $\{J\}$ which maximizes the function in Eq. 5. We consider here the expression of the cost function (Hamiltonian) for a multicomponent variable $\mathbf{s}_i = (s_i^1, \ldots, s_i^c, \ldots, s_i^D)$, $H(\{\mathbf{s}\}|\{J\})$, given by

$$ H(\{\mathbf{s}\}|\{J\}) = -\frac{1}{2} \sum_{i \neq j}^{1,N} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j, \qquad (6) $$
with $J_{ij} = J_{ji}$. The symbol "$\cdot$" in Eq. 6 stands for a scalar product. The presence of a scalar product ensures that orthogonal or quasi-orthogonal vectors (i.e. countries whose publications, however numerous, lie in different fields) have a small weight in the cost function. The sum extends over all pairs of nodes $(i, j)$ with $i \neq j$. The partition function $Z(\{J\})$ is

$$ Z(\{J\}) = \sum_{\{\mathbf{s}\}} e^{-H(\{\mathbf{s}\}|\{J\})}, \qquad (7) $$

where the sum extends over all possible configurations in the phase space of the set of variables $\{\mathbf{s}\}$. The calculation of this partition function is computationally too demanding even for a small number of variables. For this reason, we resort to the pseudo-likelihood approximation (Aurell and Ekeberg 2012; Tyagi et al. 2016; Marruzzo et al. 2017). It consists in maximizing a Pseudo-Log-Likelihood function based on the local conditional Log-Likelihood function at each node (see Eq. 10) in place of the Log-Likelihood function. It is possible to show that the estimate of the parameters obtained by Pseudo-Log-Likelihood maximization is consistent with the one obtained by maximizing the Log-Likelihood function, that is, the two functions are maximized by the
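same set of parameters. The hypothesis under which this statement holds, i.e. the strict concavity of the Pseudo-Log-Likelihood function with respect to the elements of the set of parameters, is not too restrictive (see Hyvarinen 2006). Furthermore, it is possible to show that under this hypothesis the Pseudo-Log-Likelihood maximization is exact (i.e. equivalent to the Log-Likelihood maximization) in the case of infinite sampling (Aurell and Ekeberg 2012). An important advantage of the Pseudo-Log-Likelihood function is that it can be maximized in polynomial time. According to the Pseudo-Log-Likelihood approach, we consider the likelihood built on the local conditional probability of each variable $i$, one by one. Instead of working with Eq. (5) directly, the cost function (Eq. 6) is first rewritten as

$$
\begin{aligned}
H(\{\mathbf{s}\}|\{J\}) &= -\mathbf{s}_i \cdot \left[ \frac{1}{2} \sum_{j \neq i}^{1,N} J_{ij}\, \mathbf{s}_j \right] - \frac{1}{2} \sum_{k,j \neq i}^{1,N} \mathbf{s}_k \cdot J_{kj}\, \mathbf{s}_j \\
&= -\mathbf{s}_i \cdot \mathbf{A}_i(\{J\}) - \sum_{k \neq i} \mathbf{s}_k \cdot \mathbf{B}_{i,k}(\{J\}) \\
&\equiv H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\}) + H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\}),
\end{aligned}
\qquad (8)
$$

where $\mathbf{s}_{\setminus i}$ indicates the set of all input variables except the $i$th. The functions $\mathbf{A}_i(\{J\}) = \frac{1}{2} \sum_{j \neq i}^{1,N} J_{ij}\, \mathbf{s}_j$ and $\mathbf{B}_{i,k}(\{J\}) = \frac{1}{2} \sum_{j \neq i}^{1,N} J_{kj}\, \mathbf{s}_j$ have been introduced in Eq. 8. The cost functions $H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})$ and $H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})$ are implicitly defined in the same equation. Analogously, we can rewrite the partition function as

$$
\begin{aligned}
Z(\{J\}) &= \sum_{\{\mathbf{s}\}} e^{-H(\{\mathbf{s}\}|\{J\})} \\
&= \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})} \sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})} \\
&= \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})} \sum_{\{\mathbf{s}_i\}} e^{\mathbf{s}_i \cdot \mathbf{A}_i(\{J\})} \\
&\equiv \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})}\, Z_i(\{\mathbf{s}_{\setminus i}\}; \{J\}).
\end{aligned}
\qquad (9)
$$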
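The local conditional probability at the $i$th node is

$$ p(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\}) = \frac{1}{Z_i(\{J\})}\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}, \qquad (10) $$

and the local partition function is $Z_i(\{J\}) = \sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}$. By defining $l'(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}|\{J\}) = \log[p(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})]$, the Pseudo-Log-Likelihood function is defined as

$$ \lambda(\{J\}) = \sum_{l=1}^{M} \sum_{i=1}^{N} l'(\mathbf{s}_i^l|\{\mathbf{s}_{\setminus i}^l\}|\{J\}) \equiv \sum_{i=1}^{N} l'_i. \qquad (11) $$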
The gradient of the Pseudo-Log-Likelihood function with respect to the parameter Jij is given by
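$$
\begin{aligned}
\frac{\partial}{\partial J_{ij}} \lambda(\{J\}) &= \sum_{l=1}^{M} \left[ \frac{1}{2}\, \mathbf{s}_i^l \cdot \mathbf{s}_j^l - \frac{1}{Z_i(\{J\})} \frac{\partial}{\partial J_{ij}} Z_i(\{J\}) \right] \\
&= \frac{1}{2} \sum_{l=1}^{M} \left[ \mathbf{s}_i^l \cdot \mathbf{s}_j^l - \frac{\sum_{\{\mathbf{s}_i\}} \mathbf{s}_i \cdot \mathbf{s}_j^l\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}}{\sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}} \right] \\
&= \frac{1}{2} M \left[ \frac{1}{M} \sum_{l=1}^{M} \mathbf{s}_i^l \cdot \mathbf{s}_j^l - \langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} \right] \\
&= \frac{1}{2} M \left[ C_{ij} - \langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} \right],
\end{aligned}
\qquad (12)
$$

where $\langle \cdot \rangle_{i,\{J\}}$ stands for the ensemble average calculated over the probability distribution $p(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})$. Looking now at the gradient of the Log-Likelihood function $l(\{J\})$, we observe that it is possible to rephrase the term $\frac{1}{Z(\{J\})} \frac{\partial}{\partial J_{ij}} Z(\{J\})$ as

$$
\begin{aligned}
\frac{1}{Z(\{J\})} \frac{\partial}{\partial J_{ij}} Z(\{J\}) &= \frac{1}{2 Z(\{J\})} \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})} \sum_{\{\mathbf{s}_i\}} \mathbf{s}_i \cdot \mathbf{s}_j\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})} \\
&= \frac{1}{2 Z(\{J\})} \sum_{\{\mathbf{s}\}} e^{-H(\{\mathbf{s}\}|\{J\})}\, \frac{\sum_{\{\mathbf{s}_i\}} \mathbf{s}_i \cdot \mathbf{s}_j\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}}{\sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}} \\
&= \frac{1}{2} \left\langle \frac{\sum_{\{\mathbf{s}_i\}} \mathbf{s}_i \cdot \mathbf{s}_j\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}}{\sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}} \right\rangle_{\{J\}} = \frac{1}{2} \left\langle \langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} \right\rangle_{\{J\}}.
\end{aligned}
\qquad (13)
$$

Finally we obtain

$$ \frac{\partial}{\partial J_{ij}} l(\{J\}) = \frac{1}{2} M \left[ C_{ij} - \left\langle \langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} \right\rangle_{\{J\}} \right]. \qquad (14) $$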
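The data enter both gradients only through the empirical correlations $C_{ij}$, which, reading off the last two lines of Eq. 12, are the sample averages of the scalar products $\mathbf{s}_i^l \cdot \mathbf{s}_j^l$. A minimal Python sketch of this quantity follows. It is an illustration under stated assumptions and not the code used in this study: the data layout (`samples[l][i]` is the $D$-component profile of country $i$ in observation $l$), the function name and the toy numbers are hypothetical.

```python
def empirical_correlations(samples):
    """C[i][j] = (1/M) * sum_l s_i^l . s_j^l for samples[l][i] = list of D components."""
    M = len(samples)
    N = len(samples[0])
    C = [[0.0] * N for _ in range(N)]
    for s in samples:
        for i in range(N):
            for j in range(N):
                # Scalar product over the D discipline components.
                C[i][j] += sum(a * b for a, b in zip(s[i], s[j])) / M
    return C


# Toy data: M = 2 observations, N = 2 countries, D = 2 disciplines (made-up values).
samples = [
    [[1.0, 0.0], [0.0, 1.0]],    # orthogonal profiles: scalar product is zero
    [[0.5, 0.5], [0.5, -0.5]],   # orthogonal again
]
C = empirical_correlations(samples)
print(C)
```

In this toy case the two countries publish in different directions of the disciplinary space in both observations, so their cross-correlation vanishes whatever the size of the individual profiles, in line with the role of the scalar product in Eq. 6.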
By comparing Eqs. 12 and 14 it is possible to infer that in the limit $M \to \infty$, (i) both gradients go to zero for the set of parameters $\{J\}$ generating the observed data, and (ii) $\frac{\partial}{\partial J_{ij}} \lambda(\{J\}) \to \frac{\partial}{\partial J_{ij}} l(\{J\})$. This finally establishes the consistency of the maximum Pseudo-Log-Likelihood estimator. We observe, furthermore, its coincidence with the maximum Log-Likelihood estimator in the limit $M \to \infty$. The gradient of the Pseudo-Log-Likelihood function can be calculated exactly, thus facilitating the computational solution of the inference problem. The explicit expression of $\frac{\partial}{\partial J_{ij}} \lambda(\{J\})$ is reported in the following. To deal with a lower number of parameters, in place of maximizing the Pseudo-Log-Likelihood function, given by the sum of the single-node Pseudo-Log-Likelihood functions (Eq. 11), we maximize each single-node Pseudo-Log-Likelihood function separately. Since the couplings should be symmetric, the final estimate of the $J_{ij}$ parameter is obtained by taking the average $(J_{ij} + J_{ji})/2$. With a standard Pseudo-Log-Likelihood maximization, some couplings can be largely overestimated. To avoid this drawback we used an l2 regularizer (Ravikumar et al. 2010), i.e. in place of maximizing the function $\lambda(\{J\})$ we maximize the function $\lambda(\{J\}) - l_2 \left( \sum_{i,j} J_{ij}^2 \right)^{1/2}$, where $l_2$ is a suitably chosen constant.
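The two post-processing steps just described, the symmetrization of the couplings and the l2 penalty, can be sketched as follows. This is a hedged illustration and not the implementation used in this study: the function names, the penalty strength `strength` (standing in for the constant $l_2$), the placeholder pseudo-log-likelihood value and the toy couplings are all hypothetical.

```python
import math


def symmetrize(J):
    """Average J_ij and J_ji, as done after the single-node maximizations."""
    n = len(J)
    return [[0.5 * (J[i][j] + J[j][i]) for j in range(n)] for i in range(n)]


def regularized_objective(pll_value, J, strength):
    """Pseudo-log-likelihood value minus strength * sqrt(sum_ij J_ij^2)."""
    norm = math.sqrt(sum(x * x for row in J for x in row))
    return pll_value - strength * norm


# Made-up single-node estimates: node 0 inferred J_01 = 0.6, node 1 inferred J_10 = 0.2.
J_raw = [[0.0, 0.6],
         [0.2, 0.0]]
J_sym = symmetrize(J_raw)
# -1.0 is a placeholder value for lambda({J}); 0.1 is an arbitrary penalty strength.
penalized = regularized_objective(-1.0, J_sym, strength=0.1)
print(J_sym, penalized)
```

The penalty grows with the overall magnitude of the couplings, so a maximizer of the penalized function trades a slightly lower pseudo-log-likelihood for smaller, less overestimated couplings.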
The maximization of the single-node Pseudo-Log-Likelihood functions has been performed by means of the MATLAB fminunc function, selecting a trust-region optimization algorithm. In the following we first rephrase the expression of the Log-Likelihood function by isolating the contribution of the $i$th node and compare it with the expression of the Pseudo-Log-Likelihood function, to gain a deeper understanding of the differences between them. We finally calculate the gradient of the Pseudo-Log-Likelihood function with respect to $J_{ij}$. The sum $\sum_{\{\mathbf{s}_i\}} e^{\mathbf{s}_i \cdot \mathbf{A}_i(\{J\})}$ in Eq. 9 has been calculated by assuming that the values of the $i$th input variable can vary continuously in the interval $[-1, 1]$, obtaining

$$
\begin{aligned}
Z_i(\{J\}) = \sum_{\{\mathbf{s}_i\}} e^{\mathbf{s}_i \cdot \mathbf{A}_i(\{J\})} &\propto \int_{-1}^{1} d\mathbf{s}_i\, e^{\mathbf{s}_i \cdot \mathbf{A}_i(\{J\})} \\
&= \prod_{c=1}^{D} \int_{-1}^{1} ds_i^c\, e^{s_i^c A_i^c(\{J\})} \\
&= \prod_{c=1}^{D} \frac{2 \sinh(A_i^c(\{J\}))}{A_i^c(\{J\})}.
\end{aligned}
\qquad (15)
$$
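The closed form in the last line of Eq. 15 can be checked numerically for a single component. The sketch below is only an illustration: the helper names and the test values of $A_i^c$ are hypothetical, and the midpoint rule is used purely as an independent check. It also evaluates $\log(2\sinh(A)/A)$ in a way that stays finite for very small and very large $|A|$, which is what an optimizer would need in practice.

```python
import math


def log_z_component(a):
    """log of integral_{-1}^{1} exp(s*a) ds = log(2*sinh(a)/a), evaluated stably."""
    x = abs(a)  # the integral is even in a
    if x < 1e-6:
        return math.log(2.0) + x * x / 6.0  # series: sinh(x)/x ~ 1 + x^2/6
    # 2*sinh(x)/x = exp(x) * (1 - exp(-2x)) / x
    return x + math.log(-math.expm1(-2.0 * x)) - math.log(x)


def integral_midpoint(a, n=20000):
    """Midpoint-rule approximation of integral_{-1}^{1} exp(s*a) ds."""
    h = 2.0 / n
    return h * sum(math.exp((-1.0 + (k + 0.5) * h) * a) for k in range(n))


for a in (0.3, -1.7, 2.5):
    print(a, math.exp(log_z_component(a)), integral_midpoint(a))
```

For a $D$-component variable, the logarithm of $Z_i$ is, up to the additive constant discussed in the text, the sum of `log_z_component` over the components of $\mathbf{A}_i$.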
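The proportionality constant in Eq. 15, equal to the inverse of the total number of all possible $\mathbf{s}_i$ configurations, does not influence the following derivations and will not be explicitly considered. Similarly, it is possible to write the function $Z_{\setminus i}(\{J\}) = \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})}$ by exploiting the function $\mathbf{B}_{i,k}(\{J\})$ defined above, obtaining

$$
\begin{aligned}
Z_{\setminus i}(\{J\}) &= \sum_{\{\mathbf{s}_{\setminus i}\}} e^{-H_{\setminus i}(\{\mathbf{s}_{\setminus i}\}|\{J\})} \\
&= \sum_{\{\mathbf{s}_{\setminus i,k}\}} e^{-H_{\setminus i,k}(\{\mathbf{s}_{\setminus i,k}\}|\{J\})} \sum_{\{\mathbf{s}_k\}} e^{\mathbf{s}_k \cdot \mathbf{B}_{i,k}(\{J\})} \\
&\propto \sum_{\{\mathbf{s}_{\setminus i,k}\}} e^{-H_{\setminus i,k}(\{\mathbf{s}_{\setminus i,k}\}|\{J\})} \prod_{c=1}^{D} \frac{\sinh(B_{i,k}^c(\{J\}))}{B_{i,k}^c(\{J\})}.
\end{aligned}
\qquad (16)
$$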
By iterating this procedure over the remaining variables it is finally possible to write the partition function as the product

$$ Z(\{J\}) \propto \prod_{c=1}^{D} \frac{\sinh(A_i^c(\{J\}))}{A_i^c(\{J\})}\, \frac{\sinh(B_{i,k}^c(\{J\}))}{B_{i,k}^c(\{J\})} \cdots \frac{\sinh(F_{i,k,\ldots,l}^c(\{J\}))}{F_{i,k,\ldots,l}^c(\{J\})}. \qquad (17) $$
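The Log-Likelihood function becomes

$$
\begin{aligned}
l(\{J\}) = {} & \sum_{l=1}^{M} \left[ \mathbf{s}_i^{(l)} \cdot \mathbf{A}_i^{(l)}(\{J\}) + \sum_{k \neq i} \mathbf{s}_k^{(l)} \cdot \mathbf{B}_{i,k}^{(l)}(\{J\}) \right] - M \sum_{c=1}^{D} \left[ \log\left( \frac{\sinh(A_i^c(\{J\}))}{A_i^c(\{J\})} \right) \right. \\
& \left. + \log\left( \frac{\sinh(B_{i,k}^c(\{J\}))}{B_{i,k}^c(\{J\})} \right) + \cdots + \log\left( \frac{\sinh(F_{i,k,\ldots,l}^c(\{J\}))}{F_{i,k,\ldots,l}^c(\{J\})} \right) \right] + \mathrm{const}.
\end{aligned}
\qquad (18)
$$

The Pseudo-Log-Likelihood function, defined in Eqs. 10 and 11, now takes the expression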
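$$ \lambda(\{J\}) = \sum_{l=1}^{M} \sum_{i=1}^{N} \left[ \mathbf{s}_i^{(l)} \cdot \mathbf{A}_i^{(l)}(\{J\}) - \sum_{c=1}^{D} \log\left( \frac{2 \sinh(A_i^{c(l)}(\{J\}))}{A_i^{c(l)}(\{J\})} \right) \right] + \mathrm{const}. \qquad (19) $$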
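Eq. 19 translates almost line by line into code. The following Python sketch is a minimal illustration under stated assumptions, not the MATLAB implementation used in this study: profiles are stored as `samples[l][i][c]`, couplings as a nested list `J`, the additive constant is dropped, and all function names and toy values are hypothetical.

```python
import math


def log_z_component(a):
    """log(2*sinh(a)/a), the single-component local partition function of Eq. 15."""
    x = abs(a)
    if x < 1e-6:
        return math.log(2.0) + x * x / 6.0
    return x + math.log(-math.expm1(-2.0 * x)) - math.log(x)


def local_field(J, s, i):
    """A_i = (1/2) * sum_{j != i} J_ij * s_j, returned as a list of D components."""
    D = len(s[i])
    return [0.5 * sum(J[i][j] * s[j][c] for j in range(len(s)) if j != i)
            for c in range(D)]


def pseudo_log_likelihood(J, samples):
    """Sum over observations l and nodes i of s_i . A_i - sum_c log(2 sinh(A_i^c)/A_i^c)."""
    total = 0.0
    for s in samples:
        for i in range(len(s)):
            A = local_field(J, s, i)
            total += sum(s[i][c] * A[c] - log_z_component(A[c])
                         for c in range(len(A)))
    return total


# Toy data: two countries with identical, fully aligned profiles in both observations.
samples = [[[1.0, 0.0], [1.0, 0.0]],
           [[1.0, 0.0], [1.0, 0.0]]]
J_zero = [[0.0, 0.0], [0.0, 0.0]]
J_pos = [[0.0, 1.0], [1.0, 0.0]]
print(pseudo_log_likelihood(J_zero, samples), pseudo_log_likelihood(J_pos, samples))
```

With vanishing couplings every term reduces to $-\log 2$ per component, while a positive coupling between the two aligned toy profiles raises the value, which is the behaviour a maximizer exploits.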
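The difference between the Log-Likelihood function and the Pseudo-Log-Likelihood function clearly appears by comparing Eqs. 18 and 19. We can now explicitly calculate the gradient of the Pseudo-Log-Likelihood with respect to the set of parameters $J_{ij}$. From Eq. 12, we need to calculate the quantity $\langle \mathbf{s}_i^l \cdot \mathbf{s}_j^l \rangle_{i,\{J\}}$. It is (for the sake of clarity the index $l$ is omitted)

$$
\begin{aligned}
\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} &= \frac{\sum_{\{\mathbf{s}_i\}} \mathbf{s}_i \cdot \mathbf{s}_j\, e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}}{\sum_{\{\mathbf{s}_i\}} e^{-H_i(\mathbf{s}_i|\{\mathbf{s}_{\setminus i}\}; \{J\})}} \\
&= \frac{1}{Z_i(\{J\})}\, \mathbf{s}_j \cdot \int_{-1}^{1} d\mathbf{s}_i\, \mathbf{s}_i\, e^{\mathbf{s}_i \cdot \mathbf{A}_i(\{J\})} \\
&= \frac{1}{Z_i(\{J\})} \sum_{c=1}^{D} s_j^c \int_{-1}^{1} \left( \prod_{a=1}^{D} ds_i^a\, e^{s_i^a A_i^a} \right) s_i^c \\
&= \frac{1}{Z_i(\{J\})} \sum_{c=1}^{D} s_j^c \left[ \prod_{a \neq c}^{1,D} \frac{2}{A_i^a} \sinh A_i^a \right] \frac{2}{(A_i^c)^2} \left( A_i^c \cosh A_i^c - \sinh A_i^c \right).
\end{aligned}
\qquad (20)
$$

The expression of $Z_i(\{J\})$ is reported in Eq. 15. It is possible to rewrite it, for a given index $c$, as $Z_i \propto \frac{2 \sinh(A_i^c)}{A_i^c} \prod_{a \neq c}^{1,D} \frac{2 \sinh(A_i^a)}{A_i^a}$. By inserting this latter expression in Eq. 20, we obtain

$$
\begin{aligned}
\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}} &\propto \sum_{c=1}^{D} s_j^c\, \frac{2}{(A_i^c)^2} \left( A_i^c \cosh A_i^c - \sinh A_i^c \right) \frac{A_i^c}{2 \sinh(A_i^c)} \\
&= \sum_{c=1}^{D} s_j^c \left[ \frac{1}{\tanh A_i^{(c)}} - \frac{1}{A_i^{(c)}} \right],
\end{aligned}
\qquad (21)
$$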
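and finally (see footnote 4)

$$ \frac{\partial}{\partial J_{ij}} \lambda(\{J\}) = \frac{1}{2} M \left[ C_{ij} - \frac{1}{M} \sum_{l=1}^{M} \sum_{c=1}^{D} s_j^{c(l)} \left[ \frac{1}{\tanh A_i^{c(l)}} - \frac{1}{A_i^{c(l)}} \right] \right]. \qquad (22) $$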
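A simple way to gain confidence in Eq. 22 is to compare it with a finite-difference derivative of the single-node Pseudo-Log-Likelihood function. The sketch below does this in plain Python. It is an illustration only, not the code used in this study: the data layout (`samples[l][i][c]`), the function names and the toy values are hypothetical, and $J_{ij}$ is treated as a parameter of node $i$ alone, as in the single-node maximization described above.

```python
import math


def langevin(x):
    """coth(x) - 1/x, using the series x/3 near zero to avoid cancellation."""
    if abs(x) < 1e-4:
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x


def log_z_component(a):
    """log(2*sinh(a)/a), evaluated stably (Eq. 15, one component)."""
    x = abs(a)
    if x < 1e-6:
        return math.log(2.0) + x * x / 6.0
    return x + math.log(-math.expm1(-2.0 * x)) - math.log(x)


def local_field(J, s, i):
    """A_i = (1/2) * sum_{j != i} J_ij * s_j, as a list of D components."""
    return [0.5 * sum(J[i][j] * s[j][c] for j in range(len(s)) if j != i)
            for c in range(len(s[i]))]


def node_pll(J, samples, i):
    """Single-node pseudo-log-likelihood: the terms of Eq. 19 for fixed i."""
    total = 0.0
    for s in samples:
        A = local_field(J, s, i)
        total += sum(s[i][c] * A[c] - log_z_component(A[c]) for c in range(len(A)))
    return total


def grad_node(J, samples, i, j):
    """Eq. 22: (1/2) * sum_l sum_c s_j^c * (s_i^c - langevin(A_i^c))."""
    g = 0.0
    for s in samples:
        A = local_field(J, s, i)
        g += 0.5 * sum(s[j][c] * (s[i][c] - langevin(A[c])) for c in range(len(A)))
    return g


# Toy data: M = 2 observations, N = 3 countries, D = 2 disciplines (made-up values).
samples = [[[0.9, -0.2], [0.5, 0.1], [-0.3, 0.8]],
           [[0.4, 0.6], [0.7, -0.5], [0.2, 0.3]]]
J = [[0.0, 0.8, -0.4],
     [0.8, 0.0, 0.3],
     [-0.4, 0.3, 0.0]]
print(grad_node(J, samples, 0, 1))
```

Supplying an analytic gradient of this kind is what makes gradient-based optimizers, such as the trust-region algorithm mentioned above, efficient for this problem.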
When we are dealing with interrelations between two different disciplines, labeled $c$ and $d$, in place of Eq. (6) the Hamiltonian of the system is $H = -\frac{1}{2} \sum_{i,j} J_{ij}^{cd} (s_i^c s_j^d + s_i^d s_j^c)$. In this case, Eqs. (15), (21) and (22) should be changed accordingly. This, however, does not introduce any further difficulty. For example, Eq. (15) becomes

$$ Z_i(\{J^{cd}\}) = \sum_{\{s_i^c\},\{s_i^d\}} e^{s_i^c A_i^d(\{J^{cd}\}) + s_i^d A_i^c(\{J^{cd}\})}, \qquad (23) $$

where $A_i^c(\{J^{cd}\}) = \frac{1}{2} \sum_{j}^{1,N} J_{ij}^{cd} s_j^c$.

4 The proportionality constant for $Z_i(\{J\})$ and $\langle \mathbf{s}_i \cdot \mathbf{s}_j \rangle_{i,\{J\}}$ is the same.
References
Antonelli, C., Franzoni, C., & Geuna, A. (2011). The organization, economics, and policy of scientific research: What we do know and what we don't know – an agenda for research. Industrial and Corporate Change, 20(1), 201–213.
Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47–97.
Aurell, E., & Ekeberg, M. (2012). Inverse Ising inference using all the data. Physical Review Letters, 108(9), 090201.
Azoulay, P., Graff Zivin, J. S., & Manso, G. (2011). Incentives and creativity: Evidence from the academic life sciences. The Rand Journal of Economics, 42(3), 527–554.
Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9, 485–516.
Barabási, A. L. (2016). Network science. Cambridge: Cambridge University Press.
Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge: Cambridge University Press.
Besag, J. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48(3), 259–279.
Bongioanni, I., Daraio, C., & Ruocco, G. (2014). A quantitative measure to compare the disciplinary profiles of research systems and their evolution over time. Journal of Informetrics, 8(3), 710–727.
Bongioanni, I., Daraio, C., Moed, H. F., & Ruocco, G. (2015). Comparing the disciplinary profiles of national and regional research systems by extensive and intensive measures. In A. A. Salah, Y. Tonta, A. A. Akdag Salah, C. Sugimoto, & U. Al (Eds.), Proceedings of ISSI 2015: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July 2015 (pp. 684–696). Bogaziçi University Printhouse.
Bornmann, L., & Leydesdorff, L. (2013). Macro-indicators of citation impacts of six prolific countries: InCites data and the statistical significance of trends. PLoS One, 8(2), e56768.
Brush, S. G. (1967). History of the Lenz–Ising model. Reviews of Modern Physics, 39, 883–893.
Daraio, C., Fabbri, F., Gavazzi, G., Izzo, M. G., Leuzzi, L., Quaglia, G., et al. (2017). Assessing the interdependencies between scientific disciplinary profiles at the country level: A pseudo-likelihood approach. In Proceedings of ISSI 2017: The 16th International Conference on Scientometrics and Informetrics (pp. 1448–1459). China: Wuhan University.
Decelle, A., Ricci-Tersenghi, F., & Zhang, P. (2016). Data quality for the inverse Ising problem. Journal of Physics A: Mathematical and Theoretical, 49, 384001.
de Price, D. J. S. (1965). Networks of scientific papers. Science, 149, 510–515.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gingras, Y., & Khelfaoui, M. (2017). Assessing the effect of the United States' "citation advantage" on other countries' scientific impact as measured in the Web of Science (WoS) database. Scientometrics, 114, 517–532.
Glänzel, W. (2000). Science in Scandinavia: A bibliometric approach. Scientometrics, 48, 121–150.
Glänzel, W., Debackere, K., & Meyer, M. (2008). Triad or tetrad? On global changes in a dynamic world. Scientometrics, 74, 71–88.
Glänzel, W., & Schlemmer, B. (2007). National research profiles in a changing Europe (1983–2003). An exploratory study of sectoral characteristics in the Triple Helix. Scientometrics, 70(2), 267–275.
Glänzel, W., Leta, J., & Thijs, B. (2006). Science in Brazil. Part 1: A macro-level comparative study. Scientometrics, 67(1), 67–86.
Guns, R. (2014). Link prediction. In Measuring scholarly impact: Methods and practice (pp. 35–55). New York: Springer.
Greig, D. M., Porteous, B. T., & Seheult, A. H. (1989). Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society B, 51, 271–279.
Hu, X. J., & Rousseau, R. (2009). Comparative study of the difference in research performance in biomedical fields among selected Western and Asian countries. Scientometrics, 81(2), 475–491.
Hyvarinen, A. (2006). Consistency of pseudolikelihood estimation of fully visible Boltzmann machines. Neural Computation, 18, 2283–2292.
Judge, G. G., & Mittelhammer, R. C. (2011). An information theoretic approach to econometrics. Cambridge: Cambridge University Press.
Krapivsky, P. L., Redner, S., & Ben-Naim, E. (2010). A kinetic view of statistical physics. Cambridge: Cambridge University Press.
King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.
Latour, B. (2005). Reassembling the social: An introduction to actor-network-theory. Oxford: Oxford University Press.
Leydesdorff, L., & Zhou, P. (2005). Are the contributions of China and Korea upsetting the world system of science? Scientometrics, 63(3), 617–630.
Li, N. (2017). Evolutionary patterns of national disciplinary profiles in research: 1996–2015. Scientometrics, 111(1), 493–520.
Marruzzo, A., Tyagi, P., Antenucci, F., Pagnani, A., & Leuzzi, L. (2017). Inverse problem for multi-body interaction of nonlinear waves. Scientific Reports, 7(1), 3463.
Mezard, M., & Montanari, A. (2009). Information, physics, and computation. Oxford: Oxford University Press.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1. Department of Computer Science, University of Toronto.
Nederhof, A. J. (1988). The validity and reliability of evaluation of scholarly performance. In A. F. J. Van Raan (Ed.), Handbook of quantitative studies of science and technology, chapter 7 (pp. 193–228). London: Elsevier Science Pub Co.
Newman, M. E. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167–256.
Nguyen, H. C., Zecchina, R., & Berg, J. (2017). Inverse statistical problems: From the inverse Ising problem to data science. arXiv:1702.01522v2.
Radosevic, S., & Yoruk, E. (2014). Are there global shifts in the world science base? Analysing the catching up and falling behind of world regions. Scientometrics, 101(3), 1897–1924.
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D. (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319.
Schubert, A., Glänzel, W., & Braun, T. (1989). Scientometric datafiles. A comprehensive set of indicators on 2649 journals and 96 countries in all major science fields 1981–1985. Scientometrics, 16(1–6), 3–478.
Shen, Z., Yang, L., Pei, J., Li, M., Wu, C., Bao, J., et al. (2016). Interrelations among scientific fields and their relative influences revealed by an input–output analysis. Journal of Informetrics, 10(1), 82–97.
Shi, F., Foster, J. G., & Evans, J. A. (2015). Weaving the fabric of science: Dynamic network models of science's unfolding structure. Social Networks, 43, 73–85.
Tian, Y., Wen, C., & Hong, S. (2008). Global scientific production on GIS research by bibliometric analysis from 1997 to 2006. Journal of Informetrics, 2, 65–74.
Tyagi, P., Marruzzo, A., Pagnani, A., Antenucci, F., & Leuzzi, L. (2016). Regularization and decimation pseudolikelihood approaches to statistical inference in XY spin models. Physical Review B, 94(2), 024203.
West, J. D., & Vilhena, D. A. (2014). A network approach to scholarly evaluation. In B. Cronin & C. R. Sugimoto (Eds.), Beyond bibliometrics (pp. 151–166). MIT Press.
Wong, C. Y. (2013). On a path to creative destruction: Science, technology and science-based technological trajectories of Japan and South Korea. Scientometrics, 96, 323–336.
Wong, C. Y., & Goh, K. L. (2012). The pathway of development: Science and technology of NIEs and selected Asian emerging economies. Scientometrics, 92, 523–548.
Yang, L. Y., Yue, T., Ding, J. L., & Han, T. (2012). A comparison of disciplinary structure in science between the G7 and the BRIC countries by bibliometric methods. Scientometrics, 93, 497–516.
Zhou, P., & Leydesdorff, L. (2006). The emergence of China as a leading nation in science. Research Policy, 35(1), 83–104.