Formational bounds of link prediction in collaboration networks

Kim, Jinseok; Diesner, Jana

doi:10.1007/s11192-019-03055-6


Published: 09 March 2019

Scientometrics, Volume 119, pages 687–706 (2019)


Abstract

Link prediction in collaboration networks is often solved by identifying structural properties of existing nodes that are disconnected at one point in time and that share a link later on. The maximum achievable recall, or upper bound on this approach's success, is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequently, sustained links, as well as links that involve one or two new network participants, are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeted at collaboration networks. In this study, we identify the distribution of basic link formation types based on four large-scale, over-time collaboration networks, showing that, roughly speaking, 25% of links represent continued collaborations, 25% of links are new collaborations between existing authors, and 50% are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible for the classic link prediction process is low.
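The link formation types described above can be illustrated with a short sketch. This is a minimal example under our own assumptions, not the authors' code: each network period is an undirected edge list, an author counts as "existing" if they appear in the past network (the hypothetical `past_nodes` argument lets authors without coauthors be included), and the recall ceiling is taken as the share of present links formed between two existing authors who were not already linked. All function and label names are illustrative.

```python
from collections import Counter


def _undirected(edges):
    """Normalize (u, v) pairs to sorted tuples and drop self-loops."""
    return {tuple(sorted(e)) for e in edges if e[0] != e[1]}


def classify_links(past_edges, present_edges, past_nodes=None):
    """Count present-period links by formation type (illustrative labels)."""
    past = _undirected(past_edges)
    present = _undirected(present_edges)
    known = {n for e in past for n in e}
    if past_nodes is not None:
        # Assumption: authors seen in the past period without any coauthor.
        known |= set(past_nodes)
    labels = {2: "new_between_existing", 1: "existing_and_newcomer", 0: "between_newcomers"}
    counts = Counter()
    for u, v in present:
        if (u, v) in past:
            counts["continued"] += 1  # sustained collaboration
        else:
            n_existing = (u in known) + (v in known)  # how many endpoints are existing authors
            counts[labels[n_existing]] += 1
    return counts


def recall_upper_bound(counts):
    """Share of present links that a classic (existing-node, unlinked-pair) predictor could hit."""
    total = sum(counts.values())
    return counts["new_between_existing"] / total if total else 0.0


if __name__ == "__main__":
    past = [("A", "B"), ("B", "C")]
    present = [("B", "A"), ("A", "C"), ("C", "D"), ("D", "E")]
    counts = classify_links(past, present)
    print(dict(counts))
    print(recall_upper_bound(counts))  # 0.25
```

In this toy input each formation type occurs once, so a predictor restricted to unlinked pairs of existing authors could recover at most a quarter of the present links, whatever its accuracy on those pairs.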


Notes

In this study, ‘collaboration’ means ‘coauthorship’ of a research paper, and the two terms are used interchangeably.
https://www.nlm.nih.gov/bsd/licensee/medpmmenu.html.
https://databank.illinois.edu/datasets/IDB-4222651.
http://dblp.org/xml/release/; for this study, we downloaded the April 2015 release.
A list of 392 journals was obtained from the Thomson Reuters Journal Citation Reports 2012 for the category “Computer Science”. We retrieved records of papers published in these journals from DBLP.
http://journals.aps.org/datasets; for this study, we obtained the APS 2014 release with the permission of the American Physical Society.
Mark E. J. Newman at the University of Michigan Department of Physics kindly provided the disambiguation code.
http://scholar.ndsl.kr/index.do; for this study, we obtained the KISTI 2016 version under a research agreement with the Korea Institute of Science and Technology Information.
This demonstrates why varying past–present network time frames matters for this study. The idea of using different past–present network periods was suggested by one of the reviewers of this paper.
In some fields, such as natural language processing, recall and precision are often inversely related, and therefore a combined score such as the F metric (i.e., the harmonic mean of precision and recall) is calculated.
For a detailed explanation of the Degree Product predictor, see “Appendix”.
This does not mean that all preferential attachment models are designed to explain power-law obeying networks. However, many studies on preferential attachment have attempted to model power-law obeying networks.
https://cran.r-project.org/web/packages/poweRlaw/index.html.
Many studies on power-law distributions in collaboration networks have fitted distribution tails (i.e., the distribution of values at or above a certain x) to power-law slopes to assess the performance of proposed network generation models. Several studies have divided a degree distribution into two parts (below and above a certain x value) and fitted them separately to different power-law slopes (e.g., Wagner and Leydesdorff 2005). A few others have tested power-law distributions with cut-offs (below a certain x value) (e.g., Newman 2001b).
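To illustrate the F metric mentioned in the notes, the sketch below scores a set of predicted links against the links actually observed later and combines precision and recall through their harmonic mean (F1). It is a generic example under our own assumptions (undirected links, exact set matching), not the evaluation code used in the study; the function name `precision_recall_f1` and the toy link lists are hypothetical.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 (harmonic mean) for undirected link sets."""
    predicted = {tuple(sorted(e)) for e in predicted}
    actual = {tuple(sorted(e)) for e in actual}
    hits = len(predicted & actual)  # correctly predicted links
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = [("A", "C"), ("D", "B")]
    actual = [("A", "C"), ("C", "D"), ("D", "E"), ("B", "A")]
    print(precision_recall_f1(predicted, actual))  # precision 0.5, recall 0.25, F1 about 0.333
```

Because recall enters the harmonic mean, a ceiling on recall also limits F1: even with perfect precision, a recall of 0.25 gives F1 = 2 × 1 × 0.25 / 1.25 = 0.4.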
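The notes refer to fitting the tail of a degree distribution (values at or above some x) to a power-law slope, and to the poweRlaw package for R. As an illustration only, the Python sketch below applies the approximate discrete maximum-likelihood estimator given by Clauset et al. (2009), α ≈ 1 + n [Σ ln(x_i / (x_min − 1/2))]^(−1), to degrees at or above a fixed x_min. It assumes x_min is chosen beforehand; it neither estimates x_min nor tests goodness of fit, which a full analysis (e.g., with poweRlaw) would do, and Clauset et al. caution that this approximation is only reliable when x_min is not too small. The function name `fit_tail_exponent` is hypothetical.

```python
import math


def fit_tail_exponent(degrees, x_min):
    """Approximate discrete MLE of the power-law exponent for the tail x >= x_min.

    alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5))), following Clauset, Shalizi
    & Newman (2009). x_min is supplied by the analyst, not estimated here.
    Returns (alpha, n_tail).
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for degree data")
    tail = [x for x in degrees if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    # Every term is positive because x >= x_min > x_min - 0.5.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum, len(tail)


if __name__ == "__main__":
    degrees = [1, 1, 2, 3, 6, 7, 8, 10, 15, 22, 40]
    alpha, n_tail = fit_tail_exponent(degrees, x_min=6)
    print(f"alpha ~= {alpha:.2f} from {n_tail} tail observations")
```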

References

Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230. https://doi.org/10.1016/S0378-8733(03)00009-1.
Barabási, A. L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and Its Applications, 311(3–4), 590–614. https://doi.org/10.1016/s0378-4371(02)00736-7.
Braun, T., Glänzel, W., & Schubert, A. (2001). Publication and cooperation patterns of the authors of neuroscience journals. Scientometrics, 51(3), 499–510. https://doi.org/10.1023/A:1019643002560.
Cabanac, G., Hubert, G., & Milard, B. (2015). Academic careers in Computer Science: Continuance and transience of lifetime co-authorships. Scientometrics, 102(1), 135–150. https://doi.org/10.1007/s11192-014-1426-0.
Chen, D.-B., Xiao, R., & Zeng, A. (2014). Predicting the evolution of spreading on complex networks. Scientific Reports. https://doi.org/10.1038/srep06108.
Chen, H., Li, X., & Huang, Z. (2005). Link prediction approach to collaborative filtering. Paper presented at the Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05).
Choudhury, N., & Uddin, S. (2017). Mining actor-level structural and neighborhood evolution for link prediction in dynamic networks. Paper presented at the Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, Sydney, Australia.
Choudhury, N., & Uddin, S. (2018). Evolutionary community mining for link prediction in dynamic networks. Paper presented at Complex Networks & Their Applications VI, Lyon, France.
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111.
Fegley, B. D., & Torvik, V. I. (2013). Has large-scale named-entity network analysis been resting on a flawed assumption? PLoS ONE, 8(7), 1–16. https://doi.org/10.1371/journal.pone.0070299.
Guns, R. (2014). Link prediction. In Measuring scholarly impact (pp. 35–55). Springer.
Guns, R., & Rousseau, R. (2014). Recommending research collaborations using link prediction and random forest classifiers. Scientometrics, 101(2), 1461–1473. https://doi.org/10.1007/s11192-013-1228-9.
Kim, J. (2018). Evaluating author name disambiguation for digital libraries: A case of DBLP. Scientometrics, 116(3), 1867–1886. https://doi.org/10.1007/s11192-018-2824-5.
Kim, J., & Diesner, J. (2015). The effect of data pre-processing on understanding the evolution of collaboration networks. Journal of Informetrics, 9(1), 226–236. https://doi.org/10.1016/j.joi.2015.01.002.
Kim, J., & Diesner, J. (2016). Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. Journal of the Association for Information Science and Technology, 67(6), 1446–1461.
Kim, J., & Diesner, J. (2017). Over-time measurement of triadic closure in coauthorship networks. Social Network Analysis and Mining, 7(1), 1–12. https://doi.org/10.1007/s13278-017-0428-3.
Kim, J., Tao, L., Lee, S.-H., & Diesner, J. (2016). Evolution and structure of scientific co-publishing network in Korea between 1948–2011. Scientometrics, 107(1), 27–41. https://doi.org/10.1007/s11192-016-1878-5.
Lerchenmueller, M. J., & Sorenson, O. (2016). Author Disambiguation in PubMed: Evidence on the precision and recall of author-ity among NIH-funded scientists. PLoS ONE, 11(7), e0158731.
Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019–1031. https://doi.org/10.1002/asi.20591.
Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications, 390(6), 1150–1170.
Martin, T., Ball, B., Karrer, B., & Newman, M. E. J. (2013). Coauthorship and citation patterns in the Physical Review. Physical Review E, 88(1), 012814. https://doi.org/10.1103/physreve.88.012814.
Milojević, S. (2010). Modes of collaboration in modern science: Beyond power laws and preferential attachment. Journal of the American Society for Information Science and Technology, 61(7), 1410–1423. https://doi.org/10.1002/asi.21331.
Mohdeb, D., Boubetra, A., & Charikhi, M. (2016). Tie persistence in academic social networks. Informatica, 40(3), 353.
Mollenhorst, G., Volker, B., & Flap, H. (2011). Shared contexts and triadic closure in core discussion networks. Social Networks, 33(4), 292–302. https://doi.org/10.1016/j.socnet.2011.09.001.
Newman, D., Karimi, S., & Cavedon, L. (2009). Using topic models to interpret MEDLINE’s medical subject headings. In A. Nicholson, & X. Li (Eds.), AI 2009: Advances in artificial intelligence (Vol. 5866, pp. 270–279). Berlin, Heidelberg: Springer.
Newman, M. E. J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E. https://doi.org/10.1103/physreve.64.025102.
Newman, M. E. J. (2001b). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2), 404–409. https://doi.org/10.1073/pnas.021544898.
Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., & Giles, C. L. (2002). Winners don’t take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences of the United States of America, 99(8), 5207–5211. https://doi.org/10.1073/pnas.032085699.
Perc, M. (2014). The Matthew effect in empirical data. Journal of The Royal Society Interface. https://doi.org/10.1098/rsif.2014.0378.
Price, D., & Gürsey, S. (1976). Studies in scientometrics. 1. Transience and continuance in scientific authorship. Paper presented at the international forum on information and documentation.
Reitz, F., & Hoffmann, O. (2011). Did they notice? A case-study on the community contribution to data quality in DBLP. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), Research and advanced technology for digital libraries, TPDL 2011 (Vol. 6966, pp. 204–215). Berlin: Springer.
Resnick, P., & Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58.
Schubert, A., & Glänzel, W. (1991). Publication dynamics—Models and indicators. Scientometrics, 20(1), 317–331. https://doi.org/10.1007/BF02018161.
Taskar, B., Wong, M. F., Abbeel, P., & Koller, D. (2003). Link prediction in relational data. Paper presented at Advances in Neural Information Processing Systems.
Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304.
Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research Policy, 34(10), 1608–1618. https://doi.org/10.1016/j.respol.2005.08.002.
Yan, E., & Guns, R. (2014). Predicting and recommending collaborations: An author-, institution-, and country-level analysis. Journal of Informetrics, 8(2), 295–309. https://doi.org/10.1016/j.joi.2014.01.008.


Acknowledgements

This work is supported, in part, by the Korea Institute of Science and Technology Information (KISTI). We would like to thank Vetle Torvik (University of Illinois at Urbana-Champaign), the American Physical Society, DBLP, and KISTI for providing datasets. We are also grateful to Mark E. J. Newman (University of Michigan) for providing code for disambiguating author names in the APS data, and to Raf Guns (University of Antwerp) for comments on link prediction processes in LinkPred.

Author information

Authors and Affiliations

Institute for Research on Innovation and Science, Survey Research Center, Institute for Social Research, University of Michigan, 330 Packard Street, Ann Arbor, MI, 48105, USA
Jinseok Kim
School of Information Sciences, University of Illinois at Urbana-Champaign, 501 E. Daniel St., Champaign, IL, 61820, USA
Jana Diesner


Corresponding author

Correspondence to Jinseok Kim.

Appendix

Degree Product: Barabási et al. (2002) showed that if links in a network are formed through preferential attachment, the probability that two nodes form a link is proportional to the product of their degrees. This score is frequently used to predict link formation among nodes present in both the past and present networks. In the following equation, S(x, y) is the prediction score for a pair of nodes x and y, and Γ(x) is the set of nodes connected to x, so that |Γ(x)| is the degree of x.

$$S(x, y) = \left| \Gamma(x) \right| \times \left| \Gamma(y) \right| \tag{2}$$
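The Degree Product score in Eq. (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the study (the acknowledgements mention LinkPred, whose API is not reproduced here): degrees come from an undirected past-network edge list, candidate pairs are node pairs not already linked in the past network, and the hypothetical `eligible_nodes` argument can restrict candidates to nodes that also appear in the present network. Ties in the ranking are broken alphabetically, which is an arbitrary choice.

```python
from collections import defaultdict
from itertools import combinations


def degree_product_scores(past_edges, eligible_nodes=None):
    """S(x, y) = |Gamma(x)| * |Gamma(y)| for every unlinked pair in the past network."""
    neighbors = defaultdict(set)  # Gamma(x): nodes connected to x
    for u, v in past_edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    nodes = set(neighbors)
    if eligible_nodes is not None:
        nodes &= set(eligible_nodes)  # e.g. nodes also present in the present network
    scores = {}
    for x, y in combinations(sorted(nodes), 2):
        if y in neighbors[x]:
            continue  # already linked in the past: not a new-link candidate
        scores[(x, y)] = len(neighbors[x]) * len(neighbors[y])
    return scores


def top_k(scores, k):
    """Highest-scoring pairs first; ties broken by pair name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    past = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    scores = degree_product_scores(past)
    print(top_k(scores, 3))  # [(('B', 'D'), 4), (('C', 'D'), 4), (('A', 'E'), 3)]
```

Because the score depends only on degrees, it can only rank pairs of nodes that already exist in the past network; links involving newcomers receive no score at all.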


About this article

Cite this article

Kim, J., Diesner, J. Formational bounds of link prediction in collaboration networks. Scientometrics 119, 687–706 (2019). https://doi.org/10.1007/s11192-019-03055-6


Received: 02 May 2018
Published: 09 March 2019
Issue Date: 15 May 2019
DOI: https://doi.org/10.1007/s11192-019-03055-6
