The academic social network

Scientometrics

Abstract

Through their academic publications, authors form a social network. Instead of sharing casual thoughts and photos (as on Facebook), authors select co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Academic Search and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What types of information and queries would be useful for users to discover, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues, and institutions. We go beyond traditional metrics such as paper counts, citations, and h-index. Specifically, we define metrics such as influence, connections, and exposure for authors. An author gains influence not only by receiving more citations, but also by receiving citations from influential authors. An author increases his or her connections by co-authoring with other authors, especially with authors who themselves have high connections. An author receives exposure by publishing in selective venues whose publications have received many citations in the past; the selectivity of these venues in turn depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors’ rankings for each type of metric as well as for different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors.

Notes

  1. As new publication records are added to the MAS data set from time to time, the real count keeps increasing. The statistics presented here are therefore based on a snapshot of the data set (Computer Science field only) taken at a fixed point in time.

  2. Conceptually, the definitions of the CC and BCC metrics are similar to the traditional notions of full and fractional citation counting.

References

  • Anagnostopoulos, A., Kumar, R., & Mahdian, M. (2008). Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 7–15).

  • ARWU. (2012). The academic ranking of world universities (arwu by sjtu) 2012 in computer science. http://www.shanghairanking.com/SubjectCS2012.html.

  • Bakshy, E., Karrer, B., & Adamic, L. A. (2009). Social influence and the diffusion of user-created content. In Proceedings of the 10th ACM conference on electronic commerce (EC) (pp. 325–334).

  • Ball, P. (2005). Index aims for fair ranking of scientists. Nature, 436, 900.

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

  • Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College and Research Libraries News, 68(5), 314–316.

  • Bollen, J., Rodriguez, M. A., & Van de Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669–687.

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th international conference on World Wide Web (WWW).

  • Budalakoti, S., & Bekkerman, R. (2012). Bimodal invitation-navigation fair bets model for authority identification in a social network. In Proceedings of the 21st international conference on World Wide Web (pp. 709–718). ACM.

  • Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with google’s pagerank algorithm. Journal of Informetrics, 1(1), 8–15.

  • Chin, W. S., Juan, Y. C., Zhuang, Y., Wu, F., Tung, H. Y., Yu, T., et al. (2013). Effective string processing and matching for author disambiguation. In Proceedings of the 2013 KDD Cup 2013 workshop (p. 7). ACM.

  • Chiu, D. M., & Fu, T. Z. J. (2010). “Publish or Perish” in the Internet Age: a study of publication statistics in computer networking research. ACM Sigcomm Computer Communication Review (CCR), 40(1), 34–43.

  • Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., & Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 160–168).

  • Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). Pagerank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60(11), 2229–2243.

  • Easley, D. A., & Kleinberg, J. M. (2010). Networks, crowds, and markets—reasoning about a highly connected world. Cambridge: Cambridge University Press.

  • Egghe, L. (2006). An improvement of the H-index: The G-index. ISSI Newsletter, 2(1), 8–9.

  • Garfield, E. (1972). Citation analysis as a tool in journal evaluation. Science, 178(4060), 471–479.

  • Getoor, L., & Machanavajjhala, A. (2012). Entity resolution: Theory, practice & open challenges. Proceedings of the VLDB Endowment, 5(12), 2018–2019.

  • Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). Citeseer: An automatic citation indexing system. In Proceedings of the third ACM conference on digital libraries (pp. 89–98).

  • González-Pereira, B., Guerrero Bote, V. P., & Moya-Anegón, F. (2009). The SJR indicator: A new indicator of journals’ scientific prestige. arXiv:0912.4141v1.

  • Harzing, A. W. (2008). Reflections on the h-index. http://www.harzing.com/pop_hindex.htm/.

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569–16572.

  • Jeong, H., Néda, Z., & Barabási, A. L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61, 567–572.

  • Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

  • Langville, A. N., & Meyer, C. D. (2009). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton: Princeton University Press.

  • Ley, M. (2009). Dblp: Some lessons learned. Proceedings of the VLDB Endowment, 2(2), 1493–1500.

  • Leydesdorff, L., & Bornmann, L. (2011). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. Journal of the American Society for Information Science and Technology, 62(2), 217–229.

  • Li, P., Yu, J. X., Liu, H., He, J., & Du, X. (2011). Ranking individuals and groups by influence propagation. In Advances in Knowledge Discovery and Data Mining (pp. 407–419), Berlin Heidelberg: Springer.

  • Merton, R. K. (1968). The Matthew effect in science. Science, 159, 56–63.

  • Meyer, C. D. (2000). Matrix analysis and applied linear algebra. SIAM.

  • Newman, M. E. J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.

  • Newman, M. E. J. (2001b). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences USA, 98(2), 404–409.

  • Newman, M. E. J. (2004a). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences USA, 101, 5200–5205.

  • Newman, M. E. J. (2004b). Who is the best connected scientist? A study of scientific coauthorship networks. In E. Ben-Naim, H. Frauenfelder & Z. Toroczkai (eds.), Complex networks (pp. 337–370). Berlin: Springer.

  • Nie, Z., Wen, J., & Ma, W. (2007). Object-level vertical search. In Proceedings of the 3rd biennial conference on innovative data systems research (CIDR).

  • Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing & Management, 12(5), 297–312.

  • QS (2013) The QS world university rankings by subject 2013—computer science & information systems, http://www.topuniversities.com/university-rankings/university-subject-rankings/2013/computer-science-and-information-systems/.

  • Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.

  • Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B, 4, 131–134.

  • Roy, S. B., De Cock, M., Mandava, V., Savanna, S., Dalessandro, B., Perlich, C., et al. (2013). The Microsoft Academic Search dataset and KDD Cup 2013. In Proceedings of the 2013 KDD Cup 2013 workshop (p. 1). ACM.

  • Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43, 628–638.

  • Sekercioglu, C. H. (2008). Quantifying coauthors contributions. Science, 322(5900), 371.

  • de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.

  • de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage process. Journal of the American Society for Information Science, 27, 292–306.

  • Sun, Y., & Giles, C. L. (2007). Popularity weighted ranking for academic digital libraries. In Proceedings of the 29th European conference on information retrieval research (ECIR 2007).

  • Treeratpituk, P., & Giles, C. L. (2009). Disambiguating authors in academic publications using random forests. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries (pp. 39–48), ACM.

  • US-News. (2010). US News Ranking—the best graduate schools in computer science. http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-science-schools/computer-science-rankings/.

  • Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(6), P06010.

  • Walter, G., Bloch, S., Hunt, G., & Fisher, K. (2003). Counting on citations: A flawed way to measure quality? Medical Journal of Australia, 178, 280–281.

  • Waltman, L., & van Eck, N. J. (2010). The relation between eigenfactor, audience factor, and influence weight. Journal of the American Society for Information Science and Technology, 61(7), 1476–1486.

  • Yan, E., Ding, Y., & Sugimoto, C. R. (2011). P-rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology, 62(3), 467–477.

  • Zhou, D., Orshanskiy, S. A., Zha, H., & Giles, C. L. (2007). Co-ranking authors and documents in a heterogeneous network. In Proceedings of IEEE International Conference on Data Mining (ICDM).

  • Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. Journal of the American Society for Information Science and Technology, 59(11), 1856–1860.

Acknowledgments

We appreciate the support from the Technology Transfer Office (TBF13ENG004) of the Chinese University of Hong Kong. We also appreciate the valuable comments provided by the reviewers.

Author information

Corresponding author

Correspondence to Tom Z. J. Fu.

Appendices

Appendix 1: The PageRank algorithm

Given a graph \(G=(V,E)\), the PageRank algorithm can be viewed as a random walk that starts from an arbitrary node and moves along the edges. In the limit of infinitely many steps, the probability that the walk is at a given node is the PageRank value of that node.

More formally, the probability distribution over the nodes can be derived as the stationary distribution of a Markov chain. The entries \(c_{ij}\) (\(i,j=1,2,\dots , n\)) of the transition matrix C give the probability that the random walk visits node j next, given that it is currently at node i. Thus, \(c_{ij}\) can be expressed as

$$\begin{aligned} c_{ij} = {\text {Prob}}(j|i) = \frac{e_{ij}}{\sum _k e_{ik}} \end{aligned}$$
(1)

where \(e_{ij}\) is from the adjacency matrix for the graph G. If G is the citation graph, for example, then \(e_{ij}=1\) if paper i cites paper j; else \(e_{ij}=0\).

In general, C is a substochastic matrix whose rows sum to either 0 (dangling nodes, see also Brin and Page 1998; in the citation graph, these are papers that cite no other papers) or 1 (normal nodes, or papers). For each dangling node, the corresponding row is replaced by \(\frac{1}{n}{\mathbf {e}}^{\mathrm{{T}}}\), the uniform distribution over all n nodes, so that C becomes a stochastic matrix.
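The construction of C described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `transition_matrix` and the three-paper citation graph are assumptions made for the example and do not come from the paper.

```python
def transition_matrix(adjacency):
    """Row-normalise a 0/1 adjacency matrix as in Eq. (1).

    Rows of dangling nodes (no outgoing edges) are replaced by the
    uniform distribution 1/n, so the result is a stochastic matrix.
    """
    n = len(adjacency)
    matrix = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling node: jump to any node with equal probability.
            matrix.append([1.0 / n] * n)
        else:
            matrix.append([e / out_degree for e in row])
    return matrix


# Hypothetical citation graph: paper 0 cites papers 1 and 2,
# paper 1 cites paper 2, and paper 2 cites nothing (dangling).
citations = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
C = transition_matrix(citations)
```

In this toy graph the first two rows are normalised by their out-degree, while the third row, which belongs to the dangling paper, becomes uniform.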

To ensure that the Markov chain is irreducible, so that a unique stationary distribution is guaranteed to exist, C is further transformed as follows:

$$\begin{aligned} \widetilde{C} = \alpha C + (1-\alpha ){\mathbf {e}}{\mathbf {v}}^{\mathrm{{T}}}, \;\;\alpha \in (0,1). \end{aligned}$$
(2)

Here, \({\mathbf {e}}\) is a special column vector with all 1s, and of dimension n.

In Eq. (2), \({\mathbf {v}}\in {\mathbb {R}}^{n}\) is a probability vector (i.e., its values are between 0 and 1, and sum to 1). It is referred to as the teleportation vector, which can be used to configure some bias into the random walk. For our purposes, we let \({\mathbf {v}} = \frac{1}{n}{\mathbf {e}}\) as the default setting.

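Now, matrix \(\widetilde{C}\) is stochastic, irreducible, and aperiodic, so according to the Perron–Frobenius Theorem (Langville and Meyer 2009; Meyer 2000) it has a unique stationary distribution \(\pi\) (with \(\pi ^{\mathrm{{T}}}{\mathbf {e}}=1\)). Substituting Eq. (2) with \({\mathbf {v}} = \frac{1}{n}{\mathbf {e}}\) into \(\pi ^{\mathrm{{T}}}=\pi ^{\mathrm{{T}}}\widetilde{C}\) gives the equation

$$\begin{aligned} \pi ^{\mathrm{{T}}}=\alpha \pi ^{\mathrm{{T}}}C+(1-\alpha )\frac{1}{n}{\mathbf {e}}^{\mathrm{{T}}},\;\;\alpha \in (0,1) \end{aligned}$$
(3)

which can be solved by iterative methods in practice.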
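One common iterative method is power iteration: start from the uniform distribution and apply the right-hand side of Eq. (3) repeatedly until the vector stops changing. The sketch below is a minimal illustration, not the authors' implementation. The function name `pagerank`, the default \(\alpha = 0.85\), the tolerance, and the small matrix `C` are all assumptions chosen for the example; the paper only requires \(\alpha \in (0,1)\).

```python
def pagerank(C, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate pi^T <- alpha * pi^T C + (1 - alpha) * (1/n) e^T, as in Eq. (3).

    C must be a row-stochastic matrix given as a list of lists.
    alpha = 0.85 is an assumed default, not a value taken from the paper.
    """
    n = len(C)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_pi = [
            alpha * sum(pi[i] * C[i][j] for i in range(n)) + (1.0 - alpha) / n
            for j in range(n)
        ]
        # Stop when the L1 change between successive iterates is negligible.
        if sum(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical stochastic matrix for three papers: paper 0 cites 1 and 2,
# paper 1 cites 2, and the dangling paper 2 has a uniform row.
C = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
]
pi = pagerank(C)
```

Because each row of C sums to 1, every iterate remains a probability vector, and the successive differences shrink by a factor of at most \(\alpha\) per step, so a smaller \(\alpha\) converges faster at the cost of more weight on the uniform teleportation term.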

Appendix 2: Definition of metrics in matrix form

We list the matrix form for the five metrics discussed in the previous sections in Table 18.

Table 18 Notations and derivations of the ranking metrics

Cite this article

Fu, T.Z.J., Song, Q. & Chiu, D.M. The academic social network. Scientometrics 101, 203–239 (2014). https://doi.org/10.1007/s11192-014-1356-x

  • DOI: https://doi.org/10.1007/s11192-014-1356-x
