Abstract
Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are unpractical and computationally expensive. In this article, we propose a novel method to estimate Katz centrality based on graph sampling techniques, which object to achieve comparable estimation accuracy of the state-of-the-arts with much lower computational complexity. Specifically, we develop a Horvitz–Thompson estimate for Katz centrality by using a multi-round sampling approach and deriving an unbiased mean value estimator. We further propose SAKE, a
- 2017. Livemocha network dataset – KONECT. Retrieved from http://konect.uni-koblenz.de/networks/livemocha.Google Scholar
- Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2010. Reconsidering the foundations of network sampling. In Proceedings of the 2nd Workshop on Information in Networks (WIN’10).Google Scholar
- Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2014. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data 8, 2 (2014), 7.Google ScholarDigital Library
- Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In The Semantic Web. Springer, 722--735.Google ScholarDigital Library
- Michele Benzi and Christine Klymko. 2013. Total communicability as a centrality measure. Journal of Complex Networks 1, 2 (2013), 124--149.Google ScholarCross Ref
- Paolo Boldi and Sebastiano Vigna. 2014. Axioms for centrality. Internet Mathematics 10, 3–4 (2014), 222--262.Google ScholarCross Ref
- Phillip Bonacich. 1972. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2, 1 (1972), 113--120.Google ScholarCross Ref
- Stephen P. Borgatti, Kathleen M. Carley, and David Krackhardt. 2006. On the robustness of centrality measures under conditions of imperfect data. Social Networks 28, 2 (2006), 124--136.Google ScholarCross Ref
- Wei Chen and Shang-Hua Teng. 2017. Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). International World Wide Web Conferences Steering Committee, 967--976.Google ScholarDigital Library
- Elizabeth Costenbader and Thomas W. Valente. 2003. The stability of centrality measures when networks are sampled. Social Networks 25, 4 (2003), 283--307.Google ScholarCross Ref
- Eric D. Kolaczyk. 2009. Statistical Analysis of Network Data: Methods and Models. Springer.Google ScholarDigital Library
- Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. 2018. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proceedings of the 27th International Conference on World Wide Web (WWW’18). 449--458.Google Scholar
- Roohollah Etemadi and Jianguo Lu. 2017. Bias correction in clustering coefficient estimation. In Proceedings of the IEEE International Conference on Big Data (Big Data’17). IEEE, 606--615.Google ScholarCross Ref
- Kurt C. Foster, Stephen Q. Muth, John J. Potterat, and Richard B. Rothenberg. 2001. A faster Katz status score algorithm. Computational & Mathematical Organization Theory 7, 4 (2001), 275--285.Google ScholarDigital Library
- Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In Proceedings of the 2010 IEEE Infocom. IEEE, 1--9.Google ScholarCross Ref
- Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). ACM, 539--550.Google Scholar
- Douglas D. Heckathorn and Christopher J. Cameron. 2017. Network sampling: From snowball and multiplicity to respondent-driven sampling. Annual Review of Sociology 43, 1 (2017), 101--119.Google ScholarCross Ref
- Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 301 (1963), 13--30.Google ScholarCross Ref
- Daniel G. Horvitz and Donovan J. Thompson. 1952. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 260 (1952), 663--685.Google ScholarCross Ref
- Shiyu Ji and Zenghui Yan. 2016. Refining approximating betweenness centrality based on samplings. arXiv preprint arXiv:1608.04472 (2016).Google Scholar
- Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39--43.Google ScholarCross Ref
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). ACM, 137--146.Google ScholarDigital Library
- Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. 2006. Statistical properties of sampled networks. Physical Review E 73, 1 (2006), 016102.Google ScholarCross Ref
- Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, 631--636.Google ScholarDigital Library
- Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). ACM, 695--704.Google Scholar
- Mingkai Lin, Wenzhong Li, Cam-tu Nguyen, Xiaoliang Wang, and Sanglu Lu. 2019. Sampling based Katz centrality estimation for large-scale social networks. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’19). Springer, 584--598.Google Scholar
- Arun S. Maiya and Tanya Y. Berger-Wolf. 2011. Benefits of bias: Towards better characterization of network sampling. In Proceedings of the 17th International Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, 105--113.Google Scholar
- Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. 2010. Introduction to information retrieval. Natural Language Engineering 16, 1 (2010), 100--103.Google Scholar
- Toshiki Matsumura, Kenta Iwasaki, and Kazuyuki Shudo. 2018. Average path length estimation of social networks by random walk. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp’18). IEEE, 611--614.Google ScholarCross Ref
- Eisha Nathan and David A. Bader. 2017. Approximating personalized Katz centrality in dynamic graphs. In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM’17). Springer, 290--302.Google Scholar
- Eisha Nathan, Geoffrey Sanders, James Fairbanks, and David A. Bader. 2017. Graph ranking guarantees for numerical approximations to Katz centrality. In International Conference on Computational Science (ICCS’17), Vol. 108. Elsevier, 68--78. DOI:10.1016/j.procs.2017.05.021Google Scholar
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.Google Scholar
- Alireza Rezvanian, Behnaz Moradabadi, Mina Ghavipour, Mohammad Mehdi Daliri Khomami, and Mohammad Reza Meybodi. 2019. Social network sampling. In Learning Automata Approach for Social Networks. Springer, 91--149.Google Scholar
- Bernardete Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In Proceedings of the IEEE 51st IEEE Conference on Decision and Control.Google ScholarCross Ref
- Matthew Richardson and Pedro Domingos. 2002. Mining knowledge-sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02). ACM, 61--70.Google ScholarDigital Library
- Matteo Riondato and Evgenios M. Kornaropoulos. 2016. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery 30, 2 (2016), 438--475.Google ScholarDigital Library
- Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In Proceedings of the International Scientific Conference and International Workshop Present Day Trends of Innovations, Vol. 1.Google Scholar
- Alexander van der Grinten, Elisabetta Bergamini, Oded Green, David A. Bader, and Henning Meyerhenke. 2018. Scalable Katz ranking computation in large static and dynamic graphs. In Proceedings of the 26th Annual European Symposium on Algorithms (ESA’18). 42:1--42:14.Google Scholar
- Claudia Wagner, Philipp Singer, Fariba Karimi, Jürgen Pfeffer, and Markus Strohmaier. 2017. Sampling from Social Networks with Attributes. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). 1181--1190.Google ScholarDigital Library
- David Eppstein Joseph Wang. 2006. Fast approximation of centrality. Graph Algorithms and Applications 5, 5 (2006), 39.Google Scholar
- Tomasz Was and Oskar Skibski. 2018. An axiomatization of the Eigenvector and Katz centralities. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI’18).Google Scholar
- Jing Zhao, Ting-Hong Yang, Yongxu Huang, and Petter Holme. 2011. Ranking candidate disease genes from gene expression and protein interaction: A Katz-centrality based approach. PLoS ONE 6, 9 (2011), e24306.Google ScholarCross Ref
- Lin Zhu, Nicolas A. Menzies, Jianing Wang, Benjamin P. Linas, Steven M. Goodreau, and Joshua A. Salomon. 2020. estimation and correction of bias in network simulations based on respondent-driven sampling data. Scientific Reports 10, 1 (2020), 1--11.Google Scholar
Index Terms
- SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks
Recommendations
Sampling Based Katz Centrality Estimation for Large-Scale Social Networks
Algorithms and Architectures for Parallel ProcessingAbstractKatz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network is unpractical and computationally expensive. In this paper, we ...
Efficiently Estimating Motif Statistics of Large Networks
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical ...
Unbiased Characterization of Node Pairs over Large Graphs
TKDD Special Issue (SIGKDD'13)Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and ...
Comments