skip to main content
research-article

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

Authors Info & Claims
Published:18 April 2021Publication History
Skip Abstract Section

Abstract

Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are unpractical and computationally expensive. In this article, we propose a novel method to estimate Katz centrality based on graph sampling techniques, which object to achieve comparable estimation accuracy of the state-of-the-arts with much lower computational complexity. Specifically, we develop a Horvitz–Thompson estimate for Katz centrality by using a multi-round sampling approach and deriving an unbiased mean value estimator. We further propose SAKE, a Sampling-based Algorithm for fast Katz centrality Estimation. We prove that the estimator calculated by SAKE is probabilistically guaranteed to be within an additive error from the exact value. Extensive evaluation experiments based on four real-world networks show that the proposed algorithm can estimate Katz centralities for partial vertices with low sampling rate, low computation time, and it works well in identifying high influence vertices in social networks.

References

  1. 2017. Livemocha network dataset – KONECT. Retrieved from http://konect.uni-koblenz.de/networks/livemocha.Google ScholarGoogle Scholar
  2. Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2010. Reconsidering the foundations of network sampling. In Proceedings of the 2nd Workshop on Information in Networks (WIN’10).Google ScholarGoogle Scholar
  3. Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2014. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data 8, 2 (2014), 7.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In The Semantic Web. Springer, 722--735.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Michele Benzi and Christine Klymko. 2013. Total communicability as a centrality measure. Journal of Complex Networks 1, 2 (2013), 124--149.Google ScholarGoogle ScholarCross RefCross Ref
  6. Paolo Boldi and Sebastiano Vigna. 2014. Axioms for centrality. Internet Mathematics 10, 3–4 (2014), 222--262.Google ScholarGoogle ScholarCross RefCross Ref
  7. Phillip Bonacich. 1972. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2, 1 (1972), 113--120.Google ScholarGoogle ScholarCross RefCross Ref
  8. Stephen P. Borgatti, Kathleen M. Carley, and David Krackhardt. 2006. On the robustness of centrality measures under conditions of imperfect data. Social Networks 28, 2 (2006), 124--136.Google ScholarGoogle ScholarCross RefCross Ref
  9. Wei Chen and Shang-Hua Teng. 2017. Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). International World Wide Web Conferences Steering Committee, 967--976.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Elizabeth Costenbader and Thomas W. Valente. 2003. The stability of centrality measures when networks are sampled. Social Networks 25, 4 (2003), 283--307.Google ScholarGoogle ScholarCross RefCross Ref
  11. Eric D. Kolaczyk. 2009. Statistical Analysis of Network Data: Methods and Models. Springer.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. 2018. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proceedings of the 27th International Conference on World Wide Web (WWW’18). 449--458.Google ScholarGoogle Scholar
  13. Roohollah Etemadi and Jianguo Lu. 2017. Bias correction in clustering coefficient estimation. In Proceedings of the IEEE International Conference on Big Data (Big Data’17). IEEE, 606--615.Google ScholarGoogle ScholarCross RefCross Ref
  14. Kurt C. Foster, Stephen Q. Muth, John J. Potterat, and Richard B. Rothenberg. 2001. A faster Katz status score algorithm. Computational & Mathematical Organization Theory 7, 4 (2001), 275--285.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In Proceedings of the 2010 IEEE Infocom. IEEE, 1--9.Google ScholarGoogle ScholarCross RefCross Ref
  16. Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). ACM, 539--550.Google ScholarGoogle Scholar
  17. Douglas D. Heckathorn and Christopher J. Cameron. 2017. Network sampling: From snowball and multiplicity to respondent-driven sampling. Annual Review of Sociology 43, 1 (2017), 101--119.Google ScholarGoogle ScholarCross RefCross Ref
  18. Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 301 (1963), 13--30.Google ScholarGoogle ScholarCross RefCross Ref
  19. Daniel G. Horvitz and Donovan J. Thompson. 1952. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 260 (1952), 663--685.Google ScholarGoogle ScholarCross RefCross Ref
  20. Shiyu Ji and Zenghui Yan. 2016. Refining approximating betweenness centrality based on samplings. arXiv preprint arXiv:1608.04472 (2016).Google ScholarGoogle Scholar
  21. Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39--43.Google ScholarGoogle ScholarCross RefCross Ref
  22. David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). ACM, 137--146.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. 2006. Statistical properties of sampled networks. Physical Review E 73, 1 (2006), 016102.Google ScholarGoogle ScholarCross RefCross Ref
  24. Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, 631--636.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). ACM, 695--704.Google ScholarGoogle Scholar
  26. Mingkai Lin, Wenzhong Li, Cam-tu Nguyen, Xiaoliang Wang, and Sanglu Lu. 2019. Sampling based Katz centrality estimation for large-scale social networks. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’19). Springer, 584--598.Google ScholarGoogle Scholar
  27. Arun S. Maiya and Tanya Y. Berger-Wolf. 2011. Benefits of bias: Towards better characterization of network sampling. In Proceedings of the 17th International Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, 105--113.Google ScholarGoogle Scholar
  28. Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. 2010. Introduction to information retrieval. Natural Language Engineering 16, 1 (2010), 100--103.Google ScholarGoogle Scholar
  29. Toshiki Matsumura, Kenta Iwasaki, and Kazuyuki Shudo. 2018. Average path length estimation of social networks by random walk. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp’18). IEEE, 611--614.Google ScholarGoogle ScholarCross RefCross Ref
  30. Eisha Nathan and David A. Bader. 2017. Approximating personalized Katz centrality in dynamic graphs. In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM’17). Springer, 290--302.Google ScholarGoogle Scholar
  31. Eisha Nathan, Geoffrey Sanders, James Fairbanks, and David A. Bader. 2017. Graph ranking guarantees for numerical approximations to Katz centrality. In International Conference on Computational Science (ICCS’17), Vol. 108. Elsevier, 68--78. DOI:10.1016/j.procs.2017.05.021Google ScholarGoogle Scholar
  32. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.Google ScholarGoogle Scholar
  33. Alireza Rezvanian, Behnaz Moradabadi, Mina Ghavipour, Mohammad Mehdi Daliri Khomami, and Mohammad Reza Meybodi. 2019. Social network sampling. In Learning Automata Approach for Social Networks. Springer, 91--149.Google ScholarGoogle Scholar
  34. Bernardete Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In Proceedings of the IEEE 51st IEEE Conference on Decision and Control.Google ScholarGoogle ScholarCross RefCross Ref
  35. Matthew Richardson and Pedro Domingos. 2002. Mining knowledge-sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02). ACM, 61--70.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Matteo Riondato and Evgenios M. Kornaropoulos. 2016. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery 30, 2 (2016), 438--475.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In Proceedings of the International Scientific Conference and International Workshop Present Day Trends of Innovations, Vol. 1.Google ScholarGoogle Scholar
  38. Alexander van der Grinten, Elisabetta Bergamini, Oded Green, David A. Bader, and Henning Meyerhenke. 2018. Scalable Katz ranking computation in large static and dynamic graphs. In Proceedings of the 26th Annual European Symposium on Algorithms (ESA’18). 42:1--42:14.Google ScholarGoogle Scholar
  39. Claudia Wagner, Philipp Singer, Fariba Karimi, Jürgen Pfeffer, and Markus Strohmaier. 2017. Sampling from Social Networks with Attributes. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). 1181--1190.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. David Eppstein Joseph Wang. 2006. Fast approximation of centrality. Graph Algorithms and Applications 5, 5 (2006), 39.Google ScholarGoogle Scholar
  41. Tomasz Was and Oskar Skibski. 2018. An axiomatization of the Eigenvector and Katz centralities. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI’18).Google ScholarGoogle Scholar
  42. Jing Zhao, Ting-Hong Yang, Yongxu Huang, and Petter Holme. 2011. Ranking candidate disease genes from gene expression and protein interaction: A Katz-centrality based approach. PLoS ONE 6, 9 (2011), e24306.Google ScholarGoogle ScholarCross RefCross Ref
  43. Lin Zhu, Nicolas A. Menzies, Jianing Wang, Benjamin P. Linas, Steven M. Goodreau, and Joshua A. Salomon. 2020. estimation and correction of bias in network simulations based on respondent-driven sampling data. Scientific Reports 10, 1 (2020), 1--11.Google ScholarGoogle Scholar

Index Terms

  1. SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Knowledge Discovery from Data
        ACM Transactions on Knowledge Discovery from Data  Volume 15, Issue 4
        August 2021
        486 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/3458847
        Issue’s Table of Contents

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 April 2021
        • Accepted: 1 December 2020
        • Revised: 1 November 2020
        • Received: 1 January 2020
        Published in tkdd Volume 15, Issue 4

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format