research-article

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

Authors:
Mingkai Lin

Nanjing University, Nanjing, Jiangsu, China

Nanjing University, Nanjing, Jiangsu, China
View Profile

,
Wenzhong Li

State Key Laboratory for Novel Software Technology and Sino-German Institutes of Social Computing, Nanjing University, Nanjing, Jiangsu, China

State Key Laboratory for Novel Software Technology and Sino-German Institutes of Social Computing, Nanjing University, Nanjing, Jiangsu, China

0000-0002-9199-3655
View Profile

,
Lynda J. Song

University of Leeds, UK

University of Leeds, UK
View Profile

,
Cam-Tu Nguyen

Nanjing University, Nanjing, Jiangsu, China

Nanjing University, Nanjing, Jiangsu, China
View Profile

,
Xiaoliang Wang

Nanjing University, Nanjing, Jiangsu, China

Nanjing University, Nanjing, Jiangsu, China
View Profile

,
Sanglu Lu

State Key Laboratory for Novel Software Technology and Sino-German Institutes of Social Computing, Nanjing University, Nanjing, Jiangsu, China

State Key Laboratory for Novel Software Technology and Sino-German Institutes of Social Computing, Nanjing University, Nanjing, Jiangsu, China
View Profile

ACM Transactions on Knowledge Discovery from Data Volume 15 Issue 4Article No.: 66pp 1–21https://doi.org/10.1145/3441646

Published:18 April 2021Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are unpractical and computationally expensive. In this article, we propose a novel method to estimate Katz centrality based on graph sampling techniques, which object to achieve comparable estimation accuracy of the state-of-the-arts with much lower computational complexity. Specifically, we develop a Horvitz–Thompson estimate for Katz centrality by using a multi-round sampling approach and deriving an unbiased mean value estimator. We further propose SAKE, a Sampling-based Algorithm for fast Katz centrality Estimation. We prove that the estimator calculated by SAKE is probabilistically guaranteed to be within an additive error from the exact value. Extensive evaluation experiments based on four real-world networks show that the proposed algorithm can estimate Katz centralities for partial vertices with low sampling rate, low computation time, and it works well in identifying high influence vertices in social networks.

References

2017. Livemocha network dataset – KONECT. Retrieved from http://konect.uni-koblenz.de/networks/livemocha.Google Scholar
Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2010. Reconsidering the foundations of network sampling. In Proceedings of the 2nd Workshop on Information in Networks (WIN’10).Google Scholar
Nesreen K. Ahmed, Jennifer Neville, and Ramana Kompella. 2014. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data 8, 2 (2014), 7.Google ScholarDigital Library
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In The Semantic Web. Springer, 722--735.Google ScholarDigital Library
Michele Benzi and Christine Klymko. 2013. Total communicability as a centrality measure. Journal of Complex Networks 1, 2 (2013), 124--149.Google ScholarCross Ref
Paolo Boldi and Sebastiano Vigna. 2014. Axioms for centrality. Internet Mathematics 10, 3–4 (2014), 222--262.Google ScholarCross Ref
Phillip Bonacich. 1972. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2, 1 (1972), 113--120.Google ScholarCross Ref
Stephen P. Borgatti, Kathleen M. Carley, and David Krackhardt. 2006. On the robustness of centrality measures under conditions of imperfect data. Social Networks 28, 2 (2006), 124--136.Google ScholarCross Ref
Wei Chen and Shang-Hua Teng. 2017. Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). International World Wide Web Conferences Steering Committee, 967--976.Google ScholarDigital Library
Elizabeth Costenbader and Thomas W. Valente. 2003. The stability of centrality measures when networks are sampled. Social Networks 25, 4 (2003), 283--307.Google ScholarCross Ref
Eric D. Kolaczyk. 2009. Statistical Analysis of Network Data: Methods and Models. Springer.Google ScholarDigital Library
Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. 2018. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proceedings of the 27th International Conference on World Wide Web (WWW’18). 449--458.Google Scholar
Roohollah Etemadi and Jianguo Lu. 2017. Bias correction in clustering coefficient estimation. In Proceedings of the IEEE International Conference on Big Data (Big Data’17). IEEE, 606--615.Google ScholarCross Ref
Kurt C. Foster, Stephen Q. Muth, John J. Potterat, and Richard B. Rothenberg. 2001. A faster Katz status score algorithm. Computational & Mathematical Organization Theory 7, 4 (2001), 275--285.Google ScholarDigital Library
Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In Proceedings of the 2010 IEEE Infocom. IEEE, 1--9.Google ScholarCross Ref
Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of the 22nd International Conference on World Wide Web (WWW’13). ACM, 539--550.Google Scholar
Douglas D. Heckathorn and Christopher J. Cameron. 2017. Network sampling: From snowball and multiplicity to respondent-driven sampling. Annual Review of Sociology 43, 1 (2017), 101--119.Google ScholarCross Ref
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 301 (1963), 13--30.Google ScholarCross Ref
Daniel G. Horvitz and Donovan J. Thompson. 1952. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 260 (1952), 663--685.Google ScholarCross Ref
Shiyu Ji and Zenghui Yan. 2016. Refining approximating betweenness centrality based on samplings. arXiv preprint arXiv:1608.04472 (2016).Google Scholar
Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39--43.Google ScholarCross Ref
David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). ACM, 137--146.Google ScholarDigital Library
Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. 2006. Statistical properties of sampled networks. Physical Review E 73, 1 (2006), 016102.Google ScholarCross Ref
Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, 631--636.Google ScholarDigital Library
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). ACM, 695--704.Google Scholar
Mingkai Lin, Wenzhong Li, Cam-tu Nguyen, Xiaoliang Wang, and Sanglu Lu. 2019. Sampling based Katz centrality estimation for large-scale social networks. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’19). Springer, 584--598.Google Scholar
Arun S. Maiya and Tanya Y. Berger-Wolf. 2011. Benefits of bias: Towards better characterization of network sampling. In Proceedings of the 17th International Conference on Knowledge Discovery and Data Mining (KDD’11). ACM, 105--113.Google Scholar
Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. 2010. Introduction to information retrieval. Natural Language Engineering 16, 1 (2010), 100--103.Google Scholar
Toshiki Matsumura, Kenta Iwasaki, and Kazuyuki Shudo. 2018. Average path length estimation of social networks by random walk. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp’18). IEEE, 611--614.Google ScholarCross Ref
Eisha Nathan and David A. Bader. 2017. Approximating personalized Katz centrality in dynamic graphs. In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM’17). Springer, 290--302.Google Scholar
Eisha Nathan, Geoffrey Sanders, James Fairbanks, and David A. Bader. 2017. Graph ranking guarantees for numerical approximations to Katz centrality. In International Conference on Computational Science (ICCS’17), Vol. 108. Elsevier, 68--78. DOI:10.1016/j.procs.2017.05.021Google Scholar
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.Google Scholar
Alireza Rezvanian, Behnaz Moradabadi, Mina Ghavipour, Mohammad Mehdi Daliri Khomami, and Mohammad Reza Meybodi. 2019. Social network sampling. In Learning Automata Approach for Social Networks. Springer, 91--149.Google Scholar
Bernardete Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In Proceedings of the IEEE 51st IEEE Conference on Decision and Control.Google ScholarCross Ref
Matthew Richardson and Pedro Domingos. 2002. Mining knowledge-sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02). ACM, 61--70.Google ScholarDigital Library
Matteo Riondato and Evgenios M. Kornaropoulos. 2016. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery 30, 2 (2016), 438--475.Google ScholarDigital Library
Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In Proceedings of the International Scientific Conference and International Workshop Present Day Trends of Innovations, Vol. 1.Google Scholar
Alexander van der Grinten, Elisabetta Bergamini, Oded Green, David A. Bader, and Henning Meyerhenke. 2018. Scalable Katz ranking computation in large static and dynamic graphs. In Proceedings of the 26th Annual European Symposium on Algorithms (ESA’18). 42:1--42:14.Google Scholar
Claudia Wagner, Philipp Singer, Fariba Karimi, Jürgen Pfeffer, and Markus Strohmaier. 2017. Sampling from Social Networks with Attributes. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). 1181--1190.Google ScholarDigital Library
David Eppstein Joseph Wang. 2006. Fast approximation of centrality. Graph Algorithms and Applications 5, 5 (2006), 39.Google Scholar
Tomasz Was and Oskar Skibski. 2018. An axiomatization of the Eigenvector and Katz centralities. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI’18).Google Scholar
Jing Zhao, Ting-Hong Yang, Yongxu Huang, and Petter Holme. 2011. Ranking candidate disease genes from gene expression and protein interaction: A Katz-centrality based approach. PLoS ONE 6, 9 (2011), e24306.Google ScholarCross Ref
Lin Zhu, Nicolas A. Menzies, Jianing Wang, Benjamin P. Linas, Steven M. Goodreau, and Joshua A. Salomon. 2020. estimation and correction of bias in network simulations based on respondent-driven sampling data. Scientific Reports 10, 1 (2020), 1--11.Google Scholar

Index Terms

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks
1. Information systems
  1. Information systems applications
    1. Data mining
  2. World Wide Web
    1. Web applications
      1. Social networks

Recommendations

Sampling Based Katz Centrality Estimation for Large-Scale Social Networks
Algorithms and Architectures for Parallel Processing
Abstract
Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network is unpractical and computationally expensive. In this paper, we ...
Read More
Efficiently Estimating Motif Statistics of Large Networks

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical ...
Read More
Unbiased Characterization of Node Pairs over Large Graphs
TKDD Special Issue (SIGKDD'13)

Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Knowledge Discovery from Data Volume 15, Issue 4
August 2021
486 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3458847
Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA
Issue’s Table of Contents
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 April 2021
- Accepted: 1 December 2020
- Revised: 1 November 2020
- Received: 1 January 2020
Published in tkdd Volume 15, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Katz centrality
Social network
graph sampling
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 3
  Total Citations
  View Citations
- 200
  Total Downloads
- Downloads (Last 12 months)47
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Sampling Based Katz Centrality Estimation for Large-Scale Social Networks

Efficiently Estimating Motif Statistics of Large Networks

Unbiased Characterization of Node Pairs over Large Graphs