Abstract
Studying the topology of a network is critical to inferring underlying dynamics such as tolerance to failure, group behavior and spreading patterns. k-core decomposition is a well-established metric which partitions a graph into layers from external to more central vertices. In this paper we aim to explore whether k-core decomposition of large networks can be computed using a consumer-grade PC. We feature implementations of the "vertex-centric" distributed protocol introduced by Montresor, De Pellegrini and Miorandi on GraphChi and Webgraph. Also, we present an accurate implementation of the Batagelj and Zaversnik algorithm for k-core decomposition in Webgraph. With our implementations, we show that we can efficiently handle networks of billions of edges using a single consumer-level machine within reasonable time and can produce excellent approximations in only a fraction of the execution time. To the best of our knowledge, our biggest graphs are considerably larger than the graphs considered in the literature. Next, we present an optimized implementation of an external-memory algorithm (EMcore) by Cheng, Ke, Chu, and Özsu. We show that this algorithm also performs well for large datasets, however, it cannot predict whether a given memory budget is sufficient for a new dataset. We present a thorough analysis of all algorithms concluding that it is viable to compute k-core decomposition for large networks in a consumer-grade PC.
- M. Altaf-Ul-Amin, Y. Shinbo, K. Mihara, K. Kurokawa, and S. Kanaya. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC bioinformatics, 7(1):207, 2006.Google ScholarCross Ref
- J. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. k-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. arXiv preprint cs/0511007, 2005.Google Scholar
- J. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in neural information processing systems, pages 41--50, 2005.Google ScholarDigital Library
- L. Antiqueira, O. N. Oliveira, L. da Fontoura Costa, and M. d. G. V. Nunes. A complex network approach to text summarization. Information Sciences, 179(5):584--599, 2009. Google ScholarDigital Library
- A.-L. Barabási and J. Frangos. Linked: the new science of networks science of networks. Basic Books, 2014.Google Scholar
- V. Batagelj and M. Zaversnik. An o (m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049, 2003.Google Scholar
- D. P. Bertsekas and J. N. Tsitsiklis. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., 1989. Google ScholarDigital Library
- K. Bhawalkar, J. Kleinberg, K. Lewi, T. Roughgarden, and A. Sharma. Preventing unraveling in social networks: the anchored k-core problem. In Automata, Languages, and Programming, pages 440--451. Springer, 2012. Google ScholarDigital Library
- P. Boldi and S. Vigna. The webgraph framework i: compression techniques. In Proceedings of the 13th Int. Conference on World Wide Web, pages 595--602. ACM, 2004. Google ScholarDigital Library
- F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1316--1325. ACM, 2014. Google ScholarDigital Library
- J. Cheng, Y. Ke, S. Chu, and M. T. Özsu. Efficient core decomposition in massive networks. In Data Engineering (ICDE), 2011 IEEE 27th Int. Conference on, pages 51--62. IEEE, 2011. Google ScholarDigital Library
- S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. K-core organization of complex networks. Physical review letters, 96(4), 2006.Google Scholar
- N. Du, B. Wu, X. Pei, B. Wang, and L. Xu. Community detection in large-scale social networks. In Proceedings of the 9th WebKDD workshop, pages 16--25. ACM, 2007.Google ScholarDigital Library
- S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75--174, 2010.Google ScholarCross Ref
- A. V. Goltsev, S. N. Dorogovtsev, and J. Mendes. k-core (bootstrap) percolation on complex networks: Critical phenomena and nonlocal effects. Physical Review E, 73(5), 2006.Google ScholarCross Ref
- T. Gutierrez-Bunster, U. Stege, A. Thomo, and J. Taylor. How do biological networks differ from social networks? In Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM Int. Conference on, pages 744--751. IEEE, 2014.Google ScholarDigital Library
- J. Healy, J. Janssen, E. Milios, and W. Aiello. Characterization of graphs using degree cores. In Algorithms and Models for the Web-Graph, pages 137--148. Springer, 2008. Google ScholarDigital Library
- N. Korovaiko and A. Thomo. Trust prediction from user-item ratings. Social Network Analysis and Mining, 3(3):749--759, 2013.Google ScholarCross Ref
- A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. In OSDI, volume 12, pages 31--46, 2012. Google ScholarDigital Library
- V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data, pages 303--336. Springer, 2010.Google ScholarCross Ref
- X. Li, M. Wu, C.-K. Kwoh, and S.-K. Ng. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC genomics, 11(Suppl 1):S3, 2010.Google ScholarCross Ref
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716--727, 2012. Google ScholarDigital Library
- L. Lü and T. Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150--1170, 2011.Google ScholarCross Ref
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD Int. Conference on Management of data, pages 135--146. ACM, 2010. Google ScholarDigital Library
- A. Montresor, F. De Pellegrini, and D. Miorandi. Distributed k-core decomposition. Parallel and Distributed Systems, IEEE Transactions on, 24(2):288--300, 2013. Google ScholarDigital Library
- R. Pagh and F. F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122--144, 2004. Google ScholarDigital Library
- S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos. Netprobe: a fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th Int. conference on World Wide Web, pages 201--210. ACM, 2007. Google ScholarDigital Library
- S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. Community detection in social media. Data Mining and Knowledge Discovery, 24(3):515--554, 2012. Google ScholarDigital Library
- J. Pattillo, N. Youssef, and S. Butenko. On clique relaxation models in network analysis. European Journal of Operational Research, 226(1):9--18, 2013.Google ScholarCross Ref
- M. Á. Serrano, M. Boguná, and A. Vespignani. Extracting the multiscale backbone of complex weighted networks. Proceedings of the national academy of sciences, 106(16):6483--6488, 2009.Google ScholarCross Ref
- M. Thorup. String hashing for linear probing. In Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 655--664. Society for Industrial and Applied Mathematics, 2009. Google ScholarDigital Library
- J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. Proceedings of the National Academy of Sciences, 109(16):5962--5966, 2012.Google ScholarCross Ref
- J. Wang, M. Li, J. Chen, and Y. Pan. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 8(3):607--620, 2011. Google ScholarDigital Library
- L. Weng, F. Menczer, and Y.-Y. Ahn. Virality prediction and community structure in social networks. Scientific reports, 3, 2013.Google Scholar
- T. Wolf, A. Schroter, D. Damian, L. D. Panjer, and T. H. Nguyen. Mining task-based social networks to explore collaboration in software teams. Software, IEEE, 26(1):58--66, 2009. Google ScholarDigital Library
- W.-S. Yang and J.-B. Dia. Discovering cohesive subgroups from social networks for targeted advertising. Expert Systems with Applications, 34(3):2029--2038, 2008. Google ScholarDigital Library
Index Terms
- K-core decomposition of large networks on a single PC
Recommendations
Distance-generalized Core Decomposition
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataThe k-core of a graph is defined as the maximal subgraph in which every vertex is connected to at least k other vertices within that subgraph. In this work we introduce a distance-based generalization of the notion of k-core, which we refer to as the $(...
Streaming algorithms for k-core decomposition
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, ...
Incremental k-core decomposition: algorithms and evaluation
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, ...
Comments