ABSTRACT
Some of the most effective influential spreader detection algorithms are unstable to small perturbations of the network structure. Inspired by bagging in Machine Learning, we propose the first Perturb and Combine (P&C) procedure for networks. It (1) creates many perturbed versions of a given graph, (2) applies a node scoring function separately to each graph, and (3) combines the results. Experiments conducted on real-world networks of various sizes with the k-core, generalized k-core, and PageRank algorithms reveal that P&C brings substantial improvements. Moreover, this performance boost can be obtained at almost no extra cost through parallelization. Finally, a bias-variance analysis suggests that P&C works mainly by reducing bias, and that therefore, it should be capable of improving the performance of all vertex scoring functions, including stable ones. An extended version of this paper is provided by [1].
- A. J.-P. Tixier, M.-E. G. Rossi, F. D. Malliaros, J. Read, and M. Vazirgiannis, "Perturb and combine to identify influential spreaders in real-world networks," arXiv preprint arXiv:1807.09586, 2018.Google Scholar
- F. Hoppensteadt, Mathematical Theories of Populations: Demographics, Genetics, and Epidemics. Siam, 1975, vol. 20.Google Scholar
- J. Leskovec, L. A. Adamic, and B. A. Huberman, "The dynamics of viral marketing," ACM Transactions on the Web (TWEB), vol. 1, no. 1, p. 5, 2007.Google ScholarDigital Library
- S. Pei, L. Muchnik, J. S. Andrade Jr, Z. Zheng, and H. A. Makse, "Searching for superspreaders of information in real-world social media," Scientific reports, vol. 4, p. 5547, 2014.Google ScholarCross Ref
- K. Balog, L. Azzopardi, and M. De Rijke, "Formal models for expert finding in enterprise corpora," in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2006, pp. 43--50.Google Scholar
- A. Tixier, F. Malliaros, and M. Vazirgiannis, "A graph degeneracy-based approach to keyword extraction," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1860--1870.Google Scholar
- M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse, "Identification of influential spreaders in complex networks," Nature physics, vol. 6, no. 11, pp. 888--893, 2010.Google ScholarCross Ref
- G. F. De Arruda, A. L. Barbieri, P. M. Rodríguez, F. A. Rodrigues, Y. Moreno, and L. da Fontoura Costa, "Role of centrality for the identification of influential spreaders in complex networks," Physical Review E, vol. 90, no. 3, p. 032812, 2014.Google ScholarCross Ref
- F. D. Malliaros, M.-E. G. Rossi, and M. Vazirgiannis, "Locating influential nodes in complex networks," Scientific reports, vol. 6, p. 19307, 2016.Google ScholarCross Ref
- F. Malliaros, C. Giatsidis, A. Papadopoulos, and M. Vazirgiannis, "The core decomposition of networks: Theory, algorithms and applications," 2019.Google Scholar
- S. B. Seidman, "Network structure and minimum degree," Social networks, vol. 5, no. 3, pp. 269--287, 1983.Google ScholarCross Ref
- V. Batagelj and M. Zaveršnik, "Generalized cores," arXiv preprint cs/0202039, 2002.Google Scholar
- A. Adiga and A. K. S. Vullikanti, "How robust is the core of a network?" in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2013, pp. 541--556.Google Scholar
- A. V. Goltsev, S. N. Dorogovtsev, and J. F. F. Mendes, "k-core (bootstrap) percolation on complex networks: Critical phenomena and nonlocal effects," Physical Review E, vol. 73, no. 5, p. 056101, 2006.Google ScholarCross Ref
- L. Breiman, "Bias, variance, and arcing classifiers," 1996.Google Scholar
- L. Breiman, "Bagging predictors," Machine learning, vol. 24, no. 2, pp. 123--140, 1996.Google ScholarCross Ref
- L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5--32, 2001.Google ScholarDigital Library
- P. Erdös and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hung. Acad. Sci, vol. 5, pp. 17--61, 1960.Google Scholar
- F. Chung and L. Lu, "The average distances in random graphs with given expected degrees," Proceedings of the National Academy of Sciences, vol. 99, no. 25, pp. 15 879--15 882, 2002.Google ScholarCross Ref
- I. C. Ipsen and R. S. Wills, "Mathematical properties and analysis of googleâĂŹs pagerank."Google Scholar
- A. Y. Ng, A. X. Zheng, and M. I. Jordan, "Link analysis, eigenvectors and stability," in International Joint Conference on Artificial Intelligence, vol. 17, no. 1. LAWRENCE ERLBAUM ASSOCIATES LTD, 2001, pp. 903--910.Google Scholar
- L. Page, S. Brin, R. Motwani, and T. Winograd, "The pagerank citation ranking: Bringing order to the web." Stanford InfoLab, Tech. Rep., 1999.Google Scholar
- J. Leskovec and A. Krevl, "SNAP Datasets: Stanford large network dataset collection," http://snap.stanford.edu/data, Jun. 2014.Google Scholar
- B. Klimt and Y. Yang, "The enron corpus: A new dataset for email classification research," ECML '04: Proceedings of the 15th European Conference on Machine Learning, pp. 217--226, 2004.Google Scholar
- W. O. Kermack and A. G. McKendrick, "Contributions to the mathematical theory of epidemics. ii. the problem of endemicity," vol. 138, no. 834. JSTOR, 1932, pp. 55--83.Google Scholar
- D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos, "Epidemic thresholds in real networks," ACM Transactions on Information and System Security (TISSEC), vol. 10, no. 4, pp. 1:1--1:26, 2008.Google ScholarDigital Library
- R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2004.Google Scholar
- A. Tixier, K. Skianis, and M. Vazirgiannis, "Gowvis: a web application for graph-of-words-based text visualization and summarization," Proceedings of ACL-2016 System Demonstrations, pp. 151--156, 2016.Google Scholar
- A. Hulth, "Improved automatic keyword extraction given more linguistic knowledge," in Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2003, pp. 216--223.Google Scholar
- F. Rousseau and M. Vazirgiannis, "Main core retention on graph-of-words for single-document keyword extraction," in European Conference on Information Retrieval. Springer, 2015, pp. 382--393.Google Scholar
- K. Järvelin and J. Kekäläinen, "Cumulated gain-based evaluation of ir techniques," ACM Transactions on Information Systems (TOIS), vol. 20, no. 4, pp. 422--446, 2002.Google ScholarDigital Library
- Y. Grandvalet, "Bagging equalizes influence," Machine Learning, vol. 55, pp. 251--270, 2004.Google ScholarDigital Library
- Y. Raviv and N. Intrator, "Bootstrapping with noise: An effective regularization technique," Connection Science, vol. 8, no. 3-4, pp. 355--372, 1996.Google ScholarCross Ref
- B. W. Silverman, Density estimation for statistics and data analysis. Routledge, 1986.Google ScholarCross Ref
- D. Zügner, A. Akbarnejad, and S. Günnemann, "Adversarial attacks on neural networks for graph data," arXiv preprint arXiv:1805.07984, 2018.Google Scholar
- G. Ghoshal and A.-L. Barabási, "Ranking stability and super-stable nodes in complex networks," Nature communications, vol. 2, p. 394, 2011.Google ScholarCross Ref
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097--1105.Google Scholar
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.Google Scholar
Recommendations
Activity recognition in rehabilitation training based on ensemble stochastic configuration networks
AbstractRehabilitation training for patients with limb activity dysfunction and sub-healthy state has gradually shifted from therapies to strategies with remote assistance. Stochastic configuration networks (SCNs) are characterized by a structure that ...
Equimatchable Graphs on Surfaces
A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
Advancing Ensemble Learning Performance through data transformation and classifiers fusion in granular computing context
Highlights- We proposed a new ensemble learning framework in the setting of granular computing.
AbstractClassification is a special type of machine learning tasks, which is essentially achieved by training a classifier that can be used to classify new instances. In order to train a high performance classifier, it is crucial to extract ...
Comments