Abstract
Small changes on the structure of a graph can have a dramatic effect on its connectivity. While in the traditional graph theory, the focus is on well-defined properties of graph connectivity, such as biconnectivity, in the context of a social graph, connectivity is typically manifested by its ability to carry on social processes. In this paper, we consider the problem of adding a small set of nonexisting edges (shortcuts) in a social graph with the main objective of minimizing its characteristic path length. This property determines the average distance between pairs of vertices and essentially controls how broadly information can propagate through a network. We formally define the problem of interest, characterize its hardness and propose a novel method, path screening, which quickly identifies important shortcuts to guide the augmentation of the graph. We devise a sampling-based variant of our method that can scale up the computation in larger graphs. The claims of our methods are formally validated. Through experiments on real and synthetic data, we demonstrate that our methods are a multitude of times faster than standard approaches, their accuracy outperforms sensible baselines and they can ease the spread of information in a network, for a varying range of conditions.
- Noga Alon. 1986. Eigenvalues and expanders. Combinatorica 6, 2, 83--96. Google ScholarDigital Library
- B. Awerbuch. 1985. A new distributed depth-first-search algorithm. Inform. Process. Lett. 20, 3, 147--150.Google ScholarCross Ref
- Baruch Awerbuch and Robert G. Gallager. 1985. Distributed BFS algorithms. In FOCS, 250--256. Google ScholarDigital Library
- Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: predicting and recommending links in social networks. In WSDM, 635--644. Google ScholarDigital Library
- Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439, 509--512.Google Scholar
- D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen. 2003. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucl. Acids Res. 31, 9, 2443--2450.Google ScholarCross Ref
- K. Mani Chandy and Jayadev Misra. 1982. Distributed computation on graphs: shortest path algorithms. ACM Commun. 25, 11, 833--837. Google ScholarDigital Library
- Vineet Chaoji, Sayan Ranu, Rajeev Rastogi, and Rushi Bhatt. 2012. Recommendations to boost content spread in social networks. In WWW. 529--538. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In OSDI. 107--113. Google ScholarDigital Library
- Erik D. Demaine and Morteza Zadimoghaddam. 2010. Minimizing the diameter of a network using shortcut edges. In SWAT. 420--431. Google ScholarDigital Library
- E.W. Dijkstra. 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1, 269--271. Google ScholarDigital Library
- D. Easley and J. Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, Cambridge, U.K. Google ScholarDigital Library
- Kapali P. Eswaran and Robert Endre Tarjan. 1976. Augmentation problems. SIAM J. Comput. 5, 4, 653--665.Google ScholarDigital Library
- Uriel Feige. 1998. A threshold of ln n for approximating set cover. J. ACM 45, 4, 634--652. Google ScholarDigital Library
- Yuval Filmus and Justin Ward. 2012. A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In FOCS. 659--668. Google ScholarDigital Library
- Greg N. Frederickson and Joseph JáJá. 1981. Approximation algorithms for several graph augmentation problems. SIAM J. Comput. 10, 2, 270--283.Google ScholarDigital Library
- Michael L. Fredman and Robert Endre Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615. Google ScholarDigital Library
- M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99, 12, 7821--7826.Google ScholarCross Ref
- Andrew V. Goldberg and Chris Harrelson. 2005. Computing the shortest path: A search meets graph theory. In SODA. 156--165. Google ScholarDigital Library
- Mark Jerrum and Alistair Sinclair. 1989. Approximating the permanent. SIAM J. 18, 6, 1149--1178. Google ScholarDigital Library
- Donald B. Johnson. 1977. Efficient algorithms for shortest paths in sparse networks. J. ACM 24, 1, 1--13. Google ScholarDigital Library
- Srikanth Kandula and Ratul Mahajan. 2009. Sampling biases in network path measurements and what to do about it. In IMC. 156--169. Google ScholarDigital Library
- Michael Kapralov, Ian Post, and Jan Vondrak. 2013. Online submodular welfare maximization: Greedy is optimal. In SODA. 1216--1225. Google ScholarDigital Library
- George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1, 359--392. Google ScholarDigital Library
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In KDD. 137--147. Google ScholarDigital Library
- Samir Khuller and Ramakrishna Thurimella. 1993. Approximation algorithms for graph augmentation. J. Algorithms 14, 2, 214--225. Google ScholarDigital Library
- Jamie King. 2003. Conductance and rapidly mixing Markov chains. Technical report, University of Waterloo, 2003.Google Scholar
- Jon Kleinberg. 2007. Cascading behavior in networks: Algorithmic and economic issues. In Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay Vazirani (Eds.). Cambridge University Press, Cambridge, U.K.Google Scholar
- J. Kleinberg, A. Slivkins, and T. Wexler. 2004. Triangulation and embedding using small sets of beacons. In FOCS. 444--453. Google ScholarDigital Library
- Nikolaos Laoutaris, Laura J. Poplawski, Rajmohan Rajaraman, Ravi Sundaram, and Shang-Hua Teng. 2008. Bounded budget connection (BBC) games or how to make friends and influence people, on a budget. In PODC. 165--174. Google ScholarDigital Library
- Konstantina Lazaridou, Konstantinos Semertzidis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2015. Identifying converging pairs of nodes on a budget. EDBT. 313--324.Google Scholar
- Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631--636. Google ScholarDigital Library
- Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. 2007. Cost-effective outbreak detection in networks. In KDD. 420--429. Google ScholarDigital Library
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (June 2014). Retrieved from http://snap.stanford.edu/data.Google Scholar
- Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In WWW. 695--704. Google ScholarDigital Library
- David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In CIKM. 1019--1031. Google ScholarDigital Library
- László Lovász. 1993. Random walks on graphs: a survey. Combinatorics, Paul erdos is eighty 2, 1 (1993), 1--46.Google Scholar
- David Lusseau, Karsten Schneider, Oliver J. Boisseau, Patti Haase, Elisabeth Slooten, and Steve M Dawson. 2003. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Beh. Ecol. Sociobiol. 54, 4, 91--125.Google ScholarCross Ref
- Karl Menger. 1927. Zur allgemeinen kurventheorie. Fundamenta Mathematicae 10, 1, 96--115.Google ScholarCross Ref
- Adam Meyerson and Brian Tagiku. 2009. Minimizing average shortest path distances via shortcut edge addition. In APPROX-RANDOM. Google ScholarDigital Library
- Abedelaziz Mohaisen, Aaram Yun, and Yongdae Kim. 2010. Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. 383--389. Google ScholarDigital Library
- Stephen Morris. 2000. Contagion. Rev. Economic Stud. 67, 1, 57--78.Google ScholarCross Ref
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions-I. Math. Program. 14, 1, 265--294.Google ScholarDigital Library
- M. E. J. Newman. 2005. A measure of betweenness centrality based on random walks. Soc. Netw. 27, 1, 39--54.Google ScholarCross Ref
- M. E. J. Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 3, 036104.Google ScholarCross Ref
- Zeev Nutov. 2005. Approximating connectivity augmentation problems. In SODA. 176--185. Google ScholarDigital Library
- Christos H. Papadimitriou and Kenneth Steiglitz. 1998. Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, Mineola, New York.Google ScholarDigital Library
- Manos Papagelis, Francesco Bonchi, and Aristides Gionis. 2011. Suggesting ghost edges for a smaller world. In CIKM. 2305--2308. Google ScholarDigital Library
- Manos Papagelis, Gautam Das, and Nick Koudas. 2013. Sampling online social networks. IEEE Trans. Knowl. Data Eng. 25, 3, 662--676. Google ScholarDigital Library
- N. Parotisidis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2015. Selecting shortcuts for a smaller world. In SIAM International Conference on Data Mining (SDM). 28--36.Google ScholarCross Ref
- Alexandrin Popescul, Rin Popescul, and Lyle H. Ungar. 2003. Statistical relational learning for link prediction. In IJCAI03 Workshop on Learning Statistical Models from Relational Data. 109--115.Google Scholar
- Miao Qiao, Hong Cheng, Lijun Chang, and Jeffrey Xu Yu. 2012. Approximate shortest distance computing: A query-dependent local landmark scheme. In ICDE. 55--68. Google ScholarDigital Library
- Matthew Richardson and Pedro Domingos. 2002. Mining knowledge-sharing sites for viral marketing. In KDD. 61--70. Google ScholarDigital Library
- K. Shvachko, H. Kuang, S. Radia, and R. Chansler. 2010. The Hadoop distributed file system. In MSST. 1--10. Google ScholarDigital Library
- Alistair Sinclair. 1992. Improved Bounds for Mixing Rates of Markov Chains and Multicommodity Flow. Springer, New York.Google ScholarDigital Library
- Yuan Tian, Qi He, Qiankun Zhao, Xingjie Liu, and Wang-chien Lee. 2010. Boosting social network connectivity with link revival. In CIKM. 589--598. Google ScholarDigital Library
- Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, and Christos Faloutsos. 2012. Gelling, and melting, large graphs by edge manipulation. In CIKM. 245--254. Google ScholarDigital Library
- Jan Vondrak. 2008. Optimal approximation for the submodular welfare problem in the value oracle model. In STOC. 67--74. Google ScholarDigital Library
- Zhuojie Zhou, Nan Zhang, Zhiguo Gong, and Gautam Das. 2013. Faster random walks by rewiring online social networks on-the-fly. In ICDE. 769--780. Google ScholarDigital Library
Index Terms
- Refining Social Graph Connectivity via Shortcut Edge Addition
Recommendations
Extra edge connectivity and isoperimetric edge connectivity
An edge set S of a connected graph G is a k-extra edge cut, if G-S is no longer connected, and each component of G-S has at least k vertices. The cardinality of a minimum k-extra edge cut, denoted by @l"k(G), is the k-extra edge connectivity of G. The ...
Graph connectivity and its augmentation: applications of MA orderings
This paper surveys how the maximum adjacency (MA) ordering of the vertices in a graph can be used to solve various graph problems. We first explain that the minimum cut problem can be solved efficiently by utilizing the MA ordering. The idea is then ...
Constrained Edge-Splitting Problems
Splitting off two edges su,sv in a graph G means deleting su,sv and adding a new edge uv. Let G=(V+s,E) be k-edge-connected in V ($k\geq 2$) and let d(s) be even. Lovász proved that the edges incident to s can be split off in pairs in a such a way that ...
Comments