skip to main content
research-article

Refining Social Graph Connectivity via Shortcut Edge Addition

Published:12 October 2015Publication History
Skip Abstract Section

Abstract

Small changes on the structure of a graph can have a dramatic effect on its connectivity. While in the traditional graph theory, the focus is on well-defined properties of graph connectivity, such as biconnectivity, in the context of a social graph, connectivity is typically manifested by its ability to carry on social processes. In this paper, we consider the problem of adding a small set of nonexisting edges (shortcuts) in a social graph with the main objective of minimizing its characteristic path length. This property determines the average distance between pairs of vertices and essentially controls how broadly information can propagate through a network. We formally define the problem of interest, characterize its hardness and propose a novel method, path screening, which quickly identifies important shortcuts to guide the augmentation of the graph. We devise a sampling-based variant of our method that can scale up the computation in larger graphs. The claims of our methods are formally validated. Through experiments on real and synthetic data, we demonstrate that our methods are a multitude of times faster than standard approaches, their accuracy outperforms sensible baselines and they can ease the spread of information in a network, for a varying range of conditions.

References

  1. Noga Alon. 1986. Eigenvalues and expanders. Combinatorica 6, 2, 83--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. Awerbuch. 1985. A new distributed depth-first-search algorithm. Inform. Process. Lett. 20, 3, 147--150.Google ScholarGoogle ScholarCross RefCross Ref
  3. Baruch Awerbuch and Robert G. Gallager. 1985. Distributed BFS algorithms. In FOCS, 250--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Lars Backstrom and Jure Leskovec. 2011. Supervised random walks: predicting and recommending links in social networks. In WSDM, 635--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439, 509--512.Google ScholarGoogle Scholar
  6. D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen. 2003. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucl. Acids Res. 31, 9, 2443--2450.Google ScholarGoogle ScholarCross RefCross Ref
  7. K. Mani Chandy and Jayadev Misra. 1982. Distributed computation on graphs: shortest path algorithms. ACM Commun. 25, 11, 833--837. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Vineet Chaoji, Sayan Ranu, Rajeev Rastogi, and Rushi Bhatt. 2012. Recommendations to boost content spread in social networks. In WWW. 529--538. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In OSDI. 107--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Erik D. Demaine and Morteza Zadimoghaddam. 2010. Minimizing the diameter of a network using shortcut edges. In SWAT. 420--431. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E.W. Dijkstra. 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1, 269--271. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Easley and J. Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, Cambridge, U.K. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Kapali P. Eswaran and Robert Endre Tarjan. 1976. Augmentation problems. SIAM J. Comput. 5, 4, 653--665.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Uriel Feige. 1998. A threshold of ln n for approximating set cover. J. ACM 45, 4, 634--652. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Yuval Filmus and Justin Ward. 2012. A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In FOCS. 659--668. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Greg N. Frederickson and Joseph JáJá. 1981. Approximation algorithms for several graph augmentation problems. SIAM J. Comput. 10, 2, 270--283.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Michael L. Fredman and Robert Endre Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99, 12, 7821--7826.Google ScholarGoogle ScholarCross RefCross Ref
  19. Andrew V. Goldberg and Chris Harrelson. 2005. Computing the shortest path: A search meets graph theory. In SODA. 156--165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mark Jerrum and Alistair Sinclair. 1989. Approximating the permanent. SIAM J. 18, 6, 1149--1178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Donald B. Johnson. 1977. Efficient algorithms for shortest paths in sparse networks. J. ACM 24, 1, 1--13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Srikanth Kandula and Ratul Mahajan. 2009. Sampling biases in network path measurements and what to do about it. In IMC. 156--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Michael Kapralov, Ian Post, and Jan Vondrak. 2013. Online submodular welfare maximization: Greedy is optimal. In SODA. 1216--1225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1, 359--392. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In KDD. 137--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Samir Khuller and Ramakrishna Thurimella. 1993. Approximation algorithms for graph augmentation. J. Algorithms 14, 2, 214--225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Jamie King. 2003. Conductance and rapidly mixing Markov chains. Technical report, University of Waterloo, 2003.Google ScholarGoogle Scholar
  28. Jon Kleinberg. 2007. Cascading behavior in networks: Algorithmic and economic issues. In Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay Vazirani (Eds.). Cambridge University Press, Cambridge, U.K.Google ScholarGoogle Scholar
  29. J. Kleinberg, A. Slivkins, and T. Wexler. 2004. Triangulation and embedding using small sets of beacons. In FOCS. 444--453. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Nikolaos Laoutaris, Laura J. Poplawski, Rajmohan Rajaraman, Ravi Sundaram, and Shang-Hua Teng. 2008. Bounded budget connection (BBC) games or how to make friends and influence people, on a budget. In PODC. 165--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Konstantina Lazaridou, Konstantinos Semertzidis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2015. Identifying converging pairs of nodes on a budget. EDBT. 313--324.Google ScholarGoogle Scholar
  32. Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631--636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. 2007. Cost-effective outbreak detection in networks. In KDD. 420--429. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (June 2014). Retrieved from http://snap.stanford.edu/data.Google ScholarGoogle Scholar
  35. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In WWW. 695--704. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In CIKM. 1019--1031. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. László Lovász. 1993. Random walks on graphs: a survey. Combinatorics, Paul erdos is eighty 2, 1 (1993), 1--46.Google ScholarGoogle Scholar
  38. David Lusseau, Karsten Schneider, Oliver J. Boisseau, Patti Haase, Elisabeth Slooten, and Steve M Dawson. 2003. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Beh. Ecol. Sociobiol. 54, 4, 91--125.Google ScholarGoogle ScholarCross RefCross Ref
  39. Karl Menger. 1927. Zur allgemeinen kurventheorie. Fundamenta Mathematicae 10, 1, 96--115.Google ScholarGoogle ScholarCross RefCross Ref
  40. Adam Meyerson and Brian Tagiku. 2009. Minimizing average shortest path distances via shortcut edge addition. In APPROX-RANDOM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Abedelaziz Mohaisen, Aaram Yun, and Yongdae Kim. 2010. Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. 383--389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Stephen Morris. 2000. Contagion. Rev. Economic Stud. 67, 1, 57--78.Google ScholarGoogle ScholarCross RefCross Ref
  43. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions-I. Math. Program. 14, 1, 265--294.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. E. J. Newman. 2005. A measure of betweenness centrality based on random walks. Soc. Netw. 27, 1, 39--54.Google ScholarGoogle ScholarCross RefCross Ref
  45. M. E. J. Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 3, 036104.Google ScholarGoogle ScholarCross RefCross Ref
  46. Zeev Nutov. 2005. Approximating connectivity augmentation problems. In SODA. 176--185. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Christos H. Papadimitriou and Kenneth Steiglitz. 1998. Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, Mineola, New York.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Manos Papagelis, Francesco Bonchi, and Aristides Gionis. 2011. Suggesting ghost edges for a smaller world. In CIKM. 2305--2308. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Manos Papagelis, Gautam Das, and Nick Koudas. 2013. Sampling online social networks. IEEE Trans. Knowl. Data Eng. 25, 3, 662--676. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. N. Parotisidis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2015. Selecting shortcuts for a smaller world. In SIAM International Conference on Data Mining (SDM). 28--36.Google ScholarGoogle ScholarCross RefCross Ref
  51. Alexandrin Popescul, Rin Popescul, and Lyle H. Ungar. 2003. Statistical relational learning for link prediction. In IJCAI03 Workshop on Learning Statistical Models from Relational Data. 109--115.Google ScholarGoogle Scholar
  52. Miao Qiao, Hong Cheng, Lijun Chang, and Jeffrey Xu Yu. 2012. Approximate shortest distance computing: A query-dependent local landmark scheme. In ICDE. 55--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Matthew Richardson and Pedro Domingos. 2002. Mining knowledge-sharing sites for viral marketing. In KDD. 61--70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. K. Shvachko, H. Kuang, S. Radia, and R. Chansler. 2010. The Hadoop distributed file system. In MSST. 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Alistair Sinclair. 1992. Improved Bounds for Mixing Rates of Markov Chains and Multicommodity Flow. Springer, New York.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Yuan Tian, Qi He, Qiankun Zhao, Xingjie Liu, and Wang-chien Lee. 2010. Boosting social network connectivity with link revival. In CIKM. 589--598. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, and Christos Faloutsos. 2012. Gelling, and melting, large graphs by edge manipulation. In CIKM. 245--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Jan Vondrak. 2008. Optimal approximation for the submodular welfare problem in the value oracle model. In STOC. 67--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Zhuojie Zhou, Nan Zhang, Zhiguo Gong, and Gautam Das. 2013. Faster random walks by rewiring online social networks on-the-fly. In ICDE. 769--780. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Refining Social Graph Connectivity via Shortcut Edge Addition

        Recommendations

        Reviews

        Salvatore F. Pileggi

        Small changes to the structure of graphs imply a huge impact on their connectivity. While in a purely theoretical approach, a topological change affects just formal properties, graph data structures associated with well-defined semantics (for example, a social graph) can significantly alter the reality they are representing (such as social processes in a social graph). In this paper, the characteristic path length (average distance between pairs of vertices) is minimized by applying a graph augmentation technique: a small set of nonexisting edges is added to the original structure in order to define shortcuts. Even though contextual properties (for example, influence of people, relevance of content) play an important role in the propagation of the information in practice, shorter paths are a guarantee of improved performance. This work is definitely a solid contribution, including a clear introduction to the problem, an appreciated section on motivating applications, and a validation/evaluation of the model with synthetic data. As a very minor remark, I would have liked a deeper discussion on the effects of augmentation techniques in social graphs, also including negative ones. Online Computing Reviews Service

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Knowledge Discovery from Data
          ACM Transactions on Knowledge Discovery from Data  Volume 10, Issue 2
          October 2015
          291 pages
          ISSN:1556-4681
          EISSN:1556-472X
          DOI:10.1145/2835206
          Issue’s Table of Contents

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 October 2015
          • Accepted: 1 April 2015
          • Revised: 1 February 2015
          • Received: 1 May 2014
          Published in tkdd Volume 10, Issue 2

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader