skip to main content
research-article

Uncertain Graph Processing through Representative Instances

Published:23 October 2015Publication History
Skip Abstract Section

Abstract

Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph.

In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.

References

  1. Serge Abiteboul, Paris Kanellakis, and Gosta Grahne. 1987. On the representation and querying of sets of possible worlds. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'87). 34--48. DOI:http://dx.doi.org/10.1145/38713.38724 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Eytan Adar and Christopher Re. 2007. Managing uncertainty in social networks. IEEE Data Eng. Bulle. 30, 2 (July 2007), 15--22.Google ScholarGoogle Scholar
  3. Charu C. Aggarwal and Philip S. Yu. 2009. A survey of uncertain data algorithms and applications. IEEE Trans. Know. Data Eng. 21, 5 (May 2009), 609--623. DOI:http://dx.doi.org/10.1109/TKDE.2008.190 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. William Aiello, Fan Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In Proceedings of the ACM Symposium on Theory of Computing (STOC'00). 171--180. DOI:http://dx.doi.org/10.1145/335305.335326 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Sheldon B. Akers, Dov Harel, and Balakrishnan Krishnamurthy. 1994. The star graph: An attractive alternative to the N-cube. In Interconnection Networks for High-performance Parallel Computers, IEEE Computer Society Press, Los Alamitos, CA, 145--152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Saurabh Asthana, Oliver D. King, Francis D. Gibbons, and Frederick P. Roth. 2004. Predicting protein complex membership using probabilistic network reliability. Genome Res. 14, 4 (June 2004), 1170--1175. DOI:http://dx.doi.org/10.1101/gr.2203804Google ScholarGoogle ScholarCross RefCross Ref
  7. Joseph Blitzstein and Persi Diaconis. 2011. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Int. Math. 6, 4 (March 2011), 489--522. DOI:http://dx.doi.org/10.1080/15427951.2010.557277Google ScholarGoogle Scholar
  8. Paolo Boldi, Francesco Bonchi, Aristides Gionis, and Tamir Tassa. 2012. Injecting uncertainty in graphs for identity obfuscation. Proc. VLDB Endow. 5, 11 (July 2012), 1376--1387. DOI:http://dx.doi.org/10.14778/2350229.2350254 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, and Yana Volkovich. 2014. Core decomposition of uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'14). 1316--1325. DOI:http://dx.doi.org/10.1145/2623330.2623655 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. Nat. Acad. Sci. 99, 25 (Dec. 2002), 15879--15882. DOI:http://dx.doi.org/10.1073/pnas.252631999Google ScholarGoogle ScholarCross RefCross Ref
  11. Fan Chung, Linyuan Lu, and Van H. Vu. 2003. Spectra of random graphs with given expected degrees. Proc. Nat. Acad. Sci. 100, 11 (May 2003), 6313--6318. DOI:http://dx.doi.org/10.1073/pnas.0937490100Google ScholarGoogle ScholarCross RefCross Ref
  12. Nilesh Dalvi and Dan Suciu. 2007. Efficient query evaluation on probabilistic databases. VLDB J. 16, 4 (Oct. 2007), 523--544. DOI:http://dx.doi.org/10.1007/s00778-006-0004-3 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Charo I. Del Genio, Hyunju Kim, Zoltn Toroczkai, and Kevin E. Bassler. 2010. Efficient and exact sampling of simple graphs with given arbitrary degree sequence. PLoS ONE 5, 4 (April 2010), e10012. DOI:http://dx.doi.org/10.1371/journal.pone.0010012Google ScholarGoogle ScholarCross RefCross Ref
  14. Paul Erdös and Tibor Gallai. 1960. Graphs with prescribed degrees of vertices (Hungarian). Mat. Lapok 11 (1960), 264--274.Google ScholarGoogle Scholar
  15. Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the Internet topology. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262. DOI:http://dx.doi.org/10.1145/316188.316229 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Pierre Fraigniaud. 2007. Small worlds as navigable augmented networks: Model, analysis, and validation. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4698, Springer, Berlin, 2--11. DOI:http://dx.doi.org/10.1007/978-3-540-75520-3_2 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ana M. Gonzalez, Bo Dalsgaard, and Jens M. Olesen. 2010. Centrality measures and the importance of generalist species in pollination networks. Ecological Complex. 7, 1 (March 2010), 36--43. DOI:http://dx.doi.org/10.1016/j.ecocom.2009.03.008Google ScholarGoogle Scholar
  18. Seifollah L. Hakimi. 1962. On realizability of a set of integers as degrees of the vertices of a linear graph. J. Soc. Indust. Appl. Math. 10, 3 (Sept. 1962), 496--506. http://www.jstor.org/stable/2098746Google ScholarGoogle ScholarCross RefCross Ref
  19. Stefan Hougardy. 2009. Linear time approximation algorithms for degree constrained subgraph problems. In Research Trends in Combinatorial Optimization, Springer, Berlin, 185--200. DOI:http://dx.doi.org/10.1007/978-3-540-76796-1_9Google ScholarGoogle Scholar
  20. Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 325--336. DOI:http://dx.doi.org/10.1145/2463676.2463704 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Christian Hübler, Hans-Peter Kriegel, Karsten Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 283--292. DOI:http://dx.doi.org/10.1109/ICDM.2008.124 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ruoming Jin, Lin Liu, and Charu C. Aggarwal. 2011a. Discovering highly reliable subgraphs in uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). 992--1000. DOI:http://dx.doi.org/10.1145/2020408.2020569 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ruoming Jin, Lin Liu, Bolin Ding, and Haixun Wang. 2011b. Distance-constraint reachability computation in uncertain graphs. Proc. VLDB Endow. 4, 9 (June 2011), 551--562. DOI:http://dx.doi.org/10.14778/2002938.2002941 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 137--146. DOI:http://dx.doi.org/10.1145/956750.956769 Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Arijit Khan, Francesco Bonchi, Aristides Gionis, and Francesco Gullo. 2014. Fast reliability search in uncertain graphs. In Proceedings of the International Conference on Extending Database Technology (EDBT'14). 535--546. DOI:http://dx.doi.org/10.5441/002/edbt.2014.48Google ScholarGoogle Scholar
  26. Jon Kleinberg. 2006. Complex networks and decentralized search algorithms. In Proceedings of the International Congress of Mathematicians (ICM'06). 1019--1044.Google ScholarGoogle Scholar
  27. George Kollios, Michalis Potamias, and Evimaria Terzi. 2013. Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25, 2 (Feb. 2013), 325--336. DOI:http://dx.doi.org/10.1109/TKDE.2011.243 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Gueorgi Kossinets and Duncan J. Watts. 2006. Empirical analysis of an evolving social network. Science 311, 5757 (Jan. 2006), 88--90. DOI:http://dx.doi.org/10.1126/science.1116869Google ScholarGoogle ScholarCross RefCross Ref
  29. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11 (March 2010), 985--1042. http://dl.acm.org/citation.cfm?id=1756006.1756039 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 631--636. DOI:http://dx.doi.org/10.1145/1150402.1150479 Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Rong-Hua Li, Jeffrey Xu Yu, Rui Mao, and Tan Jin. 2014. Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 892--903. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816709Google ScholarGoogle ScholarCross RefCross Ref
  32. David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'03). 556--559. DOI:http://dx.doi.org/10.1145/956863.956972 Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Fredrik Liljeros, Christofer R. Edling, Luis A. Nunes Amaral, H. Eugene Stanley, and Yvonne Åberg. 2001. The web of human sexual contacts. Nature 411, 6840 (June 2001), 907--908. DOI:http://dx.doi.org/10.1038/35082140Google ScholarGoogle ScholarCross RefCross Ref
  34. Lin Liu, Ruoming Jin, Charu C. Aggarwal, and Yelong Shen. 2012. Reliable clustering on uncertain graphs. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 459--468. DOI:http://dx.doi.org/10.1109/ICDM.2012.11 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Yang-Yu Liu, Jean-Jacques Slotine, and Albert-Laszlo Barabasi. 2011. Controllability of complex networks. Nature 473, 7346 (May 2011), 167--173. DOI:http://dx.doi.org/10.1038/nature10011Google ScholarGoogle ScholarCross RefCross Ref
  36. László Lovász. 1970. Subgraphs with prescribed valencies. J. Combin. Theory 8, 4 (1970), 391--416. DOI:http://dx.doi.org/10.1016/S0021-9800(70)80033-3Google ScholarGoogle ScholarCross RefCross Ref
  37. Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. 2006. Systematic topology analysis and generation using degree correlations. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'06). 135--146. DOI:http://dx.doi.org/10.1145/1159913.1159930 Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociol. 27, 1 (Aug. 2001), 415--444. DOI:http://dx.doi.org/10.1146/annurev.soc.27.1.415Google ScholarGoogle ScholarCross RefCross Ref
  39. Julián Mestre. 2006. Greedy in approximation algorithms. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4168. Springer, Berlin, 528--539. DOI:http://dx.doi.org/10.1007/11841036_48 Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Silvio Micali and Vijay V. Vazirani. 1980. An O(&surd;V∣ċ∣E∣) algorithm for finding maximum matching in general graphs. In Proceedings of the IEEE Annual Symposium on Foundations of Computer Science (FOCS'80). 17--27. DOI:http://dx.doi.org/10.1109/SFCS.1980.12 Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Daniele Micciancio. 2001. The hardness of the closest vector problem with preprocessing. IEEE Trans. Inform. Theory 47, 3 (March 2001), 1212--1215. DOI:http://dx.doi.org/10.1109/18.915688 Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Milena Mihail and Nisheeth K. Vishnoi. 2002. On generating graphs with prescribed degree sequences for computer network modeling applications. Position Paper. In Proceedings of the 3rd Workshop on Approximate and Randomized Algorithms for Communication Networks (ARACNE'02). 1--11.Google ScholarGoogle Scholar
  43. Dov Monderer and Lloyd S. Shapley. 1996. Potential games. Games Econ. Behav. 14, 1 (May 1996), 124--143. DOI:http://dx.doi.org/10.1006/game.1996.0044Google ScholarGoogle ScholarCross RefCross Ref
  44. Walaa Eldin Moustafa, Angelika Kimmig, Amol Deshpande, and Lise Getoor. 2014. Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 904--915. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816710Google ScholarGoogle ScholarCross RefCross Ref
  45. Jaroslav Nešetšril and Svatopluk Poljak. 1985. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae 26, 2 (Jan. 1985), 415--419.Google ScholarGoogle Scholar
  46. Panos Parchas, Francesco Gullo, Dimitris Papadias, and Franceseco Bonchi. 2014. The pursuit of a good possible world: Extracting representative instances of uncertain graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'14). 967--978. DOI:http://dx.doi.org/10.1145/2588555.2593668 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Michalis Potamias, Francesco Bonchi, Aristides Gionis, and George Kollios. 2010. K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 1--2 (Sept. 2010), 997--1008. DOI:http://dx.doi.org/10.14778/1920841.1920967 Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Gerardo Rubino. 1998. Network reliability evaluation. In State-of-the Art in Performance Modeling and Simulation. Gordon and Breach; London. 275--302. DOI:http://dx.doi.org/10.1002/wics.81 Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Petteri Sevon, Lauri Eronen, Petteri Hintsanen, Kimmo Kulovesi, and Hannu Toivonen. 2006. Link discovery in graphs derived from biological databases. In Proceedings of the Springer International Conference on Data Integration in the Life Sciences (DILS'06). 35--49. http://dx.doi.org/10.1007/11799511_5 Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.Google ScholarGoogle Scholar
  51. Hongsuda Tangmunarunkit, Ramesh Govindan, Sugih Jamin, Scott Shenker, and Walter Willinger. 2002. Network topology generators: Degree-based vs. structural. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'02). 147--159. DOI:http://dx.doi.org/10.1145/633025.633040 Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Ali Tizghadam and Alberto Leon-Garcia. 2010. Betweenness centrality and resistance distance in communication networks. IEEE Netw. 24, 6 (Nov.-Dec. 2010), 10--16. DOI:http://dx.doi.org/10.1109/MNET.2010.5634437 Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comp. 8, 3 (1979), 410--421. DOI:http://dx.doi.org/10.1137/0208032Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Christian von Mering, Martijn A. Huynen, Daniel Jaeggi, Steffen Schmidt, Peer Bork, and Berend Snel. 2003. STRING: A database of predicted functional associations between proteins. Nucleic Acids Res. 31, 1 (Jan. 2003), 258--261. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC165481/.Google ScholarGoogle Scholar
  55. Ye Yuan, Lei Chen, and Guoren Wang. 2010. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In Database Systems for Advanced Applications. Lecture Notes in Computer Science, Vol. 5981. Springer, Berlin, 155--170. DOI:http://dx.doi.org/10.1007/978-3-642-12026-8_14 Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Ye Yuan, Guoren Wang, Lei Chen, and Haixun Wang. 2012. Efficient subgraph similarity search on large probabilistic graph databases. Proc. VLDB Endow. 5, 9 (May 2012), 800--811. DOI:http://dx.doi.org/10.14778/2311906.2311908 Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Ye Yuan, Guoren Wang, Haixun Wang, and Lei Chen. 2011. Efficient subgraph search over large uncertain graphs. Proc. VLDB Endow. 4, 11 (Jan. 2011), 876--886.Google ScholarGoogle Scholar
  58. Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010a. Finding top-k maximal cliques in an uncertain graph. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'10). 649--652. DOI:http://dx.doi.org/10.1109/ICDE.2010.5447891Google ScholarGoogle ScholarCross RefCross Ref
  59. Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010b. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (May 2010), 1203--1218. DOI:http://dx.doi.org/10.1109/TKDE.2010.80 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Uncertain Graph Processing through Representative Instances

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 40, Issue 3
          October 2015
          247 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/2838914
          Issue’s Table of Contents

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 October 2015
          • Accepted: 1 August 2015
          • Revised: 1 June 2015
          • Received: 1 December 2014
          Published in tods Volume 40, Issue 3

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader