Abstract
Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph.
In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.
- Serge Abiteboul, Paris Kanellakis, and Gosta Grahne. 1987. On the representation and querying of sets of possible worlds. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'87). 34--48. DOI:http://dx.doi.org/10.1145/38713.38724 Google ScholarDigital Library
- Eytan Adar and Christopher Re. 2007. Managing uncertainty in social networks. IEEE Data Eng. Bulle. 30, 2 (July 2007), 15--22.Google Scholar
- Charu C. Aggarwal and Philip S. Yu. 2009. A survey of uncertain data algorithms and applications. IEEE Trans. Know. Data Eng. 21, 5 (May 2009), 609--623. DOI:http://dx.doi.org/10.1109/TKDE.2008.190 Google ScholarDigital Library
- William Aiello, Fan Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In Proceedings of the ACM Symposium on Theory of Computing (STOC'00). 171--180. DOI:http://dx.doi.org/10.1145/335305.335326 Google ScholarDigital Library
- Sheldon B. Akers, Dov Harel, and Balakrishnan Krishnamurthy. 1994. The star graph: An attractive alternative to the N-cube. In Interconnection Networks for High-performance Parallel Computers, IEEE Computer Society Press, Los Alamitos, CA, 145--152. Google ScholarDigital Library
- Saurabh Asthana, Oliver D. King, Francis D. Gibbons, and Frederick P. Roth. 2004. Predicting protein complex membership using probabilistic network reliability. Genome Res. 14, 4 (June 2004), 1170--1175. DOI:http://dx.doi.org/10.1101/gr.2203804Google ScholarCross Ref
- Joseph Blitzstein and Persi Diaconis. 2011. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Int. Math. 6, 4 (March 2011), 489--522. DOI:http://dx.doi.org/10.1080/15427951.2010.557277Google Scholar
- Paolo Boldi, Francesco Bonchi, Aristides Gionis, and Tamir Tassa. 2012. Injecting uncertainty in graphs for identity obfuscation. Proc. VLDB Endow. 5, 11 (July 2012), 1376--1387. DOI:http://dx.doi.org/10.14778/2350229.2350254 Google ScholarDigital Library
- Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, and Yana Volkovich. 2014. Core decomposition of uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'14). 1316--1325. DOI:http://dx.doi.org/10.1145/2623330.2623655 Google ScholarDigital Library
- Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. Nat. Acad. Sci. 99, 25 (Dec. 2002), 15879--15882. DOI:http://dx.doi.org/10.1073/pnas.252631999Google ScholarCross Ref
- Fan Chung, Linyuan Lu, and Van H. Vu. 2003. Spectra of random graphs with given expected degrees. Proc. Nat. Acad. Sci. 100, 11 (May 2003), 6313--6318. DOI:http://dx.doi.org/10.1073/pnas.0937490100Google ScholarCross Ref
- Nilesh Dalvi and Dan Suciu. 2007. Efficient query evaluation on probabilistic databases. VLDB J. 16, 4 (Oct. 2007), 523--544. DOI:http://dx.doi.org/10.1007/s00778-006-0004-3 Google ScholarDigital Library
- Charo I. Del Genio, Hyunju Kim, Zoltn Toroczkai, and Kevin E. Bassler. 2010. Efficient and exact sampling of simple graphs with given arbitrary degree sequence. PLoS ONE 5, 4 (April 2010), e10012. DOI:http://dx.doi.org/10.1371/journal.pone.0010012Google ScholarCross Ref
- Paul Erdös and Tibor Gallai. 1960. Graphs with prescribed degrees of vertices (Hungarian). Mat. Lapok 11 (1960), 264--274.Google Scholar
- Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the Internet topology. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262. DOI:http://dx.doi.org/10.1145/316188.316229 Google ScholarDigital Library
- Pierre Fraigniaud. 2007. Small worlds as navigable augmented networks: Model, analysis, and validation. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4698, Springer, Berlin, 2--11. DOI:http://dx.doi.org/10.1007/978-3-540-75520-3_2 Google ScholarDigital Library
- Ana M. Gonzalez, Bo Dalsgaard, and Jens M. Olesen. 2010. Centrality measures and the importance of generalist species in pollination networks. Ecological Complex. 7, 1 (March 2010), 36--43. DOI:http://dx.doi.org/10.1016/j.ecocom.2009.03.008Google Scholar
- Seifollah L. Hakimi. 1962. On realizability of a set of integers as degrees of the vertices of a linear graph. J. Soc. Indust. Appl. Math. 10, 3 (Sept. 1962), 496--506. http://www.jstor.org/stable/2098746Google ScholarCross Ref
- Stefan Hougardy. 2009. Linear time approximation algorithms for degree constrained subgraph problems. In Research Trends in Combinatorial Optimization, Springer, Berlin, 185--200. DOI:http://dx.doi.org/10.1007/978-3-540-76796-1_9Google Scholar
- Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 325--336. DOI:http://dx.doi.org/10.1145/2463676.2463704 Google ScholarDigital Library
- Christian Hübler, Hans-Peter Kriegel, Karsten Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 283--292. DOI:http://dx.doi.org/10.1109/ICDM.2008.124 Google ScholarDigital Library
- Ruoming Jin, Lin Liu, and Charu C. Aggarwal. 2011a. Discovering highly reliable subgraphs in uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). 992--1000. DOI:http://dx.doi.org/10.1145/2020408.2020569 Google ScholarDigital Library
- Ruoming Jin, Lin Liu, Bolin Ding, and Haixun Wang. 2011b. Distance-constraint reachability computation in uncertain graphs. Proc. VLDB Endow. 4, 9 (June 2011), 551--562. DOI:http://dx.doi.org/10.14778/2002938.2002941 Google ScholarDigital Library
- David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 137--146. DOI:http://dx.doi.org/10.1145/956750.956769 Google ScholarDigital Library
- Arijit Khan, Francesco Bonchi, Aristides Gionis, and Francesco Gullo. 2014. Fast reliability search in uncertain graphs. In Proceedings of the International Conference on Extending Database Technology (EDBT'14). 535--546. DOI:http://dx.doi.org/10.5441/002/edbt.2014.48Google Scholar
- Jon Kleinberg. 2006. Complex networks and decentralized search algorithms. In Proceedings of the International Congress of Mathematicians (ICM'06). 1019--1044.Google Scholar
- George Kollios, Michalis Potamias, and Evimaria Terzi. 2013. Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25, 2 (Feb. 2013), 325--336. DOI:http://dx.doi.org/10.1109/TKDE.2011.243 Google ScholarDigital Library
- Gueorgi Kossinets and Duncan J. Watts. 2006. Empirical analysis of an evolving social network. Science 311, 5757 (Jan. 2006), 88--90. DOI:http://dx.doi.org/10.1126/science.1116869Google ScholarCross Ref
- Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11 (March 2010), 985--1042. http://dl.acm.org/citation.cfm?id=1756006.1756039 Google ScholarDigital Library
- Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 631--636. DOI:http://dx.doi.org/10.1145/1150402.1150479 Google ScholarDigital Library
- Rong-Hua Li, Jeffrey Xu Yu, Rui Mao, and Tan Jin. 2014. Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 892--903. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816709Google ScholarCross Ref
- David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'03). 556--559. DOI:http://dx.doi.org/10.1145/956863.956972 Google ScholarDigital Library
- Fredrik Liljeros, Christofer R. Edling, Luis A. Nunes Amaral, H. Eugene Stanley, and Yvonne Åberg. 2001. The web of human sexual contacts. Nature 411, 6840 (June 2001), 907--908. DOI:http://dx.doi.org/10.1038/35082140Google ScholarCross Ref
- Lin Liu, Ruoming Jin, Charu C. Aggarwal, and Yelong Shen. 2012. Reliable clustering on uncertain graphs. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 459--468. DOI:http://dx.doi.org/10.1109/ICDM.2012.11 Google ScholarDigital Library
- Yang-Yu Liu, Jean-Jacques Slotine, and Albert-Laszlo Barabasi. 2011. Controllability of complex networks. Nature 473, 7346 (May 2011), 167--173. DOI:http://dx.doi.org/10.1038/nature10011Google ScholarCross Ref
- László Lovász. 1970. Subgraphs with prescribed valencies. J. Combin. Theory 8, 4 (1970), 391--416. DOI:http://dx.doi.org/10.1016/S0021-9800(70)80033-3Google ScholarCross Ref
- Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. 2006. Systematic topology analysis and generation using degree correlations. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'06). 135--146. DOI:http://dx.doi.org/10.1145/1159913.1159930 Google ScholarDigital Library
- Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociol. 27, 1 (Aug. 2001), 415--444. DOI:http://dx.doi.org/10.1146/annurev.soc.27.1.415Google ScholarCross Ref
- Julián Mestre. 2006. Greedy in approximation algorithms. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4168. Springer, Berlin, 528--539. DOI:http://dx.doi.org/10.1007/11841036_48 Google ScholarDigital Library
- Silvio Micali and Vijay V. Vazirani. 1980. An O(&surd;V∣ċ∣E∣) algorithm for finding maximum matching in general graphs. In Proceedings of the IEEE Annual Symposium on Foundations of Computer Science (FOCS'80). 17--27. DOI:http://dx.doi.org/10.1109/SFCS.1980.12 Google ScholarDigital Library
- Daniele Micciancio. 2001. The hardness of the closest vector problem with preprocessing. IEEE Trans. Inform. Theory 47, 3 (March 2001), 1212--1215. DOI:http://dx.doi.org/10.1109/18.915688 Google ScholarDigital Library
- Milena Mihail and Nisheeth K. Vishnoi. 2002. On generating graphs with prescribed degree sequences for computer network modeling applications. Position Paper. In Proceedings of the 3rd Workshop on Approximate and Randomized Algorithms for Communication Networks (ARACNE'02). 1--11.Google Scholar
- Dov Monderer and Lloyd S. Shapley. 1996. Potential games. Games Econ. Behav. 14, 1 (May 1996), 124--143. DOI:http://dx.doi.org/10.1006/game.1996.0044Google ScholarCross Ref
- Walaa Eldin Moustafa, Angelika Kimmig, Amol Deshpande, and Lise Getoor. 2014. Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 904--915. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816710Google ScholarCross Ref
- Jaroslav Nešetšril and Svatopluk Poljak. 1985. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae 26, 2 (Jan. 1985), 415--419.Google Scholar
- Panos Parchas, Francesco Gullo, Dimitris Papadias, and Franceseco Bonchi. 2014. The pursuit of a good possible world: Extracting representative instances of uncertain graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'14). 967--978. DOI:http://dx.doi.org/10.1145/2588555.2593668 Google ScholarDigital Library
- Michalis Potamias, Francesco Bonchi, Aristides Gionis, and George Kollios. 2010. K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 1--2 (Sept. 2010), 997--1008. DOI:http://dx.doi.org/10.14778/1920841.1920967 Google ScholarDigital Library
- Gerardo Rubino. 1998. Network reliability evaluation. In State-of-the Art in Performance Modeling and Simulation. Gordon and Breach; London. 275--302. DOI:http://dx.doi.org/10.1002/wics.81 Google ScholarDigital Library
- Petteri Sevon, Lauri Eronen, Petteri Hintsanen, Kimmo Kulovesi, and Hannu Toivonen. 2006. Link discovery in graphs derived from biological databases. In Proceedings of the Springer International Conference on Data Integration in the Life Sciences (DILS'06). 35--49. http://dx.doi.org/10.1007/11799511_5 Google ScholarDigital Library
- Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.Google Scholar
- Hongsuda Tangmunarunkit, Ramesh Govindan, Sugih Jamin, Scott Shenker, and Walter Willinger. 2002. Network topology generators: Degree-based vs. structural. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'02). 147--159. DOI:http://dx.doi.org/10.1145/633025.633040 Google ScholarDigital Library
- Ali Tizghadam and Alberto Leon-Garcia. 2010. Betweenness centrality and resistance distance in communication networks. IEEE Netw. 24, 6 (Nov.-Dec. 2010), 10--16. DOI:http://dx.doi.org/10.1109/MNET.2010.5634437 Google ScholarDigital Library
- Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comp. 8, 3 (1979), 410--421. DOI:http://dx.doi.org/10.1137/0208032Google ScholarDigital Library
- Christian von Mering, Martijn A. Huynen, Daniel Jaeggi, Steffen Schmidt, Peer Bork, and Berend Snel. 2003. STRING: A database of predicted functional associations between proteins. Nucleic Acids Res. 31, 1 (Jan. 2003), 258--261. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC165481/.Google Scholar
- Ye Yuan, Lei Chen, and Guoren Wang. 2010. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In Database Systems for Advanced Applications. Lecture Notes in Computer Science, Vol. 5981. Springer, Berlin, 155--170. DOI:http://dx.doi.org/10.1007/978-3-642-12026-8_14 Google ScholarDigital Library
- Ye Yuan, Guoren Wang, Lei Chen, and Haixun Wang. 2012. Efficient subgraph similarity search on large probabilistic graph databases. Proc. VLDB Endow. 5, 9 (May 2012), 800--811. DOI:http://dx.doi.org/10.14778/2311906.2311908 Google ScholarDigital Library
- Ye Yuan, Guoren Wang, Haixun Wang, and Lei Chen. 2011. Efficient subgraph search over large uncertain graphs. Proc. VLDB Endow. 4, 11 (Jan. 2011), 876--886.Google Scholar
- Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010a. Finding top-k maximal cliques in an uncertain graph. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'10). 649--652. DOI:http://dx.doi.org/10.1109/ICDE.2010.5447891Google ScholarCross Ref
- Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010b. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (May 2010), 1203--1218. DOI:http://dx.doi.org/10.1109/TKDE.2010.80 Google ScholarDigital Library
Index Terms
- Uncertain Graph Processing through Representative Instances
Recommendations
The pursuit of a good possible world: extracting representative instances of uncertain graphs
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataData in several applications can be represented as an uncertain graph, whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an ...
An uncertain chromatic number of an uncertain graph based on $$\alpha $$ź-cut coloring
An uncertain graph is a graph in which the edges are indeterminate and the existence of edges are characterized by belief degrees which are uncertain measures. This paper aims to bring graph coloring and uncertainty theory together. A new approach for ...
Mining frequent subgraphs over uncertain graph databases under probabilistic semantics
Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over ...
Comments