research-article

Uncertain Graph Processing through Representative Instances

Authors:
Panos Parchas

Hong Kong University of Science and Technology

Hong Kong University of Science and Technology
View Profile

,
Francesco Gullo

Yahoo Labs, Barcelona

Yahoo Labs, Barcelona
View Profile

,
Dimitris Papadias

Hong Kong University of Science and Technology

Hong Kong University of Science and Technology
View Profile

,
Francesco Bonchi

Yahoo Labs, Barcelona

Yahoo Labs, Barcelona
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 40 Issue 3Article No.: 20pp 1–39https://doi.org/10.1145/2818182

Published:23 October 2015Publication History

ACM Transactions on Database Systems

Abstract

Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which (i) draws a number of possible graphs (samples), (ii) evaluates the query on each of them, and (iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To facilitate efficiency, we study the problem of extracting a single representative instance from an uncertain graph. Conventional processing techniques can then be applied on this representative to closely approximate the result on the original graph.

In order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, that is, the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-Matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game-theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient, and betweenness centrality.

References

Serge Abiteboul, Paris Kanellakis, and Gosta Grahne. 1987. On the representation and querying of sets of possible worlds. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'87). 34--48. DOI:http://dx.doi.org/10.1145/38713.38724 Google ScholarDigital Library
Eytan Adar and Christopher Re. 2007. Managing uncertainty in social networks. IEEE Data Eng. Bulle. 30, 2 (July 2007), 15--22.Google Scholar
Charu C. Aggarwal and Philip S. Yu. 2009. A survey of uncertain data algorithms and applications. IEEE Trans. Know. Data Eng. 21, 5 (May 2009), 609--623. DOI:http://dx.doi.org/10.1109/TKDE.2008.190 Google ScholarDigital Library
William Aiello, Fan Chung, and Linyuan Lu. 2000. A random graph model for massive graphs. In Proceedings of the ACM Symposium on Theory of Computing (STOC'00). 171--180. DOI:http://dx.doi.org/10.1145/335305.335326 Google ScholarDigital Library
Sheldon B. Akers, Dov Harel, and Balakrishnan Krishnamurthy. 1994. The star graph: An attractive alternative to the N-cube. In Interconnection Networks for High-performance Parallel Computers, IEEE Computer Society Press, Los Alamitos, CA, 145--152. Google ScholarDigital Library
Saurabh Asthana, Oliver D. King, Francis D. Gibbons, and Frederick P. Roth. 2004. Predicting protein complex membership using probabilistic network reliability. Genome Res. 14, 4 (June 2004), 1170--1175. DOI:http://dx.doi.org/10.1101/gr.2203804Google ScholarCross Ref
Joseph Blitzstein and Persi Diaconis. 2011. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Int. Math. 6, 4 (March 2011), 489--522. DOI:http://dx.doi.org/10.1080/15427951.2010.557277Google Scholar
Paolo Boldi, Francesco Bonchi, Aristides Gionis, and Tamir Tassa. 2012. Injecting uncertainty in graphs for identity obfuscation. Proc. VLDB Endow. 5, 11 (July 2012), 1376--1387. DOI:http://dx.doi.org/10.14778/2350229.2350254 Google ScholarDigital Library
Francesco Bonchi, Francesco Gullo, Andreas Kaltenbrunner, and Yana Volkovich. 2014. Core decomposition of uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'14). 1316--1325. DOI:http://dx.doi.org/10.1145/2623330.2623655 Google ScholarDigital Library
Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proc. Nat. Acad. Sci. 99, 25 (Dec. 2002), 15879--15882. DOI:http://dx.doi.org/10.1073/pnas.252631999Google ScholarCross Ref
Fan Chung, Linyuan Lu, and Van H. Vu. 2003. Spectra of random graphs with given expected degrees. Proc. Nat. Acad. Sci. 100, 11 (May 2003), 6313--6318. DOI:http://dx.doi.org/10.1073/pnas.0937490100Google ScholarCross Ref
Nilesh Dalvi and Dan Suciu. 2007. Efficient query evaluation on probabilistic databases. VLDB J. 16, 4 (Oct. 2007), 523--544. DOI:http://dx.doi.org/10.1007/s00778-006-0004-3 Google ScholarDigital Library
Charo I. Del Genio, Hyunju Kim, Zoltn Toroczkai, and Kevin E. Bassler. 2010. Efficient and exact sampling of simple graphs with given arbitrary degree sequence. PLoS ONE 5, 4 (April 2010), e10012. DOI:http://dx.doi.org/10.1371/journal.pone.0010012Google ScholarCross Ref
Paul Erdös and Tibor Gallai. 1960. Graphs with prescribed degrees of vertices (Hungarian). Mat. Lapok 11 (1960), 264--274.Google Scholar
Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the Internet topology. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262. DOI:http://dx.doi.org/10.1145/316188.316229 Google ScholarDigital Library
Pierre Fraigniaud. 2007. Small worlds as navigable augmented networks: Model, analysis, and validation. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4698, Springer, Berlin, 2--11. DOI:http://dx.doi.org/10.1007/978-3-540-75520-3_2 Google ScholarDigital Library
Ana M. Gonzalez, Bo Dalsgaard, and Jens M. Olesen. 2010. Centrality measures and the importance of generalist species in pollination networks. Ecological Complex. 7, 1 (March 2010), 36--43. DOI:http://dx.doi.org/10.1016/j.ecocom.2009.03.008Google Scholar
Seifollah L. Hakimi. 1962. On realizability of a set of integers as degrees of the vertices of a linear graph. J. Soc. Indust. Appl. Math. 10, 3 (Sept. 1962), 496--506. http://www.jstor.org/stable/2098746Google ScholarCross Ref
Stefan Hougardy. 2009. Linear time approximation algorithms for degree constrained subgraph problems. In Research Trends in Combinatorial Optimization, Springer, Berlin, 185--200. DOI:http://dx.doi.org/10.1007/978-3-540-76796-1_9Google Scholar
Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. 2013. Massive graph triangulation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 325--336. DOI:http://dx.doi.org/10.1145/2463676.2463704 Google ScholarDigital Library
Christian Hübler, Hans-Peter Kriegel, Karsten Borgwardt, and Zoubin Ghahramani. 2008. Metropolis algorithms for representative subgraph sampling. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 283--292. DOI:http://dx.doi.org/10.1109/ICDM.2008.124 Google ScholarDigital Library
Ruoming Jin, Lin Liu, and Charu C. Aggarwal. 2011a. Discovering highly reliable subgraphs in uncertain graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11). 992--1000. DOI:http://dx.doi.org/10.1145/2020408.2020569 Google ScholarDigital Library
Ruoming Jin, Lin Liu, Bolin Ding, and Haixun Wang. 2011b. Distance-constraint reachability computation in uncertain graphs. Proc. VLDB Endow. 4, 9 (June 2011), 551--562. DOI:http://dx.doi.org/10.14778/2002938.2002941 Google ScholarDigital Library
David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 137--146. DOI:http://dx.doi.org/10.1145/956750.956769 Google ScholarDigital Library
Arijit Khan, Francesco Bonchi, Aristides Gionis, and Francesco Gullo. 2014. Fast reliability search in uncertain graphs. In Proceedings of the International Conference on Extending Database Technology (EDBT'14). 535--546. DOI:http://dx.doi.org/10.5441/002/edbt.2014.48Google Scholar
Jon Kleinberg. 2006. Complex networks and decentralized search algorithms. In Proceedings of the International Congress of Mathematicians (ICM'06). 1019--1044.Google Scholar
George Kollios, Michalis Potamias, and Evimaria Terzi. 2013. Clustering large probabilistic graphs. IEEE Trans. Knowl. Data Eng. 25, 2 (Feb. 2013), 325--336. DOI:http://dx.doi.org/10.1109/TKDE.2011.243 Google ScholarDigital Library
Gueorgi Kossinets and Duncan J. Watts. 2006. Empirical analysis of an evolving social network. Science 311, 5757 (Jan. 2006), 88--90. DOI:http://dx.doi.org/10.1126/science.1116869Google ScholarCross Ref
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11 (March 2010), 985--1042. http://dl.acm.org/citation.cfm?id=1756006.1756039 Google ScholarDigital Library
Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 631--636. DOI:http://dx.doi.org/10.1145/1150402.1150479 Google ScholarDigital Library
Rong-Hua Li, Jeffrey Xu Yu, Rui Mao, and Tan Jin. 2014. Efficient and accurate query evaluation on uncertain graphs via recursive stratified sampling. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 892--903. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816709Google ScholarCross Ref
David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'03). 556--559. DOI:http://dx.doi.org/10.1145/956863.956972 Google ScholarDigital Library
Fredrik Liljeros, Christofer R. Edling, Luis A. Nunes Amaral, H. Eugene Stanley, and Yvonne Åberg. 2001. The web of human sexual contacts. Nature 411, 6840 (June 2001), 907--908. DOI:http://dx.doi.org/10.1038/35082140Google ScholarCross Ref
Lin Liu, Ruoming Jin, Charu C. Aggarwal, and Yelong Shen. 2012. Reliable clustering on uncertain graphs. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 459--468. DOI:http://dx.doi.org/10.1109/ICDM.2012.11 Google ScholarDigital Library
Yang-Yu Liu, Jean-Jacques Slotine, and Albert-Laszlo Barabasi. 2011. Controllability of complex networks. Nature 473, 7346 (May 2011), 167--173. DOI:http://dx.doi.org/10.1038/nature10011Google ScholarCross Ref
László Lovász. 1970. Subgraphs with prescribed valencies. J. Combin. Theory 8, 4 (1970), 391--416. DOI:http://dx.doi.org/10.1016/S0021-9800(70)80033-3Google ScholarCross Ref
Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. 2006. Systematic topology analysis and generation using degree correlations. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'06). 135--146. DOI:http://dx.doi.org/10.1145/1159913.1159930 Google ScholarDigital Library
Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociol. 27, 1 (Aug. 2001), 415--444. DOI:http://dx.doi.org/10.1146/annurev.soc.27.1.415Google ScholarCross Ref
Julián Mestre. 2006. Greedy in approximation algorithms. In Algorithms ESA. Lecture Notes in Computer Science, Vol. 4168. Springer, Berlin, 528--539. DOI:http://dx.doi.org/10.1007/11841036_48 Google ScholarDigital Library
Silvio Micali and Vijay V. Vazirani. 1980. An O(&surd;V&mid;ċ&mid;E&mid;) algorithm for finding maximum matching in general graphs. In Proceedings of the IEEE Annual Symposium on Foundations of Computer Science (FOCS'80). 17--27. DOI:http://dx.doi.org/10.1109/SFCS.1980.12 Google ScholarDigital Library
Daniele Micciancio. 2001. The hardness of the closest vector problem with preprocessing. IEEE Trans. Inform. Theory 47, 3 (March 2001), 1212--1215. DOI:http://dx.doi.org/10.1109/18.915688 Google ScholarDigital Library
Milena Mihail and Nisheeth K. Vishnoi. 2002. On generating graphs with prescribed degree sequences for computer network modeling applications. Position Paper. In Proceedings of the 3rd Workshop on Approximate and Randomized Algorithms for Communication Networks (ARACNE'02). 1--11.Google Scholar
Dov Monderer and Lloyd S. Shapley. 1996. Potential games. Games Econ. Behav. 14, 1 (May 1996), 124--143. DOI:http://dx.doi.org/10.1006/game.1996.0044Google ScholarCross Ref
Walaa Eldin Moustafa, Angelika Kimmig, Amol Deshpande, and Lise Getoor. 2014. Subgraph pattern matching over uncertain graphs with identity linkage uncertainty. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'14). 904--915. DOI:http://dx.doi.org/10.1109/ICDE.2014.6816710Google ScholarCross Ref
Jaroslav Nešetšril and Svatopluk Poljak. 1985. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae 26, 2 (Jan. 1985), 415--419.Google Scholar
Panos Parchas, Francesco Gullo, Dimitris Papadias, and Franceseco Bonchi. 2014. The pursuit of a good possible world: Extracting representative instances of uncertain graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'14). 967--978. DOI:http://dx.doi.org/10.1145/2588555.2593668 Google ScholarDigital Library
Michalis Potamias, Francesco Bonchi, Aristides Gionis, and George Kollios. 2010. K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 1--2 (Sept. 2010), 997--1008. DOI:http://dx.doi.org/10.14778/1920841.1920967 Google ScholarDigital Library
Gerardo Rubino. 1998. Network reliability evaluation. In State-of-the Art in Performance Modeling and Simulation. Gordon and Breach; London. 275--302. DOI:http://dx.doi.org/10.1002/wics.81 Google ScholarDigital Library
Petteri Sevon, Lauri Eronen, Petteri Hintsanen, Kimmo Kulovesi, and Hannu Toivonen. 2006. Link discovery in graphs derived from biological databases. In Proceedings of the Springer International Conference on Data Integration in the Life Sciences (DILS'06). 35--49. http://dx.doi.org/10.1007/11799511_5 Google ScholarDigital Library
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.Google Scholar
Hongsuda Tangmunarunkit, Ramesh Govindan, Sugih Jamin, Scott Shenker, and Walter Willinger. 2002. Network topology generators: Degree-based vs. structural. In Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'02). 147--159. DOI:http://dx.doi.org/10.1145/633025.633040 Google ScholarDigital Library
Ali Tizghadam and Alberto Leon-Garcia. 2010. Betweenness centrality and resistance distance in communication networks. IEEE Netw. 24, 6 (Nov.-Dec. 2010), 10--16. DOI:http://dx.doi.org/10.1109/MNET.2010.5634437 Google ScholarDigital Library
Leslie G. Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comp. 8, 3 (1979), 410--421. DOI:http://dx.doi.org/10.1137/0208032Google ScholarDigital Library
Christian von Mering, Martijn A. Huynen, Daniel Jaeggi, Steffen Schmidt, Peer Bork, and Berend Snel. 2003. STRING: A database of predicted functional associations between proteins. Nucleic Acids Res. 31, 1 (Jan. 2003), 258--261. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC165481/.Google Scholar
Ye Yuan, Lei Chen, and Guoren Wang. 2010. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In Database Systems for Advanced Applications. Lecture Notes in Computer Science, Vol. 5981. Springer, Berlin, 155--170. DOI:http://dx.doi.org/10.1007/978-3-642-12026-8_14 Google ScholarDigital Library
Ye Yuan, Guoren Wang, Lei Chen, and Haixun Wang. 2012. Efficient subgraph similarity search on large probabilistic graph databases. Proc. VLDB Endow. 5, 9 (May 2012), 800--811. DOI:http://dx.doi.org/10.14778/2311906.2311908 Google ScholarDigital Library
Ye Yuan, Guoren Wang, Haixun Wang, and Lei Chen. 2011. Efficient subgraph search over large uncertain graphs. Proc. VLDB Endow. 4, 11 (Jan. 2011), 876--886.Google Scholar
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010a. Finding top-k maximal cliques in an uncertain graph. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'10). 649--652. DOI:http://dx.doi.org/10.1109/ICDE.2010.5447891Google ScholarCross Ref
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010b. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (May 2010), 1203--1218. DOI:http://dx.doi.org/10.1109/TKDE.2010.80 Google ScholarDigital Library

Index Terms

Uncertain Graph Processing through Representative Instances

Recommendations

The pursuit of a good possible world: extracting representative instances of uncertain graphs
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Data in several applications can be represented as an uncertain graph, whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an ...
Read More
An uncertain chromatic number of an uncertain graph based on $$\alpha $$ź-cut coloring

An uncertain graph is a graph in which the edges are indeterminate and the existence of edges are characterized by belief degrees which are uncertain measures. This paper aims to bring graph coloring and uncertainty theory together. A new approach for ...
Read More
Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Database Systems Volume 40, Issue 3
October 2015
247 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2838914
Editor:
Christian S. Jensen
Aalborg University, Denmark
Issue’s Table of Contents
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 October 2015
- Accepted: 1 August 2015
- Revised: 1 June 2015
- Received: 1 December 2014
Published in tods Volume 40, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Uncertain graph
n-clique cardinality
possible world
representative
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 29
  Total Citations
  View Citations
- 482
  Total Downloads
- Downloads (Last 12 months)20
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Uncertain Graph Processing through Representative Instances

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

The pursuit of a good possible world: extracting representative instances of uncertain graphs

An uncertain chromatic number of an uncertain graph based on $$\alpha $$ź-cut coloring

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Uncertain Graph Processing through Representative Instances

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

The pursuit of a good possible world: extracting representative instances of uncertain graphs

An uncertain chromatic number of an uncertain graph based on $$\alpha $$ź-cut coloring

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media