ABSTRACT
Data in several applications can be represented as an uncertain graph, whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Even approximate processing based on sampling is usually extremely expensive since it requires a vast number of samples to achieve reasonable quality guarantees. To overcome these problems, we propose algorithms for creating deterministic representative instances of uncertain graphs that maintain the underlying graph properties. Specifically, our algorithms aim at preserving the expected vertex degrees because they capture well the graph topology. Conventional processing techniques can then be applied on these instances to closely approximate the result on the uncertain graph. We experimentally demonstrate, with real and synthetic uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several properties such as shortest path distance, clustering coefficient and betweenness centrality.
- S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. In SIGMOD, pages 34--48, 1987. Google ScholarDigital Library
- E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google Scholar
- C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. TKDE, 21(5):609--623, 2009. Google ScholarDigital Library
- S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Res., 14:1170--1175, 2004.Google ScholarCross Ref
- J. Blitzstein and P. Diaconis. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics, 6(4):489--522, 2011.Google ScholarCross Ref
- P. Boldi, F. Bonchi, A. Gionis, and T. Tassa. Injecting uncertainty in graphs for identity obfuscation. PVLDB, 5(11):1376--1387, 2012. Google ScholarDigital Library
- L. Chen and C. Wang. Continuous subgraph pattern search over certain and uncertain graph streams. TKDE, 22(8):1093--1109, 2010. Google ScholarDigital Library
- F. R. K. Chung and L. Lu. The average distance in a random graph with given expected degrees. Internet Mathematics, 1(1):91--113, 2003.Google ScholarCross Ref
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, pages 864--875, 2004. Google ScholarDigital Library
- P. Erdös and T. Gallai. Graphs with prescribed degrees of vertices (hungarian). Mat. Lapok, 11:264--274, 1960.Google Scholar
- P. Fraigniaud. Small worlds as navigable augmented networks: model, analysis, and validation. In ESA, pages 2--11, 2007. Google ScholarDigital Library
- A. M. M. Gonzalez, B. Dalsgaard, and J. M. Olesen. Centrality measures and the importance of generalist species in pollination networks. Ecological Complexity, 7(1):36--43, 2010.Google ScholarCross Ref
- S. L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph (i). J. Soc. Indust. Appl. Math., 10(3):496--506, 1962.Google ScholarCross Ref
- S. Hougardy. Linear time approximation algorithms for degree constrained subgraph problems. In W. Cook, L. Lovasz, and J. Vygen, editors, Research Trends in Combinatorial Optimization, pages 185--200. Springer, 2009.Google ScholarCross Ref
- R. Jin, L. Liu, and C. C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. In KDD, pages 992--1000, 2011. Google ScholarDigital Library
- R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. PVLDB, 4(9):551--562, 2011. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
- J. Kleinberg. Complex networks and decentralized search algorithms. In Int. Congr. of Mathematicians (ICM), 2006.Google Scholar
- G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88--90, 2006.Google ScholarCross Ref
- J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD, pages 631--636, 2006. Google ScholarDigital Library
- D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, pages 556--559, 2003. Google ScholarDigital Library
- P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat. Systematic topology analysis and generation using degree correlations. SIGCOMM Comput. Commun. Rev., 36(4):135--146, 2006. Google ScholarDigital Library
- P. Mahadevan, D. Krioukov, M. Fomenkov, X. Dimitropoulos, k. c. claffy, and A. Vahdat. The internet as-level topology: Three data sources and one definitive metric. SIGCOMM Comput. Commun. Rev., 36(1):17--26, Jan. 2006. Google ScholarDigital Library
- J. Mestre. Greedy in approximation algorithms. In ESA, pages 528--539, 2006. Google ScholarDigital Library
- S. Micali and V. V. Vazirani. An O(√|V|•|E|) algorithm for finding maximum matching in general graphs. In FOCS, pages 17--27, 1980. Google ScholarDigital Library
- D. Micciancio. The hardness of the closest vector problem with preprocessing. IEEE Trans. on Information Theory, 47(3):1212--1215, 2001. Google ScholarDigital Library
- M. Mihail and N. K. Vishnoi. On generating graphs with prescribed vertex degrees for complex network modelling. ARACNE, pages 1--11, 2002.Google Scholar
- M. E. Newman. Spread of epidemic disease on networks. Phys. Rev. E, 66(1), 2002.Google ScholarCross Ref
- M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-Nearest Neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarDigital Library
- P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link discovery in graphs derived from biological databases. In DILS, pages 35--49, 2006. Google ScholarDigital Library
- A. Tizghadam and A. Leon-Garcia. Betweenness centrality and resistance distance in communication networks. IEEE Network, 24(6):10--16, 2010. Google ScholarDigital Library
- L. G. Valiant. The complexity of enumeration and reliability problems. SIAM J. on Computing, 8(3):410--421, 1979.Google ScholarCross Ref
- D. J. Watts. Collective dynamics of 'small-world' networks. Nature, 393:440--442, 1998.Google ScholarCross Ref
- Y. Yuan, L. Chen, and G. Wang. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In DASFAA, pages 155--170, 2010. Google ScholarDigital Library
- Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient subgraph similarity search on large probabilistic graph databases. PVLDB, 5(9):800--811, 2012. Google ScholarDigital Library
- Y. Yuan, G. Wang, H. Wang, and L. Chen. Efficient subgraph search over large uncertain graphs. PVLDB, 4(11):876--886, 2011.Google ScholarDigital Library
- Z. Zou, J. Li, H. Gao, and S. Zhang. Finding top-k maximal cliques in an uncertain graph. In ICDE, pages 649--652, 2010.Google ScholarCross Ref
- Z. Zou, J. Li, H. Gao, and S. Zhang. Mining frequent subgraph patterns from uncertain graph data. TKDE, 22(9):1203--1218, 2010. Google ScholarDigital Library
Index Terms
- The pursuit of a good possible world: extracting representative instances of uncertain graphs
Recommendations
Uncertain Graph Processing through Representative Instances
Data in several applications can be represented as an uncertain graph whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential ...
An uncertain chromatic number of an uncertain graph based on $$\alpha $$ź-cut coloring
An uncertain graph is a graph in which the edges are indeterminate and the existence of edges are characterized by belief degrees which are uncertain measures. This paper aims to bring graph coloring and uncertainty theory together. A new approach for ...
Mining frequent subgraphs over uncertain graph databases under probabilistic semantics
Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over ...
Comments