Abstract
Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor queries (k-NN), which is the problem of computing the k closest nodes to some specific node.
In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques.
Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale for graphs with tens of millions of edges.
- E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google Scholar
- C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, 2009. Google ScholarDigital Library
- P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom. Trio: A system for data, uncertainty, and lineage. In VLDB, 2006. Google ScholarDigital Library
- L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, 2008. Google ScholarDigital Library
- S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Research, 14:1170--1175, 2004.Google ScholarCross Ref
- K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. Technical report, SIAM Journal of Numerical Analysis, 2005. Google ScholarDigital Library
- C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Google ScholarDigital Library
- S. Biswas and R. Morris. Exor: opportunistic multi-hop routing for wireless networks. In SIGCOMM, 2005. Google ScholarDigital Library
- P. Boldi and S. Vigna. The webgraph framework ii: Codes for the world-wide web. In Data Compression Conference, 2004. Google ScholarDigital Library
- R. Cheng, L. Chen, J. Chen, and X. Xie. Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In EDBT, 2009. Google ScholarDigital Library
- G. Cormode, F. Li, and K. Yi. Semantics of ranking queries for probabilistic data and expected ranks. In ICDE, 2009. Google ScholarDigital Library
- G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. In PODS, 2008. Google ScholarDigital Library
- N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004. Google ScholarDigital Library
- G. Dantzig. Linear Programming and Extensions. Princeton University Press, August 1998.Google Scholar
- P. Domingos and M. Richardson. Mining the network value of customers. In KDD, 2001. Google ScholarDigital Library
- D. Fogaras and B. Rácz. Towards scaling fully personalized pagerank. Algorithms and Models for the Web-Graph, pages 105--117, 2004.Google ScholarCross Ref
- H. Frank. Shortest path in probabilistic graphs. Operations Research, 17(4):583--599, July-August 1969.Google ScholarDigital Library
- T. Ge, S. Zdonik, and S. Madden. Top-k queries on uncertain data: On score distribution and typical answers. In SIGMOD, 2009. Google ScholarDigital Library
- J. Ghosh, H. Ngo, S. Yoon, and C. Qiao. On a routing problem within probabilistic graphs and its application to intermittently connected networks. In INFOCOM, 2007.Google ScholarDigital Library
- M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, 2008. Google ScholarDigital Library
- P. Jaillet. Shortest path problems with nodes failures. Networks, 22:589--605, 1992.Google ScholarCross Ref
- R. Jampani, F. Xu, M. Wu, L. L. Perez, C. M. Jermaine, and P. J. Haas. Mcdb: a monte carlo approach to managing uncertain data. In SIGMOD Conference, pages 687--700, 2008. Google ScholarDigital Library
- C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. Sliding-window top-k queries on uncertain streams. PVLDB, 1(1), 2008. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003. Google ScholarDigital Library
- C. Koch. Approximating predicates and expressive queries on probabilistic databases. In PODS, 2008. Google ScholarDigital Library
- N. J. Krogan, G. Cagney, and al. Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature, 440(7084):637--643, March 2006.Google ScholarCross Ref
- J. Li, B. Saha, and A. Deshpande. A unified approach to ranking in probabilistic databases. PVLDB, 2(1):502--513, 2009. Google ScholarDigital Library
- D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003. Google ScholarDigital Library
- L. Lovász. Random walks on graphs: A survey. In Combinatorics, Paul Erdõs is Eighty, 1993.Google Scholar
- H. Mewes and al. Mips: analysis and annotation of proteins from whole genomes. Nucleic Acids Res, 32:D41--44, 2004.Google ScholarCross Ref
- L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google Scholar
- H. Qiu and E. R. Hancock. Clustering and embedding using commute times. IEEE PAME, 29(11), 2007. Google ScholarDigital Library
- D. Rasteiro and J. Anjo. Optimal paths in probabilistic networks. Journal of Mathematical Sciences, 120(2), 2004.Google Scholar
- C. Re, N. N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarCross Ref
- M. Renz, T. Bernecker, F. Verhein, A. Zuefle, and H.-P. Kriegel. Probabilistic frequent itemset mining in uncertain databases. In KDD, 2009. Google ScholarDigital Library
- P. Sarkar, A. W. Moore, and A. Prakash. Fast incremental proximity search in large graphs. In ICML, 2008. Google ScholarDigital Library
- P. Sen, A. Deshpande, and L. Getoor. PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB Journal, 2009. Google ScholarDigital Library
- P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link discovery in graphs derived from biological databases. In DILS, 2006. Google ScholarDigital Library
- S. Singh, C. Mayfield, S. Mittal, S. Prabhakar, S. E. Hambrusch, and R. Shah. Orion 2.0: native support for uncertain data. In SIGMOD, 2008. Google ScholarDigital Library
- M. Soliman, I. Ilyas, and K. C.-C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.Google ScholarCross Ref
- M. A. Soliman and I. F. Ilyas. Ranking with uncertain scores. In ICDE, 2009. Google ScholarDigital Library
- Y. Tian, J. M. Patel, V. Nair, S. Martini, and M. Kretzler. Periscope/gq: a graph querying toolkit. PVLDB, 1(2):1404--1407, 2008. Google ScholarDigital Library
- L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarDigital Library
- D. Z. Wang, E. Michelakis, M. Garofalakis, and J. M. Hellerstein. Bayesstore: managing large, uncertain data repositories with probabilistic graphical models. Proc. VLDB Endow., 1(1):340--351, 2008. Google ScholarDigital Library
- K. Yi, F. Li, G. Kollios, and D. Srivastava. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng., 20(12):1669--1682, 2008. Google ScholarDigital Library
- M. L. Yiu, N. Mamoulis, X. Dai, Y. Tao, and M. Vaitis. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE TKDE, 21(1), 2009. Google ScholarDigital Library
- X. Zhang and J. Chomicki. On the semantics and evaluation of top-k queries in probabilistic databases. In DBRank, 2008.Google ScholarDigital Library
Index Terms
- k-nearest neighbors in uncertain graphs
Recommendations
On efficiently finding reverse k-nearest neighbors over uncertain graphs
Reverse k-nearest neighbor ($$\hbox {R}k\hbox {NN}$$RkNN) query on graphs returns the data objects that take a specified query object q as one of their k-nearest neighbors. It has significant influence in many real-life applications including resource ...
A Genetic Programming Approach for Searching on Nearest Neighbors Graphs
ICMR '19: Proceedings of the 2019 on International Conference on Multimedia RetrievalThe use of nearest neighbors graphs has been leading to considerable gains over classic approaches (e.g., tree-indexing-based methods) in approximate nearest neighbor searches. Classical searches on these graphs are conduced in a greedy way by moving at ...
Reverse Nearest Neighbors in Large Graphs
A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large ...
Comments