research-article

k-nearest neighbors in uncertain graphs

Authors:
Michalis Potamias

Boston University

Boston University
View Profile

,
Francesco Bonchi

Yahoo! Research, Barcelona, Spain

Yahoo! Research, Barcelona, Spain
View Profile

,
Aristides Gionis

Yahoo! Research, Barcelona, Spain

Yahoo! Research, Barcelona, Spain
View Profile

,
George Kollios

Boston University

Boston University
View Profile

Proceedings of the VLDB Endowment Volume 3 Issue 1-2pp 997–1008https://doi.org/10.14778/1920841.1920967

Published:01 September 2010Publication History

Proceedings of the VLDB Endowment

Abstract

Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor queries (k-NN), which is the problem of computing the k closest nodes to some specific node.

In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques.

Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale for graphs with tens of millions of edges.

References

E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google Scholar
C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, 2009. Google ScholarDigital Library
P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom. Trio: A system for data, uncertainty, and lineage. In VLDB, 2006. Google ScholarDigital Library
L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, 2008. Google ScholarDigital Library
S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Research, 14:1170--1175, 2004.Google ScholarCross Ref
K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. Technical report, SIAM Journal of Numerical Analysis, 2005. Google ScholarDigital Library
C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Google ScholarDigital Library
S. Biswas and R. Morris. Exor: opportunistic multi-hop routing for wireless networks. In SIGCOMM, 2005. Google ScholarDigital Library
P. Boldi and S. Vigna. The webgraph framework ii: Codes for the world-wide web. In Data Compression Conference, 2004. Google ScholarDigital Library
R. Cheng, L. Chen, J. Chen, and X. Xie. Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In EDBT, 2009. Google ScholarDigital Library
G. Cormode, F. Li, and K. Yi. Semantics of ranking queries for probabilistic data and expected ranks. In ICDE, 2009. Google ScholarDigital Library
G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. In PODS, 2008. Google ScholarDigital Library
N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004. Google ScholarDigital Library
G. Dantzig. Linear Programming and Extensions. Princeton University Press, August 1998.Google Scholar
P. Domingos and M. Richardson. Mining the network value of customers. In KDD, 2001. Google ScholarDigital Library
D. Fogaras and B. Rácz. Towards scaling fully personalized pagerank. Algorithms and Models for the Web-Graph, pages 105--117, 2004.Google ScholarCross Ref
H. Frank. Shortest path in probabilistic graphs. Operations Research, 17(4):583--599, July-August 1969.Google ScholarDigital Library
T. Ge, S. Zdonik, and S. Madden. Top-k queries on uncertain data: On score distribution and typical answers. In SIGMOD, 2009. Google ScholarDigital Library
J. Ghosh, H. Ngo, S. Yoon, and C. Qiao. On a routing problem within probabilistic graphs and its application to intermittently connected networks. In INFOCOM, 2007.Google ScholarDigital Library
M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, 2008. Google ScholarDigital Library
P. Jaillet. Shortest path problems with nodes failures. Networks, 22:589--605, 1992.Google ScholarCross Ref
R. Jampani, F. Xu, M. Wu, L. L. Perez, C. M. Jermaine, and P. J. Haas. Mcdb: a monte carlo approach to managing uncertain data. In SIGMOD Conference, pages 687--700, 2008. Google ScholarDigital Library
C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. Sliding-window top-k queries on uncertain streams. PVLDB, 1(1), 2008. Google ScholarDigital Library
D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003. Google ScholarDigital Library
C. Koch. Approximating predicates and expressive queries on probabilistic databases. In PODS, 2008. Google ScholarDigital Library
N. J. Krogan, G. Cagney, and al. Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature, 440(7084):637--643, March 2006.Google ScholarCross Ref
J. Li, B. Saha, and A. Deshpande. A unified approach to ranking in probabilistic databases. PVLDB, 2(1):502--513, 2009. Google ScholarDigital Library
D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003. Google ScholarDigital Library
L. Lovász. Random walks on graphs: A survey. In Combinatorics, Paul Erdõs is Eighty, 1993.Google Scholar
H. Mewes and al. Mips: analysis and annotation of proteins from whole genomes. Nucleic Acids Res, 32:D41--44, 2004.Google ScholarCross Ref
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google Scholar
H. Qiu and E. R. Hancock. Clustering and embedding using commute times. IEEE PAME, 29(11), 2007. Google ScholarDigital Library
D. Rasteiro and J. Anjo. Optimal paths in probabilistic networks. Journal of Mathematical Sciences, 120(2), 2004.Google Scholar
C. Re, N. N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarCross Ref
M. Renz, T. Bernecker, F. Verhein, A. Zuefle, and H.-P. Kriegel. Probabilistic frequent itemset mining in uncertain databases. In KDD, 2009. Google ScholarDigital Library
P. Sarkar, A. W. Moore, and A. Prakash. Fast incremental proximity search in large graphs. In ICML, 2008. Google ScholarDigital Library
P. Sen, A. Deshpande, and L. Getoor. PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB Journal, 2009. Google ScholarDigital Library
P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link discovery in graphs derived from biological databases. In DILS, 2006. Google ScholarDigital Library
S. Singh, C. Mayfield, S. Mittal, S. Prabhakar, S. E. Hambrusch, and R. Shah. Orion 2.0: native support for uncertain data. In SIGMOD, 2008. Google ScholarDigital Library
M. Soliman, I. Ilyas, and K. C.-C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.Google ScholarCross Ref
M. A. Soliman and I. F. Ilyas. Ranking with uncertain scores. In ICDE, 2009. Google ScholarDigital Library
Y. Tian, J. M. Patel, V. Nair, S. Martini, and M. Kretzler. Periscope/gq: a graph querying toolkit. PVLDB, 1(2):1404--1407, 2008. Google ScholarDigital Library
L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarDigital Library
D. Z. Wang, E. Michelakis, M. Garofalakis, and J. M. Hellerstein. Bayesstore: managing large, uncertain data repositories with probabilistic graphical models. Proc. VLDB Endow., 1(1):340--351, 2008. Google ScholarDigital Library
K. Yi, F. Li, G. Kollios, and D. Srivastava. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng., 20(12):1669--1682, 2008. Google ScholarDigital Library
M. L. Yiu, N. Mamoulis, X. Dai, Y. Tao, and M. Vaitis. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE TKDE, 21(1), 2009. Google ScholarDigital Library
X. Zhang and J. Chomicki. On the semantics and evaluation of top-k queries in probabilistic databases. In DBRank, 2008.Google ScholarDigital Library

Index Terms

k-nearest neighbors in uncertain graphs
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Index terms have been assigned to the content through auto-classification.

Recommendations

On efficiently finding reverse k-nearest neighbors over uncertain graphs

Reverse k-nearest neighbor ($$\hbox {R}k\hbox {NN}$$RkNN) query on graphs returns the data objects that take a specified query object q as one of their k-nearest neighbors. It has significant influence in many real-life applications including resource ...
Read More
A Genetic Programming Approach for Searching on Nearest Neighbors Graphs
ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval

The use of nearest neighbors graphs has been leading to considerable gains over classic approaches (e.g., tree-indexing-based methods) in approximate nearest neighbor searches. Classical searches on these graphs are conduced in a greedy way by moving at ...
Read More
Reverse Nearest Neighbors in Large Graphs

A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 3, Issue 1-2
September 2010
1658 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2010
Published in pvldb Volume 3, Issue 1-2
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 83
  Total Citations
  View Citations
- 842
  Total Downloads
- Downloads (Last 12 months)60
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

k-nearest neighbors in uncertain graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

On efficiently finding reverse k-nearest neighbors over uncertain graphs

A Genetic Programming Approach for Searching on Nearest Neighbors Graphs

Reverse Nearest Neighbors in Large Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

k-nearest neighbors in uncertain graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

On efficiently finding reverse k-nearest neighbors over uncertain graphs

A Genetic Programming Approach for Searching on Nearest Neighbors Graphs

Reverse Nearest Neighbors in Large Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media