skip to main content
research-article

k-nearest neighbors in uncertain graphs

Published:01 September 2010Publication History
Skip Abstract Section

Abstract

Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor queries (k-NN), which is the problem of computing the k closest nodes to some specific node.

In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques.

Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale for graphs with tens of millions of edges.

References

  1. E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google ScholarGoogle Scholar
  2. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Agrawal, O. Benjelloun, A. D. Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom. Trio: A system for data, uncertainty, and lineage. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Research, 14:1170--1175, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  6. K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte carlo methods in pagerank computation: When one iteration is sufficient. Technical report, SIAM Journal of Numerical Analysis, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Biswas and R. Morris. Exor: opportunistic multi-hop routing for wireless networks. In SIGCOMM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Boldi and S. Vigna. The webgraph framework ii: Codes for the world-wide web. In Data Compression Conference, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Cheng, L. Chen, J. Chen, and X. Xie. Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In EDBT, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Cormode, F. Li, and K. Yi. Semantics of ranking queries for probabilistic data and expected ranks. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Cormode and A. McGregor. Approximation algorithms for clustering uncertain data. In PODS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Dantzig. Linear Programming and Extensions. Princeton University Press, August 1998.Google ScholarGoogle Scholar
  15. P. Domingos and M. Richardson. Mining the network value of customers. In KDD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Fogaras and B. Rácz. Towards scaling fully personalized pagerank. Algorithms and Models for the Web-Graph, pages 105--117, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  17. H. Frank. Shortest path in probabilistic graphs. Operations Research, 17(4):583--599, July-August 1969.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Ge, S. Zdonik, and S. Madden. Top-k queries on uncertain data: On score distribution and typical answers. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Ghosh, H. Ngo, S. Yoon, and C. Qiao. On a routing problem within probabilistic graphs and its application to intermittently connected networks. In INFOCOM, 2007.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Jaillet. Shortest path problems with nodes failures. Networks, 22:589--605, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  22. R. Jampani, F. Xu, M. Wu, L. L. Perez, C. M. Jermaine, and P. J. Haas. Mcdb: a monte carlo approach to managing uncertain data. In SIGMOD Conference, pages 687--700, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. Sliding-window top-k queries on uncertain streams. PVLDB, 1(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Koch. Approximating predicates and expressive queries on probabilistic databases. In PODS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. J. Krogan, G. Cagney, and al. Global landscape of protein complexes in the yeast saccharomyces cerevisiae. Nature, 440(7084):637--643, March 2006.Google ScholarGoogle ScholarCross RefCross Ref
  27. J. Li, B. Saha, and A. Deshpande. A unified approach to ranking in probabilistic databases. PVLDB, 2(1):502--513, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Lovász. Random walks on graphs: A survey. In Combinatorics, Paul Erdõs is Eighty, 1993.Google ScholarGoogle Scholar
  30. H. Mewes and al. Mips: analysis and annotation of proteins from whole genomes. Nucleic Acids Res, 32:D41--44, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  31. L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google ScholarGoogle Scholar
  32. H. Qiu and E. R. Hancock. Clustering and embedding using commute times. IEEE PAME, 29(11), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. Rasteiro and J. Anjo. Optimal paths in probabilistic networks. Journal of Mathematical Sciences, 120(2), 2004.Google ScholarGoogle Scholar
  34. C. Re, N. N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  35. M. Renz, T. Bernecker, F. Verhein, A. Zuefle, and H.-P. Kriegel. Probabilistic frequent itemset mining in uncertain databases. In KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. P. Sarkar, A. W. Moore, and A. Prakash. Fast incremental proximity search in large graphs. In ICML, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. P. Sen, A. Deshpande, and L. Getoor. PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB Journal, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link discovery in graphs derived from biological databases. In DILS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. S. Singh, C. Mayfield, S. Mittal, S. Prabhakar, S. E. Hambrusch, and R. Shah. Orion 2.0: native support for uncertain data. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Soliman, I. Ilyas, and K. C.-C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  41. M. A. Soliman and I. F. Ilyas. Ranking with uncertain scores. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Y. Tian, J. M. Patel, V. Nair, S. Martini, and M. Kretzler. Periscope/gq: a graph querying toolkit. PVLDB, 1(2):1404--1407, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. D. Z. Wang, E. Michelakis, M. Garofalakis, and J. M. Hellerstein. Bayesstore: managing large, uncertain data repositories with probabilistic graphical models. Proc. VLDB Endow., 1(1):340--351, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. K. Yi, F. Li, G. Kollios, and D. Srivastava. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Trans. Knowl. Data Eng., 20(12):1669--1682, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. L. Yiu, N. Mamoulis, X. Dai, Y. Tao, and M. Vaitis. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE TKDE, 21(1), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. X. Zhang and J. Chomicki. On the semantics and evaluation of top-k queries in probabilistic databases. In DBRank, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. k-nearest neighbors in uncertain graphs
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
      September 2010
      1658 pages

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 September 2010
      Published in pvldb Volume 3, Issue 1-2

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader