skip to main content
10.1145/1376616.1376685acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Ranking queries on uncertain data: a probabilistic threshold approach

Authors Info & Claims
Published:09 June 2008Publication History

ABSTRACT

Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the top-k list where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based algorithm. An empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold top-k queries and the efficiency of our methods.

References

  1. S. Abiteboul et al. On the representation and querying of sets of possible worlds. In SIGMOD'87. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Angluin and L. G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. In STOC'77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Antova et al. 10106 worlds and beyond: Efficient representation and processing of incomplete information. In ICDE'07.Google ScholarGoogle Scholar
  4. O. Benjelloun et al. Uldbs: databases with uncertainty and lineage. In VLDB'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Le Cam. An approximation theorem for poisson binomial distribution. Pacific J. Math., 10:1181--1197, 1960.Google ScholarGoogle ScholarCross RefCross Ref
  6. R. Cheng et al. Evaluating probabilistic queries over imprecise data. In SIGMOD'03. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Cheng et al. Efficient indexing methods for probabilistic threshold queries over uncertain data. In VLDB'04. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. N. Dalvi and D. Suciu. Management of probabilistic data: foundations and challenges. In PODS'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB'04. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Dalvi and D. Suciu. The dichotomy of conjunctive queries on probabilistic structures. In PODS'07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21(3):339--369, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Fagin et al. Optimal aggregation algorithms for middleware. In PODS'01. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. L. Hodges and Jr. L. Le Cam. The poisson approximation to the poisson binomial distribution. The Annals of Mathematical Statistics, 31(3):737--740, 1960.Google ScholarGoogle ScholarCross RefCross Ref
  14. W. Hoeffding. On the distribution of the number of successes in independent trials. Annals of Mathematical Statistics, 27:713--721, 1956.Google ScholarGoogle ScholarCross RefCross Ref
  15. M. Hua et al. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In ICDE'08. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Imielinski and Jr. Witold Lipski. Incomplete information in relational databases. Journal of ACM, 31(4):761--791, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Lange. Numerical analysis for statisticians. Statistics and computing. 1999.Google ScholarGoogle Scholar
  18. S. K. Lee. Imprecise and uncertain information in databases: An evidential approach. In ICDE'92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. X. Lian and L. Chen. Probabilistic Ranked Queries in Uncertain Databases. In EDBT'08. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, United Kingdom, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Pei et al. Probabilistic skylines on uncertain data. In VLDB'07, Viena, Austria, September 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Ré et al. Efficient top-k query evaluation on probabilistic data. In ICDE'07.Google ScholarGoogle Scholar
  23. A. D. Sarma et al. Working models for uncertain data. In ICDE'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Silberstein et al. A Sampling-Based Approach to Optimizing Top-k Queries in Sensor Networks. In ICDE'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Singh et al. Indexing uncertain categorical data. In ICDE'07.Google ScholarGoogle Scholar
  26. M. A. Soliman et al. Top-k query processing in uncertain databases. In ICDE'07.Google ScholarGoogle Scholar
  27. Y. Tao et al. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB'05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. H. Wang. On the number of successes in independent trials. Statistica Sinica, 3:295--312, 1993.Google ScholarGoogle Scholar
  29. K. Yi et al. Efficient processing of top-k queries in uncertain databases. In ICDE'08. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. X. Zhang and J. Chomicki. On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases. In DBRank'08 Workshop.Google ScholarGoogle Scholar

Index Terms

  1. Ranking queries on uncertain data: a probabilistic threshold approach

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
        June 2008
        1396 pages
        ISBN:9781605581026
        DOI:10.1145/1376616

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 June 2008

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader