ABSTRACT
Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the top-k list where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based algorithm. An empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold top-k queries and the efficiency of our methods.
- S. Abiteboul et al. On the representation and querying of sets of possible worlds. In SIGMOD'87. Google ScholarDigital Library
- D. Angluin and L. G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. In STOC'77. Google ScholarDigital Library
- L. Antova et al. 10106 worlds and beyond: Efficient representation and processing of incomplete information. In ICDE'07.Google Scholar
- O. Benjelloun et al. Uldbs: databases with uncertainty and lineage. In VLDB'06. Google ScholarDigital Library
- L. Le Cam. An approximation theorem for poisson binomial distribution. Pacific J. Math., 10:1181--1197, 1960.Google ScholarCross Ref
- R. Cheng et al. Evaluating probabilistic queries over imprecise data. In SIGMOD'03. Google ScholarDigital Library
- R. Cheng et al. Efficient indexing methods for probabilistic threshold queries over uncertain data. In VLDB'04. Google ScholarDigital Library
- N. Dalvi and D. Suciu. Management of probabilistic data: foundations and challenges. In PODS'07. Google ScholarDigital Library
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB'04. Google ScholarDigital Library
- N. Dalvi and D. Suciu. The dichotomy of conjunctive queries on probabilistic structures. In PODS'07. Google ScholarDigital Library
- D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21(3):339--369, 1996. Google ScholarDigital Library
- R. Fagin et al. Optimal aggregation algorithms for middleware. In PODS'01. Google ScholarDigital Library
- J. L. Hodges and Jr. L. Le Cam. The poisson approximation to the poisson binomial distribution. The Annals of Mathematical Statistics, 31(3):737--740, 1960.Google ScholarCross Ref
- W. Hoeffding. On the distribution of the number of successes in independent trials. Annals of Mathematical Statistics, 27:713--721, 1956.Google ScholarCross Ref
- M. Hua et al. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In ICDE'08. Google ScholarDigital Library
- T. Imielinski and Jr. Witold Lipski. Incomplete information in relational databases. Journal of ACM, 31(4):761--791, 1984. Google ScholarDigital Library
- K. Lange. Numerical analysis for statisticians. Statistics and computing. 1999.Google Scholar
- S. K. Lee. Imprecise and uncertain information in databases: An evidential approach. In ICDE'92. Google ScholarDigital Library
- X. Lian and L. Chen. Probabilistic Ranked Queries in Uncertain Databases. In EDBT'08. Google ScholarDigital Library
- R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, United Kingdom, 1995. Google ScholarDigital Library
- J. Pei et al. Probabilistic skylines on uncertain data. In VLDB'07, Viena, Austria, September 2007. Google ScholarDigital Library
- C. Ré et al. Efficient top-k query evaluation on probabilistic data. In ICDE'07.Google Scholar
- A. D. Sarma et al. Working models for uncertain data. In ICDE'06. Google ScholarDigital Library
- A. Silberstein et al. A Sampling-Based Approach to Optimizing Top-k Queries in Sensor Networks. In ICDE'06. Google ScholarDigital Library
- S. Singh et al. Indexing uncertain categorical data. In ICDE'07.Google Scholar
- M. A. Soliman et al. Top-k query processing in uncertain databases. In ICDE'07.Google Scholar
- Y. Tao et al. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB'05. Google ScholarDigital Library
- Y. H. Wang. On the number of successes in independent trials. Statistica Sinica, 3:295--312, 1993.Google Scholar
- K. Yi et al. Efficient processing of top-k queries in uncertain databases. In ICDE'08. Google ScholarDigital Library
- X. Zhang and J. Chomicki. On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases. In DBRank'08 Workshop.Google Scholar
Index Terms
- Ranking queries on uncertain data: a probabilistic threshold approach
Recommendations
Ranking queries on uncertain data
Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top-k queries), a popular type of queries on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain ...
Top-k best probability queries and semantics ranking properties on probabilistic databases
There has been much interest in answering top-k queries on probabilistic data in various applications such as market analysis, personalized services, and decision making. In probabilistic relational databases, the most common problem in answering top-k ...
Probabilistic Reverse Nearest Neighbor Queries on Uncertain Data
Uncertain data are inherent in various important applications and reverse nearest neighbor (RNN) query is an important query type for many applications. While many different types of queries have been studied on uncertain data, there is no previous work ...
Comments