Skip to main content
Top
Published in: Journal of Intelligent Information Systems 1/2017

02-04-2016

Efficient pruning for top-K ranking queries on attribute-wise uncertain datasets

Authors: Jianwen Chen, Ling Feng

Published in: Journal of Intelligent Information Systems | Issue 1/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Top-K ranking queries in uncertain databases aim to find the top-K tuples according to a ranking function. The interplay between score and uncertainty makes top-K ranking in uncertain databases an intriguing issue, leading to rich query semantics. Recently, a unified ranking framework based on parameterized ranking functions (PRFs) has been formulated, which generalizes many previously proposed ranking semantics. Under the PRFs based ranking framework, efficient pruning approach for Top-K ranking on datasets with tuple-wise uncertainty has been well studied in the literature. However, this cannot be applied to top-K ranking on datasets with attribute-wise uncertainty, which are often natural and useful in analyzing uncertain data in many applications. This paper aims to develop efficient pruning techniques for top-K ranking on datasets with attribute-wise uncertainty under the PRFs based ranking framework, which has not been well studied in the literature. We first develop a Tuple Insertion Based Algorithm for computing each tuple’s PRF value, which reduce the time cost from the state of the art cubic order of magnitude to quadratic order of magnitude. Based on the Tuple Insertion Based Algorithm, three pruning strategies are developed to further reduce the time cost. The mathematics of deriving the Tuple Insertion Based Algorithm and corresponding pruning strategies are also presented. At last, we show that our pruning algorithms can also be applied to the computation of the top-k aggregate queries. The experimental results on both real and synthetic data demonstrate the effectiveness and efficiency of the proposed pruning techniques.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
go back to reference Aggarwal, C.C., & Yu, P.S. (2009). A survey of uncertain data algorithms and applications: TKDE. Aggarwal, C.C., & Yu, P.S. (2009). A survey of uncertain data algorithms and applications: TKDE.
go back to reference Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., & Widom, J. (2006). Trio: A system for data, uncertainty, and lineage. In VLDB. Agrawal, P., Benjelloun, O., Sarma, A.D., Hayworth, C., Nabar, S., Sugihara, T., & Widom, J. (2006). Trio: A system for data, uncertainty, and lineage. In VLDB.
go back to reference Cheng, R., Kalahnikov, D.V., & Prabhakar, S. (2003). Evaluating probabilistic queries over imprecise data. In SIGMOD. Cheng, R., Kalahnikov, D.V., & Prabhakar, S. (2003). Evaluating probabilistic queries over imprecise data. In SIGMOD.
go back to reference DeGroot, M.H., & Schervish, M.J. (2011). Probability and Statistics, 4edn: Pearson. DeGroot, M.H., & Schervish, M.J. (2011). Probability and Statistics, 4edn: Pearson.
go back to reference Deshpande, A., Guestrin, C., Madden, S.R., Hellerstein, J.M., & Hong, W. (2004). Model-driven data acquisition in sensor networks. In VLDB. Deshpande, A., Guestrin, C., Madden, S.R., Hellerstein, J.M., & Hong, W. (2004). Model-driven data acquisition in sensor networks. In VLDB.
go back to reference Ge, T., Zdonik, S., & Madden, S. (2009). Top-k queries on uncertain data: On score distribution and typical answeres. In SIGMOD. Ge, T., Zdonik, S., & Madden, S. (2009). Top-k queries on uncertain data: On score distribution and typical answeres. In SIGMOD.
go back to reference Hua, M., Pei, J., Zhang, W., & Lin, X. (2008). Ranking queries on uncertain data: A probabilistic threshold approach. In SIGMOD. Hua, M., Pei, J., Zhang, W., & Lin, X. (2008). Ranking queries on uncertain data: A probabilistic threshold approach. In SIGMOD.
go back to reference Ilyas, I.F., Beskales, G., & Soliman, M.A. (2008). A survey of top-k query processing techniques in relational database systems: ACM Computing Surveys. Ilyas, I.F., Beskales, G., & Soliman, M.A. (2008). A survey of top-k query processing techniques in relational database systems: ACM Computing Surveys.
go back to reference Jestes, J., Cormode, G., Li, F., & Yi, K. (2011). Semantics of ranking queries for probabilistic data: TKDE. Jestes, J., Cormode, G., Li, F., & Yi, K. (2011). Semantics of ranking queries for probabilistic data: TKDE.
go back to reference Kanagal, B., & Deshpande, A. (2008). Online filtering, smoothing and probabilitic modeling of streaming data. In ICDE. Kanagal, B., & Deshpande, A. (2008). Online filtering, smoothing and probabilitic modeling of streaming data. In ICDE.
go back to reference Li, J., & Deshpande, A. (2010). Ranking continuous probabilistic datasets. In VLDB. Li, J., & Deshpande, A. (2010). Ranking continuous probabilistic datasets. In VLDB.
go back to reference Li, J., Saha, B., & Deshpande, A. (2009). A unified approach to ranking in probabilistic databases. In VLDB. Li, J., Saha, B., & Deshpande, A. (2009). A unified approach to ranking in probabilistic databases. In VLDB.
go back to reference Lian, X., & Chen, L. (2008). Probabilisitc ranked queries in uncertain databases. In EDBT. Lian, X., & Chen, L. (2008). Probabilisitc ranked queries in uncertain databases. In EDBT.
go back to reference Lian, X., & Chen, L. (2011). Probabilistic inverse ranking queries in uncertain databases: The VLDB Journal. Lian, X., & Chen, L. (2011). Probabilistic inverse ranking queries in uncertain databases: The VLDB Journal.
go back to reference Lyness, J.N. (1969). Notes on the adaptive simpson quadrature routine: Journal of ACM. Lyness, J.N. (1969). Notes on the adaptive simpson quadrature routine: Journal of ACM.
go back to reference Pei, J., Hua, M., Tao, Y., & Lin, X. (2008). Query answering techniques on uncertain and probabilistic data. In SIGMOD. Pei, J., Hua, M., Tao, Y., & Lin, X. (2008). Query answering techniques on uncertain and probabilistic data. In SIGMOD.
go back to reference Ré, C., Dalvi, N., & Suciu, D. (2007). Efficient top-k query evaluation on probabilistic data. In ICDE. Ré, C., Dalvi, N., & Suciu, D. (2007). Efficient top-k query evaluation on probabilistic data. In ICDE.
go back to reference Sarma, A.D., Benjelloun, O., Halevy, A., & Widom, J. (2006). Working models for uncertain data. Sarma, A.D., Benjelloun, O., Halevy, A., & Widom, J. (2006). Working models for uncertain data.
go back to reference Soliman, M.A., & Ilyab, I.F. (2008). Probabilistic top-k and ranking-aggregate queries: TODS. Soliman, M.A., & Ilyab, I.F. (2008). Probabilistic top-k and ranking-aggregate queries: TODS.
go back to reference Soliman, M.A., & Ilyas, I.F. (2009). Ranking with uncertain scores. In ICDE. Soliman, M.A., & Ilyas, I.F. (2009). Ranking with uncertain scores. In ICDE.
go back to reference Soliman, M.A., Ilyas, I.F., & Chang, K.C.C. (2007). Top-k query processing in uncertain databases. In ICDE. Soliman, M.A., Ilyas, I.F., & Chang, K.C.C. (2007). Top-k query processing in uncertain databases. In ICDE.
go back to reference Song, C., Li, Z., & Ge, T. (2013). Top-k oracle: A new way to present top-k tuples for uncertain data. In ICDE. Song, C., Li, Z., & Ge, T. (2013). Top-k oracle: A new way to present top-k tuples for uncertain data. In ICDE.
go back to reference Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., & Prabhakar, S. (2005). Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB. Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., & Prabhakar, S. (2005). Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB.
go back to reference Wang, C., Yuan, L.Y., You, H.H., & Zaiane, O.R. (2011). On pruning for top-k ranking in uncertain databases. In VLDB. Wang, C., Yuan, L.Y., You, H.H., & Zaiane, O.R. (2011). On pruning for top-k ranking in uncertain databases. In VLDB.
go back to reference Zhang, X., & Chomicki, J. (2009). Semantics and evaluation of top-k queries in probabilistic databases. Distrib Parallel Databases, 26, 67–126.CrossRef Zhang, X., & Chomicki, J. (2009). Semantics and evaluation of top-k queries in probabilistic databases. Distrib Parallel Databases, 26, 67–126.CrossRef
Metadata
Title
Efficient pruning for top-K ranking queries on attribute-wise uncertain datasets
Authors
Jianwen Chen
Ling Feng
Publication date
02-04-2016
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2017
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-016-0403-x

Other articles of this Issue 1/2017

Journal of Intelligent Information Systems 1/2017 Go to the issue

Premium Partner