Skip to main content
Erschienen in:
Buchtitelbild

2019 | OriginalPaper | Buchkapitel

Fast Distributed Top-q and Top-k Query Processing

verfasst von : Claus Dabringer, Johann Eder

Erschienen in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XLI

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Top-k queries retrieve the k results of a query which score best for an objective function representing the preferences of users. To require that the returned results also have to satisfy the preferences to a certain degree we introduce top-q queries which return all results which approximate the user preferences to at least some minim degree q. We show how top-q queries and top-k queries can be combined enabling the user to post a large number of interesting queries. Furthermore, we show that the calculation of top-q queries can be integrated in algorithms efficiently processing top-k queries. We implemented our approach and evaluated it against the fastest threshold based top-k query answering approaches (BPA-2). Our experiments showed an improvement by one to two orders of magnitude regarding time and memory requirements. Furthermore, we show how such queries can be processed in highly distributed peer-to-peer databases in an efficient way and propose an adaptive algorithm which takes several parameters of the network of databases into account to optimize the processing of distributed top-k queries.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Agrawal, S., Chaudhuri, S.: Automated ranking of database query results. In: CIDR, pp. 888–899 (2003) Agrawal, S., Chaudhuri, S.: Automated ranking of database query results. In: CIDR, pp. 888–899 (2003)
3.
Zurück zum Zitat Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19, 67–86 (2006)CrossRef Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19, 67–86 (2006)CrossRef
4.
Zurück zum Zitat Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of the 33rd Internatinal Conference on Very Large Databases, pp. 495–506. VLDB Endowment (2007) Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of the 33rd Internatinal Conference on Very Large Databases, pp. 495–506. VLDB Endowment (2007)
5.
Zurück zum Zitat Asslaber, M., Abuja, P., et al.: The Genome Austria Tissue Bank (GATIB). Pathobiology 74, 251–258 (2007)CrossRef Asslaber, M., Abuja, P., et al.: The Genome Austria Tissue Bank (GATIB). Pathobiology 74, 251–258 (2007)CrossRef
6.
Zurück zum Zitat Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 174–185. IEEE Computer Society (2005) Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 174–185. IEEE Computer Society (2005)
7.
Zurück zum Zitat Church, K., Gale, W.: Inverse document frequency (IDF): a measure of deviations from Poisson. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds.) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol. 11, pp. 283–295. Springer, Dordrecht (1999). https://doi.org/10.1007/978-94-017-2390-9_18CrossRef Church, K., Gale, W.: Inverse document frequency (IDF): a measure of deviations from Poisson. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds.) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol. 11, pp. 283–295. Springer, Dordrecht (1999). https://​doi.​org/​10.​1007/​978-94-017-2390-9_​18CrossRef
8.
Zurück zum Zitat Ciglic, M., Eder, J., Koncilia, C.: Anonymization of data sets with NULL values. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. LNCS, vol. 9510, pp. 193–220. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49214-7_7CrossRef Ciglic, M., Eder, J., Koncilia, C.: Anonymization of data sets with NULL values. In: Hameurlain, A., Küng, J., Wagner, R., Decker, H., Lhotska, L., Link, S. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV. LNCS, vol. 9510, pp. 193–220. Springer, Heidelberg (2016). https://​doi.​org/​10.​1007/​978-3-662-49214-7_​7CrossRef
9.
Zurück zum Zitat Conner, W., Hwang, S.-W., Nahrstedt, K.: Unified framework for top-k query processing in peer-to-peer networks. Technical report, University of Illinois (2007) Conner, W., Hwang, S.-W., Nahrstedt, K.: Unified framework for top-k query processing in peer-to-peer networks. Technical report, University of Illinois (2007)
10.
Zurück zum Zitat Dabringer, C.: Efficient local and distributed query processing in a biomedical environment. Ph.D. thesis, Alpen Adria Universität Klagenfurt (2012) Dabringer, C.: Efficient local and distributed query processing in a biomedical environment. Ph.D. thesis, Alpen Adria Universität Klagenfurt (2012)
11.
Zurück zum Zitat Dabringer, C., Eder, J.: Efficient top-k retrieval for user preference queries. In: Proceedings of the 26th ACM Symposium on Applied Computing (2011) Dabringer, C., Eder, J.: Efficient top-k retrieval for user preference queries. In: Proceedings of the 26th ACM Symposium on Applied Computing (2011)
12.
Zurück zum Zitat Dabringer, C., Eder, J.: Fast top-k query answering. In: Proceedings of the 22th International Conference on Database and Expert Systems Applications (2011) Dabringer, C., Eder, J.: Fast top-k query answering. In: Proceedings of the 22th International Conference on Database and Expert Systems Applications (2011)
15.
17.
Zurück zum Zitat Eder, J., Gottweis, H., Zatloukal, K.: IT solutions for privacy protection in biobanking. Publ. Health Genom. 15(5), 254–262 (2012)CrossRef Eder, J., Gottweis, H., Zatloukal, K.: IT solutions for privacy protection in biobanking. Publ. Health Genom. 15(5), 254–262 (2012)CrossRef
18.
Zurück zum Zitat Eder, J., Koncilia, C., Morzy, T.: A model for a temporal data warehouse. In: Open Enterprise Solutions: Systems, Experiences and Organizations (OES-SEO 2001). Luiss Edizioni (2001) Eder, J., Koncilia, C., Morzy, T.: A model for a temporal data warehouse. In: Open Enterprise Solutions: Systems, Experiences and Organizations (OES-SEO 2001). Luiss Edizioni (2001)
19.
Zurück zum Zitat Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the 2001 ACM Symposium on Principles of Database Systems, pp. 102–113. ACM, New York (2001) Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the 2001 ACM Symposium on Principles of Database Systems, pp. 102–113. ACM, New York (2001)
20.
Zurück zum Zitat Fang, Q., Yang, G.: Efficient top-k query processing algorithms in highly distributed environments. J. Comput. 9(9), 2000–2006 (2014)CrossRef Fang, Q., Yang, G.: Efficient top-k query processing algorithms in highly distributed environments. J. Comput. 9(9), 2000–2006 (2014)CrossRef
21.
Zurück zum Zitat Fang, Q., Zhao, Y., Yang, G., Wang, B., Zheng, W.: Best position algorithms for top-k query processing in highly distributed environments. In: Proceedings of the 2010 First International Conference on Networking and Distributed Computing, ICNDC 2010, pp. 397–401. IEEE Computer Society, Washington, DC (2010) Fang, Q., Zhao, Y., Yang, G., Wang, B., Zheng, W.: Best position algorithms for top-k query processing in highly distributed environments. In: Proceedings of the 2010 First International Conference on Networking and Distributed Computing, ICNDC 2010, pp. 397–401. IEEE Computer Society, Washington, DC (2010)
22.
Zurück zum Zitat Feuerstein, S., Pribyl, B.: Oracle PL/SQL Programming, 5th edn. Paperback, Sebastopol (2009)MATH Feuerstein, S., Pribyl, B.: Oracle PL/SQL Programming, 5th edn. Paperback, Sebastopol (2009)MATH
23.
Zurück zum Zitat Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010) Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)
24.
Zurück zum Zitat Guntzer, U., Balke, W.-T., Kiessling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of the 26th International Conference on Very Large Databases, pp. 419–428. Morgan Kaufmann Publishers Inc., San Francisco (2000) Guntzer, U., Balke, W.-T., Kiessling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of the 26th International Conference on Very Large Databases, pp. 419–428. Morgan Kaufmann Publishers Inc., San Francisco (2000)
25.
Zurück zum Zitat Guntzer, U., Balke, W.-T., Kiessling, W.: Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the IEEE International Conference on IT: Coding and Computing, pp. 622–628 (2001) Guntzer, U., Balke, W.-T., Kiessling, W.: Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the IEEE International Conference on IT: Coding and Computing, pp. 622–628 (2001)
26.
Zurück zum Zitat Hagihara, R., Shinohara, M., Hara, T., Nishio, S.: A message processing method for top-k query for traffic reduction in ad hoc networks. In: Proceedings of the Tenth Interenational Conference on Mobile Data Management, MDM 2009, pp. 11–20. IEEE Computer Society (2009) Hagihara, R., Shinohara, M., Hara, T., Nishio, S.: A message processing method for top-k query for traffic reduction in ad hoc networks. In: Proceedings of the Tenth Interenational Conference on Mobile Data Management, MDM 2009, pp. 11–20. IEEE Computer Society (2009)
27.
Zurück zum Zitat Hofer-Picout, P., et al.: Conception and implementation of an Austrian biobank directory integration framework. Biopreservation Biobanking 15(4), 332–340 (2017)CrossRef Hofer-Picout, P., et al.: Conception and implementation of an Austrian biobank directory integration framework. Biopreservation Biobanking 15(4), 332–340 (2017)CrossRef
28.
Zurück zum Zitat Hristidis, V., Hu, Y., Ipeirotis, P.G.: Ranked queries over sources with Boolean query interfaces without ranking support. In: 26th IEEE International Conference on Data Engineering (2010) Hristidis, V., Hu, Y., Ipeirotis, P.G.: Ranked queries over sources with Boolean query interfaces without ranking support. In: 26th IEEE International Conference on Data Engineering (2010)
29.
Zurück zum Zitat Hua, M., Pei, J., Fu, A.W.C., Lin, X., Leung, H.-F.: Efficiently answering top-k typicality queries on large databases. In: Proceedings of the 33rd Interenational Conference on Very Large Databases, pp. 890–901. VLDB Endowment (2007) Hua, M., Pei, J., Fu, A.W.C., Lin, X., Leung, H.-F.: Efficiently answering top-k typicality queries on large databases. In: Proceedings of the 33rd Interenational Conference on Very Large Databases, pp. 890–901. VLDB Endowment (2007)
30.
Zurück zum Zitat Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4), 1–58 (2008)CrossRef Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4), 1–58 (2008)CrossRef
31.
Zurück zum Zitat Levandoski, J.J., Mokbel, M.F., Khalefa, M.E., Korukanti, V.R.: FlexPref: a framework for extensible preference evaluation in database systems. In: ICDE, New York, NY, USA (2010) Levandoski, J.J., Mokbel, M.F., Khalefa, M.E., Korukanti, V.R.: FlexPref: a framework for extensible preference evaluation in database systems. In: ICDE, New York, NY, USA (2010)
32.
Zurück zum Zitat Litton, J.-E.: Launch of an infrastructure for health research: BBMRI-ERIC. Biopreservation Biobanking 16, 233–241 (2018)CrossRef Litton, J.-E.: Launch of an infrastructure for health research: BBMRI-ERIC. Biopreservation Biobanking 16, 233–241 (2018)CrossRef
33.
Zurück zum Zitat Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient top-k aggregation of ranked inputs. ACM Trans. Database Syst. 32(3), 19 (2007)CrossRef Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient top-k aggregation of ranked inputs. ACM Trans. Database Syst. 32(3), 19 (2007)CrossRef
34.
Zurück zum Zitat Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)CrossRef Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)CrossRef
35.
Zurück zum Zitat Nepal, S., Ramakrishna, M.: Query processing issues in image (multimedia) databases. In: ICDE, pp. 22–29 (1999) Nepal, S., Ramakrishna, M.: Query processing issues in image (multimedia) databases. In: ICDE, pp. 22–29 (1999)
36.
Zurück zum Zitat Owens, K.T.: Building Intelligent Databases with Oracle PL/SQL, Triggers, and Stored Procedures, 2nd edn. Prentice-Hall Inc., Upper Saddle River (1998) Owens, K.T.: Building Intelligent Databases with Oracle PL/SQL, Triggers, and Stored Procedures, 2nd edn. Prentice-Hall Inc., Upper Saddle River (1998)
37.
Zurück zum Zitat Robertson, S.: Understanding inverse document frequency: on theoretical arguments for idf. J. Doc. 60, 503–520 (2004)CrossRef Robertson, S.: Understanding inverse document frequency: on theoretical arguments for idf. J. Doc. 60, 503–520 (2004)CrossRef
40.
Zurück zum Zitat Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Procedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 753–764. ACM (2008) Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Procedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 753–764. ACM (2008)
41.
Zurück zum Zitat Wichmann, H.-E., Kuhn, K., et al.: Comprehensive catalog of European biobanks. Nat. Biotechnol. 29(9), 795–797 (2011)CrossRef Wichmann, H.-E., Kuhn, K., et al.: Comprehensive catalog of European biobanks. Nat. Biotechnol. 29(9), 795–797 (2011)CrossRef
Metadaten
Titel
Fast Distributed Top-q and Top-k Query Processing
verfasst von
Claus Dabringer
Johann Eder
Copyright-Jahr
2019
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-58808-6_1