Skip to main content
Erschienen in: Discover Computing 3/2017

13.03.2017 | Information Retrieval Efficiency

Waves: a fast multi-tier top-k query processing algorithm

verfasst von: Caio Moura Daoud, Edleno Silva de Moura, David Fernandes, Altigran Soares da Silva, Cristian Rossi, Andre Carvalho

Erschienen in: Discover Computing | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we present Waves, a novel document-at-a-time algorithm for fast computing of top-k query results in search systems. The Waves algorithm uses multi-tier indexes for processing queries. It performs successive tentative evaluations of results which we call waves. Each wave traverses the index, starting from a specific tier level i. Each wave i may insert only those documents that occur in that tier level into the answer. After processing a wave, the algorithm checks whether the answer achieved might be changed by successive waves or not. A new wave is started only if it has a chance of changing the top-k scores. We show through experiments that such lazy query processing strategy results in smaller query processing times when compared to previous approaches proposed in the literature. We present experiments to compare Waves’ performance to the state-of-the-art document-at-a-time query processing methods that preserve top-k results and show scenarios where the method can be a good alternative algorithm for computing top-k results.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The function for Waves will be available only in case acceptance, since the repository is public and can be reached by other people, but still the other codes maybe useful for the reviewers in case of doubts about the implementation of the methods.
 
Literatur
Zurück zum Zitat Anh, V., & Moffat, A. (2006). Pruned query evaluation using pre-computed impacts. In: ACM SIGIR (pp. 372–379). Anh, V., & Moffat, A. (2006). Pruned query evaluation using pre-computed impacts. In: ACM SIGIR (pp. 372–379).
Zurück zum Zitat Anh, V. N., de Kretser, O., & Moffat, A. (2001). Vector-space ranking with effective early termination. In ACM SIGIR (pp. 35–42). Anh, V. N., de Kretser, O., & Moffat, A. (2001). Vector-space ranking with effective early termination. In ACM SIGIR (pp. 35–42).
Zurück zum Zitat Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval (2nd ed.). Reading: Addison-Wesley Publishing Company. Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval (2nd ed.). Reading: Addison-Wesley Publishing Company.
Zurück zum Zitat Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. In ACM CIKM (pp. 426–434). Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. In ACM CIKM (pp. 426–434).
Zurück zum Zitat Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y. S., et al. (2001). Static index pruning for information retrieval systems. In ACM SIGIR (pp. 43–50). Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y. S., et al. (2001). Static index pruning for information retrieval systems. In ACM SIGIR (pp. 43–50).
Zurück zum Zitat Carvalho, A., Rossi, C., de Moura, E. S., Fernandes, D., & da Silva, A. S. (2012). LePrEF: Learn to pre-compute evidence fusion for efficient query evaluation. JASIST, 55(92), 1–28. Carvalho, A., Rossi, C., de Moura, E. S., Fernandes, D., & da Silva, A. S. (2012). LePrEF: Learn to pre-compute evidence fusion for efficient query evaluation. JASIST, 55(92), 1–28.
Zurück zum Zitat Chakrabarti, K., Chaudhuri, S., & Ganti, V. (2011). Interval-based pruning for top-k processing over compressed lists. In Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11 (pp. 709–720). IEEE Computer Society, Washington, DC, USA. doi:10.1109/ICDE.2011.5767855. Chakrabarti, K., Chaudhuri, S., & Ganti, V. (2011). Interval-based pruning for top-k processing over compressed lists. In Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11 (pp. 709–720). IEEE Computer Society, Washington, DC, USA. doi:10.​1109/​ICDE.​2011.​5767855.
Zurück zum Zitat Daoud, C., de Moura, E. S., Fernandes, D., da Silva, A. S., Carvalho, A. L., & Rossi, C. (2016). Fast top-k preserving query processing using two-tier indexes. Information Processing & Management, 52(5), 855–872.CrossRef Daoud, C., de Moura, E. S., Fernandes, D., da Silva, A. S., Carvalho, A. L., & Rossi, C. (2016). Fast top-k preserving query processing using two-tier indexes. Information Processing & Management, 52(5), 855–872.CrossRef
Zurück zum Zitat Dimopoulos, C., Nepomnyachiy, S., & Suel, T. (2013). A candidate filtering mechanism for fast top-k query processing on modern cpus. In ACM SIGIR (pp. 723–732). Dimopoulos, C., Nepomnyachiy, S., & Suel, T. (2013). A candidate filtering mechanism for fast top-k query processing on modern cpus. In ACM SIGIR (pp. 723–732).
Zurück zum Zitat Ding, S., & Suel, T. (2011). Faster top-k document retrieval using block-max indexes. In ACM SIGIR (pp. 993–1002). Ding, S., & Suel, T. (2011). Faster top-k document retrieval using block-max indexes. In ACM SIGIR (pp. 993–1002).
Zurück zum Zitat Fontoura, M., Josifovski, V., Liu, J., Venkatesan, S., Zhu, X., & Zien, J. Y. (2011). Evaluation strategies for top-k queries over memory-resident inverted indexes. PVLDB, 4(12), 1213–1224. Fontoura, M., Josifovski, V., Liu, J., Venkatesan, S., Zhu, X., & Zien, J. Y. (2011). Evaluation strategies for top-k queries over memory-resident inverted indexes. PVLDB, 4(12), 1213–1224.
Zurück zum Zitat Herrera, M. R., de Moura, E. S., Cristo, M., Silva, T. P., & da Silva, A. S. (2010). Exploring features for the automatic identification of user goals in web search. Information Processing & Management, 46(2), 131–142.CrossRef Herrera, M. R., de Moura, E. S., Cristo, M., Silva, T. P., & da Silva, A. S. (2010). Exploring features for the automatic identification of user goals in web search. Information Processing & Management, 46(2), 131–142.CrossRef
Zurück zum Zitat Moura, E Sd, Santos, C Fd, Araujo, Bd S, Silva, A Sd, Calado, P., Nascimento, M. A., et al. (2008). Locality-based pruning methods for web search. ACM Transactions on Information Systems (TOIS), 26(2), 9.CrossRef Moura, E Sd, Santos, C Fd, Araujo, Bd S, Silva, A Sd, Calado, P., Nascimento, M. A., et al. (2008). Locality-based pruning methods for web search. ACM Transactions on Information Systems (TOIS), 26(2), 9.CrossRef
Zurück zum Zitat Ntoulas, A., & Cho, J. (2007). Pruning policies for two-tiered inverted index with correctness guarantee. In ACM SIGIR (pp. 191–198). Ntoulas, A., & Cho, J. (2007). Pruning policies for two-tiered inverted index with correctness guarantee. In ACM SIGIR (pp. 191–198).
Zurück zum Zitat Ottaviano, G., & Venturini, R. (2014). Partitioned Elias–Fano indexes. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (pp. 273–282). ACM. Ottaviano, G., & Venturini, R. (2014). Partitioned Elias–Fano indexes. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (pp. 273–282). ACM.
Zurück zum Zitat Risvik, K., Aasheim, Y., & Lidal, M. (2003). Multi-tier architecture for web search engines. In First Latin American web congress (pp. 132–143). Risvik, K., Aasheim, Y., & Lidal, M. (2003). Multi-tier architecture for web search engines. In First Latin American web congress (pp. 132–143).
Zurück zum Zitat Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In ACM SIGIR (pp. 232–241). Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In ACM SIGIR (pp. 232–241).
Zurück zum Zitat Rossi, C., de Moura, E. S., Carvalho, A. L., & da Silva, A. S. (2013). Fast document-at-a-time query processing using two-tier indexes. In ACM SIGIR (pp. 183–192). Rossi, C., de Moura, E. S., Carvalho, A. L., & da Silva, A. S. (2013). Fast document-at-a-time query processing using two-tier indexes. In ACM SIGIR (pp. 183–192).
Zurück zum Zitat Salton, G., Wong, A., & Yang, C. S. (1974). A vector space model for automatic indexing. Tech. Rep., Ithaca, NY. Salton, G., Wong, A., & Yang, C. S. (1974). A vector space model for automatic indexing. Tech. Rep., Ithaca, NY.
Zurück zum Zitat Shan, D., Ding, S., He, J., Yan, H., & Li, X. (2012). Optimized top-k processing with global page scores on block-max indexes. In WSDM (pp. 423–432). Shan, D., Ding, S., He, J., Yan, H., & Li, X. (2012). Optimized top-k processing with global page scores on block-max indexes. In WSDM (pp. 423–432).
Zurück zum Zitat Silvestri, F. (2007). Sorting out the document identifier assignment problem. In European conference on information retrieval (pp. 101–112). Springer. Silvestri, F. (2007). Sorting out the document identifier assignment problem. In European conference on information retrieval (pp. 101–112). Springer.
Zurück zum Zitat Skobeltsyn, G., Junqueira, F., Plachouras, V., & Baeza-Yates, R. (2008). ResIn: A combination of results caching and index pruning for high-performance web search engines. In ACM SIGIR (pp. 131–138). Skobeltsyn, G., Junqueira, F., Plachouras, V., & Baeza-Yates, R. (2008). ResIn: A combination of results caching and index pruning for high-performance web search engines. In ACM SIGIR (pp. 131–138).
Zurück zum Zitat Strohman, T., & Croft, W. B. (2007). Efficient document retrieval in main memory. In ACM SIGIR (pp. 175–182). Strohman, T., & Croft, W. B. (2007). Efficient document retrieval in main memory. In ACM SIGIR (pp. 175–182).
Zurück zum Zitat Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22nd international conference on data engineering, ICDE’06 pp. 59. IEEE Computer Society, Washington, DC, USA. Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22nd international conference on data engineering, ICDE’06 pp. 59. IEEE Computer Society, Washington, DC, USA.
Metadaten
Titel
Waves: a fast multi-tier top-k query processing algorithm
verfasst von
Caio Moura Daoud
Edleno Silva de Moura
David Fernandes
Altigran Soares da Silva
Cristian Rossi
Andre Carvalho
Publikationsdatum
13.03.2017
Verlag
Springer Netherlands
Erschienen in
Discover Computing / Ausgabe 3/2017
Print ISSN: 2948-2984
Elektronische ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-017-9298-6

Weitere Artikel der Ausgabe 3/2017

Discover Computing 3/2017 Zur Ausgabe

Information Retrieval Efficiency

Efficient distributed selective search