Skip to main content
Erschienen in: Discover Computing 3/2017

11.03.2017 | Information Retrieval Efficiency

Performance improvements for search systems using an integrated cache of lists + intersections

verfasst von: Gabriel Tolosa, Esteban Feuerstein, Luca Becchetti, Alberto Marchetti-Spaccamela

Erschienen in: Discover Computing | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Modern information retrieval systems use several levels of caching to speedup computation by exploiting frequent, recent or costly data used in the past. Previous studies show that the use of caching techniques is crucial in search engines, as it helps reducing query response times and processing workloads on search servers. In this work we propose and evaluate a static cache that acts simultaneously as list and intersection cache, offering a more efficient way of handling cache space. We also use a query resolution strategy that takes advantage of the existence of this cache to reorder the query execution sequence. In addition, we propose effective strategies to select the term pairs that should populate the cache. We also represent the data in cache in both raw and compressed forms and evaluate the differences between them using different configurations of cache sizes. The results show that the proposed Integrated Cache outperforms the standard posting lists cache in most of the cases, taking advantage not only of the intersection cache but also the query resolution strategy.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
2
The framework also supports the representation of three-term intersections at the expense of that structure of the integrated lists becomes more complex.
 
Literatur
Zurück zum Zitat Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1), 151–166.CrossRef Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1), 151–166.CrossRef
Zurück zum Zitat Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search (2nd ed.). Reading, MA: Addison-Wesley Prof., Inc. Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search (2nd ed.). Reading, MA: Addison-Wesley Prof., Inc.
Zurück zum Zitat Baeza-Yates, R., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., & Silvestri, F. (2007). The impact of caching on search engines. In Proceedings of the 30th international conference on research and development in information retrieval, SIGIR ’14. Baeza-Yates, R., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., & Silvestri, F. (2007). The impact of caching on search engines. In Proceedings of the 30th international conference on research and development in information retrieval, SIGIR ’14.
Zurück zum Zitat Barroso, L. A., Dean, J., & Hölzle, U. (2003). Web search for a planet: The google cluster architecture. IEEE Micro, 23(2), 22–28.CrossRef Barroso, L. A., Dean, J., & Hölzle, U. (2003). Web search for a planet: The google cluster architecture. IEEE Micro, 23(2), 22–28.CrossRef
Zurück zum Zitat Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. In Proceedings of the 12th international conference on information and knowledge management, CIKM ’03. (pp 426–434). New York, NY: ACM. Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. In Proceedings of the 12th international conference on information and knowledge management, CIKM ’03. (pp 426–434). New York, NY: ACM.
Zurück zum Zitat Cambazoglu, B. B., Zaragoza, H., Chapelle, O., Chen, J., Liao, C., Zheng, Z., et al. (2010). Early exit optimizations for additive machine learned ranking systems. In Proceedings of the 3rd ACM international conference on Web search and data mining, WSDM ’10. Cambazoglu, B. B., Zaragoza, H., Chapelle, O., Chen, J., Liao, C., Zheng, Z., et al. (2010). Early exit optimizations for additive machine learned ranking systems. In Proceedings of the 3rd ACM international conference on Web search and data mining, WSDM ’10.
Zurück zum Zitat Cao, P., & Irani, S. (1997). Cost-aware www proxy caching algorithms. In Proceedings of the USENIX symposium on internet technologies and systems on USENIX symposium on internet technologies and systems, USITS’97 (pp. 18–18). Berkeley, CA, USA: USENIX Association. Cao, P., & Irani, S. (1997). Cost-aware www proxy caching algorithms. In Proceedings of the USENIX symposium on internet technologies and systems on USENIX symposium on internet technologies and systems, USITS’97 (pp. 18–18). Berkeley, CA, USA: USENIX Association.
Zurück zum Zitat Catena, M., Macdonald, C., & Ounis, I. (2014). On inverted index compression for search engine efficiency. In M. de Rijke, T. Kenter, A. de Vries, C. Zhai, F. de Jong, K. Radinsky, & K. Hofmann (eds) Advances in information retrieval, lecture notes in computer science (Vol. 8416, pp. 359–371). New York: Springer. Catena, M., Macdonald, C., & Ounis, I. (2014). On inverted index compression for search engine efficiency. In M. de Rijke, T. Kenter, A. de Vries, C. Zhai, F. de Jong, K. Radinsky, & K. Hofmann (eds) Advances in information retrieval, lecture notes in computer science (Vol. 8416, pp. 359–371). New York: Springer.
Zurück zum Zitat Culpepper, J. S., & Moffat, A. (2007). Compact set representation for information retrieval. In Proceedings of the 14th International Conference on String Processing and Information Retrieval, SPIRE’07 (pp. 137–148). Berlin: Heidelberg. Culpepper, J. S., & Moffat, A. (2007). Compact set representation for information retrieval. In Proceedings of the 14th International Conference on String Processing and Information Retrieval, SPIRE’07 (pp. 137–148). Berlin: Heidelberg.
Zurück zum Zitat Dean, J. (2009). Challenges in building large-scale information retrieval systems: Invited talk. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM ’09 (pp. 1–1). New York, NY, USA: ACM. Dean, J. (2009). Challenges in building large-scale information retrieval systems: Invited talk. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM ’09 (pp. 1–1). New York, NY, USA: ACM.
Zurück zum Zitat Ding, S., Attenberg, J., Baeza-Yates, R., & Suel, T. (2011). Batch query processing for web search engines. In Proceedings of the 4th ACM international conference on Web search and data mining, WSDM ’11 (pp. 137–146). New York, NY, USA. Ding, S., Attenberg, J., Baeza-Yates, R., & Suel, T. (2011). Batch query processing for web search engines. In Proceedings of the 4th ACM international conference on Web search and data mining, WSDM ’11 (pp. 137–146). New York, NY, USA.
Zurück zum Zitat Edmonds, J. (1965). Maximum matching and a polyhedron with \(0,1\) vertices. Journal of Research of the National Bureau of Standards, 69(B), 125–130.MathSciNetCrossRefMATH Edmonds, J. (1965). Maximum matching and a polyhedron with \(0,1\) vertices. Journal of Research of the National Bureau of Standards, 69(B), 125–130.MathSciNetCrossRefMATH
Zurück zum Zitat Elias, P. (2006). Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2), 194–203.MathSciNetCrossRefMATH Elias, P. (2006). Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2), 194–203.MathSciNetCrossRefMATH
Zurück zum Zitat Fagni, T., Perego, R., Silvestri, F., & Orlando, S. (2006). Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Transactions on Information Systems, 24(1), 51–78.CrossRef Fagni, T., Perego, R., Silvestri, F., & Orlando, S. (2006). Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Transactions on Information Systems, 24(1), 51–78.CrossRef
Zurück zum Zitat Feuerstein, E., & Tolosa, G. (2013). Analysis of cost-aware policies for intersection caching in search nodes. In Proceedings of the XXXII conference of the Chilean society of computer science, SCCC ’13 Feuerstein, E., & Tolosa, G. (2013). Analysis of cost-aware policies for intersection caching in search nodes. In Proceedings of the XXXII conference of the Chilean society of computer science, SCCC ’13
Zurück zum Zitat Feuerstein, E., & Tolosa, G. (2014). Cost-aware intersection caching and processing strategies for in-memory inverted indexes. In Proceedings of 11th workshop on large-scale and distributed systems for information retrieval, LSDS-IR ’14. New York. Feuerstein, E., & Tolosa, G. (2014). Cost-aware intersection caching and processing strategies for in-memory inverted indexes. In Proceedings of 11th workshop on large-scale and distributed systems for information retrieval, LSDS-IR ’14. New York.
Zurück zum Zitat Gan, Q., & Suel, T. (2009). Improved techniques for result caching in web search engines. In Proceedings of the 18th international conference on World wide web, WWW ’09, pp. 431–440. Gan, Q., & Suel, T. (2009). Improved techniques for result caching in web search engines. In Proceedings of the 18th international conference on World wide web, WWW ’09, pp. 431–440.
Zurück zum Zitat Goldstein, J., Ramakrishnan, R., & Shaft, U. (1998). Compressing relations and indexes. In Proceedings of the 14th international conference on data engineering, ICDE ’98 (pp 370–379). Washington, DC, USA: IEEE Computer Society. Goldstein, J., Ramakrishnan, R., & Shaft, U. (1998). Compressing relations and indexes. In Proceedings of the 14th international conference on data engineering, ICDE ’98 (pp 370–379). Washington, DC, USA: IEEE Computer Society.
Zurück zum Zitat Golomb, S. (1966). Run-length encodings. In IEEE transactions on information theory (Vol. 12, No. 3). Golomb, S. (1966). Run-length encodings. In IEEE transactions on information theory (Vol. 12, No. 3).
Zurück zum Zitat Hirai, J., Raghavan, S., Garcia-Molina, H., & Paepcke, A. (2000). Webbase: A repository of web pages. In Proceedings of the 9th international World Wide Web conference on computer networks. Amsterdam: North-Holland Publishing Co. Hirai, J., Raghavan, S., Garcia-Molina, H., & Paepcke, A. (2000). Webbase: A repository of web pages. In Proceedings of the 9th international World Wide Web conference on computer networks. Amsterdam: North-Holland Publishing Co.
Zurück zum Zitat Jonassen, S., Cambazoglu, B. B., & Silvestri, F. (2012). Prefetching query results and its impact on search engines. In Proceedings of the 35th international conference on research and development in information retrieval, USA, SIGIR ’12, pp. 631–640. Jonassen, S., Cambazoglu, B. B., & Silvestri, F. (2012). Prefetching query results and its impact on search engines. In Proceedings of the 35th international conference on research and development in information retrieval, USA, SIGIR ’12, pp. 631–640.
Zurück zum Zitat Lam, H. T., Perego, R., Quan, N. T., & Silvestri, F. (2009). Entry pairing in inverted file. In Proceedings of the 10th international conference on Web information systems engineering, WISE ’09 (pp. 511–522). Berlin: Springer. Lam, H. T., Perego, R., Quan, N. T., & Silvestri, F. (2009). Entry pairing in inverted file. In Proceedings of the 10th international conference on Web information systems engineering, WISE ’09 (pp. 511–522). Berlin: Springer.
Zurück zum Zitat Levene, H. (1960). Contributions to probability and statistics: Essays in honor of harold hotelling. Redwood City: Stanford University Press. Levene, H. (1960). Contributions to probability and statistics: Essays in honor of harold hotelling. Redwood City: Stanford University Press.
Zurück zum Zitat Long, X., & Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In Proceedings of the 14th international conference on World Wide Web, USA, WWW ’05, pp 257–266. Long, X., & Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In Proceedings of the 14th international conference on World Wide Web, USA, WWW ’05, pp 257–266.
Zurück zum Zitat Macdonald, C., Ounis, I., & Tonellotto, N. (2011). Upper-bound approximations for dynamic pruning. ACM Transactions on Information Systems, 29(4), 17:1–17:28.CrossRef Macdonald, C., Ounis, I., & Tonellotto, N. (2011). Upper-bound approximations for dynamic pruning. ACM Transactions on Information Systems, 29(4), 17:1–17:28.CrossRef
Zurück zum Zitat Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York, NY, USA: Cambridge University Press.CrossRefMATH Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York, NY, USA: Cambridge University Press.CrossRefMATH
Zurück zum Zitat Markatos, E. (2001). On caching search engine query results. Computer Communications, 24(2), 137–143.CrossRef Markatos, E. (2001). On caching search engine query results. Computer Communications, 24(2), 137–143.CrossRef
Zurück zum Zitat Melink, S., Raghavan, S., Yang, B., & Garcia-Molina, H. (2001). Building a distributed full-text index for the web. ACM Transactions on Information Systems, 19(3), 217–241.CrossRef Melink, S., Raghavan, S., Yang, B., & Garcia-Molina, H. (2001). Building a distributed full-text index for the web. ACM Transactions on Information Systems, 19(3), 217–241.CrossRef
Zurück zum Zitat Ozcan, R., Altingovde, I. S., & Ulusoy, O. (2011). Cost-aware strategies for query result caching in web search engines. ACM Transactions on the Web, 5(2), 9:1–9:25.CrossRef Ozcan, R., Altingovde, I. S., & Ulusoy, O. (2011). Cost-aware strategies for query result caching in web search engines. ACM Transactions on the Web, 5(2), 9:1–9:25.CrossRef
Zurück zum Zitat Ozcan, R., Sengor Altingovde, I., Barla Cambazoglu, B., Junqueira, F. P., & Ulusoy, O. (2012). A five-level static cache architecture for web search engines. Information Processing & Management, 48(5), 828–840.CrossRef Ozcan, R., Sengor Altingovde, I., Barla Cambazoglu, B., Junqueira, F. P., & Ulusoy, O. (2012). A five-level static cache architecture for web search engines. Information Processing & Management, 48(5), 828–840.CrossRef
Zurück zum Zitat Pass, G., Chowdhury, A., & Torgeson, C. (2006). A picture of search. In Proceedings of the 1st international conference on scalable information systems. ACM, InfoScale ’06. Pass, G., Chowdhury, A., & Torgeson, C. (2006). A picture of search. In Proceedings of the 1st international conference on scalable information systems. ACM, InfoScale ’06.
Zurück zum Zitat Saraiva, P. C., Silva de Moura, E., Ziviani, N., Meira, W., Fonseca, R., & Riberio-Neto, B. (2001). Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th international conference on research and development in information retrieval, USA, SIGIR ’01, pp. 51–58. Saraiva, P. C., Silva de Moura, E., Ziviani, N., Meira, W., Fonseca, R., & Riberio-Neto, B. (2001). Rank-preserving two-level caching for scalable search engines. In Proceedings of the 24th international conference on research and development in information retrieval, USA, SIGIR ’01, pp. 51–58.
Zurück zum Zitat Skobeltsyn, G., Junqueira, F., Plachouras, V., & Baeza-Yates, R. (2008). Resin: A combination of results caching and index pruning for high-performance web search engines. In Proceedings of the 31th international conference on research and development in information retrieval, SIGIR ’08 (pp. 131–138). New York, NY, USA: ACM. Skobeltsyn, G., Junqueira, F., Plachouras, V., & Baeza-Yates, R. (2008). Resin: A combination of results caching and index pruning for high-performance web search engines. In Proceedings of the 31th international conference on research and development in information retrieval, SIGIR ’08 (pp. 131–138). New York, NY, USA: ACM.
Zurück zum Zitat Tolosa, G., Becchetti, L., Feuerstein, E., & Marchetti-Spaccamela, A. (2014). Performance improvements for search systems using an integrated cache of lists+intersections. In Proceedings of the 21st international symposium on string processing and information retrieval, SPIRE 2014 (Vol. 8799, pp. 227–235). New York, NY, USA: Springer. Tolosa, G., Becchetti, L., Feuerstein, E., & Marchetti-Spaccamela, A. (2014). Performance improvements for search systems using an integrated cache of lists+intersections. In Proceedings of the 21st international symposium on string processing and information retrieval, SPIRE 2014 (Vol. 8799, pp. 227–235). New York, NY, USA: Springer.
Zurück zum Zitat Turtle, H., & Flood, J. (1995). Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6), 831–850.CrossRef Turtle, H., & Flood, J. (1995). Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6), 831–850.CrossRef
Zurück zum Zitat Webber, W., & Moat, A. (2005). In search of reliable retrieval experiments. In Proceedings of the 10th Australasian document computing symposium, ADCS’05, pp. 26–33. Webber, W., & Moat, A. (2005). In search of reliable retrieval experiments. In Proceedings of the 10th Australasian document computing symposium, ADCS’05, pp. 26–33.
Zurück zum Zitat Williams, H. E., & Zobel, J. (1999). Compressing integers for fast file access. The Computer Journal, 42, 193–201.CrossRef Williams, H. E., & Zobel, J. (1999). Compressing integers for fast file access. The Computer Journal, 42, 193–201.CrossRef
Zurück zum Zitat Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing gigabytes (2nd Ed.). San Francisco, CA, USA: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers Inc. Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing gigabytes (2nd Ed.). San Francisco, CA, USA: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers Inc.
Zurück zum Zitat Young, N. E. (1998). On-line file caching. In Proceedings of the 9th annual ACM-SIAM symposium on discrete algorithms, society for industrial and applied mathematics. Philadelphia, PA, USA, SODA ’98, pp 82–86. Young, N. E. (1998). On-line file caching. In Proceedings of the 9th annual ACM-SIAM symposium on discrete algorithms, society for industrial and applied mathematics. Philadelphia, PA, USA, SODA ’98, pp 82–86.
Zurück zum Zitat Zhang, J., Long, X., & Suel, T. (2008). Performance of compressed inverted list caching in search engines. In Proceedings of the 17th international conference on World Wide Web, USA, WWW ’08, pp. 387–396. Zhang, J., Long, X., & Suel, T. (2008). Performance of compressed inverted list caching in search engines. In Proceedings of the 17th international conference on World Wide Web, USA, WWW ’08, pp. 387–396.
Zurück zum Zitat Zobel, J., & Moffat, A. (2006). Inverted files for text search engines. ACM Computer Surveys, 38(2), Article No 6. Zobel, J., & Moffat, A. (2006). Inverted files for text search engines. ACM Computer Surveys, 38(2), Article No 6.
Zurück zum Zitat Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22nd international conference on data engineering, ICDE ’06 (pp. 59–70). Washington, DC, USA: IEEE Computer Society. Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22nd international conference on data engineering, ICDE ’06 (pp. 59–70). Washington, DC, USA: IEEE Computer Society.
Metadaten
Titel
Performance improvements for search systems using an integrated cache of lists + intersections
verfasst von
Gabriel Tolosa
Esteban Feuerstein
Luca Becchetti
Alberto Marchetti-Spaccamela
Publikationsdatum
11.03.2017
Verlag
Springer Netherlands
Erschienen in
Discover Computing / Ausgabe 3/2017
Print ISSN: 2948-2984
Elektronische ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-017-9299-5

Weitere Artikel der Ausgabe 3/2017

Discover Computing 3/2017 Zur Ausgabe

Information Retrieval Efficiency

Efficient distributed selective search