Skip to main content
Top
Published in: Journal of Intelligent Information Systems 1/2015

01-02-2015

Answering keyword queries through cached subqueries in best match retrieval models

Authors: Myron Papadakis, Yannis Tzitzikas

Published in: Journal of Intelligent Information Systems | Issue 1/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Caching is one of the techniques that Information Retrieval Systems (IRS) and Web Search Engines (WSEs) use to reduce processing costs and attain faster response times. In this paper we introduce Top-K SCRC (Set Cover Results Cache), a novel technique for results caching which aims at maximizing the utilization of cache. Identical queries are treated as in plain results caching (i.e. their evaluation does not require accessing the index), while combinations of cached sub-queries are exploited as in posting lists caching, however the exploited subqueries are not necessarily single-word queries. The problem of finding the right set of cached subqueries to answer an incoming query, is actually the Exact Set Cover problem. This technique can be applied in any best match retrieval model that is based on a decomposable scoring function, and we show that several best-match retrieval models (i.e VSM, Okapi BM25 and hybrid retrieval models) rely on such scoring functions. To increase the capacity (in queries) of the cache only the top-K results of each cached query are stored and we introduce metrics for measuring the accuracy of the composed top-K answer. By analyzing queries submitted to real-world WSEs, we verified that there is a significant proportion of queries whose terms is the result of a union of the terms of other queries. The comparative evaluation over traces of real query sets showed that the Top-K SCRC is on the average two times faster than a plain Top-K RC for the same cache size.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Regarding semi-decomposable scoring functions, the SCRC structure does not need to change at all. The score of each document in a cached answer is its decomposable query-dependent score.
 
2
In query independent ranking models we do not have this issue.
 
Literature
go back to reference Baeza-Yate, R., Junqueira, F., Plachouras, V., Witschel, H. (2007). Admission policies for caches of search engine results. String Processing and Information Retrieval, (pp. 74–85). Springer. Baeza-Yate, R., Junqueira, F., Plachouras, V., Witschel, H. (2007). Admission policies for caches of search engine results. String Processing and Information Retrieval, (pp. 74–85). Springer.
go back to reference Baeza-Yates, R., Gionis, A., Junqueira, F.P., Murdock, V., Plachouras, V., Silvestri, F. (2008). Design trade-offs for search engine caching. ACM Transactions on the Web, 2(4), 1–28. doi:10.1145/1409220.1409223.CrossRef Baeza-Yates, R., Gionis, A., Junqueira, F.P., Murdock, V., Plachouras, V., Silvestri, F. (2008). Design trade-offs for search engine caching. ACM Transactions on the Web, 2(4), 1–28. doi:10.​1145/​1409220.​1409223.CrossRef
go back to reference Baeza-Yates, R., Junqueira, F., Plachouras, V., Witschel, H. (2007). Admission policies for caches of search engine results. Lecture Notes in Computer Science, 4726, 74.CrossRefMathSciNet Baeza-Yates, R., Junqueira, F., Plachouras, V., Witschel, H. (2007). Admission policies for caches of search engine results. Lecture Notes in Computer Science, 4726, 74.CrossRefMathSciNet
go back to reference Baeza-Yates, R., & Saint-Jean, F. (2003). A three level search engine index based in query log distribution. String Processing and Information Retrieval, (pp. 56–65). Springer. Baeza-Yates, R., & Saint-Jean, F. (2003). A three level search engine index based in query log distribution. String Processing and Information Retrieval, (pp. 56–65). Springer.
go back to reference Baeza-Yates, R.A., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., Silvestri, F. (2007). The impact of caching on search engines. SIGIR, (pp. 183–190). Baeza-Yates, R.A., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., Silvestri, F. (2007). The impact of caching on search engines. SIGIR, (pp. 183–190).
go back to reference Baeza-Yates, R.A., & Saint-Jean, F. (2003). A three level search engine index based in query log distribution. SPIRE, (pp. 56–65). Baeza-Yates, R.A., & Saint-Jean, F. (2003). A three level search engine index based in query log distribution. SPIRE, (pp. 56–65).
go back to reference Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. CIKM ’03: Procs of the 12th intern. conf. on Information and knowledge management, (pp. 426–434). New York: ACM. Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J. (2003). Efficient query evaluation using a two-level retrieval process. CIKM ’03: Procs of the 12th intern. conf. on Information and knowledge management, (pp. 426–434). New York: ACM.
go back to reference Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2001). Introduction to Algorithms, 2nd edn: The MIT Press and McGraw-Hill Book Company. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2001). Introduction to Algorithms, 2nd edn: The MIT Press and McGraw-Hill Book Company.
go back to reference Fafalios, P., Kitsos, I., Marketakis, Y., Baldassarre, C., Salampasis, M., Tzitzikas, Y. (2012). Web searching with entity mining at query time. Multidisciplinary Information Retrieval, 73–88. Fafalios, P., Kitsos, I., Marketakis, Y., Baldassarre, C., Salampasis, M., Tzitzikas, Y. (2012). Web searching with entity mining at query time. Multidisciplinary Information Retrieval, 73–88.
go back to reference Fagni, T., Perego, R., Silvestri, F., Orlando, S. (2006). Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Transactions on Information Systems, 24(1), 51–78.CrossRef Fagni, T., Perego, R., Silvestri, F., Orlando, S. (2006). Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Transactions on Information Systems, 24(1), 51–78.CrossRef
go back to reference Jansen, B., & Pooch, U. (2000). A review of web searching studies and a framework for future research. Journal of the American Society for Information Science and Technology, 52(3), 235–246.CrossRef Jansen, B., & Pooch, U. (2000). A review of web searching studies and a framework for future research. Journal of the American Society for Information Science and Technology, 52(3), 235–246.CrossRef
go back to reference Jansen, B., & Spink, A. (2005). An analysis of web searching by european alltheweb. com users. Information Processing & Management, 41(2), 361–381.CrossRef Jansen, B., & Spink, A. (2005). An analysis of web searching by european alltheweb. com users. Information Processing & Management, 41(2), 361–381.CrossRef
go back to reference Jansen, B., Spink, A., Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 36(2), 207–227.CrossRef Jansen, B., Spink, A., Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 36(2), 207–227.CrossRef
go back to reference Lempel, R., & Moran, S. (2003). Predictive caching and prefetching of query results in search engines. Procs of the 12th intern. conf. on World Wide Web, (pp. 19–28). New York: ACM. Lempel, R., & Moran, S. (2003). Predictive caching and prefetching of query results in search engines. Procs of the 12th intern. conf. on World Wide Web, (pp. 19–28). New York: ACM.
go back to reference Long, X., & Suel, T. (2006). Three-Level Caching for Efficient Query Processing in Large Web Search Engines. World Wide Web, 9(4), 369–395.CrossRef Long, X., & Suel, T. (2006). Three-Level Caching for Efficient Query Processing in Large Web Search Engines. World Wide Web, 9(4), 369–395.CrossRef
go back to reference Ma, H., & Wang, B. (2012). User-aware caching and prefetching query results in web search engines. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, (pp. 1163–1164). ACM. Ma, H., & Wang, B. (2012). User-aware caching and prefetching query results in web search engines. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, (pp. 1163–1164). ACM.
go back to reference Markatos, E. (2001). On caching search engine query results. Computer Communications, 24(2), 137–143.CrossRef Markatos, E. (2001). On caching search engine query results. Computer Communications, 24(2), 137–143.CrossRef
go back to reference Papadakos, P., Armenatzoglou, N., Kopidaki, S., Tzitzikas, Y. (2012). On exploiting static and dynamically mined metadata for exploratory web searching. Knowledge and Information Systems, 30(3), 493–525.CrossRef Papadakos, P., Armenatzoglou, N., Kopidaki, S., Tzitzikas, Y. (2012). On exploiting static and dynamically mined metadata for exploratory web searching. Knowledge and Information Systems, 30(3), 493–525.CrossRef
go back to reference Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., Tzitzikas, Y. (2008). Mitos: Design and evaluation of a dbms-based web search engine. Informatics, 2008. PCI’08. Panhellenic Conference on, (pp. 49–53): IEEE. Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., Tzitzikas, Y. (2008). Mitos: Design and evaluation of a dbms-based web search engine. Informatics, 2008. PCI’08. Panhellenic Conference on, (pp. 49–53): IEEE.
go back to reference Saraiva, P.C., de Moura, E.S., Fonseca, R.C., Wagner, M., Ribeiro-Neto, B.A., Ziviani, N. (2001). Rank-preserving two-level caching for scalable search engines. SIGIR, (pp. 51–58). Saraiva, P.C., de Moura, E.S., Fonseca, R.C., Wagner, M., Ribeiro-Neto, B.A., Ziviani, N. (2001). Rank-preserving two-level caching for scalable search engines. SIGIR, (pp. 51–58).
go back to reference Skobeltsyn, G., Junqueira, F., Plachouras, V., Baeza-Yates, R.A. (2008). Resin: a combination of results caching and index pruning for high-performance web search engines. SIGIR, (pp. 131–138). Skobeltsyn, G., Junqueira, F., Plachouras, V., Baeza-Yates, R.A. (2008). Resin: a combination of results caching and index pruning for high-performance web search engines. SIGIR, (pp. 131–138).
go back to reference Soffer, A., Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y.S. (2001). Static index pruning for information retrieval systems. SIGIR, (pp. 43–50). Soffer, A., Carmel, D., Cohen, D., Fagin, R., Farchi, E., Herscovici, M., Maarek, Y.S. (2001). Static index pruning for information retrieval systems. SIGIR, (pp. 43–50).
go back to reference Tzitzikas, Y., Spyratos, N., Constantopoulos, P. (2005). Mediators over taxonomy-based information sources. The VLDB Journal, 14(1), 112–136.CrossRef Tzitzikas, Y., Spyratos, N., Constantopoulos, P. (2005). Mediators over taxonomy-based information sources. The VLDB Journal, 14(1), 112–136.CrossRef
go back to reference Vazirani, V. (2001). Approximation algorithms: Springer. Vazirani, V. (2001). Approximation algorithms: Springer.
go back to reference Xie, Y., & O’Hallaron, D. (2002). Locality in search engine queries and its implications for caching. INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 3, pp. 1238–1247. IEEE. Xie, Y., & O’Hallaron, D. (2002). Locality in search engine queries and its implications for caching. INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 3, pp. 1238–1247. IEEE.
go back to reference Xie, Y., & O’Hallaron, D.R. (2002). Locality in search engine queries and its implications for caching. INFOCOM. Xie, Y., & O’Hallaron, D.R. (2002). Locality in search engine queries and its implications for caching. INFOCOM.
go back to reference Zhang, J., Long, X., Suel, T. (2008). Performance of compressed inverted list caching in search engines. WWW, (pp. 387–396). Zhang, J., Long, X., Suel, T. (2008). Performance of compressed inverted list caching in search engines. WWW, (pp. 387–396).
Metadata
Title
Answering keyword queries through cached subqueries in best match retrieval models
Authors
Myron Papadakis
Yannis Tzitzikas
Publication date
01-02-2015
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2015
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0330-7

Other articles of this Issue 1/2015

Journal of Intelligent Information Systems 1/2015 Go to the issue

Premium Partner