2017 | OriginalPaper | Book chapter

SMERA: Semantic Mixed Approach for Web Query Expansion and Reformulation

Written by: Bissan Audeh, Philippe Beaune, Michel Beigbeder

Published in: Advances in Knowledge Discovery and Management

Publisher: Springer International Publishing

Abstract

Matching users’ information needs with relevant documents is the basic goal of information retrieval systems. However, relevant documents do not necessarily contain the same terms as users’ queries. In this paper, we use semantics to better express users’ queries. Furthermore, we distinguish between two types of concepts: those extracted from a set of pseudo-relevant documents, and those extracted from a semantic resource such as an ontology. With this distinction in mind, we propose a Semantic Mixed query Expansion and Reformulation Approach (SMERA) that uses these two types of concepts to improve web queries. This approach addresses several challenges, such as the selective choice of expansion terms, the treatment of named entities, and the reformulation of the query in a user-friendly way. We evaluate SMERA on four standard web collections from the INEX and TREC evaluation campaigns. Our experiments show that SMERA improves the performance of an information retrieval system compared to unmodified original queries. In addition, our approach provides a statistically significant improvement in precision over a competitive query expansion method while generating concept-based queries that are more comprehensive and easier to interpret.
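
The distinction between the two concept sources can be illustrated with a minimal sketch. This is not the SMERA algorithm, whose details are not given in the abstract; it only assumes that one set of candidate terms comes from top-ranked (pseudo-relevant) documents and another from a lookup in a semantic resource. The function names, the tiny stopword list, the frequency-based selection, and the dictionary standing in for an ontology are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "for", "on"}


def feedback_concepts(query_terms, feedback_docs, k=3):
    """Return the k most frequent terms in the feedback documents,
    ignoring stopwords and terms already present in the query."""
    counts = Counter()
    for doc in feedback_docs:
        for token in doc.lower().split():
            if token not in STOPWORDS and token not in query_terms:
                counts[token] += 1
    return [term for term, _ in counts.most_common(k)]


def resource_concepts(query_terms, resource):
    """Look up each query term in a toy semantic resource (a plain dict
    standing in for an ontology) and collect the related concepts."""
    related = []
    for term in query_terms:
        for concept in resource.get(term, []):
            if concept not in query_terms and concept not in related:
                related.append(concept)
    return related


def expand_query(query, feedback_docs, resource, k=3):
    """Keep the two kinds of expansion concepts separate so that a later
    step could weight or present them differently."""
    query_terms = query.lower().split()
    return {
        "original": query_terms,
        "feedback_concepts": feedback_concepts(query_terms, feedback_docs, k),
        "resource_concepts": resource_concepts(query_terms, resource),
    }


# Toy data, invented for the example.
docs = [
    "jaguar habitat rainforest",
    "jaguar habitat prey",
    "jaguar habitat rainforest cat",
]
toy_resource = {"jaguar": ["panthera", "felidae"]}
result = expand_query("jaguar", docs, toy_resource, k=2)
print(result)
```

Keeping the two lists apart, rather than merging them into one bag of terms, mirrors the distinction drawn in the abstract; how the two sets are actually selected, weighted, and combined is described in the chapter itself.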

Footnotes
1
In this paper, we define an effective query as one that obtains good results on the standard measures used in evaluation campaigns, in particular precision measures in the case of web queries.
 
2
LSI: Latent Semantic Indexing (Deerwester et al. 1990).
 
3
Our experiments showed no significant difference between using Euclidean and cosine distances. In this paper, we used Euclidean distance because it is clearer for our graphical demonstration in Figs. 3 and 4.
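
As background on why the two distances can behave alike, the following sketch checks a standard identity: for unit-length vectors, the squared Euclidean distance equals twice the cosine distance, so both induce the same ordering of neighbours. Whether the vectors in the chapter were normalised is not stated here, so this is only one possible explanation, and the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


u = normalize([3.0, 4.0])
v = normalize([1.0, 0.0])

# For unit vectors: ||u - v||^2 = 2 * (1 - cos(u, v)).
print(euclidean(u, v) ** 2, 2 * cosine_distance(u, v))
```

For vectors that are not normalised, the two distances can rank neighbours differently, so the identity should not be read as a general equivalence.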
 
References
Audeh, B., Beaune, P., & Beigbeder, M. (2013). Recall-oriented evaluation for information retrieval systems. In Information Retrieval Facility Conference (IRFC), Limassol, Cyprus.
Barr, C., Jones, R., & Regelson, M. (2008). The linguistic structure of English web-search queries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1021–1030). Association for Computational Linguistics.
Bendersky, M., & Croft, W. B. (2008). Discovering key concepts in verbose queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 491–498). ACM.
Bendersky, M., Metzler, D., & Croft, W. B. (2012). Effective query formulation with multiple information sources. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 443–452). ACM.
Bendersky, M., Rey, M., & Croft, W. B. (2011). Parameterized concept weighting in verbose queries. In SIGIR. ACM Press.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Brandao, W., Silva, A., Moura, E., & Ziviani, N. (2011). Exploiting entity semantics for query expansion. In IADIS International Conference WWW/Internet, Rio de Janeiro.
Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 299).
Deerwester, S., Dumais, S. T., Furnas, G. W., & Landauer, T. K. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391–407.
Deveaud, R., Bonnefoy, L., & Bellot, P. (2013). Quantification et identification des concepts implicites d’une requête. In CORIA 2013, La dixième édition de la COnférence en Recherche d’Information et Applications, Neuchâtel.
Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI.
Hoffart, J., Yosef, M. A., Bordino, I., Furstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., & Weikum, G. (2011). Robust disambiguation of named entities in text. In EMNLP 2011 Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782–792).
Huston, S., & Croft, W. B. (2010). Evaluating verbose query processing techniques. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 291–298). ACM.
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, 36, 207–227.
Kumaran, G., & Carvalho, V. R. (2009). Reducing long queries using query quality predictors. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 564). NY, USA: ACM Press.
Lavrenko, V., & Croft, W. B. (2001). Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 120–127). NY, USA: ACM Press.
Maxwell, K. T., & Croft, W. B. (2013). Compact query term selection using topically related text. In Proceedings of the 36th International ACM SIGIR (pp. 583–592).
Metzler, D., & Croft, W. B. (2004). Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40, 735–750.
Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 472). NY, USA: ACM Press.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244.
Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 275–281). ACM.
Qiu, Y., & Frei, H. (1993). Concept based query expansion. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (Vol. 11, p. 212). NY: ACM.
Rocchio, J. J., & Salton, G. (1965). Information search optimization and iterative retrieval techniques. In Fall Joint Computer Conference (pp. 293–305).
Shah, C., & Croft, W. B. (2004). Evaluating high accuracy retrieval techniques. In SIGIR. ACM Press.
Strohman, T., Metzler, D., Turtle, H., & Croft, W. (2004). Indri: A language-model based search engine for complex queries. In Proceedings of the International Conference on Intelligence Analysis.
Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (pp. 697–706). ACM.
Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In SIGIR 1994. ACM Press.
Xu, Y., Ding, F., & Wang, B. (2008). Entity-based query reformulation using Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008) (p. 1441). NY, USA: ACM Press.
Zhao, L., & Callan, J. (2010). Term necessity prediction. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (pp. 259–268). ACM.
Zobel, J. (2004). Questioning query expansion: An examination of behaviour and parameters. In SIGIR. ACM Press.
Metadata
Title
SMERA: Semantic Mixed Approach for Web Query Expansion and Reformulation
Written by
Bissan Audeh
Philippe Beaune
Michel Beigbeder
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-45763-5_9