Discover Computing

01-08-2013 | Search Intents and Diversification

Learning to rank query suggestions for adhoc and diversity search

Authors: Rodrygo L. T. Santos, Craig Macdonald, Iadh Ounis

Published in: Discover Computing | Issue 4/2013

Abstract

Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search.
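The structured representation described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical log schema of `(session_id, query, clicked_url)` records, and the function names and the term-overlap features are illustrative stand-ins, not the features used in the article. Each candidate suggestion is given three fields: its own text, the queries that share a session with it, and the queries that share a clicked URL with it.

```python
from collections import defaultdict


def build_suggestion_fields(log):
    """Build a fielded representation for every query in a log.

    `log` is an iterable of (session_id, query, clicked_url) tuples,
    where clicked_url may be None. This schema is an assumption.
    """
    by_session = defaultdict(set)
    by_click = defaultdict(set)
    for session_id, query, url in log:
        by_session[session_id].add(query)
        if url is not None:
            by_click[url].add(query)

    fields = defaultdict(lambda: {"query": "", "sessions": set(), "clicks": set()})
    for queries in by_session.values():
        for q in queries:
            fields[q]["query"] = q
            # Other queries issued in the same session.
            fields[q]["sessions"] |= queries - {q}
    for queries in by_click.values():
        for q in queries:
            # Other queries that led to a click on the same URL.
            fields[q]["clicks"] |= queries - {q}
    return dict(fields)


def field_overlap_features(query, candidate):
    """Toy per-field features: fraction of query terms found in each field."""
    terms = set(query.lower().split())

    def overlap(texts):
        field_terms = set()
        for text in texts:
            field_terms |= set(text.lower().split())
        return len(terms & field_terms) / len(terms) if terms else 0.0

    return {
        "query": overlap([candidate["query"]]),
        "sessions": overlap(candidate["sessions"]),
        "clicks": overlap(candidate["clicks"]),
    }


log = [
    ("s1", "jaguar", None),
    ("s1", "jaguar car price", "cars.example/jaguar"),
    ("s2", "jaguar xf review", "cars.example/jaguar"),
    ("s3", "jaguar animal", "zoo.example/jaguar"),
]
fields = build_suggestion_fields(log)
print(field_overlap_features("jaguar price", fields["jaguar xf review"]))
```

In this toy log, the candidate "jaguar xf review" matches only one of the two terms of the query "jaguar price" on its own text, but its click field matches both, because it shares a clicked URL with "jaguar car price". Per-field scores of this kind could then be passed as features to a learning-to-rank model.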

An analogy to the document ranking problem can be made in which field-based models, such as BM25F (Zaragoza et al. 2004), leverage evidence from fields such as the title, body, URL, or the anchor text of incoming hyperlinks in order to score a document.

http://www.gnu.org/software/gzip

http://terrier.org.

http://boston.lti.cs.cmu.edu/Data/clueweb09/

http://research.microsoft.com/en-us/um/people/nickcr/wscd09

All rankings were obtained in February 2012 using Bing API v2.0.

All query suggestions were obtained in February 2012 using Bing API v2.0.

Note that suggestions with a relevance label 1 (i.e., with a positive yet lower retrieval effectiveness than that attained by the initial query) are also considered, as they may bring useful evidence for the diversification scenario addressed in Sec. 6.2.

Alonso, O., Rose, D. E., & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42(2), 9–15.

Amati, G. (2003). Probabilistic models for information retrieval based on divergence from randomness. PhD thesis, University of Glasgow.

Amati, G., Ambrosi, E., Bianchi, M., Gaibisso, C., & Gambosi, G. (2007). FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog track. In Proceedings of TREC.

Baeza-Yates, R. A., Hurtado, C. A., & Mendoza, M. (2004). Query recommendation using query logs in search engines. In Proceedings of ClustWeb at EDBT (pp. 588–596).

Boldi, P., Bonchi, F., Castillo, C., Donato, D., Gionis, A., & Vigna, S. (2008). The query-flow graph: Model and applications. In Proceedings of CIKM (pp. 609–618).

Boldi, P., Bonchi, F., Castillo, C., Donato, D., & Vigna, S. (2009). Query suggestions using query-flow graphs. In Proceedings of WSCD at WSDM (pp. 56–63).

Broccolo, D., Marcon, L., Nardini, F. M., Perego, R., & Silvestri, F. (2012). Generating suggestions for queries in the long tail with an inverted index. Information Processing and Management, 48(2), 326–339.

Burges, C. J. C. (2010). From RankNet to LambdaRank to LambdaMART: An overview. Technical report MSR-TR-2010-82, Microsoft Research.

Carterette, B., Allan, J., & Sitaraman, R. (2006). Minimal test collections for retrieval evaluation. In Proceedings of SIGIR (pp. 268–275).

Carterette, B., Pavlu, V., Kanoulas, E., Aslam, J. A., & Allan, J. (2009). If I had a million queries. In Proceedings of ECIR (pp. 288–300). New York: Springer.

Chapelle, O., & Chang, Y. (2011). Yahoo! learning to rank challenge overview. Journal of Machine Learning Research, 14, 1–24.

Chapelle, O., Metzler, D., Zhang, Y., & Grinspan, P. (2009). Expected reciprocal rank for graded relevance. In Proceedings of CIKM (pp. 621–630).

Clarke, C. L. A., Craswell, N., & Soboroff, I. (2009). Overview of the TREC 2009 Web track. In Proceedings of TREC.

Clarke, C. L. A., Craswell, N., Soboroff, I., & Ashkan, A. (2011). A comparative analysis of cascade measures for novelty and diversity. In Proceedings of WSDM (pp. 75–84).

Clarke, C. L. A., Craswell, N., Soboroff, I., & Cormack, G. V. (2010). Overview of the TREC 2010 Web track. In Proceedings of TREC.

Clarke, C. L. A., Craswell, N., Soboroff, I., & Voorhees, E. M. (2011). Overview of the TREC 2011 Web track. In Proceedings of TREC.

Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., & MacKinnon, I. (2008). Novelty and diversity in information retrieval evaluation. In Proceedings of SIGIR (pp. 659–666).

Clarke, C. L. A., Kolla, M., & Vechtomova, O. (2009). An effectiveness measure for ambiguous and underspecified queries. In Proceedings of ICTIR (pp. 188–199).

Cucerzan, S., & White, R. W. (2007). Query suggestion based on user landing pages. In Proceedings of SIGIR (pp. 875–876). New York: ACM.

Dang, V., Bendersky, M., & Croft, W. B. (2010). Learning to rank query reformulations. In Proceedings of SIGIR (pp. 807–808). ACM.

Dean, J. (2009). Challenges in building large-scale information retrieval systems: invited talk. In Proceedings of WSDM (p. 1). New York: ACM.

Downey, D., Dumais, S., & Horvitz, E. (2007). Heads and tails: studies of web search with common and rare queries. In Proceedings of SIGIR (pp. 847–848).

Fonseca, B. M., Golgher, P. B., De Moura, E. S., Pôssas, B., & Ziviani, N. (2003). Discovering search engine related queries using association rules. Journal of Web Engineering, 2, 215–227.

Ganjisaffar, Y., Caruana, R., & Lopes, C. (2011). Bagging gradient-boosted trees for high precision, low variance ranking models. In Proceedings of SIGIR (pp. 85–94), Beijing, China.

Hauff, C., Kelly, D., & Azzopardi, L. (2010). A comparison of user and system query performance predictions. In Proceedings of CIKM (pp. 979–988).

Jansen, B. J., Spink, A., Bateman, J., & Saracevic, T. (1998). Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1), 5–17.

Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.

Jones, R., Rey, B., Madani, O., & Greiner, W. (2006). Generating query substitutions. In Proceedings of WWW (pp. 387–396).

Liu, T.-Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3), 225–331.

Mei, Q., Zhou, D., & Church, K. (2008). Query suggestion using hitting time. In Proceedings of CIKM (pp. 469–478).

Metzler, D. (2007). Automatic feature selection in the Markov random field model for information retrieval. In Proceedings of CIKM (pp. 253–262).

Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In Proceedings of SIGIR (pp. 472–479).

Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., & Lioma, C. (2006). Terrier: A high performance and scalable information retrieval platform. In Proceedings of OSIR at SIGIR.

Peng, J., Macdonald, C., He, B., Plachouras, V., & Ounis, I. (2007). Incorporating term dependency in the DFR framework. In Proceedings of SIGIR. New York: ACM Press.

Qin, T., Liu, T.-Y., Xu, J., & Li, H. (2009). LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 13(4), 347–374.

Robertson, S. (2008). On the optimisation of evaluation metrics. In Proceedings of LR4IR at SIGIR.

Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1994). Okapi at TREC-3. In Proceedings of TREC.

Santos, R. L. T., Macdonald, C., & Ounis, I. (2010). Exploiting query reformulations for web search result diversification. In Proceedings of WWW (pp. 881–890).

Santos, R. L. T., Macdonald, C., & Ounis, I. (2011). How diverse are web search results? In Proceedings of SIGIR (pp. 1187–1188).

Santos, R. L. T., Macdonald, C., & Ounis, I. (2011). Intent-aware search result diversification. In Proceedings of SIGIR (pp. 595–604).

Sheldon, D., Shokouhi, M., Szummer, M., & Craswell, N. (2011). LambdaMerge: merging the results of query reformulations. In Proceedings of WSDM (pp. 795–804).

Silvestri, F. (2010). Mining query logs: turning search usage data into knowledge. Foundations and Trends in Information Retrieval, 4(1–2), 1–174.

Song, R., Luo, Z., Nie, J.-Y., Yu, Y., & Hon, H.-W. (2009). Identification of ambiguous queries in web search. Information Processing and Management, 45(2), 216–229.

Song, Y., Zhou, D., & Wei He, L. (2011). Post-ranking query suggestion by diversifying search results. In Proceedings of SIGIR (pp. 815–824). Beijing, China.

Spärck-Jones, K., Robertson, S. E., & Sanderson, M. (2007). Ambiguous requests: Implications for retrieval tests, systems and theories. SIGIR Forum, 41(2), 8–17.

Szpektor, I., Gionis, A., & Maarek, Y. (2011). Improving recommendation for long-tail queries via templates. In Proceedings of WWW (pp. 47–56).

Wang, X., & Zhai, C. (2008). Mining term association patterns from search logs for effective query reformulation. In Proceedings of CIKM (pp. 479–488).

Zaragoza, H., Craswell, N., Taylor, M. J., Saria, S., & Robertson, S. E. (2004). Microsoft Cambridge at TREC 13: Web and hard tracks. In Proceedings of TREC.

Zhai, C., & Lafferty, J. (2001). A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of SIGIR (pp. 334–342).

Zhang, Z., & Nasraoui, O. (2006). Mining search engine query logs for query recommendation. In Proceedings of WWW (pp. 1039–1040).

Title: Learning to rank query suggestions for adhoc and diversity search
Authors: Rodrygo L. T. Santos
Craig Macdonald
Iadh Ounis
Publication date: 01-08-2013
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 4/2013
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-012-9211-2
