ABSTRACT
The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation and correction. Yet, no matter how much data is aggregated, the long-tail distribution implies that a large fraction of queries are rare. As a result, most query-assistance services perform poorly or are not even triggered on long-tail queries. We propose a method to extend the reach of query-assistance techniques (and in particular query recommendation) to long-tail queries by reasoning about rules between query templates rather than about individual query transitions, as is currently done in query-flow graph models. As a simple example, if we recognize that 'Montezuma' is a city in the rare query "Montezuma surf" and if the rule '<city> surf → <city> beach' has been observed, we are able to offer "Montezuma beach" as a recommendation, even if the two queries were never observed in the same session. We conducted experiments to validate our hypothesis, first via traditional small-scale editorial assessments but, more interestingly, via a novel automated large-scale evaluation methodology. Our experiments show that templates yield a relative increase of 24% in general coverage without penalizing quality. Furthermore, for the 36% of the 95M queries in our query-flow graph that have no out-edges, and thus could not be served recommendations, we can now offer at least one recommendation in 98% of the cases.
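The 'Montezuma' example can be made concrete with a minimal sketch. It is not the paper's implementation: the entity dictionary `ENTITY_TYPES`, the rule table `TEMPLATE_RULES`, and the function `recommend` are hypothetical names introduced here for illustration, and the sketch assumes single-token entities and exact template matching.

```python
from typing import Dict, List

# Hypothetical entity dictionary: surface form -> entity type (assumed data).
ENTITY_TYPES: Dict[str, str] = {"montezuma": "city", "lisbon": "city"}

# Hypothetical template rules, assumed to have been mined from frequent
# query transitions: source template -> target templates.
TEMPLATE_RULES: Dict[str, List[str]] = {"<city> surf": ["<city> beach"]}


def recommend(query: str) -> List[str]:
    """Suggest queries by abstracting one token to its type and applying rules."""
    tokens = query.lower().split()
    suggestions: List[str] = []
    for i, token in enumerate(tokens):
        entity_type = ENTITY_TYPES.get(token)
        if entity_type is None:
            continue  # token is not a known entity, so it cannot fill a slot
        slot = f"<{entity_type}>"
        # Replace the entity token with its typed slot to form the template.
        template = " ".join(tokens[:i] + [slot] + tokens[i + 1:])
        for target in TEMPLATE_RULES.get(template, []):
            # Instantiate the target template with the original entity.
            suggestions.append(target.replace(slot, token))
    return suggestions


print(recommend("Montezuma surf"))  # ['montezuma beach']
```

The rare query never needs to have been seen in a session: the rule table carries evidence from frequent queries that share its template, and the entity fills the slot on the way back.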