ABSTRACT
Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting relevant verticals (if any) for queries issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independent of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We consider 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.
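To make the second source of evidence concrete, the sketch below shows one way logs of queries issued directly to a vertical could be turned into a per-vertical score. It is an illustration only: the abstract does not specify how features are computed, so the add-one-smoothed unigram language model, the function names (`build_unigram_model`, `query_log_likelihood`, `rank_verticals`), and the toy logs in `TOY_LOGS` are all assumptions introduced here, not the paper's method or data.

```python
import math
from collections import Counter

# Hypothetical toy query logs, one list per vertical (not data from the paper).
TOY_LOGS = {
    "news": ["election results", "breaking news today", "weather storm news"],
    "images": ["cat pictures", "sunset wallpaper", "funny cat images"],
    "jobs": ["software engineer jobs", "nurse jobs near me", "part time jobs"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_unigram_model(queries):
    """Return (term counts, total term count) for one vertical's query log."""
    counts = Counter(tok for q in queries for tok in tokenize(q))
    return counts, sum(counts.values())


def query_log_likelihood(query, model, vocab_size, alpha=1.0):
    """Log-probability of the query under an additively smoothed unigram model."""
    counts, total = model
    return sum(
        math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        for tok in tokenize(query)
    )


def rank_verticals(query, vertical_logs):
    """Rank verticals by how likely their query-log model is to generate the query."""
    vocab = {tok for qs in vertical_logs.values() for q in qs for tok in tokenize(q)}
    vocab_size = len(vocab) + 1  # reserve one slot for unseen terms
    scores = {
        name: query_log_likelihood(query, build_unigram_model(qs), vocab_size)
        for name, qs in vertical_logs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_verticals("cat wallpaper", TOY_LOGS):
        print(f"{name}: {score:.3f}")
```

A score like this would be only one feature among many. It also always produces a ranking, so on its own it cannot express the "if any" case in which no vertical is relevant; a classifier or threshold over such features would be needed for that.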