research-article

Sources of evidence for vertical selection

Authors:
Jaime Arguello, Carnegie Mellon University, Pittsburgh, PA, USA
Fernando Diaz, Yahoo! Labs Montreal, Montreal, PQ, Canada
Jamie Callan, Carnegie Mellon University, Pittsburgh, PA, USA
Jean-Francois Crespo, Yahoo! Labs Montreal, Montreal, PQ, Canada

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 2009, pages 315–322. https://doi.org/10.1145/1571941.1571997

Published: 19 July 2009

ABSTRACT

Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting which verticals, if any, are relevant to a query issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection can draw on resources unique to this setting that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independently of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We study 18 verticals, which differ in semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals challenges specific to individual verticals and provides insight for future work on vertical selection.
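The abstract names query logs as one source of evidence but does not specify the feature computed from them. Purely as an illustration, the sketch below scores each vertical by how much better a Dirichlet-smoothed unigram model of that vertical's query log explains the query than a model of all logs pooled together (a log-likelihood ratio), and returns an empty list, meaning no vertical, when nothing clears a threshold. This is an assumed stand-in, not the paper's method: the function names, the smoothing parameter `mu = 100`, the zero threshold, and the toy logs are all hypothetical, and the query-string and corpus evidence are not modeled here.

```python
import math
from collections import Counter
from typing import Dict, List


class UnigramLM:
    """Maximum-likelihood unigram model over lowercased, whitespace-split queries."""

    def __init__(self, queries: List[str]) -> None:
        self.counts = Counter(tok for q in queries for tok in q.lower().split())
        self.total = sum(self.counts.values())

    def prob(self, term: str) -> float:
        return self.counts[term] / self.total if self.total else 0.0


def smoothed_log_likelihood(query: str, model: UnigramLM,
                            background: UnigramLM, mu: float) -> float:
    """log P(query | model), Dirichlet-smoothed toward the background model.

    Terms absent from the background (and hence from every log) are skipped,
    because their probability would be zero under every model.
    """
    score = 0.0
    for term in query.lower().split():
        p_bg = background.prob(term)
        if p_bg == 0.0:
            continue
        score += math.log((model.counts[term] + mu * p_bg) / (model.total + mu))
    return score


def vertical_scores(query: str, logs: Dict[str, List[str]],
                    mu: float = 100.0) -> Dict[str, float]:
    """Hypothetical feature: log-likelihood ratio of each vertical's log model vs. the pooled logs."""
    background = UnigramLM([q for qs in logs.values() for q in qs])
    # Smoothing the background toward itself leaves its probabilities unchanged,
    # so this is simply the background log-likelihood of the query.
    bg_ll = smoothed_log_likelihood(query, background, background, mu)
    return {
        name: smoothed_log_likelihood(query, UnigramLM(qs), background, mu) - bg_ll
        for name, qs in logs.items()
    }


def select_verticals(query: str, logs: Dict[str, List[str]],
                     threshold: float = 0.0) -> List[str]:
    """Verticals scoring above the threshold, best first; an empty list means web results only."""
    scores = vertical_scores(query, logs)
    chosen = [name for name, s in scores.items() if s > threshold]
    return sorted(chosen, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    toy_logs = {  # hypothetical query logs, not data from the paper
        "news": ["election results", "stock market news", "election polls"],
        "images": ["cat pictures", "sunset photos", "cat wallpaper"],
    }
    for q in ["election news", "cat photos", "weather"]:
        print(q, "->", select_verticals(q, toy_logs))
```

With the toy logs, "election news" selects only the news vertical, "cat photos" selects only images, and "weather", which appears in no log, selects nothing. Pooling all vertical logs as the background is a design choice made for the sketch; a real system would more likely use a general web query log as the reference distribution.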

Index Terms

1. Information systems
  1. Information retrieval

Recommendations

Adaptation of offline vertical selection predictions in the presence of user feedback
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Web search results often integrate content from specialized corpora known as verticals. Given a query, one important aspect of aggregated search is the selection of relevant verticals from a set of candidate verticals. One drawback to previous ...
Vertical selection in the information domain of children
JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

In this paper we explore the vertical selection methods in aggregated search in the specific domain of topics for children between 7 and 12 years old. A test collection consisting of 25 verticals, 3.8K queries and relevant assessments for a large sample ...
Evaluating reward and risk for vertical selection
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

The aggregation of search results from heterogeneous verticals (news, videos, blogs, etc) has become an important consideration in search. When aiming to select suitable verticals, from which items are selected to be shown along with the standard "ten ...

Published in
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
July 2009
896 pages
ISBN: 9781605584836
DOI: 10.1145/1571941
General Chairs: James Allan (University of Massachusetts Amherst, USA), Javed Aslam (Northeastern University, USA)
Program Chairs: Mark Sanderson (University of Sheffield, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Justin Zobel (University of Melbourne, Australia)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
aggregated search
distributed information retrieval
query classification
resource selection
vertical selection

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Article Metrics
- 131 Total Citations
- 1,494 Total Downloads
- Downloads (Last 12 months): 24
- Downloads (Last 6 weeks): 2