Article

Robust classification of rare queries using web knowledge

Authors:
Andrei Z. Broder, Yahoo Research
Marcus Fontoura, Yahoo Research
Evgeniy Gabrilovich, Yahoo Research
Amruta Joshi, Yahoo Research
Vanja Josifovski, Yahoo Research
Tong Zhang, Yahoo Research

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 2007, Pages 231–238. https://doi.org/10.1145/1277741.1277783

Published: 23 July 2007

ABSTRACT

We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.
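The blind feedback idea in the abstract (classify the retrieved pages, then infer the query's topic from them) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' system: the abstract does not say which document classifier is used or how per-page results are combined, so the cosine-to-centroid scoring, the summing of scores, the toy `CENTROIDS` table, and the `fake_search` stand-in are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy class centroids (term -> weight); not taken from the paper.
CENTROIDS = {
    "travel": Counter({"flight": 3, "hotel": 2, "airline": 2}),
    "electronics": Counter({"camera": 3, "lens": 2, "battery": 1}),
}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_document(text):
    """Score one result page against every class centroid."""
    vec = Counter(text.lower().split())
    return {label: cosine(vec, centroid) for label, centroid in CENTROIDS.items()}


def classify_query(query, search, top_k=10):
    """Classify the top_k pages returned for the query and sum the
    per-class scores; the class with the largest total is returned.
    Summation is an assumed aggregation rule, chosen for simplicity."""
    totals = Counter()
    for page in search(query)[:top_k]:
        for label, score in classify_document(page).items():
            totals[label] += score
    if not totals or max(totals.values()) == 0:
        return None  # no results, or no page matched any class
    return max(totals, key=totals.get)


def fake_search(query):
    """Stand-in for a search engine: returns page texts (assumption)."""
    return [
        "cheap flight deals and hotel booking",
        "airline flight schedule",
        "camera battery replacement",
    ]


print(classify_query("lax to jfk", fake_search))
```

In this toy run the query itself shares no terms with any centroid, so it is labeled only through the pages it retrieves. That is the property that makes the approach attractive for rare queries, which carry little evidence of their own.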

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment
    2. Information retrieval query processing


Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
blind relevance feedback
query classification
web search
Qualifiers
- Article
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total Citations: 145
- Total Downloads: 1,402
- Downloads (Last 12 months): 16
- Downloads (Last 6 weeks): 0