research-article

Learning query intent from regularized click graphs

Authors:
Xiao Li, Microsoft Research, Redmond, WA, USA
Ye-Yi Wang, Microsoft Research, Redmond, WA, USA
Alex Acero, Microsoft Research, Redmond, WA, USA

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 2008, pages 339–346. https://doi.org/10.1145/1390334.1390393

Published: 20 July 2008

ABSTRACT

This work presents the use of click graphs to improve query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous work on query classification has primarily focused on improving the feature representation of queries, e.g., by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach: instead of enriching the feature representation, we aim to drastically increase the amount of training data through semi-supervised learning with click graphs. Specifically, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two different applications: product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, classifiers using only query words/phrases as features can work remarkably well.
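The propagation idea described in the abstract (inferring the intent of unlabeled queries from labeled queries that share clicked URLs, with content-based classification acting as a regularizer) can be illustrated with a small sketch. The code below is a generic iterative label propagation over a bipartite query-URL click graph; it is not the exact update rule or regularizer used in the paper. Assumptions: click counts arrive as a hypothetical `(query, url) -> count` dictionary, seed labels are +1 (has the intent) or -1 (does not), and the optional `content_prior` stands in for the output of a content-based classifier, mixed into the anchor term of unlabeled queries with a hypothetical weight `beta`. The function name `propagate_intent` and the parameters `alpha`, `beta` and `iters` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def propagate_intent(
    clicks: Dict[Tuple[str, str], float],
    seeds: Dict[str, float],
    content_prior: Optional[Dict[str, float]] = None,
    alpha: float = 0.8,
    beta: float = 0.5,
    iters: int = 30,
) -> Dict[str, float]:
    """Score every clicked query in [-1, 1]; > 0 means "has the intent".

    clicks:        hypothetical (query, url) -> positive click count.
    seeds:         labeled queries, +1.0 (intent) or -1.0 (no intent).
    content_prior: hypothetical content-based classifier scores in [-1, 1]
                   for unlabeled queries (the assumed regularizer).
    alpha:         weight on graph evidence versus the anchor term.
    beta:          how strongly the content prior anchors unlabeled queries.
    """
    # Bipartite adjacency in both directions, weighted by click counts.
    query_urls: Dict[str, Dict[str, float]] = defaultdict(dict)
    url_queries: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (query, url), count in clicks.items():
        query_urls[query][url] = count
        url_queries[url][query] = count

    # Anchor: seed label for labeled queries; for unlabeled queries, a
    # down-weighted content-based score if one is given, else 0 (no evidence).
    anchor: Dict[str, float] = {}
    for query in query_urls:
        if query in seeds:
            anchor[query] = seeds[query]
        elif content_prior is not None:
            anchor[query] = beta * content_prior.get(query, 0.0)
        else:
            anchor[query] = 0.0

    scores = dict(anchor)
    for _ in range(iters):
        # Query -> URL: each URL takes the click-weighted mean of its queries.
        url_scores = {
            url: sum(c * scores[q] for q, c in qs.items()) / sum(qs.values())
            for url, qs in url_queries.items()
        }
        # URL -> query: blend the click-weighted mean of URL scores with the anchor.
        scores = {
            q: alpha * sum(c * url_scores[u] for u, c in us.items()) / sum(us.values())
            + (1 - alpha) * anchor[q]
            for q, us in query_urls.items()
        }
    return scores


if __name__ == "__main__":
    # Made-up toy click graph, for illustration only.
    demo_clicks = {
        ("cheap laptops", "shop.example.com"): 10,
        ("laptop deals", "shop.example.com"): 5,
        ("laptop deals", "wiki.example.org"): 1,
        ("laptop history", "wiki.example.org"): 8,
    }
    demo_seeds = {"cheap laptops": 1.0, "laptop history": -1.0}
    for q, s in sorted(propagate_intent(demo_clicks, demo_seeds).items()):
        print(f"{q}: {s:+.3f}")
```

On this toy graph, the unlabeled query "laptop deals" shares most of its clicks with the positive seed and ends up with a positive score; passing a negative content prior for it pulls that score down, which is the kind of damping a content-based regularizer is meant to provide. Thresholding the resulting scores, or keeping only high-confidence queries, is one way such a scheme could produce automatically labeled training data.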

Index Terms

- Information systems
  - Information retrieval
    - Information retrieval query processing

Recommendations

Understanding user's query intent with wikipedia
WWW '09: Proceedings of the 18th international conference on World wide web

Understanding the intent behind a user's query can help search engine to automatically route the query to some corresponding vertical search engines to obtain particularly relevant contents, thus, greatly improving user satisfaction. There are three ...

Learning with click graph for query intent classification

Topical query classification, as one step toward understanding users' search intent, is gaining increasing attention in information retrieval. Previous works on this subject primarily focused on enrichment of query features, for example, by augmenting ...

Regularized query classification using search click information

Hundreds of millions of users each day submit queries to the Web search engine. The user queries are typically very short which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and ...

Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
General Chairs: Tat-Seng Chua (National University of Singapore); Mun-Kew Leong (National Library Board, Singapore)
Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea); Douglas W. Oard (University of Maryland, College Park, USA); Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
click-through data
query classification
semi-supervised learning
user intent
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- 197 total citations
- 1,982 total downloads
- Downloads (last 12 months): 35
- Downloads (last 6 weeks): 11