ABSTRACT
This work presents the use of click graphs to improve query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous work on query classification has primarily focused on improving the feature representation of queries, e.g., by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach: instead of enriching the feature representation, we aim to drastically increase the amount of training data through semi-supervised learning with click graphs (bipartite graphs linking queries to the URLs clicked for them). Specifically, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two applications, product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, classifiers using only query words and phrases as features can work remarkably well.
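The abstract does not spell out the propagation rule, so the following is only a minimal sketch under stated assumptions. Scores flow from labeled queries to the URLs they were clicked on and back again, as in generic two-step label propagation on a bipartite graph. The content-based regularization is modeled as a hypothetical `content_score` function mixed into every query update. The function `propagate`, its weights `alpha` and `beta`, and the toy click data are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


def propagate(
    clicks: Dict[Tuple[str, str], float],
    seed: Dict[str, float],
    content_score: Callable[[str], float],
    alpha: float = 0.6,
    beta: float = 0.3,
    iters: int = 30,
) -> Dict[str, float]:
    """Hypothetical label propagation on a bipartite query-URL click graph.

    clicks: (query, url) -> click count.
    seed: labeled queries -> +1.0 (in class) or -1.0 (not in class).
    content_score: query -> score in [-1, 1] from a content-based classifier
        (an assumed stand-in for the regularizer).
    alpha, beta: weights on graph evidence and seed labels; the remaining
        1 - alpha - beta goes to the content-based score.
    """
    assert alpha >= 0 and beta >= 0 and alpha + beta <= 1
    q_nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)  # query -> {url: clicks}
    u_nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)  # url -> {query: clicks}
    for (q, u), c in clicks.items():
        q_nbrs[q][u] = c
        u_nbrs[u][q] = c

    y = {q: seed.get(q, 0.0) for q in q_nbrs}  # unlabeled queries start at 0
    prior = {q: content_score(q) for q in q_nbrs}
    f = dict(y)
    for _ in range(iters):
        # Query -> URL: each URL takes the click-weighted mean of its queries' scores.
        g = {u: sum(c * f[q] for q, c in nb.items()) / sum(nb.values())
             for u, nb in u_nbrs.items()}
        # URL -> query: mix the click-weighted mean of URL scores with the seed
        # label and the content-based score, so a query the content model
        # disagrees with is pulled back instead of passing on a wrong label.
        f = {q: alpha * sum(c * g[u] for u, c in nb.items()) / sum(nb.values())
                + beta * y[q] + (1 - alpha - beta) * prior[q]
             for q, nb in q_nbrs.items()}
    return f


# Toy click graph (hypothetical data): two product queries, two job queries.
clicks = {
    ("cheap laptops", "shop.example/laptops"): 10,
    ("laptop deals", "shop.example/laptops"): 5,
    ("laptop deals", "review.example/laptops"): 1,
    ("software engineer jobs", "jobs.example/se"): 8,
    ("developer jobs", "jobs.example/se"): 4,
}
seed = {"cheap laptops": 1.0, "software engineer jobs": -1.0}
# A neutral content model (always 0.0) so that only the graph decides here.
scores = propagate(clicks, seed, content_score=lambda q: 0.0)
# "laptop deals" ends up positive and "developer jobs" negative.
```

Because every update is a convex combination of values in [-1, 1] with `alpha < 1`, the iteration converges. Queries whose final score clears some confidence threshold could then be added as automatically labeled training examples; the weights and any threshold here are arbitrary choices for illustration.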