DOI: 10.1145/1390334.1390393
research-article

Learning query intent from regularized click graphs

Published: 20 July 2008

ABSTRACT

This work presents the use of click graphs in improving query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous work on query classification has primarily focused on improving the feature representation of queries, e.g., by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach: instead of enriching the feature representation, we aim to drastically increase the amount of training data through semi-supervised learning with click graphs. Specifically, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph. Moreover, we regularize the learning with click graphs by content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two different applications, product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, classifiers using only query words/phrases as features can work remarkably well.
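The idea of inferring labels through a click graph while anchoring them to a content-based score can be sketched in a few lines of Python. This is a minimal illustration only, not the algorithm from the paper: the update rule, the function name `propagate_labels`, the mixing weight `alpha`, the neutral default of 0.5, and the toy queries and URLs are all assumptions made for the example.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, prior=None, alpha=0.8, iterations=50):
    """Spread intent scores over a bipartite query-URL click graph.

    clicks: dict mapping (query, url) -> positive click count
    seeds:  dict mapping labeled query -> 1.0 (has intent) or 0.0 (does not)
    prior:  dict mapping query -> content-based score in [0, 1] (hypothetical
            output of some text classifier); missing queries default to 0.5
    alpha:  weight on the graph neighborhood versus the content-based prior
    """
    prior = prior or {}
    by_query = defaultdict(dict)
    by_url = defaultdict(dict)
    for (query, url), count in clicks.items():
        by_query[query][url] = count
        by_url[url][query] = count

    # Labeled queries start at their label, the rest at their prior.
    scores = {q: seeds.get(q, prior.get(q, 0.5)) for q in by_query}

    for _ in range(iterations):
        # A URL's score is the click-weighted average of its queries' scores.
        url_scores = {
            u: sum(c * scores[q] for q, c in qs.items()) / sum(qs.values())
            for u, qs in by_url.items()
        }
        for q, urls in by_query.items():
            if q in seeds:
                continue  # keep human labels fixed
            graph = sum(c * url_scores[u] for u, c in urls.items()) / sum(urls.values())
            # Regularize: pull the graph estimate toward the content-based prior.
            scores[q] = alpha * graph + (1 - alpha) * prior.get(q, 0.5)
    return scores


# Toy data: two labeled queries and two unlabeled ones (all hypothetical).
clicks = {
    ("cheap laptop", "shop.example/laptops"): 10,
    ("buy notebook pc", "shop.example/laptops"): 5,
    ("buy notebook pc", "news.example/tech"): 1,
    ("laptop history", "wiki.example/laptop"): 8,
    ("who invented the computer", "wiki.example/laptop"): 4,
}
seeds = {"cheap laptop": 1.0, "laptop history": 0.0}

scores = propagate_labels(clicks, seeds)
damped = propagate_labels(clicks, seeds, prior={"buy notebook pc": 0.0})
```

In this toy graph, "buy notebook pc" shares a clicked URL with a labeled product query and ends up above 0.5, while "who invented the computer" ends up below it. Passing a low content-based prior for a query, as in `damped`, lowers its final score, which is the role the abstract assigns to regularization: limiting the spread of erroneous labels.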


Published in

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334

Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
