DOI: 10.1145/1277741.1277783
Article

Robust classification of rare queries using web knowledge

Published: 23 July 2007

ABSTRACT

We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.
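The blind feedback idea in the abstract (classify the results a query retrieves, then infer the query's topic from them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy index, the toy taxonomy, the keyword-overlap document classifier, and the majority-vote aggregation are hypothetical stand-ins, not the classifier or voting scheme used in the paper.

```python
from collections import Counter

# Hypothetical toy "search engine": query -> ranked result snippets.
TOY_INDEX = {
    "brake pad wear sensor": [
        "replace car brake pads and rotors",
        "auto parts store brake sensor",
        "vehicle repair guide for brakes",
    ],
}

# Hypothetical toy taxonomy: class label -> indicative terms.
TOY_TAXONOMY = {
    "Automotive": {"car", "auto", "vehicle", "brake", "brakes"},
    "Health": {"doctor", "symptom", "medicine"},
}


def search(query, k):
    """Return up to k result texts for the query (stand-in for a real engine)."""
    return TOY_INDEX.get(query, [])[:k]


def classify_document(text):
    """Label a document by keyword overlap; return None if no term matches."""
    tokens = text.lower().split()
    scores = Counter()
    for label, terms in TOY_TAXONOMY.items():
        scores[label] = sum(1 for token in tokens if token in terms)
    label, score = scores.most_common(1)[0]
    return label if score > 0 else None


def classify_query(query, k=10):
    """Classify a query from its top-k results (assumed majority vote)."""
    votes = Counter()
    for doc in search(query, k):
        label = classify_document(doc)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


print(classify_query("brake pad wear sensor"))
```

The query itself is never classified directly. Only the retrieved texts are, which is what lets a short or rare query borrow evidence from longer documents.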

Published in

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
