Article · DOI: 10.1145/860435.860449

Query type classification for web document retrieval

Published: 28 July 2003

ABSTRACT

The heterogeneous Web exacerbates IR problems, and short user queries make them worse. The content of web documents alone is not enough to find good answer documents; link information and URL information compensate for this insufficiency. However, a static combination of multiple sources of evidence may lower retrieval performance, so different strategies are needed to find target documents depending on the query type. User queries can be classified into three categories: the topic relevance task, the homepage finding task, and the service finding task. In this paper, a user query classification scheme is proposed. The scheme uses the difference of term distributions, mutual information, the rate at which query terms are used as anchor texts, and part-of-speech (POS) information for classification. After a user query is classified, different algorithms and information are applied to obtain better results: for the topic relevance task we emphasize content information, whereas for the homepage finding task we emphasize link information and URL information. The best performance was obtained when the proposed classification method was used with the OKAPI scoring algorithm.
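The per-type strategy the abstract describes can be sketched as follows. This is an illustrative toy only: the feature names, thresholds, and weights are hypothetical placeholders chosen for the example, not the paper's trained classifier or its actual parameter values.

```python
# Toy sketch of query-type classification and per-type evidence weighting.
# All thresholds and weights below are made-up placeholders, not the
# values used in the paper.

def classify_query(anchor_usage_rate: float,
                   distribution_difference: float,
                   contains_verb: bool) -> str:
    """Assign a query to one of the three task categories."""
    # Queries naming a specific site tend to appear frequently as anchor
    # text and to be concentrated in titles/anchors (high distribution
    # difference between document body and title/anchor fields).
    if anchor_usage_rate > 0.5 and distribution_difference > 0.3:
        return "homepage finding"
    # Service-style queries often contain an action verb (POS cue),
    # e.g. "download", "buy", "order".
    if contains_verb:
        return "service finding"
    return "topic relevance"

def combined_score(query_type: str,
                   content: float, link: float, url: float) -> float:
    """Weight content, link, and URL evidence differently per query type,
    instead of using one static combination for every query."""
    weights = {
        "topic relevance": (0.8, 0.1, 0.1),   # emphasize content evidence
        "homepage finding": (0.3, 0.4, 0.3),  # emphasize link/URL evidence
        "service finding": (0.4, 0.3, 0.3),
    }
    w_content, w_link, w_url = weights[query_type]
    return w_content * content + w_link * link + w_url * url
```

For example, a query like "acm sigir homepage" (high anchor-text usage, no verb) would be routed to the link/URL-weighted scorer, while "latest research on web retrieval" would fall through to the content-weighted topic relevance scorer.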


Published in

SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2003, 490 pages
ISBN: 1581136463
DOI: 10.1145/860435

Copyright © 2003 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '03 paper acceptance rate: 46 of 266 submissions, 17%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
