Article · DOI: 10.1145/860435.860449

Query type classification for web document retrieval

Published: 28 July 2003

ABSTRACT

The heterogeneous Web exacerbates IR problems, and short user queries make them worse. The content of web documents alone is not enough to find good answer documents; link information and URL information compensate for this insufficiency. However, a static combination of multiple sources of evidence may lower retrieval performance, so different strategies are needed to find target documents depending on the query type. User queries can be classified into three categories: the topic relevance task, the homepage finding task, and the service finding task. In this paper, a user query classification scheme is proposed. The scheme uses the difference of term distributions, mutual information, the rate at which query terms are used as anchor texts, and part-of-speech (POS) information for classification. After a user query is classified, different algorithms and information are applied to obtain better results: for the topic relevance task we emphasize content information, whereas for the homepage finding task we emphasize link information and URL information. The best performance was obtained when the proposed classification method was used with the OKAPI scoring algorithm.
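The per-type strategy the abstract describes can be sketched as follows. This is an illustrative toy only: the feature names, thresholds, and weights are hypothetical placeholders chosen for the example, not the paper's trained classifier or its actual parameter values.

```python
# Toy sketch of query-type classification and per-type evidence weighting.
# All thresholds and weights below are made-up placeholders, not the
# values used in the paper.

def classify_query(anchor_usage_rate: float,
                   distribution_difference: float,
                   contains_verb: bool) -> str:
    """Assign a query to one of the three task categories."""
    # Queries naming a specific site tend to appear frequently as anchor
    # text and to be concentrated in titles/anchors (high distribution
    # difference between document body and title/anchor fields).
    if anchor_usage_rate > 0.5 and distribution_difference > 0.3:
        return "homepage finding"
    # Service-style queries often contain an action verb (POS cue),
    # e.g. "download", "buy", "order".
    if contains_verb:
        return "service finding"
    return "topic relevance"

def combined_score(query_type: str,
                   content: float, link: float, url: float) -> float:
    """Weight content, link, and URL evidence differently per query type,
    instead of using one static combination for every query."""
    weights = {
        "topic relevance": (0.8, 0.1, 0.1),   # emphasize content evidence
        "homepage finding": (0.3, 0.4, 0.3),  # emphasize link/URL evidence
        "service finding": (0.4, 0.3, 0.3),
    }
    w_content, w_link, w_url = weights[query_type]
    return w_content * content + w_link * link + w_url * url
```

For example, a query like "acm sigir homepage" (high anchor-text usage, no verb) would be routed to the link/URL-weighted scorer, while "latest research on web retrieval" would fall through to the content-weighted topic relevance scorer.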


Published in

SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2003, 490 pages
ISBN: 1581136463
DOI: 10.1145/860435

Copyright © 2003 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '03 paper acceptance rate: 46 of 266 submissions, 17%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
