Article

Efficient query evaluation using a two-level retrieval process

Authors:
Andrei Z. Broder

IBM Watson Research Center

IBM Watson Research Center
View Profile

,
David Carmel

IBM Research Lab in Haifa, Haifa, ISRAEL

IBM Research Lab in Haifa, Haifa, ISRAEL
View Profile

,
Michael Herscovici

IBM Research Lab in Haifa, Haifa, ISRAEL

IBM Research Lab in Haifa, Haifa, ISRAEL
View Profile

,
Aya Soffer

IBM Research Lab in Haifa, Haifa, ISRAEL

IBM Research Lab in Haifa, Haifa, ISRAEL
View Profile

,
Jason Zien

IBM Almaden Research Center

IBM Almaden Research Center
View Profile

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge managementNovember 2003Pages 426–434https://doi.org/10.1145/956863.956944

Published:03 November 2003Publication History

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management

Pages 426–434

ABSTRACT

We present an efficient query evaluation method based on a two level approach: at the first level, our method iterates in parallel over query term postings and identifies candidate documents using an approximate evaluation taking into account only partial information on term occurrences and no query independent factors; at the second level, promising candidates are fully evaluated and their exact scores are computed. The efficiency of the evaluation process can be improved significantly using dynamic pruning techniques with very little cost in effectiveness. The amount of pruning can be controlled by the user as a function of time allocated for query evaluation. Experimentally, using the TREC Web Track data, we have determined that our algorithm significantly reduces the total number of full evaluations by more than 90%, almost without any loss in precision or recall. At the heart of our approach there is an efficient implementation of a new Boolean construct called WAND or Weak AND that might be of independent interest.

References

Altavista. http://www.altavista.com.Google Scholar
L. Bahl, P. de Souza, P. Gopalakrishnan, D. Nahamoo, and M. Picheny. A fast match for continuous speech recognition using allophonic models. In Proceedings of IEEE ICASSP, pages 17--20, March 1992.Google ScholarCross Ref
K. Beyer, A. Jhingran, B. Lyle, S. Rajagopalan, and E. Shekita. Pivot join: A runtime operator for text search. Submitted for publication, 2003.Google Scholar
B. H. Bloom. Space time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, July 1970. Google ScholarDigital Library
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW7/C omputer Networks and ISDN Systems, 30:107--117, April 1998. Google ScholarDigital Library
C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proceedings of the Eighth International ACM-SIGIR Conference, pages 97--110, Montreal, Canada, June 1985. Google ScholarDigital Library
M. Burrows. Object-oriented interface for an index. US Patent 5,809,502, 1998.Google Scholar
M. Burrows. Method for parsing, indexing and searching world-wide-web pages. US Patent 5,864,863, 1999.Google Scholar
D. Carmel, E. Amitay, M. Herscovici, Y. S. Maarek, Y. Petruschka, and A. Soffer. Juru at TREC 10 - Experiments with Index Pruning. In Proceeding of Tenth Text REtrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google Scholar
C. Clarke, G. Cormack, and F. Burkowski. Shortest substring ranking. In Proceedings of the Fourth Text Retrieval Conference (TREC-4). National Institute of Standards and Technology (NIST), November 1995.Google Scholar
Google. http://www.google.com.Google Scholar
D. Harman and G. Candela. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. Journal of the American Society of Information Science, 41(8):581--589, 1990.Google ScholarCross Ref
D. Hawking and N. Craswell. Overview of the TREC-2001 Web Track. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google Scholar
M. Kaszkiel and J. Zobel. Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343--344, Melbourne, Austrailia, August 1998. Google ScholarDigital Library
Y. Maarek and F. Smadja. Full text indexing based on lexical relations: An application: Software libraries. In Proceedings of the Twelfth International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 198--206, Cambridge, MA, June 1989. Google ScholarDigital Library
A. Moffat and J. Zobel. Fast ranking in limited space. In Proceedings of the 10th IEEE International Conference on Data Engineering, pages 428--4376, Houston, TX, February 1994. Google ScholarDigital Library
J. K. Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558, May 1990. Google ScholarDigital Library
M. Persin. Document filtering for fast ranking. In Proceedings of the Seventeenth International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 339--348, Dublin, Ireland, July 1994. Google ScholarDigital Library
G. Salton, E. A. Fox, and H. Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022--1036, 1983. Google ScholarDigital Library
H. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6):831--850, 1995. Google ScholarDigital Library
A. N. Vo and A. Moffat. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 290--297, Melbourne, Australia, August 1998. Google ScholarDigital Library
E. M. Voorhees and D. K. Harman. Overview of the Tenth Text REtrieval Conference (TREC-10). In Proceedings of the Tenth Text Retrieval Conference (TREC-10). National Institute of Standards and Technology (NIST), 2001.Google ScholarCross Ref

Index Terms

Efficient query evaluation using a two-level retrieval process
1. Information systems
  1. Information retrieval

Recommendations

Query evaluation using overlapping views: completeness and efficiency
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data

We study the problem of finding efficient equivalent view-based rewritings of relational queries, focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples. A lot of work in the ...
Read More
Efficient Top-k Query Answering through its Top-N Rewritings Using Views
PIKM '15: Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge Management

Recently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the ...
Read More
A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
November 2003
592 pages
ISBN:1581137230
DOI:10.1145/956863
General Chair:
Donald Kraft
Louisiana State University
,
Program Chairs:
Ophir Frieder
Illinois Institute of Technology
,
Joachim Hammer
University of Florida
,
Sajda Qureshi
University of Nebraska, Omaha
,
Len Seligman
The MITRE Corporation
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 November 2003
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
WAND
document-at-a-time
efficient query evaluation
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 285
  Total Citations
  View Citations
- 2,858
  Total Downloads
- Downloads (Last 12 months)180
- Downloads (Last 6 weeks)29
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Efficient query evaluation using a two-level retrieval process

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Query evaluation using overlapping views: completeness and efficiency

Efficient Top-k Query Answering through its Top-N Rewritings Using Views

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Efficient query evaluation using a two-level retrieval process

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Query evaluation using overlapping views: completeness and efficiency

Efficient Top-k Query Answering through its Top-N Rewritings Using Views

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media