Abstract
Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents. This strategy can improve effectiveness and can identify relevant material in documents that are too large for users to consider as a whole. However, ranking passages can considerably increase retrieval costs. In this article we explore alternative query evaluation techniques and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new “DO-TOS” passage-ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
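To make the idea of ranking passages rather than whole documents concrete, the following is a minimal sketch and not the DO-TOS algorithm, whose details the abstract does not give. It assumes passages are fixed-length, overlapping word windows and scores them with a simple TF-IDF sum. The names `passages` and `rank_passages`, the window parameters `size` and `step`, and the scoring formula are all illustrative assumptions.

```python
import math
from collections import Counter


def passages(words, size, step):
    """Yield (start, window) pairs of overlapping fixed-length word windows.

    Assumption: a passage is a run of `size` words, and consecutive passages
    start `step` words apart. The final window may be shorter than `size`.
    """
    start = 0
    while True:
        yield start, words[start:start + size]
        if start + size >= len(words):
            break
        start += step


def rank_passages(docs, query, size=150, step=75, k=3):
    """Return the top-k (score, doc_id, start) passages for a query.

    `docs` maps a document id to its text. The score is a hypothetical
    TF-IDF sum, chosen only for illustration.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    terms = query.lower().split()
    scored = []
    for doc_id, words in tokenized.items():
        # Every window is scored separately, which is why passage ranking
        # costs more than scoring each document once.
        for start, window in passages(words, size, step):
            tf = Counter(window)
            score = sum(
                math.log(1 + tf[t]) * math.log(1 + n_docs / df[t])
                for t in terms
                if tf[t]
            )
            if score > 0:
                scored.append((score, doc_id, start))

    # Highest score first; ties broken by document id, then passage offset.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:k]
```

Because each document contributes many windows, this naive approach does far more scoring work than document ranking. That extra cost is what the article's techniques aim to reduce.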
Index Terms
- Efficient passage ranking for document databases