Efficient passage ranking for document databases

Published: 01 October 1999
Abstract

Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this article, we explore alternative query evaluation techniques and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new “DO-TOS” passage-ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.

