ABSTRACT
In static full-text retrieval systems, which accommodate metrical as well as Boolean operators, the traditional approach to query processing uses a “concordance”, from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, which are constructed for every different word of the database and serve as occurrence maps. We propose to combine the concordance and bit-map approaches, and show how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage, lead to large I/O savings in collating coordinates of keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the coordinates of the keywords, which can be used when queries must be processed by stages, due to their complexity and the sizes of the involved sets of coordinates. The new techniques are partially implemented at the Responsa Retrieval Project.
- 1.Attar R., Choueka Y., Dershowitz N., Fraenkel A.S., KEDMA Linguistic tools for retrieval systems, J. A CM 25 (1978) 52-66. Google ScholarDigital Library
- 2.Boyer R.S., Moore J.S., A fast string searching algorithm, Comm. ACM 20 (1977) 762-772. Google ScholarDigital Library
- 3.Bratley P., Choueka Y., Processing truncated terms in document retrieval systems, Inf. Processing ~ Management 18 (1982) 257-266.Google Scholar
- 4.Choueka Y., Full text systems and research in the humanities, Computers and the Humanities XIV (1980) 153-169.Google ScholarCross Ref
- 5.Choueka Y., Fraenkel A.S., Klein S.T.~ Segal E., Improved hierarchical bit-vector compression in document retrieval systems, Proc. 9-th A CM-SIGIR Conf., Pisa; ACM, Baltimore, MD (1986) 88-97. Google ScholarDigital Library
- 6.Davis D.R., Lin A.D., Secondary key retrieval using an IBM 7090-1301 system, Comm. A CM 8 (1965) 243-246. Google ScholarDigital Library
- 7.Fraenkel A.S., All about the Responsa Retrieval Project you always wanted to know but were afraid to ask, Expanded Summary, Jurimetrics J. 16 (1976) 149- 156.Google Scholar
- 8.Fraenkel A.S., Klein S.T., Novel compression of sparse bit-strings ~ preliminary report, Combinatorial Algo. rithms on Words, NATO ASI Series Vol. F12, Springer Verlag, Berlin (1985) 169-183.Google Scholar
- 9.Heaps P., Information Retrieval, Computational and Theoretical Aspects, Academic Press, New York (1978). Google ScholarDigital Library
- 10.Reingold E.M., Nievergelt J., Deo N.~ Combinatorial Algorithms: Theory and Practice, Prentice-Hall, Inc., Englewood Cliffs, NJ (1977). Google ScholarDigital Library
Index Terms
- Improved techniques for processing queries in full-text systems
Recommendations
Approaches to passage retrieval in full text information systems
SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrievalLarge collections of full-text documents are now commonly used in automated information retrieval. When the stored document texts are long, the retrieval of complete documents may not be in the users' best interest. In such circumstance, efficient and ...
Processing content-oriented XPath queries
CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge managementDocument-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-...
Comments