ABSTRACT
This paper presents a new data structure and an associated strategy to be utilized by indexing facilities for text retrieval systems. The paper starts by reviewing some of the goals that may be considered when designing such an index and continues with a small survey of various current strategies. It then presents an indexing strategy referred to as surrogate subsets discussing its appropriateness in the light of the specified goals. Various design issues and implementation details are discussed. Our strategy requires that a surrogate file be divided into a large number of subsets separated by free space which will allow the index to expand when new material is appended to the database. Experimental results report on the utilization of free space when the database is enlarged.
- BAY72.BAYER, R. AND MCCREIGHT, E., Organization and maintenance of large ordered indexes, Acta Informatica, vol. 1, no. 3, 1972, pp. 173-189. ~Google ScholarDigital Library
- BER87.BERRA, P. B., CHUNG, S. M. AND HACHEM, N. I., Computer architecture for a surrogate file to a very large data / knowledge base, IEEE Computer, vol. 20, no. 3, 1987, pp. 25-32. Google ScholarDigital Library
- CHR84.CHRISTODOULAKIS, S. AND FALOUTSOS, C., Design considerations for a message file server, IEEE Trans. Sofn#,are Engineering, Vol. SE-10, No. 2, Mar. 1984, pp. 201-210.Google ScholarDigital Library
- CHR86.CHRISTODOULAKIS, S. AND FALOUTSOS, C., Design and performance considerations for an optical disk-based, multimedia object server, Computer, Vol. 19, No. 12, Dec. 1986, pp. 45-56. Google ScholarDigital Library
- DEF88.DEFAZIO, S. AND GREENWALD, C., The Mead information retrieval system, IEEE Compcon 88, Feb. 1988, pp. 431.Google Scholar
- DEF89.DEFAZIO, S., Private communication.Google Scholar
- FAL84.FALOUTSOS, C. AND CHRISTODOULAKIS, S., Signature f'des: an access method for documents and its analytical performance evaluation, A CM Transactions on Office information Systems, Vol. 2, No. 4, Oct. 1984, pp.267-288. Google ScholarDigital Library
- FAL85.FALOUTSOS, C., Access methods for text, Computing Surveys, Vol. 17, No. 1, Mar. 1985, pp. 49-74. Google ScholarDigital Library
- FAL87D.FALOUTSOS, C. AND CHAN, R., Fast text access methods for optical disks- designs and performance comparison, UMIACS-TR-87-66, CS-TR- 1958, Dept. of Comp. Sci. and Inst. for Adv. Comp. Studies, Univ. of Maryland, Dec. 1987, 29 pages.Google Scholar
- FAL87S.FALOUTSOS, C. AND CHRISTOIX)ULAKIS, S., Optimal signature extraction and information loss, ACM Transactions on Database Systems, Vol. 12, No.3, Sept. 1987, pp. 395-428. Google ScholarDigital Library
- HAS81.HASKIN, R. L., Special purpose processors for text retrieval, Database Engineering, Vol. 4, No. 1, Sept. 1981, pp. 16-29.Google Scholar
- LAR83.LARSON, P. A., A method for speeding up text retrieval, Proceedings of ACM SIGMOD Conference, May, ACM, New York, 1983, pp. 117-123. Google ScholarDigital Library
- ROB79.ROBERTS, C. S., Partial-match retrieval via the method of superimposed codes, Proc. IEEE, 67,12, Dec. 1979, 1624-1642.Google ScholarCross Ref
- SAL86.S ALTON, G., Another look at automatic text retrieval systems, Communications of the ACM, Vol. 29, No. 7, July 1986, pp. 648-656. Google ScholarDigital Library
- STA86.STANFILL, C. AND KAHLE, t}., Parallel free-text search on the connection machine system, Communications of the ACM, Vol. 29, No. 12, Dec. 1986, pp. 1229-1239. Google ScholarDigital Library
- TEO82.TEORY, T. J. AND FRY, J. P., Design of Database Structures, Prentice Hall, Englewood Cliffs, New Jersey, 1982. Google ScholarDigital Library
- TSI83.TSICHRITZIS D. AND CHRISTODOLR.AKiS, S., Message files, ACM Trans. Office Information Systems, Vol. 1, No. 1, Jan. 1983, pp. 88-98. Google ScholarDigital Library
Index Terms
- Surrogate subsets: a free space management strategy for the index of a text retrieval system
Recommendations
Inverted index maintenance strategy for flashSSDs
An inverted index is a core data structure of Information Retrieval systems, especially in search engines. Since the search environments have become more dynamic, many on-line index maintenance strategies have been proposed. Previous strategies were ...
Fast Forward Index Methods for Pseudo-Relevance Feedback Retrieval
The inverted index is the dominant indexing method in information retrieval systems. It enables fast return of the list of all documents containing a given query term. However, for retrieval schemes involving query expansion, as in pseudo-relevance ...
A new full-text indexing model with low space overhead for chinese text retrieval
Text retrieval systems require an index to allow efficient retrieval of documents at the cost of some storage overhead. This paper proposes a novel full-text indexing model for Chinese text retrieval based on the concept of adjacency matrix of directed ...
Comments