skip to main content
10.1145/253168.253190acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article
Free Access

Improved hierarchical bit-vector compression in document retrieval systems

Authors Info & Claims
Published:01 September 1986Publication History

ABSTRACT

The “concordance” of an information retrieval system can often be stored in form of bit-maps, which are usually very sparse and should be compressed. Hierarchical bit-vector compression consists of partitioning a vector vi into equi-sized blocks, constructing a new bit-vector vi+1 which points to the non-zero blocks in vi, dropping the zero-blocks of vi, and repeating the process for vi+1. We refine the method by pruning some of the tree branches if they ultimately point to very few documents; these document numbers are then added to an appended list which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project, and gave a relative improvement of about 40% over the conventional hierarchical compression method.

References

  1. 1.Choueka Y., Full text systems and Research in the Humanities, Computer8 and the Humanities X_IV (1980) 153-169.Google ScholarGoogle Scholar
  2. 2.Davis D.R., Lin A.D., Secondary key retrieval using an IBM 7090-1301 system, Comm. of the A CM 8 (1965) 243-246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Fraenkel A.S., All about the Responsa Retrieval Project you always wanted to know but were afraid to ask, Expanded Summary, Jurimetric8 J. 16 (1976) 149- 156.Google ScholarGoogle Scholar
  4. 4.Fraenkel A.S., Klein S.T., Novel Compression of sparse Bit-Strings-- Preliminaxy Report, Combinatorial Algorithnts on Wo#ds, NATO ASI Series Vol F12, Springer Verlag, Berlin (1985) 169-183.Google ScholarGoogle Scholar
  5. 5.Jakobsson M., Huffman coding in Bit- Vector Compression, Inf. Proc. Letters T (1978) 304-307.Google ScholarGoogle Scholar
  6. 6.J akobsson M., Evaluation of s Hierarchical Bit-Vector Compression Technique, Inf. Proc. Letters 14 (1982) 147-149.Google ScholarGoogle ScholarCross RefCross Ref
  7. 7.Knuth D.E., The Art of Computer Programming, Vol. II, Seml-numerical Algorithms, Addison-Wesley, Reading, Mass. (1973). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.Nevalainen O., Jakobsson M., Berg R., Compression of clustered inverted files, in Proc. of the 7-th Syrup. on Math. Foundations of Comp. Sc., Zakopane, Poland (1978) 393-402.Google ScholarGoogle Scholar
  9. 9.Schuegraf E.J., Compression of large in-. verted files with hyperbolic term distribution, Inf. Proc. and Management 12 (1976) 377-384.Google ScholarGoogle ScholarCross RefCross Ref
  10. 10.Teuhola 3., A Compression method for Clustered Bit-Vectors, inf. Proc. Letter8 7 (1978) 308-311.Google ScholarGoogle Scholar
  11. 11.Vallarino O., On the use of bit-maps for multiple key retrieval, SIGPLAN Notices, Special Issue Vol. II (1976) 108-114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Wedekind H., H#rder T., Datenbank- 8#teme II, B.-I. Wissenschaftsverlag, Mannheim (1976).Google ScholarGoogle Scholar
  1. Improved hierarchical bit-vector compression in document retrieval systems

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGIR '86: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
      September 1986
      283 pages
      ISBN:0897911873
      DOI:10.1145/253168

      Copyright © 1986 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 1986

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate792of3,983submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader