skip to main content
10.1145/133160.133210acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article
Free Access

Parameterised compression for sparse bitmaps

Authors Info & Claims
Published:01 June 1992Publication History

ABSTRACT

Full-text retrieval systems often use either a bitmap or an inverted file to identify which documents contain which terms, so that the documents containing any combination of query terms can be quickly located. Bitmaps of term occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques.

References

  1. 1.A. Bookstein and S.T. Klein. Compression of correlated bit-vectors. Information Systems, 16(4):387-400, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.A. Bookstein and S.T. Klein. Flexible compression for bitmap sets. In j.A. Storer and J.H. Reif, editors~ Proc. IEEE Data Compression Conference, pages 402-410, Snowbird, Utah, April 1991.Google ScholarGoogle ScholarCross RefCross Ref
  3. 3.A. Bookstein and S.T. Klein. Generative roodels for bitmap sets with compression applications. In Proc. 1~ ~th ACM-SIGIR Conference on Information Retrieval, pages 63-71, Chicago, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.A. Bookstein, 8.T. Klein, and T. Raita. Model based concordance compression. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.Google ScholarGoogle Scholar
  5. 5.Y. Choueka, A.S. Fraenkel, and S.T. Klein. Compression of concordances in full-text retrieval systems. In Proc. 11'th A CM-SIGIR Conference on Information Retrieval, pages 597-612, Grenoble, France, June 1988. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.Y. Choueka, A.S. Fraenkel, S.T. Klein, and E. Segal. Improved hierarchical bit-vector compression in document retrieval systems. In Proc. 9'th A CM-SIGIR Conference on Information Retrieval, pages 88-97, Pisa, Italy, September 1986. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.W.B. Croft and P. Savino. Implementing ranking strategies using text signatures. A CM Transactions on Office Information Systems, 6( 1):42-62, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, IT-21:194-203, March 1975.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.A.S. Fraenkel and $.T. Klein. Novel compression of sparse bit-strings~Preliminary report. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, Volume 12, NATO ASI Series F, pages 169-183, Berlin, 1985. Springer-Verlag.Google ScholarGoogle ScholarCross RefCross Ref
  10. 10.R.G. Gallager and D.C. Van Voorhis. OptimM source codes for geometrically distributed alphabets. IEEE Transactions on Information Theory, IT-21(2):228-230, March 1975.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.S.W. Golomb. Run-length encodings. IEEE Transactions on Information Theory, IT- 12(3):399-401, July 1966.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.D.A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40(9):1098-1101, September 1952.Google ScholarGoogle ScholarCross RefCross Ref
  13. 13.M. Jakobsson. Huffman coding in bit-vector compression. Information Processing Letters, 7(6):304-307, October 1978.Google ScholarGoogle ScholarCross RefCross Ref
  14. 14.M. Lesk. Some applications of inverted indexes on the Unix system. In Unix Programmers Manual, Volume 214. Bell Laboratories, Murray Hill, New Jersey, 1988.Google ScholarGoogle Scholar
  15. 15.M.D. Mcilroy. Development of a spelling fist. IEEE Transactions on Communications, COM-30(1):91-99, January 1982.Google ScholarGoogle Scholar
  16. 16.A.M. Moffat. Economical inversion of large text files. Computing Systems, 5(2), 1992. To appear.Google ScholarGoogle Scholar
  17. 17.A.M. Moffat and J. Zobel. Coding for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.Google ScholarGoogle Scholar
  18. 18.G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.E.J. Schuegraf. Compression of large inverted files with hyperbolic term distribution. Information Processing ~Z Management, 12:377- 384, 1976.Google ScholarGoogle Scholar
  20. 20.Z. Somogyi. The Melbourne University bibliography system. Technical Report 90/3, Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia, March 1990.Google ScholarGoogle Scholar
  21. 21.J. Teuhola. A compression method for clustered bit-vectors. Information Processing Letters, 7(6):308-311, October 1978.Google ScholarGoogle ScholarCross RefCross Ref
  22. 22.B. Tuthill. Refer~a bibliography system. In Unix User's Manual Supplementary Documents. 4.2 Berkeley Software Distribution, Berkeley, California, 1984.Google ScholarGoogle Scholar
  23. 23.I.H. Witten, T.C. Bell, and C. Nevill. Models for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, pages 23-32, Snowbird, Utah, April 1991.Google ScholarGoogle ScholarCross RefCross Ref
  24. 24.I.H. Witten, R. Neal, and J.G. C/eary. Arithmetic coding for data compression. Communications of the A CM, 30(6):520-541, June 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.J. Zobel and A.M. Moffat. Adding compression to a full-text retrieval system. In Proc. 15'th Australian Computer Science Conference, pages 1077-1089, Hobart, January 1992. University of Tasmania.Google ScholarGoogle Scholar
  26. 26.J. Zobel, A.M. Moffat, and It. Sacks-Davis. An efficient indexing technique for full-text database systems. Technical Report 92/21, Collaborative Information Technology Research Institute, Department of Computer Science, Royal Melbourne Institute of Technology, Australia, February 1992.Google ScholarGoogle Scholar

Index Terms

  1. Parameterised compression for sparse bitmaps

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
          June 1992
          352 pages
          ISBN:0897915232
          DOI:10.1145/133160

          Copyright © 1992 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1992

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate792of3,983submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader