ABSTRACT
The “concordance” of an information retrieval system can often be stored in form of bit-maps, which are usually very sparse and should be compressed. Hierarchical bit-vector compression consists of partitioning a vector vi into equi-sized blocks, constructing a new bit-vector vi+1 which points to the non-zero blocks in vi, dropping the zero-blocks of vi, and repeating the process for vi+1. We refine the method by pruning some of the tree branches if they ultimately point to very few documents; these document numbers are then added to an appended list which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project, and gave a relative improvement of about 40% over the conventional hierarchical compression method.
- 1.Choueka Y., Full text systems and Research in the Humanities, Computer8 and the Humanities X_IV (1980) 153-169.Google Scholar
- 2.Davis D.R., Lin A.D., Secondary key retrieval using an IBM 7090-1301 system, Comm. of the A CM 8 (1965) 243-246. Google ScholarDigital Library
- 3.Fraenkel A.S., All about the Responsa Retrieval Project you always wanted to know but were afraid to ask, Expanded Summary, Jurimetric8 J. 16 (1976) 149- 156.Google Scholar
- 4.Fraenkel A.S., Klein S.T., Novel Compression of sparse Bit-Strings-- Preliminaxy Report, Combinatorial Algorithnts on Wo#ds, NATO ASI Series Vol F12, Springer Verlag, Berlin (1985) 169-183.Google Scholar
- 5.Jakobsson M., Huffman coding in Bit- Vector Compression, Inf. Proc. Letters T (1978) 304-307.Google Scholar
- 6.J akobsson M., Evaluation of s Hierarchical Bit-Vector Compression Technique, Inf. Proc. Letters 14 (1982) 147-149.Google ScholarCross Ref
- 7.Knuth D.E., The Art of Computer Programming, Vol. II, Seml-numerical Algorithms, Addison-Wesley, Reading, Mass. (1973). Google ScholarDigital Library
- 8.Nevalainen O., Jakobsson M., Berg R., Compression of clustered inverted files, in Proc. of the 7-th Syrup. on Math. Foundations of Comp. Sc., Zakopane, Poland (1978) 393-402.Google Scholar
- 9.Schuegraf E.J., Compression of large in-. verted files with hyperbolic term distribution, Inf. Proc. and Management 12 (1976) 377-384.Google ScholarCross Ref
- 10.Teuhola 3., A Compression method for Clustered Bit-Vectors, inf. Proc. Letter8 7 (1978) 308-311.Google Scholar
- 11.Vallarino O., On the use of bit-maps for multiple key retrieval, SIGPLAN Notices, Special Issue Vol. II (1976) 108-114. Google ScholarDigital Library
- 12.Wedekind H., H#rder T., Datenbank- 8#teme II, B.-I. Wissenschaftsverlag, Mannheim (1976).Google Scholar
- Improved hierarchical bit-vector compression in document retrieval systems
Recommendations
Constrained and recursive hierarchical table-lookup vector quantization
DCC '96: Proceedings of the Conference on Data CompressionThis paper presents techniques for the design of generic constrained and recursive vector quantizer encoders implemented by table-lookups. These vector quantizers include entropy-constrained VQ, tree structured VQ, classified VQ, product VQ, mean-...
Comments