Abstract
This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory as they relate to the goals and evaluation of data compression methods are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported, and possibilities for future research are suggested.
- ABRAMSON, N. 1963. in{ormation Theory and Coding. McGraw-Hill, New York.Google Scholar
- APOSTOLICO, A., AND FRAENKEL, A. S. 1985. Robust transmission of unbounded strings using Fibonacci representations. Tech. Rep. CS85-14, Dept. of Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel.Google Scholar
- ARC 1986. ARC File Archive Utility, Version 5.1. System Enhancement Associates, Wayne, N.J.Google Scholar
- ASH, R. B. 1965. Information Theory. Interscience, New York.Google Scholar
- BENTLEY, J. L., SLEATOR, D. D., TARJAN, R. E., ANO WEI, V. K. 1986. A locally adaptive data compression scheme. Commun. ACM 29, 4 (Apr.), 320-330. Google Scholar
- BOBROW, L. S., ANO HAKIMI, S. L. 1969. Graph theoretic prefix codes and their synchronizing properties. Inf. Contr. 15, 1 (July), 70-94.Google Scholar
- BRENT, R. P., AND KUN(}, H. T. 1978. Fast algorithms for manipulating formal power series. J. ACM 25, 4 (Oct.), 581-595. Google Scholar
- CAPOCELLI, R. M., GIANCARLO, R., AND TANEJA, I. J. 1986. Bounds on the redundancy of Huffman codes. IEEE Trans. In{. Theory 32, 6 (Nov.), 854-857. Google Scholar
- CAPPELLINI, V., ED. 1985. Data Compression and Error Control Techniques with Applications. Academic Press, London. Google Scholar
- CONNELL, J. B. 1973. A Huffman-Shannon-Fano code. Proc. IEEE 61, 7 (July), 1046-1047.Google Scholar
- CORMACK, G. V. 1985. Data compression on a database system. Commun. ACM 28, 12 (Dec.), 1336- 1342. Google Scholar
- CORMACK, G. V., AND HORSPOOL, R. N. 1984. Algorithms for adaptive Huffman codes. Inf. Process. Lett. 18, 3 (Mar.), 159-165. Google Scholar
- CORTESI, D. 1982. An effective text-compression algorithm. BYTE 7, 1 (Jan.), 397-403.Google Scholar
- COT, N. 1977. Characterization and design of optireal prefix codes. Ph.D. dissertation, Computer Science Dept., Stanford Univ., Stanford, Calif.Google Scholar
- ELIAS, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theory 21, 2 (Mar.), 194-203.Google Scholar
- ELIAS, P. 1987. Interval and recency rank source coding: Two on-line adaptive variable-length schemes. IEEE Trans. In{. Theory 33, 1 (Jan.), 3-10. Google Scholar
- FALLER, N. 1973. An adaptive system for data compression. In Record of the 7th Asilomar Conference on Circuits, Systems and Computers (Pacific Grove, Calif., Nov.). Naval Postgraduate School, Monterey, Calif., pp. 593-597.Google Scholar
- FANO, R. M. 1949. Transmission of Information. M.I.T. Press, Cambridge, Mass.Google Scholar
- FRAENKEL, A. S., ANO KLEIN, S. T. 1985. Robust universal complete codes as alternatives to Huffman codes. Tech. Rep. CS85-16, Dept. of Applied Mathematics, The Weizmann institute of Science, Rehovot, Israel.Google Scholar
- FRAENKEL, A. S., MOR, M., AND PERL, ~. 1983. Is text compression by prefixes and suffixes practical? Acta Inf. 20, 4 (Dec.), 371-375.Google Scholar
- GALLAGER, R. G. 1968. Information Theory and Reliable Communication. Wiley, New York. Google Scholar
- GALLAGER, R. G. 1978. Variations on a theme by Huffman. IEEE Trans. Inf. Theory 24, 6 (Nov.), 668-674.Google Scholar
- GAREY, M. R. 1974. Optimal binary search trees with restricted maximal depth. SIAM J. Comput. 3, 2 (June), 101-110.Google Scholar
- GILBERT, E. N. 1971. Codes based on inaccurate source probabilities. IEEE Trans. Inf. Theory 17, 3 (May), 304-314.Google Scholar
- GILBERT, E. N., AND MOORE, E. F. 1959. Variablelength binary encodings. Bell S~st. Tech. J. 38, 4 (July), 933-967.Google Scholar
- GLASSEY, C. R., AND KARP, R. M. 1976. On the optimality of Huffman trees. SIAM J. Appl. Math. 31, 2 (Sept.), 368-378.Google Scholar
- GONZALEZ, R. C., AND WINTZ, P. 1977. Digital Image Processing. Addison-Wesley, Reading, Mass. Google Scholar
- HAHN, B. 1974. A new technique for compression and storage of data. Commun. ACM 17, 8 (Aug.), 434-436. Google Scholar
- HESTER, J. H., AND HIRSCHBERG, D. S. 1985. Selforganizing linear search. ACM Comput. Surv. 17, 3 (Sept.), 295-311. Google Scholar
- HORSPOOL, R. N., AND CORMACK, G. V. 1987. A locally adaptive data compression scheme. Commun. ACM 30, 9 (Sept.), 792-794.Google Scholar
- HU, T. C., ANO TAN, K. C. 1972. Path length of binary search trees. SIAM J. Appl. Math 22, 2 (Mar.), 225-234.Google Scholar
- HU, T. C., AND TUCKER, A. C. 1971. Optimal computer search trees and variable-length alphabetic codes. SIAM J. Appl. Math. 21, 4 (Dec.), 514- 532.Google Scholar
- HUFFMAN, D. A. 1952. A method for the construction of minimum-redundancy codes. Proc. IRE 40, 9 (Sept.), 1098-1101.Google Scholar
- INGELS, F. M. 1971. Information and Coding Theory. intext, Scranton, Penn.Google Scholar
- ITAI, A. 1976. Optimal alphabetic trees. SIAM J. Comput. 5, 1 (Mar.), 9-18.Google Scholar
- KARP, R. M. 1961. Minimum redundancy coding for the discrete noiseless channel. IRE Trans. Inf. Theory 7, i (Jan.), 27-38.Google Scholar
- KNUTH, D. E. 1971. Optimum binary search trees. Acta In{. 1, 1 (Jan.), 14-25.Google Scholar
- KNUTH, D. E. 1985. Dynamic Huffman coding. J. Algorithms 6, 2 (June), 163-180. Google Scholar
- KRAUSE, R. M. 1962. Channels which transmit letters of unequal duration. Inf. Control 5, i (Mar.), 13-24.Google Scholar
- LAESER, R. P., MCLAUGHLIN, W. I., AND WOLFF, D. M. 1986. Engineering Voyager 2's encounter with Uranus. Sci. Am. 255, 5 (Nov.), 36-45.Google Scholar
- LANGDON, G. G., AND RISSANEN, J. J. 1983. A double-adaptive file compression algorithm. 1EEE Trans. Comm. 31, 11 (Nov.), 1253-1255.Google Scholar
- LLEWELLYN, J. A. 1987. Data compression for a source with Markov characteristics. Comput. J. 30, 2, 149-156. Google Scholar
- MCINTYRE, D. R., ANO PECHURA, M. A. 1985. Data compression using static Huffman code-decode tables. Commun. ACM 28, 6 (June), 612- 616. Google Scholar
- MEHLHORN, K. 1980. An efficient algorithm for constructing nearly optimal prefix codes. IEEE Trans. In{. Theory 26, 5 (Sept.), 513-517.Google Scholar
- NEUMANN, P. G. 1962. Efficient error-limiting variable-length codes. IRE Trans. In{. Theory 8, 4 (July), 292-304.Google Scholar
- PARKER, D. S. 1980. Conditions for the optimality of the Huffman algorithm. SIAM J. Comput. 9, 3 (Aug.), 470-489.Google Scholar
- PASCO, R. 1976. Source coding algorithms for fast data compression. Ph.D. dissertation, Dept. of Electrical Engineering, Stanford Univ., Stanford, Calif. Google Scholar
- PECHURA, M. 1982. File archival techniques using data compression. Commun. ACM 25, 9 (Sept.), 605-609. Google Scholar
- PERL, Y., GARE~, M. R., AND EVEN, S. 1975. Efficient generation of optimal prefix code: Equiprobable words using unequal cost letters. J. ACM 22, 2 (Apr.), 202-214. Google Scholar
- PKARC FAST! 1987. PKARC FAST/File Archival Utility, Version 3.5. PKWARE, Inc. Glendale, Wis.Google Scholar
- REGHBATI, H. K. 1981. An overview of data compression techniques. Computer 14, 4 (Apr.), 71-75.Google Scholar
- RISSANEN, J. J. 1976. Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 20 (May), 198-203.Google Scholar
- RISSANEN, J. J. 1983. A universal data compression system. IEEE Trans. In{. Theory 29, 5 (Sept.), 656-664.Google Scholar
- RODEH, M., PRATT, V. R., AND EVEN, S. 1981. Linear algorithm for data compression via string matching. J. ACM 28, i (Jan.), 16-24. Google Scholar
- RUBIN, F. 1976. Experiments in text file compression. Commun. ACM 19, 11 (Nov.), 617-623. Google Scholar
- RUBIN, F. 1979. Arithmetic stream coding using fixed precision registers. IEEE Trans. In{. Theory 25, 6 (Nov.), 672-675.Google Scholar
- RUDNER, B. 1971. Construction of minimumredundancy codes with optimal synchronizing property. IEEE Trans. Inf. Theory 17, 4 (July), 478-487.Google Scholar
- RUTH, S. S., AND KREUTZER, P. J. 1972. Data compression for large business files. Datamation 18, 9 (Sept.), 62-66.Google Scholar
- RYABKO, B. Y. 1987. A locally adaptive data compression scheme. Commun. A CM 16, 2 (Sept.), 792.Google Scholar
- SAMET, H. 1984. The Quadtree and related hierarchical data structures. A CM Comput. Surv. 30, 9 (June), 187-260. Google Scholar
- SCHWARTZ, E. S. 1964. An optimum encoding with minimum longest code and total number of digits. Inf. Control 7, 1 (Mar.), 37-44.Google Scholar
- SCHWARTZ, E. S., AND KALLICK, B. 1964. Generating a canonical prefix encoding. Commun. ACM 7, 3 (Mar.), 166-169. Google Scholar
- SEVERANCE, D. G. 1983. A practitioner's guide to data base compression. In{. Syst. 8, 1, 51-62.Google Scholar
- SHANNON, C. E., AND WEAVER, W. 1949. The Mathematical Theory o{ Communication. University of Illinois Press, Urbana, Ill. Google Scholar
- SNYDERMAN, M., AND HUNT, B. 1970. The myriad virtues of text compaction. Datamation 16, 12 (Dec.), 36-40.Google Scholar
- STANDISH, T. A. 1980. Data Structure Techniques. Addison-Wesley, Reading, Mass. Google Scholar
- STIFFLER, J. J. 1971. Theory of Synchronous Communications. Prentice-Hall, Englewood Cliffs, N.J.Google Scholar
- STORER, J. A., AND SZYMANSKI, T. G. 1982. Data compression via textual substitution. J. A CM 29, 4 (Oct.), 928-951. Google Scholar
- TANAKA, H. 1987. Data structure of Huffman codes and its application to efficient encoding and decoding. IEEE Trans. Inf. Theory 33, 1 (Jan.), 154-156. Google Scholar
- TROPPER, R. 1982. Binary-coded text, A textcompression method. BYTE 7, 4 (Apr.), 398-413.Google Scholar
- UNIX. 1984. UNIX User's Manual, Version 4.2. Berkeley Software Distribution, Virtual VAX-11 Version, Univ. of California, Berkeley, Calif.Google Scholar
- VARN, B. 1971. Optimal variable length codes (arbitrary symbol cost and equal code word probability). inf. Control 19, 4 (Nov.), 289-301.Google Scholar
- VITTER, J. S. 1987. Design and analysis of dynamic Huffman codes. J. ACM 34, 4 (Oct.), 825-845. Google Scholar
- WAGNER, R. A. 1973. Common phrases and minimum-space text storage. Commun. A CM 16, 3 (Mar.), 148-152. Google Scholar
- WELCH, T. A. 1984. A technique for high-performance data compression. Computer 17, 6 (June), 8-19.Google Scholar
- WILKINS, L. C., AND WINTZ, P. A. 1971. Bibliography on data compression, picture properties and picture coding. IEEE Trans. In{. Theory 17, 2, 180-197.Google Scholar
- WITTEN, I. H., NEAL, R. M., AND CLEARY, J. G. 1987. Arithmetic coding for data compression. Commun. ACM 30, 6 (June), 520-540. Google Scholar
- ZIMMERMAN, S. 1959. An optimal search procedure. Am. Math. Monthly 66, (Oct.), 690-693.Google Scholar
- ZIV, J., AND LEMPEL, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23, 3 (May), 337-343.Google Scholar
- ZIV, J., AND LEMPEL, A. 1978. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24, 5 (Sept.), 530-536.Google Scholar
Comments