Skip to main content

2011 | Buch

Entropy and Information Theory

insite
SUCHEN

Über dieses Buch

This book is an updated version of the information theory classic, first published in 1990. About one-third of the book is devoted to Shannon source and channel coding theorems; the remainder addresses sources, channels, and codes and on information and distortion measures and their properties.

New in this edition:

Expanded treatment of stationary or sliding-block codes and their relations to traditional block codesExpanded discussion of results from ergodic theory relevant to information theoryExpanded treatment of B-processes -- processes formed by stationary coding memoryless sourcesNew material on trading off information and distortion, including the Marton inequalityNew material on the properties of optimal and asymptotically optimal source codesNew material on the relationships of source coding and rate-constrained simulation or modeling of random processes

Significant material not covered in other information theory texts includes stationary/sliding-block codes, a geometric view of information theory provided by process distance measures, and general Shannon coding theorems for asymptotic mean stationary sources, which may be neither ergodic nor stationary, and d-bar continuous channels.

Inhaltsverzeichnis

Frontmatter
Chapter 1. Information Sources
Abstract
An information source or source is a mathematical model for a physical entity that produces a succession of symbols called “outputs” in a random manner. The symbols produced may be real numbers such as voltage measurements from a transducer, binary numbers as in computer data, two dimensional intensity fields as in a sequence of images, continuous or discontinuous waveforms, and so on. The space containing all of the possible output symbols is called the alphabet of the source and a source is essentially an assignment of a probability measure to events consisting of sets of sequences of symbols from the alphabet. It is useful, however, to explicitly treat the notion of time as a transformation of sequences produced by the source. Thus in addition to the common random process model we shall also consider modeling sources by dynamical systems as considered in ergodic theory. The material in this chapter is a distillation of [55, 58] and is intended to establish notation.
Robert M. Gray
Chapter 2. Pair Processes: Channels, Codes, and Couplings
Abstract
We have considered a random process or source {X n } as a sequence of random entities, where the object produced at each time could be quite general, e.g., a random variable, vector, or waveform. Hence sequences of pairs of random objects such as {X n , Y n } are included in the general framework. We now focus on the possible interrelations between the two components of such a pair process. First consider the situation where we begin with one source, say {X n }, called the input and use either a random or a deterministic mapping of the input sequence {X n } to form an output sequence fYng. We generally refer to the mapping as a channel if it is random and a code if it is deterministic. Hence a code is a special case of a channel and results for channels will immediately imply corresponding results for codes. The initial point of interest will be conditions on the structure of the channel under which the resulting pair process fXn; Yng will inherit stationarity and ergodic properties from the original source {X n }. We will also be interested in the behavior resulting when the output of one channel serves as the input to another, that is, when we form a new channel as a cascade of other channels. Such cascades yield models of a communication system which typically has a code mapping (called the encoder) followed by a channel followed by another code mapping (called the decoder). Lastly, pair processes arise naturally in other situations, including coupling two separate processes by constructing a joint distribution. This chapter develops the context for the development in future chapters of the properties of information and entropy arising in pair processes.
Robert M. Gray
Chapter 3. Entropy
Abstract
The development of the idea of entropy of random variables and processes by Claude Shannon provided the beginnings of information theory and of the modern age of ergodic theory. Entropy and related information measures will be shown to provide useful descriptions of the long term behavior of random processes and this behavior is a key factor in developing the coding theorems of information theory. Here the various notions of entropy for random variables, vectors, processes, and dynamical systems are introduced and and there fundamental properties derived. In this chapter the case of finite-alphabet random processes is emphasized for simplicity, reflecting the historical development of the subject. Occasionally we consider more general cases when it will ease later developments.
Robert M. Gray
Chapter 4. The Entropy Ergodic Theorem
Abstract
The goal of this chapter is to prove an ergodic theorem for sample entropy of finite-alphabet random processes. The result is sometimes called the ergodic theorem of information theory or the asymptotic equipartition (AEP) theorem, but it is best known as the Shannon-McMillan-Breiman theorem. It provides a common foundation to many of the results of both ergodic theory and information theory.
Robert M. Gray
Chapter 5. Distortion and Approximation
Abstract
Various notions of the distortion between random variables, vectors, and processes as well as between different codings of a common source are quantified in this chapter. A distortion measure is not a “measure” in the sense used so far — it is an assignment of a nonnegative real number which indicates how bad an approximation one symbol or random object or coding is of another. The smaller the distortion, the better the approximation. If the two objects correspond to the input and output of a communication system, then the average distortion provides a measure of the performance or fidelity of the system. Small average distortion means high fidelity and good performance, while large average distortion means low fidelity and poor performance. Distortion measures generalize the idea of a distance or metric and they need not have metric properties such as the triangle inequality and symmetry, but such properties can be exploited when available and unsurprisingly the most important notions of distortion are either metrics or simple functions of metrics. We shall encounter several notions of distortion and a diversity of applications, with the most important application being the average distortion between input and output as a measure of the performance of a communications system. Other applications include extensions of finite memory channels to channels which approximate finite memory channels, geometric characterizations of the optimal performance of communications systems, approximations of complicated codes by simpler ones, and modeling random processes.
Robert M. Gray
Chapter 6. Distortion and Entropy
Abstract
Results are developed relating the goodness of approximation as measured by average Hamming distance between codes and and the d-bar distance between sources to the closeness of entropy rate. A few easy applications provide important properties of entropy rate.
One might suspect that if two codes for a common source closely approximate each other, then the resulting output entropies should also be close. Similarly, it seems reasonable to expect that if two random processes well approximate each other, than their entropy rates should be close. Such notions of continuity of entropy with respect to the goodness of approximation between codes and processes are the focus of this chapter and are fundamental to to the development and extensions to follow. A few easy applications are collected in this chapter.
Robert M. Gray
Chapter 7. Relative Entropy
Abstract
A variety of information measures have been introduced for finite alphabet random variables, vectors, and processes: entropy, mutual information, relative entropy, conditional entropy, and conditional mutual information. All of these can be expressed in terms of divergence and hence the generalization of these definitions and their properties to infinite alphabets will follow from a general definition of divergence. In this chapter the definition and properties of divergence in this general setting are developed, including the formulas for evaluating divergence as an expectation of information density and as a limit of divergences of finite codings. We also develop several inequalities for and asymptotic properties of divergence. These results provide the groundwork needed for generalizing the ergodic theorems of information theory from finite to standard alphabets. The general definitions of entropy and information measures originated in the pioneering work of Kolmogorov and his colleagues Gelfand, Yaglom, Dobrushin, and Pinsker
Robert M. Gray
Chapter 8. Information Rates
Abstract
Definitions of information rate for processes with standard alphabets are developed and a mean ergodic theorem for information densities is proved. The relations among several different measures of information rate are developed.
Robert M. Gray
Chapter 9. Distortion and Information
Abstract
A pair random process (X; Y) = {X n , Y n } can be generated by a source and a channel, a source and a code, a combination of a source, channel, encoder, and decoder, or two sources and a coupling. We have developed in some detail the properties of two quantities characterizing relations between the two components of a pair process: the average distortion between the components and their mutual information rate. In this chapter relations are developed between distortion and rate, where rate is measured by mutual information. The primary results concern the Shannon distortion-rate function and its dual, the Shannon rate-distortion function, which will be seen to provide bounds on the relationships between distortion and entropy and to characterize the optimal performance in source coding and rate-constrained simulation systems.
Robert M. Gray
Chapter 10. Relative Entropy Rates
Abstract
Many of the basic properties of relative entropy are extended to sequences of random variables and to processes. Several limiting properties of entropy rates are proved and a mean ergodic theorem for relative entropy densities is given. The principal ergodic theorems for relative entropy and information densities in the general case are given in the next chapter.
Robert M. Gray
Chapter 11. Ergodic Theorems for Densities
Abstract
This chapter is devoted to developing ergodic theorems first for relative entropy densities and then information densities for the general case of AMS processes with standard alphabets. The general results were first developed by Barron using the martingale convergence theorem and a new martingale inequality. The similar results of Algoet and Cover can be proved without direct recourse to martingale theory. They infer the result for the stationary Markov approximation and for the infinite order approximation from the ordinary ergodic theorem. They then demonstrate that the growth rate of the true density is asymptotically sandwiched between that for the kth order Markov approximation and the infinite order approximation and that no gap is left between these asymptotic upper and lower bounds in the limit as k → ∞. They use martingale theory to show that the values between which the limiting density is sandwiched are arbitrarily close to each other, but in this chapter it is shown that martingale theory is not needed and this property follows from the results of Chapter 8.
Robert M. Gray
Chapter 12. Source Coding Theorems
Abstract
The source coding theorems subject to a fidelity criterion are develped for AMS sources and additive and subadditive distortion measures. The results are first developed for the classic case of block coding and then to sliding-block codes. The operational distortion-rate function for both classes of codes is shown to equal the Shannon distortion-rate function.
Robert M. Gray
Chapter 13. Properties of Good Source Codes
Abstract
Necessary conditions for a source code to be optimal or a sequence of source codes to be asymptotically optimal for a stationary source are developed for block and sliding-block codes.
Robert M. Gray
Chapter 14. Coding for Noisy Channels
Abstract
Reliable communication over a noisy channel is the focus of this chapter. The chapter begins with a development of the classic fundamental results of Feinstein regarding reliable communication of block codes and the relation of operational channel capacity to Shannon capacity for discrete channels. A technique of Dobrushin is used to extend Feinstein’s results for channels with no input memory or anticipation by making codes robust to small changes in the conditional distributions describing channels. This leads in turn to the extension of block coding theorems to d-bar continuous channels, discrete noisy channels where the noise distribution within a block can be well approximated in a d-bar sense with only finite knowledge of past and future inputs. Traditional channel coding theorems for block codes assume knowledge of synchronization – when the blocks begin. Another technique of Doburshin is used to synchronize block codes through noisy channels. Combining synchronized block codes with the Rohlin-Kakutani theorem yields a coding theorem for sliding-block channel coding. Finally, combining the source coding theorems with channel coding theorems yields joint-source and channel coding theorems.
Robert M. Gray
Backmatter
Metadaten
Titel
Entropy and Information Theory
verfasst von
Robert M. Gray
Copyright-Jahr
2011
Verlag
Springer US
Electronic ISBN
978-1-4419-7970-4
Print ISBN
978-1-4419-7969-8
DOI
https://doi.org/10.1007/978-1-4419-7970-4

Neuer Inhalt