
1990 | Book

Entropy and Information Theory

Author: Robert M. Gray

Publisher: Springer New York


Table of Contents

Frontmatter
Chapter 1. Information Sources
Abstract
An information source or source is a mathematical model for a physical entity that produces a succession of symbols called “outputs” in a random manner. The symbols produced may be real numbers such as voltage measurements from a transducer, binary numbers as in computer data, two-dimensional intensity fields as in a sequence of images, continuous or discontinuous waveforms, and so on. The space containing all of the possible output symbols is called the alphabet of the source and a source is essentially an assignment of a probability measure to events consisting of sets of sequences of symbols from the alphabet. It is useful, however, to explicitly treat the notion of time as a transformation of sequences produced by the source. Thus in addition to the common random process model we shall also consider modeling sources by dynamical systems as considered in ergodic theory.
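As a quick sketch in standard notation (not quoted from the chapter), a source {X_n} with alphabet A can be identified with a dynamical system on sequence space: the process distribution μ together with the shift transformation T,

(A^I, \mathcal{B}_A^I, \mu, T), \qquad X_n(x) = x_n = X_0(T^n x), \qquad (Tx)_n = x_{n+1},

where I is the index set (the nonnegative integers or all integers); stationarity of the source corresponds to invariance of μ under T.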
Robert M. Gray
Chapter 2. Entropy and Information
Abstract
The development of the idea of entropy of random variables and processes by Claude Shannon provided the beginnings of information theory and of the modern age of ergodic theory. We shall see that entropy and related information measures provide useful descriptions of the long term behavior of random processes and that this behavior is a key factor in developing the coding theorems of information theory. We now introduce the various notions of entropy for random variables, vectors, processes, and dynamical systems and we develop many of the fundamental properties of entropy.
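For orientation, the two most basic quantities in the finite-alphabet case are the entropy of a random variable and the entropy rate of a process; in standard notation (not quoted from the chapter),

H(X) = -\sum_{x \in A} p_X(x) \log p_X(x), \qquad \bar{H}(X) = \lim_{n \to \infty} \frac{1}{n} H(X_0, X_1, \ldots, X_{n-1}),

where the limit defining the entropy rate exists at least for stationary sources.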
Robert M. Gray
Chapter 3. The Entropy Ergodic Theorem
Abstract
The goal of this chapter is to prove an ergodic theorem for sample entropy of finite alphabet random processes. The result is sometimes called the ergodic theorem of information theory or the asymptotic equipartition theorem, but it is best known as the Shannon-McMillan-Breiman theorem. It provides a common foundation to many of the results of both ergodic theory and information theory. Shannon [129] first developed the result for convergence in probability for stationary ergodic Markov sources. McMillan [103] proved L1 convergence for stationary ergodic sources and Breiman [19] [20] proved almost everywhere convergence for stationary and ergodic sources. Billingsley [15] extended the result to stationary nonergodic sources. Jacobs [67] [66] extended it to processes dominated by a stationary measure and hence to two-sided AMS processes. Gray and Kieffer [54] extended it to processes asymptotically dominated by a stationary measure and hence to all AMS processes. The generalizations to AMS processes build on the Billingsley theorem for the stationary mean. Following generalizations of the definitions of entropy and information, corresponding generalizations of the entropy ergodic theorem will be considered in Chapter 8.
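In its most familiar form (stated here for orientation, for a stationary ergodic finite-alphabet source with entropy rate \bar{H}(X)), the theorem asserts that the sample entropy converges:

-\frac{1}{n} \log p_{X^n}(X_0, X_1, \ldots, X_{n-1}) \to \bar{H}(X) \quad \text{almost everywhere and in } L^1 \text{ as } n \to \infty.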
Robert M. Gray
Chapter 4. Information Rates I
Abstract
Before proceeding to generalizations of the various measures of information, entropy, and divergence to nondiscrete alphabets, we consider several properties of information and entropy rates of finite alphabet processes. We show that codes that produce similar outputs with high probability yield similar rates and that entropy and information rate, like ordinary entropy and information, are reduced by coding. The discussion introduces a basic tool of ergodic theory, the partition distance, and develops several versions of an early and fundamental result from information theory, Fano’s inequality. We obtain an ergodic theorem for information densities of finite alphabet processes as a simple application of the general Shannon-McMillan-Breiman theorem coupled with some definitions. In Chapter 6 these results easily provide L1 ergodic theorems for information densities for more general processes.
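For reference, the most common form of Fano’s inequality (a standard statement, not necessarily the exact versions developed in the chapter) bounds the conditional entropy of a random variable X with finite alphabet A given an estimate \hat{X} in terms of the error probability P_e = \Pr(\hat{X} \ne X):

H(X \mid \hat{X}) \le h_2(P_e) + P_e \log(|A| - 1),

where h_2 is the binary entropy function.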
Robert M. Gray
Chapter 5. Relative Entropy
Abstract
A variety of information measures have been introduced for finite alphabet random variables, vectors, and processes: entropy, mutual information, relative entropy, conditional entropy, and conditional mutual information. All of these can be expressed in terms of divergence and hence the generalization of these definitions to infinite alphabets will follow from a general definition of divergence. Many of the properties of generalized information measures will then follow from those of generalized divergence.
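In the finite-alphabet case the divergence (relative entropy) of a distribution P with respect to a distribution M on the same alphabet is, in standard notation,

D(P \| M) = \sum_{x} P(x) \log \frac{P(x)}{M(x)},

and, for example, mutual information is a divergence between a joint distribution and the product of its marginals, I(X;Y) = D(P_{XY} \| P_X \times P_Y). The general definition for infinite alphabets can be taken as the supremum of such finite-alphabet divergences over all finite measurable partitions (a sketch of the standard approach, not a quotation from the chapter).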
Robert M. Gray
Chapter 6. Information Rates II
Abstract
In this chapter we develop general definitions of information rate for processes with standard alphabets and we prove a mean ergodic theorem for information densities. The L1 results are extensions of the results of Moy [105] and Perez [123] for stationary processes, which in turn extended the Shannon-McMillan theorem from entropies of discrete alphabet processes to information densities. (See also Kieffer [85].) We also relate several different measures of information rate and consider the mutual information between a stationary process and its ergodic component function. In the next chapter we apply the results of Chapter 5 on divergence to the definitions of this chapter for limiting information and entropy rates to obtain a number of results describing the behavior of such rates. In Chapter 8 almost everywhere ergodic theorems for relative entropy and information densities are proved.
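For orientation, with X^n = (X_0, \ldots, X_{n-1}), the mutual information rate of a pair process {X_n, Y_n} is (in standard notation, not quoted from the chapter)

\bar{I}(X;Y) = \lim_{n \to \infty} \frac{1}{n} I(X^n; Y^n)

when the limit exists; limits superior and inferior provide upper and lower rates otherwise.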
Robert M. Gray
Chapter 7. Relative Entropy Rates
Abstract
This chapter extends many of the basic properties of relative entropy to sequences of random variables and to processes. Several limiting properties of entropy rates are proved and a mean ergodic theorem for relative entropy densities is given. The principal ergodic theorems for relative entropy and information densities in the general case are given in the next chapter.
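The basic object, sketched in standard notation rather than quoted, is the relative entropy rate of one process distribution P with respect to another M:

\bar{D}(P \| M) = \lim_{n \to \infty} \frac{1}{n} D(P_{X^n} \| M_{X^n})

when the limit exists, where P_{X^n} and M_{X^n} are the induced distributions of the first n coordinates.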
Robert M. Gray
Chapter 8. Ergodic Theorems for Densities
Abstract
This chapter is devoted to developing ergodic theorems first for relative entropy densities and then for information densities for the general case of AMS processes with standard alphabets. The general results were first developed by Barron [9] using the martingale convergence theorem and a new martingale inequality. The similar results of Algoet and Cover [7] can be proved without direct recourse to martingale theory. They infer the result for the stationary Markov approximation and for the infinite order approximation from the ordinary ergodic theorem. They then demonstrate that the growth rate of the true density is asymptotically sandwiched between that for the kth order Markov approximation and the infinite order approximation and that no gap is left between these asymptotic upper and lower bounds in the limit as k → ∞. They use martingale theory to show that the values between which the limiting density is sandwiched are arbitrarily close to each other, but we shall see that this recourse to martingale theory is not necessary: the property follows from the results of Chapter 6.
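Roughly stated (a sketch, not the chapter’s precise hypotheses): if the distribution P of an AMS process is suitably dominated by a Markov or stationary measure M, the goal is almost-everywhere convergence of the sample relative entropy density

\frac{1}{n} \log \frac{dP_{X^n}}{dM_{X^n}}(X^n),

and the sandwich argument traps this quantity between the corresponding densities formed with the kth order Markov approximation and the infinite order approximation, whose limits coincide as k → ∞.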
Robert M. Gray
Chapter 9. Channels and Codes
Abstract
We have considered a random process or source {X_n} as a sequence of random entities, where the object produced at each time could be quite general, e.g., a random variable, vector, or waveform. Hence sequences of pairs of random objects such as {X_n, Y_n} are included in the general framework. We now focus on the possible interrelations between the two components of such a pair process. In particular, we consider the situation where we begin with one source, say {X_n}, called the input and use either a random or a deterministic mapping to form a new source {Y_n}, called the output. We generally refer to the mapping as a channel if it is random and a code if it is deterministic. Hence a code is a special case of a channel and results for channels will immediately imply the corresponding results for codes. The initial point of interest will be conditions on the structure of the channel under which the resulting pair process {X_n, Y_n} will inherit stationarity and ergodic properties from the original source {X_n}. We will also be interested in the behavior resulting when the output of one channel serves as the input to another, that is, when we form a new channel as a cascade of other channels. Such cascades yield models of a communication system which typically has a code mapping (called the encoder) followed by a channel followed by another code mapping (called the decoder).
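In the standard formulation (a sketch in common notation, not a quotation), a channel ν assigns to each input sequence x a probability measure ν_x on output sequences, measurably in x, and connecting a source μ to the channel yields the pair-process (hookup) distribution

\mu\nu(F) = \int \nu_x(F_x)\, d\mu(x), \qquad F_x = \{y : (x, y) \in F\};

a deterministic code is the special case in which ν_x places all of its mass on a single output sequence f(x).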
Robert M. Gray
Chapter 10. Distortion
Abstract
We now turn to quantification of various notions of the distortion between random variables, vectors and processes. A distortion measure is not a “measure” in the sense used so far; it is an assignment of a nonnegative real number which indicates how bad an approximation one symbol or random object is of another; the smaller the distortion, the better the approximation. If the two objects correspond to the input and output of a communication system, then the distortion provides a measure of the performance of the system. Distortion measures need not have metric properties such as the triangle inequality and symmetry, but such properties can be exploited when available. We shall encounter several notions of distortion and a diversity of applications, with eventually the most important application being a measure of the performance of a communications system by an average distortion between the input and output. Other applications include extensions of finite memory channels to channels which approximate finite memory channels and different characterizations of the optimal performance of communications systems.
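The most important special case is an additive (single-letter) fidelity criterion built from a per-symbol distortion measure ρ; sketched in standard notation, the distortion between n-blocks and the resulting performance measure for a pair process are

\rho_n(x^n, y^n) = \frac{1}{n} \sum_{i=0}^{n-1} \rho(x_i, y_i), \qquad \Delta = \limsup_{n \to \infty} E\, \rho_n(X^n, Y^n),

which reduces to E ρ(X_0, Y_0) when the pair process is stationary.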
Robert M. Gray
Chapter 11. Source Coding Theorems
Abstract
In this chapter and the next we develop the basic coding theorems of information theory. As is traditional, we consider two important special cases first and then later form the overall result by combining these special cases. In the first case we assume that the channel is noiseless, but it is constrained in the sense that it can only pass R bits per input symbol to the receiver. Since this is usually insufficient for the receiver to perfectly recover the source sequence, we attempt to code the source so that the receiver can recover it with as little distortion as possible. This leads to the theory of source coding or source coding subject to a fidelity criterion or data compression, where the latter name reflects the fact that sources with infinite or very large entropy are “compressed” to fit across the given communication link. In the next chapter we ignore the source and focus on a discrete alphabet channel and construct codes that can communicate any of a finite number of messages with small probability of error and we quantify how large the message set can be. This operation is called channel coding or error control coding. We then develop joint source and channel codes which combine source coding and channel coding to code a given source for communication over a given channel so as to minimize average distortion. The ad hoc division into two forms of coding is convenient and will permit performance near that of the OPTA function for the codes considered.
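The benchmark is the distortion-rate function; a standard informational form for a stationary source with an additive fidelity criterion, sketched here rather than quoted, is

D(R) = \inf \{\, E\,\rho(X_0, Y_0) : \text{stationary pair processes with the given source and } \bar{I}(X;Y) \le R \,\},

and the source coding theorem identifies this quantity with the OPTA (optimal performance theoretically attainable) for codes of rate R.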
Robert M. Gray
Chapter 12. Coding for Noisy Channels
Abstract
In the treatment of source coding the communication channel was assumed to be noiseless. If the channel is noisy, then the coding strategy must be different. Now some form of error control is required to undo the damage caused by the channel. The overall communication problem is usually broken into two pieces: A source coder is designed for a noiseless channel with a given resolution or rate and an error correction code is designed for the actual noisy channel in order to make it appear almost noiseless. The combination of the two codes then provides the desired overall code or joint source and channel code. This division is natural in the sense that optimizing a code for a particular source may suggest quite different structure than optimizing it for a channel. The structures must be compatible at some point, however, so that they can be used together.
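The key quantity is channel capacity; for a discrete memoryless channel with transition probabilities p(y|x) it takes the familiar single-letter form (a standard statement, not quoted from the chapter)

C = \max_{p_X} I(X; Y),

and the channel coding theorem asserts that block codes with arbitrarily small error probability exist at any rate below C, while no such codes exist at rates above C.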
Robert M. Gray
Backmatter
Metadata
Title
Entropy and Information Theory
Author
Robert M. Gray
Copyright Year
1990
Publisher
Springer New York
Electronic ISBN
978-1-4757-3982-4
Print ISBN
978-1-4757-3984-8
DOI
https://doi.org/10.1007/978-1-4757-3982-4