
2003 | Book

Introduction to Digital Audio Coding and Standards

Authors: Marina Bosi, Richard E. Goldberg

Publisher: Springer US

Book series: The International Series in Engineering and Computer Science


About this book

Introduction to Digital Audio Coding and Standards provides a detailed introduction to the methods, implementations, and official standards of state-of-the-art audio coding technology. In the book, the theory and implementation of each of the basic coder building blocks is addressed. The building blocks are then fit together into a full coder and the reader is shown how to judge the performance of such a coder. Finally, the authors discuss the features, choices, and performance of the main state-of-the-art coders defined in the ISO/IEC MPEG and HDTV standards and in commercial use today.
The ultimate goal of this book is to give the reader a solid enough understanding of the major issues in the theory and implementation of perceptual audio coders to be able to build a simple audio codec of their own. There is no other source available where a non-professional has access to the true secrets of audio coding.

Table of contents

Frontmatter

Audio Coding Methods

Frontmatter
Chapter 1. Introduction
Abstract
We hear a sound and we want to store it for later replay — what information do we need to capture? Physicists tell us that sound is a pressure wave (i.e., vibration) in the air, so we can measure this pressure wave with a mechanical device and then mechanically reproduce the pressure wave later. This is the principle used by Thomas Edison and other manufacturers of early phonographs and gramophones, in which a large cone concentrated the vibrations to a point where a needle scratched its vibrating path onto a spinning cylinder or disk. Later, a hand-cranked or other form of motor would turn the spinning cylinder or disk, and the needle’s forced movement along its prior path would cause the cone to recreate the pressure wave. The advent of electronic technology has allowed us to convert the pressure wave into a voltage reading that can be transferred onto a variety of storage media, for example as a changing degree of magnetization along a cassette tape. The basic idea in analogue technology, however, is still the same — to represent sound by the amplitude of its vibration over time. This tells us that one basic representation of sound is as a changing function of time t, which we denote x(t) as shown in Figure 1.
Marina Bosi, Richard E. Goldberg
Chapter 2. Quantization
Abstract
As we saw in the previous chapter, sound can be represented as a function of time, where both the sound amplitude and the time values are continuous in nature. Unfortunately, before we can represent an audio signal in digital format we need to convert continuous signal amplitude values into a discrete representation that is storable by a computer — an action that does cause loss of information. The reason for this conversion is that computers store numbers using a finite number of bits, so amplitude values can be stored with only finite precision. In this chapter, we address the quantization of continuous signal amplitudes into discrete amplitudes and determine how much distortion is caused by the process. Typically, quantization noise is the major cause of distortion in the coding process of audio signals. In later chapters, we address the perceptual impacts of this signal distortion and discuss the design trade-off between signal distortion and coder data rate. In this chapter, however, we focus on the basics of quantization.
Marina Bosi, Richard E. Goldberg
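The trade-off the abstract describes can be made concrete with a small sketch (the function names and the test tone are ours, not the book's): a midtread uniform quantizer, and the signal-to-noise ratio it yields at two word lengths, illustrating the familiar rule of roughly 6 dB of SNR per added bit.

```python
import math

def quantize(x, n_bits, x_max=1.0):
    """Midtread uniform quantizer: round x to the nearest of ~2**n_bits
    reconstruction levels spanning [-x_max, x_max]."""
    step = 2.0 * x_max / 2 ** n_bits            # quantizer step size
    x = max(-x_max, min(x_max, x))              # clip at the overload point
    return step * round(x / step)

def snr_db(signal, n_bits):
    """Signal-to-quantization-noise ratio in dB for a given word length."""
    quantized = [quantize(s, n_bits) for s in signal]
    p_sig = sum(s * s for s in signal)
    p_err = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10.0 * math.log10(p_sig / p_err)

# a 440 Hz test tone sampled at 44.1 kHz
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
snr8, snr16 = snr_db(tone, 8), snr_db(tone, 16)
print(round(snr8, 1), round(snr16, 1))   # roughly 6 dB of SNR per bit
```

The measured SNRs track the classic 6.02R + 1.76 dB prediction for a full-scale sinusoid, which is where the perceptual question begins: how many of those bits are actually needed at each frequency.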
Chapter 3. Representation of Audio Signals
Abstract
In many instances it is more appropriate to describe audio signals as some function of frequency rather than time, since we perceive sounds in terms of their tonal content and their frequency representation often offers a more compact representation. The Fourier Transform is the basic tool that allows us to transform from functions of time like x(t) into corresponding functions of frequency like X(f). In this chapter, we first review some basic math notation and the “Dirac delta function”, since we will make use of its properties in many derivations. We then describe the Fourier Transform and its inverse to see how signals can be translated between their frequency and time domain representations. We also describe summary characteristics of signals and show how they can be calculated from either the time or the frequency-domain information. We discuss the Fourier series, which is a variation of the Fourier Transform that applies to periodic signals. In particular, we show how the Fourier series provides a more parsimonious description of time-limited signals than the full Fourier Transform without any loss of information. We show how we can apply the same insight in the frequency domain to prove the Sampling Theorem, which tells us that band-limited signals can be fully represented using only discrete time samples of the signal.
Marina Bosi, Richard E. Goldberg
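The time-to-frequency translation described above can be sketched in a few lines (a naive discrete Fourier transform; the sampling rate and tone are our own illustrative choices). A sampled sinusoid placed exactly on a DFT bin shows up as a single spectral peak, from which its frequency can be read back.

```python
import cmath, math

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] e^{-2*pi*i*k*n/N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

fs, N = 8000, 64                 # sampling rate (Hz) and block length
f0 = fs * 5 / N                  # a tone placed exactly on DFT bin 5 (625 Hz)
x = [math.cos(2 * math.pi * f0 * n / fs) for n in range(N)]
X = dft(x)
peak = max(range(N // 2), key=lambda k: abs(X[k]))
print(peak, peak * fs / N)       # the peak bin recovers the tone's frequency
```

Because the tone is well below fs/2, the Sampling Theorem guarantees the discrete samples fully represent it; the DFT merely changes the representation from time to frequency.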
Chapter 4. Time to Frequency Mapping Part I: The PQMF
Abstract
In this and the following chapter, we discuss common techniques used in mapping audio signals from the time domain into the frequency domain. The basic idea is that we can often reduce the redundancy in an audio signal by subdividing its content into its frequency components and then appropriately allocating the bit pool available. Highly tonal signals have frequency components that are slowly changing in time. The data necessary to fully describe these signals can be significantly less than that involved in directly describing the signal’s shape as time passes.
Marina Bosi, Richard E. Goldberg
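A full PQMF filter bank is too long to sketch here, but the redundancy-reduction idea can be shown with a simple block DCT as a stand-in (this is not the book's filter bank, just an illustration): a tonal signal that spreads its amplitude across every time sample concentrates almost all of its energy into a single frequency coefficient, which is what makes frequency-domain bit allocation pay off.

```python
import math

def dct_ii(x):
    """Type-II DCT: a block transform that compacts the energy of tonal
    signals into few frequency coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N)) for k in range(N)]

N = 64
# a pure tone aligned with DCT basis vector k = 10
x = [math.cos(2 * math.pi * 5 * (n + 0.5) / N) for n in range(N)]
X = dct_ii(x)
energy = sum(c * c for c in X)
top = max(c * c for c in X)
print(top / energy)   # nearly all of the energy sits in one coefficient
```

A coder can then spend its bit pool on that one slowly-varying component instead of describing the waveform sample by sample.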
Chapter 5. Time to Frequency Mapping Part II: The MDCT
Abstract
The PQMF solution to developing near-perfect reconstruction filter banks (see Chapter 4) was extremely important. Another approach to the time to frequency mapping of audio signals is historically connected to the development of transform coding. In this approach, block transform methods were used to take a block of sampled data and transform it into a different representation. For example, K data samples in the time domain could be transformed into K data samples in the frequency domain using the Discrete Fourier Transform, DFT. Moreover, exceedingly fast algorithms such as the Fast Fourier Transform, FFT, were developed for carrying out these transforms for large block sizes. Researchers discovered early on that they had to be very careful about how blocks of samples were analyzed and synthesized due to edge effects across blocks. This led to active research into what type of smooth windows and overlapping of data should be used so as not to distort the frequency content of the data. This line of research focused on windows, transforms, and overlap-and-add techniques of coding.
Marina Bosi, Richard E. Goldberg
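The window/overlap-and-add machinery described above can be demonstrated end to end with a minimal MDCT sketch (our own naive implementation and normalization, not code from the book): 2N windowed samples map to only N coefficients per frame, yet with a sine window satisfying the Princen-Bradley condition the time-domain aliasing cancels and overlap-add reconstructs the interior samples exactly.

```python
import math

def mdct(x, N):
    """MDCT of a 2N-sample block to N coefficients."""
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(X, N):
    """Inverse MDCT back to 2N samples (aliased until overlap-added)."""
    return [2.0 / N * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N)) for n in range(2 * N)]

N = 8
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]  # sine window
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]

y = [0.0] * (4 * N)
for start in (0, N, 2 * N):          # frames hop by N, overlapping by half
    frame = [w[n] * x[start + n] for n in range(2 * N)]
    X = mdct(frame, N)               # only N coefficients per N new samples
    out = imdct(X, N)
    for n in range(2 * N):
        y[start + n] += w[n] * out[n]    # windowed overlap-and-add

err = max(abs(x[n] - y[n]) for n in range(N, 3 * N))
print(err)   # aliasing cancels: the overlapped interior is reconstructed
```

The edges of the signal are not covered by two frames and so are not reconstructed; real coders handle this with start/stop windows.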
Chapter 6. Introduction to Psychoacoustics
Abstract
In the introduction to this book, we saw that the last stage in the coding chain is the human ear. A good understanding of how the human ear works can be a powerful tool in the design of audio codecs. The general idea is that quantization noise can be placed in areas of the signal spectrum where it least affects the fidelity of the signal, so that the data rate can be reduced without introducing audible distortion.
Marina Bosi, Richard E. Goldberg
Chapter 7. Psychoacoustic Models for Audio Coding
Abstract
In the prior chapter we learned about the limits to human hearing. We learned about the threshold in quiet or hearing threshold below which sounds are inaudible. The hearing threshold is very important to coder design because it represents frequency-dependent levels below which quantization noise levels will be inaudible. The implication in the coded representation of the signal is that certain frequency components can be quantized with a relatively small number of bits without introducing audible distortion.
Marina Bosi, Richard E. Goldberg
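The frequency dependence of the hearing threshold can be sketched with one widely used analytic fit, Terhardt's approximation (the exact threshold model in any given coder differs, so treat this as illustrative): it captures the high threshold at low frequencies, the sensitive dip around 3-4 kHz, and the steep rise toward 20 kHz.

```python
import math

def threshold_in_quiet(f):
    """Approximate threshold in quiet in dB SPL at frequency f (Hz),
    following Terhardt's commonly quoted fit."""
    khz = f / 1000.0
    return (3.64 * khz ** -0.8                          # low-frequency rise
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # dip near 3.3 kHz
            + 1e-3 * khz ** 4)                          # high-frequency rise

for f in (100, 1000, 4000, 16000):
    print(f, round(threshold_in_quiet(f), 1))
```

Quantization noise that falls below this curve at its frequency is inaudible regardless of the signal, which is exactly why some spectral components can be coded with very few bits.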
Chapter 8. Bit Allocation Strategies
Abstract
The most common approach in perceptual coding of audio signals is to subdivide the input signal into frequency components and to encode each component separately. In Chapters 4 and 5 we discussed different time to frequency mapping techniques and how these techniques can represent the input signal in the frequency domain and allow for redundancy removal. Time domain based coding algorithms such as ADPCM can achieve similar results in terms of redundancy removal (see also Chapter 3). In this framework, typically the audio signal is treated as a single, wide-band signal and prediction and inverse filtering are adopted to describe it. In this context, the main difference between time-domain and frequency domain coding algorithms is the degree of redundancy removal and signal decorrelation.
Marina Bosi, Richard E. Goldberg
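One common family of bit allocation strategies can be sketched as a greedy loop (a simplified illustration, with hypothetical signal-to-mask ratios, not an algorithm from the book): repeatedly give one more bit to the band whose quantization noise sits furthest above its masked threshold, using the rule of thumb that each extra bit lowers a band's noise floor by about 6 dB.

```python
def allocate_bits(smr_db, bit_pool, max_bits=16):
    """Greedy allocation: each pass gives a bit to the band with the
    highest remaining noise-to-mask ratio (NMR)."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # NMR of each band under its current allocation (~6 dB per bit)
        nmr = [smr - 6.02 * b for smr, b in zip(smr_db, bits)]
        band = max(range(len(nmr)), key=lambda i: nmr[i])
        if bits[band] >= max_bits:
            break
        bits[band] += 1
    return bits

# hypothetical signal-to-mask ratios (dB) for four bands
print(allocate_bits([30.0, 18.0, 6.0, -3.0], bit_pool=8))
```

Bands already below their masked threshold (negative SMR) receive nothing until the louder bands are served, which is the perceptual shaping of the noise floor that the chapter formalizes.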
Chapter 9. Building a Perceptual Audio Coder
Abstract
In this chapter we discuss how the coder building blocks described in the prior chapters can be fit together into a working perceptual audio coder. Particular attention is given to how to create masking curves for use in bit allocation. We also discuss issues in setting up standardized bitstream formats so that coded data can be decoded using decoders provided from a variety of vendors.
Marina Bosi, Richard E. Goldberg
Chapter 10. Quality Measurement of Perceptual Audio Codecs
Abstract
Audio coding involves balancing data rate and system complexity limitations against needs for high-quality audio. While audio quality is a fundamental concept in audio coding, it remains very difficult to describe it in objective terms. Traditional quality measurements such as the signal to noise ratio or the total block distortion provide simple, objective measures of audio quality but they ignore psychoacoustic effects that can lead to large differences in perceived quality. In contrast, perceptual objective measurement schemes, which rely upon specific models of hearing, are subject to the criticism that the predicted results do not correlate well with the perceived audio quality. While neither simple objective measures nor perceptual measures are considered fully satisfactory, audio coding has traditionally relied on formal listening tests to assess a system’s audio quality when a highly accurate assessment is needed. After all, human listeners are the ultimate judges of quality in any application.
Marina Bosi, Richard E. Goldberg

Audio Coding Standards

Frontmatter
Chapter 11. MPEG-1 Audio
Abstract
After the introduction of digital video technologies and the CD format in the mid-eighties, a flurry of applications that involved digital audio/video and multimedia technologies started to emerge. The need for interoperability, for high-quality pictures accompanied by CD-quality audio at lower data rates, and for a common file format led to the institution of a new standardization group within the joint technical committee on information technology (JTC 1) sponsored by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This group, the Moving Picture Experts Group (MPEG), was established at the end of the eighties with the mandate to develop standards for coded representation of moving pictures, associated audio, and their combination [Chiariglione 95].
Marina Bosi, Richard E. Goldberg
Chapter 12. MPEG-2 Audio
Abstract
This chapter describes the MPEG-2 Audio coding family, which was developed to extend the MPEG-1 Audio functionality to lower data rate and multichannel applications. First we will discuss the MPEG-2 LSF and “MPEG-2.5” systems, in which lower sampling frequencies were used as a means to reduce data rate in MPEG-1 audio coding. (The MP3 audio format is usually implemented as the MPEG-2.5 extension of Layer III.) Next we will discuss the MPEG-2 BC system, which extended the MPEG-1 Audio coders to multichannel operation while preserving backwards compatibility so that MPEG-2 BC data streams would be playable on existing MPEG-1 players. In the next chapter, we will discuss the MPEG-2 AAC system, which made use of all of the advanced audio coding techniques available at the time to reach higher quality in a multichannel system than was achievable while maintaining compatibility with MPEG-1 players.
Marina Bosi, Richard E. Goldberg
Chapter 13. MPEG-2 AAC
Abstract
This chapter describes the MPEG-2 Advanced Audio Coding, AAC, system. Started in 1994, another effort of the MPEG-2 Audio committee was to define a higher-quality multichannel standard than was achievable while requiring MPEG-1 backwards compatibility. The so-called MPEG-2 non-backwards-compatible audio standard, later renamed MPEG-2 Advanced Audio Coding (MPEG-2 AAC) [ISO/IEC 13818-7], was finalized in 1997. AAC made use of all of the advanced audio coding techniques available at the time of its development to provide very high quality multichannel audio.
Marina Bosi, Richard E. Goldberg
Chapter 14. Dolby AC-3
Abstract
In Chapters 11 through 13 we discussed the goals and the main features of ISO/IEC MPEG-1 and -2 Audio. Other standards bodies addressed the coding of audio based on specific applications. For example, the North American HDTV standard [ATSC A/52/10], the DVD-Video standard [DVD-Video] and the DVB [ETS 300 421] standard all make use of Dolby AC-3, also known as Dolby Digital.
Marina Bosi, Richard E. Goldberg
Chapter 15. MPEG-4 Audio
Abstract
In Chapters 11, 12 and 13 we discussed the goals of the first two phases of the MPEG Audio standard, MPEG-1 and MPEG-2, and we reviewed the main features of the specifications. MPEG-4 is another ISO/IEC standard that was proposed as a work item in 1992 [ISO/IEC MPEG N271]. In addition to audiovisual coding at very low bit rates, the MPEG-4 standard addresses different functionalities, such as scalability, 3-D, and synthetic/natural hybrid coding. MPEG-4 became an ISO/IEC final draft international standard, FDIS, in October 1998 (ISO/IEC 14496 version 1); see for example [ISO/IEC MPEG N2501, ISO/IEC MPEG N2502, ISO/IEC MPEG N2503 and ISO/IEC MPEG N2506]. The second version of ISO/IEC 14496 was finalized in December 1999 [ISO/IEC 14496]. In order to address the needs of emerging applications, the scope of the standard was expanded in later amendments and, even currently, a number of new features are under development. These features will be incorporated in new extensions to the standard, where the newer versions of the standard are compatible with the older ones.
Marina Bosi, Richard E. Goldberg
Backmatter
Metadata
Title
Introduction to Digital Audio Coding and Standards
Authors
Marina Bosi
Richard E. Goldberg
Copyright year
2003
Publisher
Springer US
Electronic ISBN
978-1-4615-0327-9
Print ISBN
978-1-4613-5022-4
DOI
https://doi.org/10.1007/978-1-4615-0327-9