2017 | Book

Spatial Audio Reproduction with Primary Ambient Extraction

About this book


This book first introduces the background of spatial audio reproduction, covering different types of audio content and playback systems. It then presents a literature study of classical and emerging Primary Ambient Extraction (PAE) techniques. The emerging techniques aim to improve extraction performance and to make PAE approaches more robust when dealing with the more complex signals encountered in practice. The in-depth theoretical study helps readers understand the rationale behind these approaches. Extensive objective and subjective experiments validate the feasibility of applying PAE in spatial audio reproduction systems. These experimental results, together with representative audio examples and MATLAB code for the key algorithms, clearly illustrate the differences among the approaches and help readers select an appropriate approach for a given application.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
This chapter gives a brief introduction to the motivation for the work on spatial audio reproduction using a sound scene decomposition technique referred to as primary ambient extraction.
JianJun He
Chapter 2. Literature Review on Spatial Audio
Abstract
Inspired by the human auditory system, the sound scene is considered a mixture of a foreground sound (primary component, directional) and a background sound (ambient component, diffuse). The primary and ambient components are rendered separately to preserve their spatial characteristics, in accordance with the actual playback configuration. The core problem is how to extract the primary and ambient components from channel-based audio content efficiently. To answer this question, this chapter begins with the fundamentals of spatial hearing and reviews existing spatial audio reproduction techniques, as well as prior art in primary ambient extraction, which is also compared with another sound scene decomposition technique: blind source separation.
JianJun He
Chapter 3. Linear Estimation-Based Primary Ambient Extraction
Abstract
Audio signals for moving pictures and video games are often linear combinations of primary and ambient components. In spatial audio analysis–synthesis, these mixed signals are usually decomposed into primary and ambient components to facilitate flexible spatial rendering and enhancement. Existing approaches such as principal component analysis (PCA) and least squares (LS) are widely used to perform this decomposition from stereo signals. However, the performance of these approaches in primary ambient extraction (PAE) has not been well studied, and no comparative analysis of the existing approaches has been carried out so far. In this chapter, we generalize the existing approaches into a linear estimation framework. Under this framework, we propose a series of performance measures to identify the components that contribute to the extraction error. Based on the generalized linear estimation framework and our proposed performance measures, we present a comparative study and experimental testing of the linear estimation-based PAE approaches, including the existing PCA and LS approaches and three proposed LS variants.
JianJun He
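As an illustration of the PCA-based decomposition mentioned in the Chapter 3 abstract, here is a minimal sketch in plain Python. It is not the book's MATLAB code. It assumes the commonly used stereo signal model x0 = p0 + a0, x1 = k·p0 + a1, where the ambient components are uncorrelated with each other and with the primary component; the function name `pca_pae` and the synthetic test signal are hypothetical choices made for this example.

```python
import math
import random


def pca_pae(x0, x1):
    """Split one stereo frame into primary and ambient estimates via PCA."""
    n = len(x0)
    # Zero-lag auto- and cross-correlations of the two channels.
    r00 = sum(a * a for a in x0) / n
    r11 = sum(b * b for b in x1) / n
    r01 = sum(a * b for a, b in zip(x0, x1)) / n
    if abs(r01) < 1e-12:
        raise ValueError("channels are uncorrelated: no primary direction found")
    # Largest eigenvalue of the 2x2 covariance matrix [[r00, r01], [r01, r11]].
    lam = 0.5 * (r00 + r11) + math.sqrt(0.25 * (r11 - r00) ** 2 + r01 ** 2)
    # The principal eigenvector is proportional to (1, k);
    # k serves as the estimate of the primary panning factor.
    k = (lam - r00) / r01
    # Project each sample pair onto the principal direction (primary estimate).
    p0 = [(a + k * b) / (1.0 + k * k) for a, b in zip(x0, x1)]
    p1 = [k * p for p in p0]
    # The ambient estimate is whatever remains after removing the primary part.
    a0 = [a - p for a, p in zip(x0, p0)]
    a1 = [b - p for b, p in zip(x1, p1)]
    return k, (p0, p1), (a0, a1)


# Synthetic frame (assumed for illustration): a sine as the primary component,
# panned with k = 2, plus weak independent noise as ambience in each channel.
rng = random.Random(0)
k_true = 2.0
primary = [math.sin(2 * math.pi * 0.01 * i) for i in range(4000)]
x0 = [p + 0.1 * rng.gauss(0, 1) for p in primary]
x1 = [k_true * p + 0.1 * rng.gauss(0, 1) for p in primary]

k_est, (p0_hat, p1_hat), (a0_hat, a1_hat) = pca_pae(x0, x1)
print(f"estimated panning factor: {k_est:.3f}")
```

In this sketch the primary and ambient estimates always sum back to the input channels, so any extraction error in one component shows up with opposite sign in the other.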
Chapter 4. Ambient Spectrum Estimation-Based Primary Ambient Extraction
Abstract
The diversity of today’s playback systems requires a flexible, efficient, and immersive reproduction of sound scenes in digital media. Spatial audio reproduction based on primary ambient extraction (PAE) fulfills this objective, where accurate extraction of primary and ambient components from sound mixtures in channel-based audio is crucial. Severe extraction error was found in existing PAE approaches when dealing with sound mixtures that contain a relatively strong ambient component, a commonly encountered case in the sound scenes of digital media. In this chapter, we propose a novel ambient spectrum estimation (ASE) framework to improve the performance of PAE. The ASE framework exploits the equal magnitude of the uncorrelated ambient components in the two channels of a stereo signal and reformulates the PAE problem as the problem of estimating either the ambient phase or the ambient magnitude. In particular, we take advantage of the sparse characteristic of the primary components to derive sparse solutions for ASE-based PAE, together with an approximate solution that can significantly reduce the computational cost. Our objective and subjective experimental results demonstrate that the proposed ASE approaches significantly outperform existing approaches, especially when the ambient component is relatively strong.
JianJun He
Chapter 5. Time-Shifting-Based Primary Ambient Extraction
Abstract
One of the key issues in spatial audio analysis and reproduction is to decompose a signal into primary and ambient components based on their directional and diffuse spatial features, respectively. Existing approaches employed in primary ambient extraction (PAE), such as principal component analysis (PCA), are mainly based on a basic stereo signal model. The performance of these PAE approaches has not been well studied for input signals that do not satisfy all the assumptions of the stereo signal model. In practice, one such case commonly encountered is that the primary components of the stereo signal are only partially correlated at zero lag, referred to as the primary-complex case. In this chapter, we take PCA as a representative of existing PAE approaches and investigate the performance degradation of PAE with respect to the correlation of the primary components in the primary-complex case. A time-shifting technique is proposed in PAE to alleviate the performance degradation due to the low correlation of the primary components in such stereo signals. This technique involves time-shifting the input signal according to the estimated inter-channel time difference of the primary component prior to the signal decomposition using conventional PAE approaches. To avoid the switching artifacts caused by varying time shifts in successive time frames, overlapped output mapping is suggested. Based on the results from our experiments, PAE approaches with the proposed time-shifting technique are found to be superior to the conventional PAE approaches in terms of extraction accuracy and spatial accuracy.
JianJun He
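The time-shifting step described in the Chapter 5 abstract can be sketched as follows. The abstract does not say how the inter-channel time difference (ICTD) is estimated, so this example assumes the simplest option, the lag of the cross-correlation peak. The helper names `estimate_ictd` and `time_shift` are hypothetical, and the overlapped output mapping across frames is not shown.

```python
import random


def estimate_ictd(x0, x1, max_lag):
    """Return the lag (in samples) at which x1 best matches x0."""
    n = len(x0)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x0[i] with x1[i + lag] over the overlapping samples only.
        lo, hi = max(0, -lag), min(n, n - lag)
        corr = sum(x0[i] * x1[i + lag] for i in range(lo, hi))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def time_shift(x, lag):
    """Advance x by lag samples, zero-padding where no sample is available."""
    n = len(x)
    return [x[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


# Synthetic frame (assumed for illustration): the second channel is the first
# one delayed by 5 samples, so the channels are poorly correlated at zero lag.
rng = random.Random(1)
delay = 5
x0 = [rng.gauss(0, 1) for _ in range(500)]
x1 = [0.0] * delay + x0[:-delay]

lag = estimate_ictd(x0, x1, max_lag=20)
x1_aligned = time_shift(x1, lag)
print("estimated ICTD (samples):", lag)
```

After alignment, the frame can be passed to a conventional PAE approach such as PCA, and the extracted components shifted back by the same lag to restore the original inter-channel timing.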
Chapter 6. Multiple Source-Based Primary Ambient Extraction
Abstract
In practice, complex audio scenes may include multiple concurrent sources in the primary components. Subband techniques are commonly implemented in PAE to deal with such signals. This chapter investigates the effect of subband decomposition on PAE. The results indicate that the partitioning of the frequency bins is critical in PAE, and that the proposed top-down (TD) adaptive partitioning method outperforms conventional partitioning methods. The time-shifting technique is also extended to multiple shifts, and consecutive multishift PAE with proper weighting is found to yield more robust results. These techniques help improve the robustness of PAE in dealing with complex signals.
JianJun He
Chapter 7. Conclusions and Future Works
Abstract
In conclusion, several advancements in primary ambient extraction (PAE) are presented. Objective and subjective evaluations validate the feasibility of applying PAE in spatial audio reproduction. With these advanced PAE approaches applied, listeners can immerse themselves in the reproduced sound scenes without limitations on the audio content or playback system. Future work on PAE is also discussed in this chapter.
JianJun He
Metadata
Title
Spatial Audio Reproduction with Primary Ambient Extraction
Author
JianJun He
Copyright year
2017
Publisher
Springer Singapore
Electronic ISBN
978-981-10-1551-9
Print ISBN
978-981-10-1550-2
DOI
https://doi.org/10.1007/978-981-10-1551-9
