About this book

This book reports on the application of advanced models of the human binaural hearing system in modern technology, among others, in the following areas: binaural analysis of aural scenes, binaural de-reverberation, binaural quality assessment of audio channels, loudspeakers and performance spaces, binaural perceptual coding, binaural processing in hearing aids and cochlear implants, binaural systems in robots, binaural/tactile human-machine interfaces, and speech-intelligibility prediction in rooms and/or multi-speaker scenarios. An introduction to binaural modeling and an outlook on the future are provided. Further, the book features a MATLAB toolbox to enable readers to construct their own dedicated binaural models on demand.

Table of Contents


An Introduction to Binaural Processing

The binaural auditory system performs a number of astonishing functions, such as precise localization of sound sources, analysis of auditory scenes, segregation of auditory streams, providing situational awareness in reflective environments, suppression of reverberance, noise and coloration, enhancement of desired talkers over undesired ones, providing spatial impression and the sense of immersion. These functions are of profound interest for technological application and, hence, the subject of increasing engineering efforts. Generic application areas for binaural algorithms are, among others, aural virtual environments, hearing aids, assessment of product-sound quality, room acoustics, speech technology, audio technology, robotic ears, and tools for research into auditory physiology and aural perception. This introductory chapter starts with a discussion of the performance of binaural hearing and then lists relevant areas for technological application. After a short presentation of the physiological background, signal-processing algorithms as applied to binaural modeling are described. These signal-processing algorithms are manifold, but can be roughly divided into localization models and detection models. Both approaches are discussed in some detail. The chapter is meant to serve as an introduction to the main body of the book.
A. Kohlrausch, J. Braasch, D. Kolossa, J. Blauert

The Auditory Modeling Toolbox

The Auditory Modeling Toolbox, AMToolbox, is a Matlab/Octave toolbox for developing and applying auditory perceptual models with a particular focus on binaural models. The philosophy behind the AMToolbox is the consistent implementation of auditory models, good documentation, and user-friendly access in order to allow students and researchers to work with and to advance existing models. In addition to providing the model implementations, published human data and model demonstrations are provided. Further, model implementations can be evaluated by running so-called experiments aimed at reproducing results from the corresponding publications. AMToolbox includes many of the models described in this volume. It is freely available from http://amtoolbox.sourceforge.net
P. L. Søndergaard, P. Majdak

Trends in Acquisition of Individual Head-Related Transfer Functions

Head-related transfer functions, HRTFs, that is, the pair of acoustic transfer functions from a sound source in anechoic space to the human ears, are important elements of binaural technology. In order to make practical use of them, fast and comfortable means for the acquisition of individual HRTFs are required and, furthermore, convenient data formats for their comprehensive representation are needed. This chapter first recapitulates early and seminal work in the field of HRTFs. From here, a concise picture of recent trends for spatially discrete and continuous measurement of HRTFs, with a focus on the more recent continuous, that is, dynamic approach, is developed. For the continuous method, latest results regarding the optimization of the loudspeaker excitation signal for the measurement are presented. With respect to HRTF representation and usage, the chapter refers to spatially-discrete databases in time- or frequency-domain and, additionally, to the spatial Fourier-series domain. The latter constitutes an ideal basis for both interpolation and extrapolation of discrete data as well as for the representation of the results of spherically-continuous measurements.
G. Enzner, Chr. Antweiler, S. Spors

Assessment of Sagittal-Plane Sound Localization Performance in Spatial-Audio Applications

Sound localization in sagittal planes, SPs, including front-back discrimination, relies on spectral cues resulting from the filtering of incoming sounds by the torso, head and pinna. While acoustic spectral features are well-described by head-related transfer functions, HRTFs, models for SP localization performance have received little attention. In this article, a model predicting SP localization performance of human listeners is described. Listener-specific calibrations are provided for 17 listeners as a basis to predict localization performance in various applications. In order to demonstrate the potential of this listener-specific model approach, predictions for three applications are provided, namely, the evaluation of non-individualized HRTFs for binaural recordings, the assessment of the quality of spatial cues for the design of hearing-assist devices and the estimation and improvement of the perceived direction of phantom sources in surround-sound systems.
R. Baumgartner, P. Majdak, B. Laback

Modeling Horizontal Localization of Complex Sounds in the Impaired and Aided Impaired Auditory System

Background noise, room reflections, or interfering sound sources represent a challenge for daily one-to-one communication, particularly for hearing-impaired listeners, even when wearing hearing aids. Through a modeling approach, this project investigated how peripheral hearing loss impairs the processing of spatial cues in adverse listening conditions. A binaural model in which the peripheral processor can be tuned to account for individual hearing loss was developed to predict localization in anechoic and reverberant rooms. Hearing impairment was accounted for by a loss of sensitivity, a loss of cochlear compression and reduced frequency selectivity. A spatial cue-selection mechanism processed the output of the binaural equalization-&-cancellation processor to evaluate the localization information’s reliability based on interaural coherence. The simulations in an anechoic environment suggested that the sound-source-location estimates become less reliable and blurred in the case of reduced audibility. Simulations in rooms suggested that the broadening of the auditory filters reduces the fidelity of spectral cues and affects the internal representation of interaural level differences. The model-based analysis of hearing-aid processing showed that amplification and compression used to recover audibility also partially recovered the internal representation of the spatial cues in the impaired auditory system. Future work is needed to extend and experimentally validate the model. Overall, the current model represents a first step towards the development of a dedicated research tool for investigating and understanding the processing of spatial cues in adverse listening conditions, with a long-term goal of contributing to solving the cocktail-party problem for normal-hearing and hearing-impaired listeners.
N. Le Goff, J. M. Buchholz, T. Dau

Binaural Scene Analysis with Multidimensional Statistical Filters

The segregation of concurrent speakers and other sound sources is an important aspect in improving the performance of audio technology, such as noise reduction and automatic speech recognition, ASR, in difficult acoustic conditions. This technology is relevant for applications like hearing aids, mobile audio devices, robotics, hands-free audio communication and speech-based computer interfaces. Computational auditory-scene analysis (CASA) techniques simulate aspects of processing properties of the human perceptual system using statistical signal-processing techniques to improve inferences about the causes of audio input received by the system. This study argues that CASA is a promising approach to achieve source separation and outlines several theoretical arguments to support this hypothesis. With a focus on computational binaural scene analysis, principles of CASA techniques are reviewed. Furthermore, in an experimental approach, the applicability of a recent model of binaural interaction to improve ASR performance in multi-speaker conditions with spatially separated moving speakers is explored. The binaural model provides input to a statistical inference filter that employs a priori information on possible movements of the sources in order to track the positions of the speakers. The tracks are used to adapt a beamformer that selects a specific speaker. The output of the beamformer is subsequently used for an ASR task. Compared to the unprocessed, that is, mixed, data in a two-speaker condition, the word recognition rates obtained with the enhanced signals based on binaural information were increased from 30.8 to 88.4 %, demonstrating the potential of the proposed CASA-based approach.
C. Spille, B. T. Meyer, M. Dietz, V. Hohmann

Extracting Sound-Source-Distance Information from Binaural Signals

The problem of distance estimation by computational methods utilizing binaural information is discussed. Initially, a brief overview is given of findings related to auditory distance perception. Then, several acoustical parameters that depend on the distance between the source and the receiver, especially within reverberant rooms, are presented. An overview of several existing distance-estimation techniques using binaural signals is given, and a recent distance-estimation method is presented in more detail. This method relies on several statistical features extracted from binaural signals and incorporates these features into a classification framework based on Gaussian mixture models.
E. Georganti, T. May, S. van de Par, J. Mourjopoulos
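
The classification step named in the abstract above can be illustrated with a minimal sketch. It assumes one Gaussian mixture model per candidate distance class, scores a set of scalar feature values under each model, and picks the class with the highest total log-likelihood. The feature, the class labels and all mixture parameters are hypothetical placeholders rather than values from the chapter, and one-dimensional mixtures are used only to keep the example short.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture.

    components: list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for weight, mean, var in components:
        norm = math.sqrt(2.0 * math.pi * var)
        total += weight * math.exp(-0.5 * (x - mean) ** 2 / var) / norm
    return math.log(total) if total > 0.0 else float("-inf")


def classify_distance(features, class_models):
    """Return the class label whose mixture best explains the features.

    features: iterable of scalar feature values (treated as independent).
    class_models: dict mapping a label to its list of mixture components.
    """
    scores = {
        label: sum(gmm_log_likelihood(x, comps) for x in features)
        for label, comps in class_models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy models: two distance classes, two components each.
TOY_MODELS = {
    "near": [(0.6, 0.2, 0.01), (0.4, 0.3, 0.02)],
    "far": [(0.5, 0.6, 0.02), (0.5, 0.8, 0.02)],
}
```

In practice the mixture parameters would be learned from labeled training data, for example with the expectation-maximization algorithm, and the features would be multidimensional.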

A Binaural Model that Analyses Acoustic Spaces and Stereophonic Reproduction Systems by Utilizing Head Rotations

It is well known that head rotations are instrumental in resolving front/back confusions in human sound localization. A mechanism for a binaural model is proposed here to extend current cross-correlation models to compensate for head rotations. The algorithm tracks sound sources in the head-related coordinate system, HRCS, as well as in the room-related coordinate system, RRCS. It is also aware of the current head position within the room. The sounds are positioned in space using an HRTF catalog at 1° azimuthal resolution. The position of the sound source is determined through the interaural cross-correlation, ICC, functions across several auditory bands that are mapped to functions of azimuth and superposed. The maxima of the cross-correlation functions determine the position of the sound source. Unfortunately, two peaks usually occur, one at or near the correct location and the second at the front/back reversed position. When the model is programmed to virtually turn its head, the degree-based cross-correlation functions are shifted with the current head angle to match the RRCS. During this procedure, the ICC peak for the correct hemisphere will prevail if integrated over time for the duration of the head rotation, whereas the front/back reversed peak will average out.
J. Braasch, S. Clapp, A. Parks, T. Pastore, N. Xiang
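
The time-integration argument in the abstract above can be made concrete with a small geometric sketch. It is a deliberate simplification and not the chapter's model: instead of cross-correlation functions, each head angle simply contributes two unit-height candidates in head-related coordinates, the true azimuth and its front/back mirror image, which are then shifted by the head angle into room-related coordinates and accumulated. The function names and the 1° bin grid are assumptions made for illustration.

```python
def accumulate_room_map(source_az_deg, head_angles_deg, n_bins=360):
    """Accumulate localization candidates in room-related coordinates.

    source_az_deg: true source azimuth in room coordinates (degrees).
    head_angles_deg: sequence of head orientations visited during the turn.
    Returns a list with one accumulated score per 1-degree azimuth bin.
    """
    room_map = [0.0] * n_bins
    for head in head_angles_deg:
        rel = (source_az_deg - head) % 360   # azimuth relative to the head
        mirrored = (180 - rel) % 360         # front/back reversed candidate
        for candidate in (rel, mirrored):
            # Shift back by the head angle to obtain room coordinates.
            room_az = int(round(candidate + head)) % n_bins
            room_map[room_az] += 1.0
    return room_map


def estimate_azimuth(room_map):
    """Return the azimuth bin with the largest accumulated score."""
    return max(range(len(room_map)), key=room_map.__getitem__)
```

With a fixed head, the true and the mirrored candidate receive equal scores. Once the head turns, the true candidate stays at the same room azimuth and accumulates, while the mirrored candidate moves by twice the head angle and is spread over many bins.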

Binaural Systems in Robotics

Audition is often described by physiologists as the most important sense in humans, due to its essential role in communication and socialization. Quite surprisingly, however, interest in this modality for robotics arose only in the 2000s, prompted by cognitive robotics and human–robot interaction. Since then, numerous contributions have been proposed to the field of robot audition, ranging from sound localization to scene analysis. Binaural approaches were investigated first, but were then abandoned due to mixed results. Nevertheless, recent years have witnessed a renewal of interest in binaural active audition, that is, in the opportunities and challenges opened by the coupling of binaural sensing and robot motion. This chapter presents a comprehensive review of the state of the art of binaural approaches to robot audition. Though the literature on binaural audition and, more generally, on acoustics and signal processing, is a fundamental source of knowledge, the tasks, constraints, and environments of robotics raise original issues. These are reviewed, prior to the most prominent contributions, platforms and projects. Two lines of research in binaural active audition, conducted by the current authors, are then outlined, one of which is tightly connected to the psychology of perception.
S. Argentieri, A. Portello, M. Bernard, P. Danès, B. Gas

Binaural Assessment of Multichannel Reproduction

This chapter outlines the problem of evaluating multichannel reproduction using the example of the wave-field synthesis method. This method is known for providing good localization of reproduced sources within an extended listening area. The localization performance for virtual point sources was investigated for various listener positions and loudspeaker-array configurations. The results of the listening tests were compared with localization predictions by a binaural model. With this model, a localization map can be obtained that covers most listener positions within the synthesis area. With such a localization map, designers of loudspeaker setups for wave-field synthesis can estimate the localization and localization accuracy to be expected from a given multichannel setup. To enable perception of sound sources at arbitrary positions within the synthesis area of a given wave-field synthesis implementation, input signals to the two ears had to be generated. This was realized by means of dynamic binaural synthesis, a technique that allows for instantaneous switching between different listening scenarios. In a formal pre-test, it was verified that dynamic binaural simulation has no influence on the listeners’ localization performance as compared to natural hearing. Both the test procedure and the modeling results can be taken as a basis for further research regarding the evaluation of multichannel reproduction, an area that is still sparsely covered.
H. Wierstorf, A. Raake, S. Spors

Optimization of Binaural Algorithms for Maximum Predicted Speech Intelligibility

At the threshold of understanding speech in noisy environments, the neural exploitation of interaural differences constitutes a significant margin of intelligibility. Mimicking spatial hearing for improving speech intelligibility in hearing aids and robotics has been demonstrated by compelling results. This chapter reviews binaural speech processors and highlights approaches for their optimal application. Binaural algorithms of speech enhancement draw on the assumption that the target and the noise signal strike a head-mounted processor from different directions and cause distinctive interaural parameters. This assumption is, however, often violated in degraded acoustics. In order to study this degradation, the chapter starts with an examination of binaural statistics in different noise conditions. Subsequently, standard binaural speech processors are studied that use different waveform features, namely, the binaural coherence of the fine-structure, the binaural differences of the fine-structure and the binaural differences of the envelope. To ensure a fair comparison, each algorithm underwent a stochastic optimization of the algorithmic parameters in a set of prototypical speech-in-noise scenes, whereby an instrumental measure of speech intelligibility served as the objective function. Furthermore, the binaural speech processors are applied at the output of commercially-available hearing aids that feature superdirective beamformers with different directivity modes. In this way, the SNR-gain that adds to the pre-processing of the beamformer is assessed. For deriving filter gains from binaural statistics, part three of this chapter describes histogram-based methods and parametric approaches for the binaural fine-structure algorithm, and compares these in a realistic environment with reverberation and additive noise. Part three can also be read as a hands-on description, thereby addressing students and engineers who are striving for an ad-hoc implementation of a binaural system for speech enhancement.
A. Schlesinger, Chr. Luther

Modeling Sound Localization with Cochlear Implants

This chapter describes a model framework for evaluating the precision with which interaural time differences, ITDs, are represented in the left- and right-ear auditory-nerve responses. This approach is very versatile, as it allows not only for the evaluation of spiking neuronal responses from models of intact inner ears but also of responses of the deaf ears of cochlear implantees. The model framework delivers quantitative data and, therefore, enables comparisons between different cochlear-implant coding strategies. As the model of electric excitation of the auditory nerve also includes effects such as channel crosstalk, neuronal adaptation and mismatch of electrode positions between left and right ears, its predictive power is much higher than an analysis of the electrical impulses delivered to the electrodes. Evaluation of a novel fine-structure-coding strategy as used by a major implant manufacturer revealed that, in a best-case scenario, sophisticated strategies should be able to provide ITD cues with sufficient precision for sound localization. However, whether these cues can actually be exploited by cochlear implant users has yet to be determined by listening tests. Nevertheless, the model framework introduced here is a valuable tool for the development and pre-evaluation of bilateral cochlear implant coding strategies.
M. Nicoletti, Chr. Wirtz, W. Hemmert

Binaural Assessment of Parametrically Coded Spatial Audio Signals

In parametric time-frequency-domain spatial audio techniques, the sound field is encoded as a combination of a few audio channels with metadata. The metadata parametrizes the spatial properties of the sound field that are known to be perceivable to humans. The most well-known techniques are reviewed in this chapter. The spatial artifacts specific to such techniques are described, such as dynamically or statically biased directions, spatially too narrow auditory images, and effects of off-sweet-spot listening. Such cases are analyzed with a binaural auditory model, and the model output is shown to visualize these artifacts clearly.
M. Takanen, O. Santala, V. Pulkki

Binaural Dereverberation

Room reverberation degrades the quality and intelligibility of speech and also reduces the performance of automatic speech-recognition systems. Hence, blind or semi-blind dereverberation methods have been developed, utilizing single or multiple input channels. Dereverberation is also important for binaural applications in the context of digital hearing aids, binaural telephony and hands-free devices. However, the development of binaural dereverberation solutions is not trivial. Apart from the challenging task of reducing reverberation without introducing audible artifacts, binaural dereverberation should also at least preserve the interaural arrival-time and amplitude differences of the signals at the two ears, as these represent relevant cues for sound-source localization. In this chapter, an overview of auditory perception and physical features of reverberation is given. Further, a literature review of dereverberation methods is presented, leading to the more recent binaural techniques. Two specific binaural-dereverberation methods will be considered in more detail, one of them relying on binaural coherence and the other one on utilizing spectral subtraction for suppressing late-reverberation effects. The results of performance tests on these methods will be presented, along with a discussion of suitable objective and perceptual evaluation methods.
A. Tsilfidis, A. Westermann, J. M. Buchholz, E. Georganti, J. Mourjopoulos
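
The cue-preservation requirement stated in the abstract above can be illustrated with a minimal sketch of spectral subtraction for one short-time frame. It is a generic textbook-style formulation under stated assumptions, not the chapter's method: the late-reverberation power per frequency bin is assumed to be already estimated, and a single real-valued gain per bin is derived from the power averaged over both ears and applied identically to the left and right magnitudes. Function names and the gain floor are hypothetical.

```python
def spectral_subtraction_gain(power, reverb_power, floor=0.1):
    """Per-bin amplitude gain that subtracts the estimated reverberant power.

    power: observed power per frequency bin.
    reverb_power: estimated late-reverberation power per bin (assumed given).
    floor: lower gain limit that limits over-suppression artifacts.
    """
    gains = []
    for p, r in zip(power, reverb_power):
        gain = max(1.0 - r / p, 0.0) ** 0.5 if p > 0.0 else floor
        gains.append(max(gain, floor))
    return gains


def dereverberate_frame(left_mag, right_mag, reverb_power, floor=0.1):
    """Apply one common gain per bin to the left and right magnitudes."""
    avg_power = [(l * l + r * r) / 2.0 for l, r in zip(left_mag, right_mag)]
    gains = spectral_subtraction_gain(avg_power, reverb_power, floor)
    out_left = [g * l for g, l in zip(gains, left_mag)]
    out_right = [g * r for g, r in zip(gains, right_mag)]
    return out_left, out_right
```

Because the gain is real-valued and identical for both ears, the left/right magnitude ratio in each bin is unchanged, and the phases, which this sketch never modifies, keep the interaural time differences intact.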

Binaural Localization and Detection of Speakers in Complex Acoustic Scenes

The robust localization of speech sources is required for a wide range of applications, among them hearing aids and teleconferencing systems. This chapter focuses on binaural approaches to estimate the spatial position of multiple competing speakers in adverse acoustic scenarios by only exploiting the signals reaching both ears. A set of experiments is conducted to systematically evaluate the impact of reverberation and interfering noise on speaker-localization performance. In particular, the spatial distribution of the interfering noise has a considerable effect on speaker-localization performance, being most detrimental if the noise field contains strong directional components. In these conditions, interfering noise might be erroneously classified as a speaker position. This observation highlights the necessity to combine the localization stage with a decision about the underlying source type in order to enable a robust localization of speakers in noisy environments.
T. May, S. van de Par, A. Kohlrausch

Predicting Binaural Speech Intelligibility in Architectural Acoustics

A binaural model of speech understanding in background noise is presented and applied to the problem of predicting intelligibility in noisy rooms. It is shown that the model can make accurate predictions from binaural room impulse responses that are short compared to the reverberation time of the room. The model indicates (1) that there can be wide variations in intelligibility even within a fairly uniform listening space when multiple noise sources are present, (2) that reverberation time is a poor predictor of intelligibility, and (3) that intelligibility varies as a function of the listener’s head orientation. The effects of room occupancy, restaurant table orientation and hearing impairment are also discussed.
J. F. Culling, M. Lavandier, S. Jelfs

Assessment of Binaural–Proprioceptive Interaction in Human-Machine Interfaces

Binaural models help to predict human localization under the assumption that a corresponding localization process is based on acoustic signals, thus, on unimodal information. However, what happens if this localization process is realized in an environment with available bimodal or even multimodal sensory input? Do we still consider the auditory modality in the localization process? Can binaural models help to predict human localization in bimodal or multimodal scenes? The chapter first focuses on binaural-visual localization and demonstrates that binaural models are definitely required for modeling human localization even when visual information is available. The main part of the chapter is dedicated to binaural-proprioceptive localization. First, an experiment is described with which the proprioceptive localization performance was quantitatively measured. Second, the influence of binaural signals on proprioception was investigated to reveal whether synthetically generated spatial sound can improve human proprioceptive localization. The results demonstrate that it is indeed possible to auditorily guide proprioception. In conclusion, binaural models can be used for modeling not only human binaural-visual but also human binaural-proprioceptive localization. Binaural-modeling algorithms thus play an important role in further technical developments.
M. Stamm, M. E. Altinsoy

Further Challenges and the Road Ahead

Models of binaural hearing are well-established, versatile tools for many technological applications. Traditionally, most of these models are restricted to the processing of the acoustical input signals to the two ears. Yet, signal processing alone cannot model cognitive processes like the identification of salient perceptual cues, focused attention, the formation of aural objects, the composition of aural scenes and their interpretation, as well as the assignment of meaning to them and, eventually, the performance of quality judgements. Further, for many technological purposes, human listeners have to be conceived as active agents that explore their environment in a multi-modal fashion, thereby also considering information from senses other than hearing. To include these functions, binaural models will have to become more intelligent and, consequently, contain increasing inherent knowledge, coupled with means to further develop this knowledge in situation- and task-specific ways. In this chapter, a general vision is presented of how such future systems may be constructed, and some tools are introduced that may be useful in this context.
J. Blauert, D. Kolossa, K. Obermayer, K. Adiloğlu

