
2020 | Book

The Technology of Binaural Understanding

About this Book

This book offers a computational framework for modeling active exploratory listening that assigns meaning to auditory scenes. Understanding auditory perception and the cognitive processes involved in our interaction with the world is of high relevance for a vast variety of ICT systems and applications. Human beings do not react according to what they perceive but, rather, on the grounds of what the percepts mean to them in their current action-specific, emotional, and cognitive situation. Thus, while many models that mimic the signal processing involved in human visual and auditory processing have been proposed, these models cannot predict the experience and reactions of human users. This book presents a model that incorporates both signal-driven (bottom-up) and hypothesis-driven (top-down) processing.

Table of Contents

Frontmatter

Forming and Interpreting Aural Objects: Effects and Models

Frontmatter
Reflexive and Reflective Auditory Feedback
Abstract
Current models of binaural hearing go beyond bottom-up-driven processing and, instead, use a hybrid approach by including top-down, hypothesis-driven algorithms. Such hybrid models first identify and characterize auditory objects. Out of these objects, the model infers an auditory scene, from which it can extrapolate understanding, form judgments, and initiate actions. For example, when embedded in a mobile robot, a binaural hearing system can provide the information needed to carry out search-and-rescue tasks. Further, such systems are able to make judgments, for instance, on the quality of experience in spaces for musical performances. As with humans, such actions and judgments are based on sets of references built from perceptual structures, inherent and acquired knowledge, and the intellectual capabilities of the systems—in other words, on the “brains” of the model systems and the knowledge contained in them. To achieve these goals, adequate feedback loops must be considered, evaluated, and implemented within technological models of auditory systems. In this chapter, a number of such feedback loops that have already been implemented and evaluated are described and discussed. A distinction is made between reflexive and reflective feedback mechanisms, the latter including cognitive activities.
Jens Blauert, Guy J. Brown
Auditory Gestalt Rules and Their Application
Abstract
The formation of auditory objects is of high interest both for the understanding of human hearing and for the computer-based analysis of sound signals. Breaking down an acoustic scene into meaningful units facilitates the segregation and recognition of sound sources in a mixture. These are abilities that are particularly challenging for machine listening as well as for hearing-impaired listeners. An early approach to explaining object perception in the visual domain was made by the Gestalt psychologists. They aimed at setting up specific rules according to which sensory input is grouped into either one coherent or multiple separate objects. Inspired by these Gestalt rules and by exploiting physical and perceptual properties of sounds, different algorithms have been designed to segregate sound mixtures into auditory objects. This chapter reviews the literature on such algorithms and the underlying principles of auditory object formation, with a special focus on the connection between perceptual findings and their technical implementation.
Sarinah Sutojo, Joachim Thiemann, Armin Kohlrausch, Steven van de Par
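
The common-onset principle is one concrete instance of the Gestalt-inspired grouping cues this abstract refers to. Below is a minimal, hypothetical Python sketch (function name, tolerance, and data all invented, not taken from the chapter) that groups frequency channels whose onset times nearly coincide:

```python
# Hypothetical illustration of grouping by common onset.
import numpy as np

def group_by_common_onset(onsets, tol=0.02):
    """Group channel indices whose onset times lie within tol seconds."""
    order = [int(i) for i in np.argsort(onsets)]
    groups, current = [], [order[0]]
    for i in order[1:]:
        if onsets[i] - onsets[current[-1]] < tol:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups

# Two sources: channels with onsets near 0.10 s and near 0.50 s
onsets = np.array([0.100, 0.110, 0.500, 0.105, 0.510])
print(group_by_common_onset(onsets))  # [[0, 3, 1], [2, 4]]
```

A real segregation system would combine several such cues (common onset, harmonicity, spatial proximity) rather than rely on a single one.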
Selective Binaural Attention and Attention Switching
Abstract
This chapter examines the cognitive control mechanisms underlying auditory selective attention by considering variables that increase the complexity of the auditory scene (technical aspects such as dynamic binaural hearing, room acoustics, head movements, and interfering noise sources) as well as variables that influence the efficiency of cognitive processing. Classical research in auditory selective attention does not represent realistic or close-to-real-life listening experiences, of which room acoustics, distracting sources, and the dynamic reproduction of an acoustic scene including head movements are essential parts. The chapter starts with an introduction to maintaining and switching attention from the standpoint of cognitive psychology. A paradigm suitable for studying the intentional switching of auditory selective attention is introduced: dichotic stimulus presentation with different single number words (1–9, excluding 5), uttered by speakers of different gender, presented simultaneously, one to the participant’s left ear and the other to the right ear. The listener is required to categorize, as quickly as possible, the target number as being either smaller or larger than five, with a visual cue indicating the listener’s task in each trial. This paradigm is gradually extended from dichotic reproduction to a complex dynamic acoustic scene in order to study binaural effects in selective attention and attention switching, including different room-acoustic conditions. Various technical possibilities are evaluated to validate the binaural reproduction of the spatial scene, minimizing errors introduced by the acoustic virtual reality. Additionally, the influence of different binaural-reproduction methods on selective attention and attention switching is carefully examined and compared to a natural listening condition using loudspeakers in an anechoic setting. The application of the binaural listening paradigm in anechoic conditions tests a listener’s ability to intentionally switch auditory attention in various setups. According to the results, intentionally switching the focus of attention is associated with higher reaction times than maintaining the focus of attention on a single source. Also, particularly concerning the error rates, there is an observable effect of the stimulus category, that is, stimuli spoken by target and distractor may evoke the same answer (congruent) or different answers (incongruent). The congruency effect may be construed as an implicit performance measure of the degree to which task-irrelevant information is filtered out. The binaural paradigm has also been applied to older (slightly hearing-impaired) participants, and the results have been compared to those of experiments with young, normal-hearing participants, showing higher error rates and reaction times for the older group. Scenarios involving even more complex environments, including room acoustics (i.e., reverberation), have shown that reaction times and error rates depend significantly on the reverberation time. Switch costs, in particular the reaction-time differences between switch trials and repetition trials, can depend strongly on the reverberation time.
Janina Fels, Josefa Oberem, Iring Koch
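
Because the abstract defines switch costs operationally, a tiny analysis sketch may help. Everything below (trial format and numbers) is invented for illustration; it simply computes the cost as the mean reaction-time difference between switch and repetition trials:

```python
# Invented trial data: (attended_ear, previously_attended_ear, RT_s)
trials = [
    ("left",  "left",  0.61),   # repetition trial
    ("left",  "right", 0.78),   # switch trial
    ("right", "right", 0.59),   # repetition trial
    ("right", "left",  0.81),   # switch trial
]

def mean(xs):
    return sum(xs) / len(xs)

rt_switch = mean([rt for ear, prev, rt in trials if ear != prev])
rt_repeat = mean([rt for ear, prev, rt in trials if ear == prev])
print(f"switch cost: {rt_switch - rt_repeat:.3f} s")  # 0.195 s here
```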
Blackboard Systems for Cognitive Audition
Abstract
An essential part of auditory scene understanding is building an internal model of the world surrounding the listener. This internal representation can be mimicked computationally via a blackboard-system-based software architecture. Blackboard systems allow the efficient integration of different perceptual modalities, algorithms, and data representations into a coherent and flexible computational framework. The term “blackboard” in this context stands for a flexible and compositional internal data representation that allows individual software modules to access and process the available information. This modular architecture also makes the system adaptable to different application scenarios and provides interfaces for incorporating feedback paths, which allow the system to derive task-optimal active behavior from the internal model. Extending conventional blackboard systems with modern machine-learning techniques, specifically probabilistic modeling and neural networks, enables the system to incorporate learning strategies into this computational framework. Additionally, online learning and adaptation strategies can be integrated into the data representation within the blackboard. This is particularly useful for developing feedback approaches. This chapter reviews existing blackboard systems for different applications and provides the necessary theoretical foundations. Subsequently, novel extensions that were recently introduced in the context of binaural scene analysis and understanding are presented and discussed. A special focus is set on possibilities for incorporating feedback and learning strategies into the framework.
Christopher Schymura, Dorothea Kolossa
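
As a rough, hypothetical illustration of the architecture described above (class names and the two toy knowledge sources are invented, not drawn from the chapter's software), a blackboard with opportunistic scheduling might be sketched in Python as follows:

```python
# Minimal blackboard sketch: modules share state, never call each other.
class Blackboard:
    def __init__(self):
        self.data = {}          # shared, compositional representation

class KnowledgeSource:
    def can_run(self, bb): ...
    def run(self, bb): ...

class LocalizerKS(KnowledgeSource):
    def can_run(self, bb):
        return "ear_signals" in bb.data and "azimuth" not in bb.data
    def run(self, bb):
        bb.data["azimuth"] = 42.0      # placeholder location estimate

class IdentifierKS(KnowledgeSource):
    def can_run(self, bb):
        return "azimuth" in bb.data and "label" not in bb.data
    def run(self, bb):
        bb.data["label"] = "speech"    # placeholder source hypothesis

def schedule(bb, sources):
    """Fire any source whose preconditions hold until none can run."""
    progress = True
    while progress:
        progress = False
        for ks in sources:
            if ks.can_run(bb):
                ks.run(bb)
                progress = True

bb = Blackboard()
bb.data["ear_signals"] = object()      # stand-in for binaural input
schedule(bb, [IdentifierKS(), LocalizerKS()])
print(bb.data["azimuth"], bb.data["label"])  # 42.0 speech
```

Even at this scale the point of the pattern is visible: modules communicate only through the shared data representation, which is what makes the architecture compositional and open to feedback paths.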

Configuring and Understanding Aural-Space

Frontmatter
Formation of Three-Dimensional Auditory Space
Abstract
Human listeners must continuously interact with their three-dimensional (3D) environment. To this end, spatial hearing requires efficient perceptual mechanisms to form a sufficiently accurate 3D auditory space. This chapter discusses the formation of the 3D auditory space from various perspectives. The aim is to show the links between cognition, acoustics, neurophysiology, and psychophysics. The first part presents recent cognitive concepts for creating internal models of complex auditory environments. Second, the acoustic signals available at the ears are described, and the spatial information they convey is discussed. Third, the neural substrates forming the 3D auditory space in the brain are explored. Finally, the chapter elaborates on psychophysical spatial tasks and percepts that are only possible because of the formation of the auditory space.
Piotr Majdak, Robert Baumgartner, Claudia Jenny
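
In connection with the chapter's second part, on the acoustic signals available at the ears, one classical reference point is Woodworth's spherical-head approximation of the interaural time difference (ITD). This is standard textbook material rather than code from the chapter; the head radius below is a typical assumed value:

```python
# Woodworth's spherical-head ITD approximation (textbook formula).
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c_m_s=343.0):
    """ITD in seconds for a far-field source (0 deg = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c_m_s) * (theta + math.sin(theta))

print(f"{itd_woodworth(90.0) * 1e6:.0f} us")  # ~656 us for a lateral source
```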
Biological Aspects of Perceptual Space Formation
Abstract
Traditional ideas of how auditory space is formed and represented in the brain have been dominated by the concept of topographically arranged neuronal maps, similar to what is known from the visual system. Specifically, it had canonically been assumed that the brain’s representation of the location of sound sources is “hard-wired”, that is, that a specific location in space relative to the head is encoded by a particular subset of neurons tuned to that head angle. However, recent experimental findings strongly contradict this assumption for the computation of sound location in mammals (including humans). These data rather suggest a “relative” spatial code that favors the determination of changes in location over absolute position. Here we explain the mechanisms underlying neuronal spatial sensitivity in mammals and summarize the data that led to this paradigm shift. We further explain that a consideration of the evolutionary constraints on spatial-cue use and the associated processing strategies is crucial for understanding the concepts underlying auditory spatial representation in mammals. Finally, we review recent neurophysiological and psychophysical findings demonstrating pronounced context-dependent plasticity in neuronal coding and perception. We conclude that mammalian spatial hearing is based on a relative representation of auditory space, which has significant implications for how we localize sound sources in complex environments.
Michael Pecka, Christian Leibold, Benedikt Grothe
Auditory Spatial Impression in Concert Halls
Abstract
This chapter discusses the acoustics of concert halls from the viewpoint of binaural perception. It explains how early reflections have a crucial role in the quality of sound, perceived dynamics, and timbre. In particular, the directions from which these reflections reach the listener are important for human spatial hearing. The chapter has strong links to psychoacoustical phenomena, such as the precedence effect, binaural loudness, and spaciousness. The chapter discusses which aspects of a concert hall give listeners the impression of intimacy and the perception of proximity to the sound. Moreover, it is explained how a concert hall can change the perceived dynamics of a music ensemble. Examples are presented using measured data from real concert halls.
Tapio Lokki, Jukka Pätynen
Auditory Room Learning and Adaptation to Sound Reflections
Abstract
Sound reflections are abundant in everyday listening spaces, but they are rarely bothersome, and people are often not even aware of their presence. As shown in several studies, this is partially due to the adaptation of the human auditory system to the spatiotemporal reflection pattern, namely, through an increase in the echo threshold that follows repeated exposure to the same reflection pattern. This raises the question of whether adaptation mechanisms to room reflections also lead to improved localization accuracy—a measure more tangible for everyday listening. Moreover, this benefit would only be useful if it could be maintained through changes in the reflection pattern, such as those produced by head turns or body movement within the room, or by sources at different locations. Therefore, a particular mechanism is hypothesized by the current authors, based on learning a representation of the room geometry rather than learning of, or adapting to, a specific reflection pattern. This chapter reviews and discusses the available literature on the build-up of the precedence effect and related effects in speech understanding in reverberation. In light of the hypothesis of room learning, it aims to trigger a discussion about the underlying mechanisms.
Bernhard U. Seeber, Samuel Clapp
Room Effect on Musicians’ Performance
Abstract
This chapter reviews the basics of music and room-acoustics perception, provides an overview of auralization methods for the investigation of music performance, and presents a series of studies on the impact of room acoustics on listeners and musicians. The acoustics of the performance environment play a major role for musicians, both during rehearsals and in concerts. However, systematic investigations of music performance are challenging due to the variety of conditions that determine the artists’ performance. Set-ups that allow controlled studies with variable but well-defined acoustic conditions have been developed over the last decades with increasing naturalness and applicability. Current auralization methods allow the reproduction of measured or synthesized room acoustics in real time, thus enabling the perceptual assessment of room acoustics under laboratory conditions and isolating acoustics from other potentially impacting factors. Common methodologies, as well as the advantages and limitations of such virtual environments for the study of music and room-acoustics perception, are discussed in the first section. These virtual environments enable studies that help to explain why and how room acoustics can affect a listener’s subjective impression of a musical performance, and to what extent listeners can be classified according to their individual taste. Recent studies have shown that musicians systematically adjust their musical performance to adapt to the room-acoustic conditions. The most important findings from these studies are presented in the second section. Methods and results from recent investigations of the impact of room acoustics on music performance are discussed in the third section of this chapter.
Malte Kob, Sebastià V. Amengual Garí, Zora Schärer Kalkandjiev
Binaural Modeling from an Evolving-Habitat Perspective
Abstract
Functional binaural models have been used since the mid-20th century to simulate laboratory experiments. The goal of this chapter is to extend the capabilities of a cross-correlation model so that it can demonstrate human listening in complex scenarios found in nature and in human-built environments. A ray-tracing model is introduced that simulates a number of environments for this study. The chapter discusses how the auditory system is used to read and understand the environment and how tasks that require binaural hearing may have evolved throughout human history. As use cases, sound localization in a forest is examined, as well as the binaural analysis of spatially diffuse and rectangular rooms. The model is also used to simulate binaural hearing during a walk through a simulated office-suite environment.
Jonas Braasch
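
The cross-correlation stage that such models build on can be sketched compactly: the ITD estimate is the interaural lag that maximizes the cross-correlation of the two ear signals. The following minimal Python illustration uses invented signals and is not the chapter's model:

```python
# Minimal cross-correlation ITD estimation (signals invented).
import numpy as np

fs = 44100
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = np.roll(left, 20)              # right ear delayed by 20 samples

max_lag = int(0.001 * fs)              # ~1 ms, the physiological range
lags = np.arange(-max_lag, max_lag + 1)
xcorr = [np.dot(left, np.roll(right, -l)) for l in lags]
itd_s = lags[int(np.argmax(xcorr))] / fs
print(f"estimated ITD: {itd_s * 1e6:.0f} us")  # ~454 us for 20 samples
```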

Processing Cross-Modal Inference

Frontmatter
Psychophysical Models of Sound Localisation with Audiovisual Interactions
Abstract
Visual signals can have an important impact on the perceived location of sound sources. Neurological mechanisms enable interactions between seeing and hearing to form a sense of space. The effect of vision on auditory localisation percepts is of fundamental importance: a sound source is either perceived at the location of the visual source or it is perceptually shifted toward its direction. This bias is one form of visual capture. The extent of these interactions depends on time and space constraints beyond which visual and auditory cues do not necessarily interact. These constraints and interactions vary for the localisation of sources along the horizontal and vertical planes, as well as with distance. While the traditional models of audiovisual interaction in space perception assume sensory integration, recent models allow for sensory cues to either interact or not. Models of visual dominance, modality appropriateness, and maximum-likelihood estimation predict one combined percept. The newer models of causal inference allow for varied perceptual outcomes depending on the relationship between the different sensory cues. Finally, visual spatial cues can induce changes to how sounds are localised after the audiovisual experience. This well-known effect, the ventriloquism aftereffect, is possibly the main mechanism of auditory space learning and calibration. The ventriloquism aftereffect has been described with a causal-inference model and with an inverse model. The current chapter discusses all of the above concepts, establishing a connection between psychophysical data and available models.
Catarina Mendonça
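
The maximum-likelihood-estimation model mentioned in the abstract has a compact closed form for two Gaussian cues: the fused location is the reliability-weighted average of the unimodal estimates, and the fused variance is smaller than either input variance. A sketch with invented numbers:

```python
# MLE fusion of two Gaussian location cues (numbers invented).
def mle_fusion(x_a, var_a, x_v, var_v):
    """Reliability-weighted average of auditory and visual estimates."""
    w_v = var_a / (var_a + var_v)          # weight of the visual cue
    x = w_v * x_v + (1.0 - w_v) * x_a
    var = (var_a * var_v) / (var_a + var_v)
    return x, var

# Vision is far more precise in azimuth, so it "captures" audition:
x, var = mle_fusion(x_a=10.0, var_a=16.0, x_v=0.0, var_v=1.0)
print(f"fused azimuth: {x:.2f} deg, variance: {var:.2f}")  # 0.59 deg, 0.94
```

Causal-inference models generalize this scheme by also weighing the probability that the two cues share a common cause, which is what lets them predict the breakdown of integration at large audiovisual discrepancies.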
Cross-Modal and Cognitive Processes in Sound Localization
Abstract
To perceptually situate a sound source in the context of its surrounding environment, a listener must integrate two spatial estimates: (1) the location, relative to the listener’s head, of the auditory event associated with the sound source, and (2) the location of the listener’s head relative to the environment. This chapter introduces the general background of auditory localization as a multisensory process and reviews studies of cross-modal interactions with auditory localization for stationary and moving sound sources and listeners. Included are relevant results from recent experiments at Arizona State University’s Spatial-Hearing and Auditory Computation and Neurophysiology Laboratories. Finally, a conceptual model of the integrated multisensory/multi-system processes is described.
M. Torben Pastore, Yi Zhou, William A. Yost
Spatial Soundscape Superposition and Multimodal Interaction
Abstract
Contemporary listeners are exposed to overlaid cacophonies of sonic sources, both intentional and incidental. Such soundscape superposition can be usefully characterized by where the combination actually occurs: in the air, at the ears of listeners, in the auditory imagery subjectively evoked by such events, or in whatever audio equipment is used to mix, transmit, and display such signals. This chapter regards the superposition of spatial soundscapes physically, perceptually, and procedurally. Physical (acoustic) superposition covers such aspects as the configuration of personal sound transducers, panning among multiple sources, speaker arrays, and the modern challenge of how to integrate and exploit mobile devices and “smart speakers.” Perceptual (subjective and psychological) superposition covers such aspects as binaural image formation, auditory objects and spaces, and multimodal sensory interpretation. Procedural (logical and cognitive) superposition covers higher-level relaxations of the insistence on literal auralization: leveraging idiom and convention to enhance practical expressiveness; metaphorical mappings between real objects and virtual positions, such as the separation of direction and distance; range compression and range indifference; layering of soundscapes; audio windowing, narrowcasting, and multipresence as strategies for managing privacy; and mixed-reality deployments.
Michael Cohen, William L. Martens

Evaluating Aural-Scene Quality and Speech Understanding

Frontmatter
Binaural Evaluation of Sound Quality and Quality of Experience
Abstract
The chapter outlines the concepts of Sound Quality and Quality of Experience (QoE). Building on these, it describes a conceptual model of sound-quality perception and experience during active listening in a spatial-audio context. The presented model of sound-quality perception considers both bottom-up (signal-driven) and top-down (hypothesis-driven) perceptual functional processes. Different studies by the authors and from the literature are discussed in light of their suitability to help develop implementations of the conceptual model. As a key prerequisite, the underlying perceptual ground-truth data required for model training and validation are discussed, as well as means for deriving them from respective listening tests. Both feature-based and more holistic modeling approaches are analyzed. Overall, open research questions are summarized, deriving trajectories for future work on spatial-audio Sound Quality and Quality of Experience modeling.
Alexander Raake, Hagen Wierstorf
The Language of Rooms: From Perception to Cognition to Aesthetic Judgment
Abstract
Rooms are not perceptual objects in themselves; they can only be perceived through their effect on the presented signal, the sound source, and the human receiver. An overview is provided of different approaches to identifying the qualities and dimensions of the “room-acoustical impression”, approaches that have resulted in psychological measuring instruments for room-acoustical evaluation from the audience perspective. It is then outlined how the psychoacoustic aspects of room-acoustical perception are embedded in a socio-cultural practice that leads to an aesthetic judgment on the quality of performance venues for music and speech.
Stefan Weinzierl, Steffen Lepa, Martin Thiering
Modeling the Aesthetics of Audio-Scene Reproduction
Abstract
Reviewing work from diverse scientific fields, this chapter approaches the human aesthetic response to reproduced audio as a process of attraction and efficient (“fluent”) processing for certain auditory stimuli that can be associated with listener pleasure (valence) and attention (arousal), provided that they conform to specific semantic and contextual principles, either derived from perceived signal features or from top-down cognitive processes. Recent techniques for room-related loudspeaker-based presentation of auditory scenes, especially via multichannel reproduction, further extend the options for manipulating the source signals to allow the rendering of virtual sources beyond the frontal azimuth angles and to enhance the listener envelopment. Hence, such methods increase arousal and valence and contribute additional factors to the listeners’ aesthetic experience for reproduced natural or virtual scenes. This chapter also examines the adaptation of existing models of aesthetic response to include listeners’ aesthetic assessments of spatial-audio reproduction in conjunction with present and evolving methods for evaluating the quality of such audio presentations. Given that current sound-quality assessment methods are usually strongly rooted in objective, instrumental measures and models, which intentionally exclude the observers’ emotions, preferences and liking (hedonic response), the chapter also proposes a computational model structure that can incorporate aesthetic functionality beyond or in conjunction with quality assessment.
John Mourjopoulos
A Virtual Testbed for Binaural Agents
Abstract
Current developments in modeling the auditory system lead to the increasing inclusion of cognitive functions, such as dynamic auditory-scene analysis. This qualifies these systems as auditory front ends for autonomous agents. Such agents can, for example, be mobile robotic systems, that is, they can move around in their environments, explore them, and develop internal models of them. Thereby, they can monitor their environments and become active in cases where potentially hazardous things happen. For example, in a Search-&-Rescue (SAR) scenario, the agents could identify and save persons in dangerous situations. In this chapter, a virtual testbed for such systems is described that was developed in the EU project Two!Ears (www.twoears.eu). There, in simulated scenarios, the agents have to localize and identify potential victims and, consequently, rescue them according to dynamic SAR plans. The actions are predominantly based on binaural cues, derived from the two ear signals of head-and-torso simulators (dummy heads) on carriages that can actively move about in the scenes to be explored. Such a simulation system provides a tool to monitor and evaluate the cognitive processes of autonomous systems while they dynamically execute assigned tasks.
Jens Blauert
Binaural Technology for Machine Speech Recognition and Understanding
Abstract
It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving speech intelligibility in reverberant environments. This chapter describes and compares a number of ways in which automatic-speech-recognition accuracy in difficult acoustical environments can be improved through the use of signal processing techniques that are motivated by our understanding of binaural perception and binaural technology. These approaches are all based on the exploitation of interaural differences in arrival time and intensity of the signals arriving at the two ears to separate signals according to direction of arrival and to enhance the desired target signal. Their structure is motivated by classic models of binaural hearing as well as the precedence effect. We describe the structure and operation of a number of methods that use two or more microphones to improve the accuracy of automatic-speech-recognition systems operating in cluttered, noisy, and reverberant environments. The individual implementations differ in the methods by which binaural principles are imposed on speech processing, and in the precise mechanism used to extract interaural time and intensity differences. Algorithms that exploit binaural information can provide substantially improved speech-recognition accuracy in noisy, cluttered, and reverberant environments compared to baseline delay-and-sum beamforming. The type of signal manipulation that is most effective for improving performance in reverberation is different from what is most effective for ameliorating the effects of degradation caused by spatially-separated interfering sound sources.
Richard M. Stern, Anjali Menon
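
As a point of reference for the baseline mentioned above, a two-microphone delay-and-sum beamformer can be sketched in a few lines (the steering delay is assumed known; all signals and values are invented):

```python
# Two-microphone delay-and-sum sketch (all values invented).
import numpy as np

def delay_and_sum(left, right, delay_samples):
    """Advance the right channel by the steering delay, then average."""
    return 0.5 * (left + np.roll(right, -delay_samples))

fs = 16000
rng = np.random.default_rng(1)
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)          # source in the steered direction
left = target + 0.1 * rng.standard_normal(fs)
right = np.roll(target, 8) + 0.1 * rng.standard_normal(fs)

enhanced = delay_and_sum(left, right, delay_samples=8)
# The aligned target adds coherently; uncorrelated noise averages down.
```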
Modeling Binaural Speech Understanding in Complex Situations
Abstract
This chapter reviews binaural models available to predict speech intelligibility for different kinds of interference and in the presence of reverberation. A particular effort is made to quantify their performance and to highlight the a priori knowledge they require in order to make a prediction. In addition, cognitive factors that are not included in current models are considered. The lack of these factors may limit the ability of current models to fully predict speech understanding in real-world listening situations.
Mathieu Lavandier, Virginia Best

Applying Cognitive Mechanisms to Audio Technology

Frontmatter
Creating Auditory Illusions with Spatial-Audio Technologies
Abstract
The perception of sound fields reproduced by loudspeaker arrays driven by spatial-audio technologies, such as Wave-Field Synthesis or Higher-Order Ambisonics, is examined in the light of “spatial auditory illusions”. These spatial-audio technologies are based on illusions in which the real sound sources vanish perceptually in favor of virtual sources. Furthermore, spatial-audio technologies are able to synthesize sound fields that are very similar to the sound fields created by real sources. In this chapter, these illusions are first explored as a function of the acoustic (physical) properties of the synthesized sound fields. Then, the perceptual dimensions of what is actually heard when listeners are exposed to these synthesized sound fields are reviewed.
Rozenn Nicol
Creating Auditory Illusions with Binaural Technology
Abstract
It is pointed out that, beyond reproducing the physically correct sound pressure at the eardrums, further effects play a significant role in the quality of the auditory illusion. In some cases, these can dominate perception and even override physical deviations. Perceptual effects such as the room-divergence effect, additional visual influences, personalization, pose and position tracking, as well as adaptation processes are discussed. These effects are described individually, and the interconnections between them are highlighted. With the results from experiments performed by the authors, the perceptual effects can be quantified. Furthermore, concepts are proposed for optimizing reproduction systems with regard to those effects. One example is a system that adapts to varying listening situations as well as to individual listening habits, experience, and preference.
Karlheinz Brandenburg, Florian Klein, Annika Neidhardt, Ulrike Sloma, Stephan Werner
Toward Cognitive Usage of Binaural Displays
Abstract
Based on the acoustic input to their two ears, humans are able to collect rich spatial information. To explore their acoustic environment in more detail, they move their bodies and heads to resolve ambiguities that may appear in static spatial hearing. This process is termed “active listening.” This chapter introduces new research regarding two specific aspects of active listening, namely, (i) facilitation of sound localization in the median plane and (ii) augmentation of the discrimination angle for frontal auditory objects. As active listening affects spatial hearing significantly, the design of systems for spatial-sound presentation requires substantial expertise in this field. In this context, a dynamic binaural display was developed that supports active listening. The display was applied to edutainment applications such as training the spatial-perception competence of visually impaired persons. Two examples were specifically investigated for this purpose, namely, a maze game and an action game. The former facilitates players’ ability to draw cognitive maps. The latter improves the sound-localization performance of players, their eye-contact frequency during conversation, and their ability to avoid approaching objects. The results suggest that binaural displays that support active listening are indeed capable of enhancing the listener experience in reproduced and virtual auditory scenes.
Yôiti Suzuki, Akio Honda, Yukio Iwaya, Makoto Ohuchi, Shuichi Sakamoto
Audition as a Trigger of Head Movements
Abstract
In multimodal realistic environments, audition and vision are the two prominent sensory modalities that work together to provide humans with the best possible perceptual understanding of the environment. Yet, when designing artificial binaural systems, this collaboration is often not honored. Instead, substantial effort is made to construct purely auditory scene-analysis systems that perform as well as possible, sometimes with goals and ambitions that reach beyond human capabilities. It is often not considered that what enables us to perform so well in complex environments is the ability of (i) using more than one source of information, for instance, visual in addition to auditory information, and (ii) making assumptions about the objects to be perceived on the basis of a priori knowledge. In fact, the human capability of inferring information from one modality to another helps substantially to efficiently analyze the complex environments that humans face every day. Along this line of thinking, this chapter addresses the effects of attention reorientation triggered by audition. Accordingly, it discusses mechanisms that lead to appropriate motor reactions, such as head movements that orient our visual sensors toward an audiovisual object of interest. After presenting some of the neuronal foundations of multimodal integration and of the motor reactions linked to auditory-visual perception, some ideas and issues from the field of robotics are tackled. This is accomplished by referring to computational modeling. Thereby, some of the biological bases that underlie active multimodal perception are discussed, and it is demonstrated how these can be taken into account when designing artificial agents endowed with human-like perception.
Benjamin Cohen-Lhyver, Sylvain Argentieri, Bruno Gas
Intelligent Hearing Instruments—Trends and Challenges
Abstract
Hearing instruments (HIs) aim at helping people with hearing impairment, who often have difficulty understanding speech in noisy environments. This chapter provides an overview of current technological trends and challenges in the field of HI applications. It covers the state of the art of signal-processing algorithms used in modern digital HIs. Focus is placed on the extension of such algorithms to applications where microphone signals from both the left and the right HI are employed (the binaural case). Furthermore, the chapter addresses the challenges of optimally parametrizing and steering the HI algorithms. The concepts of environment classification for automatically controlling the settings of an HI in different listening situations are discussed, and a brief summary of sound-source-localization methods is given. Finally, the chapter discusses the current trend of adding sensors to HIs, which can potentially further enhance the hearing performance of the devices and improve the lives of hearing-impaired people.
Eleftheria Georganti, Gilles Courtois, Peter Derleth, Stefan Launer
Scene-Aware Dynamic-Range Compression in Hearing Aids
Abstract
Wide dynamic-range compression (WDRC) is one of the essential building blocks in hearing aids; it aims at improving audibility while maintaining acceptable loudness at high sound-pressure levels for hearing-impaired (HI) listeners. While fast-acting compression with a short release time allows amplifying low-intensity speech sounds on short time scales corresponding to syllables or phonemes, such processing also typically amplifies noise components in speech gaps. The latter reduces the output signal-to-noise ratio (SNR) and disrupts the acoustic properties of the background noise. Moreover, the use of fast-acting compression distorts auditory cues involved in the spatial perception of sounds in rooms by amplifying low-level reverberant portions of the sound relative to the direct sound. Some of these shortcomings can be avoided by choosing a longer release time, but such a slow-acting compression system fails to amplify soft speech components on short time scales and compromises the ability to restore loudness perception. This chapter investigates the benefit of a new scene-aware dynamic-range-compression strategy that attempts to combine the advantages of both fast- and slow-acting compression. Specifically, the release time of the compressor is adaptively changed to provide fast- or slow-acting compression depending on whether the target is present or absent. The benefit of this scene-aware compression strategy was evaluated instrumentally in acoustic scenarios where speech and noise were present simultaneously. Moreover, a subjective listening test was conducted to assess the impact of scene-aware compression on reverberant speech signals by measuring the perceived location and spatial distribution of virtualized speech in normal-hearing (NH) listeners.
Tobias May, Borys Kowalewski, Torsten Dau
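
The adaptive-release idea can be illustrated schematically. In the hypothetical sketch below, the compression curve, time constants, and speech-presence flag are invented stand-ins rather than the chapter's actual system; the compressor releases quickly while the target is present and slowly in its absence, so that noise in speech gaps is not pumped up:

```python
# Hypothetical sketch of scene-aware WDRC: fast release during target
# presence, slow release during target absence. All values invented.
import numpy as np

def smoothing_coeff(time_const_s, fs):
    """One-pole smoothing coefficient for a given time constant."""
    return np.exp(-1.0 / (time_const_s * fs))

def wdrc_gain_db(level_db, threshold_db=45.0, ratio=3.0):
    """Static compression curve: attenuation above the threshold."""
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def scene_aware_gains(level_db, speech_present, fs,
                      attack_s=0.005, release_fast_s=0.06, release_slow_s=2.0):
    target = wdrc_gain_db(level_db)
    out = np.empty_like(target)
    state = target[0]
    for n, g in enumerate(target):
        if g < state:                      # gain must drop: attack phase
            tau = attack_s
        else:                              # gain recovers: release phase
            tau = release_fast_s if speech_present[n] else release_slow_s
        a = smoothing_coeff(tau, fs)
        state = a * state + (1.0 - a) * g
        out[n] = state
    return out                             # one dB gain per frame

# 100 level estimates per second: loud speech, then a speech gap
fs_frames = 100
levels = np.concatenate([np.full(200, 65.0), np.full(200, 40.0)])
speech = np.concatenate([np.ones(200, bool), np.zeros(200, bool)])
gains = scene_aware_gains(levels, speech, fs_frames)
# With speech absent, the slow release keeps the gain low in the gap,
# avoiding the noise pump-up described in the abstract.
```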
Backmatter
Metadata
Title
The Technology of Binaural Understanding
Edited by
Prof. Dr. Jens Blauert
Prof. Jonas Braasch
Copyright Year
2020
Electronic ISBN
978-3-030-00386-9
Print ISBN
978-3-030-00385-2
DOI
https://doi.org/10.1007/978-3-030-00386-9
