2013 | Book

Intelligent Audio Analysis

Author: Björn W. Schuller

Publisher: Springer Berlin Heidelberg

Book series: Signals and Communication Technology

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik"

About this book

This book provides the reader with the knowledge necessary to understand the field of Intelligent Audio Analysis. It first introduces standard methods and discusses the typical Intelligent Audio Analysis chain, going from audio data to audio features to audio recognition. Further, an introduction to audio source separation, enhancement, and robustness is given.

After the introductory parts, the book shows several applications for the three types of audio: speech, music, and general sound. Each task is briefly introduced, followed by a description of the specific data and methods applied, experiments and results, and a conclusion for this specific task.

The book provides benchmark results and standardized test-beds for a broader range of audio analysis tasks. The main focus lies on the parallel advancement of realism in audio analysis, as today’s results are too often overly optimistic owing to idealized testing conditions. The book also aims to stimulate synergies arising from the transfer of methods, leading to a holistic audio analysis.

Table of contents

Frontmatter

Introduction

Frontmatter

Chapter 1. Intelligent Audio Analysis: A Definition

Abstract

A definition of Intelligent Audio Analysis is given. Further, real-life conditions are defined comprising the aspects of non-prototypical and non-preselected independent test data that has never been used during optimisation, and fully automatic processing.

Björn Schuller

Chapter 2. Motivation, Aims, and Solutions

Abstract

Research in the field of Intelligent Audio Analysis possesses rich application potential of commercial and practical interest. Applications comprise audio alteration, retrieval, interaction, monitoring and surveillance, and entertainment. A detailed description is given alongside current aims to advance the field.

Björn Schuller

Chapter 3. Structure of the Book

Abstract

The book is divided into four main parts: an introduction that gives a short motivation, general aims, and solutions for reaching them, together with a definition of audio analysis per se; audio analysis methods; applications; and a conclusion.

Björn Schuller

Intelligent Audio Analysis Methods

Frontmatter

Chapter 4. Chain of Audio Processing

Abstract

The chain of processing in a typical Intelligent Audio Analysis system is outlined. It leads from preprocessing through Low Level Descriptor extraction, chunking, supra-segmental analysis and hierarchical functional extraction, feature reduction, feature selection and generation, parameter selection, and model learning to the actual classification or regression. This can be followed by a fusion with other information streams and by encoding for the application context. The individual steps are explained in more detail.

Björn Schuller

Chapter 5. Audio Data

Abstract

In order to train and test Intelligent Audio Analysis systems, audio data is needed. In fact, this is often considered one of the main bottlenecks, and the common opinion is that there is "no data like more data". In this light, the requirements for collecting and providing audio databases are outlined. This includes in particular the establishment of a reliable gold standard. Explanatory examples are given for the three types of audio (speech, music, and general sound) by the corpora TUM AVIC, NTWICM, and the FindSounds database.

Björn Schuller

Chapter 6. Audio Features

Abstract

To represent the information contained in an audio stream in a compact way that focuses on a task of interest, a parameterised form is usually chosen. These parameters describe properties of the audio, usually in a highly information-reduced form and typically at a considerably lower rate, such as the mean energy or pitch over a longer period of time. As different Intelligent Audio Analysis tasks are often best represented by different such ‘features’, a broad selection of the most typical ones is presented. This includes a description of the digitisation and segmentation of the audio as a first step. Features include intensity, zero-crossings, autocorrelation, spectrum and cepstrum, linear prediction, line spectral pairs, perceptual linear prediction, formants, fundamental frequency and voicing probability, and jitter and shimmer from the speech domain. Further, music, sound, and textual descriptors are included. Then, the principle of supra-segmental brute-forcing and the subsequent reduction and selection are explained. The widely used openSMILE feature extractor serves as an example.

Björn Schuller
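As an illustration of the feature types named in the Chapter 6 abstract, the following minimal sketch computes two frame-level descriptors (intensity as RMS energy, and zero-crossing rate) and then summarises them over time with simple functionals. It is not taken from the book or from openSMILE; the function names, the frame size, the hop size, and the synthetic test tone are all assumptions chosen for illustration.

```python
import math

def frames(signal, size, hop):
    """Split a list of samples into overlapping frames (segmentation step)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def rms_energy(frame):
    """Intensity of one frame as root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def functionals(values):
    """Supra-segmental summary: mean and standard deviation over all frames."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": math.sqrt(var)}

# Hypothetical input: one second of a 100 Hz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# Assumed framing: 50 ms frames (400 samples) with a 25 ms hop (200 samples).
lld_frames = frames(signal, size=400, hop=200)
energy = [rms_energy(f) for f in lld_frames]
zcr = [zero_crossing_rate(f) for f in lld_frames]

# One fixed-length feature vector per audio chunk, whatever its duration.
features = {"energy": functionals(energy), "zcr": functionals(zcr)}
```

Applying many functionals to many descriptors in this way is one reading of the 'brute-forcing' mentioned in the abstract: the resulting large feature set would then be reduced or selected from in a later step.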

Chapter 7. Audio Recognition

Abstract

A huge variety of learning algorithms is applied in the field of Intelligent Audio Analysis depending on the nature of the target task of interest such as being static or dynamic. In addition, manifold other factors are decisive when selecting the optimal algorithm incorporating aspects such as efficiency or reliability. The most frequently found representatives are explained in detail: Decision Trees, Support Vector-based approaches, Artificial Neural Networks including the Long Short-Term Memory paradigm, as well as dynamic modelling by hidden Markov models. In addition, bootstrapping, meta-learning, and tandem learning are described. This is followed by the optimal evaluation of such algorithms touching partitioning and balancing, and evaluation measures as are frequently used in the field.

Björn Schuller
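The Chapter 7 abstract mentions balancing and evaluation measures without naming them. As an assumed example, the sketch below contrasts plain accuracy with unweighted average recall (the mean of the per-class recalls), a measure often preferred when classes are imbalanced. The labels and predictions are made up for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of instances classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical imbalanced test set: 8 "neutral" and 2 "angry" instances.
y_true = ["neutral"] * 8 + ["angry"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutral"] * 10

acc = accuracy(y_true, y_pred)                    # 0.8
uar = unweighted_average_recall(y_true, y_pred)   # 0.5
```

The majority-class predictor looks acceptable under accuracy but scores at chance level under the class-balanced measure, which is why the choice of measure matters on skewed data.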

Chapter 8. Audio Source Separation

Abstract

In order to enhance the (audio) signal of interest in the presence of additional audio sources, one can aim at their separation. Although very demanding, Audio Source Separation has many interesting applications: for example, in Music Information Retrieval, it allows for polyphonic transcription or recognition of lyrics in singing after decomposing the original recording into voices and/or instruments such as drums, guitars, or vocals, e.g., for ‘query by humming’. Here, non-negative matrix factorisation-based (NMF) approaches are explained. Further, ‘NMF Activation Features’ are introduced and exemplified in the speech processing domain.

Björn Schuller
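To make the NMF idea from the Chapter 8 abstract concrete, here is a minimal sketch that factorises a small non-negative matrix V into a basis matrix W (spectral templates) and an activation matrix H, so that V is approximately W times H. It uses the standard multiplicative update rules for the squared Euclidean cost; whether the book uses this cost function or another is not stated in the abstract. The toy "spectrogram", the rank, and the iteration count are assumptions.

```python
import random

EPS = 1e-9  # guards against division by zero

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, rank, iterations=200, seed=0):
    """Factorise v (rows x cols) into w (rows x rank) and h (rank x cols)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(rank)]
             for i in range(rows)]
        errors.append(squared_error(v, w, h))
    return w, h, errors

# Hypothetical magnitude spectrogram (4 frequency bins x 6 frames) mixing two
# sources with spectra [1, 2, 0, 0] and [0, 0, 3, 1]; both sound in frame 4.
spectrogram = [
    [1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0, 3.0, 3.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
]
w, h, errors = nmf(spectrogram, rank=2)
```

In this reading, the columns of `w` play the role of per-source spectra and the rows of `h` indicate when each source is active; such activations are presumably the kind of quantity the abstract refers to as 'NMF Activation Features'.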

Chapter 9. Audio Enhancement and Robustness

Abstract

Once an audio recognition system that functions under idealised conditions is established, the primary concern shifts towards making it robust in the real world. Several options exist for system improvement along the chain of processing, and these have proved promising especially in the monaural case. Here, the most frequently used methods and some recent candidates are explained, first including advanced front-end feature extraction, unsupervised spectral subtraction, and feature enhancement and normalisation by Cepstral Mean Subtraction, Mean and Variance Normalisation, and Histogram Equalisation. Then, model-based feature enhancement based on (switching) linear dynamical modelling is followed by model architectures such as (hidden) conditional random fields and switching autoregressive approaches.

Björn Schuller
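Two of the normalisation techniques named in the Chapter 9 abstract can be sketched in a few lines. Cepstral Mean Subtraction removes the per-coefficient mean over an utterance, which cancels a constant offset such as the one a fixed channel adds in the cepstral domain. Mean and Variance Normalisation additionally scales each coefficient to unit variance. The feature vectors below are invented, and the per-utterance scope of the statistics is an assumption.

```python
import math

def column_means(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cepstral_mean_subtraction(vectors):
    """Subtract each coefficient's mean over the utterance."""
    means = column_means(vectors)
    return [[x - m for x, m in zip(vec, means)] for vec in vectors]

def mean_variance_normalisation(vectors):
    """Shift each coefficient to zero mean and scale it to unit variance."""
    n = len(vectors)
    means = column_means(vectors)
    stds = []
    for col, m in zip(zip(*vectors), means):
        std = math.sqrt(sum((x - m) ** 2 for x in col) / n)
        stds.append(std if std > 0 else 1.0)  # leave constant coefficients unscaled
    return [[(x - m) / s for x, m, s in zip(vec, means, stds)] for vec in vectors]

# Hypothetical cepstral vectors: 3 frames with 2 coefficients each.
clean = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
# The same utterance with a constant per-coefficient channel offset.
shifted = [[x + 5.0, y - 2.0] for x, y in clean]

cms_clean = cepstral_mean_subtraction(clean)
cms_shifted = cepstral_mean_subtraction(shifted)   # matches cms_clean
mvn_clean = mean_variance_normalisation(clean)
```

After subtraction, the clean and the shifted utterance yield the same features, which is the robustness effect the technique aims for.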

Intelligent Audio Analysis Applications

Frontmatter

Chapter 10. Applications in Intelligent Speech Analysis

Abstract

Speech is broadly considered the most natural form of communication for humans. Manifold applications open up for general technical and computer systems once they are able to recognise speech as well as humans do, be it for interaction with humans, mediation between humans, or speech retrieval. Here, state-of-the-art methodology is presented for highly robust speech recognition, nonlinguistic vocalisation recognition, and the recognition of paralinguistic speaker states and traits, as exemplified by sentiment, emotion, interest, age, gender, intoxication, and sleepiness. All examples stem from the author’s recent work. In particular, the latter are chosen from a series of Challenges co-organised by the author at Interspeech from 2009 onwards.

Björn Schuller

Chapter 11. Applications in Intelligent Music Analysis

Abstract

As digitised music has dominated the market for more than a decade, advanced techniques of Intelligent Music Analysis are gaining interest and importance. From this exciting field, recent application examples from the work of the author are presented in detail, including current performance benchmarks. These comprise drum-beat separation, onset detection, tempo, metre, ballroom dance style, and mood determination, key and chord recognition, and structure analysis alongside singer trait classification. The latter includes singer age, gender, race, and height recognition.

Björn Schuller

Chapter 12. Applications in Intelligent Sound Analysis

Abstract

Apart from speech and music, general sound can also carry relevant information. This is, however, a considerably less researched field to date. Most prominent in this area are the tasks of acoustic event detection and classification, which can be subsumed under the area of computational auditory scene analysis. Fields of application include media retrieval including affective content analysis, human-machine and human-robot interaction, animal vocalisation recognition, and monitoring of industrial processes. Here, three applications in real-life Intelligent Sound Analysis are given from the work of the author: audio-based animal recognition, acoustic event classification, and prediction of emotion as induced in sound listeners. In particular, weakly supervised learning techniques are presented to cope with the typical label-sparseness in this field.

Björn Schuller

Conclusion

Frontmatter

Chapter 13. Discussion

Abstract

First, a statement is provided on how the state of the art in the field of Intelligent Audio Analysis has recently been advanced. Based upon this, a distilled ‘best practice’ recommendation is given to the reader. This includes aspects of high realism, standardised, multi-faceted and machine-aided data collection, source separation, feature brute-forcing, temporal evolution modelling, coupling of tasks, and standardisation. Then, missing aspects and remaining research steps are critically discussed. Considerations in this direction comprise the request for more robustness, blind separation and multi-task processing of real-life streams, massive weakly supervised and evolutionary learning, closure of the gap between analysis and synthesis, cross-cultural and cross-lingual widening, novel tasks, further unification and transfer of methods, confidence measures, distributed processing, and new competitive research challenges.

Björn Schuller

Chapter 14. Vision

Abstract

Based on recent examples of the author’s work from the domains of Intelligent Speech, Music, and Sound Analysis, a comprehensive overview is given on currently obtainable performances in the field of Intelligent Audio Analysis. This comprises discrete classification tasks, namely, digit, spelling, phoneme, word, and non-linguistic vocalisation recognition alongside writer sentiment and speaker emotion, age, gender, intoxication, and sleepiness recognition. Further, continuous writer sentiment, and speaker interest and height determination as well as sound listener induced arousal and valence prediction are contained. Based on these performances, future perspectives are given.

Björn Schuller

Backmatter

Title: Intelligent Audio Analysis
Author: Björn W. Schuller
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-36806-6
Print ISBN: 978-3-642-36805-9
DOI: https://doi.org/10.1007/978-3-642-36806-6
