About this book

In this brief, the authors discuss recently explored spectral features (sub-segmental and pitch-synchronous) and prosodic features (global and local features at the word and syllable levels in different parts of the utterance) for discerning emotions in a robust manner. The authors also examine the complementary evidence obtained from excitation source, vocal tract system and prosodic features for enhancing emotion recognition performance. Features based on speaking rate characteristics are explored with the help of multi-stage and hybrid models to further improve emotion recognition performance. The proposed spectral and prosodic features are evaluated on a real-life emotional speech corpus.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
This chapter briefly discusses the importance of analyzing emotions from the speech signal. The significance of emotions from psychological and signal processing perspectives is discussed. The influence of emotions on the characteristics of the speech production system is briefly mentioned. The role of excitation source, vocal tract system and prosodic features is discussed in the context of various speech tasks, with emphasis on the task of recognizing emotions. Different types of emotional speech databases used for carrying out various emotion-specific tasks are briefly discussed. Various applications of speech emotion recognition are mentioned. Important open issues in the area of emotional speech processing are discussed at the end of the chapter, along with a note on the organization of the book.
K. Sreenivasa Rao, Shashidhar G. Koolagudi

Chapter 2. Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features

Abstract
This chapter discusses the use of vocal tract information for recognizing emotions. Linear prediction cepstral coefficients (LPCCs) and mel frequency cepstral coefficients (MFCCs) are used as the correlates of vocal tract information. In addition to LPCCs and MFCCs, formant-related features are also explored for recognizing emotions from speech. The extraction of these spectral features is discussed in brief, followed by their extraction from sub-syllabic regions such as consonants, vowels and consonant-vowel transition regions. The extraction of spectral features using pitch synchronous analysis is also discussed. The basic philosophy and use of Gaussian mixture models for classifying emotions is discussed. The emotion recognition performance obtained from different vocal tract features is compared. The proposed spectral features are evaluated on Indian and Berlin emotion databases. The performance of Gaussian mixture models in classifying emotional utterances using vocal tract features is compared with that of neural network models.
K. Sreenivasa Rao, Shashidhar G. Koolagudi
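
As a rough illustration of the Gaussian mixture model classification mentioned in the Chapter 2 abstract, the following minimal sketch scores a sequence of feature frames against one mixture per emotion and picks the emotion with the highest total log-likelihood. It assumes diagonal covariances and uses hand-picked two-dimensional toy parameters; the function names, the emotion labels and all numbers are hypothetical and are not taken from the book, where the models would be trained on LPCC, MFCC or formant features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log(sum_k w_k * N(x | mean_k, var_k)).

    `gmm` is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def classify(frames, models):
    """Return the label whose mixture gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Hypothetical, untrained toy models: two components per emotion.
TOY_MODELS = {
    "anger": [(0.5, [2.0, 2.0], [1.0, 1.0]), (0.5, [3.0, 1.0], [1.0, 1.0])],
    "sadness": [(0.5, [-2.0, -2.0], [1.0, 1.0]), (0.5, [-3.0, -1.0], [1.0, 1.0])],
}

if __name__ == "__main__":
    print(classify([[2.1, 1.8], [2.9, 1.2]], TOY_MODELS))
```

In practice the mixture parameters would be estimated from training data (for example with expectation-maximization) rather than written by hand.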

Chapter 3. Robust Emotion Recognition using Sentence, Word and Syllable Level Prosodic Features

Abstract
This chapter discusses the use of prosodic information in discriminating emotions. The motivation for exploring prosodic features to recognize emotions is illustrated using the gross statistics and time-varying patterns of prosodic parameters. Prosodic correlates of speech such as energy, duration and pitch parameters are computed from the emotional utterances. Global prosodic features, representing the gross statistics of prosody, and local prosodic features, representing the finer variations in prosody, are introduced in this chapter for discriminating emotions. These parameters are further extracted separately at different levels: entire utterances, words and syllables. The contribution of emotional information from the initial, middle and final portions of sentences, words and syllables is analyzed. The use of support vector machines for classifying emotional utterances based on prosodic features is demonstrated. The chapter ends with a discussion of the emotion recognition results and the important conclusions.
K. Sreenivasa Rao, Shashidhar G. Koolagudi
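
The distinction between global and local prosodic features in the Chapter 3 abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the contour is a plain list of per-frame pitch values, the global features are simple summary statistics, the local features are frame-to-frame changes, and the utterance is split into equal thirds. The function names and the toy contour are hypothetical; the book's actual feature definitions may differ.

```python
import statistics


def global_prosodic_features(contour):
    """Gross statistics of one prosodic contour (e.g. pitch in Hz per frame)."""
    return {
        "mean": statistics.fmean(contour),
        "min": min(contour),
        "max": max(contour),
        "std": statistics.pstdev(contour),
    }


def local_prosodic_features(contour):
    """Frame-to-frame changes, a simple stand-in for finer prosodic variations."""
    return [b - a for a, b in zip(contour, contour[1:])]


def split_into_regions(contour):
    """Split a contour into initial, middle and final thirds.

    Assumes at least three frames so that no region is empty.
    """
    n = len(contour)
    first, second = n // 3, (2 * n) // 3
    return {
        "initial": contour[:first],
        "middle": contour[first:second],
        "final": contour[second:],
    }


# Hypothetical toy pitch contour (Hz), rising steadily across the utterance.
TOY_PITCH = [100, 110, 120, 130, 140, 150]

if __name__ == "__main__":
    print(global_prosodic_features(TOY_PITCH))
    print(local_prosodic_features(TOY_PITCH))
    for name, region in split_into_regions(TOY_PITCH).items():
        print(name, statistics.fmean(region))
```

The same functions could be applied to the contour of a whole sentence, a word or a syllable, which mirrors the levels of analysis named in the abstract.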

Chapter 4. Robust Emotion Recognition using Combination of Excitation Source, Spectral and Prosodic Features

Abstract
Different speech features may offer emotion-specific information in different ways. This chapter explores the combination of evidence offered by various speech features. Excitation source, spectral and prosodic features are considered as the individual speech features for classifying emotions. Various combinations of these individual features are explored for improving emotion recognition performance. Since the features are derived from different levels, the emotion-specific characteristics they capture may be complementary or non-overlapping in nature. Properly exploiting this evidence can improve recognition performance. The results show that all the combinations explored in this chapter enhance recognition performance significantly.
K. Sreenivasa Rao, Shashidhar G. Koolagudi

Chapter 5. Robust Emotion Recognition using Speaking Rate Features

Abstract
In this chapter, speaking rate characteristics of speech are explored for discriminating emotions. In real life, certain emotions are very active, with a high speaking rate, and some are passive, with a low speaking rate. With this motivation, a two-stage emotion recognition system is proposed: in the first stage, emotions are classified into three broad groups (active, neutral and passive), and in the second stage, the emotions within each broad group are further classified. Spectral and prosodic features are explored in each stage for discriminating emotions. The combination of spectral and prosodic features is observed to perform better.
K. Sreenivasa Rao, Shashidhar G. Koolagudi
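
The two-stage scheme described in the Chapter 5 abstract can be sketched as a dispatch from a broad-group classifier to a group-specific emotion classifier. The sketch below is only illustrative: the threshold rules, the feature names (`syllables_per_second`, `mean_energy`) and the assignment of emotions to groups are all hypothetical placeholders, whereas the book's system uses classifiers built on spectral and prosodic features.

```python
def two_stage_classify(features, group_classifier, emotion_classifiers):
    """Stage 1 picks a broad group; stage 2 picks an emotion within that group."""
    group = group_classifier(features)
    emotion = emotion_classifiers[group](features)
    return group, emotion


def toy_group_classifier(features):
    """Hypothetical stage-1 rule based on speaking rate (syllables per second)."""
    rate = features["syllables_per_second"]
    if rate > 5.0:
        return "active"
    if rate < 3.5:
        return "passive"
    return "neutral"


# Hypothetical stage-2 rules; the grouping of emotions is an assumption.
TOY_EMOTION_CLASSIFIERS = {
    "active": lambda f: "anger" if f["mean_energy"] > 0.7 else "happiness",
    "neutral": lambda f: "neutral",
    "passive": lambda f: "sadness" if f["mean_energy"] < 0.3 else "boredom",
}

if __name__ == "__main__":
    sample = {"syllables_per_second": 6.0, "mean_energy": 0.9}
    print(two_stage_classify(sample, toy_group_classifier, TOY_EMOTION_CLASSIFIERS))
```

One consequence of this design is that a stage-1 error cannot be corrected in stage 2, so the broad-group classifier needs to be reliable.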

Chapter 6. Emotion Recognition on Real Life Emotions

Abstract
Collecting and modelling real-life emotions is a real challenge. However, the final aim of any emotion recognition system is to identify real-world emotions with reasonable accuracy. The literature shows that combining different features improves classification performance. In this chapter, score-level combination of different features is studied for recognizing real-life emotions. Modelling real-life emotions requires a good database containing a wide variety of such emotions. In this chapter, a Hindi movie database is used to represent real-world emotions. Single-speaker and multi-speaker data are collected to study the influence of the speaker on emotion recognition. Different features are explored for identifying the collected emotions. The results show that spectral features carry robust emotion-specific information.
K. Sreenivasa Rao, Shashidhar G. Koolagudi
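
Score-level combination, as named in the Chapter 6 abstract, is often implemented as a weighted sum of the per-emotion scores produced by separate classifiers. The sketch below shows that common form only; the abstract does not state the exact combination rule used in the book, and the classifier names, weights and scores here are hypothetical toy values.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}


def fuse_scores(score_sets, weights):
    """Weighted sum of normalized per-emotion scores from several classifiers."""
    weight_total = sum(weights[name] for name in score_sets)
    fused = {}
    for name, scores in score_sets.items():
        share = weights[name] / weight_total
        for label, value in normalize(scores).items():
            fused[label] = fused.get(label, 0.0) + share * value
    return fused


def decide(fused):
    """Return the emotion with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores from three feature-specific classifiers.
TOY_SCORES = {
    "excitation_source": {"anger": 0.6, "happiness": 0.4},
    "spectral": {"anger": 0.3, "happiness": 0.7},
    "prosodic": {"anger": 0.7, "happiness": 0.3},
}
TOY_WEIGHTS = {"excitation_source": 1.0, "spectral": 1.0, "prosodic": 1.0}

if __name__ == "__main__":
    fused = fuse_scores(TOY_SCORES, TOY_WEIGHTS)
    print(fused, decide(fused))
```

In this toy case the spectral classifier alone would choose "happiness", while the fused decision follows the agreement of the other two classifiers. The weights would normally be tuned on held-out data.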

Chapter 7. Summary and Conclusions

Abstract
This chapter summarizes the research work presented in this book, highlights its contributions and discusses the scope for future work. The book focuses on emotion-specific spectral and prosodic features for robust emotion recognition and is organized into seven chapters. The first chapter introduces speech emotion recognition as a contemporary research area. In Chap. 2, spectral features extracted from sub-syllabic regions such as vowels, consonants and CV transition regions are proposed for robust emotion recognition from speech. Pitch synchronously extracted spectral features are also used in Chap. 2 for recognizing emotions. Chapter 3 proposes the use of dynamic prosodic features for emotion recognition. These dynamic features, along with the static prosodic features derived at the sentence, word and syllable levels, are used for characterizing emotions. Emotion-specific information present in different positions (initial, middle and final) of the speech utterances is used for emotion classification. In Chap. 4, combinations of various emotion-specific speech features are explored for developing robust emotion recognition systems. Chapter 5 deals with multi-stage emotion classification using a combination of features; a two-stage emotion recognition system is developed using spectral and prosodic features. Chapter 6 introduces an approach to real-life emotion recognition using different features. Chapter 7 concludes the present work and sheds light on directions for further research.
K. Sreenivasa Rao, Shashidhar G. Koolagudi

Backmatter
