2014 | Book

Speech Processing in Mobile Environments

About this book

This book focuses on speech processing in the presence of low-bit-rate coding and varying background environments. The methods presented in the book exploit speech events that are robust in noisy environments. Accurate estimation of these crucial events is useful for carrying out various speech tasks such as speech recognition, speaker recognition, and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process speech in mobile environments. The book covers temporal and spectral enhancement methods that minimize the effect of noise, and examines methods and models for speech and speaker recognition applications in mobile environments.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
The rapid growth in the number of mobile users is creating a great deal of interest in the development of robust speech systems for mobile environments. Some of the new and exciting services enabled by speech systems in mobile environments are: speech interfaces to mobile devices, information retrieval through mobile devices, voice-based person authentication, and forensic investigation. The issues involved in adapting present speech processing technology to mobile systems are: the effect of varying background noise, degradations introduced by speech coders, and errors introduced by transmission impairments. In this work, the major focus is on improving the recognition performance of speech systems in the presence of speech coding and background noise by using vowel onset points (VOPs). This chapter presents the overall objective of the present work and the scope of the book. The chapter-wise organization and the evolution of ideas related to the present work are given at the end of this chapter.
K. Sreenivasa Rao, Anil Kumar Vuppala
Chapter 2. Background and Literature Review
Abstract
This chapter provides a systematic review of the existing approaches for vowel onset point detection, speech systems in mobile environments, Consonant-Vowel (CV) recognition in Indian languages, and time scale modification (TSM). In addition to reviewing these topics, the authors discuss the shortcomings of the existing approaches and derive from them the motivation and scope of the present work.
K. Sreenivasa Rao, Anil Kumar Vuppala
Chapter 3. Vowel Onset Point Detection from Coded and Noisy Speech
Abstract
Most of the existing vowel onset point (VOP) detection methods are developed for clean speech. In this chapter, we propose methods for detecting VOPs in the presence of speech coding and background noise. The VOP detection method for coded speech is based on the spectral energy in the 500 to 2,500 Hz frequency band of the speech segments present in the glottal closure region. For noisy speech, the proposed VOP detection method exploits the spectral energy at the formant locations of the speech segments present in the glottal closure region. The proposed VOP detection methods are evaluated using objective measures and consonant-vowel (CV) unit recognition experiments.
K. Sreenivasa Rao, Anil Kumar Vuppala
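The band-energy evidence mentioned in the Chapter 3 abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only computes the spectral energy of one frame in the 500 to 2,500 Hz band with a direct DFT, and it does not locate glottal closure regions. The function names, the 8 kHz sampling rate, and the 20 ms frame length are assumptions chosen for illustration.

```python
import cmath
import math


def band_energy(frame, sample_rate, low_hz=500.0, high_hz=2500.0):
    """Sum of squared DFT magnitudes over bins whose frequency is in [low_hz, high_hz]."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            # Direct DFT of bin k; adequate for short frames (an FFT is the usual choice).
            x_k = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
            )
            energy += abs(x_k) ** 2
    return energy


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave, used here only as synthetic test input."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
FRAME_LEN = 160     # assumed 20 ms frame

# A 1 kHz tone falls inside the band; a 3 kHz tone falls outside it.
in_band = band_energy(tone(1000, SAMPLE_RATE, FRAME_LEN), SAMPLE_RATE)
out_band = band_energy(tone(3000, SAMPLE_RATE, FRAME_LEN), SAMPLE_RATE)
```

With these synthetic inputs, the in-band tone contributes essentially all of its energy to the measure while the out-of-band tone contributes almost none, which is the property a band-limited energy contour relies on.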
Chapter 4. Consonant–Vowel Recognition in the Presence of Coding and Background Noise
Abstract
In this chapter, an approach for improving the recognition performance of CV units under clean, coded, and noisy conditions is presented. The proposed CV recognition method is carried out in two stages. In the first stage, the vowel category of the CV unit is recognized, and in the second stage, the consonant category is recognized. At each stage, complementary evidence from support vector machines (SVMs) and hidden Markov models (HMMs) is combined to enhance the recognition performance of CV units. In the proposed CV recognition approach, the VOP is used as an anchor point for extracting features from the CV unit. Therefore, the VOP detection methods presented in the previous chapter are used for this work. The performance of the proposed CV recognition method is demonstrated under coded and noisy conditions. Recognition studies are carried out using isolated CV units and CV units from Telugu broadcast news databases. Further, the performance of the CV recognition system under background noise is improved by using preprocessing methods based on combined temporal and spectral processing.
K. Sreenivasa Rao, Anil Kumar Vuppala
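The Chapter 4 abstract states that SVM and HMM evidence is combined at each stage but does not say how. One common way to combine two classifiers is a weighted sum of normalized scores, sketched below under that assumption. The softmax normalization, the `svm_weight` parameter, and the example scores are all hypothetical and are not taken from the book.

```python
import math


def softmax(scores):
    """Map raw classifier scores (label -> score) to values that sum to 1."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def fuse(svm_scores, hmm_scores, svm_weight=0.5):
    """Weighted sum of normalized SVM and HMM scores; returns (best_label, fused).

    Both dictionaries are assumed to contain the same labels.
    """
    svm_p = softmax(svm_scores)
    hmm_p = softmax(hmm_scores)
    fused = {
        label: svm_weight * svm_p[label] + (1.0 - svm_weight) * hmm_p[label]
        for label in svm_p
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical first-stage scores for three vowel categories:
# SVM decision values and HMM log-likelihoods are on different scales.
svm_scores = {"a": 1.2, "i": 1.0, "u": -0.5}
hmm_scores = {"a": -42.0, "i": -40.0, "u": -47.0}
label, fused = fuse(svm_scores, hmm_scores)
```

In this made-up example the SVM alone slightly prefers "a", while the HMM strongly prefers "i"; the fused decision follows the more confident model, which is the sense in which the two sources of evidence can be complementary.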
Chapter 5. Spotting and Recognition of Consonant–Vowel Units from Continuous Speech
Abstract
Automatic speech recognition is the process of converting speech into text. It is carried out by transforming the speech signal into a sequence of symbols using acoustic models, and converting this sequence of symbols into text using a language model. Two approaches are commonly used for speech recognition. The first approach is based on word-level matching using word models, followed by a language model. The major drawback of this approach is the need to develop word models for all the words of a language, where the number of words is generally of the order of 10^5 to 10^6. The second approach is based on segmenting speech into subword units and labeling them using a subword unit recognizer. The limitation of this approach lies in the accurate segmentation of speech into subword units of varying durations. An approach to continuous speech recognition by spotting consonant-vowel (CV) units has been presented in the literature in the context of Indian languages. This approach is based on detecting vowel onset points (VOPs) and labeling the segments around the VOPs using a CV recognizer. The major issues in this approach are the accurate detection of VOPs and the labeling of the regions around these VOPs. In the literature, AANN models are used for the detection of VOPs with 30% missed and 6% spurious rates. The performance of CV spotting and recognition using AANN models is low because of this inaccurate detection of VOPs. In this chapter, we propose an approach for spotting and recognizing CV units from continuous speech using accurate VOPs. Here, VOPs are determined using a two-stage approach. In stage 1, VOPs are determined using evidence from the excitation source, spectral energy, and modulation spectrum of the speech segments. In stage 2, the VOPs determined in stage 1 are verified, and the genuine VOPs are positioned accurately using the deviation between successive epoch intervals.
K. Sreenivasa Rao, Anil Kumar Vuppala
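The missed and spurious rates quoted in the Chapter 5 abstract can be made concrete with a small scoring sketch. The abstract does not define the measures, so the definitions here are assumptions: a reference VOP is missed if no unused detection lies within a tolerance of it, leftover detections are spurious, and both counts are divided by the number of reference VOPs. The 40 ms tolerance and the function name are likewise illustrative.

```python
def vop_error_rates(reference, detected, tolerance=0.04):
    """Return (missed_rate, spurious_rate) for VOP times given in seconds.

    Greedy one-to-one matching: each reference VOP, in time order, claims the
    closest still-unused detection within `tolerance` seconds.
    """
    if not reference:
        raise ValueError("reference must contain at least one VOP")
    unused = sorted(detected)
    matched = 0
    for ref in sorted(reference):
        candidates = [d for d in unused if abs(d - ref) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(closest)
            matched += 1
    missed_rate = (len(reference) - matched) / len(reference)
    spurious_rate = len(unused) / len(reference)
    return missed_rate, spurious_rate


# Hypothetical hand-labeled and detected VOP times (seconds).
reference_vops = [0.10, 0.35, 0.60, 0.90]
detected_vops = [0.11, 0.37, 0.75, 0.91]
rates = vop_error_rates(reference_vops, detected_vops)
```

Here three of the four reference VOPs are matched, the one at 0.60 s is missed, and the detection at 0.75 s is counted as spurious.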
Chapter 6. Speaker Identification and Time Scale Modification Using VOPs
Abstract
In this chapter, the proposed two-stage VOP detection method is used to improve Speaker Identification (SI) performance in the presence of coding. With the help of VOPs, the crucial regions of the speech segments, which mainly characterize speaker-specific information, are determined. Features extracted from these crucial speech segments are used in the speaker identification task to improve recognition accuracy. The accurate VOPs determined by the proposed method are also explored for nonuniform time scale modification. The proposed nonuniform time scale modification method provides high-quality speech while varying the speech rate. In this method, vowel regions are modified nonuniformly based on the type of vowel, while consonant and transition regions are left unaltered irrespective of the speaking rate. Here, vowel onset points are used to determine the consonant, vowel, and transition regions.
K. Sreenivasa Rao, Anil Kumar Vuppala
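The duration bookkeeping behind the nonuniform time scale modification described in the Chapter 6 abstract can be sketched as follows. The sketch only plans target durations: vowel regions are scaled by a per-vowel factor, and consonant and transition regions keep their original length. It does not modify any waveform, and the segment labels, durations, and scaling factors are hypothetical values chosen for illustration.

```python
def plan_durations(segments, vowel_factors):
    """Return (label, kind, new_duration_ms) tuples; only vowel regions are scaled.

    `segments` is a list of (label, kind, duration_ms) with kind in
    {"consonant", "transition", "vowel"}. Vowels without an entry in
    `vowel_factors` keep their original duration.
    """
    planned = []
    for label, kind, duration_ms in segments:
        if kind == "vowel":
            factor = vowel_factors.get(label, 1.0)
        else:
            factor = 1.0  # consonant and transition regions are left unaltered
        planned.append((label, kind, duration_ms * factor))
    return planned


# Hypothetical segmentation of two CV units, e.g. derived from VOP locations.
segments = [
    ("k", "consonant", 50),
    ("k-a", "transition", 30),
    ("a", "vowel", 100),
    ("t", "consonant", 40),
    ("t-i", "transition", 20),
    ("i", "vowel", 80),
]
vowel_factors = {"a": 1.5, "i": 1.25}  # assumed per-vowel slow-down factors

planned = plan_durations(segments, vowel_factors)
total_before = sum(d for _, _, d in segments)
total_after = sum(d for _, _, d in planned)
```

In this example the utterance grows from 320 ms to 390 ms, and all of the added duration falls in the two vowel regions.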
Chapter 7. Summary and Conclusions
Abstract
This book discusses some important issues in speech processing in the context of mobile environments. The major challenges in speech processing in mobile environments are: varying background conditions, speech coding, and transmission channel errors. This book suggests signal processing methods to determine some crucial events in speech that are robust to these adverse conditions. In this work, the authors have proposed vowel onset points (VOPs) as crucial events in speech that are robust to speech coding and noisy background environments. By using VOPs as anchor points, speech signals are processed in the presence of coding and noise to develop speech systems for speech recognition, speaker recognition, and speaking rate modification. From the results, it is broadly observed that the developed speech systems perform better than systems developed without knowledge of VOPs. This chapter summarizes the findings of the present work, highlights the major contributions, and points out directions for future work.
K. Sreenivasa Rao, Anil Kumar Vuppala
Backmatter
Metadata
Title
Speech Processing in Mobile Environments
Authors
K. Sreenivasa Rao
Anil Kumar Vuppala
Copyright Year
2014
Electronic ISBN
978-3-319-03116-3
Print ISBN
978-3-319-03115-6
DOI
https://doi.org/10.1007/978-3-319-03116-3