Published in: International Journal of Speech Technology 2/2013

01-06-2013

Vowel onset point detection for noisy speech using spectral energy at formant frequencies

Authors: Anil Kumar Vuppala, K. Sreenivasa Rao


Abstract

In this paper, we propose a method for robust detection of vowel onset points (VOPs) in noisy speech. The proposed method exploits the spectral energy at the formant frequencies of speech segments in the glottal closure region. Formants are extracted using the group delay function, and glottal closure instants are extracted using a zero-frequency-filter-based method. The performance of the proposed method is compared with that of an existing method, which combines evidence from the excitation source, spectral peak energy, and modulation spectrum. Speech data from the TIMIT database and noise samples from the NOISEX database are used to analyze the performance of the VOP detection methods. The proposed method yields a significant improvement in VOP detection performance over the existing method.
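
As a rough illustration of the glottal closure instant step mentioned in the abstract, the following is a minimal sketch of zero-frequency filtering in the general style of Murty and Yegnanarayana (2008), not the authors' implementation. The function names, the 15 ms trend-removal window, and the three trend-removal passes are assumptions chosen for the example; in practice the window is usually tied to the average pitch period of the speaker.

```python
import numpy as np


def zero_frequency_filter(speech, fs, window_ms=15.0, passes=3):
    """Return a zero-frequency filtered version of `speech` (sketch only).

    window_ms and passes are illustrative assumptions, not values
    taken from the paper.
    """
    speech = np.asarray(speech, dtype=float)
    # Difference the signal to remove any DC offset.
    y = np.diff(speech, prepend=speech[0])
    # Two cascaded 0 Hz resonators, y[n] = 2y[n-1] - y[n-2] + x[n],
    # are equivalent to four successive running sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by repeatedly subtracting a local mean.
    half = int(round(window_ms * 1e-3 * fs / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def glottal_closure_instants(speech, fs, window_ms=15.0):
    """Estimate epochs as negative-to-positive zero crossings (sample indices)."""
    y = zero_frequency_filter(speech, fs, window_ms)
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


if __name__ == "__main__":
    # Synthetic excitation: an impulse train at 100 Hz, sampled at 8 kHz.
    fs = 8000
    train = np.zeros(fs)
    train[40::80] = 1.0
    epochs = glottal_closure_instants(train, fs)
    print("median epoch spacing (samples):", np.median(np.diff(epochs)))
```

The zero-padded moving average distorts roughly the first and last few window lengths of the output, so estimates near the signal edges should be discarded. The formant-energy evidence described in the abstract would then be computed on short segments around the remaining instants.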


Literature
Gangashetty, S. V., Sekhar, C. C., & Yegnanarayana, B. (2004a). Detection of vowel onset points in continuous speech using autoassociative neural network models. In Proc. int. conf. spoken language processing (pp. 401–410).
Gangashetty, S. V., Sekhar, C. C., & Yegnanarayana, B. (2004b). Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances. In Proc. of IEEE ICISIP (pp. 159–164).
Hermes, D. J. (1990). Vowel onset detection. The Journal of the Acoustical Society of America, 87, 866–873.
Joseph, M. A., Guruprasad, S., & Yegnanarayana, B. (2006). Extracting formants from short segments of speech using group delay functions. In Proc. of interspeech (pp. 1009–1012).
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1613.
Prasanna, S. R. M., & Yegnanarayana, B. (2005). Detection of vowel onset point events using excitation source information. In Proc. of interspeech (pp. 1133–1136).
Prasanna, S. R. M., Gangashetty, S. V., & Yegnanarayana, B. (2001). Significance of vowel onset point for speech analysis. In Proc. of int. conf. signal processing and communications (pp. 81–88).
Prasanna, S. R. M., Reddy, B. V. S., & Krishnamoorthy, P. (2009). Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 556–565.
Rao, K. S., & Yegnanarayana, B. (2009). Duration modification using glottal closure instants and vowel onset points. Speech Communication, 51, 1263–1269.
Vuppala, A. K., Rao, K. S., Chakrabarti, S., Krishnamoorthy, P., & Prasanna, S. R. M. (2011). Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. International Journal of Speech Technology, 14(1).
Vuppala, A. K., Rao, K. S., & Chakrabarti, S. (2012a). Improved consonant–vowel recognition for low bit-rate coded speech. Wiley International Journal of Adaptive Control and Signal Processing, 26(4), 333–349.
Vuppala, A. K., Rao, K. S., & Chakrabarti, S. (2012b). Spotting and recognition of consonant-vowel units from continuous speech using accurate vowel onset points. Circuits, Systems, and Signal Processing, 31(4), 1459–1474.
Wang, J.-H., & Chen, S.-H. (1999). A C/V segmentation algorithm for Mandarin speech using wavelet transforms. In Proc. IEEE int. conf. acoust., speech, signal processing (pp. 1261–1264).
Wang, J.-F., Wu, C. H., Chang, S. H., & Lee, J. Y. (1991). A hierarchical neural network based C/V segmentation algorithm for Mandarin speech recognition. IEEE Transactions on Signal Processing, 39(9), 2141–2146.
Metadata
Title
Vowel onset point detection for noisy speech using spectral energy at formant frequencies
Authors
Anil Kumar Vuppala
K. Sreenivasa Rao
Publication date
01-06-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9179-8
