
2013 | Original Paper | Book Chapter

2. Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features

Authored by: K. Sreenivasa Rao, Shashidhar G. Koolagudi

Published in: Robust Emotion Recognition using Spectral and Prosodic Features

Publisher: Springer New York


Abstract

This chapter discusses the use of vocal tract information for recognizing emotions. Linear prediction cepstral coefficients (LPCCs) and mel frequency cepstral coefficients (MFCCs) are used as correlates of vocal tract information. In addition to LPCCs and MFCCs, formant-related features are also explored for recognizing emotions from speech. Extraction of these spectral features is discussed briefly, followed by their extraction from sub-syllabic regions such as consonants, vowels, and consonant-vowel transition regions, and from pitch synchronous analysis. The basic philosophy and use of Gaussian mixture models (GMMs) for classifying emotions is also discussed. The emotion recognition performance obtained from the different vocal tract features is compared, and the proposed spectral features are evaluated on the Indian and Berlin emotion databases. The performance of Gaussian mixture models in classifying emotional utterances using vocal tract features is compared with that of neural network models.
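
The abstract describes a two-stage pipeline: frame-level spectral feature extraction (here, MFCCs as vocal tract correlates) followed by classification with one Gaussian mixture model per emotion. The sketch below is not the authors' implementation; it assumes the librosa and scikit-learn libraries, and the helper names, file paths, 13 coefficients per frame, and 32 mixture components are illustrative choices rather than values taken from the chapter.

```python
# Minimal sketch: MFCC extraction plus per-emotion GMM classification.
# Assumptions: librosa and scikit-learn are installed; utterances are .wav files.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_mfcc(path, n_mfcc=13, frame_ms=25, hop_ms=10):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for one utterance."""
    y, sr = librosa.load(path, sr=None)
    n_fft = int(sr * frame_ms / 1000)        # ~25 ms analysis window
    hop = int(sr * hop_ms / 1000)             # ~10 ms frame shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)
    return mfcc.T                              # frames as rows

def train_emotion_gmms(train_files, n_components=32):
    """Fit one diagonal-covariance GMM on the pooled frames of each emotion."""
    models = {}
    for emotion, paths in train_files.items():
        frames = np.vstack([extract_mfcc(p) for p in paths])
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag', random_state=0)
        models[emotion] = gmm.fit(frames)
    return models

def classify(path, models):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    frames = extract_mfcc(path)
    scores = {emotion: gmm.score(frames) for emotion, gmm in models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage: train_files maps emotion labels to lists of .wav paths.
# models = train_emotion_gmms({'anger': [...], 'happy': [...], 'neutral': [...]})
# print(classify('test_utterance.wav', models))
```

This per-class GMM scoring is the standard maximum-likelihood classification scheme the abstract refers to; the chapter's sub-syllabic and pitch synchronous variants would replace the fixed 25 ms framing above with frames taken from consonant, vowel, or transition regions, or aligned to individual pitch cycles.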

Metadata
Title
Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features
Authored by
K. Sreenivasa Rao
Shashidhar G. Koolagudi
Copyright Year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-6360-3_2
