2016 | OriginalPaper | Chapter

5. Emotional Speech Recognition

Abstract

Recent years have seen a growing need for systems that can understand human emotions and, in particular, recognize them. Emotions lie at the centre of all social communication and form the basis for intelligent and meaningful interaction. This chapter discusses the acoustic correlates of emotions and describes the techniques and developments needed to support speech interfaces that recognize emotional expressions in real-world settings. Significant advances in knowledge representation, infrastructure requirements and algorithm implementation are a prerequisite for modeling effective future speech recognition systems that are more robust and dynamic.
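
Pitch (fundamental frequency, F0) and intensity are among the acoustic correlates most often examined in emotional speech. The sketch below is illustrative only and is not the chapter's method: it assumes a mono signal given as a list of floats, estimates F0 per frame with a simple autocorrelation peak search, measures RMS energy, and summarizes them per utterance. The function names, the 400-sample frame, the 200-sample hop and the 60 to 400 Hz search range are all hypothetical choices made for this example.

```python
import math


def frame_energy(frame):
    """Root-mean-square energy of one frame, a rough intensity cue."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) by picking the largest autocorrelation value
    among lags that correspond to frequencies in [f0_min, f0_max].

    Returns 0.0 when no positive peak exists (treated as unvoiced).
    This is a crude voicing decision chosen for simplicity.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset before correlating
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


def utterance_features(signal, sample_rate, frame_len=400, hop=200):
    """Frame the signal and return utterance-level pitch and energy stats."""
    f0s, energies = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(frame_energy(frame))
        f0 = estimate_f0(frame, sample_rate)
        if f0 > 0:  # keep only frames judged voiced
            f0s.append(f0)
    return {
        "f0_mean": sum(f0s) / len(f0s) if f0s else 0.0,
        "f0_range": max(f0s) - min(f0s) if f0s else 0.0,
        "energy_mean": sum(energies) / len(energies) if energies else 0.0,
    }
```

For example, a one-second 200 Hz tone sampled at 8 kHz yields an F0 mean near 200 Hz and an almost zero F0 range. In practice, statistics of this kind (mean, range, variability) would be fed to a classifier, and a more robust pitch tracker would replace the plain autocorrelation search.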

Metadata
Title
Emotional Speech Recognition
Author
Swati Johar
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-28047-9_5