2012 | OriginalPaper | Chapter

5. Speech Under Stress and Lombard Effect: Impact and Solutions for Forensic Speaker Recognition

Authors: John H. L. Hansen, Ph.D., Abhijeet Sangwan, Ph.D., Wooil Kim, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

In the field of voice forensics, effective speaker recognition from input audio streams is an important task. In many situations, however, individuals will change the manner in which they produce speech because of the environment (i.e., Lombard effect), their speaker state (i.e., emotion, cognitive stress), and secondary tasks (i.e., task stress, whether physical or cognitive). Automatic recognition schemes for both speech and speaker ID are affected by the variability these conditions introduce. Extensive research on speech under stress has been performed for speech recognition, primarily for small-vocabulary isolated-word recognition. Only limited formal research has been performed for speaker ID/verification, primarily because effective corpora in the field are lacking. This chapter addresses speech under stress, including Lombard effect, for the purposes of speaker recognition. Domains where stress and variability occur (Lombard effect, physical stress, cognitive stress) are considered first. Next, because effective speaker recognition requires detecting whether a subject is under stress, which is a useful capability in its own right for voice forensics and biometrics, we consider prior research on the detection of speech under stress. The impact of stress on speaker recognition is then considered, and finally we address ways to improve speaker recognition in these domains (TEO features, alternative sensors, classification schemes, etc.). While speech under stress has been studied, speaker recognition under stress is an emerging research area that deserves further investigation.
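The abstract mentions TEO features without defining them. As a minimal sketch, assuming TEO refers to Kaiser's discrete Teager energy operator, the operator for a sampled signal x(n) is x(n)^2 - x(n-1)x(n+1). The function name `teager_energy` and the test tone below are illustrative choices, not taken from the chapter, and the chapter's actual TEO-based features likely involve further processing (for example, subband filtering) that is not shown here.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the chapter):
# a pure tone A*cos(omega*n). For such a tone the operator output is
# constant and equal to A^2 * sin(omega)^2.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]

psi = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is one reason the operator is described as tracking the "energy" of the source that generated the signal rather than signal power alone.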


Metadata
Title
Speech Under Stress and Lombard Effect: Impact and Solutions for Forensic Speaker Recognition
Authors
John H. L. Hansen, Ph.D.
Abhijeet Sangwan, Ph.D.
Wooil Kim, Ph.D.
Copyright Year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-0263-3_5