2017 | OriginalPaper | Book chapter

An Algorithm for Detection of Breath Sounds in Spontaneous Speech with Application to Speaker Recognition

Authors: Sri Harsha Dumpala, K. N. R. K. Raju Alluri

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Automatic detection and demarcation of non-speech sounds in speech are critical for developing sophisticated human-machine interaction systems. The main objective of this study is to develop acoustic features that capture the production differences between speech and breath sounds in terms of both excitation source and vocal tract system characteristics. Using these features, a rule-based algorithm is proposed for automatic detection of breath sounds in spontaneous speech. The proposed algorithm outperforms previous methods for this task. Further, the importance of breath detection for speaker recognition is analyzed using an i-vector-based speaker recognition system. Experimental results show that detecting breath sounds before i-vector extraction is essential: breath sounds occurring in test samples otherwise degrade the performance of i-vector-based speaker recognition systems.
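
To make the idea of a rule-based, frame-level detector concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not specify the features or rules, so this example substitutes two generic stand-in features (short-time energy and zero-crossing rate) and hypothetical thresholds. The function names, frame sizes, and threshold values are all assumptions chosen for illustration.

```python
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_breath_frames(samples, frame_len=400, hop=160,
                         energy_range=(1e-4, 1e-2), min_zcr=0.25):
    """Return one boolean per frame: True if the frame looks breath-like.

    Assumed rule (illustrative only): a breath-like frame is noise-like
    (high zero-crossing rate) and weaker than voiced speech but louder
    than silence (energy inside energy_range). Thresholds are hypothetical.
    """
    labels = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        in_range = energy_range[0] <= energy <= energy_range[1]
        labels.append(in_range and zcr >= min_zcr)
    return labels


# Synthetic demo at an assumed 16 kHz sampling rate: low-level noise stands
# in for a breath, a 150 Hz tone stands in for voiced speech.
random.seed(0)
sample_rate = 16000
breath_like = [random.uniform(-0.05, 0.05) for _ in range(sample_rate // 2)]
voiced_like = [0.5 * math.sin(2 * math.pi * 150 * n / sample_rate)
               for n in range(sample_rate // 2)]

for name, signal in [("breath-like", breath_like), ("voiced-like", voiced_like)]:
    labels = detect_breath_frames(signal)
    print(name, sum(labels), "of", len(labels), "frames flagged")
```

In a speaker recognition pipeline of the kind the abstract describes, frames flagged by such a detector would be discarded before i-vector extraction. Real recordings would need thresholds tuned on labeled data and features closer to those developed in the chapter.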

Metadata
Title
An Algorithm for Detection of Breath Sounds in Spontaneous Speech with Application to Speaker Recognition
Authors
Sri Harsha Dumpala
K. N. R. K. Raju Alluri
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-66429-3_9