Published in: Medical & Biological Engineering & Computing 6/2019

04-03-2019 | Original Article

Cochlea-inspired speech recognition interface

Authors: Mladen Russo, Maja Stella, Marjan Sikora, Matko Šarić

Abstract

Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems can achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve the performance in realistic noisy scenarios, we propose two models of speech recognition front-ends based on a biophysical cochlear model. The first front-end is based on the method of signal reconstruction from a basilar membrane response. When applied to noisy speech, this method results in improved signal quality. This method can be used as a preprocessing step in a standard ASR system and can also be used as a noise reduction technique for other applications. The second front-end we propose is based on the construction of speech recognition coefficients directly from a basilar membrane response. Experimental results using a continuous-density hidden Markov model (HMM) recognizer demonstrate significant improvement in performance compared to standard Mel-frequency cepstral coefficients (MFCC) in various types of noisy conditions.
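
For orientation, the MFCC baseline named in the abstract can be sketched as follows. This is a minimal, textbook-style illustration and not the authors' cochlear front-end or their exact baseline configuration: the 8 kHz sample rate, 256-sample frame, 20 mel filters, 13 coefficients, and all function names are assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=13):
    """MFCCs of a single frame: log mel energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)]


# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

In a full recognizer such frame-level vectors would be computed over overlapping frames and passed to the HMM back end. The second front-end described in the abstract replaces this spectral analysis with coefficients derived from the basilar membrane response, which the sketch does not attempt to model.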

Metadata
Title: Cochlea-inspired speech recognition interface
Authors: Mladen Russo, Maja Stella, Marjan Sikora, Matko Šarić
Publication date: 04-03-2019
Publisher: Springer Berlin Heidelberg
Published in: Medical & Biological Engineering & Computing, Issue 6/2019
Print ISSN: 0140-0118
Electronic ISSN: 1741-0444
DOI: https://doi.org/10.1007/s11517-019-01963-6