
2020 | Original Paper | Book Chapter

Binaural Technology for Machine Speech Recognition and Understanding

Written by: Richard M. Stern, Anjali Menon

Published in: The Technology of Binaural Understanding

Publisher: Springer International Publishing


Abstract

It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving speech intelligibility in reverberant environments. This chapter describes and compares a number of ways in which automatic-speech-recognition accuracy in difficult acoustical environments can be improved through the use of signal processing techniques that are motivated by our understanding of binaural perception and binaural technology. These approaches are all based on the exploitation of interaural differences in arrival time and intensity of the signals arriving at the two ears to separate signals according to direction of arrival and to enhance the desired target signal. Their structure is motivated by classic models of binaural hearing as well as the precedence effect. We describe the structure and operation of a number of methods that use two or more microphones to improve the accuracy of automatic-speech-recognition systems operating in cluttered, noisy, and reverberant environments. The individual implementations differ in the methods by which binaural principles are imposed on speech processing, and in the precise mechanism used to extract interaural time and intensity differences. Algorithms that exploit binaural information can provide substantially improved speech-recognition accuracy in noisy, cluttered, and reverberant environments compared to baseline delay-and-sum beamforming. The type of signal manipulation that is most effective for improving performance in reverberation is different from what is most effective for ameliorating the effects of degradation caused by spatially-separated interfering sound sources.
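For orientation, the two ideas named in the abstract, extracting an interaural time difference and the delay-and-sum beamforming baseline, can be illustrated with a minimal two-microphone sketch. This is not the authors' implementation: the function names `estimate_delay` and `delay_and_sum` are hypothetical, the delay is restricted to whole samples, and the time difference is found by a brute-force cross-correlation search instead of any of the binaural models discussed in the chapter.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) by which `right` trails `left`.

    The lag is chosen to maximize the cross-correlation of the two
    channels over integer lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(left, right, lag):
    """Advance `right` by `lag` samples to align it with `left`, then average.

    Samples that would fall outside `right` are treated as zero.
    """
    out = []
    for i in range(len(left)):
        j = i + lag
        r = right[j] if 0 <= j < len(right) else 0.0
        out.append(0.5 * (left[i] + r))
    return out


# Toy signals (assumed for illustration): the right channel is the
# left channel delayed by 3 samples.
left = [0.0] * 5 + [1.0, 0.5] + [0.0] * 9
right = [0.0] * 3 + left[:-3]

lag = estimate_delay(left, right, max_lag=5)
enhanced = delay_and_sum(left, right, lag)
```

Signals arriving from the steered direction add coherently after alignment, while sources with a different time difference add out of phase and are attenuated. This is the baseline against which the abstract reports improvements from the binaural methods.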


Zurück zum Zitat Menon, A. 2018. Robust recognition of binaural speech signals using techniques based on human auditory processing. Ph.D. thesis, Carnegie Mellon University. Menon, A. 2018. Robust recognition of binaural speech signals using techniques based on human auditory processing. Ph.D. thesis, Carnegie Mellon University.
Zurück zum Zitat Mi, J., and H.S. Colburn. 2016. A binaural grouping model for predicting speech intelligibility in multitalker environments. Trends in Hearing 20: 1–12. Mi, J., and H.S. Colburn. 2016. A binaural grouping model for predicting speech intelligibility in multitalker environments. Trends in Hearing 20: 1–12.
Zurück zum Zitat Mi, J., M. Groll, and H.S. Colburn. 2017. Comparison of a target-equalization-cancellation approach and a localization approach to source separation. Journal of the Acoustical Society of America 142 (5): 2933–2941.ADS Mi, J., M. Groll, and H.S. Colburn. 2017. Comparison of a target-equalization-cancellation approach and a localization approach to source separation. Journal of the Acoustical Society of America 142 (5): 2933–2941.ADS
Zurück zum Zitat Miao, Y., and F. Metze. 2017. End-to-end architectures for speech recognition. In New Era for Robust Speech Recognition: Exploiting Deep Learning, ed. Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, 299–323. Springer International Publishing Miao, Y., and F. Metze. 2017. End-to-end architectures for speech recognition. In New Era for Robust Speech Recognition: Exploiting Deep Learning, ed. Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, 299–323. Springer International Publishing
Zurück zum Zitat Mitra, V., H. Franco, R. Stern, J.V. Hout, L. Ferrer, M. Graciarena, W. Wang, D. Vergyri, A. Alwan, and J.H.L. Nansen. 2017. Robust features in deep learning-based speech recognition. In New Era for Robust Speech Recognition: Exploiting Deep Learning, ed. Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, 183–212. Springer International Publishing Mitra, V., H. Franco, R. Stern, J.V. Hout, L. Ferrer, M. Graciarena, W. Wang, D. Vergyri, A. Alwan, and J.H.L. Nansen. 2017. Robust features in deep learning-based speech recognition. In New Era for Robust Speech Recognition: Exploiting Deep Learning, ed. Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, 183–212. Springer International Publishing
Zurück zum Zitat Moore, B.C.J. 2012. An Introduction to the Psychology of Hearing, 6th ed. Bingley UK, London: Emerald Group Publishing Ltd. Moore, B.C.J. 2012. An Introduction to the Psychology of Hearing, 6th ed. Bingley UK, London: Emerald Group Publishing Ltd.
Zurück zum Zitat Moreno, P.J., B. Raj, and R.M. Stern. 1996. A vector Taylor series approach for environment-independent speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 733–736 Moreno, P.J., B. Raj, and R.M. Stern. 1996. A vector Taylor series approach for environment-independent speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 733–736
Zurück zum Zitat Osman, E. 1971. A correlation model of binaural masking level differences. Journal of the Acoustical Society of America 50: 1494–1511.ADS Osman, E. 1971. A correlation model of binaural masking level differences. Journal of the Acoustical Society of America 50: 1494–1511.ADS
Zurück zum Zitat Palomäki, K.J., G.J. Brown, and D.L. Wang. 2004. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Communication 43 (4): 361–378. Palomäki, K.J., G.J. Brown, and D.L. Wang. 2004. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Communication 43 (4): 361–378.
Zurück zum Zitat Park, H.-M., and R.M. Stern. 2009. Spatial separation of speech signals using continuously-variable weighting factors estimated from comparisons of zero crossings. Speech Communication Journal 51 (1): 15–25. Park, H.-M., and R.M. Stern. 2009. Spatial separation of speech signals using continuously-variable weighting factors estimated from comparisons of zero crossings. Speech Communication Journal 51 (1): 15–25.
Zurück zum Zitat Patterson, R.D., I. Nimmo-Smith, J. Holdsworth, and P. Rice. 1988. An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit (APU) Report 2341. Cambridge UK Patterson, R.D., I. Nimmo-Smith, J. Holdsworth, and P. Rice. 1988. An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit (APU) Report 2341. Cambridge UK
Zurück zum Zitat Rabiner, L.R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (2): 257–286. Rabiner, L.R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (2): 257–286.
Zurück zum Zitat Rabiner, L.R., and B.-H. Juang. 1993. Fundamentals of Speech Recognition. Prentice-Hall. Rabiner, L.R., and B.-H. Juang. 1993. Fundamentals of Speech Recognition. Prentice-Hall.
Zurück zum Zitat Raj, B., M.L. Seltzer, and R.M. Stern. 2004. Reconstruction of missing features for robust speech recognition. Speech Communication 43 (4): 275–296. Raj, B., M.L. Seltzer, and R.M. Stern. 2004. Reconstruction of missing features for robust speech recognition. Speech Communication 43 (4): 275–296.
Zurück zum Zitat Raj, B., and R.M. Stern. 2005. Missing-feature approaches in speech recognition. IEEE Signal Processing Magazine 22 (5): 101–115.ADS Raj, B., and R.M. Stern. 2005. Missing-feature approaches in speech recognition. IEEE Signal Processing Magazine 22 (5): 101–115.ADS
Zurück zum Zitat Rickard, S. 2007. The DUET blind source separation algorithm. In Blind Speech Separation, ed. Makino, S., T. Lee, and H.E. Sawada. New York: Springer-Verlag. Rickard, S. 2007. The DUET blind source separation algorithm. In Blind Speech Separation, ed. Makino, S., T. Lee, and H.E. Sawada. New York: Springer-Verlag.
Zurück zum Zitat Roman, N., S. Srinivasan, and D. Wang. 2006. Binaural segregation in multisource. Journal of the Acoustical Society of America 120: 4040–4051. Roman, N., S. Srinivasan, and D. Wang. 2006. Binaural segregation in multisource. Journal of the Acoustical Society of America 120: 4040–4051.
Zurück zum Zitat Roman, N., D.L. Wang, and G.J. Brown. 2003. Speech segregation based on sound localization. Journal of the Acoustical Society of America 114 (4): 2236–2252.ADS Roman, N., D.L. Wang, and G.J. Brown. 2003. Speech segregation based on sound localization. Journal of the Acoustical Society of America 114 (4): 2236–2252.ADS
Zurück zum Zitat Rosenblatt, R. 1959. Principles of Neurodynamics. New York: Spartan Books. Rosenblatt, R. 1959. Principles of Neurodynamics. New York: Spartan Books.
Zurück zum Zitat Schroeder, M.R. 1977. New viewpoints in binaural interactions. In Psychophysics and Physiology of Hearing, ed. Evans, E.F. and J.P. Wilson, 455–467. London: Academic Press Schroeder, M.R. 1977. New viewpoints in binaural interactions. In Psychophysics and Physiology of Hearing, ed. Evans, E.F. and J.P. Wilson, 455–467. London: Academic Press
Zurück zum Zitat Shamma, S.A., N. Shen, and P. Gopalaswamy. 1989. Binaural processing without neural delays. Journal of the Acoustical Society of America 86: 987–1006.ADS Shamma, S.A., N. Shen, and P. Gopalaswamy. 1989. Binaural processing without neural delays. Journal of the Acoustical Society of America 86: 987–1006.ADS
Zurück zum Zitat Shao, Y., and D.L. Wang. 2008. Robust speaker identification using auditory features and computational auditory scene analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1589–1592 Shao, Y., and D.L. Wang. 2008. Robust speaker identification using auditory features and computational auditory scene analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1589–1592
Zurück zum Zitat Srinivasan, S., M. Roman, and D. Wang. 2006. Binary and ratio time-frequency masks for robust speech recognition. Speech Communication 48: 1486–1501. Srinivasan, S., M. Roman, and D. Wang. 2006. Binary and ratio time-frequency masks for robust speech recognition. Speech Communication 48: 1486–1501.
Zurück zum Zitat Stecker, G.C., J.D. Ostreicher, and A.D. Brown. 2013. Temporal weighting functions for interaural time and level differences. III. Temporal weighting for lateral position judgments. Journal of the Acoustical Society of America 134: 1242–1252. Stecker, G.C., J.D. Ostreicher, and A.D. Brown. 2013. Temporal weighting functions for interaural time and level differences. III. Temporal weighting for lateral position judgments. Journal of the Acoustical Society of America 134: 1242–1252.
Zurück zum Zitat Stern, R.M., and H.S. Colburn. 1978. Theory of binaural interaction based on auditory-nerve data. IV. A model for subjective lateral position. Journal of the Acoustical Society of America 64: 127–140. Stern, R.M., and H.S. Colburn. 1978. Theory of binaural interaction based on auditory-nerve data. IV. A model for subjective lateral position. Journal of the Acoustical Society of America 64: 127–140.
Zurück zum Zitat Stern, R.M., and Trahiotis, C. 1995. Models of binaural interaction. In Hearing, ed. Moore, B.C.J., Handbook of Perception and Cognition, 2 ed, Chap. 10, 347–386. New York: Academic. Stern, R.M., and Trahiotis, C. 1995. Models of binaural interaction. In Hearing, ed. Moore, B.C.J., Handbook of Perception and Cognition, 2 ed, Chap. 10, 347–386. New York: Academic.
Zurück zum Zitat Stern, R.M., and C. Trahiotis. 1996. Models of binaural perception. In Binaural and Spatial Hearing in Real and Virtual Environments, ed. Gilkey, R. and T.R. Anderson, Chap. 24, 499–531. Lawrence Erlbaum Associates Stern, R.M., and C. Trahiotis. 1996. Models of binaural perception. In Binaural and Spatial Hearing in Real and Virtual Environments, ed. Gilkey, R. and T.R. Anderson, Chap. 24, 499–531. Lawrence Erlbaum Associates
Zurück zum Zitat Stern, R.M., D. Wang, and G.J. Brown. 2006. Binaural sound localization. In Computational Auditory Scene Analysis, ed. Wang, D., and G.J: Brown, Chap. 5. Wiley-IEEE Press Stern, R.M., D. Wang, and G.J. Brown. 2006. Binaural sound localization. In Computational Auditory Scene Analysis, ed. Wang, D., and G.J: Brown, Chap. 5. Wiley-IEEE Press
Zurück zum Zitat Stern, R.M., A.S. Zeiberg, and C. Trahiotis. 1988. Lateralization of complex binaural stimuli: a weighted image model. Journal of the Acoustical Society of America 84: 156–165.ADS Stern, R.M., A.S. Zeiberg, and C. Trahiotis. 1988. Lateralization of complex binaural stimuli: a weighted image model. Journal of the Acoustical Society of America 84: 156–165.ADS
Zurück zum Zitat Stevens, S.S., J. Volkman, and E. Newman. 1937. A scale for the measurement of the psychological magnitude pitch. Journal of the Acoustical Society of America 8 (3): 185–190.ADS Stevens, S.S., J. Volkman, and E. Newman. 1937. A scale for the measurement of the psychological magnitude pitch. Journal of the Acoustical Society of America 8 (3): 185–190.ADS
Zurück zum Zitat Stockham, T.G., T.M. Cannon, and R.B. Ingrebretsen. 1975. Blind deconvolution through digital signal processing. Proceedings of the IEEE 63 (4): 678–692. Stockham, T.G., T.M. Cannon, and R.B. Ingrebretsen. 1975. Blind deconvolution through digital signal processing. Proceedings of the IEEE 63 (4): 678–692.
Zurück zum Zitat Thiergart, O., G. Del Galdo, and E.A. Habets. 2012. Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 309–312. Thiergart, O., G. Del Galdo, and E.A. Habets. 2012. Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 309–312.
Zurück zum Zitat Trahiotis, C., L.R. Bernstein, R.M. Stern, and T.N. Buell. 2005. Interaural correlation as the basis of a working model of binaural processing: An introduction. In Sound Source Localization, ed. R. Fay, and T. Popper, 238–271., Springer Handbook of Auditory Research. Heidelberg: Springer-Verlag. Trahiotis, C., L.R. Bernstein, R.M. Stern, and T.N. Buell. 2005. Interaural correlation as the basis of a working model of binaural processing: An introduction. In Sound Source Localization, ed. R. Fay, and T. Popper, 238–271., Springer Handbook of Auditory Research. Heidelberg: Springer-Verlag.
Zurück zum Zitat Van Trees, H.L. 2004. Detection, Estimation, and Modulation Theory: Optimum Array Processing. Wiley. Van Trees, H.L. 2004. Detection, Estimation, and Modulation Theory: Optimum Array Processing. Wiley.
Zurück zum Zitat Virtanen, T., B. Raj, and R. Singh, eds. 2012. Noise-Robust Techniques for Automatic Speech Recognition. Wiley. Virtanen, T., B. Raj, and R. Singh, eds. 2012. Noise-Robust Techniques for Automatic Speech Recognition. Wiley.
Zurück zum Zitat Wallach, H.W., E.B. Newman, and M.R. Rosenzweig. 1949. The precedence effect in sound localization. American Journal of Psychology 62: 315–337. Wallach, H.W., E.B. Newman, and M.R. Rosenzweig. 1949. The precedence effect in sound localization. American Journal of Psychology 62: 315–337.
Zurück zum Zitat Wan, R., N.I. Durlach, and H.S. Colburn. 2010. Application of an extended equalization-cancellation model to speech intelligibility with spatially distributed maskers. Journal of the Acoustical Society of America 128: 3678–3690.ADS Wan, R., N.I. Durlach, and H.S. Colburn. 2010. Application of an extended equalization-cancellation model to speech intelligibility with spatially distributed maskers. Journal of the Acoustical Society of America 128: 3678–3690.ADS
Zurück zum Zitat Wan, R., N.I. Durlach, and H.S. Colburn. 2014. Application of a short-time version of the equalization-cancellation model to speech intelligibility experiments with speech maskers. Journal of the Acoustical Society of America 136: 768–776.ADS Wan, R., N.I. Durlach, and H.S. Colburn. 2014. Application of a short-time version of the equalization-cancellation model to speech intelligibility experiments with speech maskers. Journal of the Acoustical Society of America 136: 768–776.ADS
Zurück zum Zitat Wang, D., and G.J. Brown, eds. 2006. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press. Wang, D., and G.J. Brown, eds. 2006. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press.
Zurück zum Zitat Wang, D.L., and J. Chen. 2018. Supervised speech separation based on deep learning: An overview. IEEE Transactions on Audio, Speech, and Language Processing 26: 1702–1726.ADS Wang, D.L., and J. Chen. 2018. Supervised speech separation based on deep learning: An overview. IEEE Transactions on Audio, Speech, and Language Processing 26: 1702–1726.ADS
Zurück zum Zitat Wang, Y., and D.L. Wang. 2013. Towards scaling up classification-based speech separation. IEEE Transactions on Audio, Speech, and Language Processing 21: 1381–1390. Wang, Y., and D.L. Wang. 2013. Towards scaling up classification-based speech separation. IEEE Transactions on Audio, Speech, and Language Processing 21: 1381–1390.
Zurück zum Zitat Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, eds. 2017. New Era for Robust Speech Recognition: Exploiting Deep Learning. Springer International. Watanabe, S., M. Delcroix, F. Metze, and J.R. Hershey, eds. 2017. New Era for Robust Speech Recognition: Exploiting Deep Learning. Springer International.
Zurück zum Zitat Westermann, A., J.M. Buchholz, and T. Dau. 2013. Binaural dereverberation based on interaural coherence histograms. The Journal of the Acoustical Society of America 133 (5): 2767–2777. Westermann, A., J.M. Buchholz, and T. Dau. 2013. Binaural dereverberation based on interaural coherence histograms. The Journal of the Acoustical Society of America 133 (5): 2767–2777.
Zurück zum Zitat Wightman, F.L., and D.J. Kistler. 1989a. Headphone simulation of free-field listening. I: Stimulus synthesis. The Journal of the Acoustical Society of America 85: 858–867.ADS Wightman, F.L., and D.J. Kistler. 1989a. Headphone simulation of free-field listening. I: Stimulus synthesis. The Journal of the Acoustical Society of America 85: 858–867.ADS
Zurück zum Zitat Wightman, F.L., and D.J. Kistler. 1989b. Headphone simulation of free-field listening. II: Psychophysical validation. Journal of the Acoustical Society of America 87: 868–878.ADS Wightman, F.L., and D.J. Kistler. 1989b. Headphone simulation of free-field listening. II: Psychophysical validation. Journal of the Acoustical Society of America 87: 868–878.ADS
Zurück zum Zitat Wightman, F.L., and D.J. Kistler. 1999. Resolution of front-back ambiguity in spatial hearing by listener and source movement. The Journal of the Acoustical Society of America 105 (5): 2841–2853.ADS Wightman, F.L., and D.J. Kistler. 1999. Resolution of front-back ambiguity in spatial hearing by listener and source movement. The Journal of the Acoustical Society of America 105 (5): 2841–2853.ADS
Zurück zum Zitat Woodruff, J., and D.L. Wang. 2013. Binaural detection, localization, and segregation in reverberant environments based on joint pitch and azimuth cues. IEEE Transactions on Audio, Speech, and Language Processing 21: 806–815. Woodruff, J., and D.L. Wang. 2013. Binaural detection, localization, and segregation in reverberant environments based on joint pitch and azimuth cues. IEEE Transactions on Audio, Speech, and Language Processing 21: 806–815.
Zurück zum Zitat Yost, W.A. 1981. Lateral position of sinusoids presented with intensitive and temporal differences. Journal of the Acoustical Society of America 70: 397–409.ADS Yost, W.A. 1981. Lateral position of sinusoids presented with intensitive and temporal differences. Journal of the Acoustical Society of America 70: 397–409.ADS
Zurück zum Zitat Yost, W.A. 2013. Fundamentals of Hearing: An Introduction, 5th ed. Burlington MA: Academic Press. Yost, W.A. 2013. Fundamentals of Hearing: An Introduction, 5th ed. Burlington MA: Academic Press.
Zurück zum Zitat Yu, Y., W. Wang, and P. Han. 2016. Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks. EURASIP Journal on Audio, Speech, and Music Processing 2016: 1–18. Yu, Y., W. Wang, and P. Han. 2016. Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks. EURASIP Journal on Audio, Speech, and Music Processing 2016: 1–18.
Zurück zum Zitat Zhang, X., M.G. Heinz, I.C. Bruce, and L.H. Carney. 2001. A phenomenological model for the response of auditory-nerve fibers: I. nonlinear tuning with compression and suppression. Journal of the Acoustical Society of America 109: 648–670. Zhang, X., M.G. Heinz, I.C. Bruce, and L.H. Carney. 2001. A phenomenological model for the response of auditory-nerve fibers: I. nonlinear tuning with compression and suppression. Journal of the Acoustical Society of America 109: 648–670.
Zurück zum Zitat Zhang, X., and D. Wang. 2017. Deep learning based binaural speech separation in reverberant environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (5): 1075–1084. Zhang, X., and D. Wang. 2017. Deep learning based binaural speech separation in reverberant environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (5): 1075–1084.
Zurück zum Zitat Zheng, C., A. Schwarz, W. Kellermann, and X. Li. 2015. Binaural coherent-to-diffuse-ratio estimation for dereverberation using an ITD model. In Proceedings of the\(23^{rd}\)European Signal Processing Conference (EUSIPCO), 1048–1052. Zheng, C., A. Schwarz, W. Kellermann, and X. Li. 2015. Binaural coherent-to-diffuse-ratio estimation for dereverberation using an ITD model. In Proceedings of the\(23^{rd}\)European Signal Processing Conference (EUSIPCO), 1048–1052.
Zurück zum Zitat Zilany, M.S.A., I.C. Bruce, P.C. Nelson, and L.H. Carney. 2009. A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics. Journal of the Acoustical Society of America 125: 2390–2412.ADS Zilany, M.S.A., I.C. Bruce, P.C. Nelson, and L.H. Carney. 2009. A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics. Journal of the Acoustical Society of America 125: 2390–2412.ADS
Zurück zum Zitat Zurek, P.M. 1993. Binaural advantages and directional effects in speech intelligibility. In Acoustical Factors Affecting Hearing Aid Performance, ed. G.A. Studebaker, and I. Hochberg. Boston: Allyn and Bacon. Zurek, P.M. 1993. Binaural advantages and directional effects in speech intelligibility. In Acoustical Factors Affecting Hearing Aid Performance, ed. G.A. Studebaker, and I. Hochberg. Boston: Allyn and Bacon.
Zurück zum Zitat Zurek, P.M., R.L. Freyman, and U. Balakrishnan. 2004. Auditory target detection in reverberation. Journal of the Acoustical Society of America 115 (4): 1609–1620.ADS Zurek, P.M., R.L. Freyman, and U. Balakrishnan. 2004. Auditory target detection in reverberation. Journal of the Acoustical Society of America 115 (4): 1609–1620.ADS
Metadata
Title
Binaural Technology for Machine Speech Recognition and Understanding
Written by
Richard M. Stern
Anjali Menon
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-00386-9_18
