2008 | OriginalPaper | Book chapter

4. Perception of Speech and Sound

Authors: Birger Kollmeier, Prof., Thomas Brand, Dr., Bernd Meyer, Ph.D.

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

The transformation of acoustical signals into auditory sensations can be characterized by psychophysical quantities such as loudness, tonality, and perceived pitch. The limited spectral and temporal resolution of the auditory system gives rise to spectral and temporal masking phenomena and constrains the perception of amplitude modulations. Binaural hearing (i.e., exploiting the acoustical differences between the two ears) uses interaural time and intensity differences for sound localization and produces binaural unmasking phenomena such as the binaural intelligibility level difference, i.e., the difference in speech reception threshold for speech in noise between monaural and binaural listening.
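The two interaural cues mentioned above can be illustrated numerically. The following Python sketch is an illustration under simple assumptions, not the chapter's model: it estimates the interaural time difference (ITD) as the lag of the maximum of the cross-correlation between the two ear signals, and the interaural level difference (ILD) as their energy ratio in dB. The function name `interaural_cues`, the lag-search range, and the synthetic noise example are assumptions; auditory models typically apply such analyses within individual frequency bands rather than to the broadband signal.

```python
import math
import random


def interaural_cues(left, right, fs, max_lag):
    """Estimate ITD (seconds) and ILD (dB) from equal-length ear signals.

    ITD: lag of the maximum of the interaural cross-correlation, searched
    over -max_lag..+max_lag samples. A positive ITD means the right-ear
    signal lags the left-ear signal (source toward the left).
    ILD: ratio of left-ear to right-ear signal energy in dB.
    """
    n = len(left)
    best_lag, best_c = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # c(lag) = sum_k left[k] * right[k + lag] over valid indices only
        lo = max(0, -lag)
        hi = min(n, n - lag)
        c = sum(left[k] * right[k + lag] for k in range(lo, hi))
        if c > best_c:
            best_lag, best_c = lag, c
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    ild_db = 10.0 * math.log10(energy_left / energy_right)
    return best_lag / fs, ild_db


if __name__ == "__main__":
    fs = 44100  # sampling rate in Hz (assumed)
    rng = random.Random(0)
    source = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic noise
    delay = 5  # samples, about 113 microseconds at 44.1 kHz
    left = source
    # Right ear: delayed by `delay` samples and attenuated to half amplitude.
    right = [0.0] * delay + [0.5 * x for x in source[:-delay]]
    # 32 samples is about 0.7 ms, roughly the largest ITD a human head produces.
    itd, ild = interaural_cues(left, right, fs, max_lag=32)
    print(f"ITD = {itd * 1e6:.1f} us, ILD = {ild:.1f} dB")
```

With the half-amplitude, five-sample-delayed right-ear signal, the sketch should report an ITD of about 113 µs and an ILD close to 6 dB (20·log10 2). Noise is used instead of a pure tone because a periodic signal produces several equally high correlation peaks and thus an ambiguous ITD.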
The acoustical information available to the listener for perceiving speech, even under adverse conditions, can be characterized by the articulation index, the speech transmission index, and the speech intelligibility index. These measures can objectively predict speech reception thresholds as a function of the spectral content, the signal-to-noise ratio, and the degree to which amplitude modulations are preserved in the speech waveform reaching the listener's ear. The articulatory or phonetic information available to and received by the listener can be characterized by speech feature sets. Transinformation analysis quantifies the relative transmission error associated with each of these speech features. Comparing human and machine speech recognition allows hypotheses and models of human speech perception to be tested. Conversely, automatic speech recognition may be improved by incorporating principles of human auditory signal processing into machine algorithms.
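Transinformation is the mutual information between presented and perceived speech items, computed from a confusion matrix as T = Σ p(i,j)·log2[p(i,j) / (p(i)·p(j))], the approach popularized by Miller and Nicely for consonant confusions. The Python sketch below is an illustration, not the chapter's procedure: the functions `transinformation` and `collapse`, the four-consonant set, and its confusion counts are all hypothetical. Pooling the matrix by a feature (here voicing or place of articulation) gives a per-feature relative transinformation T/H(stimulus); one minus that value can be read as the relative loss for that feature.

```python
import math


def transinformation(confusions):
    """Mutual information (bits) between stimulus and response.

    confusions[i][j] = number of times stimulus i was answered as response j.
    Returns (T, T_rel), where T_rel = T / H(stimulus) lies between 0 and 1.
    """
    total = sum(sum(row) for row in confusions)
    p_stim = [sum(row) / total for row in confusions]
    n_resp = len(confusions[0])
    p_resp = [sum(row[j] for row in confusions) / total for j in range(n_resp)]
    t = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing (0 * log 0 := 0)
                p_ij = count / total
                t += p_ij * math.log2(p_ij / (p_stim[i] * p_resp[j]))
    h_stim = -sum(p * math.log2(p) for p in p_stim if p > 0)
    return t, (t / h_stim if h_stim > 0 else 0.0)


def collapse(confusions, feature):
    """Pool a phoneme confusion matrix into a feature-level matrix.

    feature[k] is the feature value (e.g. 'voiced') of phoneme k; stimuli
    and responses are assumed to share the same phoneme ordering.
    """
    values = sorted(set(feature))
    index = {v: n for n, v in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            pooled[index[feature[i]]][index[feature[j]]] += count
    return pooled


if __name__ == "__main__":
    # Hypothetical counts for /p t b d/ (rows: presented, columns: reported),
    # 10 presentations each; listeners keep voicing but confuse place.
    counts = [
        [5, 5, 0, 0],
        [5, 5, 0, 0],
        [0, 0, 5, 5],
        [0, 0, 5, 5],
    ]
    voicing = ["voiceless", "voiceless", "voiced", "voiced"]
    place = ["bilabial", "alveolar", "bilabial", "alveolar"]
    print("all  ", transinformation(counts))                     # (1.0, 0.5)
    print("voice", transinformation(collapse(counts, voicing)))  # (1.0, 1.0)
    print("place", transinformation(collapse(counts, place)))    # (0.0, 0.0)
```

In this constructed example, half of the stimulus information is transmitted overall; the feature analysis attributes it entirely to voicing (relative transinformation 1) while place of articulation is lost completely (relative transinformation 0).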

DOI: https://doi.org/10.1007/978-3-540-49127-9_4
