2008 | OriginalPaper | Chapter

14. Principles of Speech Coding

Author: Prof. W. Bastiaan Kleijn

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

Speech coding is the art of reducing the bit rate required to describe a speech signal. In this chapter, we discuss the attributes of speech coders as well as the underlying principles that determine their behavior and their architecture. The ubiquitous class of linear-prediction-based coders is used as an illustration. Speech is generally modeled as a sequence of stationary signal segments, each having unique statistics. Segments are encoded using a two-step procedure: (1) find a model describing the speech segment, (2) encode the segment assuming it is generated by the model. We show that the bit allocation for the model (the predictor parameters) is independent of overall rate and of perception, which is consistent with existing experimental results. The modeling of perception is an important aspect of efficient coding and we discuss how various perceptual distortion measures can be integrated into speech coders.
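The two-step procedure named in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes an autocorrelation-method linear predictor solved with the Levinson-Durbin recursion for step (1), and a plain closed-loop uniform scalar quantizer of the prediction error for step (2). The function names, the synthetic second-order test signal, and the step size are all hypothetical choices made for illustration, and the predictor parameters are left unquantized.

```python
import random


def synth_ar2(n, seed=0):
    """Hypothetical test segment: stable AR(2) process driven by white Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of one segment."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Step 1: fit the model. Returns (a, err) with x[n] ~ sum_k a[k-1] * x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]               # prediction-error energy at order 0
    for m in range(1, order + 1):
        if err <= 0.0:
            break
        # Reflection coefficient for order m.
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def encode_segment(x, a, step):
    """Step 2: encode the segment given the model.

    Predicts each sample from past *reconstructed* samples, so the decoder
    can form the same prediction, and quantizes the prediction error uniformly.
    Returns the integer indices and the decoder-side reconstruction.
    """
    order = len(a)
    indices, recon = [], []
    for n, sample in enumerate(x):
        pred = sum(a[k] * recon[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        idx = round((sample - pred) / step)
        indices.append(idx)
        recon.append(pred + idx * step)
    return indices, recon


if __name__ == "__main__":
    segment = synth_ar2(4000)
    r = autocorrelation(segment, 2)
    coeffs, err = levinson_durbin(r, 2)
    idx, recon = encode_segment(segment, coeffs, step=0.25)
    print("predictor coefficients:", coeffs)
    print("prediction gain:", r[0] / err)
```

Because the quantizer sits inside the prediction loop, the reconstruction error of each sample equals the quantization error of its prediction residual, which a uniform quantizer bounds by half the step size. A practical coder would also quantize the predictor parameters and would typically replace the scalar quantizer with an analysis-by-synthesis search under a perceptual distortion measure.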

DOI: https://doi.org/10.1007/978-3-540-49127-9_14