2008 | OriginalPaper | Chapter

14. Principles of Speech Coding

Author: W. Bastiaan Kleijn, Prof.

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg


Abstract

Speech coding is the art of reducing the bit rate required to describe a speech signal. In this chapter, we discuss the attributes of speech coders as well as the underlying principles that determine their behavior and their architecture. The ubiquitous class of linear-prediction-based coders is used as an illustration. Speech is generally modeled as a sequence of stationary signal segments, each having unique statistics. Segments are encoded using a two-step procedure: (1) find a model describing the speech segment, (2) encode the segment assuming it is generated by the model. We show that the bit allocation for the model (the predictor parameters) is independent of overall rate and of perception, which is consistent with existing experimental results. The modeling of perception is an important aspect of efficient coding and we discuss how various perceptual distortion measures can be integrated into speech coders.
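The two-step procedure named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's method: the model is fitted with the autocorrelation method and the Levinson-Durbin recursion, the segment is then encoded by uniformly quantizing the prediction residual in a closed loop, and the test signal, the predictor order (4) and the step size (0.05) are arbitrary choices made for this example. Quantization of the predictor parameters and perceptual weighting, which a practical coder also needs, are left out.

```python
import math


def autocorrelation(segment, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] for one segment."""
    n = len(segment)
    return [sum(segment[i] * segment[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Step 1: fit the model. Returns (a, err), where the prediction is
    x_hat[n] = sum_k a[k-1] * x[n-k] and err is the residual energy."""
    a = [0.0] * (order + 1)   # a[0] is unused; a[k] multiplies x[n-k]
    err = r[0]                # error energy of the order-0 model
    for m in range(1, order + 1):
        if err <= 0.0:        # silent or perfectly predictable segment
            break
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:], err


def predict(a, history):
    """Predict the next sample from the most recent reconstructed samples."""
    return sum(a[k] * history[-1 - k]
               for k in range(min(len(a), len(history))))


def encode(segment, a, step):
    """Step 2: quantize the prediction residual in a closed loop."""
    indices, recon = [], []
    for x in segment:
        pred = predict(a, recon)
        q = round((x - pred) / step)    # uniform scalar quantizer index
        indices.append(q)
        recon.append(pred + q * step)   # the decoder computes the same value
    return indices, recon


def decode(indices, a, step):
    """Rebuild the segment from the model and the quantizer indices."""
    recon = []
    for q in indices:
        recon.append(predict(a, recon) + q * step)
    return recon


# Hypothetical test segment: two sinusoids standing in for a voiced frame.
segment = [math.sin(0.3 * n) + 0.5 * math.sin(0.7 * n) for n in range(160)]
r = autocorrelation(segment, 4)
a, err = levinson_durbin(r, 4)
indices, recon = encode(segment, a, step=0.05)
decoded = decode(indices, a, step=0.05)
```

Because the encoder predicts from its own reconstructed samples, the decoder can repeat the prediction exactly, and the per-sample reconstruction error stays within half a quantizer step.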




Title: Principles of Speech Coding
Author: W. Bastiaan Kleijn, Prof.
Publisher: Springer Berlin Heidelberg
Book: Springer Handbook of Speech Processing
Print ISBN: 978-3-540-49125-5

Electronic ISBN: 978-3-540-49127-9

Copyright Year: 2008
DOI: https://doi.org/10.1007/978-3-540-49127-9_14
