2008 | Original Paper | Book Chapter

9. Homomorphic Systems and Cepstrum Analysis of Speech

Written by: Prof. Ronald W. Schafer

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

In 1963, Bogert, Healy, and Tukey published a chapter with one of the most unusual titles to be found in the literature of science and engineering [9.1]. In this chapter, they observed that the logarithm of the power spectrum of a signal plus its echo (a delayed and scaled replica) consists of the logarithm of the signal spectrum plus a periodic component due to the echo. They suggested that further spectrum analysis of the log spectrum could highlight this periodic component and thus lead to a new indicator of the occurrence of an echo. Specifically, they made the following observation:
"In general, we find ourselves operating on the frequency side in ways customary on the time side and vice versa."
As an aid in formalizing this new point of view, they introduced a number of paraphrased words. For example, they defined the cepstrum of a signal as the power spectrum of the logarithm of the power spectrum of a signal. (In fact, they used discrete-time spectrum estimates based on the discrete Fourier transform.) Similarly, the term quefrency was introduced for the independent variable of the cepstrum [9.1].
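The echo observation and the cepstrum definition above can be tried out numerically. For a signal x(n) with an echo of gain a and delay n0, the log power spectrum is log|X(ω)|^2 + log(1 + a^2 + 2a cos(ω n0)), so the echo term is periodic in ω and shows up as a peak at quefrency n0. The following is a minimal sketch, not the authors' procedure: it assumes NumPy, a white-noise test signal, and illustrative values (delay of 50 samples, gain 0.5). The function names `power_cepstrum`, `add_echo`, and `estimate_echo_delay` are hypothetical, and the mean removal and the small floor inside the logarithm are conveniences added here.

```python
import numpy as np


def power_cepstrum(signal):
    """DFT-based power cepstrum: power spectrum of the log power spectrum."""
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.fft(signal)) ** 2
    log_power = np.log(power + 1e-12)  # small floor (assumption) avoids log(0)
    log_power -= log_power.mean()      # drop the constant term at zero quefrency
    return np.abs(np.fft.fft(log_power)) ** 2


def add_echo(signal, delay, gain):
    """Return signal plus a delayed, scaled replica (length grows by `delay`)."""
    out = np.zeros(len(signal) + delay)
    out[:len(signal)] += signal
    out[delay:] += gain * signal
    return out


def estimate_echo_delay(signal, min_quefrency=1):
    """Quefrency (in samples) of the largest cepstral peak in the first half."""
    cep = power_cepstrum(signal)
    half = len(cep) // 2
    return min_quefrency + int(np.argmax(cep[min_quefrency:half]))


# Illustrative test signal: white noise with one echo (values are assumptions).
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
y = add_echo(x, delay=50, gain=0.5)

print("estimated echo delay (samples):", estimate_echo_delay(y))
```

With these settings the largest peak is expected at the echo delay, since the noise floor of the cepstrum is well below the echo component. For speech rather than white noise, low quefrencies carry the spectral envelope, so `min_quefrency` would typically be raised to skip that region.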
In this chapter we will explore why the cepstrum has emerged as a central concept in digital speech processing. We will start with definitions appropriate for discrete-time signal processing and develop some of the general properties and computational approaches for the cepstrum of speech. Using this basis, we will explore the many ways that the cepstrum has been used in speech processing applications.

References
9.1.
B.P. Bogert, M.J.R. Healy, J.W. Tukey: The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, Proc. of the Symposium on Time Series Analysis, ed. by M. Rosenblatt (Wiley, New York 1963)
9.2.
R.W. Schafer: Echo removal by discrete generalized linear filtering (MIT, Cambridge 1968), Ph.D. dissertation
9.3.
A.V. Oppenheim, R.W. Schafer, T.G. Stockham Jr.: Nonlinear filtering of multiplied and convolved signals, Proc. IEEE 56(8), 1264-1291 (1968)
9.4.
A.V. Oppenheim, R.W. Schafer, J.R. Buck: Discrete-Time Signal Processing (Prentice-Hall, Upper Saddle River 1999)
9.5.
A.V. Oppenheim: Superposition in a Class of Nonlinear Systems (MIT, Cambridge 1964), Ph.D. dissertation. Also: MIT Research Lab. of Electronics, Cambridge, Massachusetts, Technical Report 432
9.6.
J.M. Tribolet: A new phase unwrapping algorithm, IEEE Trans. Acoust. Speech ASSP-25(2), 170-177 (1977)
9.7.
G.A. Sitton, C.S. Burrus, J.W. Fox, S. Treitel: Factoring very-high-degree polynomials, IEEE Signal Proc. Mag. 20(6), 27-42 (2003)
9.8.
L.R. Rabiner, R.W. Schafer: Digital Processing of Speech Signals (Prentice-Hall, Englewood Cliffs 1978)
9.9.
A.V. Oppenheim, R.W. Schafer: Homomorphic analysis of speech, IEEE Trans. Audio Electroacoust. AU-16, 221-228 (1968)
9.10.
G.E. Kopec, A.V. Oppenheim, J.M. Tribolet: Speech analysis by homomorphic prediction, IEEE Trans. Acoust. Speech ASSP-25(1), 40-49 (1977)
9.11.
A.M. Noll: Cepstrum pitch determination, J. Acoust. Soc. Am. 41(2), 293-309 (1967)
9.12.
B.S. Atal, S.L. Hanauer: Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Am. 50, 561-580 (1971)
9.13.
A.V. Oppenheim: A speech analysis-synthesis system based on homomorphic filtering, J. Acoust. Soc. Am. 45(2), 293-309 (1969)
9.14.
R.W. Schafer, L.R. Rabiner: System for automatic formant analysis of voiced speech, J. Acoust. Soc. Am. 47(2), 458-465 (1970)
9.15.
B.S. Atal, J. Remde: A new model of LPC excitation for producing natural-sounding speech at low bit rates, Proc. IEEE ICASSP (1982), 614-617
9.16.
M.R. Schroeder, B.S. Atal: Code-excited linear prediction (CELP): high-quality speech at very low bit rates, Proc. IEEE ICASSP (1985), 937-940
9.17.
R.C. Rose, T.P. Barnwell III: The self excited vocoder - an alternate approach to toll quality at 4800 bps, Proc. IEEE ICASSP 11, 453-456 (1986)
9.18.
J.H. Chung, R.W. Schafer: Excitation modeling in a homomorphic vocoder, Proc. IEEE ICASSP 1, 25-28 (1990)
9.19.
J.H. Chung, R.W. Schafer: Performance evaluation of analysis-by-synthesis homomorphic vocoders, Proc. IEEE ICASSP 2, 117-120 (1992)
9.20.
B.S. Atal, M.R. Schroeder: Predictive coding of speech signals and subjective error criterion, IEEE Trans. Acoust. Speech ASSP-27, 247-254 (1979)
9.21.
T.G. Stockham Jr., T.M. Cannon, R.B. Ingebretsen: Blind deconvolution through digital signal processing, Proc. IEEE 63, 678-692 (1975)
9.22.
S. Furui: Cepstral analysis technique for automatic speaker verification, IEEE Trans. Acoust. Speech ASSP-29(2), 254-272 (1981)
9.23.
Y. Tohkura: A weighted cepstral distance measure for speech recognition, IEEE Trans. Acoust. Speech ASSP-35(10), 1414-1422 (1987)
9.24.
B.-H. Juang, L.R. Rabiner, J.G. Wilpon: On the use of bandpass liftering in speech recognition, IEEE Trans. Acoust. Speech ASSP-35(7), 947-954 (1987)
9.25.
F. Itakura, T. Umezaki: Distance measure for speech recognition based on the smoothed group delay spectrum, Proc. IEEE ICASSP 12, 1257-1260 (1987)
9.26.
S.B. Davis, P. Mermelstein: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech ASSP-28(4), 357-366 (1980)
9.27.
P.D. Smith, M. Kucic, R. Ellis, P. Hasler, D.V. Anderson: Mel-frequency cepstrum encoding in analog floating-gate circuitry, Proc. ISCAS 2002(4), 671-674 (2002)
Metadata
Title
Homomorphic Systems and Cepstrum Analysis of Speech
Written by
Prof. Ronald W. Schafer
Copyright year
2008
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-49127-9_9
