2008 | OriginalPaper | Book chapter

12. The STFT, Sinusoidal Models, and Speech Modification

Written by: Michael M. Goodwin, Ph.D

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

Frequency-domain signal representations are used for a wide variety of applications in speech processing. In this chapter, we first consider the short-time Fourier transform (STFT), presenting a number of interpretations of the analysis-synthesis process in a consistent mathematical framework. We then develop the sinusoidal model as a parametric extension of the STFT in which the STFT data is compacted, sacrificing perfect reconstruction in exchange for a sparser and essentially more meaningful representation. We discuss several methods for sinusoidal parameter estimation and signal reconstruction, and present a detailed treatment of a matching pursuit algorithm for sinusoidal modeling. The final part of the chapter addresses speech modifications such as filtering, enhancement, and time-scaling, for which both the STFT and the sinusoidal model are effective tools.
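
The STFT analysis-synthesis process and the notion of perfect reconstruction can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes a periodic Hann window with 50% overlap, uses a naive DFT in pure Python, and the names `hann_periodic`, `stft`, `istft` and the test signal are illustrative choices only.

```python
import cmath
import math


def hann_periodic(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive O(N^2) DFT, adequate for a small demonstration."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def stft(signal, frame_len, hop):
    """Analysis: window each frame, then take its DFT."""
    window = hann_periodic(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft(segment))
    return frames


def istft(frames, frame_len, hop):
    """Synthesis: inverse DFT of each frame, then overlap-add."""
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, spectrum in enumerate(frames):
        segment = idft(spectrum)
        for i in range(frame_len):
            out[m * hop + i] += segment[i]
    return out


# Assumed toy parameters: frame length 8, hop 4 (50% overlap), 32 samples.
frame_len, hop = 8, 4
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]

frames = stft(signal, frame_len, hop)
rebuilt = istft(frames, frame_len, hop)

# The first and last `hop` samples are covered by only one window,
# so reconstruction is checked on the interior samples only.
max_err = max(abs(rebuilt[n] - signal[n]) for n in range(hop, len(signal) - hop))
print("max interior reconstruction error:", max_err)
```

Because the unmodified STFT is inverted here, the interior samples are recovered up to floating-point rounding. A parametric model that keeps only a few sinusoidal components per frame would give up this exactness in return for a more compact description.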

Title: The STFT, Sinusoidal Models, and Speech Modification
Written by: Michael M. Goodwin, Ph.D
Publisher: Springer Berlin Heidelberg
Book: Springer Handbook of Speech Processing
Print ISBN: 978-3-540-49125-5

Electronic ISBN: 978-3-540-49127-9

Copyright year: 2008
DOI: https://doi.org/10.1007/978-3-540-49127-9_12
