2008 | OriginalPaper | Book chapter

12. The STFT, Sinusoidal Models, and Speech Modification

Written by: Michael M. Goodwin, Ph.D.

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

Frequency-domain signal representations are used for a wide variety of applications in speech processing. In this chapter, we first consider the short-time Fourier transform (STFT), presenting a number of interpretations of the analysis-synthesis process in a consistent mathematical framework. We then develop the sinusoidal model as a parametric extension of the STFT in which the STFT data is compacted, trading perfect reconstruction for a sparser and essentially more meaningful representation. We discuss several methods for sinusoidal parameter estimation and signal reconstruction, and present a detailed treatment of a matching pursuit algorithm for sinusoidal modeling. The final part of the chapter addresses speech modifications such as filtering, enhancement, and time-scaling, for which both the STFT and the sinusoidal model are effective tools.
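
As a rough illustration of the STFT analysis-synthesis process mentioned in the abstract, the sketch below windows a signal, takes a DFT of each frame, and resynthesizes by overlap-add. It is not taken from the chapter: the periodic Hann window, the 50% overlap, the naive DFT, the test signal, and all function names (`dft`, `idft`, `stft`, `overlap_add`) are assumptions chosen only to keep the example small and self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def stft(signal, window, hop):
    """Window the signal every `hop` samples and take a DFT of each frame."""
    n = len(window)
    return [dft([signal[start + i] * window[i] for i in range(n)])
            for start in range(0, len(signal) - n + 1, hop)]


def overlap_add(frames, hop, length):
    """Invert each frame and sum the results at their original positions."""
    out = [0.0] * length
    for m, spectrum in enumerate(frames):
        for i, sample in enumerate(idft(spectrum)):
            out[m * hop + i] += sample
    return out


# Assumed parameters: frame length 16, hop 8 (50% overlap).
N, HOP = 16, 8
# Periodic Hann window: copies shifted by N/2 sum to exactly 1.
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
# Hypothetical test signal: two sinusoids.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]

frames = stft(signal, window, HOP)
rebuilt = overlap_add(frames, HOP, len(signal))

# Only samples covered by two overlapping windows are reconstructed exactly.
interior = range(HOP, len(signal) - HOP)
max_error = max(abs(rebuilt[t] - signal[t]) for t in interior)
```

Under these assumptions the shifted windows sum to one, so `max_error` over the interior samples should be at the level of floating-point rounding. The first and last `HOP` samples are covered by a single window and are therefore attenuated rather than reconstructed.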

Metadata
Title
The STFT, Sinusoidal Models, and Speech Modification
Written by
Michael M. Goodwin, Ph.D.
Copyright year
2008
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-49127-9_12
