2015 | OriginalPaper | Book chapter

10. Modulation Processing for Speech Enhancement

Authors: Kuldip Paliwal, Belinda Schwerin

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

Many traditional speech enhancement methods reduce noise in corrupted speech by processing the magnitude spectrum within a short-time Fourier analysis-modification-synthesis (AMS) framework. More recently, use of the modulation domain for speech processing has been investigated; however, early efforts in this direction did not account for the changing properties of the modulation spectrum across time. Motivated by this, and by evidence of the significance of the modulation domain, we investigated processing the modulation spectrum on a short-time basis for speech enhancement. For this purpose, a modulation-domain AMS framework was used, in which the trajectory of each acoustic frequency bin was processed frame-wise in a secondary AMS framework. Several enhancement algorithms were investigated in the short-time modulation domain, including spectral subtraction and MMSE magnitude estimation. In each case, the respective algorithm was used to modify the short-time modulation magnitude spectrum within the modulation AMS framework. Here we review the findings of this investigation, comparing the quality of stimuli enhanced using these modulation-based approaches with that of stimuli enhanced using the corresponding algorithms applied in the acoustic domain. The results show that modulation-domain approaches give improved quality compared with their acoustic-domain counterparts. Further, MMSE modulation magnitude estimation (MME) is shown to give better speech quality than modulation spectral subtraction (ModSSub). MME stimuli exhibit good noise removal without the musical noise that is problematic in spectral subtraction based enhancement. The results also show that, for an appropriately selected modulation frame duration, ModSSub introduces minimal musical noise compared with acoustic spectral subtraction.
For modulation-domain methods, modulation frame duration is shown to be an important parameter, with quality generally improved by shorter frame durations. From the experimental results, it is concluded that the short-time modulation domain provides an effective alternative to the short-time acoustic domain for speech processing, and that in this domain MME provides effective noise suppression without introducing musical noise distortion.
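
The dual AMS idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the frame lengths and hops, the subtraction parameters `alpha` and `beta`, and the choice to estimate noise from the first few modulation frames (assumed speech-free) are all assumptions made for illustration. The sketch takes an acoustic STFT, treats the magnitude trajectory of each acoustic frequency bin as a signal, applies a second STFT to it, performs power spectral subtraction on the modulation magnitude, and resynthesises using the unmodified modulation and acoustic phases.

```python
import numpy as np

def frames(x, n, hop):
    """Split x (time on axis 0) into overlapping frames of length n."""
    count = 1 + (len(x) - n) // hop
    idx = hop * np.arange(count)[:, None] + np.arange(n)[None, :]
    return x[idx]  # shape (count, n) or (count, n, K)

def stft(x, n, hop):
    """Hann-windowed short-time Fourier transform along axis 0."""
    w = np.hanning(n + 1)[:-1].reshape((1, n) + (1,) * (x.ndim - 1))
    return np.fft.rfft(frames(x, n, hop) * w, axis=1)

def istft(X, n, hop, length):
    """Overlap-add synthesis, normalised by the summed analysis window."""
    w = np.hanning(n + 1)[:-1]
    f = np.fft.irfft(X, n=n, axis=1)
    out = np.zeros((length,) + f.shape[2:])
    norm = np.zeros(length)
    for i in range(f.shape[0]):
        out[i * hop:i * hop + n] += f[i]
        norm[i * hop:i * hop + n] += w
    norm = norm.reshape((length,) + (1,) * (out.ndim - 1))
    # Clamp the normaliser so poorly covered edge samples fade instead of blowing up.
    return out / np.maximum(norm, 0.1)

def modulation_spectral_subtraction(noisy, n_a=256, hop_a=64, n_m=32, hop_m=8,
                                    noise_frames=4, alpha=1.0, beta=0.01):
    """Illustrative modulation-domain spectral subtraction (assumed parameters)."""
    # Primary (acoustic) analysis: X[frame, acoustic bin].
    X = stft(noisy, n_a, hop_a)
    mag, phase = np.abs(X), np.angle(X)
    # Secondary (modulation) analysis of each bin's magnitude trajectory:
    # M[modulation frame, modulation bin, acoustic bin].
    M = stft(mag, n_m, hop_m)
    power = np.abs(M) ** 2
    # Assumption: the first `noise_frames` modulation frames contain noise only.
    noise = power[:noise_frames].mean(axis=0, keepdims=True)
    # Power subtraction with a spectral floor of beta times the noise estimate.
    clean_power = np.maximum(power - alpha * noise, beta * noise)
    M_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(M))
    # Modulation synthesis gives modified acoustic magnitudes (kept non-negative).
    mag_hat = np.maximum(istft(M_hat, n_m, hop_m, mag.shape[0]), 0.0)
    # Acoustic synthesis with the noisy acoustic phase.
    return istft(mag_hat * np.exp(1j * phase), n_a, hop_a, len(noisy))
```

Calling `modulation_spectral_subtraction(noisy)` on a 1-D array returns an array of the same length. In this sketch the modulation frame duration is set by `n_m` and `hop_a` together, so the parameter the abstract identifies as important can be varied by changing `n_m`.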

Note that for references made to the magnitude, phase or complex spectra throughout this text, the STFT modifier is implied unless otherwise stated. The acoustic and modulation modifiers are also included to disambiguate between acoustic and modulation domains.

Title: Modulation Processing for Speech Enhancement
Authors: Kuldip Paliwal, Belinda Schwerin
Publisher: Springer New York
Book: Speech and Audio Processing for Coding, Enhancement and Recognition
Print ISBN: 978-1-4939-1455-5
Electronic ISBN: 978-1-4939-1456-2
Copyright year: 2015
DOI: https://doi.org/10.1007/978-1-4939-1456-2_10
