2015 | OriginalPaper | Book chapter

10. Modulation Processing for Speech Enhancement

Authors: Kuldip Paliwal, Belinda Schwerin

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

Many traditional speech enhancement methods reduce noise in corrupted speech by processing the magnitude spectrum within a short-time Fourier analysis-modification-synthesis (AMS) framework. More recently, the use of the modulation domain for speech processing has been investigated; however, early efforts in this direction did not account for the changing properties of the modulation spectrum across time. Motivated by this, and by evidence of the significance of the modulation domain, we investigated processing the modulation spectrum on a short-time basis for speech enhancement. For this purpose, a modulation-domain AMS framework was used, in which the trajectory of each acoustic frequency bin was processed frame-wise in a secondary AMS framework. A number of enhancement algorithms were investigated in the short-time modulation domain, including spectral subtraction and minimum mean-square error (MMSE) magnitude estimation. In each case, the algorithm was used to modify the short-time modulation magnitude spectrum within the modulation AMS framework. Here we review the findings of this investigation, comparing the quality of stimuli enhanced using these modulation-based approaches with that of stimuli enhanced using the corresponding algorithms applied in the acoustic domain. The results show that the modulation-domain approaches yield improved quality compared with their acoustic-domain counterparts. Further, MMSE modulation magnitude estimation (MME) is shown to yield better speech quality than modulation spectral subtraction (ModSSub). MME stimuli exhibit good noise removal without the musical noise that is problematic in spectral-subtraction-based enhancement. The results also show that, for an appropriately selected modulation frame duration, ModSSub produces minimal musical noise compared with acoustic spectral subtraction.
For modulation-domain methods, modulation frame duration is shown to be an important parameter, with quality generally improved by shorter frame durations. From the experimental results, it is concluded that the short-time modulation domain provides an effective alternative to the short-time acoustic domain for speech processing and that, in this domain, MME provides effective noise suppression without introducing musical noise distortion.
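
The dual AMS idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the square-root Hann window, the frame lengths and hops, the spectral floor, and the use of a separate noise-only recording for the noise estimate are all assumptions made for illustration. An acoustic STFT is computed first, the magnitude trajectory of each acoustic frequency bin is then analysed with a second STFT, the modulation magnitude is modified by power spectral subtraction, and both stages are inverted while keeping the noisy modulation and acoustic phases.

```python
import numpy as np

def _window(frame_len):
    # Periodic square-root Hann window, used for both analysis and synthesis (assumed choice).
    return np.sqrt(np.hanning(frame_len + 1)[:-1])

def stft(x, frame_len, hop):
    """Analysis: zero-pad x to a whole number of frames, window, and FFT each frame."""
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(x)] = x
    frames = np.stack([padded[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * _window(frame_len), axis=1)

def istft(spec, frame_len, hop):
    """Synthesis: inverse FFT, window again, and normalised overlap-add."""
    win = _window(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * win
    out = np.zeros((spec.shape[0] - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
        norm[i * hop:i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)

def modulation_spectral_subtraction(noisy, noise_only, ac_len=256, ac_hop=64,
                                    mod_len=32, mod_hop=4, floor=0.002):
    """Hypothetical helper: spectral subtraction on short-time modulation magnitudes."""
    # Primary (acoustic) AMS stage: rows are acoustic frames, columns are frequency bins.
    ac_spec = stft(noisy, ac_len, ac_hop)
    ac_mag, ac_phase = np.abs(ac_spec), np.angle(ac_spec)
    noise_mag = np.abs(stft(noise_only, ac_len, ac_hop))
    n_frames = ac_mag.shape[0]
    enhanced_mag = np.empty_like(ac_mag)
    for k in range(ac_mag.shape[1]):
        # Secondary (modulation) AMS stage on the magnitude trajectory of bin k.
        mod_spec = stft(ac_mag[:, k], mod_len, mod_hop)
        # Noise modulation power, averaged over the noise-only trajectory (assumption).
        noise_pow = np.mean(np.abs(stft(noise_mag[:, k], mod_len, mod_hop)) ** 2, axis=0)
        # Power subtraction with a spectral floor; the noisy modulation phase is kept.
        clean_pow = np.maximum(np.abs(mod_spec) ** 2 - noise_pow, floor * noise_pow)
        mod_hat = np.sqrt(clean_pow) * np.exp(1j * np.angle(mod_spec))
        traj = istft(mod_hat, mod_len, mod_hop)[:n_frames]
        enhanced_mag[:, k] = np.maximum(traj, 0.0)  # magnitudes cannot be negative
    # Recombine with the noisy acoustic phase and resynthesise the waveform.
    out = istft(enhanced_mag * np.exp(1j * ac_phase), ac_len, ac_hop)
    return out[:len(noisy)]
```

Calling `modulation_spectral_subtraction(noisy, noise_only)` with two 1-D float arrays returns an array of the same length as `noisy`. In this sketch `mod_len` and `mod_hop` are counted in acoustic frames, so `mod_len * ac_hop` samples corresponds to the modulation frame duration that the abstract identifies as an important parameter. Because the window's first sample is zero, the very first sample of each reconstructed sequence is not recovered.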

Footnotes
1
Note that for references made to the magnitude, phase or complex spectra throughout this text, the STFT modifier is implied unless otherwise stated. The acoustic and modulation modifiers are also included to disambiguate between acoustic and modulation domains.
 
Metadata
Title
Modulation Processing for Speech Enhancement
Authors
Kuldip Paliwal
Belinda Schwerin
Copyright year
2015
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4939-1456-2_10
