
2018 | OriginalPaper | Chapter

Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks

Authors: Feng Cheng, Xiaochen Wang, Li Gang, Weiping Tu, Jinshan Wang

Published in: Advances in Multimedia Information Processing – PCM 2017

Publisher: Springer International Publishing


Abstract

Speech intelligibility is a significant factor in successful speech communication. Many methods have been proposed to enhance intelligibility, mainly by manipulating the speech signal, for example by increasing its amplitude or modifying its spectrum. However, their effect is limited when the background noise is extremely strong. In this paper, we propose a preprocessed noise cancellation model that enhances speech intelligibility by predicting a cancelling signal and superimposing it onto the speech signal. We build a deep neural network (DNN) model to improve the accuracy of the prediction. The effectiveness of the algorithm was verified by objective and subjective tests: across a variety of test cases, the average signal-to-noise ratio (SNR) improved by 4.5 dB, the average speech intelligibility index (SII) increased by 5.4%, and the average comparison mean opinion score (CMOS) rose by 1.16.
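The principle of superimposing a cancelling signal can be illustrated with a toy example. The sketch below is not the authors' DNN model: the sinusoidal "speech" and "noise", the sample rate, and the fixed 0.8 scaling that stands in for an imperfect predictor are all assumptions chosen for illustration. It only shows how an anti-phase estimate of the noise, added to the speech before playback, raises the SNR at the listener.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(signal power / noise power)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def superimpose(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


# Synthetic stand-ins (assumptions, not data from the paper).
fs = 8000   # sample rate in Hz
n = 800     # 0.1 s of samples, a whole number of periods for both tones
speech = [0.1 * math.sin(2 * math.pi * 300 * t / fs) for t in range(n)]
noise = [1.0 * math.sin(2 * math.pi * 100 * t / fs) for t in range(n)]

# Hypothetical predictor output: the noise in anti-phase, with a 20%
# amplitude error. In the paper this estimate would come from the DNN.
cancelling = [-0.8 * x for x in noise]

# The preprocessed signal that is played back: speech plus cancelling signal.
playback = superimpose(speech, cancelling)

# At the listener, the ambient noise adds to the playback, so the noise that
# remains is the original noise plus the cancelling signal.
heard = superimpose(playback, noise)
residual_noise = superimpose(noise, cancelling)

snr_before = snr_db(speech, noise)
snr_after = snr_db(speech, residual_noise)
print(f"SNR before: {snr_before:.2f} dB, after: {snr_after:.2f} dB")
```

In this toy setup the gain depends only on how closely the cancelling signal matches the noise: a residual of 20% of the noise amplitude corresponds to an improvement of about 14 dB. Real mechanical noise is not a pure tone, which is presumably why a learned predictor is needed and why the reported average improvement is smaller.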


Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-77383-4_69