2018 | OriginalPaper | Book chapter

Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks

Authors: Feng Cheng, Xiaochen Wang, Li Gang, Weiping Tu, Jinshan Wang

Published in: Advances in Multimedia Information Processing – PCM 2017

Publisher: Springer International Publishing

Abstract

Speech intelligibility is a key factor in successful speech communication. Many methods have been proposed to enhance it, mainly by manipulating the speech signal itself, for example by increasing its amplitude or modifying its spectrum. However, their effect is limited when the background noise is extremely strong. In this paper, we propose a preprocessed noise cancellation model that enhances speech intelligibility by predicting a cancelling signal and superimposing it onto the speech signal. A deep neural network (DNN) model is built to improve the accuracy of the prediction. The effectiveness of the algorithm was verified by objective and subjective tests on a variety of test cases: on average, the signal-to-noise ratio (SNR) improved by 4.5 dB, the speech intelligibility index (SII) increased by 5.4%, and the comparison mean opinion score (CMOS) rose by 1.16.
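
The core idea of the model, predicting the noise and superimposing its negation onto the speech before playback, can be illustrated with a minimal sketch. Everything here is an assumption, not the paper's method: the DNN predictor is swapped for a normalized LMS (NLMS) one-step linear predictor in the spirit of classic adaptive noise cancelling; the predictor is assumed to observe past noise samples at the listener's position (for example via a reference microphone); the "mechanical noise" is a synthetic hum with 250 Hz and 500 Hz components plus weak broadband noise; the "speech" is an amplitude-modulated tone; the acoustic path from playback to the listener is treated as identity; and SNR is measured against the clean speech. The function names `mechanical_noise`, `toy_speech`, `nlms_predict` and `snr_db` are hypothetical.

```python
import math
import random

FS = 8000  # assumed sample rate in Hz
N = 8000   # one second of audio


def mechanical_noise(n, fs=FS, seed=0):
    """Hypothetical mechanical noise: 250 Hz and 500 Hz harmonics plus weak white noise."""
    rng = random.Random(seed)
    return [0.8 * math.sin(2 * math.pi * 250 * t / fs)
            + 0.5 * math.sin(2 * math.pi * 500 * t / fs + 0.3)
            + 0.05 * rng.gauss(0.0, 1.0)
            for t in range(n)]


def toy_speech(n, fs=FS):
    """Hypothetical speech surrogate: a 700 Hz tone with a 3 Hz amplitude envelope."""
    return [0.3 * (0.5 + 0.5 * math.sin(2 * math.pi * 3 * t / fs))
            * math.sin(2 * math.pi * 700 * t / fs)
            for t in range(n)]


def nlms_predict(noise, order=32, mu=0.5, eps=1e-6):
    """One-step-ahead NLMS prediction of noise[t] from the previous `order` samples.

    Stand-in for the paper's DNN predictor; the weights adapt online.
    """
    w = [0.0] * order
    pred = [0.0] * len(noise)  # the first `order` predictions stay 0
    for t in range(order, len(noise)):
        past = noise[t - order:t][::-1]  # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, past))
        pred[t] = y
        err = noise[t] - y
        norm = eps + sum(xi * xi for xi in past)
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, past)]
    return pred


def snr_db(signal, noise):
    """SNR in dB of `signal` relative to `noise` (ratio of total energies)."""
    return 10 * math.log10(sum(x * x for x in signal) / sum(x * x for x in noise))


s = toy_speech(N)
noise = mechanical_noise(N)
n_hat = nlms_predict(noise)

# Preprocessing: superimpose the cancelling signal (-n_hat) onto the speech.
played = [si - nh for si, nh in zip(s, n_hat)]
# At the listener the played signal adds to the noise (identity path assumed),
# so the remaining disturbance is noise - n_hat.
received = [p + ni for p, ni in zip(played, noise)]
residual = [r - si for r, si in zip(received, s)]

before = snr_db(s, noise)
after = snr_db(s, residual)
print(f"SNR before: {before:.1f} dB, after: {after:.1f} dB")
```

In this toy setting the linear predictor only cancels the predictable, periodic part of the noise; the broadband component remains in the residual.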

Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-77383-4_69
