Published in: International Journal of Speech Technology 1/2019

23-02-2019

Low SNR speech enhancement with DNN based phase estimation

Authors: Samba Raju Chiluveru, Manoj Tripathy

Abstract

In low signal-to-noise ratio (SNR) environments, phase information is an important factor, and this article therefore considers the importance of clean phase in single-channel speech enhancement. The proposed method uses a Deep Neural Network (DNN) based regression model to estimate the clean phase and clean amplitude for speech reconstruction. Experiments are conducted on five noise types (factory, restaurant, car, airport and babble) at different levels, and the results are evaluated using objective quality measures: Perceptual Evaluation of Speech Quality, Weighted Spectral Slope, Cepstrum Distance, frequency-weighted segmental Signal-to-Noise Ratio and Log Likelihood Ratio. The overall speech quality improved by \(12\%\) for factory noise, \(8\%\) for restaurant noise, \(13\%\) for car noise, \(10\%\) for airport noise and \(14\%\) for babble noise.
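The role of phase in reconstruction can be illustrated with a small sketch. It is not the authors' pipeline and contains no DNN: it assumes a toy two-tone signal in place of speech, white Gaussian noise in place of the recorded noises, and a single FFT frame in place of a full short-time analysis. All function and variable names are hypothetical. The sketch mixes noise at a chosen SNR, then rebuilds the frame from the clean amplitude paired with either the clean phase or the noisy phase.

```python
import numpy as np


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (target_snr_db / 10)))
    return clean + scale * noise


def reconstruct(amplitude, phase, n):
    """Rebuild a real frame of length n from one-sided amplitude and phase."""
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)


def snr_db(reference, estimate):
    """SNR of `estimate` measured against `reference`, in dB."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


# Toy stand-ins (assumptions): a two-tone "speech" frame and white noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n) / 16000.0
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
noise = rng.standard_normal(n)
noisy = mix_at_snr(clean, noise, target_snr_db=-5.0)

clean_spec = np.fft.rfft(clean)
noisy_spec = np.fft.rfft(noisy)

# Same clean amplitude in both cases; only the phase differs.
with_clean_phase = reconstruct(np.abs(clean_spec), np.angle(clean_spec), n)
with_noisy_phase = reconstruct(np.abs(clean_spec), np.angle(noisy_spec), n)

print("input SNR (dB):", snr_db(clean, noisy))
print("clean amplitude + noisy phase, SNR (dB):", snr_db(clean, with_noisy_phase))
print("clean amplitude + clean phase, max abs error:",
      np.max(np.abs(clean - with_clean_phase)))
```

With the clean phase the frame is recovered up to floating-point error, while pairing the same clean amplitude with the noisy phase leaves a residual error. This is the gap that estimating the phase, rather than reusing the noisy one, aims to close.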


Literature
Benesty, J., Makino, S., & Chen, J. (2005). Speech enhancement, signals and communication technology. Berlin: Springer.
Berouti, M., Schwartz, R., & Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In: Acoustics, Speech, and Signal Processing, International Conference on ICASSP, IEEE (Vol. 4, pp. 208–211).
Bouguelia, M. R., Nowaczyk, S., Santosh, K., & Verikas, A. (2018). Agreeing to disagree: Active learning with noisy labels without crowdsourcing. International Journal of Machine Learning and Cybernetics, 9(8), 1307–1319.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Chazan, D., Hoory, R., Cohen, G., & Zibulski, M. (2000). Speech reconstruction from mel frequency cepstral coefficients and pitch frequency. In 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Vol. 3, pp. 1299–1302). IEEE. https://doi.org/10.1109/ICASSP.2000.861816.
Deng, L. (2012). Three classes of deep learning architectures and their applications: a tutorial survey. APSIPA Transactions on Signal and Information Processing, 1, 60–88.
Garofolo, J. S., et al. (1988). Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database. National Institute of Standards and Technology (NIST), Gaithersburg, MD 107.
Hansen, J. H., & Pellom, B. L. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In Fifth international conference on spoken language processing. Sydney, Australia.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026–1034).
Kamath, S., & Loizou, P. (2002). A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In ICASSP, Citeseer (Vol. 4, pp. 44164–44164).
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning (pp. 609–616). ACM. https://doi.org/10.1145/1553374.1553453.
Loizou, P. C. (2013). Speech enhancement: Theory and practice. Boca Raton: CRC Press.
Mukherjee, H., Obaidullah, S. M., Santosh, K., Phadikar, S., & Roy, K. (2018). Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal. International Journal of Speech Technology, 21, 753–760.
Pearce, D., & Hirsch, H. G. (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ISCA ITRW ASR2000 (pp. 29–32). Paris, France.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Vol. 2, pp. 749–752). IEEE. https://doi.org/10.1109/ICASSP.2001.941023.
Metadata
Title
Low SNR speech enhancement with DNN based phase estimation
Authors
Samba Raju Chiluveru
Manoj Tripathy
Publication date
23-02-2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09603-y
