Top

International Journal of Speech Technology

Published in:

23-02-2019

Low SNR speech enhancement with DNN based phase estimation

Authors: Samba Raju Chiluveru, Manoj Tripathy

Published in: International Journal of Speech Technology | Issue 1/2019

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

In low Signal-to-Noise Ratio environment phase information is one of the important factor and therefore this article consider the importance of clean phase in single channel speech enhancement technique. The proposed method uses Deep Neural Network based regression model to estimate clean phase and clean amplitude for speech reconstruction. Experiments are conducted over five different noises such as factory, restaurant, car, airport and babble at different levels and result are evaluated using objective quality measures like Perceptual Evaluation of Speech Quality, Weighted Spectral Slope, Cepstrum Distance, frequency weighted segmented Signal-to-Noise Ratio and Log Likelihood Ratio. The overall quality of speech improved for factory noise by \(12\%\), restaurant noise by \(8\%\), car noise by \(13\%\), airport noise by \(10\%\) and babble noise by \(14\%\) respectively.

previous article Replay spoofing countermeasures using high spectro-temporal resolution features

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Benesty, J., Makino, S., & Chen, J. (2005). Speech enhancement, signals and communication technology. Berlin: Springer.

Bengio, Y., et al. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. https://doi.org/10.1561/2200000006.MathSciNetCrossRefMATH

Berouti, M., Schwartz, R., & Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In: Acoustics, Speech, and Signal Processing, International Conference on ICASSP, IEEE (Vol. 4, pp. 208–211).

Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010 (pp. 177–186). Springer.https://doi.org/10.1007/978-3-7908-2604-316.

Bouguelia, M. R., Nowaczyk, S., Santosh, K., & Verikas, A. (2018). Agreeing to disagree: Active learning with noisy labels without crowdsourcing. International Journal of Machine Learning and Cybernetics, 9(8), 1307–1319.CrossRef

Bouzid, A., Ellouze, N., et al. (2016). Speech enhancement based on wavelet packet of an improved principal component analysis. Computer Speech & Language, Elsevier, 35, 58–72. https://doi.org/10.1016/j.csl.2015.06.001.CrossRef

Brainard, D. H., & Vision, S. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.CrossRef

Chazan, D., Hoory, R., Cohen, G., & Zibulski, M. (2000). Speech reconstruction from mel frequency cepstral coefficients and pitch frequency. In 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Vol. 3, pp. 1299–1302). IEEE. https://doi.org/10.1109/ICASSP.2000.861816.

Cohen, I. (2003). Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging. IEEE Transactions on Speech and Audio Processing, 11(5), 466–475. https://doi.org/10.1109/TSA.2003.811544.CrossRef

Dahl, G. E., Yu, D., Deng, L., & Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 30–42. https://doi.org/10.1109/TASL.2011.2134090.CrossRef

Deng, L. (2012). Three classes of deep learning architectures and their applications: a tutorial survey. APSIPA Transactions on Signal and Information Processing, 1, 60–88.

Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121. https://doi.org/10.1109/TASSP.1984.1164453.CrossRef

Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445. https://doi.org/10.1109/TASSP.1985.1164550.CrossRef

Garofolo, J. S., et al. (1988). Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database. National Institute of Standards and Technology (NIST), Gaithersburgh, MD 107.

Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6645–6649). IEEE. https://doi.org/10.1109/ICASSP.2013.6638947.

Hansen, J .H., & Pellom, B. L. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In Fifth international conference on spoken language processing. Sydney, Australia.

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026–1034).

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647.MathSciNetCrossRefMATH

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.CrossRef

Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238. https://doi.org/10.1109/TASL.2007.911054.CrossRef

Kamath, S., & Loizou, P. (2002). A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In ICASSP, Citeseer (Vol. 4, pp. 44164–44164).

Klatt, D. (1982). Prediction of perceived phonetic distance from critical-band spectra: A first step. In Acoustics, speech, and signal processing, IEEE international conference on ICASSP’82 (Vol. 7, pp. 1278–1281). IEEE. https://doi.org/10.1109/ICASSP.1982.1171512.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539.CrossRef

Lee, H., Grosse, R., Ranganath, R., Ng, A.Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning (pp. 609–616). ACM. https://doi.org/10.1145/1553374.1553453.

Loizou, P. C. (2013). Speech enhancement: Theory and practice. Boca Raton: CRC Press.CrossRef

Loizou, P. C., & Kim, G. (2011). Reasons why current speech-enhancement algorithms do not improve speech intelligibility and suggested solutions. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 47–56. https://doi.org/10.1109/TASL.2010.2045180.CrossRef

Mukherjee, H., Obaidullah, S. M., Santosh, K., Phadikar, S., & Roy, K. (2018). Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal. International Journal of Speech Technology, 21, 753–760.CrossRef

Pearce, D., & Hirsch H. G. (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ISCA ITRW ASR2000(pp. 29–32). Paris, France.

Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41. https://doi.org/10.1006/dspr.1999.0361.CrossRef

Rix, A.W., Beerends, J.G., Hollier, M.P., & Hekstra, A.P. (2001). Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings. (Vol. 2, pp. 749–752). IEEE. https://doi.org/10.1109/ICASSP.2001.941023.

Samui, S., Chakrabarti, I., & Ghosh, S. K. (2016). Improved single channel phase-aware speech enhancement technique for low signal-to-noise ratio signal. IET Signal Processing, 10(6), 641–650. https://doi.org/10.1049/iet-spr.2015.0182.CrossRef

Scalart, P., et al. (1996). Speech enhancement based on a priori signal to noise estimation. In Acoustics, speech, and signal processing, 1996. ICASSP-96. Conference proceedings (Vol. 2, pp. 629–632). IEEE. https://doi.org/10.1109/ICASSP.1996.543199.

Steeneken, H. J. M., & Houtgast, T. (1980). A physical method for measuring speech-transmission quality. The Journal of the Acoustical Society of America, 67(1), 318–326. https://doi.org/10.1121/1.384464.CrossRef

Surendran, S., & Kumar, T. K. (2015). Perceptual subspace speech enhancement with variance normalization. Procedia Computer Science, 54, 818–828. https://doi.org/10.1016/j.procs.2015.06.096.CrossRef

Vary, P., & Eurasip, M. (1985). Noise suppression by spectral magnitude estimation mechanism and theoretical limits. Signal Processing, 8(4), 387–400. https://doi.org/10.1016/0165-1684(85)90002-7.CrossRef

Xu, Y., Du, J., Dai, L. R., & Lee, C. H. (2015). A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(1), 7–19. https://doi.org/10.1109/TASLP.2014.2364452.CrossRef

Title: Low SNR speech enhancement with DNN based phase estimation
Authors: Samba Raju Chiluveru
Manoj Tripathy
Publication date: 23-02-2019
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-019-09603-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 1/2019

Noise reduction in speech signals using adaptive independent component analysis (ICA) for hands free communication devices

Long short-term memory recurrent neural network architectures for Urdu acoustic modeling

Voice signal processing for detecting possible early signs of Parkinson’s disease in patients with rapid eye movement sleep behavior disorder

Automatic detection of consonant omission in cleft palate speech

Optimal prosodic feature extraction and classification in parametric excitation source information for Indian language identification using neural network based Q-learning algorithm

A comparative study of deep neural network based Punjabi-ASR system