Published in: International Journal of Speech Technology 4/2019

10.10.2019

A review of supervised learning algorithms for single channel speech enhancement

Authors: Nasir Saleem, Muhammad Irfan Khattak


Abstract

Reducing interfering noise in a noisy speech recording has long been a difficult task in many voice-related applications. From hands-free communication to human–machine interaction, the speech signal of interest captured by a microphone is invariably mixed with interfering noise. The interfering noise adds new frequency components and masks a large portion of the time-varying spectrum of the desired speech. This significantly degrades our perception of the desired speech when listening to the noisy observations. It is therefore highly desirable, and sometimes crucial, to clean up noisy speech signals. This clean-up process is referred to as speech enhancement (SE). SE aims to improve the intelligibility and quality of speech for communication. We present a comprehensive review of supervised single-channel speech enhancement (SCSE) algorithms. First, a classification-based overview of supervised SCSE algorithms is provided and the related work is outlined. The recent literature on SCSE algorithms from a supervised perspective is then reviewed. Finally, some open problems that need further research are identified.
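
To make the supervised SCSE setting concrete, the following minimal Python sketch illustrates one widely used formulation: mask-based enhancement in the short-time Fourier transform (STFT) domain, where an ideal ratio mask (IRM) computed from parallel clean and noise signals serves as the supervised training target and the mask is applied to the noisy spectrogram at inference time. This is an illustrative example under an additive-noise assumption, not the specific algorithm of any paper reviewed here; the signal names, sampling rate, and frame size are placeholders.

```python
# Illustrative sketch (not a specific reviewed algorithm): mask-based
# supervised single-channel speech enhancement in the STFT domain.
# Assumes additive noise; the ideal ratio mask (IRM) is the training target.
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(clean, noise, fs=16000, nperseg=512):
    """Supervised target: IRM computed from parallel clean/noise signals."""
    _, _, S = stft(clean, fs=fs, nperseg=nperseg)   # clean spectrogram
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)   # noise spectrogram
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

def apply_mask(noisy, mask, fs=16000, nperseg=512):
    """Enhancement: multiply the (estimated) mask onto the noisy STFT."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    _, enhanced = istft(mask * Y, fs=fs, nperseg=nperseg)
    return enhanced

# Toy usage with synthetic signals.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone
noise = 0.5 * rng.standard_normal(16000)                    # white noise
noisy = clean + noise
irm = ideal_ratio_mask(clean, noise)    # oracle mask, used here for illustration
enhanced = apply_mask(noisy, irm)
```

In a full supervised system, the oracle mask above would be replaced by the output of a trained learner (e.g., a DNN or SVM) that predicts the mask from acoustic features of the noisy speech; the clean/noise pair is only available during training.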


Metadata
Title
A review of supervised learning algorithms for single channel speech enhancement
Authors
Nasir Saleem
Muhammad Irfan Khattak
Publication date
10.10.2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09645-2
