Published in: International Journal of Speech Technology 4/2019

10.10.2019

A review of supervised learning algorithms for single channel speech enhancement

Authors: Nasir Saleem, Muhammad Irfan Khattak


Abstract

Reducing interfering noise in a noisy speech recording has long been a difficult task in many voice-related applications. From hands-free communication to human–machine interaction, the speech signal of interest captured by a microphone is invariably mixed with interfering noise. The interfering noise adds new frequency components and masks a large portion of the time-varying spectrum of the desired speech. This significantly degrades our perception of the desired speech when listening to the noisy observations. It is therefore highly desirable, and sometimes crucial, to clean up noisy speech signals. This clean-up process is referred to as speech enhancement (SE). SE aims to improve the intelligibility and quality of speech for communication. We present a comprehensive review of supervised single-channel speech enhancement (SCSE) algorithms. First, a classification-based overview of supervised SCSE algorithms is provided and the related work is outlined. The recent literature on SCSE algorithms from a supervised perspective is then reviewed. Finally, some open problems that need further research are identified.
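
To make the supervised SCSE setting concrete, the following minimal Python sketch illustrates one widely used formulation: mask-based enhancement in the short-time Fourier transform (STFT) domain, where an ideal ratio mask (IRM) computed from parallel clean and noise signals serves as the supervised training target and the mask is applied to the noisy spectrogram at inference time. This is an illustrative example under an additive-noise assumption, not the specific algorithm of any paper reviewed here; the signal names, sampling rate, and frame size are placeholders.

```python
# Illustrative sketch (not a specific reviewed algorithm): mask-based
# supervised single-channel speech enhancement in the STFT domain.
# Assumes additive noise; the ideal ratio mask (IRM) is the training target.
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(clean, noise, fs=16000, nperseg=512):
    """Supervised target: IRM computed from parallel clean/noise signals."""
    _, _, S = stft(clean, fs=fs, nperseg=nperseg)   # clean spectrogram
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)   # noise spectrogram
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

def apply_mask(noisy, mask, fs=16000, nperseg=512):
    """Enhancement: multiply the (estimated) mask onto the noisy STFT."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    _, enhanced = istft(mask * Y, fs=fs, nperseg=nperseg)
    return enhanced

# Toy usage with synthetic signals.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone
noise = 0.5 * rng.standard_normal(16000)                    # white noise
noisy = clean + noise
irm = ideal_ratio_mask(clean, noise)    # oracle mask, used here for illustration
enhanced = apply_mask(noisy, irm)
```

In a full supervised system, the oracle mask above would be replaced by the output of a trained learner (e.g., a DNN or SVM) that predicts the mask from acoustic features of the noisy speech; the clean/noise pair is only available during training.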


Metadata
Title
A review of supervised learning algorithms for single channel speech enhancement
Authors
Nasir Saleem
Muhammad Irfan Khattak
Publication date
10.10.2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09645-2
