In this paper, we propose a deep learning-based speech enhancement (DLSE) method to improve speech intelligibility for hearing-impaired listeners. The algorithm decomposes the noisy speech signal into frames, which serve as features for deep convolutional neural networks (DCNNs) that estimate which frequency channels carry the most perceptually important information (i.e., have the highest signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated cochlear implant (CI) channels and retain speech-dominated channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The algorithm was evaluated by measuring the speech-in-noise performance of 12 CI users with two types of background noise: fan noise and music. The architecture and low processing delay of the DLSE algorithm make it suitable for use in hearing devices. Although DLSE was evaluated with a noise-specific approach, several aspects of generalisation to unseen acoustic conditions were addressed, most importantly performance with a speaker not used during training. The proposed DCNN-based method yielded the largest improvements in both speech intelligibility and quality, and appeared more promising than existing methods.
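The channel-selection step described above can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the authors' implementation: the function name, the unit envelopes, and the idea that the DCNN's per-channel SNR estimates arrive as a plain list are all hypothetical; only the n-of-m principle (keep the m highest-SNR channels, attenuate the rest) comes from the abstract.

```python
def n_of_m_select(channel_envelopes, snr_estimates, m):
    """Retain the m channels with the highest estimated SNR, as in an
    n-of-m CI coding strategy; zero out noise-dominated channels."""
    # Rank channel indices by estimated SNR, best first
    order = sorted(range(len(snr_estimates)),
                   key=lambda i: snr_estimates[i], reverse=True)
    keep = set(order[:m])
    # Speech-dominated channels pass through; the rest are attenuated to zero
    return [env if i in keep else 0.0
            for i, env in enumerate(channel_envelopes)]

# Example: 8 channels with unit envelopes; keep the 4 highest-SNR channels
env = [1.0] * 8
snr = [5.0, -2.0, 8.0, 0.5, -6.0, 3.0, 1.0, -1.0]  # hypothetical DCNN outputs (dB)
out = n_of_m_select(env, snr, m=4)
# Channels 0, 2, 5, 6 are retained; channels 1, 3, 4, 7 are zeroed
```

In a real strategy the selection would run per analysis frame and the attenuation could be graded rather than binary; the hard zeroing here only mirrors the classical n-of-m idea.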
Deep convolutional neural network-based speech enhancement to improve speech intelligibility and quality for hearing-impaired listeners
P. F. Khaleelur Rahiman
V. S. Jayanthi
A. N. Jayanthi
Springer Berlin Heidelberg
Medical & Biological Engineering & Computing
Print ISSN: 0140-0118
Electronic ISSN: 1741-0444