In this paper, the role of a speech recognition system in the assessment of dysarthric speech is studied using a method called the Elman back propagation network (EBN). Dysarthria is a neurological disability that impairs control of the motor speech articulators. Persons who suffer from dysarthria may have speech intelligibility ranging from low (2 %) to high (95 %). EBN is a recurrent network: a fully connected neural network is built such that speech characteristics are represented simultaneously by neuron activation states, and it uses an efficient self-supervised training algorithm. For parametric representation of the speech signal, glottal features are used along with mel-frequency cepstral coefficients (MFCCs). The outputs obtained with the two feature sets are then compared after evaluation using different neural networks and modeling methods. The proposed method is evaluated on a subset of the Universal Access Research database, consisting of 9 of its 19 dysarthric speakers, each uttering 100 words repeated 3 times. The promising performance of the proposed system suggests it can help those who work with persons with voice disorders.
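The core idea of the Elman architecture described above — a context (hidden-state) layer that feeds back into itself so that speech characteristics are represented by neuron activation states over time — can be sketched in a few lines of NumPy. This is a minimal illustrative forward pass only, not the paper's implementation; the layer sizes, random weights, and the choice of 13 MFCCs per frame are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 13   # e.g. 13 MFCCs per speech frame (assumed)
n_hidden = 32     # hidden/context units (assumed, not from the paper)
n_classes = 100   # one class per word in the 100-word vocabulary

# Randomly initialized weights, purely for illustration.
W_xh = rng.normal(0, 0.1, (n_hidden, n_features))  # input  -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))    # context -> hidden (Elman recurrence)
W_hy = rng.normal(0, 0.1, (n_classes, n_hidden))   # hidden -> output

def elman_forward(frames):
    """Run a sequence of feature frames through the Elman network.

    frames: array of shape (T, n_features), one row per speech frame.
    Returns a probability distribution over the word classes for the
    final time step.
    """
    h = np.zeros(n_hidden)                  # context layer starts at zero
    for x in frames:
        # New hidden state depends on the current input AND the previous
        # hidden state copied back through the context layer.
        h = np.tanh(W_xh @ x + W_hh @ h)
    scores = W_hy @ h
    e = np.exp(scores - scores.max())       # softmax over word classes
    return e / e.sum()

# 50 dummy frames stand in for a real MFCC/glottal feature sequence.
probs = elman_forward(rng.normal(size=(50, n_features)))
print(probs.shape)
```

In a full system, `frames` would hold the MFCC (or glottal) features extracted from an utterance, and the weights would be learned by backpropagation through the recurrence rather than drawn at random.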
Assessment of dysarthric speech using Elman back propagation network (recurrent network) for speech recognition
S. Selva Nidhyananthan
R. Shantha Selva Kumari
Springer US