Neural Computing and Applications

28 April 2018 | Original Article

Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling

Authors: Mohit Dua, R. K. Aggarwal, Mantosh Biswas

Published in: Neural Computing and Applications | Issue 10/2019

Abstract

This paper implements and evaluates a discriminatively trained continuous speech recognition system for Hindi. The automatic speech recognition (ASR) system is trained with two discriminative techniques, maximum mutual information (MMI) and minimum phone error (MPE), using various numbers of Gaussian mixtures. The training dataset consists of transcribed Hindi speech. The experiments show a significant performance gain over a maximum likelihood-based Hindi speech recognition system. The system uses efficient recurrent neural network (RNN)-based language modeling, and the results indicate that this enhances the performance of the ASR system. Further, interpolating an n-gram language model (LM) with the RNN language model (RNNLM) yields an additional increase in recognition performance. The proposed system also incorporates speaker adaptation using the maximum likelihood linear regression (MLLR) technique. The paper also gives an overview of the techniques used for discriminative training, along with the practical issues involved in their implementation.
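
As an illustration of the interpolation step mentioned in the abstract, the following is a minimal sketch of linear interpolation, a common way to combine an n-gram LM with an RNNLM. The abstract does not give the formula or the weight the authors used, so the function names, the weight `lam`, and the toy probabilities are all assumptions made for this example.

```python
import math


def interpolate(p_ngram, p_rnn, lam=0.5):
    """Linearly interpolate two LM probabilities for the same word.

    lam is the (assumed) weight given to the RNNLM; 1 - lam goes to the n-gram LM.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_rnn + (1.0 - lam) * p_ngram


def sentence_log_prob(ngram_probs, rnn_probs, lam=0.5):
    """Log probability of a sentence under the interpolated model.

    Each list holds one per-word probability P(w_i | history) from one model.
    """
    if len(ngram_probs) != len(rnn_probs):
        raise ValueError("both models must score the same number of words")
    return sum(
        math.log(interpolate(pn, pr, lam))
        for pn, pr in zip(ngram_probs, rnn_probs)
    )


def perplexity(ngram_probs, rnn_probs, lam=0.5):
    """Per-word perplexity of the interpolated model on one sentence."""
    if not ngram_probs:
        raise ValueError("cannot compute perplexity of an empty sentence")
    log_p = sentence_log_prob(ngram_probs, rnn_probs, lam)
    return math.exp(-log_p / len(ngram_probs))


if __name__ == "__main__":
    # Toy per-word probabilities for a three-word sentence (made-up values).
    ngram = [0.10, 0.05, 0.20]
    rnn = [0.30, 0.10, 0.25]
    for lam in (0.0, 0.5, 1.0):
        print(f"lam={lam:.1f}  perplexity={perplexity(ngram, rnn, lam):.3f}")
```

With `lam = 0` the score falls back to the n-gram model alone, and with `lam = 1` to the RNNLM alone. In practice the weight is typically tuned on held-out data.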



Title: Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling
Authors: Mohit Dua, R. K. Aggarwal, Mantosh Biswas
Publication date: 28 April 2018
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-018-3499-9

