
10.07.2019 | Research Article - Computer Engineering and Computer Science

Diacritics Effect on Arabic Speech Recognition

Authors: Sa’ed Abed, Mohammad Alshayeji, Sari Sultan

Published in: Arabian Journal for Science and Engineering | Issue 11/2019


Abstract

Arabic is the native language of over 300 million speakers and one of the official languages of the United Nations. It has a unique set of diacritics that can alter a word’s meaning. Arabic automatic speech recognition (ASR) has received little attention compared to other languages, and most prior work has ignored diacritics. Omitting diacritics limits the usability of an Arabic ASR system in applications such as voice-enabled translation, text-to-speech, and speech-to-speech systems. In this paper, we study the effect of diacritics on Arabic ASR systems by building and comparing diacritized and nondiacritized models for different corpus sizes. In particular, we build Arabic ASR models using state-of-the-art technologies on corpora of 1, 2, 5, 10, and 23 hours. Each model was trained once on a diacritized corpus and once on a nondiacritized version of the same corpus. Using the Kaldi toolkit and SRILM, eight models were built for each corpus: GMM-SI, GMM-SAT, GMM-MPE, GMM-MMI, SGMM, SGMM-bMMI, DNN, and DNN-MPE, for eighty models in total under this experimental setup. Our results show that word error rates (WERs) ranged from 4.68% to 42%. Adding diacritics increased the WER by 0.59% to 3.29%. Although diacritics increase WERs, we recommend including them when the ASR system is integrated with other systems such as voice-enabled translation, because the gain in overall accuracy of the integrated system (e.g., translation) outweighs the WER increase of the Arabic ASR component.
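
To illustrate how a paired corpus of this kind can be prepared, the nondiacritized transcripts can be derived from the diacritized ones by stripping the Arabic diacritical marks (harakat) before building the language model and lexicon. The snippet below is a minimal sketch, not the authors' exact preprocessing; it assumes UTF-8 transcripts and removes the combining marks in the Unicode range U+064B–U+0652 plus the superscript alef (U+0670).

```python
import re

# Combining Arabic diacritics (harakat): fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun (U+064B-U+0652), plus the
# superscript alef (U+0670). Base letters are left untouched.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    """Return the nondiacritized form of an Arabic string."""
    return _DIACRITICS.sub("", text)

if __name__ == "__main__":
    # Fully diacritized "kataba" (he wrote) reduces to its bare letters.
    print(strip_diacritics("كَتَبَ"))  # prints: كتب
```

Running the same acoustic- and language-model recipes once on the original transcripts and once on their stripped counterparts yields the paired diacritized/nondiacritized WER comparison described above.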


Metadata
Title
Diacritics Effect on Arabic Speech Recognition
Authors
Sa’ed Abed
Mohammad Alshayeji
Sari Sultan
Publication date
10.07.2019
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 11/2019
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-019-04024-0
