
10.07.2019 | Research Article - Computer Engineering and Computer Science

Diacritics Effect on Arabic Speech Recognition

Authors: Sa’ed Abed, Mohammad Alshayeji, Sari Sultan

Published in: Arabian Journal for Science and Engineering | Issue 11/2019


Abstract

Arabic is the native language of over 300 million speakers and one of the official languages of the United Nations. It has a unique set of diacritics that can alter a word’s meaning. Arabic automatic speech recognition (ASR) has received little attention compared to other languages, and most prior work has ignored diacritics. Omitting diacritics limits the usability of an Arabic ASR system in applications such as voice-enabled translation, text-to-speech, and speech-to-speech systems. In this paper, we study the effect of diacritics on Arabic ASR systems by building and comparing diacritized and nondiacritized models for different corpus sizes. In particular, we build Arabic ASR models using state-of-the-art technologies on corpora of 1, 2, 5, 10, and 23 hours. Each model was trained once on a diacritized corpus and once on a nondiacritized version of the same corpus. Using the Kaldi toolkit and SRILM, eight models were built for each corpus: GMM-SI, GMM-SAT, GMM-MPE, GMM-MMI, SGMM, SGMM-bMMI, DNN, and DNN-MPE, for eighty models in total under this experimental setup. Our results show that word error rates (WERs) ranged from 4.68% to 42%. Adding diacritics increased the WER by 0.59% to 3.29%. Although diacritics increase WERs, we recommend including them when the ASR system is integrated with other systems such as voice-enabled translation, because the gain in overall accuracy of the integrated system (e.g., translation) outweighs the WER increase of the Arabic ASR component.
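
To illustrate how a paired corpus of this kind can be prepared, the nondiacritized transcripts can be derived from the diacritized ones by stripping the Arabic diacritical marks (harakat) before building the language model and lexicon. The snippet below is a minimal sketch, not the authors' exact preprocessing; it assumes UTF-8 transcripts and removes the combining marks in the Unicode range U+064B–U+0652 plus the superscript alef (U+0670).

```python
import re

# Combining Arabic diacritics (harakat): fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun (U+064B-U+0652), plus the
# superscript alef (U+0670). Base letters are left untouched.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    """Return the nondiacritized form of an Arabic string."""
    return _DIACRITICS.sub("", text)

if __name__ == "__main__":
    # Fully diacritized "kataba" (he wrote) reduces to its bare letters.
    print(strip_diacritics("كَتَبَ"))  # prints: كتب
```

Running the same acoustic- and language-model recipes once on the original transcripts and once on their stripped counterparts yields the paired diacritized/nondiacritized WER comparison described above.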


Metadata
Title
Diacritics Effect on Arabic Speech Recognition
Authors
Sa’ed Abed
Mohammad Alshayeji
Sari Sultan
Publication date
10.07.2019
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 11/2019
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-019-04024-0
