
2019 | OriginalPaper | Book chapter

Arabic Speech Recognition with Deep Learning: A Review

Authors: Wajdan Algihab, Noura Alawwad, Anfal Aldawish, Sarah AlHumoud

Published in: Social Computing and Social Media. Design, Human Behavior and Analytics

Publisher: Springer International Publishing

Abstract

Automatic speech recognition is the area of research concerned with enabling machines to accept vocal input from humans and to interpret it with the highest probability of correctness. There are several techniques for implementing speech recognition models; one emerging technique is the use of deep neural networks. Arabic is one of the most widely spoken languages, yet one of the least studied in terms of speech recognition. This paper serves as a brief review of the available studies on Arabic speech recognition. In addition, it sheds some light on the services and toolkits available for developing Arabic speech recognition systems.
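The "highest probability of correctness" in the definition above is commonly formalized as maximum a posteriori decoding: for acoustic observations O, choose the word sequence W that maximizes P(W|O), which is proportional to P(O|W) · P(W) because P(O) is the same for every candidate. The sketch below assumes an acoustic model and a language model have already scored a small list of candidate transcriptions; the function name `decode` and all probabilities are hypothetical and illustrative, not taken from the chapter.

```python
import math

def decode(candidates):
    """Return the transcription W maximizing log P(O|W) + log P(W).

    `candidates` maps each candidate transcription to a pair
    (p_acoustic, p_lm): P(O|W) from an acoustic model and P(W)
    from a language model. Values here are hypothetical.
    """
    def score(item):
        _, (p_acoustic, p_lm) = item
        # Sum log-probabilities instead of multiplying probabilities,
        # which avoids numerical underflow on long utterances.
        return math.log(p_acoustic) + math.log(p_lm)

    best_words, _ = max(candidates.items(), key=score)
    return best_words

# Hypothetical scores for three competing transcriptions of one utterance.
hypotheses = {
    "كتب الولد الدرس": (0.020, 0.30),
    "كتب الوالد الدرس": (0.025, 0.10),
    "كتب الولد الدرج": (0.030, 0.01),
}

print(decode(hypotheses))  # prints "كتب الولد الدرس"
```

In this toy case the language model overrides the acoustic model: the third candidate has the highest acoustic score, but its low prior P(W) drops it to last place.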

Metadata
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-21902-4_2