Published in: International Journal of Speech Technology 1/2022

08.08.2021

Automatic diacritization of Tunisian dialect text using SMT model

Authors: Abir Masmoudi, Chafik Aloulou, Abdel Ghader Sidi Abdellahi, Lamia Hadrich Belguith


Abstract

Unlike many other languages, Arabic is written in an essentially consonantal script in which short vowels are usually omitted. Short vowels (diacritics) play a major role in determining the meaning of words and sentences, yet Modern Standard Arabic (MSA) texts are generally written without them. This gives rise to considerable morphological, semantic, and syntactic ambiguity. The problem is not limited to MSA; it also affects Arabic dialects in general and the Tunisian Dialect (TD) in particular. Compared to MSA, TD suffers from a lack of basic tools and linguistic resources, such as sufficiently large corpora, multilingual dictionaries, and morphological and syntactic analyzers. The lack of these resources makes processing this language a great challenge (Masmoudi et al., 2020). Despite the numerous efforts currently underway, gaps persist in this field. We address this gap by investigating the automatic diacritization of TD texts. We treat diacritization as a simplified phrase-based Statistical Machine Translation (SMT) task in which the source language is the undiacritized text and the target language is the diacritized text. We first describe in detail the creation of the TD corpus. This corpus is then validated and used to build a diacritic restoration system for TD, called TDTACHKIL, which achieves a Word Error Rate (WER) of 16.7% and a Diacritic Error Rate (DER) of 8.89%.
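As an illustration of this framing, the following minimal Python sketch derives a source/target pair from diacritized text and scores a hypothesis against a reference. It assumes Buckwalter-transliterated input (see the footnote) with the standard Buckwalter diacritic symbols, and it uses simplified definitions of WER and DER (the share of words, or of letters, whose diacritics differ from the reference). The function names and example strings are hypothetical, and the article's exact scoring protocol may differ.

```python
# Standard Buckwalter symbols for Arabic diacritics:
# a/u/i = fatha/damma/kasra, o = sukun, ~ = shadda, F/N/K = tanween forms.
DIACRITICS = set("aiuo~FNK")


def strip_diacritics(text: str) -> str:
    """Remove diacritic symbols, giving the undiacritized (source) side."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def make_pair(diacritized: str) -> tuple[str, str]:
    """Build one (source, target) training pair from a diacritized sentence."""
    return strip_diacritics(diacritized), diacritized


def split_letters(word: str) -> list[tuple[str, str]]:
    """Split a word into (base letter, attached diacritics) pairs."""
    pairs: list[tuple[str, str]] = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # a diacritic with no preceding letter is ignored
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def wer_der(reference: str, hypothesis: str) -> tuple[float, float]:
    """Return (WER, DER) under the simplifying assumption that the
    hypothesis has the same words and base letters as the reference."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("sketch assumes equal, non-zero word counts")
    word_errors = letter_errors = letters = 0
    for ref, hyp in zip(ref_words, hyp_words):
        ref_pairs, hyp_pairs = split_letters(ref), split_letters(hyp)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("sketch assumes identical base letters")
        wrong = sum(r != h for r, h in zip(ref_pairs, hyp_pairs))
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += 1 if wrong else 0
    return word_errors / len(ref_words), letter_errors / letters


if __name__ == "__main__":
    source, target = make_pair("kataba Alwaladu")
    print(source, "->", target)
    print(wer_der("kataba Alwaladu", "kataba Alwalada"))
```

For the reference `kataba Alwaladu` and the hypothesis `kataba Alwalada`, one of two words and one of eight letters carry a wrong diacritic, so the sketch returns a WER of 0.5 and a DER of 0.125.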


Footnotes
1. Transcription is coded following Buckwalter. For more details, see Habash and Rambow (2007).
 
References
Abandah, G., Graves, A., Al-Shagoor, B., Arabiyat, A., Jamour, F., & Al-Taee, M. (2015a). Automatic diacritization of Arabic text using recurrent neural networks. International Journal on Document Analysis and Recognition (IJDAR).
Abandah, G., Graves, A., Arabiyat, B., Jamour, F., & Al-Taee, M. (2015b). Automatic diacritization of Arabic text using recurrent neural networks. IJDAR, volume 18, number 2, pp. 183–197.
Afli, H., Barrault, L., & Schwenk, H. (2016). OCR error correction using statistical machine translation. International Journal of Linguistics and Computational Applications, 7(1), 175–191.
Ahmed, A., & Elaraby, M. (2000). A large-scale computational processor of the Arabic morphology, and applications. PhD thesis, Faculty of Engineering, Cairo University, Giza, Egypt.
Al-Badrashiny, M., Hawwari, A., & Diab, M. (2017). A Layered Language Model based Hybrid Approach to Automatic Full Diacritization of Arabic. In Proceedings of The Third Arabic Natural Language Processing Workshop (WANLP).
Al-Taani, A., & Abu Al-Rub, S. (2009). A Rule-Based Approach for Tagging Non-Vocalized Arabic Words. The International Arab Journal of Information Technology.
Alghamdi, M., & Muzaffar, Z. (2007). KACST Arabic diacritizer. In Proceedings of the First International Symposium on Computers and Arabic Language, Riyadh, Saudi Arabia.
Alnefaiea, R., & Azmi, M. (2017). ACLing: Automatic minimal diacritization of Arabic texts.
Alotaibi, Y. A., Meftah, A. H., & Selouani, S. A. (2013). Diacritization. In Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE): Automatic Segmentation and Labeling for Levantine Arabic Speech.
Alqudah, S., Abandah, G., & Arabiyat, A. (2017). Investigating Hybrid Approaches for Arabic Text Diacritization with Recurrent Neural Networks. 2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies.
Ameur, M., Moulahoum, Y., & Guessoum, A. (2015). Restoration of Arabic Diacritics Using a Multilevel Statistical Model. In IFIP International Federation for Information Processing.
Ayman, A. Z., Elmahdy, M., Husni, H., & Al Jaam, J. (2016). Automatic diacritics restoration for Arabic text. International Journal of Computing and Information Science, December 2016.
Azmi, A., & Almajed, R. (2015). A survey of automatic Arabic diacritization techniques. Natural Language Engineering, 21, pp. 477–495.
Baccouche, T. (2003). L’arabe, d’une koinè dialectale à une langue de culture. Mémoires de la société linguistique de Paris, Tome XI (les langues de Communication...), 87–93.
Baccouche, T. (1994). L’emprunt en arabe moderne, Beit Elhikma et IBLV.
Belinkov, Y., & Glass, J. (2015). Arabic diacritization with recurrent neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
Bouamor, H., Zaghouani, W., Diab, M., Obeid, O., Kemal, O., Ghoneim, M., & Hawwari, A. (2015). A pilot study on Arabic multi-genre corpus diacritization annotation. In Proceedings of the Second Workshop on Arabic Natural Language Processing.
Boujelbane, R., Mallek, M., Ellouze, M., & Belguith, L. (2014). Fine-Grained (POS) Tagging of Spoken Tunisian Dialect Corpora. In International Conference on Applications of Natural Language to Information Systems, NLDB’2014.
Brown, P., Pietra, S., Pietra, V., & Mercer, R. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.
Darwish, K., Abdelali, A., Mubarak, H., Samih, Y., & Attia, M. (2018). Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
Darwish, K., Mubarak, H., & Abdelali, A. (2017). Arabic diacritization: Stats, rules, and hacks. Proceedings of the Third Arabic Natural Language Processing Workshop, 9–17.
Diab, M., Ghoneim, M., & Habash, N. (2007). Arabic Diacritization in the Context of Statistical Machine Translation. In Proceedings of MTSummit, Copenhagen, Denmark.
El Klibi, S., El Hamzaoui, S., Ben Abda, H., Kaddes, C., & El Horcheni, F. (2014). and Maalla. Tunisie: A. La constitution en dialecte tunisien. Association tunisienne de droit constitutionnel.
Elshafei, M., Al-muhtaseb, H., & Alghamdi, M. (2006). Statistical methods for automatic diacritization of Arabic text. In Proceedings of Saudi 18th National Computer Conference (NCC18).
Fashwan, A., & Alansary, S. (2017). SHAKKIL: an automatic diacritization system for modern standard Arabic texts. Proceedings of The Third Arabic Natural Language Processing Workshop (WANLP).
Gal, Y. (2002). An HMM approach to vowel restoration in Arabic and Hebrew. In Proceedings of the ACL’2002 Workshop on Computational Approaches to Semitic Languages, SEMITIC’02.
Gibson, M. L. (1998). Dialect Contact in Tunisian Arabic: Sociolinguistic and Structural Aspects. University of Reading.
Habash, N., Shahrour, A., & Al-Khalil, M. (2016).
Habash, N., Diab, M., & Rambow, O. (2012). Conventional Orthography for Dialectal Arabic. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’2012).
Habash, N., & Rambow, O. (2007). Arabic diacritization through full morphological tagging. The Conference of the North American Chapter of the Association for Computational Linguistics.
Hamed, O., & Zesch, T. (2017). A Survey and Comparative Study of Arabic Diacritization Tools. Journal of Language Technology and Computational Linguistics, volume 32, number 1.
Harrat, S., Abbas, M., Meftouh, K., Smaili, K., Bouzareah, E. N. S., & Loria, C. (2013). Diacritics restoration for Arabic dialect texts. 14th Annual Conference of the International Speech Communication.
Hermena, E., Drieghe, D., Hellmuth, S., & Simon P. (2015). Processing of Arabic Diacritical Marks: Phonological Syntactic Disambiguation of Homographic Verbs and Visual Crowding Effects. Journal of Experimental Psychology: Human Perception and Performance, 41, pp. 494–507.
Hifny, Y. (2012). Higher order n-gram language models for Arabic diacritics restoration. In Proceedings of the 12th Conference on Language Engineering (ESOLEC 12).
Holes, C. (2004). Modern Arabic: Structures, Functions, and Varieties, Georgetown. Ed. Washington.
Jarrar, M., Habash, N., Akra, D., Zalmout, N., & Bank, W. (2014). Building a Corpus for Palestinian Arabic: a Preliminary Study.
Khalifa, S., Habash, N., Eryani, F., Obeid, O., Abdulrahim, D., & Al Kaabi, M. (2018). A Morphologically Annotated Corpus of Emirati Arabic. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC’2018.
Kirchhoff, K., & Vergyri, D. (2005). Cross-Dialectal Data Sharing for Acoustic Modeling in Arabic Speech Recognition. Speech Communication, 46, pp. 37–51.
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., & Bertoldi, N. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. ACL’2007, demonstration session.
Kubra, A., & Eryigit, G. (2014). Vowel and Diacritic Restoration for Social Media Texts. 5th Workshop on Language Analysis for Social Media (LASM) at EACL’2014.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proc. ICML, 282–289.
Lawson, S., & Itesh, S. (1997). Accommodation communicative en Tunisie: une étude empirique. Plurilinguisme et identités au Maghreb, Publications de l’Université de Rouen, 01–114.
Maamouri, M., Bies, A., Buckwalter, T., & Mekki, W. (2004). The Penn Arabic treebank: Building a large-scale annotated Arabic corpus. In: NEMLAR Conf. Arabic Language Resources and Tools, pp. 102–109.
Maamouri, M., Bies, A., & Kulick, S. (2006). Diacritization: A Challenge to Arabic Treebank Annotation and Parsing. Proceedings of the British Computer Society Arabic NLP/MT Conference, 2006.
Maamouri, M., Bies, A., & Kulick, S. (2008). Enhancing the Arabic treebank: a collaborative effort toward new annotation guidelines. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008).
Maamouri, M., Bies, A., & Kulick, S. (2009). Creating a methodology for large-scale correction of treebank annotation: The case of the Arabic treebank. In Proceedings of MEDAR International Conference on Arabic Language Resources and Tools, Cairo, Egypt.
Masmoudi, A., Bougares, F., Khmekhem, M., Estève, Y., & Belguith, L. (2018). Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, 52(1), 249–267.
Masmoudi, A., Ellouze, M., & Belguith, L. (2019). Automatic diacritization of Tunisian dialect text using Recurrent Neural Network. RANLP, 2019, 730–739.
Masmoudi, A., Ellouze, M., Khrouf, M., & Belguith, L. (2020). Transliteration of Arabizi into Arabic Script for Tunisian Dialect. ACM Transactions on Asian and Low-Resource Language Information Processing, 19(2), 32:1–32:21.
Masmoudi, A., Khmekhem, M., Estève, Y., Bougares, F., & Belguith, L. (2014). Phonetic tool for the Tunisian Arabic. In the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages.
Masmoudi, A., Khmekhem, M., Estève, Y., Bougares, F., Belguith, L., & Habash, N. (2014). A corpus and a phonetic dictionary for Tunisian Arabic speech recognition. In 19th edition of the Language Resources and Evaluation Conference.
Masmoudi, A., Habash, N., Khmekhem, M., Estève, Y., & Belguith, L. (2015). Arabic Transliteration of Romanized Tunisian Dialect Text: A Preliminary Investigation. Computational Linguistics and Intelligent Text Processing, 16th International Conference, CICLing 2015.
Masmoudi, A., Mdhaffar, S., Sellami, R., & Belguith, L. (2019). Automatic Diacritics Restoration for Tunisian Dialect. ACM Transactions on Asian and Low-Resource Language Information Processing, volume 18, number 3.
Mejri, S., Said, M., & Sfar, I. (2009). Plurilinguisme et diglossie en Tunisie. Synerg. Tunisie, 1, 53–74.
Nelken, R., & Shieber, S. M. (2005). Arabic diacritization using weighted finite-state transducers. In: ACL Workshop on Computational Approaches to Semitic Languages, pp. 79–86.
Och, F., & Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), 19–52.
Ouerhani, B. (2009). Interférence entre le dialectal et le litteral en Tunisie F: Le cas de la morphologie verbale. Synerg. Tunisie, 1, 75–84.
Rashwan, M., Al Sallab, A., Raafat, H., & Rafea, A. (2015). Deep Learning Framework with Confused Sub Set Resolution Architecture for Automatic Arabic Diacritization. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Saadane, H., & Habash, N. (2015). A Conventional Orthography for Algerian Arabic. In Proceedings of the Second Workshop on Arabic Natural Language Processing.
Said, A., El-Sharqwi, M., Chalabi, A., & Kamal, E. (2013). A hybrid approach for Arabic diacritization. In E. Métais, F. Meziane, M. Saraee, V. Sugumaran, S. Vadera (eds.), Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 7934, pp. 53–64. Springer.
Schlippe, T., ThuyLinh, N., & Stephan, V. (2008). Diacritization as a Machine Translating Problem and as a Sequence Labeling Problem. Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA), Hawai’i, USA, 2008.
Schlippe, T. (2008). Statistical methods for automatic Diacritization of Arabic Texts. Carnegie Mellon University, Pittsburgh, USA, May 2008.
Sfar, I. (2005). Morphologie des noms de professions: incorporation et paraphrase. La terminologie, entre traduction et bilinguisme, pp. 15–16.
Shaalan, K., Abo Bakr, M., & Ziedan, I. (2009). A hybrid approach for building Arabic diacritizer. In Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages.
Shaalan, K., Abo Bakr, H., & Ziedan, I. (2008). A statistical method for adding case ending diacritics for Arabic text. In Proceedings of Language Engineering Conference.
Stolcke, A. (2002). SRILM: An Extensible Language Modeling Toolkit. Proceedings of ICSLP.
Talmoudi, F. (1980). A morphosyntactic study of Romance verbs in the Arabic dialects of Tunis, Sousa, and Sfax. Gothobg: GHteborg Acta Univ.
Tilmatine, M. (1999). Substrat Et Convergences: Le Berbère Et L’arabe Nord-Africain, in: HAAK, M., JONG, R. DE, VERSTEEGH, K. (Eds.), Estudios de Dialectologia Norteafricana Y Andalusi.
Vergyri, D., & Kirchhoff, K. (2004). Automatic diacritization of Arabic for acoustic modeling in speech recognition. Workshop on Computational Approaches to Arabic Script-based Languages, pp. 66–73.
Wang, D., & King, S. (2011). Letter-to-sound Pronunciation Prediction Using Conditional Random Fields. IEEE Signal Processing Letters.
Zaghouani, W., Habash, N., Bouamor, H., Rozovskaya, A., Mohit, B., Heider, A., & Oflazer, K. (2015). Correction annotation for non-native Arabic texts: Guidelines and corpus. In Proceedings of the Association for Computational Linguistics Fourth Linguistic Annotation Workshop.
Zaghouani, W., Bouamor, H., Hawwari, A., Diab, M., Obeid, O., Ghoneim, M., Alqahtani, S., & Oflazer, K. (2016). Guidelines and framework for a large-scale Arabic diacritized corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation: LREC’2016.
Zaghouani, W., Habash, N., Obeid, O., Mohit, B., Bouamor, H., & Oflazer, K. (2016). Building an Arabic machine translation post-edited corpus: Guidelines and annotation. In International Conference on Language Resources and Evaluation: LREC’2016.
Zitouni, I., Sorensen, J., & Sarikaya, R. (2006). Maximum entropy based restoration of Arabic diacritics. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Zitouni, I., & Sarikaya, R. (2009). Arabic Diacritic Restoration Approach Based on Maximum Entropy Models. In Journal of Computer Speech and Language.
Zribi, I., Khmekhem, M., Belguith, L., & Blache, P. (2017). Morphological disambiguation of Tunisian dialect. Journal of King Saud University, Computer and Information Sciences, 29, 147–155.
Zribi, I., Boujelbane, R., Masmoudi, A., Ellouze, M., Belguith, L., & Habash, N. (2014). A Conventional Orthography for Tunisian Arabic. In Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC’14.
Zribi, I., Ellouze, M., Belguith, L. H., & Blache, P. (2015). Spoken Tunisian Arabic Corpus “STAC”: Transcription and Annotation (p. 90). Sci: Res. Comput.
Zribi, I., Graja, M., Khemakhem, M.E., Jaoua, M., & Belguith, L. (2013). Orthographic Transcription for Spoken Tunisian Arabic, in: A. Gelbukh (Ed.): CICLing 2013.
Metadata
Title
Automatic diacritization of Tunisian dialect text using SMT model
Authors
Abir Masmoudi
Chafik Aloulou
Abdel Ghader Sidi Abdellahi
Lamia Hadrich Belguith
Publication date
08.08.2021
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09864-6