
Automatic speech recognition system for Tunisian dialect

  • Original Paper
Language Resources and Evaluation

Abstract

Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because of the lack of annotated resources and tools, the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the absence of the pronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go into the details of how the Tunisian dialect corpus was created. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate (WER) of 22.6%.
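The 22.6% result is a Word Error Rate. Under the standard definition, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment of the recognizer output against a reference transcript of N words. The Python sketch below illustrates that definition with a word-level Levenshtein alignment. The function name `word_error_rate`, the whitespace tokenization and the example sentences are illustrative assumptions, not the paper's scoring pipeline (the notes point to NIST's sclite tool for scoring).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N from a word-level Levenshtein alignment.

    Hypothetical helper for illustration; tokens are whitespace-separated.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (leaves -> leave) and one insertion (tonight) over 5 words
    print(word_error_rate("the train leaves at nine",
                          "the train leave at nine tonight"))  # prints 0.4
```

Because insertions are counted, WER is not bounded by 100%: a hypothesis much longer than its reference can score above 1.0.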



Notes

  1. Treebanks are language resources that provide annotations of natural languages at various levels of structure: at the word level and the sentence level.

  2. http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

  3. An undiacritized (or unvowelized) word is a word written without its short-vowel diacritics.

  4. MADA is a morphological analysis and disambiguation tool for Arabic that also performs POS tagging.

  5. The Algerian dialect is the language used in the daily spoken communication of Algerians.

  6. http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm.

  7. http://www.sncft.com.tn/.

  8. https://sourceforge.net/projects/audacity/.

  9. Transcriber is distributed as free software and is available at http://trans.sourceforge.net.

  10. http://www-lium.univ-lemans.fr/~bougares/ressources.php.
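Note 3 defines an undiacritized word as one written without short vowels. As a rough illustration, assuming Unicode-encoded Arabic text, such a form can be obtained by deleting the characters in the harakat range U+064B to U+0652. The helper `undiacritize` is hypothetical and not taken from the paper; note that this range also removes tanwin, shadda and sukun, which goes slightly beyond the short vowels alone.

```python
# Arabic harakat range: U+064B (fathatan) through U+0652 (sukun).
# It covers the three tanwin forms, the short vowels fatha (U+064E),
# damma (U+064F) and kasra (U+0650), plus shadda (U+0651) and sukun.
ARABIC_DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def undiacritize(word: str) -> str:
    """Hypothetical helper: drop every character in the harakat range."""
    return "".join(ch for ch in word if ch not in ARABIC_DIACRITICS)


if __name__ == "__main__":
    # kataba ("he wrote"): kaf, fatha, ta, fatha, ba, fatha
    vowelized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(undiacritize(vowelized))  # prints the bare skeleton كتب
```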


Author information

Corresponding author

Correspondence to Abir Masmoudi.

About this article

Cite this article

Masmoudi, A., Bougares, F., Ellouze, M. et al. Automatic speech recognition system for Tunisian dialect. Lang Resources & Evaluation 52, 249–267 (2018). https://doi.org/10.1007/s10579-017-9402-y

  • DOI: https://doi.org/10.1007/s10579-017-9402-y
