Automatic speech recognition system for Tunisian dialect

Masmoudi, Abir; Bougares, Fethi; Ellouze, Mariem; Estève, Yannick; Belguith, Lamia

doi:10.1007/s10579-017-9402-y

Original Paper
Published: 22 September 2017

Language Resources and Evaluation, Volume 52, pages 249–267 (2018)

Abstract

Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging: annotated resources and tools are lacking, the dialect is not standardized at any linguistic level (phonological, morphological, syntactic and lexical), and there is no pronunciation dictionary, which ASR development requires. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of a Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate (WER) of 22.6%.
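The headline result is reported as a Word Error Rate: the number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognizer's output into the reference transcript, divided by the number of reference words N, so a WER of 22.6% corresponds to roughly 22.6 word errors per 100 reference words. The sketch below is a minimal illustration of that standard definition, assuming whitespace-tokenized transcripts; it is not the authors' scoring setup, and the function name `word_error_rate` and the example sentences are hypothetical. Dedicated scorers such as NIST sclite (linked in the notes) compute the same quantity from a word alignment and add detailed error reports.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment.

    Illustrative sketch: assumes transcripts are already normalized and
    that words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference: `word_error_rate("a", "b c d")` returns `3.0`.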

Notes

Treebanks are language resources that provide annotations of natural languages at various levels of structure: at the word level and the sentence level.
http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
Undiacritized or unvowelized word refers to a word without short vowels.
MADA (Morphological Analysis and Disambiguation for Arabic) is a morphological analysis and part-of-speech (POS) tagging tool for Arabic.
The Algerian dialect is the language used in daily spoken communication in Algeria.
http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm.
http://www.sncft.com.tn/.
https://sourceforge.net/projects/audacity/.
Transcriber is distributed as free software and is available at http://trans.sourceforge.net.
http://www-lium.univ-lemans.fr/~bougares/ressources.php.
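The note on undiacritized words can be made concrete. In Arabic script, short vowels are written as optional combining marks (fatha U+064E, damma U+064F, kasra U+0650), so an undiacritized form is obtained by deleting those marks. The following sketch assumes the common convention of also removing tanween (U+064B to U+064D), shadda (U+0651) and sukun (U+0652), i.e. the whole range U+064B to U+0652; the helper name `undiacritize` is hypothetical, and this is an illustration rather than the preprocessing used in the paper.

```python
import re

# Assumed diacritic set: U+064B..U+0652 covers tanween (fathatan, dammatan,
# kasratan), the short vowels fatha, damma and kasra, plus shadda and sukun.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652]")


def undiacritize(word: str) -> str:
    """Return the word with the assumed Arabic diacritic marks removed."""
    return ARABIC_DIACRITICS.sub("", word)


if __name__ == "__main__":
    # "kataba" written with three fathas becomes the bare consonantal form k-t-b.
    vowelized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(undiacritize(vowelized) == "\u0643\u062A\u0628")
```

Stripping only marks, never base letters, keeps the operation lossless with respect to the consonantal skeleton, which is why undiacritized text is ambiguous: several vowelized words can map to the same undiacritized form.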

Author information

Authors and Affiliations

LIUM, Le Mans University, Le Mans, France
Abir Masmoudi, Fethi Bougares & Yannick Estève
ANLP Research group, MIRACL Lab., University of Sfax, Sfax, Tunisia
Abir Masmoudi, Mariem Ellouze & Lamia Belguith

Corresponding author

Correspondence to Abir Masmoudi.

About this article

Cite this article

Masmoudi, A., Bougares, F., Ellouze, M. et al. Automatic speech recognition system for Tunisian dialect. Lang Resources & Evaluation 52, 249–267 (2018). https://doi.org/10.1007/s10579-017-9402-y

Published: 22 September 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10579-017-9402-y
