Published in: International Journal of Speech Technology, Issue 2/2016

1 June 2016

Improving Arabic morphological analyzers benchmark

Authors: Younes Jaafar, Karim Bouzoubaa, Abdellah Yousfi, Rachida Tajmout, Hakima Khamar

Abstract

Tools dedicated to Arabic natural language processing have developed significantly in recent years. Among them, Arabic morphological analyzers are particularly important because they are often embedded in more advanced systems such as syntactic parsers, search engines, and machine translation systems. Researchers must therefore decide which morphological analyzer to use in their projects, a difficult choice given the many criteria to take into account. To facilitate this choice, in previous work we addressed the problem of benchmarking morphological analyzers and proposed a solution that returns a set of metrics for each analyzer: accuracy, precision, recall, F-measure, and execution time. In this article, we present two major improvements to that solution: the first version of a corpus dedicated to the evaluation of morphological analyzers, and a new metric that combines all result-related metrics with the analyzers' execution time.
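The per-analyzer metrics listed above can be illustrated with a short sketch. It assumes a hypothetical setup in which each analyzer is a function mapping a word to a set of analysis strings, and a gold corpus maps each word to its set of correct analyses. Precision, recall, and F-measure are micro-averaged over individual analyses, and accuracy is taken here as the fraction of words whose predicted analysis set matches the gold set exactly. These definitions, the `benchmark` function, and the toy data are illustrative assumptions, not the authors' implementation; the combined metric is not reproduced because the abstract does not give its formula.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical analyzer interface (an assumption): word -> set of analyses.
Analyzer = Callable[[str], Set[str]]


@dataclass
class BenchmarkResult:
    accuracy: float
    precision: float
    recall: float
    f_measure: float
    seconds: float


def benchmark(analyzer: Analyzer, gold: Dict[str, Set[str]]) -> BenchmarkResult:
    """Score one analyzer against a gold corpus (illustrative definitions)."""
    # Time only the analysis calls, not the scoring.
    start = time.perf_counter()
    predictions = {word: analyzer(word) for word in gold}
    elapsed = time.perf_counter() - start

    tp = fp = fn = exact = 0
    for word, expected in gold.items():
        predicted = predictions[word]
        tp += len(predicted & expected)   # correct analyses returned
        fp += len(predicted - expected)   # spurious analyses returned
        fn += len(expected - predicted)   # correct analyses missed
        if predicted == expected:         # assumed accuracy: exact set match
            exact += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = exact / len(gold) if gold else 0.0
    return BenchmarkResult(accuracy, precision, recall, f_measure, elapsed)


# Toy gold corpus and toy analyzer (made-up data, for illustration only).
gold = {"ktb": {"kataba/VERB", "kutub/NOUN"}, "qlm": {"qalam/NOUN"}}
toy_output = {"ktb": {"kataba/VERB"}, "qlm": {"qalam/NOUN", "qallama/VERB"}}
toy_analyzer: Analyzer = lambda word: toy_output.get(word, set())

result = benchmark(toy_analyzer, gold)
print(f"P={result.precision:.3f} R={result.recall:.3f} "
      f"F={result.f_measure:.3f} acc={result.accuracy:.3f}")
# prints "P=0.667 R=0.667 F=0.667 acc=0.000"
```

Micro-averaging weights every analysis equally; a macro-average computed per word would instead give each word equal weight, which can change the ranking of analyzers on highly ambiguous words.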

Metadata
Title: Improving Arabic morphological analyzers benchmark
Authors: Younes Jaafar, Karim Bouzoubaa, Abdellah Yousfi, Rachida Tajmout, Hakima Khamar
Publication date: 1 June 2016
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9340-x
