Published in: International Journal of Speech Technology, Issue 2/2016

11 June 2015

Comparative evaluation of tools for Arabic corpora search and analysis

Authors: Abdullah Alfaifi, Eric Atwell

Abstract

As the number of Arabic corpora continues to grow, so does the need for concordancing software for corpus search and analysis that supports as many features of the Arabic language as possible and offers users a wider range of functions. This paper evaluates six existing corpus search and analysis tools against eight criteria that appear most essential for searching and analysing Arabic corpora, such as displaying Arabic text in its right-to-left direction, normalising diacritics and Hamza, and providing an Arabic user interface. The evaluation revealed that three tools (Khawas, Sketch Engine, and aConCorde) met most of the criteria and achieved the highest benchmark scores. The paper concludes that the developers' conscious consideration of the linguistic features of Arabic when designing these three tools was the most significant factor behind their superiority.
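One of the criteria above is normalising diacritics and Hamza, so that a search for a word matches it whether or not it is written with short-vowel marks or a Hamza seat. The following is a minimal sketch of that idea in Python 3.9+. It assumes naive whitespace tokenisation and one common folding scheme. The table and the functions `normalise` and `concordance` are hypothetical illustrations. They are not the method used by Khawas, Sketch Engine, aConCorde, or any other evaluated tool. The sketch strips diacritics and the tatweel, folds Hamza-carrying letters onto their bare forms, and matches queries on the normalised form in a keyword-in-context (KWIC) concordance.

```python
import re

# Assumption: strip the Arabic marks U+064B..U+0652 (tanwin, fatha, damma,
# kasra, shadda, sukun) plus superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character (kashida), purely typographic

# Hypothetical Hamza-folding table: carrier letters map to their bare forms.
HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0624": "\u0648",  # waw with hamza above -> waw
    "\u0626": "\u064A",  # yeh with hamza above -> yeh
})


def normalise(text: str) -> str:
    """Remove diacritics and tatweel, then fold Hamza forms."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(HAMZA_MAP)


def concordance(corpus: str, query: str, width: int = 3) -> list[str]:
    """Build KWIC lines for every token whose normalised form equals the
    normalised query; the matched token is shown as written, in brackets."""
    tokens = corpus.split()  # naive whitespace tokenisation (assumption)
    target = normalise(query)
    lines = []
    for i, token in enumerate(tokens):
        if normalise(token) == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines
```

The sketch matches on the normalised form but displays the original token. This lets a user find every spelling variant of a word, with or without diacritics, without losing the text as it is written in the corpus.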

Metadata
Title
Comparative evaluation of tools for Arabic corpora search and analysis
Authors
Abdullah Alfaifi
Eric Atwell
Publication date
11 June 2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9285-5