Published in: International Journal of Speech Technology 2/2016

8 October 2015

Bidirectional HMM-based Arabic POS tagging

Authors: Ayoub Kadim, Azzeddine Lazrek

Abstract

In this work, we present a new approach to part-of-speech (POS) tagging and implement it for the Arabic language. In Arabic there are numerous cases where determining the morpho-syntactic state of a word depends on the states of the words that follow it. This observation is the theoretical foundation of the approach: how to take future elements into account in addition to past ones. We then show that POS tagging in its statistical form, the hidden Markov model (HMM), relies mainly on past elements, and how a direct tagger and a reverse tagger can be combined to tag the same sequence of words in both directions. We then propose a hypothesis for selecting the result. In the practical part, we give an overview of the resource used and the changes made to it. We then describe the steps of the experiment and the parameters collected, which are presented in graphs and discussed to reach the final conclusion.
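
As a rough illustration of the idea described in the abstract, the sketch below trains one bigram HMM on the corpus as written (the direct tagger) and a second one on the same corpus with every sentence reversed (the reverse tagger), decodes a sentence with both, and keeps one of the two results. It is not the authors' implementation. The add-one smoothing, the toy corpus, the function names (`train_hmm`, `viterbi`, `tag_bidirectional`) and, above all, the selection rule (keep the sequence with the higher Viterbi log score) are assumptions made for this example; the abstract does not say which selection hypothesis the paper uses.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Collect bigram transition and emission counts (hypothetical helper)."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return {"trans": trans, "emit": emit, "tags": sorted(tags), "vocab": vocab}


def _log_p(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(model, words):
    """Return (log score, tag list) of the best tag path for `words`."""
    if not words:
        return (0.0, [])
    tags = model["tags"]
    n_tags = len(tags)
    n_words = len(model["vocab"]) + 1  # one extra slot for unknown words
    best = {}  # best[t] = (score of best path ending in tag t, that path)
    for t in tags:
        score = (_log_p(model["trans"]["<s>"], t, n_tags)
                 + _log_p(model["emit"][t], words[0], n_words))
        best[t] = (score, [t])
    for w in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + _log_p(model["trans"][p], t, n_tags), best[p][1])
                for p in tags
            )
            new_best[t] = (score + _log_p(model["emit"][t], w, n_words),
                           path + [t])
        best = new_best
    return max(best.values())


def tag_bidirectional(direct_model, reverse_model, words):
    """Tag `words` in both directions and keep one result."""
    d_score, d_tags = viterbi(direct_model, words)
    r_score, r_tags = viterbi(reverse_model, words[::-1])
    r_tags = r_tags[::-1]  # restore the original word order
    # Placeholder selection rule (assumption, not taken from the paper):
    # keep whichever sequence has the higher Viterbi log score.
    return d_tags if d_score >= r_score else r_tags


# Toy corpus with made-up tokens and tags, for illustration only.
corpus = [
    [("x", "A"), ("y", "B")],
    [("x", "C"), ("z", "D")],
]
direct = train_hmm(corpus)
reverse = train_hmm([sent[::-1] for sent in corpus])
print(tag_bidirectional(direct, reverse, ["x", "y"]))
```

In the toy corpus the tag of `x` depends on the word that follows it, which loosely mirrors the dependency on subsequent words that motivates the approach. Any other selection rule, for example a word-by-word comparison of the two outputs, could be substituted inside `tag_bidirectional` without changing the rest of the sketch.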

Footnotes
1
We can say for this function of lA: لا رجلٌ قائمٌ بل رجلان lA rajulN qA}mN bal rajulAni (It is not one man who is standing but two), contrary to the first function (Ibn 'Aqil 2002).
 
References
Albared, M., Omar, N., & Ab Aziz, M. J. (2009). Classifiers combination to Arabic morphosyntactic disambiguation. International Conference on Electrical Engineering and Informatics, 1, 163–171.
Alkhalil, I. A. (1985). Aljomal fi annahw (Sentences in the Arabic grammar) (1st ed., p. 208). Beirut: Moeassasat Arrisala.
Brill, E. (2000). Part-of-speech tagging. In R. Dale, H. Moisl, & H. Somers (Eds.), Handbook of natural language processing (pp. 403–414). CRC Press: Boca Raton.
Diab, M., Hacioglu, K., & Jurafsky, D. (2007). Automated methods for processing Arabic text: From tokenization to base phrase chunking. Arabic Computational Morphology: Knowledge-based and Empirical Methods. Dordrecht: Kluwer/Springer.
Greene, B. B., & Rubin, G. M. (1971). Automated grammatical tagging of English.
Habash, N., Rambow, O., & Roth, R. (2009, April). MADA + TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd international conference on Arabic Language Resources and Tools (MEDAR), Cairo (pp. 102–109).
Ibn ‘Aqil. (2002). Charh Ibn Aqile eala alfiat Ben Malik (explanation of Alfiat Ibn Malik). Tahqiq Mohyi Adin AbdAlhamid, almaktaba aleasria, vol. 1, Beirut, Lebanon.
Ibn Hicham, A. (2013). Moghni allabib ean kotobi al aearib (what suffices the thinker from the books of arabic traditional grammar). Almaktaba alassrya, vol. 1, Sidon, Lebanon, ISBN: 9953-400-37-7, pp. 102–108.
Jurafsky, D., & James, H. (2000). Speech and language processing an introduction to natural language processing, Computational Linguistics and Natural Language Processing. Prentice Hall, ISBN: 10: 0131873210, pp. 1024
Kim, J. H. (1993). Korean Part-of-Speech Tagging by Using a Fuzzy net. Proceedings of the 5th national conference on Korean Information Processing.
Klein, S., & Simmons, R. F. (1963). A computational approach to grammatical coding of English words. Journal of the ACM (JACM), 10(3), 334–347.
Kübler, S., & Mohamed, E. (2012). Part of speech tagging for Arabic. Natural Language Engineering, 18(04), 521–548.
Voutilainen, A. (2003). Part-of-speech tagging. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 219–232). New York: Oxford University Press.
Yaseen, M., Attia, M., Maegaard, B., Choukri, K., Paulsson, N., Haamid, S., Krauwer, S., Bendahman, C., Fersøe, H., Rashwan, M., Haddad, B., Mukbel, C., Mouradi, A., Al-Kufaishi, A., Shahin, M., Chenfour, N., & Ragheb, A. (2006). Building annotated written and spoken Arabic LR’s in NEMLAR project. Proceedings of LREC. pp. 533–538. Available: https://uop.edu.jo/download/research/members/202_1544_Yase.pdf
Metadata
Title
Bidirectional HMM-based Arabic POS tagging
Authors
Ayoub Kadim
Azzeddine Lazrek
Publication date
8 October 2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9303-7