research-article

RENAR: A Rule-Based Arabic Named Entity Recognition System

Published: 01 March 2012

Abstract

Named entity recognition (NER) has served many natural language processing tasks, such as information retrieval, machine translation, and question answering. Many researchers have addressed name identification in a variety of languages, and some recent research efforts have begun to focus on NER for Arabic. We present a working Arabic information extraction (IE) system that analyzes large volumes of news text every day to extract the named entity (NE) types person, organization, location, date, and number, as well as quotations (direct reported speech) by and about people. The NER system was not developed specifically for Arabic; instead, an existing multilingual NER system was adapted to also cover Arabic. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages the system already covered. This article therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the rule set to make it applicable to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types.
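The abstract notes that the multilingual rule set had to be changed because Arabic differs from the languages already covered. Two such differences are easy to show in miniature: Arabic script has no capital letters to signal names, and conjunctions and prepositions are written attached to the following word. The Python sketch below is a hypothetical illustration of a trigger-word person rule under those constraints, not the rules used in this system; the word lists, the proclitic set, the `Entity` type, and the functions `lexicon_form` and `find_persons` are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# All word lists are small hypothetical samples for illustration only;
# a real system would rely on much larger curated lexicons.
PERSON_TRIGGERS = {"الرئيس", "الوزير", "الدكتور", "السيد"}  # titles and roles
FIRST_NAMES = {"محمد", "أحمد", "علي", "حسن"}                  # known given names
FUNCTION_WORDS = {"في", "من", "على", "إن", "أن", "عن"}        # words that end a name
PROCLITICS = ("و", "ف", "ب", "ل", "ك")                         # single-letter prefixes


@dataclass
class Entity:
    text: str
    label: str
    start: int  # index of the entity's first token


def lexicon_form(token: str, lexicon: set[str]) -> str | None:
    """Return the lexicon entry `token` reduces to once proclitics are removed.

    Arabic attaches conjunctions and prepositions to the next word, so a token
    such as "وللوزير" ("and to the minister") must become "الوزير" before lookup.
    """
    stack = [token]
    seen: set[str] = set()
    while stack:
        t = stack.pop()
        if t in lexicon:
            return t
        if t in seen or len(t) < 3:
            continue
        seen.add(t)
        if t.startswith("لل"):
            # li- + al- is spelled "لل": restore the article ("للوزير" -> "الوزير").
            stack.append("ال" + t[2:])
        if t[0] in PROCLITICS:
            stack.append(t[1:])  # drop one single-letter proclitic
    return None


def find_persons(text: str) -> list[Entity]:
    """Tag PERSON entities with the hypothetical rule TRIGGER FIRST_NAME [SURNAME]."""
    tokens = text.split()
    entities: list[Entity] = []
    i = 0
    while i < len(tokens) - 1:
        if lexicon_form(tokens[i], PERSON_TRIGGERS) and tokens[i + 1] in FIRST_NAMES:
            end = i + 2
            # With no capitalization to rely on, the next token is accepted as a
            # surname only if it is not a known function word.
            if end < len(tokens) and tokens[end] not in FUNCTION_WORDS:
                end += 1
            entities.append(Entity(" ".join(tokens[i + 1:end]), "PERSON", i + 1))
            i = end
        else:
            i += 1
    return entities


if __name__ == "__main__":
    print(find_persons("قال الدكتور محمد في المؤتمر"))
```

Running the script tags "محمد" as PERSON and stops before the preposition "في". Matching against a lexicon only after clitic stripping keeps the heuristic from splitting ordinary words that merely begin with one of the proclitic letters.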



    • Published in

      ACM Transactions on Asian Language Information Processing, Volume 11, Issue 1
      March 2012
      72 pages
      ISSN:1530-0226
      EISSN:1558-3430
      DOI:10.1145/2090176

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 March 2012
      • Accepted: 1 May 2011
      • Revised: 1 April 2011
      • Received: 1 June 2010
      Published in TALIP Volume 11, Issue 1



      Qualifiers

      • Research
      • Refereed
