RENAR: A Rule-Based Arabic Named Entity Recognition System

Author:
Wajdi Zaghouani

University of Pennsylvania


ACM Transactions on Asian Language Information Processing, Volume 11, Issue 1, Article No. 2, pp. 1–13. https://doi.org/10.1145/2090176.2090178

Published: 01 March 2012


Abstract

Named entity recognition supports many natural language processing tasks, including information retrieval, machine translation, and question answering. Many researchers have addressed name identification in a variety of languages, and recent research efforts have begun to focus on named entity recognition for Arabic. We present a working Arabic information extraction (IE) system that analyzes large volumes of news texts every day to extract the named entity (NE) types person, organization, location, date, and number, as well as quotations (direct reported speech) by and about people. The named entity recognition (NER) system was not originally developed for Arabic; instead, an existing multilingual NER system was adapted to cover Arabic as well. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages the system currently covers. This article therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the rule set to make it applicable to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types.


Index Terms

RENAR: A Rule-Based Arabic Named Entity Recognition System
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in

ACM Transactions on Asian Language Information Processing Volume 11, Issue 1
March 2012
72 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/2090176

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2012
- Accepted: 1 May 2011
- Revised: 1 April 2011
- Received: 1 June 2010
Author Tags
Arabic natural language processing
Named entity recognition
information extraction
rule-based systems
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 33
- Total Downloads: 589
- Downloads (Last 12 months): 20
- Downloads (Last 6 weeks): 1