ABSTRACT
Named entities are important content-carrying units within documents. Consequently, named entity recognition (NER) is an important part of information extraction. One fast and accurate approach to NER uses a list, or gazette, of known instances. The gazette creation problem asks how to automatically build a comprehensive gazette from a given repository of unlabeled documents. We describe an unsupervised algorithm for automatic gazette creation, adapted from [5]. We propose a fast NER algorithm that uses a large gazette and show that it significantly outperforms a naïve approach based on regular expressions. We report experimental results from using the system to create gazettes for various resume-related named entity types (e.g., ORG, DEGREE, EDUCATIONAL_INSTITUTE, DESIGNATION) and to perform the associated NER on a large set of real-life resumes.
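The abstract does not spell out the fast NER algorithm. As a rough illustration only, the sketch below assumes a common technique for matching text against a large gazette, a token-level trie with greedy longest match, and contrasts it with one plausible naïve regex baseline (one regex scan of the text per gazette entry). The function names (`tokenize`, `build_trie`, `trie_ner`, `regex_ner`) and the gazette entries are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Optional, Tuple

# A span is (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]
END = "$type"  # trie key marking a complete entry; \w+ tokens never contain "$"


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def build_trie(gazette: Dict[str, str]) -> dict:
    """Build a token-level trie from a hypothetical {phrase: entity type} gazette."""
    root: dict = {}
    for phrase, etype in gazette.items():
        node = root
        for tok in tokenize(phrase):
            node = node.setdefault(tok, {})
        node[END] = etype  # this node completes a gazette entry
    return root


def trie_ner(tokens: List[str], trie: dict) -> List[Span]:
    """Greedy left-to-right longest match of gazette entries over the tokens."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        node = trie
        best: Optional[Span] = None  # longest complete entry starting at i
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (i, j, node[END])
        if best is not None:
            spans.append(best)
            i = best[1]  # resume scanning after the matched entity
        else:
            i += 1
    return spans


def regex_ner(text: str, gazette: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical naive baseline: one case-insensitive regex scan per entry."""
    found: List[Tuple[str, str]] = []
    for phrase, etype in gazette.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((m.group(0), etype))
    return found


if __name__ == "__main__":
    # Hypothetical gazette entries, for illustration only.
    gazette = {
        "indian institute of technology": "EDUCATIONAL_INSTITUTE",
        "indian institute of science": "EDUCATIONAL_INSTITUTE",
        "master of technology": "DEGREE",
        "software engineer": "DESIGNATION",
        "senior software engineer": "DESIGNATION",
    }
    text = ("Senior Software Engineer; Master of Technology "
            "from Indian Institute of Technology, Bombay.")
    tokens = tokenize(text)
    for s, e, etype in trie_ner(tokens, build_trie(gazette)):
        print(" ".join(tokens[s:e]), "->", etype)
    # senior software engineer -> DESIGNATION
    # master of technology -> DEGREE
    # indian institute of technology -> EDUCATIONAL_INSTITUTE
    print(regex_ner(text, gazette))  # also reports the overlapping "Software Engineer"
```

In this sketch the trie walk follows at most one path per start position, so its cost grows with the length of the text and of the longest entry rather than with the number of gazette entries. The regex baseline rescans the whole text once per entry and reports overlapping matches (both "Software Engineer" and "Senior Software Engineer") that would need separate resolution.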
REFERENCES
- Collins, M. and Singer, Y. 1999. Unsupervised models for named entity classification. Proc. EMNLP.
- Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S. and Yates, A. 2005. Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence, 165, pp. 91-134.
- Nadeau, D., Turney, P. and Matwin, S. 2006. Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity. Proc. 19th Canadian Conf. Artificial Intelligence.
- Palshikar, G. K. 2011. Techniques for named entity recognition: a survey. TRDDC Technical Report.
- Thelen, M. and Riloff, E. 2002. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).
Index Terms
- Automatic gazette creation for named entity recognition and application to resume processing