research-article

Automatic gazette creation for named entity recognition and application to resume processing

Authors:
Sachin Pawar, Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India
Rajiv Srivastava, Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India
Girish Keshav Palshikar, Tata Research Development and Design Centre, Hadapsar Industrial Estate, Pune, India

COMPUTE '12: Proceedings of the 5th ACM COMPUTE Conference: Intelligent & scalable system technologies
January 2012
Article No.: 15
Pages 1–7
https://doi.org/10.1145/2459118.2459133

Published: 23 January 2012

ABSTRACT

Named entities are important content-carrying units within documents. Consequently, named entity recognition (NER) is an important part of information extraction. One fast and accurate approach to NER uses a list, or gazette, of known instances. The gazette creation problem considers how to automatically create a comprehensive gazette from a given unlabeled document repository. We describe an unsupervised algorithm for automatic gazette creation, which is modified from [5]. We propose a fast NER algorithm that uses a large gazette and show that it significantly outperforms a naïve approach based on regular expressions. We describe experimental results obtained by using the system for gazette creation for various resume-related named entities (e.g., ORG, DEGREE, EDUCATIONAL_INSTITUTE, DESIGNATION) and the associated NER on a large set of real-life resumes.
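The abstract does not spell out the proposed NER algorithm, so the following is only an illustrative sketch of gazette-based lookup in general, not the authors' method. It assumes a token-level trie with greedy longest match, lowercased whitespace tokenization, and hypothetical names (`build_trie`, `find_entities`). The idea it shows is why a single pass over the text can scale better than trying one regular expression per gazette entry.

```python
from typing import Dict, Iterable, List, Tuple

# Sentinel key marking that a gazette entry ends at this trie node.
# An object() can never collide with a string token.
_END = object()


def build_trie(gazette: Iterable[str]) -> Dict:
    """Build a token-level trie from gazette entries (illustrative helper)."""
    root: Dict = {}
    for entry in gazette:
        node = root
        for token in entry.lower().split():
            node = node.setdefault(token, {})
        node[_END] = entry  # remember the original surface form
    return root


def find_entities(text: str, trie: Dict) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token_exclusive, gazette_entry) tuples.

    Uses greedy longest match: at each position, the longest gazette
    entry starting there wins, and scanning resumes after it.
    """
    tokens = text.lower().split()
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                best = (i, j, node[_END])
        if best is not None:
            matches.append(best)
            i = best[1]  # skip past the matched entity
        else:
            i += 1
    return matches


# Hypothetical ORG gazette and resume fragment, for illustration only.
org_trie = build_trie(["Tata Consultancy Services", "Tata", "Infosys"])
print(find_entities("Worked at Tata Consultancy Services and later Infosys", org_trie))
# prints [(2, 5, 'Tata Consultancy Services'), (7, 8, 'Infosys')]
```

Whitespace tokenization is a simplification: punctuation attached to a token (for example "Infosys,") would prevent a match, so a real system would need a proper tokenizer and some handling of spelling variants.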

References

[1] Collins, M. and Singer, Y. 1999. Unsupervised models for named entity classification. Proc. EMNLP.
[2] Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S. and Yates, A. 2005. Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence, 165, pp. 91–134.
[3] Nadeau, D., Turney, P. and Matwin, S. 2006. Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity. Proc. 19th Canadian Conf. Artificial Intelligence.
[4] Palshikar, G. K. 2011. Techniques for named entity recognition: a survey. TRDDC Technical Report.
[5] Thelen, M. and Riloff, E. 2002. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

Index Terms

Automatic gazette creation for named entity recognition and application to resume processing
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        3D imaging
  2. Computer graphics
    1. Animation

Published in
COMPUTE '12: Proceedings of the 5th ACM COMPUTE Conference: Intelligent & scalable system technologies
January 2012
146 pages
ISBN: 9781450314404
DOI: 10.1145/2459118
Conference Chairs: R. K. Shyamasundar (TIFR, India), Lokendra Shastri (Infosys Labs, Infosys)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
gazette creation
information extraction
information retrieval
named entity extraction
named entity recognition
resume processing
Acceptance Rates
COMPUTE '12 Paper Acceptance Rate: 18 of 116 submissions, 16%
Overall Acceptance Rate: 114 of 622 submissions, 18%
Article Metrics
- Total Citations: 9
- Total Downloads: 200
- Downloads (last 12 months): 11
- Downloads (last 6 weeks): 4