Article

Multi-criteria-based active learning for named entity recognition

Authors:
Dan Shen (Institute for Infocomm Technology, Singapore; National University of Singapore, Singapore; Universität des Saarlandes, Germany)
Jie Zhang (Institute for Infocomm Technology, Singapore; National University of Singapore, Singapore)
Jian Su (Institute for Infocomm Technology, Singapore)
Guodong Zhou (Institute for Infocomm Technology, Singapore)
Chew-Lim Tan (National University of Singapore, Singapore)

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, July 2004, Pages 589–es, https://doi.org/10.3115/1218955.1219030

Published: 21 July 2004

ABSTRACT

In this paper, we propose a multi-criteria-based active learning approach and effectively apply it to named entity recognition. Active learning aims to minimize human annotation effort by selecting the examples to be labeled. To maximize the contribution of the selected examples, we consider multiple criteria (informativeness, representativeness and diversity) and propose measures to quantify them. We further incorporate all the criteria using two selection strategies, both of which result in lower labeling cost than the single-criterion-based method. Named entity recognition results on both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading performance.
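The idea of combining the three criteria can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes each unlabeled example is a fixed-length feature vector, that informativeness scores are already supplied by some model (for example, a classifier's uncertainty), that representativeness is the average cosine similarity to the rest of the pool, and that diversity is enforced by rejecting candidates too similar to examples already chosen. The function names, the weight `lam` and the threshold `max_sim` are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def representativeness(i: int, pool: List[Vector]) -> float:
    """Hypothetical density measure: mean similarity of example i to all others."""
    sims = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batch(pool: List[Vector], informativeness: List[float],
                 batch_size: int, lam: float = 0.6,
                 max_sim: float = 0.9) -> List[int]:
    """Hypothetical multi-criteria selection (an assumption, not the paper's method).

    1. Score each example as lam * informativeness + (1 - lam) * representativeness.
    2. Visit examples in descending score order and keep one only if its
       similarity to every already-selected example is at most max_sim
       (a simple diversity constraint).
    """
    scores = [lam * informativeness[i] + (1 - lam) * representativeness(i, pool)
              for i in range(len(pool))]
    order = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    batch: List[int] = []
    for i in order:
        if len(batch) == batch_size:
            break
        if all(cosine(pool[i], pool[j]) <= max_sim for j in batch):
            batch.append(i)
    return batch


if __name__ == "__main__":
    # Toy pool: examples 0 and 1 are near-duplicates.
    pool = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.7, 0.7]]
    info = [0.9, 0.85, 0.5, 0.4]
    print(select_batch(pool, info, batch_size=2))  # → [0, 3]
```

In the toy run, example 1 has the second-highest combined score but is skipped because it is almost identical to example 0, so the diversity check lets the more distinct example 3 into the batch instead. Raising `max_sim` to 1.0 disables that check and returns `[0, 1]`.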

Published in
ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages
General Chair: Donia Scott, ITRI, University of Brighton
Publisher: Association for Computational Linguistics, United States
Qualifiers: Article
Conference acceptance rates: Overall Acceptance Rate 85 of 443 submissions, 19%

Article Metrics
- 44 total citations
- 1,059 total downloads
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 0