DOI: 10.3115/1218955.1219030

Multi-criteria-based active learning for named entity recognition

Published: 21 July 2004

ABSTRACT

In this paper, we propose a multi-criteria-based active learning approach and apply it effectively to named entity recognition. Active learning aims to minimize human annotation effort by selecting which examples to label. To maximize the contribution of the selected examples, we consider multiple criteria (informativeness, representativeness and diversity) and propose measures to quantify them. We then combine all the criteria using two selection strategies, both of which result in lower labeling cost than a single-criterion-based method. Results for named entity recognition on both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading performance.
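As a rough illustration of how the three criteria could interact in batch selection, the sketch below scores each unlabeled example by a weighted mix of informativeness and representativeness, then filters near-duplicates for diversity. The abstract does not specify the paper's measures or its two selection strategies, so everything here is an assumption chosen for illustration: binary entropy as the informativeness measure, mean cosine similarity to the pool as the representativeness measure, a similarity cap for diversity, and the names `select_batch`, `lam` and `max_sim`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def informativeness(p):
    """Assumed measure: binary entropy of the model's positive-class probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def representativeness(i, vectors):
    """Assumed measure: mean cosine similarity of example i to the rest of the pool."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batch(vectors, probs, batch_size, lam=0.6, max_sim=0.9):
    """Pick up to batch_size pool indices (hypothetical strategy).

    Rank by lam * informativeness + (1 - lam) * representativeness, then
    greedily skip any candidate too similar to one already chosen.
    """
    scores = [
        lam * informativeness(p) + (1 - lam) * representativeness(i, vectors)
        for i, p in enumerate(probs)
    ]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    batch = []
    for i in ranked:
        if len(batch) == batch_size:
            break
        # Diversity: reject candidates nearly identical to a selected example.
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in batch):
            batch.append(i)
    return batch


if __name__ == "__main__":
    pool = [(1.0, 0.0), (1.0, 0.01), (0.0, 1.0), (1.0, 1.0)]
    model_probs = [0.5, 0.5, 0.5, 0.9]
    print(select_batch(pool, model_probs, batch_size=2))
```

In this toy pool the first two vectors are almost identical, so the diversity filter admits only one of them even though both score highly on the other two criteria.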

Published in

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher: Association for Computational Linguistics, United States

Overall Acceptance Rate: 85 of 443 submissions, 19%
