2016 | Original Paper | Book Chapter

Domain Adaptation with Active Learning for Named Entity Recognition

Authors: Huiyu Sun, Ralph Grishman, Yingchao Wang

Published in: Cloud Computing and Security

Publisher: Springer International Publishing


Abstract

A dominant problem in named entity recognition (NER) is that a system trained on one domain frequently suffers a substantial drop in performance when applied to a different domain. In this paper, we apply active learning strategies to domain adaptation for NER systems and show that adaptive learning, which combines the source and target domains, is more effective than non-adaptive learning directly from the target domain. Active learning aims to minimize labeling effort by selecting the most informative instances to label. We investigate several sample selection techniques, such as Maximum Entropy and Smallest Margin, and apply them to the ACE corpus. Our results show that the labeling cost can be reduced by over 92% without degrading performance.
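
As a rough illustration of the two selection criteria named in the abstract, the sketch below ranks a pool of unlabeled instances by the entropy of their predicted label distribution and by the margin between the two most probable labels. It is not the authors' implementation: the function names, the toy probabilities, and the per-instance (rather than per-sequence) scoring are assumptions made only for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the two most probable labels; a small gap means uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_max_entropy(pool, k):
    """Indices of the k pool items whose predictions have the highest entropy."""
    order = sorted(range(len(pool)), key=lambda i: entropy(pool[i]), reverse=True)
    return order[:k]


def select_smallest_margin(pool, k):
    """Indices of the k pool items whose top two labels are closest in probability."""
    order = sorted(range(len(pool)), key=lambda i: margin(pool[i]))
    return order[:k]


# Hypothetical model outputs for three unlabeled instances (toy values).
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # probability spread over all labels
    [0.50, 0.49, 0.01],  # two labels nearly tied
]

print(select_max_entropy(pool, 1))
print(select_smallest_margin(pool, 1))
```

On this toy pool the two criteria pick different instances: entropy favors the prediction spread across all labels, while margin favors the one where the top two labels are nearly tied. In an active learning loop, the selected instances would be sent to an annotator, added to the training set, and the model retrained before the next selection round.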


Metadata
Title
Domain Adaptation with Active Learning for Named Entity Recognition
Authors
Huiyu Sun
Ralph Grishman
Yingchao Wang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48674-1_54