nach oben

Neural Processing Letters

Erschienen in:

12.01.2022

LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

verfasst von: Mingyi Liu, Zhiying Tu, Tong Zhang, Tonghua Su, Xiaofei Xu, Zhongjie Wang

Erschienen in: Neural Processing Letters | Ausgabe 3/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

In recent years, deep learning has achieved great success in many natural language processing tasks, including named entity recognition. The shortcoming is that a large quantity of manually annotated data is usually required. Previous studies have demonstrated that active learning can considerably reduce the cost of data annotation, but there is still plenty of room for improvement. In real applications, we found that existing uncertainty-based active learning strategies have two shortcomings. First, these strategies prefer to choose long sequences explicitly or implicitly, which increases the annotation burden of annotators. Second, some strategies need to revise and modify the model to generate additional information for sample selection, which increases the workload of the developer and increases the training/prediction time of the model. In this paper, we first examine traditional active learning strategies in specific cases of Word2Vec-BiLSTM-CRF and Bert-CRF that have been widely used in named entity recognition on several typical datasets. Then, we propose an uncertainty-based active learning strategy called the lowest token probability (LTP), which combines the input and output of conditional random field (CRF) to select informative instances. LTP is a simple and powerful strategy that does not favor long sequences and does not need to revise the model. We test LTP on multiple real-world datasets, the experiment results show that compared with existing state-of-the-art selection strategies, LTP can reduce about 20% annotation tokens while maintaining competitive performance on both sentence-level accuracy and entity-level F1-score. Additionally, LTP significantly outperformed all other strategies in selecting valid samples, which dramatically reduced the invalid annotation times of the labelers.

Vorheriger Artikel TriEP: Expansion-Pool TriHard Loss for Person Re-Identification

Nächster Artikel Multi-scale Learning for Multimodal Neurophysiological Signals: Gait Pattern Classification as an Example

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Awasthi P, Balcan MF, Long PM (2014) The power of localization for efficiently learning linear separators with noise. In: Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pp 449–458. ACM

Boreshban Y, Mirbostani SM, Ghassem-Sani G, Mirroshandel SA, Amiriparian S (2021) Improving question answering performance using knowledge distillation and active learning. arXiv preprint arXiv:2109.12662

Chen Y, Lasko TA, Mei Q, Denny JC, Xu H (2015) A study of active learning methods for named entity recognition in clinical text. J Biomed Inform 58:11–18CrossRef

Chiu JP, Nichols E (2016) Named entity recognition with bidirectional lstm-cnns. Trans Assoc Comput Linguist 4:357–370CrossRef

Claveau V, Kijak E (2018) Strategies to select examples for active learning with conditional random fields. In: Gelbukh A (ed) Computational linguistics and intelligent text processing. Springer International Publishing, Cham, pp 30–43CrossRef

Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537MATH

Culotta A, McCallum A (2005) Reducing labeling effort for structured prediction tasks. In: AAAI, vol 5, pp 746–751

Dasgupta S, Kalai AT, Monteleoni C (2005) Analysis of perceptron-based active learning. In: International conference on computational learning theory, pp 249–263. Springer

Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

10.

Gal Y, Ghahramani Z (2016) A theoretically grounded application of dropout in recurrent neural networks. In: Advances in neural information processing systems, pp 1019–1027

11.

Gal Y, Islam R, Ghahramani Z (2017) Deep bayesian active learning with image data. In: International conference on machine Learning, pp 1183–1192

12.

Huang Z, Xu W, Yu K (2015) Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991

13.

Kim S, Song Y, Kim K, Cha JW, Lee GG (2006) Mmr-based active machine learning for bio named entity recognition. In: Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers, pp 69–72. Association for Computational Linguistics

14.

Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of NAACL-HLT, pp 260–270

15.

Lewis DD, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Machine learning proceedings 1994, pp 148–156. Elsevier

16.

Li J, Sun A, Han J, Li C (2020) A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2020.2981314CrossRef

17.

Li S, Zhao Z, Hu R, Li W, Liu T, Du X (2018) Analogical reasoning on chinese morphological and semantic relations. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 2: Short Papers), pp 138–143

18.

Limsopatham N, Collier NH (2016) Bidirectional lstm for named entity recognition in twitter messages

19.

Lyu Z, Duolikun D, Dai B, Yao Y, Minervini P, Xiao TZ, Gal Y (2020) You need only uncertain answers: Data efficient multilingual question answering. In: TWorkshop on Uncertainty and Ro-Bustness in Deep Learning

20.

Marcheggiani D, Artières T (2014) An experimental comparison of active learning strategies for partially labeled sequences. In: EMNLP

21.

Mesnil G, He X, Deng L, Bengio Y (2013) Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech, pp 3771–3775

22.

Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

23.

Min K, Ma C, Zhao T, Li H (2015) Bosonnlp: An ensemble approach for word segmentation and pos tagging. In: Natural language processing and chinese computing, pp 520–526. Springer

24.

Nguyen TH, Sil A, Dinu G, Florian R (2016) Toward mention detection robustness with recurrent neural networks. arXiv preprint arXiv:1602.07749

25.

Peng N, Dredze M (2015) Named entity recognition for chinese social media with jointly trained embeddings. In: Processings of the conference on empirical methods in natural language processing (EMNLP), pp 548—554

26.

Peng N, Dredze M (2016) Improving named entity recognition for chinese social media with word segmentation representation learning. In: Proceedings of the 54th annual meeting of the association for computational linguistics (ACL), vol 2, pp 149–155

27.

Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

28.

Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365

29.

Qiu X, Qian P, Yin L, Wu S, Huang X (2015) Overview of the nlpcc 2015 shared task: Chinese word segmentation and pos tagging for micro-blog texts. In: Natural language processing and chinese computing, pp 541–549. Springer

30.

Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/researchcovers/languageunsupervised/language understanding paper.pdf

31.

Ritter A, Clark S, Mausam Etzioni O (2011) Named entity recognition in tweets: An experimental study. In: EMNLP

32.

Rosenstein MT, Marx Z, Kaelbling LP, Dietterich TG (2005) To transfer or not to transfer. In: NIPS 2005 workshop on transfer learning, vol 898, pp 1–4

33.

Scheffer T, Decomain C, Wrobel S (2001) Active hidden markov models for information extraction. In: International symposium on intelligent data analysis, pp 309–318. Springer

34.

Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, pp 1070–1079. Association for Computational Linguistics

35.

Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the fifth annual workshop on Computational learning theory, pp 287–294. ACM

36.

Shen Y, Yun H, Lipton ZC, Kronrod Y, Anandkumar A (2017) Deep active learning for named entity recognition. arXiv preprint arXiv:1707.05928

37.

Siddhant A, Lipton ZC (2018) Deep bayesian active learning for natural language processing: Results of a large-scale empirical study. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 2904–2909

38.

Strubell E, Verga P, Belanger D, McCallum A (2017) Fast and accurate entity recognition with iterated dilated convolutions. arXiv preprint arXiv:1702.02098

39.

Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp 142–147. https://aclanthology.org/W03-0419

40.

Vandoni J, Aldea E, Le Hégarat-Mascle S (2019) Evidential query-by-committee active learning for pedestrian detection in high-density crowds. Int J Approx Reason 104:166–184MathSciNetCrossRef

41.

Wei K, Iyer R, Bilmes J (2015) Submodularity in data subset selection and active learning. In: International conference on machine learning, pp 1954–1963

42.

Weischedel R, Pradhan S, Ramshaw L, Kaufman J, Franchini M, El-Bachouti M, Xue N, Palmer M, Hwang JD, Bonial C, et al (2012) Ontonotes release 5.0

43.

Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM (2020) Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pp 38–45. Association for Computational Linguistics, Online. https://www.aclweb.org/anthology/2020.emnlp-demos.6

44.

Yang Z, Salakhutdinov R, Cohen W (2016) Multi-task cross-lingual sequence tagging from scratch. arXiv preprint arXiv:1603.06270

45.

Zhang Y, Lan M (2021) A unified information extraction system based on role recognition and combination. In: CCF international conference on natural language processing and chinese computing, pp 447–459. Springer

Titel: LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition
verfasst von: Mingyi Liu
Zhiying Tu
Tong Zhang
Tonghua Su
Xiaofei Xu
Zhongjie Wang
Publikationsdatum: 12.01.2022
Verlag: Springer US
Erschienen in: Neural Processing Letters / Ausgabe 3/2022
Print ISSN: 1370-4621
Elektronische ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-021-10737-x

Neuer Inhalt

Bildnachweise

VDI-Icon, Profil Icon, inhalt2, Springer Professional Modul/© Springer Fachmedien Wiesbaden GmbH, Nachhaltigkeitsaward Key Visual/© Cometis AG/Global ESG Monitor | Daniel Rupp | Generiert mit KI, Search Icon, Banner Hanser, Arbeitszeit/© granata68 / Fotolia, E-Autos im Fuhrpark: Lohnt sich das noch?/© Petair / stock.adobe.com, Kryptowährungen/© gopixa / Getty Images / iStock, Zeitschrift Wissensmanagement Cover, PatentFit-Logo/© Springer Fachmedien Wiesbaden GmbH, Sustainibility Finance/© Robert Kneschke / stock.adobe.com / Springer Fachmedien Wiesbaden GmbH, Zukunftswerkstatt Sales Excellence 2024/© AndreyPopov / Getty Images / iStock, 2023_Antrieb/© supervisuell

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 3/2022

A Novel Chinese Points of Interest Classification Method Based on Weighted Quadratic Surface Support Vector Machine

Facial Expression Recognition Based on Depth Fusion and Discriminative Association Learning

Effects of Similarity Score Functions in Attention Mechanisms on the Performance of Neural Question Answering Systems

Fixed-Time Synchronization of Neural Networks with Parameter Uncertainties via Quantized Intermittent Control

Multi-scale Learning for Multimodal Neurophysiological Signals: Gait Pattern Classification as an Example

Non-Interlaced Dynamic Time Warping for Distance Between Matrixes

Neuer Inhalt

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.