Published in: Journal of Visualization 3/2022

18 November 2021 | Regular Paper

CNERVis: a visual diagnosis tool for Chinese named entity recognition

Authors: Pei-Shan Lo, Jian-Lin Wu, Syu-Ting Deng, Ko-Chih Wang

Abstract

Named entity recognition (NER) is a crucial early-stage task that identifies both the spans and the types of named entities, such as organizations, persons, locations, and times, in order to extract specific information from text. Today, deep learning approaches that capture contextual features achieve state-of-the-art NER performance. However, the complex structure of deep learning models makes them black boxes and limits researchers' ability to improve them. Unlike languages written in the Latin alphabet, Chinese (like some other languages, such as Korean and Japanese) has no explicit word boundaries. Preliminary steps such as word segmentation (WS) and part-of-speech (POS) tagging are therefore needed before the Chinese NER task, and their correctness strongly influences the final NER prediction. As a result, investigating model behavior for Chinese NER is more complicated and challenging. In this paper, we present CNERVis, a visual analysis tool that allows users to interactively inspect the WS-POS-NER pipeline and understand how and why an NER prediction is made. CNERVis also lets users load large amounts of test data and explore critical instances, which facilitates the analysis of large datasets. We demonstrate the tool's usability and effectiveness through case studies.
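
To make the dependence of NER on word segmentation concrete, the following is a minimal toy sketch in Python. It is not the model or pipeline used in CNERVis: the segmenter is a plain dictionary-based forward maximum matching routine, the tagger is a simple gazetteer lookup, the POS step is omitted, and the names `segment`, `tag_entities`, `VOCAB_A`, `VOCAB_B`, and `GAZETTEER` are hypothetical and introduced only for illustration. The sketch shows how two different segmentations of the same sentence can lead to different entity predictions.

```python
def segment(text, vocab, max_len=4):
    """Toy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def tag_entities(words, gazetteer):
    """Toy NER step: label a word only if it appears in the gazetteer."""
    return [(w, gazetteer[w]) for w in words if w in gazetteer]


# Hypothetical data for illustration only.
SENTENCE = "南京市长江大桥"  # "Nanjing Yangtze River Bridge"
GAZETTEER = {"南京市": "LOC", "南京": "LOC", "长江大桥": "LOC"}

# Dictionary A contains the word "南京市" (Nanjing City).
VOCAB_A = {"南京市", "南京", "市长", "长江", "大桥", "长江大桥"}
# Dictionary B lacks "南京市", so segmentation takes a different path.
VOCAB_B = {"南京", "市长", "长江", "大桥", "长江大桥"}

words_a = segment(SENTENCE, VOCAB_A)
words_b = segment(SENTENCE, VOCAB_B)

print(words_a, tag_entities(words_a, GAZETTEER))
print(words_b, tag_entities(words_b, GAZETTEER))
```

With dictionary A the sentence is split into two location names, while with dictionary B the segmenter produces "南京 / 市长 / 江 / 大桥" and the bridge is no longer recognized as an entity. This is the kind of upstream error whose effect on the final NER prediction a diagnosis tool needs to expose.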

Metadata
Title
CNERVis: a visual diagnosis tool for Chinese named entity recognition
Authors
Pei-Shan Lo
Jian-Lin Wu
Syu-Ting Deng
Ko-Chih Wang
Publication date
18 November 2021
Publisher
Springer Berlin Heidelberg
Published in
Journal of Visualization / Issue 3/2022
Print ISSN: 1343-8875
Electronic ISSN: 1875-8975
DOI
https://doi.org/10.1007/s12650-021-00799-3
