Published in: Neural Processing Letters 1/2021

01-01-2021

Label-Embedding Bi-directional Attentive Model for Multi-label Text Classification

Authors: Naiyin Liu, Qianlong Wang, Jiangtao Ren

Abstract

Multi-label text classification is a critical task in the field of natural language processing. BERT, a recent language representation model, achieves new state-of-the-art results on classification tasks. Nevertheless, BERT's text classification framework does not make full use of token-level text representations or label embeddings, since it only uses the final hidden state corresponding to the [CLS] token as the sequence-level text representation for classification. We assume that finer-grained token-level text representations and label embeddings contribute to classification. Consequently, in this paper, we propose a Label-Embedding Bi-directional Attentive model that improves on BERT's text classification framework. In particular, we extend BERT's text classification framework with label embeddings and bi-directional attention. Experimental results on five datasets indicate that our model yields notable improvements over both baselines and state-of-the-art models.
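
To make the idea concrete, below is a minimal PyTorch sketch of how token-level BERT outputs could be combined with trainable label embeddings through attention in both directions (label-to-text and text-to-label) to score each label independently. The module name LabelEmbeddingBiAttention, the per-label linear scorer, and the pooling scheme are illustrative assumptions for this sketch, not the architecture described in the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LabelEmbeddingBiAttention(nn.Module):
        """Illustrative head that couples token-level text representations with
        trainable label embeddings via two-way attention, then scores every label
        for multi-label classification. A sketch of the general idea only."""

        def __init__(self, hidden_size: int, num_labels: int):
            super().__init__()
            # One trainable embedding vector per label (assumption: the paper may
            # instead derive these from label names or descriptions).
            self.label_emb = nn.Parameter(0.02 * torch.randn(num_labels, hidden_size))
            self.scorer = nn.Linear(2 * hidden_size, 1)

        def forward(self, token_states, attention_mask):
            # token_states: (batch, seq_len, hidden) token-level encoder outputs
            # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
            sim = torch.einsum("bsh,lh->bsl", token_states, self.label_emb)
            pad = attention_mask.unsqueeze(-1) == 0  # (batch, seq_len, 1)

            # Label-to-text attention: each label attends over the (non-padding)
            # tokens and collects a label-specific summary of the text.
            alpha = F.softmax(sim.masked_fill(pad, float("-inf")), dim=1)
            label_to_text = torch.einsum("bsl,bsh->blh", alpha, token_states)

            # Text-to-label attention: each token attends over the labels; the
            # resulting vectors are mean-pooled over real tokens into one summary.
            beta = F.softmax(sim, dim=2)
            text_to_label = torch.einsum("bsl,lh->bsh", beta, self.label_emb)
            mask = attention_mask.unsqueeze(-1).float()
            pooled = (text_to_label * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

            # Score each label from its attended text summary plus the pooled view.
            features = torch.cat(
                [label_to_text, pooled.unsqueeze(1).expand_as(label_to_text)], dim=-1
            )
            return self.scorer(features).squeeze(-1)  # (batch, num_labels) logits

    # Toy usage with random tensors standing in for BERT's token-level outputs.
    batch, seq_len, hidden, num_labels = 2, 16, 768, 5
    head = LabelEmbeddingBiAttention(hidden, num_labels)
    logits = head(torch.randn(batch, seq_len, hidden),
                  torch.ones(batch, seq_len, dtype=torch.long))
    print(torch.sigmoid(logits).shape)  # torch.Size([2, 5])

For multi-label training, the logits would typically be passed through a sigmoid and optimized with a binary cross-entropy loss (e.g. torch.nn.BCEWithLogitsLoss), so that each label is an independent yes/no decision.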

Metadata
Title
Label-Embedding Bi-directional Attentive Model for Multi-label Text Classification
Authors
Naiyin Liu
Qianlong Wang
Jiangtao Ren
Publication date
01-01-2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-020-10411-8
