
2019 | OriginalPaper | Chapter

How Many Labels? Determining the Number of Labels in Multi-Label Text Classification


Abstract

Multi-Label Text Classification (MLTC) is a supervised machine learning task whose goal is to learn a classifier that assigns multiple labels to text documents. When all documents have the same number of labels, the task is very close to ordinary (single-label) text classification. When this number varies, however, an additional classifier is needed to determine how many labels to assign to each document. This additional classifier is the topic of this paper. We compare several baselines to a system that learns a dynamic threshold for a given text classifier. The thresholding classifier takes as input the ranked list of label scores for a document and returns a threshold score; all labels scoring above this threshold are assigned to the document. Our results show, first, that this dynamic thresholding significantly improves recall while achieving the same precision as a static system that assigns the same number of labels (the mean) to every document, and second, that the accuracy of predicting the number of labels is positively related to the quality of the text classifier, measured by mean average precision (MAP).
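
The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the pieces it names: the static baseline that gives every document the mean number of labels, a per-document threshold predicted from the ranked score list, precision/recall over the assigned label sets, and average precision (whose mean over documents is MAP). It is not the authors' implementation. Several choices are assumptions made for brevity: the training target for the threshold (the cut in the ranking that minimises the symmetric difference with the gold labels, placed halfway between neighbouring scores), the k-nearest-neighbour regressor standing in for the thresholding classifier, and rounding the mean label count. All function and class names are hypothetical.

```python
import math
from statistics import mean

def ranked_labels(scores):
    """Labels of one document ordered by descending classifier score."""
    return sorted(scores, key=scores.get, reverse=True)

def sorted_scores(scores):
    """Feature vector for the thresholding model: the ranked list of scores."""
    return [scores[label] for label in ranked_labels(scores)]

def static_top_k(scores, k):
    """Static baseline: assign the same number k of top-ranked labels to every document."""
    return set(ranked_labels(scores)[:k])

def target_threshold(scores, gold):
    """Assumed training target: cut the ranking where the symmetric difference
    between predicted and gold label sets is smallest, then place the threshold
    halfway between the last kept score and the first dropped score."""
    ranked = ranked_labels(scores)
    s = [scores[label] for label in ranked]
    best_k, best_err = 0, math.inf
    for k in range(len(ranked) + 1):
        err = len(set(ranked[:k]) ^ gold)
        if err < best_err:
            best_k, best_err = k, err
    upper = s[best_k - 1] if best_k > 0 else s[0] + 1.0   # k = 0: keep nothing
    lower = s[best_k] if best_k < len(s) else s[-1] - 1.0  # k = all: keep everything
    return (upper + lower) / 2

class KNNThreshold:
    """Hypothetical stand-in for the thresholding classifier: predicts a threshold
    as the mean target threshold of the k nearest training documents, using
    Euclidean distance between ranked score vectors."""

    def __init__(self, k=3):
        self.k = k
        self.X, self.y = [], []

    def fit(self, score_dicts, gold_sets):
        self.X = [sorted_scores(s) for s in score_dicts]
        self.y = [target_threshold(s, g) for s, g in zip(score_dicts, gold_sets)]
        return self

    def predict(self, scores):
        x = sorted_scores(scores)
        nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))[: self.k]
        return mean(y for _, y in nearest)

def dynamic_assign(model, scores):
    """Dynamic thresholding: assign every label whose score exceeds the predicted threshold."""
    t = model.predict(scores)
    return {label for label, score in scores.items() if score > t}

def micro_precision_recall(predicted, gold):
    """Micro-averaged precision and recall over all documents' label sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    return (tp / n_pred if n_pred else 0.0, tp / n_gold if n_gold else 0.0)

def average_precision(scores, gold):
    """Average precision of one document's label ranking; MAP is its mean over documents."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels(scores), start=1):
        if label in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0

# Toy usage with made-up scores for four labels (not data from the paper).
train_scores = [
    {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1},
    {"a": 0.95, "b": 0.1, "c": 0.05, "d": 0.02},
    {"a": 0.7, "b": 0.6, "c": 0.55, "d": 0.1},
]
train_gold = [{"a", "b"}, {"a"}, {"a", "b", "c"}]
test_scores = [
    {"a": 0.92, "b": 0.15, "c": 0.08, "d": 0.03},
    {"a": 0.85, "b": 0.75, "c": 0.15, "d": 0.05},
]
test_gold = [{"a"}, {"a", "b"}]

k_static = round(mean(len(g) for g in train_gold))  # mean label count, rounded (assumption)
model = KNNThreshold(k=1).fit(train_scores, train_gold)
static_preds = [static_top_k(s, k_static) for s in test_scores]
dynamic_preds = [dynamic_assign(model, s) for s in test_scores]
```

Because the threshold is predicted per document, the number of assigned labels can differ between documents, whereas the static baseline always returns `k_static` labels. The toy numbers only exercise the code and say nothing about the precision and recall reported in the paper.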


Metadata
Title
How Many Labels? Determining the Number of Labels in Multi-Label Text Classification
Authors
Hosein Azarbonyad
Maarten Marx
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28577-7_11
