Published in: Progress in Artificial Intelligence 2/2015

01-03-2015 | Regular Paper

Optimizing different loss functions in multilabel classifications

Authors: Jorge Díez, Oscar Luaces, Juan José del Coz, Antonio Bahamonde


Abstract

Multilabel classification (ML) aims to assign a set of labels to each instance. This generalization of multiclass classification requires redefining the loss functions, and the learning tasks become harder. The objective of this paper is to gain insight into the relation between optimization aims and some of the most popular performance measures: subset (or 0/1) loss, Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners, each explicitly optimizing one of these measures within a common framework. This can be done by considering a subset of labels as a structured output and then using structured output support vector machines tailored to optimize a given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that in most cases the optimization of the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized with a set of binary classifiers, one per label, which makes it a scalable and fast method for learning ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, but the differences are very small. We have also noticed that the biggest room for improvement lies in optimizing an F-measure in noisy learning tasks.
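To make the three performance measures above concrete, the following minimal sketch (illustrative, not code from the paper) computes them for label sets represented as binary indicator matrices; the NumPy representation and function names are assumptions made here.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions that disagree, averaged over all examples and labels."""
    return np.mean(Y_true != Y_pred)

def subset_loss(Y_true, Y_pred):
    """0/1 loss on whole label sets: an example counts as wrong unless it matches exactly."""
    return np.mean(np.any(Y_true != Y_pred, axis=1))

def example_based_f1(Y_true, Y_pred):
    """F-measure per example, then averaged: F1 = 2|y ∩ ŷ| / (|y| + |ŷ|)."""
    intersection = np.sum(Y_true * Y_pred, axis=1)
    denom = np.sum(Y_true, axis=1) + np.sum(Y_pred, axis=1)
    # convention: F1 = 1 when both the true and predicted label sets are empty
    f1 = np.where(denom == 0, 1.0, 2 * intersection / np.maximum(denom, 1))
    return np.mean(f1)

# toy example: 3 instances, 4 labels (rows = instances, columns = labels)
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 1]])
print(hamming_loss(Y_true, Y_pred))      # 2 wrong bits out of 12 -> 0.1667
print(subset_loss(Y_true, Y_pred))       # 2 of 3 predictions not exact -> 0.6667
print(example_based_f1(Y_true, Y_pred))  # mean of per-example F1 scores
```

The Hamming loss decomposes over individual labels, which is why it can be minimized with one independent binary classifier per label (binary relevance); the subset loss and the example-based F-measure couple the labels of each example and therefore call for joint, structured prediction.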

Appendix
Footnotes
1. http://www.aic.uniovi.es/ml_generator/
Table 1 Cardinality and density statistics of the 48 noise-free datasets

              Cardinality   Density (%)
  50 Labels
     Max      4.3            9
     Min      2.5            5
     Mean     3.3            7
     SD       0.5            1
  25 Labels
     Max      4.3           17
     Min      2.4           10
     Mean     3.1           13
     SD       0.6            2
  10 Labels
     Max      4.0           40
     Min      1.8           18
     Mean     2.9           29
     SD       0.7            7

Datasets with Bernoulli and swap noise present similar figures.
Metadata
Title
Optimizing different loss functions in multilabel classifications
Authors
Jorge Díez
Oscar Luaces
Juan José del Coz
Antonio Bahamonde
Publication date
01-03-2015
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 2/2015
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-014-0060-7
