
2016 | OriginalPaper | Book chapter

3. Case Studies and Metrics

Authors: Francisco Herrera, Francisco Charte, Antonio J. Rivera, María J. del Jesus

Published in: Multilabel Classification

Publisher: Springer International Publishing


Abstract

Multilabel classification (MLC) techniques have been applied to many real-world problems over the last two decades. Each application represents a different case study for MLC, relying on one or more multilabel datasets (MLDs). After the general overview provided in Sect. 3.1, this chapter briefly describes in Sect. 3.2 the case studies most commonly found in the literature. This yields a complete list of available MLDs, which are then used in Sect. 3.3 to explain and apply the usual characterization metrics. Sect. 3.4 details a practical use case, running a simple MLC algorithm over a few MLDs. Lastly, Sect. 3.5 introduces the usual performance evaluation metrics for MLC and uses them to analyze the results of this experiment.
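The abstract only names the characterization metrics. As an illustration, two metrics widely used in the MLC literature to characterize an MLD are label cardinality (the mean number of relevant labels per instance) and label density (cardinality divided by the number of labels q). The minimal Python sketch below computes both, assuming the label part of an MLD is held as a list of 0/1 rows. The function names and toy data are hypothetical and are not taken from the chapter or from any specific library.

```python
from typing import List

# Rows are instances, columns are labels (1 = relevant, 0 = not relevant).
LabelMatrix = List[List[int]]


def label_cardinality(y: LabelMatrix) -> float:
    """Average number of relevant labels per instance."""
    return sum(sum(row) for row in y) / len(y)


def label_density(y: LabelMatrix) -> float:
    """Cardinality normalized by the total number of labels q."""
    q = len(y[0])
    return label_cardinality(y) / q


if __name__ == "__main__":
    # Hypothetical toy MLD: 4 instances, 3 labels, 7 relevant labels in total.
    y = [
        [1, 0, 1],
        [0, 1, 0],
        [1, 1, 1],
        [0, 0, 1],
    ]
    print(label_cardinality(y))  # 7 / 4 = 1.75
    print(label_density(y))      # 1.75 / 3, roughly 0.583
```

Density makes MLDs with very different numbers of labels comparable, whereas cardinality alone grows with q.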


Footnotes
1
All datasets are available at RUMDR (R Ultimate Multilabel Dataset Repository) [10], from which they can be downloaded and exported to several file formats.
 
2
The main file formats, all derived from the ARFF format used by WEKA, the differences among them, and how to use each one are detailed in Chap. 9.
 
5
Additional information about how these MLDs were produced, including the software used to produce them, can be found at http://www.ke.tu-darmstadt.de/resources/eurlex.
 
21
The values of metrics for which lower is better, such as HammingLoss, OneError, and RankingLoss, have been complemented (replaced by their difference from 1) so that better values are consistently assigned a larger area.
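To illustrate the transformation described in footnote 21, the minimal Python sketch below computes Hamming loss under its usual definition (the fraction of instance-label pairs predicted incorrectly) and then complements it as 1 minus its value. It assumes 0/1 label matrices, and the helper names are hypothetical. The same complement applies to OneError and RankingLoss, since they also range over [0, 1] with lower values being better.

```python
from typing import List

LabelMatrix = List[List[int]]  # rows = instances, columns = labels (0/1)


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of instance-label pairs predicted incorrectly (lower is better)."""
    n, q = len(y_true), len(y_true[0])
    mismatches = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return mismatches / (n * q)


def complemented(value: float) -> float:
    """Map a lower-is-better metric in [0, 1] to a higher-is-better one."""
    return 1.0 - value


if __name__ == "__main__":
    # Hypothetical ground truth and predictions: 2 instances, 3 labels.
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_pred = [[1, 1, 1], [0, 1, 1]]
    hl = hamming_loss(y_true, y_pred)   # 2 wrong pairs out of 6
    print(round(hl, 4))                 # 0.3333
    print(round(complemented(hl), 4))   # 0.6667
```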
 
22
Note that ML-kNN does not produce a true label ranking as its prediction, but a bipartition of the labels into relevant and irrelevant ones. The ranking is derived from the posterior probabilities computed for each label. Since emotions has so few labels, many of these probabilities may be tied, so some ranking positions may be determined at random.
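The behaviour described in footnote 22 can be illustrated with a minimal Python sketch: labels are sorted by their posterior probabilities, and tied labels end up in a random relative order. The sketch assumes the posteriors of relevance are given as a label-to-probability mapping and that a label is predicted relevant when its posterior exceeds 0.5, which corresponds to comparing the posteriors of relevance and irrelevance as in ML-kNN; that rule is stated here as an assumption. The function names, the threshold parameter, and the probabilities are hypothetical.

```python
import random
from typing import Dict, List, Optional


def rank_labels(posteriors: Dict[str, float],
                rng: Optional[random.Random] = None) -> List[str]:
    """Return labels ordered from most to least probable.

    Labels with equal posterior probability are placed in random order.
    """
    if rng is None:
        rng = random.Random()
    labels = list(posteriors)
    # Shuffle first; Python's sort is stable (also with reverse=True), so
    # tied labels keep the random relative order produced here.
    rng.shuffle(labels)
    return sorted(labels, key=lambda label: posteriors[label], reverse=True)


def bipartition(posteriors: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels predicted as relevant (hypothetical 0.5 threshold)."""
    return [label for label, p in posteriors.items() if p > threshold]


if __name__ == "__main__":
    # Hypothetical posteriors for one test instance with few labels:
    # B, C and E are tied, so their relative order depends on the seed.
    posteriors = {"A": 0.8, "B": 0.3, "C": 0.3, "D": 0.1, "E": 0.3, "F": 0.6}
    print(bipartition(posteriors))  # ['A', 'F']
    for seed in range(3):
        print(rank_labels(posteriors, random.Random(seed)))
```

Only the positions of the tied labels vary across seeds; the bipartition itself is unaffected, which is why ranking-based metrics can fluctuate for an MLD like emotions while the predicted label sets do not.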
 
References

[1] Aha, D.W. (ed.): Lazy Learning. Springer (1997)

[2] Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)

[5] Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M., Jordan, M.I.: Matching words and pictures. J. Mach. Learn. Res. 3, 1107–1135 (2003)

[6] Boutell, M., Luo, J., Shen, X., Brown, C.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)

[7] Briggs, F., Lakshminarayanan, B., Neal, L., Fern, X.Z., Raich, R., Hadley, S.J.K., Hadley, A.S., Betts, M.G.: Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach. J. Acoust. Soc. Am. 131(6), 4640–4650 (2012)

[8] Cesa-Bianchi, N., Gentile, C., Zaniboni, L.: Incremental algorithms for hierarchical classification. J. Mach. Learn. Res. 7, 31–54 (2006)

[10] Charte, F., Charte, D., Rivera, A.J., del Jesus, M.J., Herrera, F.: R Ultimate multilabel dataset repository. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 487–499. Springer (2016)

[11] Charte, F., Rivera, A., del Jesus, M.J., Herrera, F.: LI-MLC: a label inference methodology for addressing high dimensionality in the label space for multilabel classification. IEEE Trans. Neural Netw. Learn. Syst. 25(10), 1842–1854 (2014)

[13] Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Concurrence among imbalanced labels and its influence on multilabel resampling algorithms. In: Proceedings of 9th International Conference on Hybrid Artificial Intelligent Systems, HAIS’14, vol. 8480. Springer (2014)

[14] Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163, 3–16 (2015)

[15] Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: QUINTA: a question tagging assistant to improve the answering ratio in electronic forums. In: Proceedings of IEEE International Conference on Computer as a Tool, EUROCON’15, pp. 1–6. IEEE (2015)

[16] Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: On the impact of dataset complexity and sampling strategy in multilabel classifiers performance. In: Proceedings of 11th International Conference on Hybrid Artificial Intelligent Systems, HAIS’16, vol. 9648, pp. 500–511. Springer (2016)

[17] Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of 8th ACM International Conference on Image and Video Retrieval, CIVR’09, pp. 48:1–48:9. ACM (2009)

[18] Crammer, K., Dredze, M., Ganchev, K., Talukdar, P.P., Carroll, S.: Automatic code assignment to medical text. In: Proceedings of Workshop on Biological, Translational, and Clinical Language Processing, BioNLP’07, pp. 129–136. Association for Computational Linguistics (2007)

[19] Diplaris, S., Tsoumakas, G., Mitkas, P., Vlahavas, I.: Protein classification with multiple algorithms. In: Proceedings of 10th Panhellenic Conference on Informatics, PCI’05, vol. 3746, pp. 448–456. Springer (2005)

[20] Duygulu, P., Barnard, K., de Freitas, J., Forsyth, D.: Object recognition as machine translation: learning a Lexicon for a fixed image vocabulary. In: Proceedings of 7th European Conference on Computer Vision, ECCV’02, vol. 2353, pp. 97–112. Springer (2002)

[21] Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems, vol. 14, pp. 681–687. MIT Press (2001)

[22] Ghamrawi, N., McCallum, A.: Collective multi-label classification. In: Proceedings of 14th ACM International Conference on Information and Knowledge Management, CIKM’05, pp. 195–200. ACM (2005)

[23] Godbole, S., Sarawagi, S.: Discriminative methods for multi-labeled classification. Adv. Knowl. Discov. Data Min. 3056, 22–30 (2004)

[24] Gonçalves, E.C., Plastino, A., Freitas, A.A.: A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In: Proceedings of 25th IEEE International Conference on Tools with Artificial Intelligence, ICTAI’13, pp. 469–476. IEEE (2013)

[25] Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of 10th European Conference on Machine Learning, ECML’98, pp. 137–142. Springer (1998)

[26] Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD’08, pp. 75–83 (2008)

[27] Klimt, B., Yang, Y.: The Enron corpus: a new dataset for email classification research. In: Proceedings of 15th European Conference on Machine Learning, ECML’04, pp. 217–226. Springer (2004)

[28] Lang, K.: NewsWeeder: learning to filter netnews. In: Proceedings of 12th International Conference on Machine Learning, ML’95, pp. 331–339 (1995)

[29] Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

[30] Mencia, E.L., Fürnkranz, J.: Efficient pairwise multilabel classification for large-scale problems in the legal domain. In: Proceedings of 11th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD’08, pp. 50–65. Springer (2008)

[31] Read, J.: Scalable multi-label classification. Ph.D. thesis, University of Waikato (2010)

[32] Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85, 333–359 (2011)

[34] Schapire, R.E., Singer, Y.: BoosTexter: a boosting-based system for text categorization. Mach. Learn. 39(2–3), 135–168 (2000)

[35] Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.M.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of 14th ACM International Conference on Multimedia, MULTIMEDIA’06, pp. 421–430 (2006)

[36] Spyromitros-Xioufis, E., Papadopoulos, S., Kompatsiaris, I.Y., Tsoumakas, G., Vlahavas, I.: A comprehensive study over VLAD and product quantization in large-scale image retrieval. IEEE Trans. Multimedia 16(6), 1713–1728 (2014)

[37] Srivastava, A.N., Zane-Ulman, B.: Discovering recurring anomalies in text reports regarding complex space systems. In: Aerospace Conference, pp. 3853–3862. IEEE (2005)

[38] Tomás, J.T., Spolaôr, N., Cherman, E.A., Monard, M.C.: A framework to generate synthetic multi-label datasets. Electron. Notes Theoret. Comput. Sci. 302, 155–176 (2014)

[39] Tsoumakas, G., Katakis, I.: Multi-label classification: an overview. Int. J. Data Warehous. Min. 3(3), 1–13 (2007)

[40] Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of ECML/PKDD Workshop on Mining Multidimensional Data, MMD’08, pp. 30–44 (2008)

[41] Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer (2010)

[42] Tsoumakas, G., Vlahavas, I.: Random k-Labelsets: an ensemble method for multilabel classification. In: Proceedings of 18th European Conference on Machine Learning, ECML’07, vol. 4701, pp. 406–417. Springer (2007)

[44] Turnbull, D., Barrington, L., Torres, D., Lanckriet, G.: Semantic annotation and retrieval of music and sound effects. IEEE Trans. Audio Speech Lang. Process. 16(2), 467–476 (2008)

[45] Turner, M.D., Chakrabarti, C., Jones, T.B., Xu, J.F., Fox, P.T., Luger, G.F., Laird, A.R., Turner, J.A.: Automated annotation of functional imaging experiments via multi-label classification. Front. Neurosci. 7 (2013)

[46] Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)

[47] Ueda, N., Saito, K.: Parametric mixture models for multi-labeled text. In: Proceedings of 15th Annual Conference on Neural Information Processing Systems, NIPS’02, pp. 721–728 (2002)

[48] Wieczorkowska, A., Synak, P., Raś, Z.: Multi-label classification of emotions in music. In: Intelligent Information Processing and Web Mining, AISC, vol. 35, chap. 30, pp. 307–315 (2006)

[49] Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)
Metadata

Copyright year: 2016

DOI: https://doi.org/10.1007/978-3-319-41111-8_3
