Skip to main content
Top
Published in: Data Mining and Knowledge Discovery 4/2021

06-05-2021

Efficient set-valued prediction in multi-class classification

Authors: Thomas Mortier, Marek Wydmuch, Krzysztof Dembczyński, Eyke Hüllermeier, Willem Waegeman

Published in: Data Mining and Knowledge Discovery | Issue 4/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with the highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility functions. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Practically, this augmentation is often omitted for simplicity.
 
2
Note that the candidate solutions in the set are sorted in decreasing order of conditional class probability.
 
3
The multi-label VOC datasets are transformed to multi-class by removing instances with more than one label.
 
Literature
go back to reference Babbar R, Schölkopf B (2017) Dismec: Distributed sparse machines for extreme multi-label classification. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, DOI 10(1145/3018661):3018741 Babbar R, Schölkopf B (2017) Dismec: Distributed sparse machines for extreme multi-label classification. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, DOI 10(1145/3018661):3018741
go back to reference Balasubramanian V, Ho S, Vovk V (eds) (2014) Conformal Prediction for Reliable Machine Learning: Theory. Morgan Kaufmann, Adaptations and Applications Balasubramanian V, Ho S, Vovk V (eds) (2014) Conformal Prediction for Reliable Machine Learning: Theory. Morgan Kaufmann, Adaptations and Applications
go back to reference Beygelzimer A, Langford J, Lifshits Y, Sorkin G, Strehl A (2009) Conditional probability tree estimation analysis and algorithms. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, Virginia, United States, UAI ’09, pp 51–58 Beygelzimer A, Langford J, Lifshits Y, Sorkin G, Strehl A (2009) Conditional probability tree estimation analysis and algorithms. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, Virginia, United States, UAI ’09, pp 51–58
go back to reference Bi W, Kwok J (2015) Bayes-optimal hierarchical multilabel classification. IEEE Trans Knowl Data Eng 27:1–1CrossRef Bi W, Kwok J (2015) Bayes-optimal hierarchical multilabel classification. IEEE Trans Knowl Data Eng 27:1–1CrossRef
go back to reference Corani G, Zaffalon M (2008) Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2. J Mach Learn Res 9:581–621MathSciNetMATH Corani G, Zaffalon M (2008) Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2. J Mach Learn Res 9:581–621MathSciNetMATH
go back to reference Corani G, Zaffalon M (2009) Lazy naive credal classifier. In: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data, ACM, pp 30–37 Corani G, Zaffalon M (2009) Lazy naive credal classifier. In: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data, ACM, pp 30–37
go back to reference Del Coz JJ, Díez J, Bahamonde A (2009) Learning nondeterministic classifiers. J Mach Learn Res 10:2273–2293MathSciNetMATH Del Coz JJ, Díez J, Bahamonde A (2009) Learning nondeterministic classifiers. J Mach Learn Res 10:2273–2293MathSciNetMATH
go back to reference Dembczyński K, Waegeman W, Cheng W, Hüllermeier E (2012) An analysis of chaining in multi-label classification. In: Proceedings of the European Conference on Artificial Intelligence Dembczyński K, Waegeman W, Cheng W, Hüllermeier E (2012) An analysis of chaining in multi-label classification. In: Proceedings of the European Conference on Artificial Intelligence
go back to reference Dembczyński K, Kotłowski W, Waegeman W, Busa-Fekete R, Hüllermeier E (2016) Consistency of probabilistic classifier trees. In: ECML/PKDD Dembczyński K, Kotłowski W, Waegeman W, Busa-Fekete R, Hüllermeier E (2016) Consistency of probabilistic classifier trees. In: ECML/PKDD
go back to reference Denis C, Hebiri M (2017) Confidence sets with expected sizes for multiclass classification. J Mach Learn Res 18:102–128MathSciNetMATH Denis C, Hebiri M (2017) Confidence sets with expected sizes for multiclass classification. J Mach Learn Res 18:102–128MathSciNetMATH
go back to reference Depeweg S, Hernández-Lobato JM, Doshi-Velez F, Udluft S (2018) Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. ICML, PMLR, Proceedings of Machine Learning Research 80:1192–1201 Depeweg S, Hernández-Lobato JM, Doshi-Velez F, Udluft S (2018) Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. ICML, PMLR, Proceedings of Machine Learning Research 80:1192–1201
go back to reference Everingham M, Eslami ASM, Gool LV, Williams CKI, Winn J, Zisserman A (2006) The pascal visual object classes challenge 2006 (VOC2006) results. Int J comput vision 111(1):98–136CrossRef Everingham M, Eslami ASM, Gool LV, Williams CKI, Winn J, Zisserman A (2006) The pascal visual object classes challenge 2006 (VOC2006) results. Int J comput vision 111(1):98–136CrossRef
go back to reference Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results
go back to reference Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH
go back to reference Fiannaca A, Paglia LL, Rosa ML, Bosco GL, Renda G, Rizzo R, Gaglio S, Urso A (2018) Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinformat 19:61–76CrossRef Fiannaca A, Paglia LL, Rosa ML, Bosco GL, Renda G, Rizzo R, Gaglio S, Urso A (2018) Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinformat 19:61–76CrossRef
go back to reference Fox J (1997) Applied regression analysis, linear models, and related methods. Sage, Fox J (1997) Applied regression analysis, linear models, and related methods. Sage,
go back to reference Frank E, Kramer S (2004) Ensembles of nested dichotomies for multi-class problems. In: Proceedings of the Twenty-first International Conference on Machine Learning, ACM, New York, NY, USA, ICML ’04, pp 39 Frank E, Kramer S (2004) Ensembles of nested dichotomies for multi-class problems. In: Proceedings of the Twenty-first International Conference on Machine Learning, ACM, New York, NY, USA, ICML ’04, pp 39
go back to reference Freitas A (2007) A tutorial on hierarchical classification with applications in bioinformatics. In: Research and Trends in Data Mining Technologies and Applications,, pp 175–208 Freitas A (2007) A tutorial on hierarchical classification with applications in bioinformatics. In: Research and Trends in Data Mining Technologies and Applications,, pp 175–208
go back to reference Geusebroek JM, Burghouts G, Smeulders A (2005) The amsterdam library of object images. Int J Comput Vision 61(1):103–112CrossRef Geusebroek JM, Burghouts G, Smeulders A (2005) The amsterdam library of object images. Int J Comput Vision 61(1):103–112CrossRef
go back to reference Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Tech Rep 7694, California Institute of Technology Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Tech Rep 7694, California Institute of Technology
go back to reference Jansche M (2007) A maximum expected utility framework for binary sequence labeling. In: Association for Computational Linguistics, pp 736–743 Jansche M (2007) A maximum expected utility framework for binary sequence labeling. In: Association for Computational Linguistics, pp 736–743
go back to reference Kendall A, Gal Y (2017) What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017. Long Beach, CA, USA, pp 5580–5590 Kendall A, Gal Y (2017) What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017. Long Beach, CA, USA, pp 5580–5590
go back to reference Li FF, Andreetto M, Ranzato MA (2003) Caltech101 image dataset. Tech. rep, California Institute of Technology Li FF, Andreetto M, Ranzato MA (2003) Caltech101 image dataset. Tech. rep, California Institute of Technology
go back to reference Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X (2018) Deepre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 34(5):760–769CrossRef Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X (2018) Deepre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 34(5):760–769CrossRef
go back to reference Malkov YA, Yashunin DA (2018) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence pp 1–1 Malkov YA, Yashunin DA (2018) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence pp 1–1
go back to reference Melnikov V, Hüllermeier E (2018) On the effectiveness of heuristics for learning nested dichotomies: an empirical analysis. Mach Learn 107(8–10):1537–1560MathSciNetCrossRef Melnikov V, Hüllermeier E (2018) On the effectiveness of heuristics for learning nested dichotomies: an empirical analysis. Mach Learn 107(8–10):1537–1560MathSciNetCrossRef
go back to reference Mena D, Montañés E, Quevedo JR, del Coz JJ (2017) A family of admissible heuristics for A* to perform inference in probabilistic classifier chains. Mach Learn 106(1):143–169MathSciNetCrossRef Mena D, Montañés E, Quevedo JR, del Coz JJ (2017) A family of admissible heuristics for A* to perform inference in probabilistic classifier chains. Mach Learn 106(1):143–169MathSciNetCrossRef
go back to reference Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in Neural Information Processing Systems 26, Curran Associates, Inc., pp 3111–3119 Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in Neural Information Processing Systems 26, Curran Associates, Inc., pp 3111–3119
go back to reference Morin F, Bengio Y (2005) Hierarchical probabilistic neural network language model. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics, pp 246–252 Morin F, Bengio Y (2005) Hierarchical probabilistic neural network language model. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics, pp 246–252
go back to reference Nguyen V, Destercke S, Masson M, Hüllermeier E (2018) Reliable multi-class classification based on pairwise epistemic and aleatoric uncertainty. In: IJCAI, ijcai.org, pp 5089–5095 Nguyen V, Destercke S, Masson M, Hüllermeier E (2018) Reliable multi-class classification based on pairwise epistemic and aleatoric uncertainty. In: IJCAI, ijcai.org, pp 5089–5095
go back to reference Oh S (2017) Top-k hierarchical classification. In: AAAI, AAAI Press, pp 2450–2456 Oh S (2017) Top-k hierarchical classification. In: AAAI, AAAI Press, pp 2450–2456
go back to reference Papadopoulos H (2008) Inductive conformal prediction: theory and application to neural networks. Tools Artif Intel 18(2):315–330 Papadopoulos H (2008) Inductive conformal prediction: theory and application to neural networks. Tools Artif Intel 18(2):315–330
go back to reference Partalas I, Kosmopoulos A, Baskiotis N, Artières T, Paliouras G, Gaussier É, Androutsopoulos I, Amini M, Gallinari P (2015) LSHTC: A benchmark for large-scale text classification. CoRR arXiv:1503.08581 Partalas I, Kosmopoulos A, Baskiotis N, Artières T, Paliouras G, Gaussier É, Androutsopoulos I, Amini M, Gallinari P (2015) LSHTC: A benchmark for large-scale text classification. CoRR arXiv:​1503.​08581
go back to reference Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch. In: NIPS-W Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch. In: NIPS-W
go back to reference Prabhu Y, Varma M (2014) Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In: KDD Prabhu Y, Varma M (2014) Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In: KDD
go back to reference Prabhu Y, Kag A, Harsola S, Agrawal R, Varma M (2018) Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In: Proceedings of the International World Wide Web Conference Prabhu Y, Kag A, Harsola S, Agrawal R, Varma M (2018) Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In: Proceedings of the International World Wide Web Conference
go back to reference Rahimi A, Recht B (2008) Random features for large-scale kernel machines. Adv Neural Inform Process Syst 20:1177–1184 Rahimi A, Recht B (2008) Random features for large-scale kernel machines. Adv Neural Inform Process Syst 20:1177–1184
go back to reference Ramaswamy HG, Tewari A, Agarwal S (2015) Consistent algorithms for multiclass classification with a reject option. CoRR arXiv:5050.4137 Ramaswamy HG, Tewari A, Agarwal S (2015) Consistent algorithms for multiclass classification with a reject option. CoRR arXiv:​5050.​4137
go back to reference Rangwala H, Naik A (2017) Large scale hierarchical classification: foundations, algorithms and applications. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Rangwala H, Naik A (2017) Large scale hierarchical classification: foundations, algorithms and applications. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
go back to reference Senge R, Bösner S, Dembczyénski K, Haasenritter J, Hirsch O, Donner-Banzhoff N, Hüllermeier E (2014) Reliable classification: Learning classifiers that distinguish aleatoric and epistemic uncertainty. Inf Sci 255:16–29MathSciNetCrossRef Senge R, Bösner S, Dembczyénski K, Haasenritter J, Hirsch O, Donner-Banzhoff N, Hüllermeier E (2014) Reliable classification: Learning classifiers that distinguish aleatoric and epistemic uncertainty. Inf Sci 255:16–29MathSciNetCrossRef
go back to reference Shafer G, Vovk V (2008) A tutorial on conformal prediction. J Mach Learn Res 9:371–421 Shafer G, Vovk V (2008) A tutorial on conformal prediction. J Mach Learn Res 9:371–421
go back to reference Shrivastava A, Li P (2014) Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, MIT Press, Cambridge, MA, USA, NIPS’14, pp 2321–2329 Shrivastava A, Li P (2014) Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, MIT Press, Cambridge, MA, USA, NIPS’14, pp 2321–2329
go back to reference Vovk V, Gammerman A, Shafer G (2003) Algorithmic Learning in a Random World. Springer-Verlag, Vovk V, Gammerman A, Shafer G (2003) Algorithmic Learning in a Random World. Springer-Verlag,
go back to reference Waegeman W, Dembczyński K, Jachnik A, Cheng W, Hüllermeier E (2014) On the Bayes-optimality of F-measure maximizers. J Mach Learn Res 15:3333–3388MathSciNetMATH Waegeman W, Dembczyński K, Jachnik A, Cheng W, Hüllermeier E (2014) On the Bayes-optimality of F-measure maximizers. J Mach Learn Res 15:3333–3388MathSciNetMATH
go back to reference Yagnik J, Strelow D, Ross DA, sung Lin R (2011) The power of comparative reasoning. In: 2011 International Conference on Computer Vision, pp 2431–2438 Yagnik J, Strelow D, Ross DA, sung Lin R (2011) The power of comparative reasoning. In: 2011 International Conference on Computer Vision, pp 2431–2438
go back to reference Yang G, Destercke S, Masson MH (2017a) Cautious classification with nested dichotomies and imprecise probabilities. Soft Comput 21:7447–7462CrossRef Yang G, Destercke S, Masson MH (2017a) Cautious classification with nested dichotomies and imprecise probabilities. Soft Comput 21:7447–7462CrossRef
go back to reference Yang G, Destercke S, Masson MH (2017b) The costs of indeterminacy: how to determine them? IEEE Transact Cybernet 47:4316–4327CrossRef Yang G, Destercke S, Masson MH (2017b) The costs of indeterminacy: how to determine them? IEEE Transact Cybernet 47:4316–4327CrossRef
go back to reference Ye N, Chai K, Lee WS, Chieu HL (2012) Optimizing f-measures: a tale of two approaches. In: Proceedings of the International Conference on Machine Learning Ye N, Chai K, Lee WS, Chieu HL (2012) Optimizing f-measures: a tale of two approaches. In: Proceedings of the International Conference on Machine Learning
go back to reference Zaffalon M, Giorgio C, Mauá DD (2012) Evaluating credal classifiers by utility-discounted predictive accuracy. Int J Approx Reasoning 53:1282–1301MathSciNetCrossRef Zaffalon M, Giorgio C, Mauá DD (2012) Evaluating credal classifiers by utility-discounted predictive accuracy. Int J Approx Reasoning 53:1282–1301MathSciNetCrossRef
go back to reference Ziyin L, Wang Z, Liang PP, Salakhutdinov R, Morency LP, Ueda M (2019) Deep gamblers: Learning to abstain with portfolio theory. arXiv:1907.00208 Ziyin L, Wang Z, Liang PP, Salakhutdinov R, Morency LP, Ueda M (2019) Deep gamblers: Learning to abstain with portfolio theory. arXiv:​1907.​00208
Metadata
Title
Efficient set-valued prediction in multi-class classification
Authors
Thomas Mortier
Marek Wydmuch
Krzysztof Dembczyński
Eyke Hüllermeier
Willem Waegeman
Publication date
06-05-2021
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 4/2021
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-021-00751-x

Other articles of this Issue 4/2021

Data Mining and Knowledge Discovery 4/2021 Go to the issue

Premium Partner