Published in: Knowledge and Information Systems 5/2021

11.03.2021 | Regular Paper

Cost-sensitive selection of variables by ensemble of model sequences

Authors: Donghui Yan, Zhiwei Qin, Songxiang Gu, Haiping Xu, Ming Shao


Abstract

Many applications require collecting data on different variables or measurements across many system performance metrics; we refer to these broadly as measures or variables. Data collection along each measure often incurs a cost, so it is desirable to account for the cost of measures in modeling. This is a fairly new class of problems in the area of cost-sensitive learning. A few attempts have been made to incorporate costs when combining and selecting measures, but existing studies either do not strictly enforce a budget constraint or are not the 'most' cost-effective. Focusing on classification problems, we propose a computationally efficient approach that finds a near-optimal model under a given budget by exploring the most 'promising' part of the solution space. Instead of outputting a single model, we produce a model schedule: a list of models sorted by model cost and expected predictive accuracy. This schedule can be used to choose the model with the best predictive accuracy under a given budget, or to trade off between budget and predictive accuracy. Experiments on benchmark datasets show that our approach compares favorably to competing methods.
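The "model schedule" idea from the abstract can be illustrated with a minimal sketch: given candidate models, each with a total variable-acquisition cost and an estimated predictive accuracy, keep only the non-dominated ones (no cheaper model is at least as accurate), sort them by cost, and select the most accurate model affordable under a given budget. All function names and numbers below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a cost/accuracy model schedule.
# A candidate model is a (cost, accuracy) pair; the paper's actual
# construction of candidates (ensemble of model sequences) is not shown.
from typing import List, Optional, Tuple

Model = Tuple[float, float]  # (cost, accuracy)

def build_schedule(models: List[Model]) -> List[Model]:
    """Sort candidates by cost and keep only non-dominated models:
    a model survives only if it is strictly more accurate than every
    cheaper model in the schedule."""
    schedule: List[Model] = []
    best_acc = -1.0
    for cost, acc in sorted(models):
        if acc > best_acc:  # strictly better than anything cheaper
            schedule.append((cost, acc))
            best_acc = acc
    return schedule

def pick_under_budget(schedule: List[Model], budget: float) -> Optional[Model]:
    """Return the most accurate model whose cost fits the budget,
    or None if nothing is affordable."""
    affordable = [m for m in schedule if m[0] <= budget]
    return max(affordable, key=lambda m: m[1]) if affordable else None

# Example with four illustrative candidates:
candidates = [(3.0, 0.80), (5.0, 0.78), (7.0, 0.91), (12.0, 0.95)]
schedule = build_schedule(candidates)
print(schedule)                          # [(3.0, 0.8), (7.0, 0.91), (12.0, 0.95)]
print(pick_under_budget(schedule, 8.0))  # (7.0, 0.91)
```

The dominated model (cost 5.0, accuracy 0.78) is pruned because a cheaper model is already more accurate; this mirrors the abstract's trade-off reading of the schedule, where each entry buys strictly more accuracy for strictly more budget.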


Metadata
Title
Cost-sensitive selection of variables by ensemble of model sequences
Authors
Donghui Yan
Zhiwei Qin
Songxiang Gu
Haiping Xu
Ming Shao
Publication date
11.03.2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 5/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01551-x
