Published in: Data Mining and Knowledge Discovery 2/2024

16-08-2023

Regression tree-based active learning

Authors: Ashna Jose, João Paulo Almeida de Mendonça, Emilie Devijver, Noël Jakse, Valérie Monbet, Roberta Poloni


Abstract

Machine learning algorithms often require large training sets to perform well, but labeling such large amounts of data is not always feasible: in many applications, substantial human effort and material cost are required. Finding effective ways to reduce the size of training sets while maintaining the same performance is therefore crucial: one wants to choose the best sample of fixed size to be labeled among a given population, aiming at an accurate prediction of the response. This challenge has been studied in detail for classification, but much less for regression, which is known to be a harder task for active learning despite its practical importance. The few model-free active learning methods that select new samples to label using only unlabeled data ignore the conditional distribution of the response given the features. In this paper, we propose a standard regression tree-based active learning method for regression that improves significantly upon existing active learning approaches. It provides impressive results for small and large training sets, with appreciably low variance across runs. We also exploit model-free approaches and adapt them to our algorithm to use the maximum available information. Through experiments on numerous benchmark datasets, we demonstrate that our framework improves on existing methods and is effective in learning a regression model from a very limited labeled dataset, reducing the sample size needed for a fixed level of performance, even with many features.
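The leaf-driven sampling idea behind tree-based active learning for regression can be sketched in a few lines. The following is a hedged toy illustration, not the authors' algorithm: the 1-D tree, the variance-based leaf scoring, and the median-candidate query rule are all simplifying assumptions chosen to keep the sketch self-contained.

```python
def sse(v):
    """Sum of squared errors of a list of labels around their mean."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v)

def partition(xs, ys, depth, lo=float("-inf"), hi=float("inf")):
    """Recursively split labeled 1-D data at the SSE-minimizing threshold;
    return leaves as (lo, hi, labels) intervals covering the whole line."""
    pairs = sorted(zip(xs, ys))
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    if depth == 0 or len(xs) < 4:
        return [(lo, hi, ys)]
    k = min(range(2, len(xs) - 1), key=lambda j: sse(ys[:j]) + sse(ys[j:]))
    t = (xs[k - 1] + xs[k]) / 2
    return (partition(xs[:k], ys[:k], depth - 1, lo, t)
            + partition(xs[k:], ys[k:], depth - 1, t, hi))

def next_query(leaves, pool):
    """Query an unlabeled point from the leaf with the largest
    within-leaf label variance (a crude uncertainty proxy)."""
    for lo, hi, labels in sorted(leaves, key=lambda l: sse(l[2]) / len(l[2]),
                                 reverse=True):
        candidates = [x for x in pool if lo < x <= hi]
        if candidates:
            return sorted(candidates)[len(candidates) // 2]
    return None

# Toy data: the response varies strongly for x < 5 and is flat beyond.
labeled_x = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0, 10.0, 11.0]
labeled_y = [0.1, 0.9, 4.2, 9.5, 0.2, 0.3, 0.25, 0.2]
pool = [1.5, 2.5, 8.5, 9.5]

leaves = partition(labeled_x, labeled_y, depth=1)
print(next_query(leaves, pool))  # → 2.5, a point in the high-variance region
```

The tree partitions the input space using the labels already collected, so the query budget concentrates where the response is hardest to predict, instead of being spread uniformly as in purely geometric, model-free sampling.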


Footnotes

1. These results were given for purely random Mondrian trees. Here, we use a regression tree that incorporates knowledge of the training set, thus improving the structure of the tree.

3. For a deeper understanding, we also illustrate our method on a generated multimodal dataset in Appendix 8.
Metadata
Title: Regression tree-based active learning
Authors: Ashna Jose, João Paulo Almeida de Mendonça, Emilie Devijver, Noël Jakse, Valérie Monbet, Roberta Poloni
Publication date: 16-08-2023
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 2/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-023-00951-7
