Published in: Data Mining and Knowledge Discovery 2/2024

16-08-2023

Regression tree-based active learning

Authors: Ashna Jose, João Paulo Almeida de Mendonça, Emilie Devijver, Noël Jakse, Valérie Monbet, Roberta Poloni


Abstract

Machine learning algorithms often require large training sets to perform well, but labeling such large amounts of data is not always feasible: in many applications, substantial human effort and material cost are required. Finding effective ways to reduce the size of training sets while maintaining the same performance is therefore crucial: one wants to choose the best sample of fixed size to be labeled among a given population, aiming at an accurate prediction of the response. This challenge has been studied in detail for classification, but much less for regression, which is known to be a harder task for active learning despite its practical importance. The few model-free active learning methods that select new samples to label using only unlabeled data ignore the conditional distribution of the response given the features. In this paper, we propose a standard regression tree-based active learning method for regression that improves significantly upon existing active learning approaches. It provides impressive results for small and large training sets, with appreciably low variance across runs. We also exploit model-free approaches and adapt them to our algorithm to use the maximum available information. Through experiments on numerous benchmark datasets, we demonstrate that our framework improves on existing methods and is effective in learning a regression model from a very limited labeled dataset, reducing the sample size needed for a fixed level of performance, even with many features.
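The leaf-driven sampling idea behind tree-based active learning for regression can be sketched in a few lines. The following is a hedged toy illustration, not the authors' algorithm: the 1-D tree, the variance-based leaf scoring, and the median-candidate query rule are all simplifying assumptions chosen to keep the sketch self-contained.

```python
def sse(v):
    """Sum of squared errors of a list of labels around their mean."""
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v)

def partition(xs, ys, depth, lo=float("-inf"), hi=float("inf")):
    """Recursively split labeled 1-D data at the SSE-minimizing threshold;
    return leaves as (lo, hi, labels) intervals covering the whole line."""
    pairs = sorted(zip(xs, ys))
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    if depth == 0 or len(xs) < 4:
        return [(lo, hi, ys)]
    k = min(range(2, len(xs) - 1), key=lambda j: sse(ys[:j]) + sse(ys[j:]))
    t = (xs[k - 1] + xs[k]) / 2
    return (partition(xs[:k], ys[:k], depth - 1, lo, t)
            + partition(xs[k:], ys[k:], depth - 1, t, hi))

def next_query(leaves, pool):
    """Query an unlabeled point from the leaf with the largest
    within-leaf label variance (a crude uncertainty proxy)."""
    for lo, hi, labels in sorted(leaves, key=lambda l: sse(l[2]) / len(l[2]),
                                 reverse=True):
        candidates = [x for x in pool if lo < x <= hi]
        if candidates:
            return sorted(candidates)[len(candidates) // 2]
    return None

# Toy data: the response varies strongly for x < 5 and is flat beyond.
labeled_x = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0, 10.0, 11.0]
labeled_y = [0.1, 0.9, 4.2, 9.5, 0.2, 0.3, 0.25, 0.2]
pool = [1.5, 2.5, 8.5, 9.5]

leaves = partition(labeled_x, labeled_y, depth=1)
print(next_query(leaves, pool))  # → 2.5, a point in the high-variance region
```

The tree partitions the input space using the labels already collected, so the query budget concentrates where the response is hardest to predict, instead of being spread uniformly as in purely geometric, model-free sampling.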


Footnotes

1. These results were given for purely random Mondrian trees. Here, we use a regression tree that incorporates knowledge of the training set, thus improving the structure of the tree.

3. For a deeper understanding, we also illustrate our method on a generated multimodal dataset in Appendix 8.
Metadata
Title: Regression tree-based active learning
Authors: Ashna Jose, João Paulo Almeida de Mendonça, Emilie Devijver, Noël Jakse, Valérie Monbet, Roberta Poloni
Publication date: 16-08-2023
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 2/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-023-00951-7
