Pattern Recognition

Volume 46, Issue 3, March 2013, Pages 817-833

Tree ensembles for predicting structured outputs

https://doi.org/10.1016/j.patcog.2012.09.023

Abstract

In this paper, we address the task of learning models for predicting structured outputs. We consider both global and local prediction of structured outputs, the former based on a single model that predicts the entire output structure and the latter based on a collection of models, each predicting a component of the output structure. We apply ensemble methods in this context and propose to build ensembles of predictive clustering trees, which generalize classification trees and have been used for predicting different types of structured outputs, both locally and globally. More specifically, we develop methods for learning two types of ensembles (bagging and random forests) of predictive clustering trees for global and local prediction of different types of structured outputs. The types of outputs considered correspond to different predictive modeling tasks: multi-target regression, multi-target classification, and hierarchical multi-label classification. Each of these combinations can be applied in the context of either global prediction (producing a single ensemble) or local prediction (producing a collection of ensembles). We conduct an extensive experimental evaluation across a range of benchmark datasets for each of the three types of structured outputs. We compare ensembles for global and local prediction, as well as single trees for global prediction and tree collections for local prediction, both in terms of predictive performance and in terms of efficiency (running times and model complexity). The results show that both global and local tree ensembles perform better than their single-model counterparts in terms of predictive power. Global and local tree ensembles perform equally well, with global ensembles being more efficient, producing smaller models, and needing fewer trees in the ensemble to achieve maximal performance.

Highlights

► Tree ensembles for efficient local and global predictions of structured outputs.
► Considered multi-target prediction and hierarchical multi-label classification.
► Theoretical and empirical evaluations over a variety of benchmark domains.
► Local and global tree ensembles perform equally well, while global ones are more efficient.

Introduction

Supervised learning is one of the most widely researched areas of machine learning. The goal in supervised learning is to learn, from a set of examples with known class, a function that outputs a prediction for the class of a previously unseen example. If the examples belong to one of two classes (e.g., the example has some property or not), the task is called binary classification. The task where each example belongs to a single class from a given set of m classes (m ≥ 3) is known as multi-class classification. The case where the output is a real value is called regression.

However, in many real-life predictive modeling problems the output (i.e., the target) is structured, meaning that there can be dependencies between classes (e.g., classes organized into a tree-shaped hierarchy or a directed acyclic graph) or internal relations between the classes (e.g., sequences). Such problems occur in domains such as the life sciences (predicting gene function, finding the most important genes for a given disease, predicting the toxicity of molecules, etc.), ecology (analysis of remotely sensed data, habitat modeling), multimedia (annotation and retrieval of images and videos) and the semantic web (categorization and analysis of text and web pages). With the needs of these application domains and the increasing quantities of structured data in mind, Yang and Wu [1] and Kriegel et al. [2] listed the task of “mining complex knowledge from complex data” as one of the most challenging problems in machine learning.

A variety of methods, each specialized for predicting a given type of structured output (e.g., a hierarchy of classes [3]), have been proposed [4]. These methods can be categorized into two groups [4], [3]: (1) local methods that predict component(s) of the output and then combine the individual models to obtain the overall model and (2) global methods that predict the complete structure as a whole (also known as ‘big-bang’ approaches). Global methods have several advantages over local methods. First, they exploit the dependencies that exist between the components of the structured output during model learning, which can result in better predictive performance. Second, they are typically more efficient: the number of components in the output can be very large (e.g., hierarchies in functional genomics can have several thousand components), in which case running a basic method for each component is not feasible. Furthermore, they produce models that are typically smaller than the sum of the sizes of the models built for each of the components.
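
To make the local/global distinction concrete, the following minimal sketch contrasts the two approaches for multi-target regression. It uses scikit-learn decision trees as stand-ins for the predictive models discussed here (an illustrative assumption: this is not the authors' PCT implementation, and the data are synthetic):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                # descriptive attributes
    Y = np.column_stack([X[:, 0] + X[:, 1],      # target 1
                         X[:, 0] - X[:, 2]])     # target 2 (targets share structure)

    # Global approach: a single model predicts the whole output tuple at once.
    global_model = DecisionTreeRegressor(random_state=0).fit(X, Y)

    # Local approach: one model per component of the output, combined afterwards.
    local_models = [DecisionTreeRegressor(random_state=0).fit(X, Y[:, j])
                    for j in range(Y.shape[1])]

    x_new = rng.normal(size=(1, 5))
    y_global = global_model.predict(x_new)                           # shape (1, 2)
    y_local = np.column_stack([m.predict(x_new) for m in local_models])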

The predictive models that we consider in this paper are predictive clustering trees (PCTs). PCTs belong to the group of global methods. PCTs offer a unifying approach for dealing with different types of structured outputs and construct the predictive models very efficiently. They are able to make predictions for several types of structured outputs: tuples of continuous/discrete variables, hierarchies of classes, and time series. More details about the PCT framework can be found in [5], [6], [7], [8], [9].

PCTs can be considered a generalization of standard decision trees towards predicting structured outputs. Although they offer easily interpretable trees with good predictive performance, they inherit the stability issues of decision trees [10]. Namely, a change in just a few training examples can sometimes drastically change the structure of the tree. Breiman [11] states that unstable predictive models (such as decision trees) can be combined into an ensemble to improve their predictive performance. An ensemble is a set of (base) predictive models whose outputs are combined. For basic classification and regression tasks, it is widely accepted that an ensemble lifts the predictive performance of its base predictive models [12]. However, for the task of predicting structured outputs, this issue has not been thoroughly investigated. Moreover, in the case where the base predictive models are decision trees, Bauer and Kohavi [13] conclude that the ensemble's increase in performance is stronger if the trees are unpruned, i.e., allowed to overfit. On the other hand, Blockeel et al. [14] state that PCTs for structured outputs show less overfitting than trees for classification of a single target variable. Given these two conflicting influences, it is not obvious whether an ensemble of predictive clustering trees can significantly increase the predictive performance over that of a single predictive clustering tree.
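
For reference, bagging [11] amounts to learning each base tree on a bootstrap replicate of the training set and combining the (here, multi-target) predictions by averaging. A minimal sketch, again with scikit-learn trees as hypothetical stand-ins for PCTs (note that scikit-learn trees are unpruned by default):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def bagging_fit(X, Y, n_trees=50, seed=0):
        """Learn n_trees unpruned trees, each on a bootstrap replicate of (X, Y)."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)   # sample n examples with replacement
            trees.append(DecisionTreeRegressor().fit(X[idx], Y[idx]))
        return trees

    def bagging_predict(trees, X):
        """Combine the base models by averaging their per-target predictions."""
        return np.mean([t.predict(X) for t in trees], axis=0)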

Furthermore, in an ensemble learning setting, it is not clear whether the predictive performance of an ensemble of global predictive models will be better or worse than that of a combination of ensembles of local predictive models. Generally, an ensemble is known to perform better than its base learner if the base learner is accurate and diverse [15]. While the superior predictive performance of global models has been shown before [8], less is known about their diversity or instability (i.e., whether they produce different errors with small changes to the training data). A PCT for predicting structured outputs, especially in the case of hierarchical classification, is expected to be more stable than a PCT for predicting components of the output. It is also not clear which approach will be more efficient, both in terms of running time and size of the predictive models.

In this paper, we investigate the aforementioned questions. We use bagging and random forests as ensemble learning methods, since they are the two most widely used ensemble learning methods in the context of decision trees [16]. We consider two structured output machine learning tasks: predicting multiple target variables and hierarchical multi-label classification. We perform an extensive empirical evaluation of the proposed methods over a variety of benchmark datasets.
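
Random forests differ from bagging in that, at each node, only a random subset of the descriptive attributes is considered when selecting a split. A minimal multi-target sketch using scikit-learn's forest as a stand-in (the subset size "sqrt" below is an illustrative assumption, not necessarily the parameter setting used in the paper):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] * X[:, 2]])   # two targets

    # max_features controls the size of the random attribute subset per split.
    forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                                   random_state=0).fit(X, Y)
    print(forest.predict(X[:3]))   # one prediction per target, per example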

This paper builds on our previous work, namely Kocev et al. [7] and Schietgat et al. [17]. Kocev et al. [7] conducted an initial comparison of different ensemble schemes using predictive clustering trees in the context of predicting multiple targets. Schietgat et al. [17] introduced bagging of PCTs for hierarchical multi-label classification (HMC) in the context of functional genomics. The present paper extends this previous work [7], [17] in the following directions:

  • The tasks of predicting multiple target variables and hierarchical multi-label classification are discussed jointly. Formal definitions of the considered tasks are provided, as well as an extensive discussion of the related work.

  • The experimental evaluation is performed on a much wider selection of datasets, from various domains. The performance of ensembles with different numbers of base predictive models is evaluated and saturation curves are provided.

  • A better methodology for performance comparisons is used and a more detailed discussion of the results is presented. Friedman tests combined with the Nemenyi post hoc test are used to compare the different ensemble schemes, instead of pairwise comparisons (see the sketch after this list).

  • Global random forests of PCTs for HMC, as well as local ensembles (both bagging and random forests) of PCTs for HMC, are introduced and evaluated.

  • A computational complexity analysis of the proposed methods is performed, which is consistent with the empirical evidence. Global tree ensembles are most efficient, especially random forests, and are scalable to large datasets.
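
As referenced in the list above, the comparison methodology follows the Friedman test with the Nemenyi post hoc analysis. A sketch of that procedure on a hypothetical score matrix (datasets as rows, algorithms as columns; the q value below is the standard alpha = 0.05 studentized-range constant for four algorithms):

    import numpy as np
    from scipy.stats import friedmanchisquare, rankdata

    # Hypothetical performance scores: one row per dataset, one column per algorithm.
    scores = np.array([[0.81, 0.83, 0.80, 0.84],
                       [0.70, 0.74, 0.69, 0.75],
                       [0.92, 0.93, 0.90, 0.94],
                       [0.66, 0.69, 0.65, 0.70],
                       [0.58, 0.61, 0.57, 0.62]])

    # Friedman test: do the algorithms' average ranks differ significantly?
    stat, p = friedmanchisquare(*scores.T)

    # Nemenyi post hoc: two algorithms differ significantly if their average
    # ranks differ by more than the critical distance CD = q * sqrt(k(k+1)/(6N)).
    N, k = scores.shape
    avg_ranks = np.mean([rankdata(-row) for row in scores], axis=0)  # rank 1 = best
    q_alpha = 2.569                       # q_{0.05} for k = 4 algorithms
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))
    print(p, avg_ranks, cd)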

The remainder of this paper is organized as follows. In Section 2, the considered machine learning tasks are formally defined. Section 3 first explains the predictive clustering trees framework and the extensions for predicting multiple targets and hierarchical multi-label classification. It then describes the ensemble methods and their extension for predicting structured outputs. The design of the experiments, the descriptions of the datasets, the evaluation measures and the parameter settings for the algorithms are given in Section 4. Section 5 presents and discusses the obtained results. The related work is presented in Section 6. Finally, the conclusions are stated in Section 7.

Section snippets

Machine learning tasks

The work presented in this paper concerns the learning of ensembles for predicting structured outputs. First, we formally describe the machine learning tasks that we consider here: predicting multiple targets and hierarchical multi-label classification. We follow the suggestions by Džeroski et al. [18] and Džeroski [86], where predictive modeling is defined for arbitrary types of input and output data. In particular, we describe the tasks of predicting multiple targets and hierarchical
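
In hierarchical multi-label classification, each example can belong to several classes at once, and membership in a class implies membership in all of its superclasses. The toy hierarchy and vector encoding below are illustrative assumptions, sketching how an HMC label set can be represented:

    # Toy class hierarchy: child -> parent (tree-shaped; for a DAG, map to a set).
    hierarchy = {
        "1": None, "2": None,
        "1.1": "1", "1.2": "1", "2.1": "2",
        "2.1.1": "2.1",
    }

    def close_upwards(labels):
        """Add all ancestors of each label (the hierarchy constraint)."""
        closed = set()
        for c in labels:
            while c is not None:
                closed.add(c)
                c = hierarchy[c]
        return closed

    classes = sorted(hierarchy)   # fixed ordering of all classes

    def to_vector(labels):
        """Encode a label set as a 0/1 vector over all classes in the hierarchy."""
        full = close_upwards(labels)
        return [int(c in full) for c in classes]

    print(to_vector({"2.1.1"}))   # also switches on "2.1" and "2"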

Tree ensembles for predicting structured outputs

In this section, we present our ensemble methods for predicting structured outputs. We begin by presenting predictive clustering trees and their instantiations for predicting multiple continuous variables, predicting multiple discrete variables, and hierarchical multi-label classification. Next, we describe how ensemble learning methods can be adapted to use predictive clustering trees as base predictive models. Finally, we describe an approach to the prediction of structured outputs that uses
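
The core of PCT induction is a top-down search that selects, at each node, the test maximizing the reduction of intra-cluster variance in the target space; for multiple continuous targets, the variance of a cluster is summed over the target variables. A minimal sketch of this heuristic (equal target weights are an assumption here; the PCT framework allows normalization and weighting choices):

    import numpy as np

    def cluster_variance(Y):
        """Variance of a set of examples, summed over the target variables."""
        return float(np.sum(np.var(Y, axis=0))) if len(Y) else 0.0

    def variance_reduction(Y, mask):
        """Score a binary test: Var(S) - sum_i |S_i|/|S| * Var(S_i)."""
        n = len(Y)
        left, right = Y[mask], Y[~mask]
        return cluster_variance(Y) - (len(left) / n * cluster_variance(left) +
                                      len(right) / n * cluster_variance(right))

    # Usage: among all candidate attribute/threshold tests, a PCT-style learner
    # would pick the one with the largest variance reduction.
    Y = np.array([[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.2, 5.9]])
    mask = np.array([True, True, False, False])
    print(variance_reduction(Y, mask))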

Experimental design

In this section, we describe the procedure for the experimental evaluation of the proposed ensemble methods for predicting structured outputs. First, we state the questions we consider. Next, we present the datasets used to evaluate the algorithms and the evaluation measures we apply. In the last subsection, we give the parameter values used in the algorithms and the statistical tests that we use.

Results and discussion

The results from the experiments can be analyzed along several dimensions. First, we present the saturation curves of the ensemble methods (both for predicting the structured outputs and the components of the outputs). We also compare single trees vs. ensembles of trees. Next, we compare models that predict the complete structured output vs. models that predict components of the structured output. Finally, we evaluate the algorithms by their efficiency in terms of running time and model size.

Related work

The task of predicting structured outputs is gaining more and more attention within the machine learning research community [4], [3]. The community has proposed a number of different methods for addressing this task. However, they are typically “computationally demanding and ill-suited for dealing with large datasets” [4]. In this paper, we have proposed a global method for predicting structured outputs that has good predictive performance and is very efficient: it scales linearly with the size

Conclusions

In this paper, we address the task of learning predictive models for structured outputs, i.e., models that take as input a tuple of attribute values and produce a structured object as output. In contrast to standard classification and regression, where the output is a single scalar value, in structured output learning the output is a data structure, such as a tuple or a directed acyclic graph. We consider both global and local prediction of structured outputs, the former based on a single model that

Acknowledgment

We would like to thank Valentin Gjorgjioski for the discussions concerning the computational complexity of the proposed methods. Celine Vens is a post-doctoral fellow of the Research Fund—Flanders (FWO—Vlaanderen). The work of Dragi Kocev and Sašo Džeroski was supported by the Slovenian Research Agency (Grants P2-0103 and J2-2285), the European Commission (Grants ICT-2010-266722 and ICT-2011-287713), and Operation no. OP13.1.1.2.02.0005 financed by the European Regional Development Fund (85%) and

References (86)

  • J. Díez et al.

    A semi-dependent decomposition approach to learn hierarchical classifiers

    Pattern Recognition

    (2010)
  • Q. Yang et al.

    10 challenging problems in data mining research

    International Journal of Information Technology & Decision Making

    (2006)
  • H.-P. Kriegel et al.

    Future trends in data mining

    Data Mining and Knowledge Discovery

    (2007)
  • C. Silla et al.

    A survey of hierarchical classification across different application domains

    Data Mining and Knowledge Discovery

    (2011)
  • G.H. Bakır, T. Hofmann, B. Schölkopf, A.J. Smola, B. Taskar, S.V.N. Vishwanathan, Predicting Structured Data, Neural...
  • H. Blockeel, L.D. Raedt, J. Ramon, Top-down induction of clustering trees, in: Proceedings of the 15th International...
  • J. Struyf, S. Džeroski, Constraint based induction of multi-objective regression trees, in: Proceedings of the 4th...
  • D. Kocev, C. Vens, J. Struyf, S. Džeroski, Ensembles of multi-objective decision trees, in: ECML '07: Proceedings of...
  • C. Vens et al.

    Decision trees for hierarchical multi-label classification

    Machine Learning

    (2008)
  • I. Slavkov et al.

    Finding explained groups of time-course gene expression profiles with predictive clustering trees

    Molecular Biosystems

    (2010)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)
  • L. Breiman

    Bagging predictors

    Machine Learning

    (1996)
  • G. Seni et al.

    Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions

    (2010)
  • E. Bauer et al.

    An empirical comparison of voting classification algorithms: bagging, boosting, and variants

    Machine Learning

    (1999)
  • H. Blockeel, L. Schietgat, J. Struyf, S. Džeroski, A. Clare, Decision trees for hierarchical multilabel classification:...
  • L.K. Hansen et al.

    Neural network ensembles

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1990)
  • L. Schietgat et al.

    Predicting gene function using hierarchical multi-label decision tree ensembles

    BMC Bioinformatics

    (2010)
  • S. Džeroski, V. Gjorgjioski, I. Slavkov, J. Struyf, Analysis of time series data with predictive clustering trees, in:...
  • G. Tsoumakas et al.

    Multi-label classification: an overview

    International Journal of Data Warehousing and Mining

    (2007)
  • P. Langley, Elements of Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,...
  • H. Blockeel et al.

    Efficient algorithms for decision tree cross-validation

    Journal of Machine Learning Research

    (2002)
  • J.R. Quinlan

    C4.5: Programs for Machine Learning

    (1993)
  • A. Clare, Machine Learning and Data Mining for Yeast Functional Genomics, Ph.D. Thesis, University of Wales...
  • L. Kuncheva

    Combining Pattern Classifiers: Methods and Algorithms

    (2004)
  • Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the 13th International...
  • L. Breiman

    Using iterated bagging to debias regressions

    Machine Learning

    (2001)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • T.K. Ho

    The random subspace method for constructing decision forests

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1998)
  • T.G. Dietterich, Ensemble methods in machine learning, in: Proceedings of the 1st International Workshop on Multiple...
  • I.H. Witten et al.

    Data Mining: Practical Machine Learning Tools and Techniques

    (2005)
  • Intel, Intel® SSE4 Programming Reference, D91561-003 Edition,...
  • T. Gärtner et al.

    On structured output training: hard cases and an efficient alternative

    Machine Learning

    (2009)
  • A. Karalič, First Order Regression, Ph.D. Thesis, Faculty of Computer Science, University of Ljubljana, Ljubljana,...

    Dragi Kocev received his Ph.D. degree in computer science from the Jozef Stefan International Postgraduate School (IPS), Ljubljana, in 2011. He is currently a post-doctoral researcher at the Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia. His research interests include machine learning and data mining, prediction of structured outputs and their applications in environmental and life sciences.

    Celine Vens received her Ph.D. in applied sciences from the Katholieke Universiteit Leuven, Belgium. She is a post-doctoral fellow of the Research Foundation—Flanders (FWO) and she is working at the Declarative Languages and Artificial Intelligence Research Group at the Department of Computer Science of the K.U. Leuven University in Leuven, Belgium. Her research interests include machine learning, (relational) data mining, and applications thereof in bioinformatics and other domains.

    Jan Struyf received his Ph.D. in applied sciences from the Katholieke Universiteit Leuven, Belgium. He is a former post-doctoral researcher of the Declarative Languages and Artificial Intelligence Research Group at the Department of Computer Science of the K.U. Leuven University in Leuven, Belgium. His research interests include machine learning, statistical relational data mining, predictive clustering, structured output learning, and inductive databases.

    Sašo Džeroski received his Ph.D. degree in computer science from the University of Ljubljana in 1995. He is currently a scientific councilor at the Department of Knowledge Technologies, Jozef Stefan Institute, and the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, both in Ljubljana, Slovenia. He is also an associate professor at the Jozef Stefan International Postgraduate School, also in Ljubljana. His research interests include data mining and machine learning and their applications in environmental sciences (ecology) and life sciences (biomedicine). He is an ECCAI fellow, member of the executive board of SLAIS, member of ACM SIGKDD and IEEE.
