Tree ensembles for predicting structured outputs
Highlights
- Tree ensembles for efficient local and global prediction of structured outputs.
- Multi-target prediction and hierarchical multi-label classification are considered.
- Theoretical and empirical evaluations over a variety of benchmark domains.
- Local and global tree ensembles perform equally well, while global ones are more efficient.
Introduction
Supervised learning is one of the most widely researched areas of machine learning. The goal in supervised learning is to learn, from a set of examples with known classes, a function that predicts the class of a previously unseen example. If the examples belong to one of two classes (e.g., the example either has some property or does not), the task is called binary classification. The task where each example belongs to a single class from a given set of m classes (m > 2) is known as multi-class classification. The case where the output is a real value is called regression.
However, in many real-life predictive modeling problems the output (i.e., the target) is structured, meaning that there can be dependencies between classes (e.g., classes organized into a tree-shaped hierarchy or a directed acyclic graph) or internal relations between the classes (e.g., sequences). These types of problems occur in domains such as life sciences (predicting gene function, finding the most important genes for a given disease, predicting toxicity of molecules, etc.), ecology (analysis of remotely sensed data, habitat modeling), multimedia (annotation and retrieval of images and videos) and the semantic web (categorization and analysis of text and web pages). Having in mind the needs of these application domains and the increasing quantities of structured data, Yang and Wu [1] and Kriegel et al. [2] listed the task of “mining complex knowledge from complex data” as one of the most challenging problems in machine learning.
A variety of methods, specialized in predicting a given type of structured output (e.g., a hierarchy of classes [3]), have been proposed [4]. These methods can be categorized into two groups of methods for solving the problem of predicting structured outputs [4], [3]: (1) local methods that predict component(s) of the output and then combine the individual models to get the overall model and (2) global methods that predict the complete structure as a whole (also known as ‘big-bang’ approaches). The global methods have several advantages over the local methods. First, they exploit and use the dependencies that exist between the components of the structured output in the model learning phase, which can result in better predictive performance. Next, they are typically more efficient: it can easily happen that the number of components in the output is very large (e.g., hierarchies in functional genomics can have several thousands of components), in which case executing a basic method for each component is not feasible. Furthermore, they produce models that are typically smaller than the sum of the sizes of the models built for each of the components.
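The local/global distinction above can be made concrete with a small sketch. The code below is illustrative only, using scikit-learn decision trees on synthetic data as stand-ins for the paper's methods: the local approach fits one single-target model per output component, while the global approach fits a single model over the whole output tuple.

```python
# Contrast local vs. global prediction of a multi-target output.
# Illustrative sketch: scikit-learn trees on synthetic data, not the
# predictive clustering trees (PCTs) described in the paper.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Two correlated targets: a global model can exploit their dependency.
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200)])

# Local approach: one single-target model per component of the output.
local_models = [DecisionTreeRegressor(random_state=0).fit(X, Y[:, j])
                for j in range(Y.shape[1])]
local_pred = np.column_stack([m.predict(X) for m in local_models])

# Global approach: a single model predicting the whole output tuple.
global_model = DecisionTreeRegressor(random_state=0).fit(X, Y)
global_pred = global_model.predict(X)
```

Note the efficiency argument from the text: the local approach trains as many models as there are output components, while the global approach trains one, which matters when the output has thousands of components.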
The predictive models that we consider in this paper are predictive clustering trees (PCTs). PCTs belong to the group of global methods. PCTs offer a unifying approach for dealing with different types of structured outputs and construct the predictive models very efficiently. They are able to make predictions for several types of structured outputs: tuples of continuous/discrete variables, hierarchies of classes, and time series. More details about the PCT framework can be found in [5], [6], [7], [8], [9].
PCTs can be considered a generalization of standard decision trees towards predicting structured outputs. Although they offer easily interpretable trees with good predictive performance, they inherit the stability issues of decision trees [10]: a change in just a few training examples can sometimes drastically change the structure of the tree. Breiman [11] states that unstable predictive models (such as decision trees) can be combined into an ensemble to improve their predictive performance. An ensemble is a set of (base) predictive models whose outputs are combined. For basic classification and regression tasks, it is widely accepted that an ensemble lifts the predictive performance of its base predictive models [12]. However, for the task of predicting structured outputs, this issue has not been thoroughly investigated. Moreover, in the case where the base predictive models are decision trees, Bauer and Kohavi [13] conclude that the ensemble's increase in performance is stronger if the trees are unpruned, i.e., allowed to overfit. On the other hand, Blockeel et al. [14] state that PCTs for structured outputs show less overfitting than trees for classification of a single target variable. Given these two conflicting influences, it is not obvious whether an ensemble of predictive clustering trees can significantly increase predictive performance over that of a single predictive clustering tree.
Furthermore, in an ensemble learning setting, it is not clear if the predictive performance of an ensemble of global predictive models will be better or worse than the predictive performance of a combination of ensembles of local predictive models. Generally, an ensemble is known to perform better than its base learner if the base learner is accurate and diverse [15]. While the superior predictive performance of global models has been shown before [8], less is known about their diversity or instability (i.e., whether they produce different errors with small changes to the training data). It is expected that a PCT for predicting structured outputs, especially in the case of hierarchical classification, is less unstable than a PCT for predicting components of the output. It is also not clear which approach will be more efficient, both in terms of running time and size of the predictive models.
In this paper, we investigate the aforementioned questions. We use bagging and random forests as ensemble learning methods, since they are the two most widely used ensemble learning methods in the context of decision trees [16]. We consider two structured output machine learning tasks: predicting multiple target variables and hierarchical multi-label classification. We perform an extensive empirical evaluation of the proposed methods over a variety of benchmark datasets.
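The two ensemble methods used in the evaluation can be sketched in a few lines. The code below is a hedged illustration with scikit-learn trees standing in for PCTs: bagging trains each base tree on a bootstrap replicate and averages the per-target predictions; restricting the candidate features at each split (the `max_features` parameter) turns bagging into a random forest.

```python
# Sketch of bagging for multi-target prediction: each base tree is trained
# on a bootstrap replicate; structured predictions are averaged per target.
# scikit-learn trees are stand-ins for PCTs; max_features mimics the
# random feature subset selection that makes the ensemble a random forest.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, Y, n_trees=25, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        tree = DecisionTreeRegressor(max_features=max_features)
        models.append(tree.fit(X[idx], Y[idx]))
    return models

def predict(models, X):
    # Component-wise average of the per-tree structured predictions.
    return np.mean([m.predict(X) for m in models], axis=0)
```

With `max_features=None` this is bagging; setting it to, e.g., the square root of the number of descriptive attributes gives a random forest, which is also cheaper to train since fewer splits are evaluated per node.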
The paper is based on our previous work by Kocev et al. [7] and Schietgat et al. [17]. Kocev et al. [7] conducted an initial comparison of different ensemble schemes using predictive clustering trees in the context of predicting multiple targets. Schietgat et al. [17] introduced bagging of PCTs for hierarchical multi-label classification (HMC) in the context of functional genomics. The present paper extends the previous work [7], [17] in the following directions:
- The tasks of predicting multiple target variables and hierarchical multi-label classification are discussed jointly. Formal definitions of the considered tasks are provided, as well as an extensive discussion of the related work.
- The experimental evaluation is performed on a much wider selection of datasets, from various domains. The performance of ensembles with different numbers of base predictive models is evaluated and saturation curves are provided.
- A better methodology for performance comparisons is used and a more detailed discussion of the results is presented. Friedman tests combined with a Nemenyi post hoc test are used to compare different ensemble schemes, instead of performing pairwise comparisons.
- Global random forests of PCTs for HMC are introduced and evaluated. Local ensembles (both bagging and random forests) of PCTs for HMC are introduced and evaluated.
- A computational complexity analysis of the proposed methods is performed, which is consistent with the empirical evidence. Global tree ensembles are the most efficient, especially random forests, and are scalable to large datasets.
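Since several of the extensions above concern hierarchical multi-label classification, the defining hierarchy constraint is worth making explicit: an example labeled with a class is implicitly labeled with all of that class's ancestors, so a class's predicted score should not exceed its parent's. The sketch below illustrates that constraint on predicted scores; it is an assumption-laden illustration (dict-based toy representation, top-down clipping), not the paper's exact algorithm.

```python
# Illustrative sketch of the hierarchy constraint in hierarchical
# multi-label classification: clip each class's score to its parent's,
# processing classes top-down so ancestors are fixed first.
def enforce_hierarchy(scores, parent):
    """scores: dict class -> score; parent: dict class -> parent or None."""
    def depth(c):
        d = 0
        while parent[c] is not None:
            c, d = parent[c], d + 1
        return d
    out = dict(scores)
    for c in sorted(scores, key=depth):  # top-down over the hierarchy
        if parent[c] is not None:
            out[c] = min(out[c], out[parent[c]])
    return out
```

For tree-shaped hierarchies this top-down pass suffices; for directed acyclic graphs a class has several parents and the clipping would take the minimum over all of them.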
The remainder of this paper is organized as follows. In Section 2, the considered machine learning tasks are formally defined. Section 3 first explains the predictive clustering trees framework and the extensions for predicting multiple targets and hierarchical multi-label classification. It then describes the ensemble methods and their extension for predicting structured outputs. The design of the experiments, the descriptions of the datasets, the evaluation measures and the parameter settings for the algorithms are given in Section 4. Section 5 presents and discusses the obtained results. The related work is presented in Section 6. Finally, the conclusions are stated in Section 7.
Section snippets
Machine learning tasks
The work presented in this paper concerns the learning of ensembles for predicting structured outputs. First, we formally describe the machine learning tasks that we consider here: predicting multiple targets and hierarchical multi-label classification. We follow the suggestions by Džeroski et al. [18] and Džeroski [86], where predictive modeling is defined for arbitrary types of input and output data. In particular, we describe the tasks of predicting multiple targets and hierarchical multi-label classification.
Tree ensembles for predicting structured outputs
In this section, we present our ensemble methods for predicting structured outputs. We begin by presenting predictive clustering trees and their instantiations for predicting multiple continuous variables, predicting multiple discrete variables, and hierarchical multi-label classification. Next, we describe how ensemble learning methods can be adapted to use predictive clustering trees as base predictive models. Finally, we describe an approach to the prediction of structured outputs that uses
Experimental design
In this section, we describe the procedure for experimental evaluation of the proposed ensemble methods for predicting structured outputs. First, we state the questions we consider. Next, we present the datasets we use to evaluate the algorithms, and then the evaluation measures we applied. In the last subsection, we give the parameter values used in the algorithms and the statistical tests that we used.
Results and discussion
The results from the experiments can be analyzed along several dimensions. First, we present the saturation curves of the ensemble methods (both for predicting the structured outputs and the components of the outputs). We also compare single trees vs. ensembles of trees. Next, we compare models that predict the complete structured output vs. models that predict components of the structured output. Finally, we evaluate the algorithms by their efficiency in terms of running time and model size.
Related work
The task of predicting structured outputs is gaining more and more attention within the machine learning research community [4], [3]. The community has proposed a number of different methods for addressing this task. However, they are typically “computationally demanding and ill-suited for dealing with large datasets” [4]. In this paper, we have proposed a global method for predicting structured outputs that has good predictive performance and is very efficient: it scales linearly with the size
Conclusions
In this paper, we address the task of learning predictive models for structured output learning, which take as input a tuple of attribute values and produce a structured object. In contrast to standard classification and regression, where the output is a single scalar value, in structured output learning the output is a data structure, such as a tuple or a directed acyclic graph. We consider both global and local prediction of structured outputs, the former based on a single model that
Acknowledgment
We would like to thank Valentin Gjorgjioski for the discussions concerning the computational complexity of the proposed methods. Celine Vens is a post-doctoral fellow of the Research Fund—Flanders (FWO—Vlaanderen). The work of Dragi Kocev and Sašo Džeroski was supported by the Slovenian Research Agency (Grants P2-0103 and J2-2285), the European Commission (Grants ICT-2010-266722 and ICT-2011-287713), and Operation no. OP13.1.1.2.02.0005 financed by the European Regional Development Fund (85%) and
References (86)
- et al., Mining data with random forests: a survey and results of new tests, Pattern Recognition (2011)
- et al., Using multi-objective classification to model communities of soil, Ecological Modelling (2006)
- et al., Controlling the diversity in classifier ensembles through a measure of agreement, Pattern Recognition (2005)
- et al., Application of machine learning techniques to the analysis of soil ecological data bases: relationships between habitat features and Collembolan community characteristics, Soil Biology and Biochemistry (2000)
- et al., Estimating vegetation height and canopy cover from remotely sensed data with machine learning, Ecological Informatics (2010)
- et al., Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition, Ecological Modelling (2009)
- et al., Learning multi-label scene classification, Pattern Recognition (2004)
- et al., A systematic analysis of performance measures for classification tasks, Information Processing & Management (2009)
- et al., Multi-task learning to rank for web search, Pattern Recognition Letters (2012)
- et al., Multi-output regression on the output manifold, Pattern Recognition (2009)
- A semi-dependent decomposition approach to learn hierarchical classifiers, Pattern Recognition
- 10 challenging problems in data mining research, International Journal of Information Technology & Decision Making
- Future trends in data mining, Data Mining and Knowledge Discovery
- A survey of hierarchical classification across different application domains, Data Mining and Knowledge Discovery
- Decision trees for hierarchical multi-label classification, Machine Learning
- Finding explained groups of time-course gene expression profiles with predictive clustering trees, Molecular Biosystems
- Classification and Regression Trees
- Bagging predictors, Machine Learning
- Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions
- An empirical comparison of voting classification algorithms: bagging, boosting, and variants, Machine Learning
- Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Predicting gene function using hierarchical multi-label decision tree ensembles, BMC Bioinformatics
- Multi-label classification: an overview, International Journal of Data Warehouse and Mining
- Efficient algorithms for decision tree cross-validation, Journal of Machine Learning Research
- C4.5: Programs for Machine Learning
- Combining Pattern Classifiers: Methods and Algorithms
- Using iterated bagging to debias regressions, Machine Learning
- Random forests, Machine Learning
- The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Data Mining: Practical Machine Learning Tools and Techniques
- On structured output training: hard cases and an efficient alternative, Machine Learning
Dragi Kocev received his Ph.D. degree in computer science from the IPS Jozef Stefan in 2011. He is currently a post-doctoral researcher at the Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia. His research interests include machine learning and data mining, prediction of structured outputs and their applications in environmental and life sciences.
Celine Vens received her Ph.D. in applied sciences from the Katholieke Universiteit Leuven, Belgium. She is a post-doctoral fellow of the Research Foundation—Flanders (FWO) and she is working at the Declarative Languages and Artificial Intelligence Research Group at the Department of Computer Science of the K.U. Leuven University in Leuven, Belgium. Her research interests include machine learning, (relational) data mining, and applications thereof in bioinformatics and other domains.
Jan Struyf received his Ph.D. in applied sciences from the Katholieke Universiteit Leuven, Belgium. He is a former post-doctoral researcher of the Declarative Languages and Artificial Intelligence Research Group at the Department of Computer Science of the K.U. Leuven University in Leuven, Belgium. His research interests include machine learning, statistical relational data mining, predictive clustering, structured output learning, and inductive databases.
Sašo Džeroski received his Ph.D. degree in computer science from the University of Ljubljana in 1995. He is currently a scientific councilor at the Department of Knowledge Technologies, Jozef Stefan Institute, and the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, both in Ljubljana, Slovenia. He is also an associate professor at the Jozef Stefan International Postgraduate School, also in Ljubljana. His research interests include data mining and machine learning and their applications in environmental sciences (ecology) and life sciences (biomedicine). He is an ECCAI fellow, member of the executive board of SLAIS, member of ACM SIGKDD and IEEE.