1 Introduction
- extend our solution to regression trees and perform an additional experimental comparison between globally induced regression and model trees;
- propose new, alternative crowding functions that involve the Bayesian Information Criterion (BIC) (Schwarz 1978), the weighted fitness value, and the objective weights;
- perform an extensive experimental evaluation that covers new real-life datasets, regression and model trees, and two- and three-objective optimization and visualization;
- show the significance of the crowding distance for the final form of the Pareto front.
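For reference, the baseline crowding distance (in the NSGA-II sense of Deb et al. 2002) that the proposed crowding functions build upon can be sketched as follows; this function is an illustrative assumption, not the paper's implementation:

```python
def crowding_distance(front):
    """Standard NSGA-II crowding distance for one non-dominated front.

    `front` is a list of tuples, one tuple of objective values per solution.
    Boundary solutions on each objective get infinite distance so they are
    always preserved during selection.
    """
    n = len(front)
    if n == 0:
        return []
    num_objectives = len(front[0])
    dist = [0.0] * n
    for obj in range(num_objectives):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo = front[order[0]][obj]
        hi = front[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all solutions equal on this objective
        for k in range(1, n - 1):
            # Normalized distance between the two neighbors along `obj`.
            gap = front[order[k + 1]][obj] - front[order[k - 1]][obj]
            dist[order[k]] += gap / (hi - lo)
    return dist
```

Solutions with larger crowding distance lie in less densely populated regions of the front and are favored when the elitist archive overflows.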
2 Background
2.1 Decision trees
- TARGET (Fan and Gray 2005)—evolves a CART-like regression tree with basic genetic operators;
- STGP (Hazan et al. 2006)—a strongly typed Genetic Programming (GP) approach;
- E-Motion (Barros et al. 2011)—globally induces model trees using a standard one-point crossover and two different mutation strategies;
- GPMCC (Potgieter and Engelbrecht 2008)—evolves model trees with nonlinear regression models in the leaves;
- GASOPE (Potgieter and Engelbrecht 2007)—combines GP, which evolves the structure of the model trees, with a Genetic Algorithm (GA), which evolves the polynomial expressions.
2.2 Multi-objective evolutionary algorithms
2.3 Multi-objective optimization and the decision trees
3 Pareto-optimal search in GMT
3.1 Global model tree
3.1.1 Representation and selection
3.1.2 Genetic operators
- exchange a subtree, branch, node, or test between two affected individuals or with the best individual found so far;
- prune an internal node and transform it into a leaf with a new multivariate linear regression model;
- expand a leaf into an internal node;
- modify the test in an internal node (shift the threshold, replace the tested attribute);
- change the linear regression models in the leaves (add, remove, or change attributes).
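As an illustration, the prune mutation above could look like the following sketch; the `Node` representation and the `fit_model` callback are assumptions made for this example, not the GMT implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A leaf model is (intercept, coefficients) of a multivariate linear regression.
LinearModel = Tuple[float, Sequence[float]]

@dataclass
class Node:
    # Internal node: `attribute`/`threshold` define the split test and both
    # children are set. Leaf: children are None and `model` holds the
    # linear regression parameters.
    attribute: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    model: Optional[LinearModel] = None

def prune_to_leaf(node: Node, fit_model: Callable[[Node], LinearModel]) -> Node:
    """Mutation: transform an internal node into a leaf whose new linear
    model is fitted (by `fit_model`) on the instances reaching the node."""
    node.attribute = None
    node.threshold = None
    node.left = None
    node.right = None
    node.model = fit_model(node)
    return node
```

The inverse operator (expanding a leaf into an internal node) would instead assign a new test and attach two fresh leaves.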
3.1.3 Fitness function
3.1.4 Smoothing
3.2 GMT Pareto-based approach
3.2.1 Sorting strategy
3.2.2 Elitist archive and new population
3.2.3 Crowding distance
4 Experimental validation
| Dataset | Instances | Numeric features | Nominal features |
|---|---|---|---|
| Abalone (AB) | 4177 | 7 | 1 |
| Ailerons (AI) | 13,750 | 40 | 0 |
| Delta ailerons (DA) | 7129 | 5 | 0 |
| Delta elevators (DE) | 9517 | 6 | 0 |
| Kinematics (KI) | 8192 | 8 | 0 |
| Pole (PO) | 15,000 | 48 | 0 |
| Stock (ST) | 950 | 9 | 0 |
4.1 Setup
4.2 Visualization of the Pareto front
| Algorithm | Parameter | AB | AI | DA | DE | KI | PO | ST |
|---|---|---|---|---|---|---|---|---|
| REP Tree | RMSE | 2.223 | 0.000203 | 0.000175 | 0.00150 | 0.194 | 8.26 | 1.469 |
| REP Tree | Nodes | 291 | 93 | 251 | 229 | 819 | 223 | 137 |
| REP Tree unpruned | RMSE | 2.526 | 0.000221 | 0.000186 | 0.00171 | 0.2031 | 8.0773 | 1.1157 |
| REP Tree unpruned | Nodes | 526 | 2277 | 1047 | 1545 | 2301 | 469 | 261 |
| wGRT | RMSE | 2.387 | 0.000207 | 0.000180 | 0.00155 | 0.195 | 8.16 | 1.431 |
| wGRT | Nodes | 9.6 | 32 | 13 | 14 | 20 | 58 | 14 |
| lGRT | RMSE | 2.596 | 0.000243 | 0.000195 | 0.00166 | 0.223 | 12.10 | 1.339 |
| lGRT | Nodes | 3.5 | 11 | 4.0 | 4.9 | 2.8 | 19 | 35 |
| pGRT 1* | RMSE | 2.700 | 0.000273 | 0.000188 | 0.00160 | 0.216 | 10.96 | 1.938 |
| pGRT 1* | Nodes | 2.0 | 4.0 | 8.0 | 6.0 | 4.0 | 15 | 6.0 |
| pGRT 2* | RMSE | 2.389 | 0.000209 | 0.000178 | 0.00151 | 0.199 | 8.555 | 1.393 |
| pGRT 2* | Nodes | 9.0 | 29 | 46 | 14 | 25 | 40 | 15 |
| pGRT 3* | RMSE | 2.273 | 0.000195 | 0.000173 | 0.00149 | 0.179 | 7.984 | 0.995 |
| pGRT 3* | Nodes | 25 | 73 | 66 | 45 | 61 | 57 | 98 |
| Algorithm | Parameter | AB | AI | DA | DE | KI | PO | ST |
|---|---|---|---|---|---|---|---|---|
| M5 | RMSE | 2.122 | 0.000169 | 0.000170 | 0.00148 | 0.162 | 7.4908 | 0.937 |
| M5 | Nodes | 12 | 9.0 | 17 | 2.0 | 106 | 193 | 47 |
| M5 | Attributes | 96 | 149 | 85 | 10 | 848 | 1568 | 423 |
| M5 unpruned | RMSE | 2.151 | 0.000181 | 0.000190 | 0.00150 | 0.1578 | 7.5095 | 1.002 |
| M5 unpruned | Nodes | 1113 | 2676 | 1789 | 2438 | 2304 | 511 | 196 |
| M5 unpruned | Attributes | 9134 | 34,941 | 22,182 | 8715 | 15,118 | 4627 | 907 |
| wGMT | RMSE | 2.127 | 0.000165 | 0.000173 | 0.00148 | 0.163 | 8.076 | 1.386 |
| wGMT | Nodes | 2.0 | 3.0 | 4.6 | 1.0 | 7.1 | 37 | 3.9 |
| wGMT | Attributes | 7.9 | 17 | 9.5 | 4.0 | 34 | 52 | 14 |
| lGMT | RMSE | 2.341 | 0.000177 | 0.000181 | 0.00142 | 0.154 | 7.314 | 0.935 |
| lGMT | Nodes | 1.0 | 3.2 | 4.9 | 2.0 | 9.7 | 78 | 41 |
| lGMT | Attributes | 5.0 | 11 | 12 | 5.2 | 4.5 | 82 | 111 |
| pGMT 1* | RMSE | 2.359 | 0.000175 | 0.000175 | 0.00142 | 0.184 | 16.90 | 1.531 |
| pGMT 1* | Nodes | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| pGMT 1* | Attributes | 2.0 | 6.0 | 3.0 | 5.0 | 13 | 6.0 | 8.0 |
| pGMT 2* | RMSE | 2.102 | 0.000168 | 0.000169 | 0.00140 | 0.174 | 9.791 | 0.928 |
| pGMT 2* | Nodes | 2.0 | 2.0 | 4.0 | 3.0 | 4.0 | 15 | 31 |
| pGMT 2* | Attributes | 9.0 | 12 | 14 | 10 | 25 | 23 | 38 |
| pGMT 3* | RMSE | 2.079 | 0.000164 | 0.000167 | 0.00138 | 0.149 | 7.354 | 0.782 |
| pGMT 3* | Nodes | 3.0 | 4.0 | 9.0 | 6.0 | 17 | 20 | 61 |
| pGMT 3* | Attributes | 12 | 27 | 29 | 24 | 116 | 157 | 121 |
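The RMSE values reported in the tables are root mean squared errors of prediction on the test data; for completeness, the metric is:

```python
import math
from typing import Sequence

def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    assert len(actual) == len(predicted) and len(actual) > 0
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```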
4.3 Comparison with other classifiers
- REP Tree (REP)—a popular top-down inducer that builds a regression tree using variance as the split criterion and prunes it with reduced-error pruning (with backfitting);
- M5—a state-of-the-art model tree inducer (Quinlan 1992), the most adequate greedy counterpart of the GMT.