1 Introduction
2 OTE: optimal trees ensemble
2.1 The Algorithm
2.2 Related approaches
3 Experiments and results
3.1 Simulation
i | j | Scenario 1 (k = 1, 2, 3, 4) | Scenario 2 (k = 1, 2, 3, 4) | Scenario 3 (k = 1, 2, 3, 4) | Scenario 4 (k = 1, 2, 3, 4) |
---|---|---|---|---|---|
1 | 1 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.9, 0.9, 0.8 | 0.9, 0.9, 0.9, 0.8 |
1 | 2 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.1, 0.1, 0.2 | 0.1, 0.1, 0.1, 0.2 |
1 | 3 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.1, 0.1, 0.2 | 0.1, 0.1, 0.1, 0.2 |
1 | 4 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.9, 0.9, 0.8 | 0.9, 0.9, 0.9, 0.8 |
2 | 1 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.9, 0.9, 0.8 | 0.9, 0.9, 0.9, 0.8 |
2 | 2 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.1, 0.1, 0.2 | 0.1, 0.1, 0.1, 0.2 |
2 | 3 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.1, 0.1, 0.2 | 0.1, 0.1, 0.1, 0.2 |
2 | 4 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.9, 0.9, 0.8 | 0.9, 0.9, 0.9, 0.8 |
3 | 1 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.9, 0.9, 0.8 |
3 | 2 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.1, 0.1, 0.2 |
3 | 3 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.1, 0.1, 0.2 |
3 | 4 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.9, 0.9, 0.8 |
4 | 1 | | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.8, 0.7, 0.7 |
4 | 2 | | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.2, 0.3, 0.3 |
4 | 3 | | 0.1, 0.2, 0.3, 0.4 | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.2, 0.3, 0.3 |
4 | 4 | | 0.9, 0.8, 0.7, 0.6 | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.8, 0.7, 0.7 |
5 | 1 | | | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.8, 0.7, 0.6 |
5 | 2 | | | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.2, 0.3, 0.4 |
5 | 3 | | | 0.1, 0.2, 0.3, 0.3 | 0.1, 0.2, 0.3, 0.4 |
5 | 4 | | | 0.9, 0.8, 0.7, 0.7 | 0.9, 0.8, 0.7, 0.6 |
6 | 1 | | | | 0.9, 0.8, 0.7, 0.6 |
6 | 2 | | | | 0.1, 0.2, 0.3, 0.4 |
6 | 3 | | | | 0.1, 0.2, 0.3, 0.4 |
6 | 4 | | | | 0.9, 0.8, 0.7, 0.6 |
3.1.1 Scenario 1
3.1.2 Scenario 2
3.1.3 Scenario 3
3.1.4 Scenario 4
Model | d | n | Bayes error | kNN | Tree | RF | NH | SVM (Radial) | SVM (Linear) | SVM (Bessel) | SVM (Laplacian) | OTE | Reduction in Ensemble Size (%) [trees selected] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Scenario 1 | 9 | 1000 | 9.0 | 22 | 9.9 | 9.6 | 9.8 | 19 | 19 | 19 | 19 | 9.5 | 89.8 [102] |
Scenario 1 | 9 | 1000 | 14 | 26 | 15 | 15 | 15 | 22 | 22 | 23 | 22 | 15 | 89.8 [102] |
Scenario 1 | 9 | 1000 | 17 | 32 | 18 | 18 | 21 | 28 | 28 | 28 | 28 | 18 | 89.8 [102] |
Scenario 1 | 9 | 1000 | 33 | 42 | 36 | 35 | 36 | 37 | 37 | 38 | 37 | 37 | 89.8 [102] |
Scenario 2 | 12 | 1000 | 21 | 29 | 22 | 21 | 21 | 24 | 23 | 30 | 24 | 21 | 89.8 [102] |
Scenario 2 | 12 | 1000 | 24 | 31 | 25 | 24 | 24 | 26 | 26 | 32 | 26 | 23 | 89.7 [103] |
Scenario 2 | 12 | 1000 | 28 | 36 | 30 | 28 | 29 | 31 | 30 | 36 | 31 | 29 | 89.7 [103] |
Scenario 2 | 12 | 1000 | 30 | 39 | 32 | 32 | 32 | 33 | 33 | 38 | 33 | 32 | 89.7 [103] |
Scenario 3 | 15 | 1000 | 15 | 31 | 22 | 18 | 22 | 24 | 24 | 55 | 24 | 18 | 89.8 [102] |
Scenario 3 | 15 | 1000 | 18 | 32 | 24 | 21 | 24 | 26 | 25 | 55 | 26 | 22 | 89.5 [105] |
Scenario 3 | 15 | 1000 | 21 | 34 | 25 | 23 | 27 | 27 | 27 | 55 | 27 | 24 | 89.5 [105] |
Scenario 3 | 15 | 1000 | 24 | 36 | 29 | 28 | 29 | 29 | 29 | 54 | 30 | 28 | 89.5 [105] |
Scenario 4 | 18 | 1000 | 21 | 34 | 28 | 23 | 25 | 25 | 25 | 72 | 27 | 22 | 89.8 [102] |
Scenario 4 | 18 | 1000 | 22 | 35 | 27 | 23 | 26 | 27 | 27 | 71 | 28 | 24 | 90.0 [100] |
Scenario 4 | 18 | 1000 | 25 | 39 | 31 | 26 | 29 | 31 | 31 | 67 | 35 | 27 | 89.8 [102] |
Scenario 4 | 18 | 1000 | 26 | 40 | 31 | 28 | 30 | 32 | 32 | 68 | 36 | 29 | 89.8 [102] |
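
The bracketed tree counts and the percentage reductions in the table above are consistent with an initial ensemble of 1000 trees; for example, 102 selected trees correspond to a reduction of 89.8%. That initial size is inferred from the reported numbers and is not stated in this section, so it should be read as an assumption. A minimal Python sketch of the arithmetic, using a hypothetical helper `reduction_percent`:

```python
def reduction_percent(trees_selected: int, initial_size: int = 1000) -> float:
    """Percentage reduction in ensemble size relative to the initial ensemble.

    initial_size=1000 is an assumption inferred from the table values,
    not a figure stated in the text.
    """
    if not 0 <= trees_selected <= initial_size:
        raise ValueError("trees_selected must lie between 0 and initial_size")
    # Share of trees discarded, expressed as a percentage with one decimal.
    return round(100.0 * (initial_size - trees_selected) / initial_size, 1)


if __name__ == "__main__":
    # Tree counts that appear in brackets in the simulation results table.
    for selected in (100, 102, 103, 105):
        print(f"{selected} trees selected -> {reduction_percent(selected)}% reduction")
```

Under this assumption, the four distinct tree counts in the table (100, 102, 103, 105) reproduce the four distinct reductions reported (90.0, 89.8, 89.7, 89.5).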
3.2 Benchmark problems
3.3 Experimental setup for benchmark data sets
Data set | n | d | Feature type (R/I/N) | Sources |
---|---|---|---|---|
Regression | ||||
Bone | 485 | 3 | (1/1/1) | |
Galaxy | 323 | 4 | (4/0/0) | |
Friedman | 1200 | 5 | (5/0/0) | (Friedman 1991) |
CPU | 209 | 7 | (7/0/0) | (Bache and Lichman 2013) |
Concrete | 103 | 7 | (7/0/0) | (Bache and Lichman 2013) |
Abalone | 4177 | 8 | (7/0/1) | (Bache and Lichman 2013) |
MPG | 398 | 8 | (2/2/4) | (Bache and Lichman 2013) |
Stock | 950 | 9 | (9/0/0) | |
Wine | 1599 | 11 | (11/0/0) | (Bache and Lichman 2013) |
Ozone | 203 | 12 | (9/0/3) | (Leisch and Dimitriadou 2010) |
Housing | 506 | 13 | (12/0/1) | (Meinshausen 2013) |
Pollution | 60 | 15 | (7/8/0) | |
Treasury | 1049 | 15 | (15/0/0) | |
Baseball | 337 | 16 | (2/14/0) | |
Classification | ||||
Mammographic | 830 | 5 | (0/5/0) | |
Dystrophy | 209 | 5 | (2/3/0) | (Peters and Hothorn 2012) |
Monk3 | 122 | 6 | (0/6/0) | (Bache and Lichman 2013) |
Appendicitis | 106 | 7 | (7/0/0) | |
SAHeart | 462 | 9 | (5/3/1) | |
Tic-Tac-Toe | 958 | 9 | (0/0/9) | (Bache and Lichman 2013) |
Heart | 303 | 13 | (1/12/0) | (Bache and Lichman 2013) |
House vote | 232 | 16 | (0/0/16) | (Bache and Lichman 2013) |
Bands | 365 | 19 | (13/6/0) | |
Hepatitis | 80 | 20 | (2/18/0) | (Bache and Lichman 2013) |
Parkinson | 195 | 22 | (22/0/0) | (Bache and Lichman 2013) |
Body | 507 | 23 | (22/1/0) | (Hurley 2012) |
Thyroid | 9172 | 27 | (3/2/22) | (Bache and Lichman 2013) |
WDBC | 569 | 29 | (29/0/0) | (Bache and Lichman 2013) |
WPBC | 198 | 32 | (30/2/0) | (Bache and Lichman 2013) |
Oil-Spill | 937 | 49 | (40/9/0) | |
Spam base | 4601 | 57 | (55/2/0) | (Bache and Lichman 2013) |
Glaucoma | 196 | 62 | (62/0/0) | (Peters and Hothorn 2012) |
Nki 70 | 144 | 76 | (71/5/0) | (Goeman 2012) |
Musk | 476 | 166 | (0/166/0) | (Karatzoglou et al. 2004) |
Data set | n | d | kNN | Tree | RF | NH | SVM (Radial) | SVM (Linear) | SVM (Bessel) | SVM (Laplacian) | OTE |
---|---|---|---|---|---|---|---|---|---|---|---|
Bone | 485 | 3 | 0.8932 | 0.7058 | 0.6601 | 0.6632 | 0.6292 | 0.7908 | 0.7369 | 0.6329 | 0.6454 |
Galaxy | 323 | 4 | 0.0285 | 0.0952 | 0.0275 | 0.0686 | 0.0253 | 0.1153 | 0.0356 | 0.0262 | 0.0261 |
Friedman | 1200 | 5 | 0.1373 | 0.3871 | 0.1212 | 0.4452 | 0.0559 | 0.2828 | 0.0849 | 0.0657 | 0.1364 |
CPU | 209 | 7 | 0.1058 | 0.2838 | 0.0646 | 0.2659 | 0.3898 | 0.0916 | 0.2861 | 0.3143 | 0.0600 |
Concrete | 103 | 7 | 0.3720 | 0.4989 | 0.2174 | 0.4307 | 0.0700 | 0.1743 | 0.0623 | 0.1806 | 0.2342 |
Abalone | 4177 | 8 | 0.5347 | 0.5673 | 0.4386 | 0.6083 | 0.4410 | 0.4904 | 0.4433 | 0.4418 | 0.4473 |
MPG | 398 | 8 | 0.3230 | 0.2301 | 0.1259 | 0.1990 | 0.1358 | 0.2066 | 0.1435 | 0.1359 | 0.1203 |
Stock | 950 | 9 | 0.0102 | 0.0942 | 0.0121 | 0.1192 | 0.0153 | 0.1373 | 0.0274 | 0.0142 | 0.0110 |
Wine | 1599 | 11 | 0.8975 | 0.7140 | 0.4933 | 0.7044 | 0.5980 | 0.6653 | 0.8991 | 0.5859 | 0.5072 |
Ozone | 203 | 12 | 0.6430 | 0.4366 | 0.3061 | 0.3642 | 0.2488 | 0.3528 | 0.7967 | 0.2750 | 0.3016 |
Housing | 506 | 13 | 0.4696 | 0.2821 | 0.1190 | 0.2477 | 0.1756 | 0.3055 | 0.8824 | 0.1853 | 0.1160 |
Pollution | 60 | 15 | 0.9500 | 0.9500 | 0.6779 | 0.7728 | 0.6942 | 0.8144 | 0.9500 | 0.7326 | 0.6653 |
Treasury | 1049 | 15 | 0.0075 | 0.0405 | 0.0040 | 0.0574 | 0.0062 | 0.0060 | 0.0077 | 0.0070 | 0.0039 |
Baseball | 337 | 16 | 0.6931 | 0.3513 | 0.3434 | 0.3908 | 0.3641 | 0.3818 | 0.8765 | 0.3641 | 0.3329 |
Data set | n | d | kNN | Tree | RF | NH | SVM (Radial) | SVM (Linear) | SVM (Bessel) | SVM (Laplacian) | RP (LDA) | RP (QDA) | OTE | OTE.Prob |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mammographic | 830 | 5 | 0.1901 | 0.1631 | 0.1670 | 0.1579 | 0.1910 | 0.1750 | 0.1875 | 0.1863 | 0.1889 | 0.1957 | 0.1711 | 0.1710 |
Dystrophy | 209 | 5 | 0.1172 | 0.1482 | 0.1154 | 0.1470 | 0.0999 | 0.1122 | 0.1070 | 0.0997 | 0.1206 | 0.0924 | 0.1182 | 0.1183 |
Monk3 | 122 | 6 | 0.1226 | 0.0773 | 0.0728 | 0.2699 | 0.0953 | 0.2254 | 0.0928 | 0.0938 | 0.2024 | 0.1065 | 0.0731 | 0.0735 |
Appendicitis | 106 | 7 | 0.1423 | 0.1640 | 0.1455 | 0.1380 | 0.2245 | 0.1726 | 0.1905 | 0.1650 | 0.1818 | 0.1450 | 0.1500 | 0.1504 |
SAHeart | 462 | 9 | 0.3363 | 0.2911 | 0.2897 | 0.2762 | 0.3075 | 0.3080 | 0.3332 | 0.3139 | 0.3017 | 0.3033 | 0.3178 | 0.3177 |
Tic-Tac-Toe | 958 | 9 | 0.3617 | 0.1082 | 0.0317 | 0.2861 | 0.2078 | 0.3948 | 0.1725 | 0.1972 | 0.3002 | 0.2312 | 0.0353 | 0.0351 |
Heart | 303 | 13 | 0.3500 | 0.2108 | 0.1629 | 0.1892 | 0.2342 | 0.1745 | 0.1612 | 0.1719 | 0.1666 | 0.1958 | 0.1743 | 0.1744 |
House Vote | 232 | 16 | 0.0825 | 0.0345 | 0.0322 | 0.1020 | 0.0330 | 0.0470 | 0.2211 | 0.0529 | 0.0650 | 0.1454 | 0.0340 | 0.0344 |
Bands | 365 | 19 | 0.3196 | 0.3683 | 0.2683 | 0.3647 | 0.3669 | 0.3202 | 0.4724 | 0.5573 | 0.3382 | 0.3144 | 0.2601 | 0.2602 |
Hepatitis | 80 | 20 | 0.3831 | 0.1868 | 0.1385 | 0.1296 | 0.1406 | 0.1568 | 0.5629 | 0.1490 | 0.1921 | 0.1614 | 0.1229 | 0.1230 |
Parkinson | 195 | 22 | 0.1620 | 0.1456 | 0.0894 | 0.1235 | 0.1385 | 0.1941 | 0.2838 | 0.1928 | 0.1844 | 0.1577 | 0.0859 | 0.0861 |
Body | 507 | 23 | 0.0226 | 0.0788 | 0.0395 | 0.0744 | 0.0156 | 0.0136 | 0.5505 | 0.0219 | 0.0196 | 0.0234 | 0.0380 | 0.0371 |
Thyroid | 9172 | 27 | 0.0388 | 0.0126 | 0.0100 | 0.0203 | 0.1113 | 0.0310 | 0.2936 | 0.0834 | 0.0503 | 0.0426 | 0.0100 | 0.0103 |
WDBC | 569 | 29 | 0.0671 | 0.0686 | 0.0388 | 0.0525 | 0.0415 | 0.0264 | 0.6297 | 0.0403 | 0.0526 | 0.0568 | 0.0375 | 0.0374 |
WPBC | 198 | 32 | 0.2413 | 0.2815 | 0.1958 | 0.2282 | 0.2848 | 0.2881 | 0.5684 | 0.3084 | 0.2631 | 0.2263 | 0.1921 | 0.1922 |
Oil-Spill | 937 | 49 | 0.0435 | 0.0366 | 0.0330 | 0.0360 | 0.0756 | 0.1400 | 0.0387 | 0.1467 | 0.0444 | 0.0423 | 0.0321 | 0.0320 |
Spam base | 4601 | 57 | 0.1747 | 0.1083 | 0.0469 | 0.0944 | 0.0941 | 0.0725 | 0.4820 | 0.1020 | 0.2162 | 0.3189 | 0.0460 | 0.0463 |
Sonar | 208 | 60 | 0.1790 | 0.2879 | 0.1615 | 0.2390 | 0.1710 | 0.2505 | 0.5300 | 0.2698 | 0.4285 | 0.2058 | 0.1600 | 0.1616 |
Glaucoma | 196 | 62 | 0.1934 | 0.1237 | 0.1052 | 0.1154 | 0.1108 | 0.1565 | 0.6397 | 0.1664 | 0.1008 | 0.1455 | 0.1051 | 0.1053 |
Nki 70 | 144 | 76 | 0.1827 | 0.1683 | 0.1466 | 0.1448 | 0.2664 | 0.3381 | 0.4260 | 0.4089 | 0.1773 | 0.1837 | 0.1399 | 0.1396 |
Musk | 476 | 166 | 0.1420 | 0.2256 | 0.1103 | 0.2444 | 0.1326 | 0.1440 | 0.4964 | 0.4698 | 0.0957 | 0.0716 | 0.0949 | 0.0947 |