1 Introduction
1.1 Machine learning, regression and optimization
In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed-integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes than was previously thought possible in the statistics community. This paper was heavily inspired by Bertsimas' observation that MIO is still relatively unknown in statistics, although it can be applied to many optimization problems arising in machine learning. We therefore present Leveraged Least Trimmed Absolute Deviations (LLTA), an MIO-based robust regression model, whose main idea we now explain.
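For concreteness, the best subset selection problem can be written in one common big-M MIO form (a sketch with illustrative notation; the constant M and the symbols below are not taken from Bertsimas et al.):

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \ \frac{1}{2}\,\lVert y - X\beta \rVert_2^2
\quad \text{s.t.} \quad
|\beta_j| \le M z_j, \ \ j = 1,\dots,p,
\qquad
\sum_{j=1}^{p} z_j \le k,
```

where the binary variable $z_j$ indicates whether predictor $j$ enters the model, $M$ is a sufficiently large bound on the coefficients, and $k$ limits the subset size.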
1.2 Motivation
1.3 Statement of contributions
2 Leveraged least trimmed absolute deviations
2.1 Epigraph reformulation
2.2 Computation of leverage points
2.3 Choosing the number of outliers k
2.4 Performance evaluation
2.5 Uncertainty measurement
3 Computational results
3.1 Comparison with LTS and LTA on the body–brain data set
The body–brain data set consists of the following (body weight, brain weight) pairs:

(1.35, 8.1), (465, 423), (36.33, 119.5), (27.66, 115), (1.04, 5.5), (11700, 50), (2547, 4603), (187.1, 419), (521, 655), (10, 115), (3.3, 25.6), (529, 680), (207, 406), (62, 1320), (6654, 5712), (9400, 70), (6.8, 179), (35, 56), (0.12, 1), (0.023, 0.4), (2.5, 12.1), (55.5, 175), (100, 157), (52.16, 440), (0.28, 1.9), (87000, 154.5), (0.122, 3), (192, 180), (3.385, 44.5), (0.48, 15.5), (14.83, 98.2), (4.19, 58), (0.425, 6.4), (0.101, 4), (0.92, 5.7), (1, 6.6), (0.005, 0.14), (0.06, 1), (3.5, 10.8), (2, 12.3), (1.7, 6.3), (0.023, 0.3), (0.785, 3.5), (0.2, 5), (1.41, 17.5), (85, 325), (0.75, 12.3), (3.5, 3.9), (4.05, 17), (0.01, 0.25), (1.4, 12.5), (250, 490), (10.55, 179.5), (0.55, 2.4), (60, 81), (3.6, 21), (4.288, 39.2), (0.075, 1.2), (0.048, 0.33), (3, 25), (160, 169), (0.9, 2.6), (1.62, 11.4), (0.104, 2.5), (4.235, 50.4)
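As a quick baseline illustration (this is ordinary least squares, not the paper's LLTA, LTS, or LTA methods), the pairs above can be fit on a log scale, on which this data set is commonly analyzed; points with unusually small brain weight relative to body weight then stand out as large residuals:

```python
import numpy as np

# A subset of the (body weight, brain weight) pairs from the body–brain
# data set; in practice, use the full list given above.
pairs = [(1.35, 8.1), (465, 423), (36.33, 119.5), (27.66, 115), (1.04, 5.5),
         (11700, 50), (2547, 4603), (187.1, 419), (521, 655), (10, 115)]

# Work in log-log space, where the body–brain relation is roughly linear.
body = np.log10([b for b, _ in pairs])
brain = np.log10([c for _, c in pairs])

# Non-robust OLS baseline: brain ≈ slope * body + intercept.
slope, intercept = np.polyfit(body, brain, 1)
residuals = brain - (slope * body + intercept)

# The point with the largest absolute residual is an outlier candidate.
worst = int(np.argmax(np.abs(residuals)))
print(pairs[worst])
```

Because OLS is not robust, a few such extreme points can pull the fitted line substantially, which is precisely the motivation for trimmed estimators.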
3.2 Datasets from the literature
# | Title | Rows | Columns | Source |
---|---|---|---|---|
1 | Coleman data set | 20 | 6 | Rousseeuw and Leroy (1987) |
2 | Delivery time data | 25 | 3 | Montgomery and Peck (1982) |
3 | Hawkins, Bradu, Kass's Artificial Data | 75 | 4 | Hawkins et al. (1984) |
4 | Heart catheterization data | 12 | 3 | |
5 | Waterflow measurements of Kootenay | 13 | 2 | Ezekiel and Fox (1959) |
6 | Pension funds data | 18 | 2 | Rousseeuw and Leroy (1987) |
7 | Phosphorus content data | 18 | 3 | Rousseeuw and Leroy (1987) |
8 | Salinity data | 28 | 4 | Ruppert and Carroll (1980) |
9 | Siegel's exact fit example data | 9 | 2 | Rousseeuw and Leroy (1987) |
10 | Steam usage data (excerpt) | 25 | 9 | Norman and Draper (1981) |
11 | Modified data on wood specific gravity | 20 | 6 | Rousseeuw and Leroy (1987) |