1 Introduction
2 Subgroup discovery
2.1 Description languages, objective functions, and closed selectors
2.2 Branch-and-bound and optimistic estimators
3 Efficiently computable tight optimistic estimators
3.1 The standard case: monotone functions of a central tendency measure
3.2 Dispersion-corrected objective functions based on the median
3.3 Reaching linear time—objectives based on dispersion-corrected coverage
4 Dispersion-corrected subgroup discovery in practice
4.1 Selection bias of dispersion-correction and its statistical merit
… \({\texttt {amd}}(P)\)) followed by coverage (\({\texttt {cov}}(Q_0)\), \({\texttt {cov}}(Q_1)\)), median (\({\texttt {med}}(Q_0)\), \({\texttt {med}}(Q_1)\)), and mean absolute median deviation (\({\texttt {amd}}(Q_0)\), \({\texttt {amd}}(Q_1)\)) for the best subgroup w.r.t. the non-dispersion-corrected function \(f_0\) and the dispersion-corrected function \(f_1\), respectively

| Dataset | | | | | | | Selection Bias | | | | | | Efficiency | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Name | Target | \(\vert P\vert\) | \(\vert \varPi \vert\) | \({\texttt {med}}(P)\) | \({\texttt {amd}}(P)\) | \({\texttt {cov}}(Q_0)\) | \({\texttt {cov}}(Q_1)\) | \({\texttt {med}}(Q_0)\) | \({\texttt {med}}(Q_1)\) | \({\texttt {amd}}(Q_0)\) | \({\texttt {amd}}(Q_1)\) | \(a_\text {eff}\) | \(\vert {\mathcal {E}}_0\vert\) | \(\vert {\mathcal {E}}_1\vert\) | \(t_0\) | \(t_1\) |
| 1 | abalone | rings | 4,177 | 69 | 9 | 2.359 | \({\mathbf {0.544}}\) | 0.191 | 11 | 11 | 2.257 | \({\mathbf {1.662}}\) | 1 | 848,258 | 690,177 | \({\mathbf {304}}\) | 339 |
| 2 | ailerons | goal | 13,750 | 357 | \(-0.0008\) | 0.000303 | \({\mathbf {0.906}}\) | 0.59 | \(-0.0007\) | \({\mathbf {-0.0006}}\) | 0.000288 | \({\mathbf {0.000198}}\) | 0.3 | 1,069,456 | 54,103 | 6542 | \({\mathbf {460}}\) |
| 3 | autoMPG8 | mpg | 392 | 24 | 22.5 | 6.524 | 0.497 | 0.497 | 29 | 29 | 4.791 | 4.791 | 1 | 96 | 67 | 0.11 | \({\mathbf {0.09}}\) |
| 4 | baseball | salary | 337 | 24 | 740 | 954.386 | \({\mathbf {0.362}}\) | 0.003 | 1550 | \({\mathbf {2500}}\) | \(\underline{1245.092}\) | \({\mathbf {0}}\) | 1 | 117 | 117 | 0.22 | \({\mathbf {0.21}}\) |
| 5 | california | med. h. value | 20,640 | 72 | 179,700 | 88,354 | \({\mathbf {0.385}}\) | 0.019 | 262,500 | \({\mathbf {500{,}001}}\) | \(\underline{94261}\) | \({\mathbf {294{,}00}}\) | 0.4 | 1,368,662 | 65,707 | 2676 | \({\mathbf {368}}\) |
| 6 | compactiv | usr | 8192 | 202 | 89 | 9.661 | 0.464 | \({\mathbf {0.603}}\) | \({\mathbf {94}}\) | 93 | 7.8 | \({\mathbf {3.472}}\) | 0.5 | 2,458,105 | 59,053 | 5161 | \({\mathbf {208}}\) |
| 7 | concrete | compr. strength | 1030 | 70 | 34.4 | 13.427 | \({\mathbf {0.284}}\) | 0.1291 | 48.97 | \({\mathbf {50.7}}\) | 12.744 | \({\mathbf {9.512}}\) | 1 | 512,195 | 221,322 | 43.9 | \({\mathbf {35.8}}\) |
| 8 | dee | consume | 365 | 60 | 2.787 | 0.831 | \({\mathbf {0.523}}\) | 0.381 | 3.815 | \({\mathbf {4.008}}\) | 0.721 | \({\mathbf {0.434}}\) | 1 | 18,663 | 2653 | 2.05 | \({\mathbf {1.29}}\) |
| 9 | delta_ail | sa | 7,129 | 66 | \(-0.0001\) | 0.000231 | \({\mathbf {0.902}}\) | 0.392 | 0.0001 | \({\mathbf {0.0002}}\) | 0.000226 | \({\mathbf {0.000119}}\) | 1 | 45,194 | 2632 | 33.3 | \({\mathbf {6.11}}\) |
| 10 | delta_elv | se | 9517 | 66 | 0.001 | 0.00198 | \({\mathbf {0.384}}\) | 0.369 | 0.002 | 0.002 | 0.00112 | \({\mathbf {0.00108}}\) | 1 | 10145 | 1415 | 8.9 | \({\mathbf {4.01}}\) |
| 11 | elevators | goal | 16,599 | 155 | 0.02 | 0.00411 | 0.113 | \({\mathbf {0.283}}\) | \({\mathbf {0.03}}\) | 0.021 | \(\underline{0.00813}\) | \({\mathbf {0.00373}}\) | 0.05 | 6,356,465 | 526,114 | 13,712 | \({\mathbf {2891}}\) |
| 12 | forestfires | area | 517 | 70 | 0.52 | 12.832 | \({\mathbf {0.01}}\) | 0.002 | 86.45 | \({\mathbf {278.53}}\) | \(\underline{56.027}\) | \({\mathbf {0}}\) | 1 | 340,426 | 264,207 | \({\mathbf {23}}\) | 23.7 |
| 13 | friedman | output | 1200 | 48 | 14.651 | 4.234 | \({\mathbf {0.387}}\) | 0.294 | 18.934 | \({\mathbf {19.727}}\) | 3.065 | \({\mathbf {2.73}}\) | 1 | 19,209 | 2,489 | 3.23 | \({\mathbf {1.56}}\) |
| 14 | house | price | 22,784 | 160 | 33,200 | 28,456 | 0.56 | \({\mathbf {0.723}}\) | \({\mathbf {45{,}200}}\) | 34,000 | \(\underline{40{,}576}\) | \({\mathbf {27{,}214}}\) | 0.002 | 1,221,696 | 114,566 | 7937 | \({\mathbf {1308}}\) |
| 15 | laser | output | 993 | 42 | 46 | 35.561 | \({\mathbf {0.32}}\) | 0.093 | 109 | \({\mathbf {135}}\) | \(\underline{40.313}\) | \({\mathbf {15.662}}\) | 1 | 2008 | 815 | 0.96 | \({\mathbf {0.83}}\) |
| 16 | mortgage | 30 y. rate | 1049 | 128 | 6.71 | 2.373 | \({\mathbf {0.256}}\) | 0.097 | 11.61 | \({\mathbf {14.41}}\) | 2.081 | \({\mathbf {0.98}}\) | 1 | 40,753 | 1270 | 11.6 | \({\mathbf {1.59}}\) |
| 17 | mv | y | 40,768 | 79 | \(-5.02086\) | 8.509 | \({\mathbf {0.497}}\) | 0.349 | 0.076 | \({\mathbf {0.193}}\) | \(\underline{8.541}\) | \({\mathbf {2.032}}\) | 1 | 6513 | 1017 | 31.9 | \({\mathbf {13.2}}\) |
| 18 | pole | output | 14,998 | 260 | 0 | 28.949 | \({\mathbf {0.40}}\) | 0.24 | 100 | 100 | \(\underline{38.995}\) | \({\mathbf {16.692}}\) | 0.2 | 1,041,146 | 2966 | 2638 | \({\mathbf {15}}\) |
| 19 | puma32h | thetadd6 | 8192 | 318 | 0.000261 | 0.023 | \({\mathbf {0.299}}\) | 0.244 | 0.026 | \({\mathbf {0.031}}\) | 0.018 | \({\mathbf {0.017}}\) | 0.4 | 3,141,046 | 5782 | 2648 | \({\mathbf {15.5}}\) |
| 20 | stock | company10 | 950 | 80 | 46.625 | 5.47 | \({\mathbf {0.471}}\) | 0.337 | 52.5 | \({\mathbf {54.375}}\) | 3.741 | \({\mathbf {2.515}}\) | 1 | 85,692 | 1822 | 12.5 | \({\mathbf {1.56}}\) |
| 21 | treasury | 1 m. def. rate | 1049 | 128 | 6.61 | 2.473 | 0.182 | \({\mathbf {0.339}}\) | \({\mathbf {13.16}}\) | 8.65 | \(\underline{2.591}\) | \({\mathbf {0.863}}\) | 1 | 49,197 | 9247 | 14.8 | \({\mathbf {5.91}}\) |
| 22 | wankara | mean temp. | 321 | 87 | 47.7 | 12.753 | \({\mathbf {0.545}}\) | 0.296 | 60.6 | \({\mathbf {67.6}}\) | 8.873 | \({\mathbf {4.752}}\) | 1 | 191,053 | 4081 | 11.9 | \({\mathbf {1.24}}\) |
| 23 | wizmir | mean temp. | 1,461 | 82 | 60 | 12.622 | \({\mathbf {0.6}}\) | 0.349 | 72.9 | \({\mathbf {78.5}}\) | 8.527 | \({\mathbf {3.889}}\) | 1 | 177,768 | 1409 | 38.5 | \({\mathbf {1.48}}\) |
| 24 | binaries | delta E | 82 | 499 | 0.106 | 0.277 | 0.305 | \({\mathbf {0.378}}\) | \({\mathbf {0.43}}\) | 0.202 | \(\underline{0.373}\) | \({\mathbf {0.118}}\) | 0.5 | 4,712,128 | 204 | 1200 | \({\mathbf {0.29}}\) |
| 25 | gold | Evdw-Evdw0 | 12,200 | 250 | 0.131 | 0.088 | \({\mathbf {0.765}}\) | 0.34 | 0.217 | \({\mathbf {0.234}}\) | 0.081 | \({\mathbf {0.0278}}\) | 0.4 | 1,498,185 | 451 | 5650 | \({\mathbf {3.96}}\) |
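To make the per-subgroup quantities in the table concrete, the following Python sketch computes coverage, median, and mean absolute median deviation for a single subgroup. It assumes that a subgroup \(Q\) is given as a Boolean selection mask over the population \(P\), that \({\texttt {cov}}(Q)=|Q|/|P|\), and that \({\texttt {amd}}\) is the mean absolute deviation of the target values from their median, as its name suggests. The helper names (`cov`, `med`, `amd`, `subgroup_summary`) are hypothetical; the sketch does not reproduce the objective functions \(f_0\) and \(f_1\), nor the branch-and-bound search behind the efficiency columns.

```python
import statistics
from typing import Sequence, Tuple


def cov(mask: Sequence[bool]) -> float:
    """Coverage cov(Q) (assumed |Q|/|P|): fraction of P selected by Q."""
    return sum(mask) / len(mask)


def med(values: Sequence[float]) -> float:
    """Median of the target values."""
    return statistics.median(values)


def amd(values: Sequence[float]) -> float:
    """Mean absolute deviation of the target values from their median."""
    m = statistics.median(values)
    return sum(abs(v - m) for v in values) / len(values)


def subgroup_summary(
    target: Sequence[float], mask: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (cov(Q), med(Q), amd(Q)) for the subgroup Q selected by mask."""
    if len(target) != len(mask):
        raise ValueError("target and mask must have the same length")
    # Keep only the target values of the data points that belong to Q.
    q = [y for y, selected in zip(target, mask) if selected]
    if not q:
        raise ValueError("subgroup Q is empty")
    return cov(mask), med(q), amd(q)


if __name__ == "__main__":
    # Toy population P with eight target values; Q selects the upper half.
    population = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 30.0]
    selection = [False] * 4 + [True] * 4
    print(med(population), amd(population))         # 7.0 6.625
    print(subgroup_summary(population, selection))  # (0.5, 11.5, 5.25)
```

In this toy example the subgroup covers half of the population and has a clearly higher median than \(P\) (11.5 versus 7.0), yet its amd is only moderately lower than that of \(P\) (5.25 versus 6.625) because of the single outlying value 30.0. This within-subgroup spread is what the amd columns of the table compare between \(Q_0\) and \(Q_1\).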