1 Introduction
2 Uplift modeling
2.1 Definition
2.1.1 Binary model
2.1.2 Multitreatment model
2.2 Survey of multitreatment uplift modeling approaches
Previous studies | Machine learning technique | Data sets
---|---|---
Current methods | |
Data preprocessing approach, direct estimation: dummy and interactions approach (DIA). A single predictive model with a modified input space is trained. In addition to the pretreatment characteristics, dummies indicating the exposure to treatments and interaction terms are added. | |
Chen et al. (2015) | Logistic regression | Airline priority seating (private)
Data processing approach, indirect estimation: separate model approach (SMA). A predictive model is trained for each treatment group using the pretreatment characteristics as predictors and the outcome variable as target. Then, each model is used to predict the conditional probabilities \({\hat{P}}(Y = 1 | X, do(T=k))\) for each test individual, so that the \({\hat{\tau }}_{i,k}\) can be estimated to identify the optimal treatment \(\pi _{i,k}^{*}\). | |
Lo and Pachamanova (2015) | Logistic regression | MineThatData (public)
Data processing approach, direct estimation: adapted algorithms. An uplift model is trained with a machine learning technique that is specially adapted to the multitreatment setting. | |
Rzepakowski and Jaroszewicz (2012) | Decision tree | splice in UCI repository (public)
Guelman (2015) | K-nearest-neighbor (CKNN) | –
Zhao et al. (2017b) | Random forest (CTS) | Synthetic data (private) & Seat reservation data (private)
Zhao et al. (2017a) | Random forest (UCTS) | Synthetic data (private)
Li et al. (2018) | Reinforcement learning (Rlift) | Synthetic data (private) & Marketing campaign (private)
Sawant et al. (2018) | Reinforcement learning | Amazon fashion marketing (private)
Zhao and Harinen (2019) | Meta-learners (X-Learner and R-Learner) | Synthetic data (public) & Promotion campaign (private)
Proposed methods | |
Data preprocessing approach: multitreatment modified outcome approach (MMOA). | |
Data processing approach: naive uplift approach (NUA). Separate binary uplift models directly estimate the uplift between each treatment group and the control group. | |
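To make the separate model approach (SMA) concrete: one response model is fitted per treatment group, and the uplift estimates are obtained at scoring time by differencing the predicted response probabilities against the control model. The sketch below is a minimal Python illustration using scikit-learn (not the R implementation benchmarked in this paper); the function names `fit_sma` and `sma_uplift` are ours, and logistic regression stands in for any base learner.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sma(X, treatment, y):
    """Fit one response model per treatment group (group 0 is the control)."""
    treatment, y = np.asarray(treatment), np.asarray(y)
    models = {}
    for k in np.unique(treatment):
        mask = treatment == k
        models[k] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return models

def sma_uplift(models, X):
    """Estimated uplift of each treatment k > 0 versus the control, per row of X."""
    p = {k: m.predict_proba(X)[:, 1] for k, m in models.items()}
    return {k: p[k] - p[0] for k in p if k != 0}
```

The resulting \({\hat{\tau }}_{i,k}\) estimates can then be ranked per individual to select the treatment with the largest predicted effect.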
2.3 Proposed methods
2.3.1 Multitreatment modified outcome approach (MMOA)
Treatment group (T) | Observed outcome (Y) | Modified outcome |
---|---|---|
\(T=0\) | 1 | \(R_{T=0}\) |
\(T=0\) | 0 | \(NR_{T=0}\) |
\(T=1\) | 1 | \(R_{T=1}\) |
\(T=1\) | 0 | \(NR_{T=1}\) |
\(T=2\) | 1 | \(R_{T=2}\) |
\(T=2\) | 0 | \(NR_{T=2}\) |
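One way to operationalize the table above is to treat each (treatment, outcome) pair as its own class and train a single multiclass model on the modified outcome. The sketch below is our own minimal Python reading of that idea (the paper's setup uses R's `multinom` and `randomForest`); recovering \({\hat{P}}(Y = 1 | X, do(T=k))\) as the responder share within treatment k's class pair is an assumption of this sketch, and the function names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_mmoa(X, treatment, y):
    """Train one multiclass model on the modified outcome.

    Class 2k encodes NR_{T=k} (non-responder) and class 2k+1 encodes
    R_{T=k} (responder) for treatment k, mirroring the table above.
    """
    z = 2 * np.asarray(treatment) + np.asarray(y)
    return LogisticRegression(max_iter=1000).fit(X, z)

def mmoa_uplift(model, X, n_treatments):
    """Uplift per treatment k > 0, derived from the class probabilities."""
    proba = model.predict_proba(X)

    def p_response(k):
        r = proba[:, model.classes_ == 2 * k + 1].sum(axis=1)
        nr = proba[:, model.classes_ == 2 * k].sum(axis=1)
        return r / (r + nr)  # response probability conditional on treatment k

    p0 = p_response(0)
    return {k: p_response(k) - p0 for k in range(1, n_treatments)}
```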
2.3.2 Naive uplift approach (NUA)
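The naive uplift approach trains, for each treatment, a binary uplift model on the subsample containing only that treatment group and the control. As a stand-in for the `upliftRF` and `ccif` models used in the paper, the sketch below uses the class-variable transformation of Jaskowski and Jaroszewicz (2012), \(Z = YT + (1-Y)(1-T)\), under which uplift equals \(2P(Z=1) - 1\) when assignment is balanced within each pair; the function names are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_nua(X, treatment, y):
    """One binary uplift model per treatment k versus the control (group 0).

    Each model is trained on the {treatment k, control} subsample using the
    class-variable transformation Z = Y*T + (1 - Y)*(1 - T).
    """
    treatment, y = np.asarray(treatment), np.asarray(y)
    models = {}
    for k in np.unique(treatment):
        if k == 0:
            continue
        mask = (treatment == k) | (treatment == 0)
        t = (treatment[mask] == k).astype(int)
        z = y[mask] * t + (1 - y[mask]) * (1 - t)
        models[k] = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[mask], z)
    return models

def nua_uplift(models, X):
    """Uplift estimate 2*P(Z=1) - 1 per treatment, assuming balanced assignment."""
    return {k: 2 * m.predict_proba(X)[:, 1] - 1 for k, m in models.items()}
```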
3 Evaluation metrics
3.1 Conventional uplift metrics
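As a reference point for the conventional metrics: the Qini coefficient reported in Sect. 5.2 is commonly computed as the area between the incremental-gains (Qini) curve and the random-targeting diagonal. The sketch below is a generic binary-treatment version (in the multitreatment setting, each treatment is evaluated against the control); the function name is ours.

```python
import numpy as np

def qini_coefficient(uplift_score, treatment, outcome):
    """Area between the Qini curve and the random-targeting line.

    Individuals are ranked by predicted uplift; at each depth the cumulative
    control responders are rescaled to the cumulative treated-group size.
    """
    order = np.argsort(-np.asarray(uplift_score))
    t = np.asarray(treatment)[order]
    y = np.asarray(outcome)[order]
    n = len(y)
    cum_t = np.cumsum(t)                  # treated units seen so far
    cum_c = np.cumsum(1 - t)              # control units seen so far
    resp_t = np.cumsum(y * t)             # treated responders so far
    resp_c = np.cumsum(y * (1 - t))       # control responders so far
    gains = resp_t - resp_c * cum_t / np.maximum(cum_c, 1)
    random_line = gains[-1] * np.arange(1, n + 1) / n
    return float(np.sum(gains - random_line) / n)
```

A model that ranks persuadable individuals first yields a positive coefficient; reversing the ranking drives it negative.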
3.2 Expected response
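The expected response metric evaluates the treatment-assignment policy implied by an uplift model: under randomized assignment, the mean outcome the policy would achieve is commonly estimated by inverse-probability weighting of the individuals whose observed treatment happens to match the policy's recommendation. A minimal sketch of this estimator (the function name is ours; `probs[k]` denotes the randomization probability of treatment k):

```python
import numpy as np

def expected_response(policy, treatment, outcome, probs):
    """IPW estimate of the mean outcome if everyone got the policy's treatment.

    policy, treatment: arrays of treatment indices; probs[k] is the (known)
    randomization probability of treatment k.
    """
    policy, treatment = np.asarray(policy), np.asarray(treatment)
    outcome = np.asarray(outcome, dtype=float)
    match = (policy == treatment).astype(float)
    return float(np.mean(outcome * match / np.asarray(probs)[treatment]))
```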
4 Experimental setup
4.1 Data sets
- The \({\textit{Hillstrom}}\) direct marketing campaign data set (Hillstrom 2018) comprises a sample of 64,000 individuals. Three treatment groups are identified: some customers receive an e-mail with men’s merchandise, a second group is targeted with an e-mail featuring women’s merchandise, and a last segment is not contacted. A success is recorded when a customer visits the website within two weeks after receiving the e-mail.
- The \({\textit{Gerber}}\) data set (Gerber et al. 2008) relates to the study of the political behavior of voters. The aim is to analyze whether social pressure increases voter turnout, based on a sample of 180,002 households. Direct mailings were randomly sent 11 days before the August 2006 primary election. The households that received either the “Self” message or the “Neighbors” message are the treated groups to evaluate, whereas those targeted with the “Civic duty” message represent the control group. The outcome variable is positive if the individual voted in the election.
- The \({\textit{Bladder}}\) data set (Therneau 2015) contains information regarding recurrence of bladder cancer for three treatment groups: 1) pyridoxine, 2) thiotepa, and 3) placebo. As in Sołtys et al. (2015), patients who had remaining cancer, or at least one recurrence, are classified as negative cases.
- The \({\textit{Colon}}\) data set (Therneau 2015) includes data of chemotherapy trials against colon cancer. A low-toxicity medication, Levamisole, was administered to some patients, whereas a combination of Levamisole with the moderately toxic 5-FU chemotherapy agent was received by another subsample. The control treatment group corresponds to the nontreated patients. Following the setup proposed by Sołtys et al. (2015), two outcome variables can be extracted: 1) recurrence or death (Colon1) and 2) death (Colon2). The two data sets slightly differ in the way that the predictor variable \({\textit{time}}\) is processed. For the Colon1 data set, this variable is split into two factors: 1) the number of days until the recurrence event and 2) the number of days until the death event. In the Colon2 data set, \({\textit{time}}\) refers only to the number of days until death, since there is no recurrence.
- The \({\textit{AOD}}\) data set corresponds to alcohol and drug usage (McCaffrey et al. 2013). In this subset of 600 observations, three treatment groups are identified: “community,” “metcbt5” and “scy.” We assigned individuals within the “community” category to the control group. Given that the outcome variable is continuous, we apply binary encoding by assuming that a positive case is an individual whose substance use frequency declines by the 12th month after the treatment is applied. An important observation is that only 5 out of the 23 original pretreatment variables are available in this subset. Therefore, information on demography, substance use, criminal activities, mental health function and environmental risk is mostly absent.
- The \({\textit{Bank}} \ {\textit{Marketing}}\) data set (Moro et al. 2014) is publicly available in the UCI repository. This set contains information regarding a direct marketing campaign conducted by a commercial bank. To obtain a multitreatment set, the categorical variable “contact” is chosen as the decision variable to determine the different treatment groups. Depending on the type of contact communication, individuals are assigned to either the “cellular” group or “telephone” group. The “unknowns” are the control group. The outcome variable is positive if a customer decides to open a term deposit with the institution.
- The \({\textit{Turnover}}\) data set provided by a private Belgian organization comprises information regarding retention strategies aiming to reduce voluntary turnover. A subset of the 1,951 white-collar employees is targeted with one of two retention campaigns: “recognition” and “flexibility.” The remaining group is not treated, and hence is classified as control. A positive case is represented by an employee who does not voluntarily leave the company the year after the strategies are deployed.
Dataset | Source | Domain | Channel | Response | No. of variables | Treatment 1 (size; uplift*) | Treatment 2 (size; uplift*) | Standard treatment (size)
---|---|---|---|---|---|---|---|---
Hillstrom | Hillstrom (2018) | Marketing | E-mail | Visit | 18 | WomensEmail (21,387; 4.52%) | MensEmail (21,307; 7.66%) | Control (21,306)
Gerber | Gerber et al. (2008) | Political behavior | Mail | Vote | 11 | Self (38,218; 6.34%) | Neighbors (38,201; 3.06%) | Civic duty (38,218)
Bladder | Therneau (2015) | Clinical trial | Medication | No recurrence | 8 | Pyridoxine (85; \(-\)5.16%) | Thiotepa (81; \(-\)9.86%) | Placebo (128)
Colon 1 | Therneau (2015) | Clinical trial | Medication | Recurrence or death | 13 | Levamisole (310; \(-\)0.08%) | Levamisole & 5FU (304; \(-\)17.08%) | Observation (315)
Colon 2 | Therneau (2015) | Clinical trial | Medication | Death | 12 | Levamisole (310; \(-\)0.08%) | Levamisole & 5FU (304; \(-\)17.08%) | Observation (315)
Bank | Moro et al. (2014) | Marketing | Call | Subscribe | 16 | Cellular (29,285; 10.85%) | Telephone (2,906; 9.35%) | Unknown (13,020)
Turnover | Private organization | Human resources | Retention | No turnover | 24 | Recognition (363; 1.84%) | Flexibility (491; \(-\)1.52%) | Control (690)
AOD | McCaffrey et al. (2013) | Public policy | Program | Reduce use | 5 | Metcbt5 (200; \(-\)6%) | Scy (200; \(-\)4%) | Community (200)

*Observed uplift of each treatment relative to the standard treatment group; group sizes in parentheses.
4.2 Data preprocessing and partitioning
4.3 Uplift modeling techniques
Method | Approach | Modeling technique | Implementation
---|---|---|---
Current | Data preprocessing: DIA | Logistic regression (DIALR) | `train`, method = "glmStepAIC"
 | | Random forest (DIARF) | `train`, method = "rf"
 | Data processing, indirect estimation: SMA | Logistic regression (SMALR) | `train`, method = "glmStepAIC"
 | | Random forest (SMARF) | `train`, method = "rf"
 | Data processing, direct estimation: adapted algorithms | Causal K-nearest neighbor (CKNN) | `uplift`, `upliftKNN`
 | | CTS random forest (CTS) | `causalml`, evaluationFunction = "CTS"
 | | ED random forest (ED) | `causalml`, evaluationFunction = "ED"
 | | X-Learner random forest (XLearner) | `causalml`, evaluationFunction = "XLearner"
 | | R-Learner random forest (RLearner) | `causalml`, evaluationFunction = "RLearner"
Proposed | Data preprocessing: MMOA | Multinomial log-linear (MMOALR) | `nnet`, `multinom`
 | | Random forest (MMOARF) | `randomForest`
 | Data processing, indirect estimation: NUA | Uplift random forest (NUARF) | `uplift`, `upliftRF`
 | | Uplift causal conditional inference forest (NUACCIF) | `uplift`, `ccif`
4.4 Statistical test
4.5 Implementation
The `RItools` package is used to check the imbalance in pretreatment characteristics among treatment groups. In the case of detecting any imbalance, the `MatchIt` package applies optimal matching based on the propensity scores. In R, the `caret` package includes the standard Logistic regression and the Random forest algorithms. Furthermore, the `uplift` package (Guelman 2014) incorporates the CKNN (`upliftKNN`), the Uplift random forest (`upliftRF`) and the Uplift causal conditional inference forest (`ccif`). For the setup of the modified outcome techniques, the `randomForest` algorithm (Liaw and Wiener 2002) and the Multinomial log-linear model algorithm (`multinom`; Ripley and Venables 2011) are chosen. Recent implementations of the CTS, ED, X-Learner and R-Learner algorithms are available in the `causalml` Python package.
5 Empirical results
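For readers outside R, the matching step can be illustrated with a greedy propensity-score matcher in Python. This is only a sketch: MatchIt's optimal matching minimizes the total within-pair distance globally, whereas the greedy version below pairs one treated unit at a time; the function name is ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(X, treated):
    """Greedy 1:1 nearest-neighbor matching on estimated propensity scores.

    Returns a dict mapping each treated index to its matched control index;
    assumes at least as many control units as treated units.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    controls = list(np.flatnonzero(treated == 0))
    matches = {}
    for i in np.flatnonzero(treated == 1):
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        matches[i] = j
        controls.remove(j)  # match without replacement
    return matches
```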
5.1 Identifying and correcting selection bias
Data set | Treatment groups | Imbalance | p-value | Matching | Balance | Final p-value |
---|---|---|---|---|---|---|
Hillstrom | WomensEmail vs. control | \(\times \) | 0.73 | \(\times \) | – | – |
MensEmail vs. control | \(\times \) | 0.58 | \(\times \) | – | – | |
Gerber | Self vs. civic duty | \(\times \) | 0.31 | \(\times \) | – | – |
Neighbors vs. civic duty | \(\times \) | 0.22 | \(\times \) | – | – | |
Bladder | Pyridoxine vs. placebo | \(\times \) | 0.73 | \(\checkmark \) | \(\checkmark \) | 0.99 |
Thiotepa vs. placebo | \(\checkmark \) | \(9.12\mathrm {e}{-5}\) | \(\checkmark \) | \(\checkmark \) | 0.66 | |
Colon 1 | Levamisole vs. observation | \(\times \) | 0.91 | \(\checkmark \) | \(\checkmark \) | 0.97 |
Levamisole & 5FU vs. observation | \(\checkmark \) | 0.005 | \(\checkmark \) | \(\checkmark \) | 0.06 | |
Colon 2 | Levamisole vs. observation | \(\times \) | 0.87 | \(\checkmark \) | \(\checkmark \) | 0.96 |
Levamisole & 5FU vs. observation | \(\checkmark \) | 0.015 | \(\checkmark \) | \(\checkmark \) | 0.09 | |
Bank | Cellular vs. unknown | \(\checkmark \) | \(3.55\mathrm {e}{-65}\) | \(\times \) | \(\times \) | – |
Telephone vs. unknown | \(\checkmark \) | \(1.47\mathrm {e}{-302}\) | \(\times \) | \(\times \) | – | |
Turnover | Recognition vs. control | \(\checkmark \) | \(3.87\mathrm {e}{-11}\) | \(\checkmark \) | \(\checkmark \) | 0.9 |
Flexibility vs. control | \(\checkmark \) | \(1.52\mathrm {e}{-16}\) | \(\checkmark \) | \(\checkmark \) | 0.29 |
AOD | Metcbt5 vs. community | \(\times \) | 0.60 | \(\times \) | – | – |
Scy vs. community | \(\times \) | 0.76 | \(\times \) | – | – |
5.2 Assessing model performance: the Qini metric
Data set | Hillstrom | Gerber | Bank | Bladder | Colon1 | Colon2 | Turnover | AOD |
---|---|---|---|---|---|---|---|---|
A. at \(100\%\) | ||||||||
SMALR | 1.03 (0.25) | 1.17 (0.17) | 3.08 (0.63) | \(-\)10.47 (5.02) | \(-\)0.56 (4.19) | 1.27 (3.63) | 0.88 (1.4) | \(-\)3.03 (4.32)
SMARF | 0.49 (0.12) | 1.21 (0.27) | 4.85 (0.24) | \(-\)0.24 (5.86) | 2.84 (3.64) | 1.95 (4.22) | 1.1 (1.06) | \(-\)0.48 (8.69)
DIALR | 1 (0.24) | 1.2 (0.17) | 3 (0.58) | \(-\)10.44 (5.07) | \(-\)0.56 (4.19) | 0.97 (3.58) | 1.36 (1.02) | \(-\)1.49 (4.58)
DIARF | 0.54 (0.39) | 1.12 (0.42) | 4.14 (0.19) | 1.78 (1.11) | 7.52 (3.95) | 1.24 (3.74) | \(-\)0.01 (0.95) | \(-\)1.51 (5.21)
CKNN | – | – | – | 0.18 (6.96) | \(-\)1.36 (7.41) | \(-\)1.47 (4.16) | 1.44 (1.01) | \(-\)0.93 (4.82)
NUARF | 0.66 (0.34) | 1.11 (0.35) | 2.95 (0.95) | \(-\)12.23 (23.65) | 1.03 (6.71) | 3.41 (1.71) | 0.42 (1.25) | \(-\)0.9 (6.94)
NUACCIF | 0.92 (0.15) | 1.18 (0.21) | 1.49 (0.42) | \(-\)38.02 (2.63) | \(-\)2.02 (5.82) | 5.16 (1.13) | 1.46 (0.51) | \(-\)20.92 (1.94)
MMOALR | 0.95 (0.22) | 1.27 (0.11) | 3.66 (0.61) | 1.77 (1.22) | \(-\)1.07 (2.6) | 5.02 (2.63) | \(-\)0.52 (0.73) | \(-\)1.41 (6.87)
MMOARF | 0.21 (0.14) | 1.14 (0.27) | 7.89 (0.36) | 7.23 (1.16) | 0.99 (1.21) | 1.27 (3.74) | \(-\)1.09 (0.59) | 4.14 (4.11) |
CTS | 0.93 (0.15) | 1.16 (0.19) | 4.28 (0.52) | \(-\)5.01 (17.62) | 0.79 (4.4) | 5.37 (1.74) | \(-\)1.33 (1.36) | 0.39 (4.53) |
ED | 1 (0.18) | 1.22 (0.15) | 4.26 (0.51) | \(-\)21.05 (22.38) | \(-\)0.26 (4.29) | 5.36 (2.48) | \(-\)1.12 (1.34) | 1.36 (3.92) |
XLearner | 0.87 (0.22) | 1.24 (0.35) | 3.5 (0.12) | \(-\)28.84 (3.14) | \(-\)9.61 (9.64) | 0.49 (2.6) | \(-\)2.12 (1.02) | \(-\)15.07 (4.19) |
RLearner | 0.97 (0.15) | 1.22 (0.25) | 2.58 (0.31) | \(-\)25.93 (1.91) | 7.07 (4.25) | 1.85 (1.88) | \(-\)0.84 (1.12) | \(-\)11.33 (12.48) |
B. at \(10\%\) | ||||||||
SMALR | 3.44 (1.1) | 1.47 (1.27) | 6.97 (2.51) | \(-\)19.36 (13.16) | \(-\)21.01 (17.56) | 1.55 (5.64) | 0.8 (4.33) | \(-\)6.21 (8.8)
SMARF | 2.36 (1.19) | 3.19 (1.13) | 16.35 (1.38) | \(-\)12.28 (18.16) | 7.9 (25.23) | 7.11 (12.48) | \(-\)1.65 (8.74) | 0.35 (8.92)
DIALR | 3.59 (0.79) | 1.66 (1.21) | 6.53 (1.95) | \(-\)16.65 (14.91) | \(-\)21.01 (17.56) | \(-\)0.83 (6.96) | 2.33 (4.72) | 4.84 (10.34)
DIARF | 1.86 (0.95) | 2.62 (0.84) | 14.25 (1.07) | \(-\)33.85 (1.58) | 17.02 (18.08) | 4.4 (19.19) | 0.52 (3.17) | \(-\)11.8 (17.14)
CKNN | – | – | – | 1.17 (34.07) | 3.27 (17.71) | \(-\)6.73 (10.51) | \(-\)5.81 (8.26) | \(-\)10.63 (16.47)
NUARF | 0.39 (0.87) | 2.31 (0.97) | 15.69 (1.15) | \(-\)9.12 (29.22) | \(-\)7.5 (22.98) | \(-\)3.34 (17.04) | 1.99 (3.52) | \(-\)21.58 (28.12)
NUACCIF | 2.1 (0.79) | 2.48 (1.61) | 9.76 (2.51) | \(-\)36.79 (0.95) | \(-\)4.53 (16.23) | 7.81 (22.58) | 2.98 (2.87) | \(-\)10.67 (11.31)
MMOALR | 3.52 (0.92) | 3.09 (0.43) | 11.11 (3.21) | \(-\)10.67 (9.66) | \(-\)6.04 (9.83) | \(-\)4.79 (9.15) | \(-\)2.56 (2.58) | \(-\)13.52 (21.41)
MMOARF | 0 (0.74) | 2.36 (1.49) | 17.05 (3.28) | 27.19 (16.01) | 5.49 (11.67) | \(-\)6.54 (16.25) | 17.81 (22.48) | 2.95 (17.98) |
CTS | 2.6 (1.93) | 2.47 (0.83) | 19.34 (5.17) | \(-\)17.82 (24.18) | 9.02 (10.94) | 1.03 (22.09) | \(-\)5.56 (9.03) | \(-\)6.77 (18.43) |
ED | 3.62 (0.99) | 2.72 (0.69) | 20.48 (5.32) | \(-\)29.75 (10.69) | 4.14 (21.69) | \(-\)0.21 (18.88) | \(-\)0.69 (3.27) | \(-\)9.47 (21.69) |
XLearner | 2.83 (1.69) | 1.9 (1.91) | 11.85 (2.34) | \(-\)6.34 (12.68) | \(-\)7.62 (12.16) | \(-\)0.36 (20.89) | \(-\)27.4 (27.38) | 1.49 (1.51) |
RLearner | 2.69 (1.35) | 1.92 (0.76) | 13.91 (1.93) | 3.92 (0) | 7.5 (7.88) | \(-\)11.35 (7.71) | 0.64 (4.54) | 2.55 (14.76) |
5.3 Assessing model performance: the expected response
Data set | Hillstrom | Gerber | Bank | Bladder | Colon1 | Colon2 | Turnover | AOD |
---|---|---|---|---|---|---|---|---|
A. at \(100\%\) | ||||||||
SMALR | 0.18 | 0.38 | 0.16 | 0.79 | 0.54 | 0.56 | 0.99 | 0.51 |
SMARF | 0.16 | 0.36 | 0.15 | 0.97 | 0.51 | 0.51 | 1.07 | 0.51 |
DIALR | 0.18 | 0.38 | 0.16 | 0.79 | 0.54 | 0.58 | 1.01 | 0.46 |
DIARF | 0.17 | 0.37 | 0.15 | 0.85 | 0.54 | 0.55 | 1.14 | 0.46 |
CKNN | – | – | – | 0.86 | 0.49 | 0.45 | 0.92 | 0.48 |
NUARF | 0.18 | 0.37 | 0.14 | 0.86 | 0.50 | 0.61 | 1.12 | 0.49 |
NUACCIF | 0.18 | 0.38 | 0.17 | 0.84 | 0.44 | 0.62 | 1 | 0.53 |
MMOALR | 0.18 | 0.38 | 0.12 | 1.12 | 0.47 | 0.56 | 1.32 | 0.53 |
MMOARF | 0.17 | 0.37 | 0.12 | 1.20 | 0.55 | 0.48 | 1.49 | 0.56 |
CTS | 0.18 | 0.38 | 0.16 | 0.83 | 0.48 | 0.62 | 0.96 | 0.50 |
ED | 0.18 | 0.38 | 0.14 | 0.81 | 0.49 | 0.62 | 0.89 | 0.51 |
XLearner | 0.18 | 0.38 | 0.14 | 0.84 | 0.55 | 0.62 | 0.95 | 0.54 |
RLearner | 0.18 | 0.37 | 0.15 | 0.84 | 0.57 | 0.59 | 1.15 | 0.54 |
B. at \(10\%\) | ||||||||
SMALR | 0.12 | 0.33 | 0.08 | 0.79 | 0.39 | 0.44 | 0.90 | 0.49 |
SMARF | 0.11 | 0.33 | 0.09 | 0.81 | 0.56 | 0.45 | 0.91 | 0.51 |
DIALR | 0.12 | 0.33 | 0.08 | 0.81 | 0.39 | 0.44 | 0.90 | 0.51 |
DIARF | 0.12 | 0.33 | 0.08 | 0.68 | 0.57 | 0.47 | 0.93 | 0.49 |
CKNN | – | – | – | 0.79 | 0.53 | 0.42 | 0.93 | 0.51 |
NUARF | 0.11 | 0.32 | 0.07 | 0.84 | 0.52 | 0.45 | 0.95 | 0.52 |
NUACCIF | 0.12 | 0.32 | 0.10 | 0.84 | 0.51 | 0.45 | 0.91 | 0.53 |
MMOALR | 0.12 | 0.32 | 0.06 | 0.94 | 0.52 | 0.43 | 0.99 | 0.51 |
MMOARF | 0.11 | 0.32 | 0.07 | 1.03 | 0.56 | 0.44 | 1.20 | 0.53 |
CTS | 0.12 | 0.32 | 0.03 | 0.79 | 0.55 | 0.44 | 0.95 | 0.54 |
ED | 0.12 | 0.33 | 0.04 | 0.80 | 0.55 | 0.44 | 0.97 | 0.52 |
XLearner | 0.12 | 0.33 | 0.08 | 0.84 | 0.54 | 0.38 | 0.80 | 0.54 |
RLearner | 0.12 | 0.32 | 0.09 | 0.84 | 0.54 | 0.48 | 0.98 | 0.54 |