## 1 Introduction

## 2 Methods

### 2.1 Dynamic treatment regimes

### 2.2 Bayesian machine learning for DTRs

### 2.3 AFT-BART

### 2.4 Proposed AFT-BML algorithm

The proposed algorithm was implemented in an R wrapper function, `dtr1`, that utilizes the BART R package (Sparapani et al. 2021). Specifically, the wrapper calls the AFT-BART function (`abart`). The default tuning parameters for the BART prior were adopted, including \(\alpha =0.95\), \(\gamma =2\), \(\nu =3\), and \(q=0.9\) (Chipman et al. 2010). Details on the software implementation can be found in Appendix D.

## 3 Simulations

### 3.1 Simulation design

### 3.2 Simulation settings

### 3.3 Method implementation and simulation metrics

For AFT-BML, we used the wrapper function `dtr1` that implemented the algorithm described in Sect. 2.4; further documentation of this implementation is available in Appendix D. For Q-learning, we first used the `survreg` function from the R package `survival` (Therneau and Grambsch 2000; Therneau 2022) to fit the Stage 2 model, then predicted the optimal second stage treatment and the corresponding optimal survival time to create the Stage 1 data. The `survreg` function was then called again to fit the augmented Stage 1 data and to estimate the optimal first stage treatment with the corresponding optimal overall survival time. For Scenario 3, which has a Gumbel error distribution, we used the Weibull option so that the Q-learning approaches were fit with the correct error distribution.

### 3.4 Simulation results
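The two-stage Q-learning fit described in Sect. 3.3 can be sketched as follows. This is a minimal illustration on simulated toy data, not the authors' implementation: the data-generating model, the variable names (`x1`, `a1`, `x2`, `a2`, `enter2`), the lognormal error choice, and the way the Stage 1 pseudo-outcome is built (time to Stage 2 entry plus predicted optimal Stage 2 time, treated as an observed event) are all assumptions made for the example. Only `survreg`, `Surv`, and `predict` from the `survival` package are taken from the text.

```r
library(survival)

set.seed(1)
n      <- 400
x1     <- rnorm(n)                       # Stage 1 covariate (hypothetical)
a1     <- rbinom(n, 1, 0.5)              # Stage 1 treatment, coded 0/1
enter2 <- rbinom(n, 1, 0.6) == 1         # TRUE if the patient enters Stage 2

# Stage 1 time: time to Stage 2 entry for entrants,
# possibly censored survival time for everyone else.
t1_true <- exp(1 + 0.3 * x1 + 0.5 * a1 * x1 + rnorm(n, sd = 0.5))
c1      <- rexp(n, rate = 0.05)
t1      <- ifelse(enter2, t1_true, pmin(t1_true, c1))
d1      <- ifelse(enter2, 1L, as.integer(t1_true <= c1))

# Stage 2 data (only used for patients who entered Stage 2).
x2      <- rnorm(n)
a2      <- rbinom(n, 1, 0.5)
t2_true <- exp(0.5 + 0.2 * x2 + 0.6 * a2 * x2 + rnorm(n, sd = 0.5))
c2      <- rexp(n, rate = 0.05)
s2 <- data.frame(t2 = pmin(t2_true, c2),
                 d2 = as.integer(t2_true <= c2),
                 x2 = x2, a2 = a2)[enter2, ]

# Step 1: fit the Stage 2 AFT model.
# For a Gumbel error on the log scale, dist = "weibull" would be used instead.
fit2 <- survreg(Surv(t2, d2) ~ x2 * a2, data = s2, dist = "lognormal")

# Step 2: predict the survival time under each Stage 2 treatment and keep the best.
pred2 <- sapply(0:1, function(a)
  predict(fit2, newdata = transform(s2, a2 = a), type = "response"))
a2_opt <- max.col(pred2, ties.method = "first") - 1   # estimated optimal a2
t2_opt <- pmax(pred2[, 1], pred2[, 2])                # predicted time under a2_opt

# Step 3: build the augmented Stage 1 data.
# Assumption: entrants get (entry time + optimal Stage 2 time) as an event.
s1 <- data.frame(y = t1, d = d1, x1 = x1, a1 = a1)
s1$y[enter2] <- t1[enter2] + t2_opt

# Step 4: fit the Stage 1 model and pick the optimal first stage treatment.
fit1 <- survreg(Surv(y, d) ~ x1 * a1, data = s1, dist = "lognormal")
pred1 <- sapply(0:1, function(a)
  predict(fit1, newdata = transform(s1, a1 = a), type = "response"))
a1_opt <- max.col(pred1, ties.method = "first") - 1   # estimated optimal a1
os_opt <- pmax(pred1[, 1], pred1[, 2])                # predicted optimal overall time
```

Here `type = "response"` returns predictions on the original time scale, so the treatment with the larger predicted time is selected at each stage.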

## 4 Motivating analysis: optimal treatment for AML patients undergoing transplant

| Stage 1 treatment | Stage 2: Standard | Stage 2: NHTL | Not entered Stage 2 |
|---|---|---|---|
| Standard | 673 | 219 | 2180 |
| NHTL | 240 | 91 | 768 |

The time dependent AUC was calculated with the R package `timeROC` (Blanche et al. 2013), using the survival times predicted by AFT-BML and by Q-learning as the predictors. For the Stage 2 model, the observed time and event indicator can be used directly in calculating the time dependent AUC. For the Stage 1 prediction model (which assumes patients receive the optimal treatment in Stage 2), we need to account for the fact that not all patients received their optimal treatment in Stage 2. To handle this, the original observation was kept as is when the observed Stage 2 treatment matched the estimated optimal treatment, and was censored at the time of entering Stage 2 otherwise. Since the estimated optimal Stage 2 treatment could differ between AFT-BML and Q-learning, we examined three sets of censored Stage 1 data, defined by the optimal Stage 2 treatment identified by AFT-BML, by Q-learning, or consistently by both Stage 2 models. The time points of interest for Stage 1 are one year, two years, and three years. For Stage 2, only the median and third quartile of the observed time are evaluated. The results from both stages are shown in Table 2. For Stage 2, AFT-BML improves the AUC by 0.55 percentage points at the median and by 1.87 percentage points at the third quartile, indicating that AFT-BML has better predictive performance at Stage 2.

| Stage | Time in months | Suboptimal treatment censoring rule | Time dependent AUC, AFT-BML (%) | Time dependent AUC, Q-learning (%) |
|---|---|---|---|---|
| 1 | 12 | AFT-BML based | 71.34 | 69.61 |
| 1 | 12 | Q-learning based | 70.89 | 69.14 |
| 1 | 12 | Both agreed | 70.81 | 69.23 |
| 1 | 24 | AFT-BML based | 72.68 | 70.52 |
| 1 | 24 | Q-learning based | 72.07 | 69.86 |
| 1 | 24 | Both agreed | 71.99 | 69.99 |
| 1 | 36 | AFT-BML based | 72.50 | 70.35 |
| 1 | 36 | Q-learning based | 71.82 | 69.60 |
| 1 | 36 | Both agreed | 71.79 | 69.86 |
| 2 | 3.2 (Median) | NA | 70.33 | 69.78 |
| 2 | 15 (Third quartile) | NA | 76.09 | 74.22 |
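The censoring rule used for the Stage 1 evaluation can be illustrated with a short base R sketch. The helper name `censor_suboptimal`, its argument names, and the toy data are hypothetical and chosen only for this example; the rule itself follows the description above: a patient whose observed Stage 2 treatment differs from the estimated optimal one is censored at the time of entering Stage 2, and all other observations are kept unchanged.

```r
# Hypothetical helper applying the Stage 1 evaluation censoring rule.
#   time1    : observed overall time measured from the start of Stage 1
#   event    : 1 = event observed, 0 = censored
#   entered2 : TRUE if the patient entered Stage 2
#   t_enter2 : time of entering Stage 2 (NA if Stage 2 was not entered)
#   a2_obs   : observed Stage 2 treatment (NA if Stage 2 was not entered)
#   a2_opt   : estimated optimal Stage 2 treatment (NA if not entered)
censor_suboptimal <- function(time1, event, entered2, t_enter2, a2_obs, a2_opt) {
  # Suboptimal = entered Stage 2 and received a treatment other than the optimal one.
  suboptimal <- entered2 & !is.na(a2_obs) & !is.na(a2_opt) & (a2_obs != a2_opt)
  data.frame(
    time  = ifelse(suboptimal, t_enter2, time1),       # censor at Stage 2 entry
    event = ifelse(suboptimal, 0L, as.integer(event))  # mark as censored
  )
}

# Toy data: four patients, times in months.
toy <- data.frame(
  time1    = c(30, 18, 40, 12),
  event    = c(1, 1, 0, 1),
  entered2 = c(TRUE, TRUE, TRUE, FALSE),
  t_enter2 = c(6, 5, 9, NA),
  a2_obs   = c("Standard", "NHTL", "NHTL", NA),
  a2_opt   = c("Standard", "Standard", "NHTL", NA),
  stringsAsFactors = FALSE
)

stage1_eval <- with(toy, censor_suboptimal(time1, event, entered2,
                                           t_enter2, a2_obs, a2_opt))
print(stage1_eval)
```

In this toy example only the second patient is censored (at month 5), because the observed treatment (NHTL) differs from the estimated optimal one (Standard). Supplying the optimal treatments estimated by AFT-BML, by Q-learning, or only those on which both models agree as `a2_opt` would give the three censored Stage 1 data sets; the resulting times and event indicators would then be passed to `timeROC` together with the predicted survival times.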