2021 | OriginalPaper | Book chapter

A General Machine Learning Framework for Survival Analysis

Written by: Andreas Bender, David Rügamer, Fabian Scheipl, Bernd Bischl

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and a proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that cannot be integrated directly into standard machine learning workflows. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well-developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting, and many more, can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework, we develop new methods that are competitive with specialized state-of-the-art approaches in terms of accuracy and versatility, while requiring comparatively little programming effort or specialized methodological know-how.
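The abstract does not spell out the data augmentation step, so the following is only a minimal sketch of one way such a reduction can look, assuming a piecewise exponential formulation: follow-up time is split at fixed cut points, each subject contributes one row per interval in which it is still at risk, and the event indicator of each row is treated as a Poisson response with the log of the time at risk as an offset. The function names, the cut points, and the toy data are hypothetical and are not taken from the chapter. For a model with nothing but one intercept per interval, the Poisson maximum-likelihood estimate has a closed form (events divided by exposure), which keeps the sketch free of external learners.

```python
import numpy as np


def to_piecewise_exponential(times, events, cuts):
    """Expand (time, event) pairs into one row per subject and interval at risk.

    Hypothetical helper: `cuts` are the right borders of the intervals,
    the first interval starts at 0.
    """
    cuts = np.asarray(cuts, dtype=float)
    starts = np.concatenate(([0.0], cuts[:-1]))
    rows = []
    for i, (t, d) in enumerate(zip(times, events)):
        for j, (a, b) in enumerate(zip(starts, cuts)):
            if t <= a:
                break  # subject is no longer at risk in this interval
            exposure = min(t, b) - a  # time spent at risk in (a, b]
            rows.append({
                "id": i,
                "interval": j,
                "exposure": exposure,
                "offset": np.log(exposure),  # offset for a Poisson learner
                "status": int(d == 1 and t <= b),  # event in this interval?
            })
    return rows


def piecewise_hazard(rows, n_intervals):
    """Closed-form Poisson MLE for an interval-only model: events / exposure."""
    n_events = np.zeros(n_intervals)
    exposure = np.zeros(n_intervals)
    for r in rows:
        n_events[r["interval"]] += r["status"]
        exposure[r["interval"]] += r["exposure"]
    return n_events / exposure


# Toy data (made up for illustration): 1 = event observed, 0 = right-censored.
times = [0.5, 1.5, 2.0, 2.5]
events = [1, 0, 1, 1]
cuts = [1.0, 2.0, 3.0]

rows = to_piecewise_exponential(times, events, cuts)
hazard = piecewise_hazard(rows, n_intervals=len(cuts))
print(len(rows), hazard)
```

In a workflow of the kind the abstract describes, the augmented rows (features plus an interval indicator, `status` as the target, `offset` as the exposure term) would be handed to any learner that optimizes a Poisson log-likelihood instead of the closed-form estimate used here.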

Metadata
Title
A General Machine Learning Framework for Survival Analysis
Written by
Andreas Bender
David Rügamer
Fabian Scheipl
Bernd Bischl
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-67664-3_10
