2021 | OriginalPaper | Chapter

A General Machine Learning Framework for Survival Analysis

Authors: Andreas Bender, David Rügamer, Fabian Scheipl, Bernd Bischl

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, handle time-varying features and effects, and extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and the proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that cannot be integrated directly into standard machine learning workflows. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well-developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting and many more, can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework, we develop new methods that are competitive with specialized state-of-the-art approaches in terms of accuracy and versatility, but with comparatively small investments of programming effort or requirements for specialized methodological know-how.
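
The abstract does not spell out the augmentation step. One well-known reduction of this kind is the piecewise exponential model, in which follow-up time is split into intervals, each subject contributes one row per interval at risk, and the event indicator of each row is treated as a Poisson response with the log of the time at risk as an offset. The following is a minimal sketch under that assumption, not the authors' implementation; the function names, cut points and toy data are hypothetical and chosen only for illustration.

```python
import math


def to_piecewise_exponential(times, events, cuts):
    """Expand (time, event) pairs into one row per subject and interval at risk.

    cuts holds increasing interval end points; interval j is (cuts[j-1], cuts[j]],
    with 0 as the start of the first interval. Follow-up beyond the last cut
    point is ignored, i.e. treated as censored there (an assumption of this sketch).
    Each row is (subject, interval, event_in_interval, log_time_at_risk).
    """
    rows = []
    for i, (t, d) in enumerate(zip(times, events)):
        start = 0.0
        for j, end in enumerate(cuts):
            if t <= start:
                break  # subject is no longer at risk in this interval
            exposure = min(t, end) - start
            event_here = int(d == 1 and t <= end)
            rows.append((i, j, event_here, math.log(exposure)))
            start = end
    return rows


def poisson_nll(rows, log_hazards):
    """Negative Poisson log-likelihood with offset (the constant log(y!) is 0 here)."""
    total = 0.0
    for _, j, y, offset in rows:
        log_mu = log_hazards[j] + offset
        total += math.exp(log_mu) - y * log_mu
    return total


def fit_baseline_hazard(rows, n_intervals):
    """Closed-form Poisson MLE with interval as the only feature: events / exposure."""
    n_events = [0.0] * n_intervals
    exposure = [0.0] * n_intervals
    for _, j, y, offset in rows:
        n_events[j] += y
        exposure[j] += math.exp(offset)
    return [e / x if x > 0 else float("nan") for e, x in zip(n_events, exposure)]


# Toy data (hypothetical): four subjects, the second one is right-censored.
times = [0.5, 1.5, 2.0, 3.0]
events = [1, 0, 1, 1]
cuts = [1.0, 2.0, 3.0]

rows = to_piecewise_exponential(times, events, cuts)
hazards = fit_baseline_hazard(rows, len(cuts))
print(len(rows), hazards)
```

In this sketch the closed-form estimate stands in for the learner. Any method that minimizes `poisson_nll` on the expanded rows, with subject features added as further columns, could take its place, which is the sense in which the survival task becomes a standard Poisson regression task.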

Metadata
Title
A General Machine Learning Framework for Survival Analysis
Authors
Andreas Bender
David Rügamer
Fabian Scheipl
Bernd Bischl
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-67664-3_10
