2016 | OriginalPaper | Book chapter

Stabilizing Linear Prediction Models Using Autoencoder

Written by: Shivapratap Gopakumar, Truyen Tran, Dinh Phung, Svetha Venkatesh

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing

Abstract

To date, the instability of prognostic predictors in sparse high-dimensional models, which hinders their clinical adoption, has received little attention. Stable prediction is often overlooked in favour of performance, yet stability is key when adopting models in critical areas such as healthcare. Our study proposes a stabilization scheme that detects higher-order feature correlations. Using a linear model as the basis for prediction, we achieve feature stability by regularizing latent correlation among features. This latent higher-order correlation is modelled using an autoencoder network. Stability is further enhanced by combining the scheme with a recent technique that uses a feature graph, and by augmenting the training data of the autoencoder network with external unlabelled data. Our experiments are conducted on a heart failure cohort from an Australian hospital. Stability is measured using the consistency index for feature subsets and the signal-to-noise ratio for model parameters. Our methods demonstrate significant improvement in feature stability and model estimation stability when compared to baselines.
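The two stability measures named in the abstract can be illustrated with a small sketch. It assumes the consistency index is Kuncheva's index for two equal-sized feature subsets, (r·d − k²) / (k·(d − k)), where r is the overlap, k the subset size and d the total number of features, averaged over all pairs of runs. It also assumes the signal-to-noise ratio of a weight is |mean| / standard deviation across runs; the abstract does not give the exact definition, so that part is a guess. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def consistency_index(subset_a, subset_b, n_features):
    """Kuncheva's consistency index for two equal-sized feature subsets."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # features selected in both runs
    # Observed overlap minus the overlap expected by chance (k^2 / d),
    # rescaled so that identical subsets score 1.
    return (r * n_features - k * k) / (k * (n_features - k))


def average_consistency(subsets, n_features):
    """Mean consistency index over all pairs of runs."""
    pairs = combinations(subsets, 2)
    return mean(consistency_index(a, b, n_features) for a, b in pairs)


def weight_snr(weights_per_run):
    """Per-feature |mean| / std of a weight across runs (assumed definition)."""
    snr = []
    for values in zip(*weights_per_run):  # one tuple of values per feature
        spread = stdev(values)
        snr.append(abs(mean(values)) / spread if spread > 0 else float("inf"))
    return snr


# Toy data: top-3 feature subsets from three hypothetical bootstrap runs,
# drawn from 10 features in total.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
stability = average_consistency(runs, n_features=10)

# Toy weights of two features across three hypothetical runs.
snr = weight_snr([[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]])
```

Under these definitions a higher value means a more stable model on both measures: identical subsets give a consistency index of 1, and a weight whose sign and magnitude barely change across runs gives a large signal-to-noise ratio.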

Footnotes
3. We ignore the bias parameter for simplicity.
4. Ethics approval was obtained from the Hospital and Research Ethics Committee at Barwon Health (number 12/83) and Deakin University.

Metadata
Title
Stabilizing Linear Prediction Models Using Autoencoder
Written by
Shivapratap Gopakumar
Truyen Tran
Dinh Phung
Svetha Venkatesh
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49586-6_46