Top

Published in:

2016 | OriginalPaper | Chapter

Stabilizing Linear Prediction Models Using Autoencoder

Authors : Shivapratap Gopakumar, Truyen Tran, Dinh Phung, Svetha Venkatesh

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

To date, the instability of prognostic predictors in a sparse high dimensional model, which hinders their clinical adoption, has received little attention. Stable prediction is often overlooked in favour of performance. Yet, stability prevails as key when adopting models in critical areas as healthcare. Our study proposes a stabilization scheme by detecting higher order feature correlations. Using a linear model as basis for prediction, we achieve feature stability by regularizing latent correlation in features. Latent higher order correlation among features is modelled using an autoencoder network. Stability is enhanced by combining a recent technique that uses a feature graph, and augmenting external unlabelled data for training the autoencoder network. Our experiments are conducted on a heart failure cohort from an Australian hospital. Stability was measured using Consistency index for feature subsets and signal-to-noise ratio for model parameters. Our methods demonstrated significant improvement in feature stability and model estimation stability when compared to baselines.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Query Classification by Leveraging Explicit Concept Information

next chapter Mining Source Code Topics Through Topic Model and Words Embedding

http://apps.who.int/classifications/icd10.

https://www.aihw.gov.au/procedures-data-cubes/.

We ignore the bias parameter for simplicity.

Ethics approval was obtained from the Hospital and Research Ethics Committee at Barwon Health (number 12/83) and Deakin University.

Au, W.H., Chan, K.C., Wong, A.K., Wang, Y.: Attribute clustering for grouping, selection, and classification of gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 2(2), 83–101 (2005)CrossRef

Austin, P.C., Tu, J.V.: Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J. Clin. Epidemiol. 57(11), 1138–1146 (2004)CrossRef

Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)CrossRefMATH

Betihavas, V., Davidson, P.M., Newton, P.J., Frost, S.A., Macdonald, P.S., Stewart, S.: What are the factors in risk prediction models for rehospitalisation for adults with chronic heart failure? Aust. Crit. Care: Official J. Confederation Aust. Crit. Care Nurses 25(1), 31–40 (2012). http://www.ncbi.nlm.nih.gov/pubmed/21889893 CrossRef

Cun, Y., Fröhlich, H.: Network and data integration for biomarker signature discovery via network smoothed t-statistics. PLoS One 8(9), e73074 (2013)CrossRef

Gopakumar, S., Tran, T., Nguyen, T.D., Phung, D., Venkatesh, S.: Stabilizing highdimensional prediction models using feature graphs. IEEE J. Biomed. Health Inform. 19(3), 1044–1052 (2015)CrossRef

Jacob, L., Obozinski, G., Vert, J.P.: Group lasso with overlap and graph lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440. ACM (2009)

Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12(1), 95–116 (2007)CrossRef

Kamkar, I., Gupta, S.K., Phung, D., Venkatesh, S.: Exploiting feature relationships towards stable feature selection. In: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 36678, pp. 1–10. IEEE (2015)

10.

Kuncheva, L.I.: A stability index for feature selection. In: Artificial Intelligence and Applications, pp. 421–427 (2007)

11.

Li, C., Li, H.: Network-constrained regularization and variable selection for analysis of genomic data. Bioinform. 24(9), 1175–1182 (2008)CrossRef

12.

Lin, W., Lv, J.: High-dimensional sparse additive hazards regression. J. Am. Stat. Assoc. 108(501), 247–264 (2013)MathSciNetCrossRefMATH

13.

Ma, S., Song, X., Huang, J.: Supervised group lasso with applications to microarray data analysis. BMC Bioinform. 8(1), 1–17 (2007)CrossRef

14.

Meinshausen, N., Bühlmann, P.: Stability selection. J. Roy. Stat. Soc. B (Stat. Methodol.) 72(4), 417–473 (2010)MathSciNetCrossRef

15.

Park, M.Y., Hastie, T., Tibshirani, R.: Averaged gene expressions for regression. Biostatistics 8(2), 212–227 (2007)CrossRefMATH

16.

Raghupathi, W., Raghupathi, V.: Big data analytics in healthcare: promise and potential. Health Inf. Sci. Syst. 2(1), 1–10 (2014)CrossRef

17.

Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766. ACM (2007)

18.

Sandler, T., Blitzer, J., Talukdar, P.P., Ungar, L.H.: Regularized learning with networks of features. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1401–1408. Curran Associates, Inc. (2009)

19.

Simon, N., Friedman, J., Hastie, T., Tibshirani, R., et al.: Regularization paths for cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39(5), 1–13 (2011)CrossRef

20.

Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)MathSciNetMATH

21.

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 67(1), 91–108 (2005)MathSciNetCrossRefMATH

22.

Tran, T., Phung, D., Luo, W., Harvey, R., Berk, M., Venkatesh, S.: An integrated framework for suicide risk prediction. In: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1410–1418. ACM (2013)

23.

Tran, T., Phung, D., Luo, W., Venkatesh, S.: Stabilized sparse ordinal regression for medical risk stratification. Knowl. Inf. Syst., 1–28 (2014)

24.

Ye, J., Liu, J.: Sparse methods for biomedical data. ACM SIGKDD Explor. Newsl. 14(1), 4–15 (2012)MathSciNetCrossRef

25.

Yu, L., Ding, C., Loscalzo, S.: Stable feature selection via dense feature groups. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 803–811. ACM (2008)

26.

Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. B (Stat. Methodol.) 68(1), 49–67 (2006)MathSciNetCrossRefMATH

27.

Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)MathSciNetMATH

28.

Zhou, J., Sun, J., Liu, Y., Hu, J., Ye, J.: Patient risk prediction model via top-k stability selection. In: Proceedings of the 13th SIAM International Conference on Data Mining. SIAM (2013)

29.

Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67, 301–320 (2005)MathSciNetCrossRefMATH

Title: Stabilizing Linear Prediction Models Using Autoencoder
Authors: Shivapratap Gopakumar
Truyen Tran
Dinh Phung
Svetha Venkatesh
Publisher: Springer International Publishing
Book: Advanced Data Mining and Applications
Print ISBN: 978-3-319-49585-9

Electronic ISBN: 978-3-319-49586-6

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-49586-6_46

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner