26-06-2022

Semi-supervised approach to event time annotation using longitudinal electronic health records

Authors: Liang Liang, Jue Hou, Hajime Uno, Kelly Cho, Yanyuan Ma, Tianxi Cai

Published in: Lifetime Data Analysis | Issue 3/2022

Abstract

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, while annotating event times manually is prohibitive in both time and resources. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes in the labeled data, using the features derived in step I and approximating the non-parametric baseline function with B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
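
To make the proportional odds structure of step II concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common way of writing the model, logit P(T ≤ t | Z) = α(t) + βᵀZ, which may differ from the parametrization used in the paper. The baseline `log_baseline_odds`, the coefficients `beta`, and the feature vectors `z_a` and `z_b` are hypothetical; in particular, log(t) only stands in for the B-spline approximation of the baseline, and no penalization or fitting is shown.

```python
import math
from typing import Sequence


def expit(x: float) -> float:
    """Logistic function, mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def log_baseline_odds(t: float) -> float:
    # Hypothetical stand-in for the non-parametric baseline alpha(t).
    # The abstract approximates this function with B-splines; log(t) is
    # used here only because it is simple and increasing in t.
    return math.log(t)


def event_prob(t: float, z: Sequence[float], beta: Sequence[float]) -> float:
    """P(T <= t | z) under logit P(T <= t | z) = alpha(t) + beta . z."""
    linear = sum(b * x for b, x in zip(beta, z))
    return expit(log_baseline_odds(t) + linear)


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


beta = [0.8, -0.5]  # illustrative feature effects, not taken from the paper
z_a = [1.0, 0.2]    # hypothetical derived features for patient A
z_b = [0.0, 1.0]    # hypothetical derived features for patient B

# Odds ratio between the two patients at several time points.
ratios = [
    odds(event_prob(t, z_a, beta)) / odds(event_prob(t, z_b, beta))
    for t in (0.5, 1.0, 2.0, 5.0)
]

# Under proportional odds, the ratio depends only on beta . (z_a - z_b).
expected_ratio = math.exp(sum(b * (a - c) for b, a, c in zip(beta, z_a, z_b)))
```

Every entry of `ratios` matches `expected_ratio` up to floating-point error: the baseline α(t) cancels in the odds ratio, so the features shift the odds of having experienced the event by the same factor at every time point. This is the sense in which the odds are proportional.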

Metadata
Publisher: Springer US
Print ISSN: 1380-7870
Electronic ISSN: 1572-9249
DOI: https://doi.org/10.1007/s10985-022-09557-5
