Predicting the graft survival for heart–lung transplantation patients: An integrated data mining methodology
Introduction
In many circumstances, organ transplantation is the preferred treatment, and sometimes the only permanent treatment, for chronic failure of the major organs. For example, dialysis can keep a kidney patient alive for months or even years, whereas a patient awaiting a heart or lung has no option other than transplantation. There has been considerable success in the field of organ transplantation, and further improvements in the outcome of transplantation procedures are in prospect [1]. The main challenge in organ transplantation is the shortage of donated organs. Additionally, a significant number of organs are rejected due to a suboptimal match between the graft and the patient. The demand for organ transplantation is increasing while the number of donors remains the same, resulting in longer lists of patients waiting for transplantation [2]. Outcome prediction is becoming increasingly important in medicine in general, and when a resource is scarce the need for accurate prediction becomes acute [3]. Prediction of survival in particular is clinically important but challenging [4]. Because it is currently impossible to satisfy all organ demands, optimizing the system requires sophisticated procedures for selecting optimal organ recipients. At the same time, competing principles such as utility, justice, and equity must be satisfied; namely, the likelihood of satisfactory outcomes must be jointly optimized with the urgency of need. To achieve this level of sophistication, the first step is to reveal the underlying knowledge in the large amount of data recorded in organ transplantation procedures.
The main idea is to maximize the survival rate for transplantation in light of the hundreds of determinative variables captured and stored in databases regarding the donor/graft and the potential recipient, while simultaneously optimizing the utility, justice, and equity principles. Until now, the main focus has been on only a few specific factors, although many more might need to be taken into account. The findings of our study provide new insight into the aforementioned three principles. For example, Kirklin et al. [5] define utility as “an allocation policy that maximizes patient and graft survival”. Two questions come to mind: “(1) Based on what determinative variables can the patient and graft survival be maximized?” and “(2) How can these critical determinative variables be objectively specified and combined in a methodological manner?” It would be naïve to assume that a decision maker can take all of the independent factors into account to optimize the solution: human rationality is bounded, the attempt would be extremely time-consuming, and some trivial information would end up being inferred and acted upon in the process. Our study therefore addresses these two questions and provides a data-driven variable selection methodology as an effective answer to them.
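To make question (2) concrete, the following is a minimal sketch of one generic way to rank candidate variables from data, using information gain with respect to a binary graft-survival outcome. It is an illustration only and is not claimed to be the selection method used in this study; the function names, the variable names (`donor_type`, `recipient_sex`, `graft_survived`), and the toy records are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in outcome entropy after splitting on a categorical variable."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_variables(records, outcome_key):
    """Return (variable, information gain) pairs, most informative first."""
    labels = [r[outcome_key] for r in records]
    names = [k for k in records[0] if k != outcome_key]
    scores = {n: information_gain([r[n] for r in records], labels) for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records, invented for illustration only.
toy_records = [
    {"donor_type": "A", "recipient_sex": "F", "graft_survived": 1},
    {"donor_type": "A", "recipient_sex": "M", "graft_survived": 1},
    {"donor_type": "B", "recipient_sex": "F", "graft_survived": 0},
    {"donor_type": "B", "recipient_sex": "M", "graft_survived": 0},
]
ranking = rank_variables(toy_records, "graft_survived")
```

In this toy data the outcome is fully determined by `donor_type` and unrelated to `recipient_sex`, so the former ranks first. A ranking of this kind would complement, not replace, expert judgment about which variables are clinically meaningful.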
Organ transplantation comprises kidney/pancreas, liver, and thoracic transplantation. Thoracic transplantation refers to heart, lung, and simultaneous heart–lung transplantation procedures. It has become an established form of therapy for patients with end-stage heart and lung disease since its first clinical introduction in the 1960s [6]. The number of heart transplant operations performed annually in the United States grew from 2108 in 1990 to 2192 in 2006 (a marginal increase), while the number of lung transplants grew from 18 in 1987 to 1400 in 2006 (a dramatic increase) [7], [8], [9]. Thoracic transplantation differs significantly from other organ transplantation procedures in that the transplant is needed sooner and is more vital to patient survival. For example, a patient awaiting kidney transplantation might survive for extended periods on a dialysis machine, whereas a patient awaiting thoracic transplantation does not have this choice, at least not at the same comfort and cost level. A huge amount of data is compiled for thoracic organ transplant patients and is analyzed to assess the importance of patient demographics, risk factors, and mortality [10]. These analyses have focused on identifying the characteristics of thoracic transplant recipients and their associated post-transplant outcomes, namely survival [11]. Section 1.2 briefly summarizes the current state of the art in organ transplantation studies.
The vast majority of analytics-driven organ transplantation research consists of simulation modeling studies, which have been available since the mid-1980s [2], [12]. One such study provided a useful simulation tool that used UNOS (United Network for Organ Sharing) liver allocation data and was hence named ULAM (UNOS Liver Allocation Model) [13]. McEwan et al. [14] evaluated the cost-effectiveness of post-surgical management of renal transplant recipients using a discrete-event stochastic simulation. Thompson et al. [15] proposed a more sophisticated simulation tool that can simultaneously handle various organ transplantation scenarios, namely heart–lung, liver, and kidney.
The main limitations of the aforementioned studies are twofold: (1) they do not give satisfactory results at the local or regional levels, since the datasets are mostly retrieved from national sources, and (2) the outcome measure that drove many of the original liver allocation debates, such as waiting time, was found to be a poor measure of differences in access to transplantation [13]. Therefore, a well-established decision support system capable of determining valid indicators of medical urgency/priority would provide a link to better simulation scenarios at all potential levels of organ transplantation.
A large body of research exists on data-driven analytics for various organ transplantation cases. For instance, Hariharan et al. [16] analyze the improvement in graft survival rate with the use of cyclosporine after renal transplantation, in both the short term (less than 1 year) and the long term (more than 1 year). In total, 93,934 renal transplantations performed in the United States between 1988 and 1996 were investigated in that study. The Kaplan–Meier method was used to estimate the survival rate of the transplants for each year. Three primary types of variables were analyzed: demographic characteristics, transplant-related variables, and post-transplantation variables. Finally, a regression analysis was used to predict the probability of graft failure after kidney transplantation in both the short-term and long-term periods.
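The Kaplan–Meier method mentioned above estimates the survival function as a product over the distinct failure times, multiplying by (1 − d/n) at each time, where d is the number of failures and n the number still at risk. The following is a minimal sketch in plain Python; the function name `kaplan_meier` and the toy data are illustrative assumptions and do not come from any of the cited studies.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: follow-up time for each graft (e.g. in years)
    events:    1 if graft failure was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    failures = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # failures plus censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Subjects censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, invented for illustration only.
toy_durations = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_durations, toy_events)
```

Censored observations (event flag 0) reduce the risk set without lowering the survival estimate, which is what lets the method use patients whose follow-up ended before any failure was observed.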
The study by Herrero et al. [17] includes 116 patients who received a liver transplant between 1994 and 2000. Roughly half of the patients in the dataset are younger than 60 (n = 54) and the other half older than 60 (n = 57). The chi-square test, Student's t-test, and the Mann–Whitney U-test are used to compare demographic, pre-transplant, and intra-operative variables between the two age groups. The results indicate no significant difference between these groups in these variables. However, there is a clear trend showing that older patients have lower survival after liver transplantation.
Kusiak et al. [18] compared two rule-based data mining techniques, decision trees and rough sets, for predicting the survival time of kidney dialysis patients. Their study reported reasonable prediction accuracy. Its main limitations were a small dataset of 188 patients in total and the omission of many patient-related parameters. Hong et al. [19] presented a survival analysis of liver transplant patients in Canada that considered only some of the determinative factors, such as age, blood type, donor type (cadaveric or living), and the race and gender of recipients and donors. Having limited the variables to this scope, the authors themselves acknowledge that the clinical information used in the study lacks many details.
Focusing specifically on thoracic transplantation, Jenkins et al. [20] and Fernandez-Yanez et al. [21] had a rich pool of independent variables for survival prediction. They employed the Kaplan–Meier method of survival analysis with a Mantel–Haenszel log-rank test. These studies have two major limitations. First, they lack effective methods for revealing previously unknown, potentially useful patterns. Second, the variables were selected solely on the basis of the experience and intuition of the analysts who conducted the studies.
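The log-rank test mentioned above compares the survival experience of two groups by accumulating, at each failure time, the observed failures in one group, their expected number under the null hypothesis of equal hazards, and the corresponding hypergeometric variance. The following is a minimal two-group sketch in plain Python; the function name `logrank_test` and the toy data are illustrative assumptions, not taken from the cited studies.

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(durations_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Risk sets: subjects whose follow-up has not ended before time t.
        n_a = sum(1 for u, _, g in pooled if g == 0 and u >= t)
        n_b = sum(1 for u, _, g in pooled if g == 1 and u >= t)
        d_a = sum(1 for u, e, g in pooled if g == 0 and u == t and e)
        d_b = sum(1 for u, e, g in pooled if g == 1 and u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical toy data, invented for illustration only.
chi_sq, p_val = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
```

With only four subjects the toy comparison is far too small to be meaningful; it serves only to show how the statistic is assembled from the per-time risk sets.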
A more recent study by Tjang et al. [22] has the same limitations. Based on their experience, the authors adopted some newer explanatory variables, such as body mass index, waiting time on the list, and previous cardiac surgery, to determine survival in heart transplantation. Similar limitations also exist in some other studies related to thoracic transplantation [23], [24], [25]. Nevertheless, these studies contributed very useful knowledge to the organ transplantation domain within the classical statistical assumptions they adopted.
Although some advanced statistical techniques exist (e.g., nonparametric or nonlinear statistical modeling), they are computationally expensive, especially with a very large set of variables, and they require prior knowledge about the data to set the initial modeling parameters. Data mining techniques, on the other hand, provide relatively faster solutions and do not require prior knowledge about the data. For these reasons, this paper proposes an integrated data mining approach for predicting graft survival for combined heart–lung transplantation patients. The rest of the paper is organized as follows. Section 2 presents the overall methodology of the integrated data mining approach. Section 3 validates the method using case studies with an actual dataset. Finally, Section 4 summarizes the entire methodology and discusses related future research.
Section snippets
Proposed methodology
Organ transplantation procedures involve a large number of variables that may have a significant impact on the survival of the graft and/or the patient. However, as explained in Section 1.2, existing studies on thoracic transplantation procedures rely heavily on a few specific variables derived from expert knowledge and experience rather than on data-driven analytical methodologies. The omission of the vast majority of the variables may hinder the discovery of underlying relationships between
Results and discussion
As mentioned in Section 2.1, a total of 16,604 patients were modeled in this study. Of these 16,604 patients, 11,928 (∼72%) were first-time transplants and the rest (∼28%) were repeat transplants. Overall, 52% of all grafts succeeded and the rest failed to survive the 9-year survival period. The mean and standard deviation for survivors and non-survivors in year terms were: , and , . A representative sample for most common causes
Conclusion and future research
This study suggests that, when modeling combined heart–lung transplantation procedures, a data-mining-driven methodology should be used to augment the variable selection process rather than relying solely on expert-selected predictor variables. The human expert's input cannot be ignored in modeling combined heart–lung transplantation (nor can it be in any area of medicine) but should be (and, as shown in this study, could be) strengthened with the knowledge that can be discovered from data. In order to
Acknowledgements
We gratefully acknowledge the support and help of Chief Medical Officer and cardiologist of INTEGRIS Heart Hospital Charles Bethea, MD; INTEGRIS’ Director of Decision Support Leva Swim, Ph.D.; and UNOS SAS software analyst Ms. Katarina Anderson. We are also thankful to two anonymous reviewers whose comments and suggestions have helped improve an earlier version of this paper. Last but not least, we thank Dr. Helena Karsten, International Journal of Medical Informatics special issue
References (48)
- et al. Predicting cytomegalovirus disease after renal transplantation: an artificial neural network approach. International Journal of Medical Informatics (1999)
- et al. Single and multiple time-point prediction models in kidney transplant outcomes. Journal of Biomedical Informatics (2008)
- et al. Current outcomes following heart transplantation. Thoracic and Cardiovascular Surgery (2004)
- et al. Thoracic organ transplantation. American Journal of Transplantation (2004)
- et al. Thoracic transplantation. American Journal of Transplantation (2003)
- et al. Evaluation of the cost-effectiveness of sirolimus cyclosporin for immunosuppression after renal transplantation in the United Kingdom. Clinical Therapeutics (2005)
- et al. Liver transplant recipients older than 60 years have lower survival and higher incidence of malignancy. American Journal of Transplantation (2003)
- et al. Predicting survival time for kidney dialysis patients: a data mining approach. Computers in Biology and Medicine (2005)
- et al. Survival analysis of liver transplant patients in Canada. Transplantation Proceedings (2006)
- et al. Survival analysis and risk factors for mortality in transplantation and staged surgery for hypoplastic left heart syndrome. Journal of the American College of Cardiology (2000)
- Prognosis of heart transplant candidates stabilized on medical therapy. Rev. Esp. Cardiol.
- A cost comparison of heart transplantation versus alternative operations for cardiomyopathy. Annual Thoracic Surgery
- Differences in clinical profile and survival after heart transplantation according to prior heart disease. Transplantation Proceedings
- Predictive data mining in clinical medicine: current issues and guidelines. International Journal of Medical Informatics
- Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics
- Modeling medical prognosis: survival analysis techniques. Journal of Biomedical Informatics
- Feature mining and predictive model construction from severe trauma patient's data. International Journal of Medical Informatics
- New directions for organ transplantation. Nature
- Kidney transplantation: a simulation model for examining demand and supply. Management Science
- Survival benefits of heart and lung transplantation. Annals of Surgery
- Worldwide thoracic organ transplantation: a report from the UNOS/ISHLT International Registry for Thoracic Organ Transplantation. Clinical Transplantation
- Organ transplantation policy evaluation