All the studies presented in this section use patient-level data to answer clinical questions, including predicting ICU readmission, predicting patient mortality (after ICU discharge and over a 5-year horizon), and making clinical predictions from data streams. Because of this use of streaming data, ICU data exemplifies the Velocity aspect of Big Data; it also exemplifies Variety, since clinical information can contain many different types of features. As with the molecular-level data, feature selection can help identify the most important features, which also supports quicker clinical decisions.
Prediction of ICU readmission and mortality rate
The research covered in this subsection focuses on predicting Intensive Care Unit (ICU) readmission, mortality after ICU discharge, and 5-year life expectancy, that is, how likely a patient is to survive over a 5-year period. This line of research is useful because it can help physicians know what to look for in their patients, determine which patients should have their ICU stay extended, and better identify which patients should receive particular treatments.
The study by Campbell et al. [12] focuses on ICU patients who were discharged and expected both to survive and not to return to the ICU too soon afterwards. Their research used 4376 ICU admissions (out of 6208 in total) from a database covering a single ICU from January 1995 to January 2005 (a 10-year period). Sixteen attributes were chosen for each ICU admission, among them several well-known scores used to predict ICU readmission and death after discharge: the Acute Physiology and Chronic Health Evaluation II (APACHE II) score [32], the Simplified Acute Physiology Score II (SAPS II) [33], and the updated Therapeutic Intervention Scoring System (TISS) score [34]. The APACHE II score, described by Knaus et al. [32], is a popular and well-tested score-based severity of disease classification system (SoDCS) that uses a simple set of 12 physiological variables for prediction. SAPS II, discussed further by Le Gall et al. [33], is another popular and well-tested SoDCS that uses 17 features: age, 12 physiology features, admission type, and 3 disease-type features. TISS is a third popular and well-tested SoDCS; it originally used 57 therapeutic intervention measurements and was later updated by adding some features and removing others, while its test results stayed the same. The updated TISS score is covered by Keene et al. [34].
Campbell et al. defined three prediction targets: death after ICU discharge (but before hospital discharge), readmission to the ICU within 48 hours of ICU discharge (again, before hospital discharge), and readmission to the ICU at any point after ICU discharge (and before hospital discharge). A patient could fall into more than one of these categories, but this is not an issue because three separate binary models were built. These models could help physicians determine which patients fall into each of the three groups and, most significantly, why they do; knowing why would let physicians decide which patients need their ICU stay extended. For feature selection, simple (univariate) logistic regression was used for all three models to determine which of the 16 attributes were associated with each outcome (P ≤ 0.2), and multiple logistic regression was then used to build the prediction models. All three models were tested with the Hosmer and Lemeshow goodness-of-fit (HLgof) test [35], which showed that their calibration was good and no further calibration was necessary. The HLgof test, discussed by Hosmer et al. [35], determines whether a logistic regression model is sufficiently calibrated.
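To make the calibration check concrete, the following is a minimal Python sketch of the Hosmer-Lemeshow statistic. It assumes the common recipe of sorting patients by predicted risk, splitting them into g equal-size groups, summing (O_g - E_g)^2 / (E_g (1 - E_g/n_g)) over the groups (O_g observed events, E_g expected events, n_g group size), and comparing the result with a chi-square distribution on g - 2 degrees of freedom. The function names are hypothetical, and the p-value helper only supports an even number of degrees of freedom, where the chi-square tail has a closed form; this is not Campbell et al.'s code.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive, even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true: Sequence[int], y_prob: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """Return (H statistic, p-value) using equal-size groups ordered by predicted risk."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    h = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)  # O_g: observed events in the group
        expected = sum(p for p, _ in chunk)  # E_g: expected events in the group
        variance = expected * (1.0 - expected / len(chunk))
        if variance > 0:
            h += (observed - expected) ** 2 / variance
    return h, chi2_sf_even_df(h, groups - 2)


if __name__ == "__main__":
    # Hypothetical predictions that match the observed event rate in each risk group.
    probs = [0.2] * 10 + [0.4] * 10 + [0.6] * 10 + [0.8] * 10
    outcomes = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4 + [1] * 8 + [0] * 2
    print(hosmer_lemeshow(outcomes, probs, groups=4))  # statistic near 0, p-value near 1
```

A large statistic (and hence a small p-value) signals poor calibration; a well-calibrated model, as Campbell et al. report for theirs, leaves the statistic small.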
The Area Under the ROC Curve (AUC) was used in this study to measure the classification and discrimination performance of the three models. According to Bradley [36], AUC is the best criterion for measuring the classification performance of a binary classifier such as logistic regression. AUC suits ICU admission data because positive-class instances make up a small percentage of the data (e.g., only 3.3% of the patients were readmitted to the ICU within 48 hours), so a model can reach high accuracy simply by predicting the majority class. To judge the quality of the three models (using the chosen set of features), their AUC results were compared with those obtained from the APACHE II score for each prediction. The first model, predicting death after ICU discharge (before hospital discharge), had an AUC of 0.74, compared with 0.69 for APACHE II. The second model, predicting readmission (before hospital discharge), obtained an AUC of 0.67, while APACHE II obtained 0.63. The third model, predicting readmission within 48 hours of ICU discharge, achieved an AUC of 0.62, while APACHE II achieved 0.59.
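The rank interpretation of AUC shows why it suits imbalanced ICU data. The Python sketch below (standard library only; `roc_auc` is a hypothetical helper) computes AUC as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, counting ties as half. It uses a pairwise loop, O(n_pos × n_neg), for clarity rather than the faster rank-based formula.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counting half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] + [0] * 29  # about 3% positives, mirroring the imbalance described above
    constant = [0.1] * 30    # a useless model that gives every patient the same score
    accuracy_all_negative = labels.count(0) / len(labels)
    print(accuracy_all_negative)      # about 0.967, despite learning nothing
    print(roc_auc(labels, constant))  # 0.5: no discrimination at all
```

With about 3% positives, always predicting "no readmission" is roughly 97% accurate yet has no discriminating power, which an AUC of 0.5 exposes.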
The three models with the chosen set of features achieved only minimal improvement over APACHE II alone for predicting ICU readmission and mortality among ICU patients. Campbell et al. note that this could be because the APACHE II score already uses physiological variables, and these are generally the most useful variables for predicting ICU readmission and mortality after ICU discharge. However, one non-physiological variable, increasing age, was shown to be highly correlated with both ICU readmission and death.
About 23.3% of the patients studied fall into the positive class of one of these three models; these patients arguably should not have been discharged from the ICU, and extending their ICU stay might have saved some of them. Note that Campbell et al. used only one feature selection method and only one prediction model. Had other methods been tested, models able to outperform APACHE II might have been found. Moreover, even though the data was gathered from a database containing 10 years' worth of ICU patients, it came from only one ICU; validation on more varied data (collected from multiple ICUs) would have been beneficial.
Ouanes et al. [14] aimed to predict whether a patient would die or return to the ICU within the first week after ICU discharge. The research was performed on 3462 patients (out of 5014) who were admitted to an ICU for at least 24 hours, gathered from 4 different ICUs in the Outcomerea database. As in Campbell et al.'s data, the positive class is very small relative to the overall population, at only about 3% (0.8% died and 2.1% were readmitted within 7 days). For feature selection, univariate analysis selected variables with P < 0.2 as candidates for the final model, and the Akaike Information Criterion (AIC) [37] was used to identify the best model. AIC estimates the relative quality of competing statistical models by balancing goodness of fit against the number of parameters. The resulting model then went through several validation and verification steps (clinical relevance, inter-correlation, and collinearity between variables), which reduced the original 41 variables to a final set of 6: age, SAPS II, the need for a central venous catheter, SIRS score during the ICU stay, SOFA score, and discharge at night.
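AIC is defined as AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood; among candidate models, the one with the lowest AIC is preferred. The Python sketch below compares an intercept-only logistic model with one that adds a single binary predictor. It relies on the fact that, for a single binary predictor, the maximum-likelihood logistic fit reproduces the observed event rate within each level of the predictor, so no iterative fitting is needed. The data and helper names are hypothetical, and the P < 0.2 univariate screening step is not shown.

```python
import math
from typing import Dict, List, Sequence


def log_likelihood(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Bernoulli log-likelihood of the observed outcomes under the predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y, p in zip(y_true, y_prob))


def aic(y_true: Sequence[int], y_prob: Sequence[float], n_params: int) -> float:
    """Akaike Information Criterion, 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood(y_true, y_prob)


def binary_predictor_fit(y_true: Sequence[int], x: Sequence[int]) -> List[float]:
    """Maximum-likelihood fitted probabilities of a logistic model with one binary predictor.

    With a single binary predictor the fit reproduces the observed event rate in each
    level of x, so no iterative optimization is needed for this illustration.
    """
    rates: Dict[int, float] = {}
    for level in set(x):
        ys = [y for y, xi in zip(y_true, x) if xi == level]
        rates[level] = sum(ys) / len(ys)
    return [rates[xi] for xi in x]


if __name__ == "__main__":
    # Hypothetical data: outcome (died or readmitted within 7 days) and a "discharged at night" flag.
    outcome = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
    night = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
    base_rate = sum(outcome) / len(outcome)
    aic_null = aic(outcome, [base_rate] * len(outcome), n_params=1)
    aic_night = aic(outcome, binary_predictor_fit(outcome, night), n_params=2)
    print(aic_null, aic_night)  # keep the extra variable only if it lowers AIC
```

The 2k term is what stops AIC from always favoring the larger model: an added variable must raise the log-likelihood by more than one unit to pay for its extra parameter.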
These variables were used to build the final prediction model with multivariate logistic regression, from which the authors developed their Minimizing ICU Readmission (MIR) score. The MIR score is a quantitative measure for deciding whether a patient should be discharged from an ICU. The predictive results of MIR were compared with those of both SAPS II and the Stability and Workload Index for Transfer (SWIFT) [38], another SoDCS that uses a small set of commonly available variables. MIR achieved good results, with good calibration according to the HLgof test and an AUC of 0.74 (reported with a 95% confidence interval), whereas SAPS II achieved an AUC of 0.64 and SWIFT an AUC of 0.61, showing that MIR performed considerably better. In this research, only one feature selection method and only one classification method were used. The MIR score might have yielded better results if more than one feature selection technique had been used to determine which of the 41 variables make the best model. Likewise, MIR might benefit from testing a number of other classifiers to find the final model with the highest predictive power. To fully test these results, the MIR score should also be compared with APACHE II or APACHE III scores, as done by Campbell et al. [12] and Failho et al. [13].
Failho et al. [13] also seek to predict which patients will be readmitted to the ICU but, following the findings of Campbell et al. [12], with the goal of using only a small number of physiological variables. Specifically, this research predicts whether patients will be readmitted within 3 days after discharge. The data was gathered from the MIMIC II database [39], selecting only patients over the age of 15 with an ICU stay of more than 24 hours, which gave a sample of 19,075 adults admitted to one of four ICUs. This sample was reduced to 3,034 patients by keeping only those for whom all the variables were available, and further preprocessing reduced it to 1,267 patients. Finally, only 1,028 of these 1,267 patients survived, giving a final dataset of 1,028 instances, 13 of which belong to the positive (readmitted) class.
Failho et al. considered two feature selection methods, both discussed in [40]: Sequential Forward Selection (SFS), a bottom-up approach, and Sequential Backward Elimination (SBE), a top-down approach. SFS starts from a single feature of the original pool and adds one more feature at each iteration, stopping when the current model is deemed the best possible. SBE takes the opposite approach, starting with the whole set of original features and removing one feature at each iteration until the model is deemed the best possible. Failho et al. compared both methods in combination with their classifier and, finding that SFS gave better results than SBE in terms of AUC, chose SFS as their feature selection method. SFS selected 6 of the original 24 physiological variables: mean heart rate, mean temperature, mean platelets, mean blood pressure, mean SpO2, and mean lactic acid.
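The bottom-up search can be sketched generically in Python. The `score` callback is a hypothetical stand-in for whatever criterion drives the search (for example, the AUC of the classifier trained on the candidate subset). This sketch starts from an empty set, adds the single best remaining feature each round (so the first round picks the best single feature), and stops as soon as no addition improves the score. SBE is the mirror image: start from the full set and remove one feature per round.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy bottom-up search: add the best remaining feature each round; stop when none helps."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        candidate, candidate_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the criterion
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


if __name__ == "__main__":
    # Hypothetical stand-in for "cross-validated AUC of the classifier on this subset".
    gains = {"heart_rate": 0.08, "temperature": 0.05, "platelets": 0.03}

    def toy_auc(subset: FrozenSet[str]) -> float:
        return 0.5 + sum(gains.get(f, 0.0) for f in subset) - 0.01 * len(subset)

    print(sequential_forward_selection(["age", "heart_rate", "platelets", "temperature"], toy_auc))
```

Because each round only considers one-feature extensions, SFS evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing feature combinations that are only useful together.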
The classification method used in this study was fuzzy modeling with sequential forward selection [40, 41], specifically Takagi-Sugeno (TS) fuzzy modeling. Fuzzy models use linguistic interpretations to formulate rules and logical connectives that link features to the final prediction. Because of this linguistic interpretability, fuzzy modeling is a good choice for this line of research, where clinical data needs to be interpreted as a physician would in a clinical setting. The only concern is that the rule-based nature of these models could introduce too much rigidity, leaving no room for physician discernment in the final model.
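To illustrate how a Takagi-Sugeno model turns linguistic rules into a numeric prediction, here is a minimal first-order TS inference sketch in Python. It assumes Gaussian membership functions, the product as the fuzzy AND, and linear rule consequents; these are common choices but not necessarily the configuration Failho et al. used. The rule parameters in the usage example are hypothetical, whereas in practice they would be learned from data.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def gaussian(x: float, center: float, width: float) -> float:
    """Degree (0..1) to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


@dataclass
class Rule:
    """IF x1 is A1 AND x2 is A2 ... THEN y = sum(a_j * x_j) + bias."""
    centers: Sequence[float]       # one fuzzy-set center per input
    widths: Sequence[float]        # one fuzzy-set width per input
    coefficients: Sequence[float]  # linear consequent coefficients
    bias: float


def ts_predict(rules: List[Rule], x: Sequence[float]) -> float:
    """First-order TS inference: firing-strength-weighted average of rule outputs (0.0 if no rule fires)."""
    numerator, denominator = 0.0, 0.0
    for rule in rules:
        strength = 1.0
        for xj, c, w in zip(x, rule.centers, rule.widths):
            strength *= gaussian(xj, c, w)  # product acts as the fuzzy AND
        output = sum(a * xj for a, xj in zip(rule.coefficients, x)) + rule.bias
        numerator += strength * output
        denominator += strength
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical rules over (mean heart rate, mean temperature) producing a risk score.
    rules = [
        Rule(centers=[75.0, 36.8], widths=[10.0, 0.5], coefficients=[0.0, 0.0], bias=0.1),   # "normal" -> low risk
        Rule(centers=[120.0, 38.5], widths=[15.0, 0.7], coefficients=[0.0, 0.0], bias=0.8),  # "high" -> high risk
    ]
    print(ts_predict(rules, [110.0, 38.2]))
```

Each rule reads as a clinical statement ("IF heart rate is high AND temperature is high THEN risk is high"), which is the interpretability argument made above; the weighted average lets several partially matching rules contribute at once.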
Failho et al. compared the results of their set of 6 physiological variables (selected by SFS) combined with TS against those of the variable sets used by the APACHE II and APACHE III [42] scores (also combined with TS). APACHE III is another SoDCS, created to address some of the problems of APACHE II, as discussed by Knaus et al. [42]. The results show a significant advantage for Failho et al.'s set, which achieved an AUC of 0.72 ± 0.04, while APACHE II and APACHE III achieved AUCs of 0.62 ± 0.03 and 0.64 ± 0.04, respectively. The SFS set also scored better in specificity, sensitivity, and accuracy. This shows that good prediction performance can be reached with a small set of physiological variables. Despite these promising results, more could have been done. First, more variables, physiological or not, could have been added to the original pool to see whether the results improve. It might also have been beneficial to test whether other feature selection techniques (tree-based or otherwise) improve upon SFS. Finally, although Failho et al. mention that fuzzy modeling has been shown to perform comparably to other classification methods on medical data, this research could still benefit from testing other classifiers to confirm whether TS yields the best results.
The research objective of Mathias et al. [15] follows a slightly different line: instead of predicting ICU readmission, they aim to predict 5-year mortality by constructing an Ensemble Index (EI). They used a group of 7463 patients taken from an Electronic Health Record (EHR), with 980 attributes per patient. Patients had to meet two requirements to be included in the study: be over the age of 50 (since increasing age is a strong predictor in this line of study) and have had at least one hospital visit in 2003. Because of the large number of attributes in the original set of variables, the feature selection method chosen to build the EI was Correlation-based Feature Selection (CFS) [43] with greedy stepwise search. CFS looks for subsets of variables that are strongly correlated with the predicted outcome but weakly correlated with each other. CFS with greedy stepwise search found a subset of 52 features, which was reduced further by manual reduction followed by another round of CFS, bringing the subset down to 23. One more variable (gender) was then added, giving the final EI subset of 24 variables. The top 6 attributes in the EI (ranked by information gain) are age, comorbidity count, amount of hospitalization in the year prior to admission, high blood urea nitrogen levels, low calcium, and mean albumin.
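CFS scores a subset S of k features with the merit Merit(S) = k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean feature-outcome correlation and r_ff the mean feature-feature correlation, so redundant features are penalized. The Python sketch below pairs this merit with a greedy forward search. It uses absolute Pearson correlation as the correlation measure, which is an assumption (CFS is often used with other measures, such as symmetrical uncertainty for discrete attributes), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cfs_merit(subset: List[str], data: Dict[str, Sequence[float]], target: Sequence[float]) -> float:
    """Merit(S) = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(data[f], target)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(data[f], data[g])) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(data: Dict[str, Sequence[float]], target: Sequence[float]) -> Tuple[List[str], float]:
    """Forward greedy search over subsets, stopping when the merit no longer improves."""
    selected: List[str] = []
    remaining = list(data)
    best = 0.0
    while remaining:
        feature, merit = max(
            ((f, cfs_merit(selected + [f], data, target)) for f in remaining),
            key=lambda pair: pair[1],
        )
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Hypothetical toy data: "count_copy" duplicates "count", so CFS should not keep both.
    outcome = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    features = {"count": outcome[:], "count_copy": outcome[:], "noise": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]}
    print(greedy_cfs(features, outcome))
```

Adding a perfect duplicate leaves the merit unchanged (the denominator grows as fast as the numerator), so the search stops, which is how CFS keeps a subset small without an explicit size limit.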
Rotation Forest Ensembling (RFE) [44] with Alternating Decision Trees (ADT) [45] was used to create the predictive model, which was evaluated with tenfold cross-validation. Mathias et al. compared this technique with many other methods and chose it because it performed best. RFE is an ensemble of decision trees that creates diversity by randomly splitting the features into subsets for each tree and applying Principal Component Analysis (PCA) to each subset before that tree is built. An ADT is a decision tree that, instead of placing a single class prediction in its leaf nodes, attaches a real-valued contribution toward class membership to prediction nodes throughout the tree; the values along all of the paths an instance follows are summed to predict its class. RFE and ADT make a good combination, as RFE brings accuracy and diversity to the model, and ADT allows more information to be gained about an instance as it moves through the tree.
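The way an ADT accumulates evidence can be sketched in a few lines of Python. The sketch follows the general ADT design (prediction nodes holding real values, splitter nodes holding tests): an instance is routed through every splitter beneath each prediction node it reaches, the values of all reached prediction nodes are summed, and the sign of the sum gives the class. The tree, thresholds, and feature names are hypothetical; learning the tree (by boosting) is not shown, and the rotation-forest wrapper is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PredictionNode:
    value: float                                               # contribution added to the score
    splitters: List["Splitter"] = field(default_factory=list)  # every splitter here is evaluated


@dataclass
class Splitter:
    condition: Callable[[Dict[str, float]], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adt_score(node: PredictionNode, instance: Dict[str, float]) -> float:
    """Sum the values of all prediction nodes the instance reaches."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(instance) else splitter.if_false
        total += adt_score(branch, instance)
    return total


def adt_classify(root: PredictionNode, instance: Dict[str, float]) -> int:
    """Predicted class is the sign of the summed score (+1 or -1)."""
    return 1 if adt_score(root, instance) > 0 else -1


if __name__ == "__main__":
    # Hypothetical tree: +1 means "dies within 5 years", -1 means "survives".
    root = PredictionNode(-0.5, [
        Splitter(lambda r: r["age"] > 80, PredictionNode(0.7), PredictionNode(-0.2)),
        Splitter(lambda r: r["albumin"] < 3.0, PredictionNode(0.4), PredictionNode(-0.1)),
    ])
    print(adt_classify(root, {"age": 85, "albumin": 2.8}))  # -0.5 + 0.7 + 0.4 > 0, so 1
    print(adt_classify(root, {"age": 60, "albumin": 4.0}))  # -0.5 - 0.2 - 0.1 < 0, so -1
```

Because both splitters under the root are evaluated, each patient collects evidence from age and albumin at once, and the magnitude of the summed score can serve as a confidence measure alongside its sign.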
The Ensemble Index achieved quite good results, scoring better in recall, precision, c-statistic, and other measures than both the modified Walter Life Expectancy Index (WLEI) [46] and the modified Charlson Comorbidity Index (CCI) [47], two well-tested and better-known life expectancy indices used for prediction in similar research. Despite these good results, only one feature selection process was used; with 980 attributes in the original feature set, additional techniques might have produced a better subset. Again, the research could benefit from more varied data, as all of the data was gathered from one source.
The studies shown in this section could improve clinical discharge procedures by helping determine which patients should be released from the ICU and which should receive a particular line of treatment. Their goal is to find the attributes most correlated with patients returning to the ICU early or not surviving after discharge. Across the studies by Campbell et al. [12], Failho et al. [13], and Ouanes et al. [14] on predicting ICU readmission and death after discharge, the most telling variables are age, APACHE scores, various physiological variables (e.g., heart rate), the degree of organ dysfunction, and a few others. If physicians can better predict which patients will return to the ICU or not survive, they will know which patients to keep in the ICU longer and give more focused care, potentially saving lives or at least prolonging them. According to Ouanes et al., the readmission and death rates drop drastically between day one and day seven after discharge, meaning that keeping a patient just a little longer could be beneficial, although it could also take an ICU bed away from another patient who needs intensive care. For these reasons, studies that try to determine why patients return early or die soon after ICU discharge are important, as they can potentially save lives. The target of this research is, of course, patients whose deaths are preventable, since not all deaths are.
The variables that proved most telling for 5-year survival (Mathias et al. [15]) were similar, with age and physiological variables at the top of the list. One benefit of this research is that looking further into a patient's future can help physicians better advise patients on treatment options. For example, a physician could advise a patient on whether to undergo a particularly harsh treatment when the patient may not live long enough to reap its benefits. This research is especially important for older patients, since the older a patient is, the less likely a harsh treatment is to be beneficial.
Real-time predictions using data streams
The studies covered in this section also use patient-level data and again aim to answer clinical questions. Instead of predicting a patient's future condition (i.e., ICU readmission or 5-year survival), the research here uses data streams to predict patients' conditions in real time. Data streams are never-ending torrents of data that require continuous analysis, making real-time results possible (something static data sets cannot offer). This section samples two categories of data stream studies: making prognosis and diagnosis predictions for patients, and detecting whether a newborn is experiencing a cardiorespiratory spell, both in real time. The researchers here aim to develop methods that use these constant streams of data to make predictions continuously while maintaining satisfactory accuracy and precision.
Zhang et al. [17] develop a clinical support system that uses data stream mining to analyze patient data and make real-time prognoses and diagnoses. Handling a continuous stream of data requires an algorithm capable of high-throughput processing, which led the authors to choose the Very Fast Decision Tree (VFDT). VFDT is highly efficient, as it was built to handle thousands of instances per second on basic hardware (as discussed by Domingos et al. [48]). The authors argue that VFDT has several advantages over other methods (e.g., rule-based systems, neural networks, other decision trees, and Bayesian networks): it can make both diagnostic and prognostic predictions, it can handle a changing, non-static dataset, and it does not rely on rigid rules (into which experts can find it difficult to put their knowledge).
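VFDT's speed comes from deciding when to split a leaf after seeing only enough examples to satisfy the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split heuristic, delta the allowed error probability, and n the number of examples seen at the leaf. The Python sketch below shows that split test, assuming information gain as the heuristic (so R = log2 of the number of classes) and a tie-breaking threshold; the default values for delta and the threshold are illustrative assumptions, not settings from Zhang et al.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 ln(1/delta) / (2n)): with prob. 1 - delta the true mean lies within epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int, n_classes: int = 2,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf that has seen n examples."""
    value_range = math.log2(n_classes)  # information gain lies in [0, log2(classes)]
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is clearly better, or if eps is so small that it is a tie.
    return best_gain - second_gain > eps or eps < tie_threshold


if __name__ == "__main__":
    # The same 0.05 gap in information gain becomes decisive once enough examples arrive.
    for n in (100, 1000, 10000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4), should_split(0.30, 0.25, n))
```

Since epsilon shrinks as 1/sqrt(n), each leaf needs only as many examples as the observed gap between attributes demands, which lets the tree grow from a stream without storing it.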
VFDT alone, however, can only give a patient's current status, not future predictions; therefore, Zhang et al. modified it. In the modified VFDT, one or more pointers were added to each leaf node, where each leaf corresponds to a distinct medical condition and each pointer corresponds to one medical record of a previous patient. To connect each stored medical record to its pointer, the authors created a mapping table, so that when the VFDT reaches a leaf, the table links that leaf to its pointers (the corresponding medical records). These records are then used, through Natural Language Processing (sentence and semantic similarity [49, 50]), to make a prediction about the patient's future, giving physicians the ability to better treat and advise patients based on previous similar situations. The VFDT and the mapping table are updated as necessary (i.e., when a physician makes a new diagnosis or when a new medical record is added to the map).
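The leaf-to-record mapping can be pictured as a dictionary from leaf identifiers to lists of record identifiers. The Python sketch below is hypothetical throughout: the class and method names are invented, and the word-overlap (Jaccard) similarity is a deliberately simple stand-in for the sentence and semantic similarity measures [49, 50] used by Zhang et al.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class LeafRecordMap:
    """Hypothetical mapping table from decision-tree leaves to stored medical records."""

    def __init__(self) -> None:
        self.pointers: DefaultDict[str, List[str]] = defaultdict(list)  # leaf id -> record ids
        self.records: Dict[str, str] = {}                               # record id -> record text

    def add_record(self, leaf_id: str, record_id: str, text: str) -> None:
        """Store a newly diagnosed record and point the matching leaf at it."""
        self.records[record_id] = text
        self.pointers[leaf_id].append(record_id)

    def similar_cases(self, leaf_id: str, notes: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Rank records reachable from a leaf by Jaccard word overlap with the new patient's notes."""
        query = set(notes.lower().split())
        scored = []
        for record_id in self.pointers.get(leaf_id, []):
            words = set(self.records[record_id].lower().split())
            union = query | words
            scored.append((record_id, len(query & words) / len(union) if union else 0.0))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    table = LeafRecordMap()
    table.add_record("leaf-1", "r1", "fever and hypotension after surgery, treated for sepsis")
    table.add_record("leaf-1", "r2", "stable recovery no fever")
    print(table.similar_cases("leaf-1", "fever hypotension"))
```

When the tree routes a new patient to a leaf, `similar_cases` returns the stored records reachable from that leaf, most similar first; `add_record` plays the role of the table update described above.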
Zhang et al. compare their method with a similar data stream mining technique from IBM, covered by Sun et al. [51]. To test their method, IBM used 1500 ICU patients from the MIMIC II database, with various physiological waveforms (taken from an assortment of medical devices) and clinical data for each patient. IBM's method consists of three main parts: (1) physiological stream processing, (2) offline analysis, and (3) online analysis. For stream processing, a correlation-based technique was chosen, as such techniques capture correlations among sensors well and handle missing data efficiently (estimating missing values with linear regression models built from other sensors over the same period). Better results might have been attained with techniques other than linear regression for estimating missing values. The authors also tested a window-based technique, which estimates a sensor's missing values by averaging that sensor's readings over a small window of time and imputing that average. The correlation-based technique gave better results and was therefore used in the final method.
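The two imputation strategies can be contrasted with a short Python sketch. Both functions are hypothetical simplifications: the window-based version fills a gap with the mean of the same sensor's observed values within a few samples of it, while the correlation-based version fits an ordinary least-squares line y = a + b * x between the gapped sensor and one co-recorded sensor and predicts the missing values from it. IBM's actual models may use several sensors and a different fitting procedure.

```python
from typing import List, Optional, Sequence


def window_impute(series: Sequence[Optional[float]], half_window: int = 2) -> List[Optional[float]]:
    """Fill each gap with the mean of the same sensor's observed values within +/- half_window samples."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            nearby = [v for v in series[max(0, i - half_window):i + half_window + 1] if v is not None]
            filled[i] = sum(nearby) / len(nearby) if nearby else None
    return filled


def regression_impute(target: Sequence[Optional[float]], other: Sequence[float]) -> List[Optional[float]]:
    """Fill gaps in `target` from a co-recorded sensor using a least-squares line y = a + b * x."""
    pairs = [(x, y) for x, y in zip(other, target) if y is not None]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return [y if y is not None else intercept + slope * x for x, y in zip(other, target)]


if __name__ == "__main__":
    # Hypothetical readings: sensor B tracks sensor A, and B drops out for two samples.
    sensor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    sensor_b = [3.0, 5.0, None, None, 11.0]
    print(window_impute(sensor_b, half_window=1))
    print(regression_impute(sensor_b, sensor_a))
```

When a gap spans several samples, the window mean can only echo the nearest readings, whereas the regression follows a trend carried by the other sensor; this is one plausible reason a correlation-based approach can do better.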
During offline analysis, Sun et al. use a method they created called Locally Supervised Metric Learning (LSML), which learns an adjustable distance metric using knowledge from the current domain (in this study, clinical knowledge). The final step, online analysis, takes place when a new patient is ready for a prognosis: the system finds a set of similar cases through temporal alignment and then applies a regression model to account for the uniqueness of each patient. Sun et al. compared LSML with PCA and Linear Discriminant Analysis (LDA) in terms of both precision and accuracy, and LSML scored considerably better on both.
According to Zhang et al., their system uses fewer computing resources than IBM's because no offline analysis is needed. This comes down to the relative complexity of LSML and VFDT: LSML is quite complex and does not allow the real-time operation offered by VFDT, which is the most complex computation in Zhang et al.'s method. However, Zhang et al.'s method has not been tested on real-world data, and it must be before its usefulness can be determined and it can be legitimately compared with IBM's method. Its real-time nature does allow quick prognoses, and if its results prove similar to those of IBM's method, predictions could be made not only quickly but also with good accuracy and precision.
Thommandram et al. [18] use data streams with a different goal: detecting, and eventually classifying, neonatal cardiorespiratory spells (a condition whose management benefits greatly from real-time detection and classification). A cardiorespiratory spell is defined as some combination of a pause in breathing, a drop in blood oxygen saturation, and a decrease in heart rate. Their system, Artemis, is designed to use a steady stream of physiological data from the newborn patient to detect a cardiorespiratory spell and determine its type, all in real time. Real-time operation could potentially save these infants' lives by giving physicians more time to fix what is wrong, since less human diagnosis is needed. The stream processing part of Artemis is handled by IBM's middleware system InfoSphere Streams [16], which is built to handle multiple high-throughput data streams. Middleware is software that connects programs that are otherwise not connected. Three data streams (from three different sensors) are used, corresponding to the three conditions of a cardiorespiratory spell: a respiratory impedance wave (pauses in breathing), blood oxygen saturation (drops in saturation), and heart rate (decreases in heart rate).
The authors wanted a system that improves on the machines currently in use, which apply the absolute change method (i.e., if a machine detects a heart rate below or above a cutoff, it notifies physicians). Artemis instead uses a relative change method: rather than a fixed cutoff, a sliding baseline is continuously updated in real time as the patient's "normal" readings change over time. Because it adapts to each patient's unique readings, the sliding baseline can give more reliable readings and more accurate spell detection. For classification, the readings from the three streams are analyzed by a hierarchical rule-based temporal model to determine which of the many types of cardiorespiratory spell the infant is experiencing.
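The difference between the two alerting strategies can be sketched in Python. The window length, the 20% relative drop, the use of the window mean as the baseline, and the choice not to fold alarming readings into the baseline are all assumptions made for illustration; they are not Thommandram et al.'s exact baseline rule.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def absolute_alerts(readings: Iterable[float], cutoff: float) -> Iterator[bool]:
    """Conventional monitor logic: alert whenever a reading falls below a fixed cutoff."""
    for value in readings:
        yield value < cutoff


def relative_alerts(readings: Iterable[float], window: int = 30, drop_fraction: float = 0.2) -> Iterator[bool]:
    """Sliding-baseline logic: alert when a reading falls drop_fraction below the patient's recent mean."""
    history: Deque[float] = deque(maxlen=window)
    for value in readings:
        baseline = sum(history) / len(history) if history else value
        alert = value < baseline * (1.0 - drop_fraction)
        if not alert:
            history.append(value)  # only non-alarming readings update the patient's "normal"
        yield alert


if __name__ == "__main__":
    # Hypothetical heart-rate stream (beats per minute) for an infant whose usual rate is about 160.
    stream = [160.0] * 10 + [125.0] + [160.0] * 3
    print(any(absolute_alerts(stream, cutoff=100.0)))  # False: the fixed cutoff misses the drop
    print(list(relative_alerts(stream, window=10)))    # flags the 125 bpm reading only
```

Conversely, for a hypothetical stream whose usual level sits just below the fixed cutoff, the absolute method alarms on every reading while the sliding baseline stays quiet until the signal drops relative to that patient's own recent values.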
The detection part of Thommandram et al.'s system was tested on one patient in a Neonatal ICU over a 24-hour period, where the sliding baseline method issued alerts as often as the cutoff method for both heart rate and SpO2 readings. For this one patient, the sliding baseline method achieved clinically significant results for heart rate detection, with a specificity of 98.9% and a sensitivity of 100%. The classification method was not tested; the authors leave it for future work. The detection part will need further testing on many more patients before it can be considered clinically acceptable, and when the classification part is tested, it would be beneficial to compare other methods with the hierarchical rule-based temporal model.
The data stream mining presented here has shown potential to benefit clinical practice, as efficient algorithms and methods (not previously used in the clinic) allow it to operate in real time. By using data stream mining for diagnosis, prognosis, and spell detection, physicians could make faster and more accurate decisions and start solving problems without spending as much time developing a plan. This line of research is fairly young, and more studies that develop and test new methods for mining medical data streams in real time will be needed.