1 Introduction
2 Method
2.1 Overview
2.2 Information retrieval strategy
2.2.1 Search sources
2.2.2 Search terms
2.2.3 Study eligibility criteria
Inclusion | Exclusion |
---|---|
• Studies that address application of generative AI in Precision Medicine • Studies published from 2013 onwards • Original research articles | • Grey literature (including materials like magazines, conference abstracts, etc.) • Non-peer-reviewed sources (including Wikipedia, posters, reviews, and survey studies) • Informal or opinion-based publications (including editorials, commentaries) • Studies solely on generative model implementation techniques • Papers investigating DGMs or FMs outside of the personalized medicine domain • Research limited to precision medicine without broader DGMs or FMs applications |
2.2.4 Study selection and screening process
2.2.5 Data items and data extraction process
2.2.6 Data synthesis
2.2.7 Quality assessment
2.2.8 Review of literature
3 Results
3.1 Search and selection
3.2 Characteristics of the included studies
Features | Values |
---|---|
Years of publications, n (%) | |
2019 | 1 (3) |
2020 | 4 (14) |
2021 | 4 (14) |
2022 | 3 (10) |
2023 | 17 (59) |
Country of publication, n (%) | |
USA | 8 (28) |
China | 7 (24) |
Germany | 3 (10) |
Others | 11 (36) |
Type of publications (n) | Articles (29) |
Application in the articles, n (%) | |
Clinical informatics | 9 (31) |
Medical imaging | 8 (28) |
Bioinformatics | 9 (31) |
Foundation models in precision medicine | 3 (10) |
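
The n (%) entries in the table above can be recomputed from the raw counts. The following is a minimal sketch, assuming the percentages are whole-number roundings of each count divided by the 29 included articles; the helper name `with_percent` and the dictionary names are illustrative and do not come from the review.

```python
# Counts copied from the characteristics table; the total of 29 is the
# number of included articles reported in the same table.
counts_by_year = {"2019": 1, "2020": 4, "2021": 4, "2022": 3, "2023": 17}
counts_by_country = {"USA": 8, "China": 7, "Germany": 3, "others": 11}


def with_percent(counts):
    """Return {label: (n, percent)} with percent rounded to a whole number.

    Assumption: percentages are taken over the sum of the counts passed in.
    """
    total = sum(counts.values())
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


if __name__ == "__main__":
    for name, counts in [("year", counts_by_year), ("country", counts_by_country)]:
        print(name, with_percent(counts))
```

Under this assumption the helper gives 38% for "others" (11 of 29), whereas the table prints 36%; the difference may reflect a rounding or transcription choice in the source, so the printed value is left unchanged here.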
3.3 Findings of the included studies
# | Ref | Year | Applied Generative Model | Underlying Generative Model | Focused Application |
---|---|---|---|---|---|
1 | (Rampášek et al. 2019) | 2019 | Dr. VAE | VAE | Personalized drug response prediction |
2 | (Ge et al. 2020) | 2020 | MCGAN | GANITE | Personalized treatment effect (ITE) |
3 | (Xue et al. 2020) | 2020 | VAE, S-VQ-VAE | VAE | Representation of cellular states from gene expression data |
4 | (Elazab et al. 2020) | 2020 | GP-GAN | GAN | Growth prediction of gliomas (brain tumors) |
5 | (Yoon et al. 2020) | 2020 | ADS-GAN | GAN | Anonymization through data synthesis |
6 | (Barbiero et al. 2021) | 2021 | WGAN | GAN | Production of realistic gene expression samples |
7 | (Sui et al. 2021) | 2021 | CVAE-GAN | VAE & GAN | Analyze the correlation between lung cancer imaging and gene expression data |
8 | (Tang et al. 2021) | 2021 | GANDA | GAN | Generation of intratumoral nanoparticle (NP) distribution |
9 | (Piacentino et al. 2021) | 2021 | GAN based ECG | GAN | Anonymize private healthcare data |
10 | (Ahmed et al. 2022) | 2022 | omicsGAN | GAN | Improved disease phenotype prediction |
11 | (Rafael-Palou et al. 2022) | 2022 | U-HPNet | U-Net | Predicting the progression of lung nodules |
12 | (Ahuja et al. 2022) | 2022 | MixEHR | LDA | Large-scale automatic phenotyping using electronic health record (EHR) data |
13 | (Jahanyar et al. 2023) | 2023 | MS-ASGAN | GAN | Evaluating tabular biomedical data generated by GANs |
14 | | 2023 | CSAM-GAN | GAN | Predicting prognostic outcomes in cancer using multimodal data |
15 | (Wang et al. 2023) | 2023 | MOICVAE | VAE | Predict cancer drug response |
16 | (Yamanaka et al. 2023) | 2023 | DRAGONET | VAE | Generate new drug candidate molecules |
17 | (Strack et al. 2023) | 2023 | Wasserstein-GAN | GAN | Monitor brain tumor changes |
18 | (Gao et al. 2023) | 2023 | BrainStatTrans-GAN | GAN | Generate corresponding healthy images of patients, which are then used to decode individualized brain atrophy |
19 | (Moon et al. 2023) | 2023 | AttentionGAN | GAN | Predict short-term anatomical treatment outcomes for different anti-vascular endothelial growth factor agents |
20 | | 2023 | GANCMLAE | GAN | Precisely detect individual brain atrophy patterns in Alzheimer's disease (AD) and mild cognitive impairment (MCI) |
21 | (El Emam 2023) | 2023 | Conditional GAN | GAN | Synthetic patient cohorts that accurately |
22 | (Bernardini et al. 2023) | 2023 | CCGAN | GAN | Clinical data imputation |
23 | (Li et al. 2023) | 2023 | GAN-boosted SSL | GAN | Improve prediction models trained on electronic health records (EHRs) |
24 | (Hsu and Lin 2023) | 2023 | SCAN | VAE | Predicting cancer patient prognosis using small medical datasets |
25 | (Zhou et al. 2023) | 2023 | SCGAN | CGAN | Counterfactual explanations in breast cancer prediction |
26 | (Zhu et al. 2023) | 2023 | GluGAN | GAN | Personalized glucose monitoring |
27 | (Benary et al. 2023) | 2023 | ChatGPT, Galactica, Perplexity, and BioMedLM | LLMs | Supporting tool in precision oncology |
28 | (Huang et al. 2023) | 2023 | ChatGPT-3.5 and ChatGPT-4 | LLMs | Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases |
29 | (Toufiq et al. 2023) | 2023 | GPT-3.5, GPT-4, Gemini and Claude | LLMs | Candidate gene prioritization and selection |
3.3.1 Clinical informatics
3.3.2 Medical imaging informatics
3.3.3 Bioinformatics: integrated omics and biomarkers profiling
3.3.4 Utilization of foundation models (FMs)
# | Ref | Dataset | Type | Evaluation Measure | Performance |
---|---|---|---|---|---|
1 | (Rampášek et al. 2019) | 860 cell lines and 481 drug compounds from CTRPv2 (Rees et al., 2015); 77 cell lines and 811 drug compounds from CMap-L1000v1, NIH LINCS Consortium (Subramanian et al., 2017) | Pharmacogenomics data | Random Forest (RForest) | RForest achieved 23/26 |
2 | (Elazab et al. 2020) | BRATS 2014 (9 High-Grade Glioma subjects) and Guangzhou General Hospital (9 Low-Grade Glioma subjects); BRATS Dataset (https://www.virtualskeleton.ch/BRATS/Start2014) | MRI images | Jaccard index and Dice coefficient | Jaccard index = 78.97% and Dice coefficient = 88.26% |
3 | (Ge et al. 2020) | 256 newly diagnosed AML patients, M. D. Anderson Cancer Center AML-RPPA Database (http://bioinformatics.mdanderson.org/Supplements/Kornblau-AML-RPPA/aml-rppa.xls) and Related Study (https://pubmed.ncbi.nlm.nih.gov/18840713/) | Biological processes (apoptosis, cell-cycle, signal transduction pathways) | Mean Squared Error (MSE), Standard Deviation (STD), Accuracy (ACC) | MSE = 0.062, STD = 0.235 and ACC = 0.938 |
4 | (Xue et al. 2020) | 116,782 expression profiles from 9 cell lines (GP) and 85,183 profiles from 7 cell lines (SMP), LINCS Project (Keenan et al., 2018; Subramanian et al., 2017) | Gene expression data | | The models are effective in uncovering connections between cellular perturbagens and identifying the genes affected by each drug |
5 | (Yoon et al. 2020) | MAGGIC (30,389 patients) (Pocock et al., 2013); UNOS Transplant datasets: Heart-Transplant (56,822 patients), Lung-Transplant (26,854 patients), Heart-Wait-list (23,706 patients), UNOS Data (https://www.unos.org/data/) | Binary and continuous data | AUROC | ADS-GAN surpassed PATE-GAN and DP-GAN in downstream prediction performance while maintaining a 0.1-identifiability constraint, and showed dependability in joint distributions |
6 | (Barbiero et al. 2021) | Genotype-Tissue Expression (GTEx) project (15,201 RNA-Seq samples from 49 tissues of 838 donors) (Aguet et al., 2019) | Gene expression measurements | | Utilized GNN and GAN for monitoring and forecasting clinically relevant endpoints, offering a comprehensive view of patient health states |
7 | (Piacentino et al. 2021) | Fingerprints database (ChaLearn: http://chalearnlap.cvc.uab.es/dataset/32/description/); Iris database (IIT Delhi); Thyroid database (KEEL: https://sci2s.ugr.es/keel/dataset.php?cod=67); Cardiogram database (Physionet: https://physionet.org/physiobank/database/ptbdb/) | Various (including ECGs, iris images, thyroid data) | | Demonstrated the efficacy of GANs in anonymizing data, particularly ECGs |
8 | (Sui et al. 2021) | 211 subjects from an NSCLC cohort (Clark et al., 2013) | Radiogenomic image data | | Implemented a deep learning framework for radiogenomic research |
9 | (Tang et al. 2021) | 27,775 patches from whole-slide images of T1 breast cancer sections | Whole-slide images | MSE, ICC | MSE = 1.871; ICC for QDs extravasation distance = 0.95, sub-area distribution = 0.99 |
10 | (Ahmed et al. 2022) | The Cancer Genome Atlas (TCGA) for BRCA, LUAD, and OV (https://cancergenome.nih.gov/) | RNA-seq, mRNA and miRNA expressions | AUC | AUC for synthetic mRNA and miRNA expressions in various cancer types (0.708 to 0.949) |
11 | (Ahuja et al. 2022) | | Clinical notes, codes, lab tests, prescriptions | | MixEHR-G outperformed in phenotype label annotation and phenotype prevalence estimation |
12 | (Rafael-Palou et al. 2022) | 160 pulmonary tumors with two CT scan images | CT images | MAE, DSC, AUC_tumor_growth | MAE = 1.74 mm, DSC = 78%, AUC_tumor_growth = 84% |
13 | (Benary et al. 2023) | 10 fictional cancer patients (4 with lung cancer, 6 with other types) | Molecular alterations | Precision, F1 score, and Recall | Precision = 0.29, Recall = 0.29 and F1-score = 0.29 |
14 | (Bernardini et al. 2023) | Multi-Diabetic Centers (MDC) dataset (120 K diabetic patients), MIMIC-III dataset (Purushotham et al. 2018) | Demographics (ID, gender, birth year, diabetes diagnosis date), pathological (ID, ICD-9 codes, diagnosis date), lab tests (ID, codes, values, prescription date) | Imputation accuracy, predictive performance for diabetic retinopathy detection | ccGAN significantly outperformed leading methods in imputation (approx. 19.79% improvement) and predictive performance (up to 1.60% advantage). Demonstrated robustness across varying levels of missing data (up to 1.61% advantage under high missingness rates) |
15 | (El Emam 2023) | 2043 MDS patients from GenoMed4All cohort (Bersanelli et al., 2021); 2,957 MDS from IWG-PM (Bernard et al., 2022); 1,002 AML from GenoMed4All (Bersanelli et al., 2021) | Hematologic malignancies (genomic data, patient characteristics, disease subtypes, risk classifications, etc.) | Clinical Synthetic Fidelity (CSF), Genomic Synthetic Fidelity (GSF) | CSF = 93%, GSF = 90% |
16 | (Gao et al. 2023) | ADNI (https://adni.loni.usc.edu), AIBL (https://ida.loni.usc.edu/home/projectPage.jsp?project=AIBL), and OASIS (https://www.oasis-brains.org/). Total subjects: 1,739 | T1w-MRI image data | | The method outperforms current techniques by modeling personalized brain atrophy, enhancing disease diagnosis and interpretation |
17 | (Hsu and Lin 2023) | METABRIC (Curtis et al., 2012), Gene Expression Omnibus (GEO) for breast and NSCLC patients (GEO Repository). (https://www.ncbi.nlm.nih.gov/geo) | Gene expression | AUROC | SCAN significantly outperformed existing benchmarks, including previous bimodal neural network classifiers, with AUROC scores of 81.73% for breast cancer and 80.46% for NSCLC (compared to 77.71% and 78.67% respectively). Independent validation showed SCAN's AUROC scores of 74.74% for breast cancer and 72.80% for NSCLC, outperforming the bimodal classifiers (64.13% and 67.07% respectively) |
18 | (Huang et al. 2023) | The 38th ACR Radiation Oncology In-Training Exam (TXIT) with 300 questions (ACR TXIT Exam—https://www.acr.org/-/media/ACR/Files/DXIT-TXIT/ACR-2021-TXIT-am— ) and 2022 Red Journal Gray Zone cases (Red Journal Gray Zone—https://www.redjournal.org/content/grayzone) | Medical questions & complex cases | Exam Score, Case Evaluation Performance | ChatGPT-3.5 scored 62.05%, and ChatGPT-4 scored 78.77% in the TXIT Exam |
19 | (Jahanyar et al. 2023) | Gene Expression Omnibus (GEO) archive (https://www.ncbi.nlm.nih.gov/) | Gene expressions | Confidence interval | Utilized confidence intervals for a more reliable prediction range, enhancing the credibility of performance measure estimates |
20 | | LGG & KIRC from TCGA (https://cancergenome.nih.gov/) | miRNA, mRNA, and pathological image data | | CSAM-GAN, combining a multilayer deep neural network and GAN, excelled in analyzing multimodal datasets for lower-grade glioma and kidney renal clear cell carcinoma |
21 | (Moon et al. 2023) | 1684 OCT images from 842 patients treated with ranibizumab or aflibercept | OCT images | Sensitivity and Specificity | AI model showed varied sensitivity and specificity between ranibizumab and aflibercept treatments, outperforming human examiners in some aspects. In 18.5% of cases, posttreatment image fluid status differed between the two treatments |
22 | (Li et al. 2023) | University of California Irvine Machine Learning Repository Type 2 Diabetes 30-Day Readmission (61,675 records) (Eby et al., 2015), Surveillance, Epidemiology, and End Results Ovarian Cancer (10,038 records), Surveillance, Epidemiology, and End Results Colorectal Cancer (40,014 records) (https://www.seer.cancer.gov), and Second Affiliated Hospital of Zhejiang University (1244 records) | | AUC | Average AUCs were 0.945, 0.673, 0.611, and 0.588. Outperformed graph-based learning and label propagation methods, and showed competitive AUCs even with only 10% labeled data. Also addressed data synthesis and privacy concerns |
23 | | ADNI (http://adni.loni.usc.edu/) and Xuanwu cohorts (Sino Longitudinal Study on Cognitive Decline) | Structural MRI data | SSIM, PSNR, MSE, AUC | SSIM = 0.929, PSNR = 31.04, MSE = 0.0014. AUCs for AD and MCI were 0.867 and 0.752, respectively |
24 | (Strack et al. 2023) | Local dataset including longitudinal follow-up scans from 15 patients diagnosed with recurrent Grade IV glioblastoma & TCIA (Clark et al., 2013) (20 patients newly diagnosed with glioblastoma) | MRI images | AUC for Wasserstein-GAN and RANO criteria | AUC_Wasserstein-GAN = 0.87, AUC_RANO_criteria = 0.66 |
25 | (Toufiq et al. 2023) | BloodGen3 (co-expression gene set (M9.2) Erythroid cell modules (11)) https://drinchai.shinyapps.io/BloodGen3Module/ | Erythroid coexpression gene signature | | Claude from Anthropic and GPT-4 from OpenAI showed superior performance in candidate gene prioritization and selection |
26 | (Wang et al. 2023) | GDSC (426 cell lines, 191 drugs) (Yang et al., 2013); CCLE (401 cell lines, 24 drugs) (Barretina et al., 2012); TCGA (Cerami et al., 2012) | Multi-omics data | AUC score | AUC scores were 0.856 for GDSC, 0.808 for CCLE, and 0.91 for TCGA |
27 | (Yamanaka et al. 2023) | ZINC database (249,455 molecules) (Irwin & Shoichet, 2005); LINCS (20,655 molecules) (Subramanian et al., 2017) | Drug molecules | | The model is capable of building molecules with changing substructures by exploring the latent space |
28 | (Zhou et al. 2023) | Pima Indians Diabetes (768 instances) (Sigillito et al., 1989); Ionosphere (351 instances) (Smith et al., 1988); Breast Cancer DCE-MRI (922 instances) (Saha et al., 2018) | Numerical features, radiomic features, binary classification labels | | SCGAN generates sparse, diverse, plausible, and feasible counterfactual instances, aiding in understanding causal links between features and treatment response |
29 | (Zhu et al. 2023) | OhioT1DM (12 T1D subjects) (Marling & Bunescu, 2020); ARISES (12 T1D subjects); ABC4D (25 T1D subjects) | Glucose levels | | GluGAN effectively generates high-quality synthetic glucose time series, useful for evaluating insulin delivery algorithms and potentially replacing pre-clinical trials |
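
Several rows in the table above report overlap and classification metrics (Jaccard index, Dice coefficient, precision, recall, F1). As a reading aid, the sketch below shows the standard definitions of these quantities. It is not taken from any of the included studies: the function names are illustrative, and segmentations are represented as plain sets of pixel coordinates for simplicity.

```python
def jaccard(pred, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of pixel coordinates."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    # Convention assumed here: two empty masks count as a perfect match.
    return len(pred & truth) / len(union) if union else 1.0


def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = set(pred), set(truth)
    denom = len(pred) + len(truth)
    return 2 * len(pred & truth) / denom if denom else 1.0


def dice_from_jaccard(j):
    """For a single pair of masks, Dice = 2J / (1 + J)."""
    return 2 * j / (1 + j)


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy masks, not data from any study in the table.
    pred = {(0, 0), (0, 1), (1, 1)}
    truth = {(0, 1), (1, 1), (1, 2)}
    j = jaccard(pred, truth)
    print(j, dice(pred, truth), dice_from_jaccard(j))
    print(f1_score(0.29, 0.29))
```

The identity between Dice and Jaccard holds per image pair; values averaged over several subjects, as reported in the table, need not satisfy it exactly. When precision and recall are equal, F1 equals the same value, which is consistent with the 0.29 figures reported for row 13.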
# | Ref | Main Strength | Main Limitation |
---|---|---|---|
1 | (Rampášek et al. 2019) | The Dr.VAE model partially models drug perturbation effects | ▪ Reliance solely on gene expression data, overlooking the benefits of multi-omic integration ▪ A mismatch between features and sample sizes affecting model performance ▪ Overfitting in neural networks and the need for models to handle sparse, heterogeneous data more effectively |
2 | (Elazab et al. 2020) | Demonstrates superior quantitative and qualitative performance of GP-GAN compared to state-of-the-art reaction–diffusion based and deep learning-based methods for both LGG and HGG datasets | ▪ Absence of a detailed tumor model and assumption of constant tumor growth, neglecting scenarios of tumor reduction due to treatment ▪ The study's reliance on a limited dataset of only 18 subjects, reducing generalizability ▪ Recognition of mode collapse as an unresolved issue in GANs, compounded by limited data affecting the diversity of tumor shapes generated |
3 | (Ge et al. 2020) | Unleashing GANITE's full potential by estimating effects beyond binary treatments | ▪ Training GANs can be unstable, leading to inaccurate individual treatment effect (ITE) estimates ▪ Assumes the absence of unobserved confounders, risking biased outcomes ▪ Relies on a small real data set for method comparison and primarily uses LASSO for feature selection, leaving other techniques unexplored |
4 | (Xue et al. 2020) | Demonstrated DGMs like VAE ability to learn complex patterns in high-dimensional biological data like gene expression profiles | ▪ The model's representations lack direct correlation to specific biological entities such as proteins or pathways ▪ Performance heavily reliant on architectural tuning ▪ Limited data restricts learning, especially for rare perturbations ▪ While distributions are accurate, generated profiles may not completely reflect true biological responses |
5 | (Yoon et al. 2020) | Generating synthetic healthcare data that addresses the challenge of balancing data utility and patient identifiability | ▪ Uncertain applicability of ADS-GAN across various healthcare data sources and settings ▪ Lack of comprehensive analysis of ethical and legal implications of using synthetic healthcare data |
6 | (Barbiero et al. 2021) | The generative model can produce synthetic data representing biological states that may not be observed in reality, allowing for the simulation of rare clinical scenarios and personalized experiments in a virtual environment | ▪ Difficulty in comprehending the generative model's behavior and decisions, hindering its healthcare application ▪ Needs thorough validation and careful interpretation for reliable and accurate outcomes ▪ Model effectiveness hinges on the quality and representativeness of input data, like data from GTEx |
7 | (Piacentino et al. 2021) | Validates the anonymization approach through qualitative and quantitative analysis of the generated synthetic ECGs compared to real data | ▪ The generated ECGs are primarily assessed visually, lacking quantitative analysis ▪ Reliance on a limited dataset from only one source ▪ Synthetic ECGs require validation by expert physicians ▪ Issues like privacy breaches from data leakage and biases in synthetic data remain unexplored |
8 | (Sui et al. 2021) | Enables the establishment of an effective correlation between genomic and radiology information using hierarchical features | ▪ A potentially insufficient dataset size, affecting model generalizability ▪ Issues with uneven distribution within the dataset ▪ Detection of anomalies, especially around lung edges and other organs like the heart ▪ Lack of complete independence in tumor characteristics due to coupling in generation |
9 | (Tang et al. 2021) | Demonstration of the feasibility of using DGMs to investigate complex tumor-nano interactions with pixel-level accuracy and high reliability | ▪ The study is constrained by a single tumor model and focuses only on staining specific cell types, overlooking the diversity of tumor models and broader tumor microenvironment components ▪ Inconsistencies in immunohistochemistry staining methods across labs, coupled with GANDA's current limitation to 2D imaging, necessitating adaptation for 3D analysis |
10 | (Ahmed et al. 2022) | Demonstrates the ability of omicsGAN to integrate multiple omics data types, such as miRNA and gene expression, and their interaction networks, leading to improved predictive performance for cancer phenotype classification and survival prediction | ▪ The study's concentration on particular cancers like breast, lung, and ovarian limits its direct applicability to other cancer types or diseases ▪ The absence of specific quantitative metrics for assessing omicsGAN's performance, apart from AUC scores ▪ The study omits external validation using independent datasets, crucial for establishing the generalizability and robustness of its results |
11 | (Ahuja et al. 2022) | The topics inferred by MixEHR-G align effectively with the corresponding phenotypes and complement rule-based phenotyping algorithms | ▪ Requirement to convert continuous variables and summarize time-based observations ▪ Challenges in handling data from multiple centers with demographic and coding differences ▪ Need to confirm specific associations, like bipolar disease and hypothyroidism ▪ Reliance on age and sex in the Bayesian topic prior may limit applicability to diverse patient profiles |
12 | (Rafael-Palou et al. 2022) | Holds promise in delivering valuable predictions regarding the progression of lung nodules | ▪ Low number of analyzed tumor cases ▪ The segmentations were generated semi-automatically based on original diameter, growth, and centroid annotations, with final validation by a visual expert ▪ Relies on a single axial slice of the tumor to predict tumor growth |
13 | (Benary et al. 2023) | Addresses the challenge of integrating multidimensional data beyond established guidelines, showcasing the potential of LLMs in dealing with complex medical decision-making | ▪ Small sample size, which may limit the generalizability of the results ▪ Limited conclusions due to the rapid development of new LLM models and versions ▪ Experienced challenges related to the interpretation of results |
14 | (Bernardini et al. 2023) | The proposed ccGAN strategy exhibits reliable imputation performance compared to baseline GAN-based methods, particularly in handling missing values in the clinical MDC dataset under the MCAR assumption | ▪ Excludes high-missingness predictors, potentially missing key diabetes features ▪ Challenges in applying findings broadly, with key data fields like examinations and medications omitted ▪ The study lacks comprehensive evaluation of the long-term impact of using synthetic data in myeloid malignancy research |
15 | (El Emam 2023) | Offered evidence that synthetic data using cGAN can accelerate translational research in hematology | ▪ Applicability of findings primarily restricted to specific malignancies, with limited extension to other cancer types ▪ Inadequate assessment and evaluation of the long-term effects and sustainability of using synthetic data in myeloid malignancy research |
16 | (Gao et al. 2023) | Innovative approach to modeling individualized brain atrophy patterns has the potential to aid in the | ▪ Significant resources needed for generating and analyzing personalized brain atrophy patterns ▪ Difficulties in decoding outputs from complex methods like BrainStatTrans-GAN ▪ Essential to test and validate on varied datasets for broader model applicability |
17 | (Hsu and Lin 2023) | SCAN effectively utilizes both labeled and unlabeled patient data in a semi-supervised manner for predicting cancer patient prognosis | ▪ SCAN's enhanced performance heavily relies on the availability of unlabeled data ▪ Initial performance gains from duplicated unlabeled data, but effectiveness decreases beyond a certain threshold ▪ Utilizing synthetic data from advanced GANs does not endlessly boost the model's performance ▪ The study is constrained by the scope and size of the datasets used |
18 | (Huang et al. 2023) | Explores the vast potential of ChatGPT-4 in assisting with medical diagnoses and patient care | ▪ Limited assessment scope, missing a comprehensive comparison with medical LLMs like Med-PaLM ▪ Time-specific benchmarks influenced by model updates ▪ Challenges arise from inconsistencies in complex case evaluation, reliance on external tools, and the requirement for internet browsing for up-to-date information |
19 | (Jahanyar et al. 2023) | The study provides insights into reliable data generation and evaluation for microarray data, specifically related to schizophrenia gene expression | ▪ Single omics focus may overlook complex disease patterns revealed by multi-omics analysis ▪ Method's specificity to schizophrenia gene expression data limits applicability to other diseases or data types ▪ Missing details on computational demands, implementation time, and generalizability beyond schizophrenia restrict broader use |
20 | | Introducing a novel GAN architecture with an attention mechanism that enables the model to assign different weights to input parts based on their relevance to the task | ▪ Limited prognosis prediction with multimodal data ▪ Prone to overfitting ▪ Low feature dimension (k) may limit performance |
21 | (Moon et al. 2023) | Generation of more realistic post-therapeutic OCT images and superior performance compared to human examiners | ▪ Single-center retrospective study limits generalizability ▪ Evaluation focuses on short-term outcomes ▪ Limited OCT image variety ▪ Small study population (842 patients) ▪ Absence of noise reduction ▪ Minimal exploration of AI models |
22 | (Li et al. 2023) | The model fully utilizes the inner graphical structure of Electronic Health Records (EHRs), enhancing its ability to extract meaningful information | ▪ Unclear impact on data quality and prediction ▪ Limited applicability in varying label-rate scenarios ▪ Lack of defined switching thresholds between algorithms ▪ Requires improved patient privacy and IP protection |
23 | | The GANCMLAE model combines GAN, AE, and multiple loss functions, improving the detection of individual brain atrophy in AD and MCI patients | ▪ Resolution mismatch between generated and input images ▪ Lack of exploration into physiological mechanisms for AD and MCI ▪ Exclusive use of 2D images due to processing limitations ▪ Small sample size leads to higher p-values and limited comparisons |
24 | (Strack et al. 2023) | Fully unsupervised, requiring no manual annotations or large pre-trained models. It only needs two MR images from the same patient at different timepoints | ▪ Ground truth generated by a neural network, not medical experts ▪ Performance sensitivity to MRI image quality and resolution ▪ Limitation due to reliance on data from one patient ▪ Insufficient capture of patient case variability ▪ Limited resources due to lack of external funding ▪ Modified RANO criteria may not align with clinical standards, affecting clinical applicability |
25 | (Toufiq et al. 2023) | Establishes a standardized workflow for how LLMs can be integrated into the candidate gene prioritization process in a systematic way | ▪ Factual accuracy concerns ▪ Risk of information hallucination ▪ Dependency on LLMs' training data ▪ LLMs cannot replace traditional scientific methods |
26 | (Wang et al. 2023) | Demonstrated the potential to predict drug responses in different cancer types and revealed differences in survival outcomes | ▪ Lacks clinical validation for practical treatment ▪ Limited to two omics integrations, potentially overlooking other influential factors |
27 | (Yamanaka et al. 2023) | The method can generate new drug candidate molecules for diseases with unknown therapeutic target proteins, which is a significant contribution to precision medicine | ▪ Need for broader disease applicability validation ▪ Limitations of structural similarity in assessing therapeutic effects ▪ Evaluation required for generating molecules across numerous diseases |
28 | (Zhou et al. 2023) | Introduces a new method, SCGAN, that produces sparse, diverse, and plausible counterfactuals while maintaining proximity to the original instances | ▪ Limited comparison with other counterfactual generation methods ▪ Generalizability constrained to breast cancer datasets ▪ Potential biases from specific open-source data ▪ Concerns about SCGAN's scalability and computational efficiency in clinical settings |
29 | (Zhu et al. 2023) | Creating individualized glucose time series data | ▪ Absence of standardized GAN model validation criteria and hyperparameter tuning ▪ Unusual responses to certain conditional inputs, like unexpected glucose level changes ▪ Implementation challenges for a personalized T1D simulator ▪ Limited evaluation scope to clinical datasets, warranting broader demographic and clinical profile assessment |
4 Discussion
4.1 Interpretation of results
4.2 Challenges and limitations of deep generative models in precision medicine
Common limitations | Suggested recommendations |
---|---|
Limited generalizability to other diseases | Include diverse datasets covering a broader spectrum of diseases to enhance the model's applicability |
Lack of comprehensive evaluation metrics | Utilize a variety of quantitative metrics beyond AUC scores to thoroughly evaluate model performance |
Absence of external validation | Validate and strengthen findings through external validation on independent datasets |
Complexity and lack of interpretability in generative models | Improve model interpretability, possibly employing visualization tools, to foster understanding and trust in healthcare applications |
Complex model architecture | Rigorously validate and interpret the results of the advanced model architecture to ensure reliability and accuracy |
Dependency on data quality and representativeness | Improve the quality and representativeness of input data, considering alternative sources if necessary |
Instability in training GANs | Stabilize GAN training to improve the accuracy of Individual Treatment Effect (ITE) estimation, address and mitigate identified drawbacks associated with GANs, and systematically tune hyperparameters to enhance the stability and performance of GANs |
Assumption of no unobserved confounders | Explicitly consider and address potential unobserved confounders to mitigate bias in results |
Small real-data sample size | Increase the sample size of the real-data example, address the imbalance between the number of features and the number of samples, and consider alternative approaches or augmentation techniques to address data size limitations |
Sparse modeling techniques | Explore alternative feature selection algorithms alongside LASSO techniques |
Limitation to single omics data analysis | Emphasize the importance of multi-omics studies and expand the integration of multi-omics data beyond genomes and transcriptomes |
Disease-specific methodology | Ensure methods are adaptable to different types of data and diseases |
Inadequate consideration of computational resources | Address the computational resources and time required for practical implementation |
Focus on single modality (gene expression) | Explore multi-omics predictors |
Overfitting issues with discriminative neural networks | Implement strategies and regularization techniques to mitigate potential overfitting issues |
Need for predictive models with heterogeneous data | Develop models that effectively utilize sparsely sampled data |
Poor prognosis outcome prediction for multimodal data | Focus on improving prognosis outcome prediction for patients |
Lack of clinical validation for practical treatment | Conduct rigorous clinical validation to ensure the practical efficacy of proposed treatment methods |
Non-interpretable learned representations | Work on developing models with more interpretable representations |
Dependency on parameters such as architecture | Explore architectures that are less sensitive to parameter variations |
Inability to infer causality | Acknowledge the limitations in causal inference and communicate results as statistical correlations |
Challenges in generating realistic new data | Continuously refine methods to generate more realistic data along |
Method's applicability to a broader range of diseases | Extend validation efforts to ensure the method's applicability across a broader range of diseases |