Machine learning compensates fold-change method and highlights oxidative phosphorylation in the brain transcriptome of Alzheimer’s disease

Cheng, Jack; Liu, Hsin-Ping; Lin, Wei-Yong; Tsai, Fuu-Jen

doi:10.1038/s41598-021-93085-z


Article
Open access
Published: 01 July 2021


Scientific Reports volume 11, Article number: 13704 (2021)


Abstract

Alzheimer’s disease (AD) is a neurodegenerative disorder that causes approximately 70% of dementia cases. However, the mechanism of disease development is still elusive. Despite the availability of a wide range of biological data, a comprehensive understanding of AD's mechanism from machine learning (ML) has so far not been achieved, mainly due to the lack of the required data density. To extract knowledge of the AD mechanism from the expression profiles of postmortem prefrontal cortex samples from 310 AD patients and 157 controls, we used seven predictive operators or combinations of RapidMiner Studio operators to establish predictive models from the input matrix and to assign a weight to each attribute. Conventional fold-change methods were also applied as controls. The identified genes were further submitted to enrichment analysis for KEGG pathways. The average accuracy of the ML models ranged from 86.30% to 91.22%. The overlap ratio of the identified genes between ML and conventional methods ranged from 19.7% to 21.3%. ML exclusively identified oxidative phosphorylation genes in the AD pathway. Our results highlight the deficiency of oxidative phosphorylation in AD and suggest that ML should be considered complementary to conventional fold-change methods in transcriptome studies.

Introduction

Alzheimer's disease (AD) is a neurodegenerative disease that usually begins gradually around the age of 65 and causes around 70% of dementia cases. For over 20 years, the Aβ amyloid hypothesis has dominated the direction of research and drug development in AD. Briefly, sequential cleavage of APP by β- and γ-secretases yields Aβ monomers of 40 and 42 amino acids, which, when Aβ degradation is insufficient, accumulate into amyloid fibrils and cause downstream tau hyperphosphorylation and neurotoxicity. Although the Aβ amyloid and tau hypotheses are still the major focuses of clinical trials¹, the high failure rate (205 phase 3 trials completed, terminated, or withdrawn, with only one drug approved by the FDA as of Feb 2020, http://clinicaltrials.gov) has pushed the research community to reappraise the Aβ-centered etiology^2,3.

According to Gong et al. 2018, the collective effects of multiple genes/insults may lead to the development and onset of AD². Thus, multifactorial diagnosis and personalized treatment have been emphasized, since different combinations of etiological genes/insults may be present in each individual. However, because knowledge of the full spectrum of AD remains insufficient, there is an urgent need to decipher the mechanism and risk factors of AD.

Machine learning (ML) is the process by which computer systems use algorithms and statistical models to make predictions based on patterns and inference, without explicit instructions. Applications of ML to AD have focused mainly on diagnosis from neuroimaging⁴. Despite the emergence of a wide range of biological data on AD, including genomic profiles and electronic health records, a comprehensive understanding of AD's mechanism from ML has so far not been achieved, mainly due to the lack of the required data density⁵. We previously identified, by meta-analysis, that MMP14 and dystonin potentially modulate the crosstalk between diabetes and AD^6,7. In this study, we applied ML to a publicly available transcriptome dataset of postmortem AD brains to uncover the complex genetic network and compared the results with those of conventional fold-change (FC) methods.

Methods

Data source

The gene expression profiles of prefrontal cortex brain tissue from 310 AD patients and 157 non-demented controls were retrieved from the GSE33000 dataset⁸ of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database. This dataset was selected. The processed data, which had been adjusted for age, gender, RNA integrity number (RIN), pH, postmortem interval (PMI), batch, and sample preservation, were downloaded from the Sample table. This dataset contains 39,279 detected probes, of which 13,798 were annotated; a total of 9969 genes were profiled, and 31 probes were omitted because they mapped to more than one gene.

Another publicly available microarray dataset, GSE84422⁹, which profiled the prefrontal cortex (PFC) of 56 postmortem brains with varying degrees of AD pathological abnormalities, was used as an unseen dataset to validate our models. The samples were classified as control or AD by Clinical Dementia Rating (CDR), Braak stage, and CERAD score. Notably, because of differences between the microarray platforms used, 3680 of the 9966 attribute genes in the training dataset were not profiled in the testing dataset. To conduct the testing, these 3680 genes were artificially added with an FC of 1 (i.e., no change) for all samples.
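The padding step can be sketched with pandas, assuming the test expression matrix is a DataFrame with samples as rows and gene symbols as columns. The helper name `align_to_training_genes` and the toy genes are hypothetical. This is an illustration, not the RapidMiner process the authors used.

```python
import pandas as pd


def align_to_training_genes(test_df, training_genes, fill_value=1.0):
    """Reorder test_df to the training gene list.

    Genes that the test platform did not profile are added as new columns
    filled with `fill_value` (FC = 1, i.e. "no change"); genes that exist only
    in the test data are dropped, because the trained models cannot use them.
    Returns the aligned DataFrame and the list of padded genes.
    """
    missing = [g for g in training_genes if g not in test_df.columns]
    aligned = test_df.reindex(columns=list(training_genes), fill_value=fill_value)
    return aligned, missing


# Toy example: GENE_B is absent from the test platform, GENE_Z is test-only.
test = pd.DataFrame(
    {"GENE_A": [0.9, 1.1], "GENE_C": [1.3, 0.8], "GENE_Z": [2.0, 2.1]},
    index=["s1", "s2"],
)
aligned, missing = align_to_training_genes(test, ["GENE_A", "GENE_B", "GENE_C"])
print(missing)  # ['GENE_B']
```

Every padded gene is constant across all test samples. Any split or coefficient that relies on such a gene therefore can no longer separate AD from controls.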

Machine learning

RapidMiner Studio version 9.5 (WIN64 platform), registered to Jack Cheng, was run under the Windows 10 operating system on an Intel Core i3-3220 CPU with 16 GB RAM. In addition to the samples' age and sex, the 9969 profiled genes were assigned as regular attributes (potential contributing factors to be analyzed by the modeling operator). The disease status (1 = AD; 0 = non-AD control) was assigned as the Label attribute (the class predicted by the modeling operator), and the sample ID was assigned as the ID attribute (identifying each sample). The input matrix is supplied as Supplementary File 1.

Seven predictive operators or combinations of RapidMiner Studio operators were used to establish predictive models from the input matrix and to assign a weight to each attribute: (1) AdaBoost + Decision Tree, (2) AdaBoost + Rule Induction, (3) AdaBoost + Decision Stump, (4) Generalized Linear Model, (5) Logistic Regression, (6) Gradient Boosted Trees, and (7) Random Forest + Weight by Tree Importance. The parameters of these operators are listed in the Parameters sheet of Supplementary File 2. Notably, in the Random Forest model, the number of trees was 500 and the maximal depth was set to '-1', which places no bound on the depth of the trees. Moreover, the Generalized Linear Model is a regularized GLM in which the elastic net penalty was used for parameter regularization. Other operators in the Models / Predictive category were excluded from this study for the reasons listed in the Models sheet of Supplementary File 2.
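Readers working outside RapidMiner may find rough scikit-learn analogues for two of the seven configurations useful. The sketch below assumes such analogues: a 500-tree random forest with unbounded depth, and an elastic-net-penalized logistic model standing in for the regularized GLM. The `l1_ratio` and `C` values and the synthetic data are placeholders, not the parameters in Supplementary File 2. Scikit-learn's impurity-based `feature_importances_` only approximates RapidMiner's Weight by Tree Importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random forest with 500 trees; max_depth=None lets trees grow without a depth
# bound, analogous to RapidMiner's maximal depth of -1.
rf = RandomForestClassifier(n_estimators=500, max_depth=None, random_state=0)

# Elastic-net-regularized logistic model as a stand-in for the regularized GLM.
# l1_ratio and C are placeholder values chosen for illustration only.
glm = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)

# Synthetic stand-in for the samples x genes input matrix (label 1 = AD).
X, y = make_classification(n_samples=120, n_features=30,
                           n_informative=5, random_state=0)
rf.fit(X, y)
glm.fit(X, y)

rf_weights = rf.feature_importances_   # one impurity-based weight per attribute
glm_weights = glm[-1].coef_.ravel()    # one coefficient per attribute
```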

The performance of each model was estimated by cross-validation, which contains two subprocesses: a training subprocess and a testing subprocess. The training subprocess produces a trained model, which is applied in the testing subprocess for performance evaluation. In this study, the samples were randomly divided into ten subsets of (approximately) equal size. Each of the ten subsets was used in turn in the testing subprocess to evaluate the model trained on the other nine subsets. The convergence of each model's iterations was recorded and summarized in Supplementary File 3, which also describes how genes were aggregated across these iterations. The performance of a model can be evaluated by its accuracy, precision, and recall, where accuracy = (TP + TN)/(TP + FP + FN + TN), precision = TP/(TP + FP), and recall = TP/(TP + FN), with T = true, F = false, P = positive, and N = negative. The setup diagrams of the seven predictive models are illustrated in Supplementary File 4.
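The ten-fold procedure and the three metrics can be sketched in Python. The sketch makes three assumptions. First, scikit-learn's `StratifiedKFold` forms the folds, although the text only says the split was random. Second, a decision tree stands in for any of the seven models. Third, the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, and recall with AD (label 1) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def ten_fold_cv(model, X, y, seed=0):
    """Train on nine folds, evaluate on the held-out fold, repeat ten times,
    and return the mean accuracy, precision, and recall."""
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores.append(confusion_metrics(y[test_idx], model.predict(X[test_idx])))
    return np.mean(scores, axis=0)


# Synthetic data: 100 samples x 20 "genes"; the label depends on the first gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)
mean_acc, mean_prec, mean_rec = ten_fold_cv(DecisionTreeClassifier(random_state=0), X, y)
```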

Conventional fold-change method

The fold change (FC) was defined as the average gene expression of AD samples relative to that of control samples. Student's t-test was used to assess the significance of each FC, and non-significant FCs (p > 0.05) were discarded.
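A minimal sketch of this FC calculation with NumPy and SciPy follows. It assumes expression values on a linear (non-log) scale. If the processed values were log-transformed, the FC would instead be the difference of group means. The helper name `fold_change_table` and the toy matrix are hypothetical.

```python
import numpy as np
from scipy import stats


def fold_change_table(expr, is_ad, alpha=0.05):
    """expr: samples x genes array on a linear scale; is_ad: True for AD samples.

    Returns per-gene FC = mean(AD) / mean(control), Student's t-test p-values,
    and a boolean mask of genes kept at p <= alpha.
    """
    expr = np.asarray(expr, dtype=float)
    is_ad = np.asarray(is_ad, dtype=bool)
    ad, ctrl = expr[is_ad], expr[~is_ad]
    fc = ad.mean(axis=0) / ctrl.mean(axis=0)
    # equal_var=True gives the classic Student's t-test rather than Welch's.
    _, p = stats.ttest_ind(ad, ctrl, axis=0, equal_var=True)
    keep = p <= alpha
    return fc, p, keep


# Toy data: gene 0 differs between groups, gene 1 does not.
expr = np.array([[2.0, 5.0], [2.2, 5.1], [2.1, 4.9],
                 [1.0, 5.0], [1.1, 5.2], [0.9, 4.8]])
is_ad = np.array([True, True, True, False, False, False])
fc, p, keep = fold_change_table(expr, is_ad)
```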

Gene enrichment analysis

The gene list was used as the input to STRING: functional protein association networks¹⁰ (https://string-db.org/). For the global enrichment analysis, gene symbols with weight/expression levels were submitted to the “Proteins with Values / Ranks” module. For KEGG^11,12 enrichment analysis, gene symbols were submitted to the “Multiple Proteins by Names / Identifiers” module.
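The web modules can also be driven programmatically. Below is a minimal sketch using the `enrichment` method of the STRING REST API. It covers only the multiple-identifier route, not the “Proteins with Values / Ranks” module. The `caller_identity` string is a placeholder. Filtering on the `category` value "KEGG" assumes the JSON field layout documented by STRING. The network call sits in its own function so that the URL building and parsing can be checked offline.

```python
import json
import urllib.parse
import urllib.request

STRING_ENRICHMENT_URL = "https://string-db.org/api/json/enrichment"


def build_enrichment_url(genes, species=9606, caller="my_analysis_script"):
    """Build a GET URL for STRING functional enrichment (9606 = Homo sapiens)."""
    params = {
        "identifiers": "\r".join(genes),  # STRING expects identifiers separated by %0d
        "species": species,
        "caller_identity": caller,        # placeholder; identify your own script
    }
    return STRING_ENRICHMENT_URL + "?" + urllib.parse.urlencode(params)


def fetch_enrichment(genes, species=9606):
    """Query STRING over the network and return the decoded JSON rows."""
    with urllib.request.urlopen(build_enrichment_url(genes, species)) as resp:
        return json.load(resp)


def kegg_terms(rows, fdr_cutoff=0.05):
    """Keep KEGG rows with FDR <= cutoff, most significant first."""
    kegg = [r for r in rows
            if r.get("category") == "KEGG" and r.get("fdr", 1.0) <= fdr_cutoff]
    return sorted(kegg, key=lambda r: r["fdr"])


# Example (requires network access):
# for row in kegg_terms(fetch_enrichment(["NDUFA1", "NDUFV2", "UQCRFS1"])):
#     print(row["term"], row["description"], row["fdr"])
```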

Results

Identifying AD-predictive genes by ML

We developed a workflow (Fig. 1) to identify AD-predictive genes by ML; each of the seven predictive operators or combinations of operators produced a gene list along with the weight of each gene's predictive contribution. The full lists are provided in the Generalized Linear Model, Logistic Regression, Rule Induction, Decision Stump, Decision Tree, Gradient Boosted Trees, and Weight of Random Forest sheets of Supplementary File 2. The average accuracy of these models ranged from 86.30% to 91.22%; the Performance sheet of Supplementary File 2 summarizes the accuracy, precision, and recall of each model, and ROC curves and precision-recall curves are shown in Fig. 2. Combining the genes from the seven models yielded a union of 1126 non-redundant genes with weight > 0 (the Non-redundant Genes sheet of Supplementary File 2). To further extract the more representative genes, we filtered out genes that satisfied both of the following conditions: (1) a weight equal to the minimum value (i.e., 0.001), and (2) absence from the global enrichment analysis (the Global Enrichment sheet of Supplementary File 2). This yielded a final list of 314 genes (the Genes sheet of Supplementary File 5).
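The two-step reduction can be sketched in plain Python. The first step takes the union of genes with weight > 0. The second removes genes that both carry only the minimum weight and are absent from the global enrichment. The text does not state how a gene's weight is summarized across models; taking the maximum over models is an assumption made here. The gene names and weights are toy values.

```python
def union_positive_weights(model_weights):
    """Merge per-model {gene: weight} dicts, keeping genes with weight > 0.

    A gene selected by several models keeps its largest weight (assumption).
    """
    merged = {}
    for weights in model_weights.values():
        for gene, w in weights.items():
            if w > 0:
                merged[gene] = max(w, merged.get(gene, 0.0))
    return merged


def representative_genes(merged, enriched_genes, min_weight=0.001):
    """Drop genes that BOTH carry only the minimum weight AND are absent
    from the global enrichment analysis; keep everything else."""
    return sorted(
        gene for gene, w in merged.items()
        if not (w <= min_weight and gene not in enriched_genes)
    )


# Toy input: weights per model (hypothetical genes and values).
model_weights = {
    "RandomForest": {"GENE_A": 0.020, "GENE_B": 0.001, "GENE_C": 0.001},
    "GLM": {"GENE_D": 0.300, "GENE_E": 0.0, "GENE_C": 0.001},
}
merged = union_positive_weights(model_weights)
kept = representative_genes(merged, enriched_genes={"GENE_A", "GENE_B", "GENE_D"})
print(kept)  # ['GENE_A', 'GENE_B', 'GENE_D']
```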

We conducted an analysis of variance (ANOVA) to test the null hypothesis that the different ML models perform equally. The ANOVA result (f = 1.558, prob = 0.174, alpha = 0.050) did not reject the null hypothesis, indicating that the performance differences among the ML models are not significant. The process was exported as “ANOVA.rmp” and uploaded to GitHub at https://github.com/JackCheng-TW/RapidMiner-files/Process/.
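A one-way ANOVA of this kind can be reproduced outside RapidMiner with SciPy's `f_oneway`. The sketch assumes that each model contributes its per-fold accuracies from the ten-fold cross-validation as one group. Which performance values RapidMiner's operator actually grouped is not specified here. The numbers below are toy values, not the study's results.

```python
from scipy.stats import f_oneway


def compare_models(fold_accuracies, alpha=0.05):
    """One-way ANOVA across models.

    fold_accuracies maps model name -> list of per-fold accuracies.
    Returns (F statistic, p-value, whether equal performance is rejected).
    """
    f_stat, p_value = f_oneway(*fold_accuracies.values())
    return f_stat, p_value, p_value < alpha


toy = {
    "model_A": [0.88, 0.90, 0.86, 0.89],
    "model_B": [0.89, 0.87, 0.91, 0.88],
    "model_C": [0.90, 0.88, 0.86, 0.87],
}
f_stat, p_value, rejected = compare_models(toy)
```

As in the study, a p-value above alpha means the data give no evidence that the models differ. It does not prove that they perform identically.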

To check whether our findings extend beyond a single dataset, we used another microarray-profiled PFC dataset, GSE84422, as an unseen dataset to validate our models. Although more than one-third (36.9%) of the training genes are missing from the test dataset, GSE84422 is currently the second-largest such dataset after GSE33000. Upon testing, the accuracy was 28.57%, 58.93%, 76.79%, 82.14%, 71.43%, 44.64%, and 69.64% for decision tree, random forest, gradient boosted trees, generalized linear model, logistic regression, decision stump, and rule induction, respectively. Because decision tree and decision stump models rely on only a few attributes, missing one or two of them may severely limit model performance. In contrast, models with more attributes, such as the GLM, outperformed the others. These results indicate the generalizability of some models and the difficulty of applying ML models to cross-platform datasets. The testing dataset "GSE84422_testing.xls" and the exported processes were uploaded to GitHub at https://github.com/JackCheng-TW/RapidMiner-files/testing/.

ML compensates conventional FC methods in gene identification

To compare gene identification between ML and conventional FC-based methods, we adopted two independent strategies, as illustrated in Fig. 3. In the first, the 157 genes with the highest and the 157 genes with the lowest fold changes were selected (Genes sheet of Supplementary File 6). In the second, we selected 314 differentially expressed genes (DEGs) by first applying a fold-change cutoff of 1.2 and then ranking the remaining genes by p value (Genes sheet of Supplementary File 7). Surprisingly, only 67 (21.3%) and 80 (25.5%) genes, respectively, overlapped with the 314 ML-derived genes for the two conventional FC-based methods.
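Both selection strategies and the overlap ratio can be sketched as follows. Treating the 1.2 cutoff symmetrically (FC ≥ 1.2 or FC ≤ 1/1.2) is an assumption, as are the helper names and toy values. With the study's numbers, the ratio is the shared count divided by 314: 67/314 ≈ 21.3% and 80/314 ≈ 25.5%.

```python
def top_bottom_by_fc(fc, n_each=157):
    """Strategy 1: the n_each highest-FC and n_each lowest-FC genes."""
    ranked = sorted(fc, key=fc.get)  # ascending fold change
    return set(ranked[:n_each]) | set(ranked[len(ranked) - n_each:])


def fc_then_p(fc, pvals, n_total=314, cutoff=1.2):
    """Strategy 2: keep genes passing a symmetric FC cutoff (assumption),
    then take the n_total genes with the smallest p-values."""
    passed = [g for g in fc if fc[g] >= cutoff or fc[g] <= 1 / cutoff]
    return set(sorted(passed, key=pvals.get)[:n_total])


def overlap_ratio(ml_genes, fc_genes):
    """Number of shared genes and their fraction of the ML gene list."""
    shared = set(ml_genes) & set(fc_genes)
    return len(shared), len(shared) / len(ml_genes)


# Toy values (hypothetical genes).
fc = {"G1": 2.0, "G2": 0.5, "G3": 1.05, "G4": 1.3, "G5": 0.95}
pvals = {"G1": 0.04, "G2": 0.001, "G3": 0.5, "G4": 0.01, "G5": 0.2}
strategy1 = top_bottom_by_fc(fc, n_each=1)
strategy2 = fc_then_p(fc, pvals, n_total=2)
shared_count, ratio = overlap_ratio({"G1", "G3", "G4", "G5"}, strategy2)
```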

Next, to identify differences in enriched pathways, the final gene list from ML and those from the two conventional FC-based methods were each submitted to KEGG pathway enrichment analysis. The top 15 enriched KEGG pathways are summarized in Table 1, and the KEGG sheets in Supplementary Files 5, 6, and 7 provide the full results. As anticipated, most of the pathways enriched by ML-derived genes do not overlap with those from the conventional FC-based methods. Interestingly, the KEGG Alzheimer's disease pathway (hsa05010) was enriched only by the ML-derived genes, not by the conventional methods. However, this does not imply that ML is superior to or can replace the conventional methods, since the latter also exclusively enriched several AD-related pathways, such as complement and coagulation cascades (hsa04610), cytokine-cytokine receptor interaction (hsa04060), and phagosome (hsa04145). This mutual exclusivity of critical pathways indicates that ML complements the conventional FC methods in gene identification.

Table 1 Enriched KEGG pathways of genes identified by machine learning and conventional fold-change (FC) methods, respectively.


ML highlights oxidative phosphorylation genes in the AD pathway

When we examined the ML-derived genes that enriched these pathways, we found considerable overlap of genes among the ML-exclusive pathways (Table 2). These genes are ATP5C1, ATP5G1, NDUFA1, NDUFA4, NDUFA6, NDUFA12, NDUFB1, NDUFB2, NDUFB9, NDUFV1, NDUFV2, and UQCRFS1. They belong to the oxidative phosphorylation pathway (hsa00190), which is also part of the KEGG Alzheimer’s disease pathway (Fig. 4). Among them, NDUFA1, NDUFA4, NDUFA6, NDUFA12, NDUFB1, NDUFB2, NDUFB9, NDUFV1, and NDUFV2 belong to OXPHOS protein complex (CX) I of the electron transport chain (ETC), while UQCRFS1 belongs to CX III of the ETC. Moreover, ATP5C1 and ATP5G1 belong to ATP synthase (CX V).

Table 2 Overlapping of ML-identified genes in the enriched KEGG pathways. “O” denotes the presence of the gene.


Co-predictive partners of the CX genes

A random forest consists of decision trees, which use combinations of attribute values (i.e., gene expression levels) to predict the sample label (i.e., AD or not). Figure 5 shows 13 decision trees involving ETC complex subunit genes, and Table 3 summarizes the 12 CX genes and the 37 other predictive genes in these trees. Notably, 32 of the 37 genes are relevant to AD. AD relevance was established by expression, genomic, or metabolomic association studies, with references listed in Table 3.
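Suppose the forest is a scikit-learn model rather than RapidMiner's Random Forest, which the study used. In that case, each tree's `tree_.feature` array can be scanned to find the trees that split on any CX gene and to collect their co-predictive partners; leaf nodes in that array carry a negative placeholder index. The gene names and data below are synthetic, and `GENE_0`/`GENE_1` merely stand in for CX genes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def trees_using_genes(forest, feature_names, query_genes):
    """Map tree index -> set of genes split on, for trees using any query gene."""
    query = set(query_genes)
    hits = {}
    for i, tree in enumerate(forest.estimators_):
        # tree_.feature holds the split feature per node; leaves are negative.
        used = {feature_names[j] for j in tree.tree_.feature if j >= 0}
        if used & query:
            hits[i] = used
    return hits


def co_predictive_partners(hits, query_genes):
    """Genes that share a tree with at least one query gene."""
    partners = set()
    for used in hits.values():
        partners |= used - set(query_genes)
    return partners


rng = np.random.default_rng(0)
names = [f"GENE_{k}" for k in range(20)]
X = rng.normal(size=(200, 20))
y = ((X[:, 0] + X[:, 1]) < 0).astype(int)  # label driven by GENE_0 and GENE_1
forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0).fit(X, y)

cx_like = ["GENE_0", "GENE_1"]  # hypothetical stand-ins for CX genes
hits = trees_using_genes(forest, names, cx_like)
partners = co_predictive_partners(hits, cx_like)
```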

Table 3 Oxidative phosphorylation genes and their companions identified in machine learning.


Figure 5A shows that the expression of ATP5G1 and ATN1 predicts AD. Although the exact function of ATN1 is unknown, it may act as a transcriptional co-repressor in neurons¹³. Moreover, significant alternative splicing of ATN1 was detected in the frontal lobe of AD postmortem brains¹⁴. Figure 5B shows that the expression of NDUFV and CTXN1 predicts AD. CTXN1 encodes cortexin-1, which may mediate signaling of cortical neurons during forebrain development¹⁵, and it is highly dysregulated in the aging brain¹⁶. Figure 5C shows that slight downregulation of two CX I genes, NDUFA6 and NDUFB1, predicts AD. Figure 5D shows that the expression of NDUFB9 and FRMPD4 predicts AD. FRMPD4 positively regulates dendritic spine morphogenesis and is involved in excitatory synaptic transmission¹⁷. In addition, the expression of FRMPD4 was found to be significantly altered in the AD hippocampus¹⁸. The other AD-predictive genes in these decision trees are discussed below in groups according to their biological functions.

Discussion

We conducted machine learning (ML) analyses to train AD case/control classifiers using transcriptomic data and then compared the ML-derived gene features with those from conventional differential expression analysis. ML exclusively highlighted oxidative phosphorylation but did not fully capture the findings of the conventional methods. The pathways involving the identified genes and the limitations of the study are discussed below.

Oxidative phosphorylation

In eukaryotes, oxidative phosphorylation takes place at the electron transport chain in the mitochondrion. The oxidation of NADH or succinate from the citric acid cycle provides the energy that drives ATP synthase. During this process, several complexes embedded in the mitochondrial inner membrane, including CX I and CX III, pump protons across the inner membrane into the intermembrane space to establish a proton gradient, while CX V uses the energy of the proton influx to generate ATP from ADP¹⁹.

Abnormal mitochondrial morphology and functions, including glucose metabolism and ROS production, have been identified as early hallmarks of AD^20,21. These phenotypes are directly related to the disruption of glycolytic processes and the impairment of the ETC complexes. In the 1990s, most research efforts were devoted to investigating the role of CX IV in AD^22,23. However, the evidence is not conclusive as to whether dysregulation of any single ETC complex dominates AD progression. For example, besides altered expression, several mutations in ETC complex subunit genes may impair complex activity²⁴. Moreover, ETC complex dysregulation appears to be brain-region dependent; e.g., CX IV is not significantly decreased in the temporal lobe, whereas CX I–III are decreased at certain cortical locations in AD²⁴.

Recent studies have also highlighted the role of CX I, whose dysregulation appears to be tau-dependent, in contrast to the Aβ-dependent dysregulation of CX IV²⁵. Moreover, an SNP association study demonstrated an association with AD for complex I genes but not for complexes II–V²⁶. Furthermore, a postmortem study of 18 AD patients and 44 controls identified downregulation of CX I–V in the hippocampus²⁷. However, the expression of CX I genes may not change monotonically during AD progression: a postmortem study of twelve AD patients and six controls reported that CX I genes decrease in the early stage but increase in the frontal cortex of definite AD patients²⁸.

Viewed only at the level of symptoms, most things look complex, and this is especially the case for ETC complexes in AD. Could ML, with the aid of the Random Forest model, guide us through this misty forest by finding potential partners of CX genes in predicting AD?

Neural maintenance or transmission

Among the AD-predictive genes, CACNA1G, FGF13, LRFN2, NPFF, and SHOX2 participate in neural maintenance or transmission. CACNA1G encodes the voltage-dependent T-type calcium channel subunit alpha-1G. FGF13 is a fibroblast growth factor that plays a critical role in neuronal polarization and migration²⁹. LRFN2 promotes neurite outgrowth and increases the expression of the NMDA receptor³⁰. NPFF is a neuropeptide, while SHOX2 may be a growth regulator in the nervous system and is involved in processing somatosensory information³¹. In AD, pathological hallmarks include synaptic failure and neuronal loss³². Moreover, the critical role of mitochondria in supporting synaptic function, together with evidence of mitochondrial dysfunction from both clinical postmortem studies³³ and animal models³⁴ of AD, supports the mitochondria-synapse hypothesis of AD. Our finding that simultaneous dysregulation of CX and neuronal genes predicts AD supports this hypothesis.

Immune system

Innate immunity, especially neuroinflammation mediated by microglia, is considered a hallmark of AD, whereas the role of adaptive immunity in AD is not conclusive³⁵. Among the AD-predictive genes, CFHR1, CMTM4, HLA-DRA, IL18, MICA, MORN1, SCYE1, and SOCS4 participate in immunity. CMTM4 regulates the PD-L1 protein³⁶, which binds to PD-1 and suppresses the T-cell-mediated adaptive arm, while HLA-DRA presents peptides derived from extracellular proteins, and MICA presents stress-induced self-antigens, to T cells³⁷. Moreover, MORN1 modulates functional Ca²⁺ influx in T cells upon activation of T-cell receptors³⁸. IL18 and SCYE1 (AIMP1) are pro-inflammatory cytokines, while SOCS4 is part of a negative feedback system that regulates cytokine signal transduction³⁹. CFHR1 is an inhibitor of the complement pathway that blocks C5 convertase and controls complement activation together with complement factor H⁴⁰. Our results indicate that dysregulation of both innate and adaptive immunity genes may cooperate with CX genes to advance AD progression.

Phosphatase regulators

In AD, hyperphosphorylation of microtubule-associated proteins, especially tau, disrupts microtubule assembly in neurons. Moreover, significantly lower protein phosphatase 1 (PP1) activity in AD brains suggests a critical role of dysfunctional phosphatases in AD⁴¹. Among the AD-predictive genes, PPP1R14C and PPP1R7 encode PP1 regulatory subunits 14C and 7, respectively. Our results indicate that dysregulation of PPP1R14C and PPP1R7, along with CX genes, may further advance AD progression by aggravating the hyperphosphorylation of microtubule-associated proteins.

Protein glycosylation

Protein glycosylation is a ubiquitous posttranslational modification involving the site-specific attachment of glycans, and it regulates protein folding and function. During protein transport from the endoplasmic reticulum to the Golgi apparatus, the sequential attachment of oligosaccharides generates a wide variety of mature complex N- or O-glycans. N-glycosylation denotes attachment of the glycan to the amide nitrogen of an asparagine residue of the protein, whereas O-glycosylation denotes attachment to the oxygen atom of a serine or threonine residue. Abnormal N- and O-glycosylation has been reported in AD^42,43. Among the AD-predictive genes, FUT8 and GCNT4 mediate glycosylation in the Golgi apparatus. FUT8 catalyzes the addition of fucose to the GlcNAc residue, while GCNT4 is a glycosyltransferase mediating O-glycan branching⁴⁴. Thus, dysregulation of FUT8 and GCNT4 may aggravate AD progression through abnormal glycosylation under conditions of CX deficiency.

Other mitochondria machinery

Notably, among the AD-predictive genes, there are two mitochondrial genes besides the CX genes: ornithine aminotransferase (OAT) and DnaJ homolog subfamily C member 30 (DNAJC30/WBSCR18). OAT converts ornithine into pyrroline-5-carboxylate (P5C), which can serve as a precursor of proline and glutamate. Furthermore, since ornithine is an intermediate of the urea cycle, OAT dysregulation may lead to abnormalities in both the energy production machinery and the supply of neurotransmitters. Recently, the OAT substrate ornithine has been proposed as an early diagnostic biomarker of AD⁴⁵, and altered expression of urea cycle enzymes has been identified in sporadic AD brains⁴⁶. Our finding that simultaneous downregulation of OAT and CX I predicts AD indicates that deficiencies of the urea cycle and CX may cooperate to advance AD.

Meanwhile, DNAJC30 has recently been identified as an auxiliary component of the ATP synthase machinery in mitochondria⁴⁷. Removal of Dnajc30 in mice resulted in hypofunctional mitochondria, decreased integrity of CXs, and abnormal neocortical pyramidal neurons⁴⁷. Our finding that simultaneous downregulation of DNAJC30 and CX I predicts AD also supports the mitochondrial deficiency hypothesis of AD.

Slightly dysregulated CX genes predict AD

From an overall view of the CX-related decision trees (Fig. 5), a combination of down-regulated CX components and one or several of the partner genes mentioned above predicts AD. Notably, the margin is not the conventional twofold, 1.5-fold, or even 1.2-fold change. The margin is very subtle, which is why the conventional FC method cannot identify these genes. With the criteria of p < 0.05 and an FC cutoff of 1.2, 1.5, or 2, the numbers of DEGs in the model dataset GSE33000 are 418, 10, and 0, respectively, as shown in Supplementary File 10. We compared the 418 DEGs with the 314 ML-derived genes in Supplementary File 5 and found 60 intersecting genes (19.1%), which is comparable to the results of conventional method 1 (21.3%) and conventional method 2 (25.5%). Furthermore, complicated rules are difficult to identify by conventional methods. Therefore, we suggest adopting machine-learning algorithms, especially decision trees, rule induction, and random forest, as complementary methods in transcriptome studies.
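Counting DEGs at several fold-change thresholds, as in the 418/10/0 comparison, amounts to a simple filter. The sketch assumes a symmetric cutoff (FC ≥ c or FC ≤ 1/c) and uses toy values, so the counts are illustrative only. The overlap arithmetic is the same as before: 60 of 314 genes is about 19.1%.

```python
def count_degs(fc, pvals, cutoffs=(1.2, 1.5, 2.0), alpha=0.05):
    """Number of genes with p < alpha and |fold change| beyond each cutoff.

    A gene passes cutoff c if FC >= c (up) or FC <= 1/c (down); the symmetric
    treatment of down-regulation is an assumption of this sketch.
    """
    counts = {}
    for c in cutoffs:
        counts[c] = sum(
            1 for g in fc
            if pvals[g] < alpha and (fc[g] >= c or fc[g] <= 1 / c)
        )
    return counts


# Toy values (hypothetical genes); GENE_C is large but not significant.
fc = {"GENE_A": 1.25, "GENE_B": 0.6, "GENE_C": 2.5, "GENE_D": 1.1, "GENE_E": 0.45}
pvals = {"GENE_A": 0.01, "GENE_B": 0.01, "GENE_C": 0.2, "GENE_D": 0.01, "GENE_E": 0.01}
counts = count_degs(fc, pvals)
print(counts)  # {1.2: 3, 1.5: 2, 2.0: 1}
```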

Limitations

There are several limitations to the interpretation of the results. (1) The samples are primarily of Caucasian ancestry, which may limit the applicability of the results to other ancestries. (2) The samples are from postmortem tissue of a specific brain region; because of expression heterogeneity across regions, this may limit the applicability of the results to other brain regions. (3) For the same reason, the results can hardly be applied for patient diagnosis. (4) For future applications of the study pipeline, at least several hundred samples might be required due to the nature of ML. (5) ML models predict patient disease labels, not the involvement of genes in disease, and additional genetic evidence is required to delineate any causal or reactive roles of these gene features in AD. (6) The performance difference in the independent dataset could be attributed to the detectable genes of different chip systems and to within-dataset variation. The absence of 36.9% of the attributes (genes) from the test set largely limited the performance of some models. Moreover, the limitation may also arise from differences in sampling quality, which are reflected by the within-dataset variation (the average STD/INT was 8.2% and 21% for the modeling set and test set, respectively).

RapidMiner models have also been used to identify the transcriptomic bio-signature of an infectious disease condition in the bovine mammary gland⁴⁸, with performance ranging from 53 to 87%, which is comparable to the performance in this study. The main difference in strategy lies in whether attributes are pre-screened (so-called feature selection) before applying ML⁴⁹. The benefits of feature selection include simpler models, shorter training times, and avoidance of high-dimensionality problems; however, a feature selection step that uses the entire dataset may strongly bias all downstream predictions, even when cross-validation is used⁵⁰. Therefore, in this study, we skipped the feature selection step to achieve an unbiased understanding of AD.
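The bias from feature selection on the full dataset can be demonstrated with scikit-learn on pure noise, a setting in which no honest classifier should beat chance. This sketch is illustrative and does not reproduce the study's pipeline. The data sizes and `k=10` are arbitrary choices. Selecting genes before cross-validation leaks label information into every test fold. Refitting the selection inside each training fold through a pipeline avoids that leak.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))  # pure noise: no "gene" carries real signal
y = np.array([0, 1] * 30)        # balanced labels unrelated to X

# Biased: pick the 10 "best" genes using ALL samples, then cross-validate.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_selected, y, cv=5).mean()

# Unbiased: selection is refit inside every training fold via a pipeline.
pipeline = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # typically far above 0.5
print(f"honest CV accuracy: {honest:.2f}")  # typically near chance (0.5)
```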

Since decision trees were the final models used to identify potential novel genes in this study, whether the data size is large enough is crucial. Following Vabalas et al.⁵¹, we conducted a series of train/test splits, with the same parameters used in this study, to test whether arbitrary partial subsets of the data could generate decision trees that predict the “unseen” counterpart. As shown in Supplementary File 8, the recall rates saturated at n = 94, i.e., 20% of the total samples, which may imply that the sample size was sufficient for this study.
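A sample-size check of this kind can be sketched with scikit-learn. The sketch trains a decision tree on n samples and measures recall on the remaining samples as n grows. It uses one stratified split per size and a synthetic cohort of 467 samples. Both are assumptions; the actual series of splits is described in Supplementary File 8. A plateau in recall as n increases is the signal of interest.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def recall_by_training_size(X, y, train_sizes, seed=0):
    """Train a decision tree on n samples and report recall on the rest."""
    results = {}
    for n in train_sizes:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=n, stratify=y, random_state=seed)
        model = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
        results[n] = recall_score(y_te, model.predict(X_te))  # AD = 1 is positive
    return results


# Synthetic cohort of 467 samples (the GSE33000 size) with 30 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(467, 30))
y = (X[:, 0] + X[:, 1] + rng.normal(size=467) > -0.6).astype(int)
recalls = recall_by_training_size(X, y, train_sizes=[47, 94, 187, 280])
for n, r in recalls.items():
    print(f"n_train={n:3d}  recall={r:.2f}")
```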

Although we did not combine datasets in this study, appropriate methods for reducing batch effects and differences between experiments⁵² should be applied when combining datasets in future studies. We also noticed that random forest analysis dominated the identified gene features, indicating that future similar studies might focus on random forest first. However, the other models may supply an additional 10% of genetic cues if the investigator requires them.

Hypotheses developed from ML models

Discovering and characterizing the underlying pathophysiological pathways of AD are the main objectives of genetic research, including this ML study. Based on our findings, we postulate that two novel players, RNF157 and KIAA1715, may independently participate in AD pathophysiology by mediating the mitogen-activated protein kinase (MAPK) signaling pathway. MAPKs are serine/threonine protein kinases that regulate cellular processes in response to environmental stimuli and participate in hallmark events of AD, including tau phosphorylation, Aβ deposition, and chronic inflammation^53,54.

In RF model #2 (Supplementary File 2), RNF157, EPHA2, and hCG_1776018 (also known as PIRT, an uncharacterized phosphoinositide-interacting protein) co-predict AD. EPHA2 is a membrane receptor tyrosine kinase that regulates migration, adhesion, and the blood–brain barrier through MAPK signaling⁵⁵. RNF157 is an E3 ubiquitin ligase that acts as a downstream effector of PI3K/MAPK signaling⁵⁶ and regulates neuronal survival by ubiquitinating APBB1⁵⁷. At present, nothing is known about the roles of these three genes in AD. We hypothesize that RNF157 may act downstream of EPHA2 and hCG_1776018 and regulate neuronal death upon cellular stress in the AD microenvironment. We also postulate that an RNF157 agonist may act as a symptomatic treatment in AD.

In RF model #230 (Supplementary File 2), KIAA1715 and MAP3K9 co-predict AD. MAP3K9 is a serine/threonine kinase that is activated by environmental stress and acts as an upstream activator of the MKK/JNK signal transduction cascade regulating apoptosis⁵⁸. MAP3K9 dysregulation has been proposed as a possible marker of AD⁵⁹. KIAA1715 (also known as LNPK) is an endoplasmic reticulum (ER) membrane protein that stabilizes ER curvature and the ER tubular junction network^60,61. Mutations in KIAA1715 cause neurodevelopmental syndromes, such as intellectual disability and epilepsy⁶¹. Notably, disruption of ER-mitochondria contacts has recently been found in AD postmortem brains⁶², while restoring ER-mitochondria contacts rescues an AD animal model⁶³. However, nothing is known about the role of KIAA1715 in AD. We hypothesize that, in the pro-inflammatory microenvironment of AD, KIAA1715 deficiency may destabilize ER structure, disrupting ER-mitochondria contacts and eventually aggravating AD progression.

Conclusion

Our study, applying machine learning techniques to the gene expression profiles of postmortem prefrontal cortex brain tissue from AD patients and controls, highlighted the oxidative phosphorylation genes in the AD pathway. These genes were identified exclusively by ML and not by the conventional counterpart. Our results imply that ML should be considered complementary to conventional FC methods in transcriptome studies. More importantly, we show that hypotheses about the underlying pathophysiological pathways of AD can be developed by examining ML models further.

Data availability

All data in this study are included in the supplementary data. The raw data used for machine learning and traditional expression analysis were uploaded in CSV format as Supplementary File 1 and are also available from https://github.com/JackCheng-TW/RawData. The independent dataset was uploaded as Supplementary File 9.

Code availability

The machine learning platform RapidMiner Studio is available at https://rapidminer.com/. The process files in RapidMiner format (.rmp) of this study and the generated models were uploaded to GitHub at https://github.com/JackCheng-TW/RapidMiner-files.

Abbreviations

AD: Alzheimer's disease
ML: Machine learning
OXPHOS: Oxidative phosphorylation
CX: OXPHOS protein complex
FC: Fold-change

References

Cummings, J., Lee, G., Ritter, A., Sabbagh, M. & Zhong, K. Alzheimer’s disease drug development pipeline: 2019. Alzheimer’s & Dement. 5, 272–293 (2019).
Gong, C.-X., Liu, F. & Iqbal, K. Multifactorial hypothesis and multi-targets for Alzheimer’s disease. J. Alzheimers Dis. 64, S107–S117 (2018).
Hölscher, C. Moving towards a more realistic concept of what constitutes Alzheimer’s disease. EBioMedicine 39, 17–18 (2019).
Tanveer, M. et al. Machine learning techniques for the diagnosis of Alzheimer’s disease: A review. ACM Trans. Multimed. Comput. Commun. Appl. 16, 1–35 (2019).
Perakslis, E., Riordan, H., Friedhoff, L., Nabulsi, A. & Pich, E. M. A call for a global ‘bigger’ data approach to Alzheimer disease. Nat. Rev. Drug Discov. 18, 319 (2019).
Cheng, J. et al. Matrix metalloproteinase 14 modulates diabetes and Alzheimer’s disease cross-talk: A meta-analysis. Neurol. Sci. 39, 267–274 (2018).
Cheng, J. et al. Dystonin/BPAG1 modulates diabetes and Alzheimer’s disease cross-talk: A meta-analysis. Neurol. Sci. 40, 1577–1582 (2019).
Narayanan, M. et al. Common dysregulation network in the human prefrontal cortex underlies two neurodegenerative diseases. Mol. Syst. Biol. 10, 743 (2014).
Wang, M. et al. Integrative network analysis of nineteen brain regions identifies molecular signatures and networks underlying selective regional vulnerability to Alzheimer’s disease. Genome Med. 8, 1–21 (2016).
Szklarczyk, D. et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).
Wood, J. D. et al. Atrophin-1, the dentato-rubral and pallido-luysian atrophy gene product, interacts with ETO/MTG8 in the nuclear matrix and represses transcription. J. Cell Biol. 150, 939–948 (2000).
Twine, N. A., Janitz, K., Wilkins, M. R. & Janitz, M. Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer’s disease. PLoS ONE 6, e16266 (2011).
Coulter, P. M., Bautista, E. A., Margulies, J. E. & Watson, J. B. Identification of cortexin: A novel, neuron-specific, 82-residue membrane protein enriched in rodent cerebral cortex. J. Neurochem. 61, 756–759 (1993).
Wang, J. et al. Chromosome 19p in Alzheimer’s disease: When genome meets transcriptome. J. Alzheimers Dis. 38, 245–250 (2014).
Lee, H. W. et al. Preso, a novel PSD-95-interacting FERM and PDZ domain protein that regulates dendritic spine morphogenesis. J. Neurosci. 28, 14546–14556 (2008).
Hokama, M. et al. Altered expression of diabetes-related genes in Alzheimer’s disease brains: The Hisayama study. Cereb. Cortex 24, 2476–2488 (2014).
Zorova, L. D. et al. Mitochondrial membrane potential. Anal. Biochem. 552, 50–59 (2018).
Lin, M. T. & Beal, M. F. Mitochondrial dysfunction and oxidative stress in neurodegenerative diseases. Nature 443, 787–795 (2006).
Zhu, X., Perry, G., Smith, M. A. & Wang, X. Abnormal mitochondrial dynamics in the pathogenesis of Alzheimer’s disease. J. Alzheimers Dis. 33, S253–S262 (2013).
Kish, S. J. et al. Brain cytochrome oxidase in Alzheimer’s disease. J. Neurochem. 59, 776–779 (1992).
Mutisya, E. M., Bowling, A. C. & Beal, M. F. Cortical cytochrome oxidase activity is reduced in Alzheimer’s disease. J. Neurochem. 63, 2179–2184 (1994).
Shoffner, J. M. Oxidative phosphorylation defects and Alzheimer’s disease. Neurogenetics 1, 13–19 (1997).
Rhein, V. et al. Amyloid-β and tau synergistically impair the oxidative phosphorylation system in triple transgenic Alzheimer’s disease mice. Proc. Natl. Acad. Sci. USA 106, 20057–20062 (2009).
Biffi, A. et al. Genetic variation of oxidative phosphorylation genes in stroke and Alzheimer’s disease. Neurobiol. Aging 35, 1956.e1–1956.e8 (2014).
Mastroeni, D. et al. Nuclear but not mitochondrial-encoded oxidative phosphorylation genes are altered in aging, mild cognitive impairment, and Alzheimer’s disease. Alzheimers Dement. 13, 510–519 (2017).
Manczak, M., Park, B. S., Jung, Y. & Reddy, P. H. Differential expression of oxidative phosphorylation genes in patients with Alzheimer’s disease. NeuroMol. Med. 5, 147–162 (2004).
Smallwood, P. M. et al. Fibroblast growth factor (FGF) homologous factors: New members of the FGF family implicated in nervous system development. Proc. Natl. Acad. Sci. USA 93, 9850–9857 (1996).
Wang, C.-Y. et al. A novel family of adhesion-like molecules that interacts with the NMDA receptor. J. Neurosci. 26, 2174–2183 (2006).
Blaschke, R. J. et al. SHOT, a SHOX-related homeobox gene, is implicated in craniofacial, brain, heart, and limb development. Proc. Natl. Acad. Sci. USA 95, 2406–2411 (1998).
Guo, L., Tian, J. & Du, H. Mitochondrial dysfunction and synaptic transmission failure in Alzheimer’s disease. J. Alzheimers Dis. 57, 1071–1086 (2017).
Parker, W. D., Parks, J., Filley, C. M. & Kleinschmidt-DeMasters, B. Electron transport chain defects in Alzheimer’s disease brain. Neurology 44, 1090–1090 (1994).
Du, H. et al. Cyclophilin D deficiency attenuates mitochondrial and neuronal perturbation and ameliorates learning and memory in Alzheimer’s disease. Nat. Med. 14, 1097–1105 (2008).
Van Eldik, L. J. et al. The roles of inflammation and immune mechanisms in Alzheimer’s disease. Alzheimer’s & Dement. 2, 99–109 (2016).
Mezzadra, R. et al. Identification of CMTM6 and CMTM4 as PD-L1 protein regulators. Nature 549, 106–110 (2017).
Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988).
Woo, J. S. et al. Junctophilin-4, a component of the endoplasmic reticulum–plasma membrane junctions, regulates Ca2+ dynamics in T cells. Proc. Natl. Acad. Sci. USA 113, 2762–2767 (2016).
Kedzierski, L. et al. Suppressor of cytokine signaling 4 (SOCS4) protects against severe cytokine storm and enhances viral clearance during influenza infection. PLoS Pathog. 10, e1004134 (2014).
Heinen, S. et al. Factor H–related protein 1 (CFHR-1) inhibits complement C5 convertase activity and terminal complex formation. Blood 114, 2439–2447 (2009).
Gong, C. X., Singh, T. J., Grundke-Iqbal, I. & Iqbal, K. Phosphoprotein phosphatase activities in Alzheimer disease brain. J. Neurochem. 61, 921–927 (1993).
Kanninen, K., Goldsteins, G., Auriola, S., Alafuzoff, I. & Koistinaho, J. Glycosylation changes in Alzheimer’s disease as revealed by a proteomic approach. Neurosci. Lett. 367, 235–240 (2004).
Zhu, Y., Shan, X., Yuzwa, S. A. & Vocadlo, D. J. The emerging link between O-GlcNAc and Alzheimer disease. J. Biol. Chem. 289, 34472–34481 (2014).
Schwientek, T. et al. Control of O-glycan branch formation. Molecular cloning and characterization of a novel thymus-associated core 2 β1,6-N-acetylglucosaminyltransferase. J. Biol. Chem. 275, 11106–11113 (2000).
Liang, Q. et al. Metabolomics-based screening of salivary biomarkers for early diagnosis of Alzheimer’s disease. RSC Adv. 5, 96074–96079 (2015).
Jęśko, H. et al. Altered expression of urea cycle enzymes in amyloid-β protein precursor overexpressing PC12 cells and in sporadic Alzheimer’s disease brain. J. Alzheimers Dis. 62, 279–291 (2018).
Tebbenkamp, A. T. et al. The 7q11.23 protein DNAJC30 interacts with ATP synthase and links mitochondria to brain development. Cell 175, 1088–1104 (2018).
Sharifi, S. et al. Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle. PLoS ONE 13, e0191227 (2018).
Cheng, J., Liu, H.-P., Lin, W.-Y. & Tsai, F.-J. Identification of contributing genes of Huntington’s disease by machine learning. BMC Med. Genom. 13, 1–11 (2020).
Ambroise, C. & McLachlan, G. J. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. USA 99, 6562–6566 (2002).
Vabalas, A., Gowen, E., Poliakoff, E. & Casson, A. J. Machine learning algorithm validation with a limited sample size. PLoS ONE 14, e0224365. https://doi.org/10.1371/journal.pone.0224365 (2019).
Mohammadi-Dehcheshmeh, M. et al. Unified transcriptomic signature of arbuscular mycorrhiza colonization in roots of Medicago truncatula by integration of machine learning, promoter analysis, and direct merging meta-analysis. Front. Plant Sci. 9, 1550 (2018).
Zhu, X., Lee, H.-G., Raina, A. K., Perry, G. & Smith, M. A. The role of mitogen-activated protein kinase pathways in Alzheimer’s disease. Neurosignals 11, 270–281 (2002).
Lee, J. K. & Kim, N.-J. Recent advances in the inhibition of p38 MAPK as a potential strategy for the treatment of Alzheimer’s disease. Molecules 22, 1287 (2017).
Darling, T. K. et al. EphA2 contributes to disruption of the blood-brain barrier in cerebral malaria. PLoS Pathog. 16, e1008261 (2020).
Dogan, T. et al. Role of the E3 ubiquitin ligase RNF157 as a novel downstream effector linking PI3K and MAPK signaling pathways to the cell cycle. J. Biol. Chem. 292, 14311–14324 (2017).
Matz, A. et al. Regulation of neuronal survival and morphology by the E3 ubiquitin ligase RNF157. Cell Death Differ. 22, 626–642 (2015).
Durkin, J. T. et al. Phosphoregulation of mixed-lineage kinase 1 activity by multiple phosphorylation in the activation loop. Biochemistry 43, 16348–16355 (2004).
Zhang, L. et al. Potential hippocampal genes and pathways involved in Alzheimer’s disease: A bioinformatic analysis. Genet. Mol. Res. 14, 7218–7232 (2015).
Shemesh, T. et al. A model for the generation and interconversion of ER morphologies. Proc. Natl. Acad. Sci. USA 111, E5243–E5251 (2014).
Breuss, M. W. et al. Mutations in LNPK, encoding the endoplasmic reticulum junction stabilizer lunapark, cause a recessive neurodevelopmental syndrome. Am. J. Hum. Genet. 103, 296–304 (2018).
Lau, D. H. et al. Disruption of endoplasmic reticulum-mitochondria tethering proteins in post-mortem Alzheimer’s disease brain. Neurobiol. Dis. 143, 105020 (2020).
Garrido-Maraver, J., Loh, S. H. & Martins, L. M. Forcing contacts between mitochondria and the endoplasmic reticulum extends lifespan in a Drosophila model of Alzheimer’s disease. Biol. Open 9, bio47530 (2020).
Lanke, V. Integrative Analysis of Gene Expression Profiles in Aging and Alzheimer’s Disease (International Institute of Information Technology, 2019).
Ramanan, V. K. et al. Genome-wide pathway analysis of memory impairment in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort implicates gene candidates, canonical pathways, and networks. Brain Imaging Behav. 6, 634–648 (2012).
Canchi, S. et al. Integrating gene and protein expression reveals perturbed functional networks in Alzheimer’s disease. Cell Rep. 28, 1103–1116 (2019).
Antonell, A. et al. A preliminary study of the whole-genome expression profile of sporadic and monogenic early-onset Alzheimer’s disease. Neurobiol. Aging 34, 1772–1778 (2013).
Shi, L. et al. A decade of blood biomarkers for Alzheimer’s disease research: An evolving field, improving study designs, and the challenge of replication. J. Alzheimers Dis. 62, 1181–1198 (2018).
Mamoor, S. The Middle Temporal Gyrus is Transcriptionally Altered in Patients with Alzheimer’s Disease (OSF, 2020).
Pang, X. et al. The bioinformatic analysis of the dysregulated genes and microRNAs in entorhinal cortex, hippocampus, and blood for Alzheimer’s disease. BioMed Res. Int. https://doi.org/10.1155/2017/9084507 (2017).
Furney, S. et al. Genome-wide association with MRI atrophy measures as a quantitative trait locus for Alzheimer’s disease. Mol. Psychiatry 16, 1130–1138 (2011).
Tseveleki, V. et al. Comparative gene expression analysis in mouse models for multiple sclerosis, Alzheimer’s disease and stroke for identifying commonly regulated and disease-specific gene changes. Genomics 96, 82–91 (2010).
Neuner, S. M. et al. Systems genetics identifies modifiers of Alzheimer’s disease risk and resilience. BioRxiv 2017, 225714 (2017).
Shi, Y. et al. Transcriptomic analyses for identification and prioritization of genes associated with Alzheimer’s disease in humans. Front. Bioeng. Biotechnol. 8, 31 (2020).
Lanke, V., Moolamalla, S., Roy, D. & Vinod, P. Integrative analysis of hippocampus gene expression profiles identifies network alterations in aging and Alzheimer’s disease. Front. Aging Neurosci. 10, 153 (2018).
Szymanski, M., Wang, R., Fallin, M. D., Bassett, S. S. & Avramopoulos, D. Neuroglobin and Alzheimer’s dementia: Genetic association and gene expression changes. Neurobiol. Aging 31, 1835–1842 (2010).
Swaminathan, S. et al. Analysis of copy number variation in Alzheimer’s disease in a cohort of clinically characterized and neuropathologically verified individuals. PLoS ONE 7, e50640 (2012).
Ojala, J. et al. Expression of interleukin-18 is increased in the brains of Alzheimer’s disease patients. Neurobiol. Aging 30, 198–209 (2009).
Sherva, R. et al. Genome-wide association study of the rate of cognitive decline in Alzheimer’s disease. Alzheimers Dement. 10, 45–52 (2014).
Kong, W. et al. The construction of common and specific significance subnetworks of Alzheimer’s Disease from multiple brain regions. BioMed Res. Int. https://doi.org/10.1155/2015/394260 (2015).
Rahman, M. R. et al. Identification of common molecular biomarker signatures in blood and brain of Alzheimer’s disease. BioRxiv 2019, 482828 (2019).
Pradeep, C., Prerna, D. & Lukiw, W. An exploratory analysis of conservation of co-expressed genes across Alzheimer’s disease progression. J. Comput. Sci. Syst. Biol. 6, 215–227 (2013).
Floudas, C. S., Um, N., Kamboh, M. I., Barmada, M. M. & Visweswaran, S. Identifying genetic interactions associated with late-onset Alzheimer’s disease. BioData Min. 7, 35 (2014).
Baye, T. M. et al. Candidate gene discovery procedure after follow-up confirmatory analyses of candidate regions of interests for Alzheimer’s disease in the NIMH sibling dataset. Dis. Markers 24, 293–309 (2008).
Seyfried, N. T. et al. A multi-network approach identifies protein-specific co-expression in asymptomatic and symptomatic Alzheimer’s disease. Cell Syst. 4, 60–72 (2017).
Muraoka, S. et al. Proteomic profiling of extracellular vesicles derived from cerebrospinal fluid of Alzheimer’s disease patients: A pilot study. Cells 9, 1959 (2020).
Walker, D., Whetzel, A. & Lue, L.-F. Expression of suppressor of cytokine signaling genes in human elderly and Alzheimer’s disease brains and human microglia. Neuroscience 302, 121–137 (2015).
Lee, Y. H. & Song, G. G. Genome-wide pathway analysis of a genome-wide association study on Alzheimer’s disease. Neurol. Sci. 36, 53–59 (2015).
Puthiyedth, N., Riveros, C., Berretta, R. & Moscato, P. Identification of differentially expressed genes through integrated study of Alzheimer’s disease affected brain regions. PLoS ONE 11, e0152342 (2016).

Funding

This work was supported by grants from the Ministry of Science and Technology in Taiwan (MOST107-2314-B-039-042-MY2, MOST106-2314-B-039-009-, MOST108-2320-B-039-031-MY3, MOST 109-2314-B-039-030) and grants from China Medical University & Hospital (CMU109-MF-85, CMU108-MF-68, CMU108-MF-61, CMU107-S-08, DMR-109-150, DMR-106-119).

Author information

These authors contributed equally: Jack Cheng and Hsin-Ping Liu.

Authors and Affiliations

Graduate Institute of Integrated Medicine, College of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan
Jack Cheng & Wei-Yong Lin
Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan
Jack Cheng, Wei-Yong Lin & Fuu-Jen Tsai
Graduate Institute of Acupuncture Science, College of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan
Hsin-Ping Liu
Brain Diseases Research Center, China Medical University, Taichung, 40402, Taiwan
Wei-Yong Lin
School of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan
Fuu-Jen Tsai
Department of Medical Laboratory and Biotechnology, Asia University, Taichung, 41354, Taiwan
Fuu-Jen Tsai
Division of Pediatric Genetics, Children’s Hospital of China Medical University, Taichung, 40447, Taiwan
Fuu-Jen Tsai

Contributions

W.Y.L. and F.J.T. initiated and supervised this study. J.C. and H.P.L. contributed to the acquisition, analysis, and interpretation of data. All authors discussed and drafted the manuscript.

Corresponding authors

Correspondence to Wei-Yong Lin or Fuu-Jen Tsai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Supplementary Information 5.

Supplementary Information 6.

Supplementary Information 7.

Supplementary Information 8.

Supplementary Information 9.

Supplementary Information 10.

Supplementary Information 11.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cheng, J., Liu, HP., Lin, WY. et al. Machine learning compensates fold-change method and highlights oxidative phosphorylation in the brain transcriptome of Alzheimer’s disease. Sci Rep 11, 13704 (2021). https://doi.org/10.1038/s41598-021-93085-z

Received: 19 October 2020
Accepted: 18 June 2021
Published: 01 July 2021
DOI: https://doi.org/10.1038/s41598-021-93085-z
