2022 | OriginalPaper | Book chapter

Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data

Written by: Ioannis L. Dallas, Aristidis G. Vrahatis, Sotiris K. Tasoulis, Vassilis P. Plagianakos

Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics

Publisher: Springer International Publishing

Abstract

We are going through the last years of the COVID-19 pandemic, during which almost the entire research community has focused on the challenges that constantly arise. From a computational and mathematical perspective, several experimental studies must deal with datasets of ultra-high volume and ultra-high dimensionality. An indicative example is DNA sequencing technologies, which offer a more realistic picture of human diseases at the molecular level but produce data with high complexity and ultra-high dimensionality. Dimensionality reduction techniques are the first choice for addressing this complexity: they reveal the hidden structure of the data in the original multidimensional space and can improve the efficiency of machine learning tasks such as classification and clustering. In this direction, we study the behavior of seven well-known and cutting-edge dimensionality reduction techniques tailored for RNA-sequencing data. Alongside this study, we propose an extension of the Random projection and Geodesic distance t-Stochastic Neighbor Embedding (RGt-SNE) algorithm, a recent improvement of t-Stochastic Neighbor Embedding (t-SNE), in which we suggest a new distance criterion for constructing the kernel matrix. Our results show the potential of the proposed algorithm and, at the same time, highlight the complexity of the COVID-19 data, which are not separable, creating a significant challenge that the machine learning field will have to face.
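To make the two ingredients named in RGt-SNE more concrete, the following is a minimal sketch of a random projection followed by geodesic (shortest-path) distances on a nearest-neighbour graph. It is not the authors' implementation: the function names, the toy data, the target dimension, the neighbourhood size `k`, and the use of plain Euclidean distance are all assumptions made for illustration, and the new distance criterion proposed in the chapter is not described in the abstract, so it is not reproduced here.

```python
import heapq
import math
import random


def random_projection(rows, target_dim, seed=0):
    """Project each row onto target_dim random Gaussian directions."""
    rng = random.Random(seed)
    source_dim = len(rows[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One scaled Gaussian direction per output dimension.
    directions = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
                  for _ in range(target_dim)]
    return [[sum(d * x for d, x in zip(direction, row))
             for direction in directions]
            for row in rows]


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(points)
    graph = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph):
    """All-pairs shortest-path lengths, running Dijkstra from every node."""
    n = len(graph)
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        result.append(dist)
    return result


# Hypothetical toy stand-in for an expression matrix: 30 samples x 200 features.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(30)]
projected = random_projection(data, target_dim=20)
geodesic = geodesic_distances(knn_graph(projected, k=5))
```

The resulting `geodesic` matrix could then be handed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"`. For data of realistic size, a vectorised library would be a more sensible choice than these pure-Python loops.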

Metadata
Title
Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data
Authors
Ioannis L. Dallas
Aristidis G. Vrahatis
Sotiris K. Tasoulis
Vassilis P. Plagianakos
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-031-20837-9_18