An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data

Wang, Lifei; Nie, Rui; Yu, Zeyang; Xin, Ruyue; Zheng, Caihong; Zhang, Zhang; Zhang, Jiang; Cai, Jun

doi:10.1038/s42256-020-00244-4

Article
Published: 02 November 2020


Lifei Wang^1,2,3, Rui Nie^1,2,3, Zeyang Yu^1,2,3, Ruyue Xin^4, Caihong Zheng^1,2,3, Zhang Zhang^4, Jiang Zhang^4 (ORCID: orcid.org/0000-0001-7402-7482) & Jun Cai^1,2,3 (ORCID: orcid.org/0000-0003-2733-9373)

Nature Machine Intelligence volume 2, pages 693–703 (2020)


Abstract

Single-cell RNA sequencing (scRNA-seq) technologies are used to characterize the heterogeneity of cells in cell types, developmental stages and spatial positions. The rapid accumulation of scRNA-seq data has enabled single-cell-type labelling to transform single-cell transcriptome analysis. Here we propose an interpretable deep-learning architecture using capsule networks (called scCapsNet). A capsule (a vector of neurons representing a set of properties of a specific object) captures hierarchical relations. By utilizing competitive single-cell-type recognition, the scCapsNet model is able to perform feature selection to identify groups of genes encoding different subcellular types. The RNA expression signatures, which enable subcellular-type recognition, are effectively integrated into the parameter matrices of scCapsNet. This characteristic enables the discovery of gene regulatory modules in which genes interact with each other and are closely related in function, but present distinct expression patterns.
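The abstract describes cell types being recognized competitively by capsules. As an illustration only, the NumPy sketch below follows the routing-by-agreement algorithm of Sabour, Frosst & Hinton (2017, listed in the References); it is not the scCapsNet code (see Code availability). The gene-to-primary-capsule layout (each primary capsule a fully connected projection of all genes), the layer sizes, the three routing iterations and the optional rejection threshold are assumptions, and all function names are hypothetical.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity (Sabour et al., 2017): keeps the direction, maps the length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def primary_capsules(x, W_primary):
    """x: (n_genes,) expression of one cell; W_primary: (n_primary, n_genes, d_p).
    Hypothetical layout: each primary capsule is a fully connected projection of all gene inputs."""
    return squash(np.einsum("g,kgd->kd", x, W_primary))      # (n_primary, d_p)

def dynamic_routing(u_hat, n_iter=3):
    """u_hat[i, j, :] is primary capsule i's prediction for type capsule j.
    Returns type-capsule outputs v (n_types, d_t) and coupling coefficients c (n_primary, n_types)."""
    n_primary, n_types, _ = u_hat.shape
    b = np.zeros((n_primary, n_types))                      # routing logits
    for _ in range(n_iter):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)                # softmax over type capsules
        v = squash(np.einsum("ij,ijd->jd", c, u_hat))       # coupling-weighted vote per type capsule
        b = b + np.einsum("ijd,jd->ij", u_hat, v)           # raise logits where prediction agrees with output
    return v, c

def classify(x, W_primary, W_pred, threshold=None):
    """Competitive recognition: the longest type capsule wins.
    W_pred: (n_primary, n_types, d_t, d_p). If threshold is given (a hypothetical rejection rule),
    a cell whose longest capsule is shorter than it is returned as -1 (unassigned)."""
    u = primary_capsules(x, W_primary)
    u_hat = np.einsum("ijtp,ip->ijt", W_pred, u)            # (n_primary, n_types, d_t)
    v, c = dynamic_routing(u_hat)
    lengths = np.linalg.norm(v, axis=1)
    label = int(np.argmax(lengths))
    if threshold is not None and lengths[label] < threshold:
        label = -1
    return label, lengths, c

# Usage with random, untrained weights and hypothetical sizes.
rng = np.random.default_rng(0)
n_genes, n_primary, d_p, n_types, d_t = 50, 16, 8, 8, 10
W_primary = rng.normal(scale=0.1, size=(n_primary, n_genes, d_p))
W_pred = rng.normal(scale=0.1, size=(n_primary, n_types, d_t, d_p))
x = rng.random(n_genes)
label, lengths, c = classify(x, W_primary, W_pred)
```

Because `squash` keeps every output length below 1, the capsule lengths can be compared directly across types; the per-cell coupling matrix `c` is the kind of quantity whose averages are shown in Extended Data Fig. 1d,e.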


**Fig. 1: The architecture of scCapsNet and its cell-type-recognition characteristics.**

**Fig. 2: The identification of the core gene set responsible for recognition of each cell type.**

**Fig. 3: The core genes that are essential for the biological functions of different subcellular types.**

**Fig. 4: An embedding representation of each gene integrating its RNA expression signature and its cell-type-labelling attribute in scCapsNet.**

**Fig. 5: Some characteristics of the core genes recognized by scCapsNet.**


Data availability

The pre-processed single-cell transcriptome data of mouse retinal bipolar cells (mRBCs)²⁸ and human peripheral blood mononuclear cells (hPBMCs)²⁷ can be downloaded and extracted from GitHub (RetinaDataset and PurifiedPBMCDataset, https://github.com/YosefLab/scVI)²⁰. Other pre-processed single-cell transcriptome data for the cross-dataset, unseen-population and negative control experiments can be downloaded from Zenodo (https://zenodo.org/record/3357167#.X0kHlPZuJZU)¹¹. All the data used in this Article are summarized in Supplementary Table 3.

Code availability

The implementation of scCapsNet is available at https://github.com/wanglf19/scCaps or https://zenodo.org/record/4007185#.X0oHPPZuJZU.

References

1. Zheng, C. et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169, 1342–1356 (2017).
2. Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24, 978–985 (2018).
3. Setty, M. et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).
4. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
5. Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 352–356 (2017).
6. Halpern, K. B. et al. Paired-cell sequencing enables spatial gene expression mapping of liver endothelial cells. Nat. Biotechnol. 36, 962–970 (2018).
7. Han, X. et al. Mapping the mouse cell atlas by Microwell-Seq. Cell 173, 1307 (2018).
8. de Kanter, J. K., Lijnzaad, P., Candelli, T., Margaritis, T. & Holstege, F. C. P. CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing. Nucleic Acids Res. 47, e95 (2019).
9. Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).
10. Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).
11. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
12. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
13. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
14. Wagner, F. & Yanai, I. Moana: a robust and scalable cell type classification framework for single-cell RNA-Seq data. bioRxiv https://doi.org/10.1101/456129 (2018).
15. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
16. Jabeen, A., Ahmad, N. & Raza, K. Machine learning-based state-of-the-art methods for the classification of RNA-seq data. bioRxiv https://doi.org/10.1101/120592 (2017).
17. Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
18. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).
19. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
20. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053 (2018).
21. Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16, 1139–1145 (2019).
22. Chen, H. H. et al. GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization. BMC Syst. Biol. 12, 142 (2018).
23. Ding, J., Condon, A. & Shah, S. P. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 9, 2002 (2018).
24. Lin, C., Jain, S., Kim, H. & Bar-Joseph, Z. Using neural networks for reducing the dimensions of single-cell RNA-seq data. Nucleic Acids Res. 45, e156 (2017).
25. Sabour, S., Frosst, N. & Hinton, G. E. Dynamic routing between capsules. In Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) 3856–3866 (Curran Associates, 2017).
26. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
27. Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
28. Shekhar, K. et al. Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell 166, 1308–1323 (2016).
29. Ding, J. et al. Systematic comparative analysis of single cell RNA-sequencing methods. bioRxiv https://doi.org/10.1101/632216 (2019).
30. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).
31. Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).
32. Segerstolpe, A. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
33. Xin, Y. et al. RNA sequencing of single human islet cells reveals type 2 diabetes genes. Cell Metab. 24, 608–615 (2016).
34. Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).
35. Chahwan, R., Edelmann, W., Scharff, M. D. & Roa, S. AIDing antibody diversity by error-prone mismatch repair. Semin. Immunol. 24, 293–300 (2012).
36. Stone, S. F. et al. Changes in differential gene expression during a fatal stroke. J. Clin. Neurosci. 23, 130–134 (2016).
37. Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database 2017, bax028 (2017).
38. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
39. Lee, C. K. et al. Cloning thymic precursor cells: demonstration that individual pro-T1 cells have dual T-NK potential and individual pro-T2 cells have dual alphabeta-gammadelta T cell potential. Cell. Immunol. 191, 139–144 (1999).
40. Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
41. Frankenberger, M. et al. Transcript profiling of CD16-positive monocytes reveals a unique molecular fingerprint. Eur. J. Immunol. 42, 957–974 (2012).
42. Bernal-Quiros, M., Wu, Y. Y., Alarcon-Riquelme, M. E. & Castillejo-Lopez, C. BANK1 and BLK act through phospholipase C gamma 2 in B-cell signaling. PLoS One 8, e59842 (2013).
43. Lapter, S. et al. A role for the B-cell CD74/macrophage migration inhibitory factor pathway in the immunomodulation of systemic lupus erythematosus by a therapeutic tolerogenic peptide. Immunology 132, 87–95 (2011).
44. Huang, X. et al. Downregulation of the B-cell receptor signaling component CD79b in plasma cell myeloma: a possible post transcriptional regulation. Pathol. Int. 61, 122–129 (2011).
45. Stang, S. L. et al. A proapoptotic signaling pathway involving RasGRP, Erk, and Bim in B cells. Exp. Hematol. 37, 122–134 (2009).
46. Shah, R. D. et al. Expression of calgranulin genes S100A8, S100A9 and S100A12 is modulated by n-3 PUFA during inflammation in adipose tissue and mononuclear cells. PLoS One 12, e0169614 (2017).
47. Gren, S. T. et al. A single-cell gene-expression profile reveals inter-cellular heterogeneity within human monocyte subsets. PLoS One 10, e0144351 (2015).
48. Villasenor-Cardoso, M. I., Frausto-Del-Rio, D. A. & Ortega, E. Aminopeptidase N (CD13) is involved in phagocytic processes in human dendritic cells and macrophages. BioMed Res. Int. 2013, 562984 (2013).
49. Munthe-Fog, L. et al. Variation in FCN1 affects biosynthesis of ficolin-1 and is associated with outcome of systemic inflammation. Genes Immun. 13, 515–522 (2012).
50. Li, Y. et al. A possible role of HMGB1 in DNA demethylation in CD4+ T cells from patients with systemic lupus erythematosus. Clin. Dev. Immunol. 2013, 206298 (2013).
51. Chan, D. V. et al. Differential CTLA-4 expression in human CD4+ versus CD8+ T cells is associated with increased NFAT1 and inhibition of CD4+ proliferation. Genes Immun. 15, 25–32 (2014).
52. Alonso, M. A. & Weissman, S. M. cDNA cloning and sequence of MAL, a hydrophobic protein associated with human T-cell differentiation. Proc. Natl Acad. Sci. USA 84, 1997–2001 (1987).
53. Cismasiu, V. B. et al. BCL11B participates in the activation of IL2 gene expression in CD4+ T lymphocytes. Blood 108, 2695–2702 (2006).
54. Bade, B. et al. Differential expression of the granzymes A, K and M and perforin in human peripheral blood lymphocytes. Int. Immunol. 17, 1419–1428 (2005).
55. Huang, R. Y. et al. LAG3 and PD1 co-inhibitory molecules collaborate to limit CD8+ T cell signaling and dampen antitumor immunity in a murine ovarian cancer model. Oncotarget 6, 27359–27377 (2015).
56. Stoeckle, C. et al. Cathepsin W expressed exclusively in CD8+ T cells and NK cells, is secreted during target cell killing but is not essential for cytotoxicity in human CTLs. Exp. Hematol. 37, 266–275 (2009).
57. Nizzoli, G. et al. Human CD1c+ dendritic cells secrete high levels of IL-12 and potently prime cytotoxic T-cell responses. Blood 122, 932–942 (2013).
58. Heger, L. et al. CLEC10A is a specific marker for human CD1c(+) dendritic cells and enhances their toll-like receptor 7/8-induced cytokine secretion. Front. Immunol. 9, 744 (2018).
59. Karsunky, H., Merad, M., Cozzio, A., Weissman, I. L. & Manz, M. G. Flt3 ligand regulates dendritic cell development from Flt3+ lymphoid and myeloid-committed progenitors to Flt3+ dendritic cells in vivo. J. Exp. Med. 198, 305–313 (2003).
60. Ohta, M. et al. Immunomodulation of monocyte-derived dendritic cells through ligation of tumor-produced mucins to Siglec-9. Biochem. Biophys. Res. Commun. 402, 663–669 (2010).
61. Chen, Y. J. et al. Eps8 protein facilitates phagocytosis by increasing TLR4-MyD88 protein interaction in lipopolysaccharide-stimulated macrophages. J. Biol. Chem. 287, 18806–18819 (2012).
62. Kitzenberg, D., Colgan, S. P. & Glover, L. E. Creatine kinase in ischemic and inflammatory disorders. Clin. Transl. Med. 5, 31 (2016).
63. Martinez, F. O. The transcriptome of human monocyte subsets begins to emerge. J. Biol. 8, 99 (2009).
64. Zhang, C., Gadue, P., Scott, E., Atchison, M. & Poncz, M. Activation of the megakaryocyte-specific gene platelet basic protein (PBP) by the Ets family factor PU.1. J. Biol. Chem. 272, 26236–26246 (1997).
65. Seo, H. et al. A beta1-tubulin-based megakaryocyte maturation reporter system identifies novel drugs that promote platelet production. Blood Adv. 2, 2262–2272 (2018).
66. Clay, D. et al. CD9 and megakaryocyte differentiation. Blood 97, 1982–1989 (2001).
67. Hickey, M. J., Deaven, L. L. & Roth, G. J. Human platelet glycoprotein IX. Characterization of cDNA and localization of the gene to chromosome 3. FEBS Lett. 274, 189–192 (1990).
68. Kim, T. D. et al. Human microRNA-27a* targets Prf1 and GzmB expression to regulate NK-cell cytotoxicity. Blood 118, 5476–5486 (2011).
69. Kuttruff, S. et al. NKp80 defines and stimulates a reactive subset of CD8 T cells. Blood 113, 358–369 (2009).
70. Sim, M. J. et al. KIR2DL3 and KIR2DL1 show similar impact on licensing of human NK cells. Eur. J. Immunol. 46, 185–191 (2016).
71. Frohlich, H., Speer, N., Poustka, A. & Beissbarth, T. GOSim: an R-package for computation of information theoretic GO similarities between terms and gene products. BMC Bioinf. 8, 166 (2007).
72. Alexa, A. & Rahnenführer, J. topGO: Enrichment Analysis for Gene Ontology. R package version 2.34.0 (2018).
73. Fabregat, A. et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 44, D481–D487 (2016).
74. Fabregat, A. et al. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinf. 18, 142 (2017).


Acknowledgements

This work was supported by grants from the National Key R&D Program of China (grant 2018YFC0910402 to J.C.; grant 2018YFC1003102 to C.Z. and grant 2017YFC0908402 to C.Z.); the Strategic Priority Research Program of the Chinese Academy of Sciences (grant E0XD842201 to J.C.); the National Natural Science Foundation of China (grant 32070795 to J.C. and grant 61673070 to J.Z.); and the Open Project of Key Laboratory of Genomic and Precision Medicine, Chinese Academy of Sciences.

Author information

Authors and Affiliations

China National Center for Bioinformation, Beijing, China
Lifei Wang, Rui Nie, Zeyang Yu, Caihong Zheng & Jun Cai
Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Lifei Wang, Rui Nie, Zeyang Yu, Caihong Zheng & Jun Cai
University of Chinese Academy of Sciences, Beijing, China
Lifei Wang, Rui Nie, Zeyang Yu, Caihong Zheng & Jun Cai
School of Systems Science, Beijing Normal University, Beijing, China
Ruyue Xin, Zhang Zhang & Jiang Zhang


Contributions

J.C., J.Z. and L.W. envisioned the project. L.W. implemented the model and performed the analysis. L.W. and J.C. wrote the paper. R.N., Z.Y., R.X., C.Z., Z.Z. and J.Z. provided assistance in writing and analysis.

Corresponding authors

Correspondence to Jiang Zhang or Jun Cai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of scCapsNet and its internal parameters related to cell-type recognition.

a, Classification performance across two hPBMC datasets from the 10x Genomics platform. scCapsNet and other machine-learning methods were trained on one dataset and evaluated on the other. The heatmap shows the prediction accuracy of each classifier. b, Classification performance across four human pancreatic datasets generated with different single-cell RNA-seq protocols, taken from Abdelaal et al.¹¹. Each column corresponds to one sub-task in which one of the four datasets was used as the test set and the remaining three were used for training. The heatmap shows the prediction accuracy of each classifier. c, Evaluation of the rejection option in the negative control experiment for the scCapsNet, SVM_rejection and LDA_rejection models. Two groups of datasets were used: human datasets from PBMC and pancreas tissues, and mouse datasets from visual cortex and pancreas tissues. In each column, the classifiers predict the identity of single cells in one dataset after training on the paired dataset from a different tissue. The heatmap shows the percentage of cells left unlabelled, which serves as the negative control readout. LDA_rejection raised an error in the AMB16-Baron Mouse experiment, so its percentage of unlabelled cells was set to 0. d, Heatmaps of the averaged coupling coefficient matrices for the mRBC dataset, one per cell type (listed above each heatmap). In each heatmap, rows represent type capsules and columns represent primary capsules. e, The overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 1d of the type capsule whose recognition type matches that input type.
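Panels d and e can be illustrated with a short sketch. It assumes the coupling coefficients have already been recorded per cell as an array of shape (cells, primary capsules, type capsules), for example from a routing step such as that of Sabour et al.; here they are random placeholders. The function names are hypothetical, and taking the argmax of each combined row is only one plausible way to pick a dominant primary capsule per cell type, not a procedure stated in this caption.

```python
import numpy as np

def averaged_coupling(coupling, labels, n_types):
    """coupling: (n_cells, n_primary, n_types) coupling coefficients recorded per cell.
    labels: (n_cells,) integer cell-type labels; every type is assumed to be present.
    Returns avg of shape (n_types, n_types, n_primary): avg[t] is one heatmap, averaged over
    cells of type t, with rows = type capsules and columns = primary capsules."""
    return np.stack([coupling[labels == t].mean(axis=0).T for t in range(n_types)])

def combining_matrix(avg):
    """From the heatmap of cells of type t keep only row t, the type capsule that should recognize them."""
    return np.stack([avg[t, t] for t in range(avg.shape[0])])   # (n_types, n_primary)

# Placeholder data: softmax-normalized random coupling coefficients.
rng = np.random.default_rng(1)
n_cells, n_primary, n_types = 200, 16, 8
logits = rng.normal(size=(n_cells, n_primary, n_types))
coupling = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)   # sums to 1 over type capsules
labels = np.arange(n_cells) % n_types

avg = averaged_coupling(coupling, labels, n_types)
combined = combining_matrix(avg)
dominant_primary = combined.argmax(axis=1)   # hypothetical: strongest primary capsule per cell type
```

Because each cell's coupling coefficients sum to 1 over type capsules, every column of every averaged heatmap also sums to 1, which is a quick sanity check on recorded values.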

Extended Data Fig. 2 The identification of the core gene set responsible for recognition of B cells in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for B-cell identification: at this cutoff the recognition accuracy for B cells drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the B-cell core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the B-cell core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 2d of the type capsule whose recognition type matches that input type.
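The masking procedure in panels a to c (repeated for each cell type in Extended Data Figs. 3 to 8) can be sketched as follows, under stated assumptions: PCA treats each gene's column in one primary capsule's weight matrix as a sample, genes are ranked by their score on the first principal component, "excluding" a gene means zeroing its expression, and `predict` stands in for a trained model. All names and the toy data are hypothetical.

```python
import numpy as np

def gene_pc_scores(W, n_components=2):
    """W: (d_p, n_genes) weights from all genes into one primary capsule.
    Each gene's column is treated as a sample; returns its PCA coordinates (n_genes, n_components).
    The sign of each component is arbitrary, as with any SVD-based PCA."""
    X = W.T - W.T.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

def mask_genes(expr, gene_idx):
    """Exclude genes from the input by zeroing their columns (hypothetical masking rule)."""
    masked = expr.copy()
    masked[:, gene_idx] = 0.0
    return masked

def per_type_accuracy(pred, labels, n_types):
    """Fraction of cells of each type that are predicted as that type."""
    return np.array([np.mean(pred[labels == t] == t) for t in range(n_types)])

def accuracy_curve(predict, expr, labels, scores, cutoffs, n_types, component=0):
    """For each cutoff, drop genes whose score on `component` exceeds it and re-evaluate.
    `predict` is any callable mapping an (n_cells, n_genes) matrix to integer labels."""
    curve = []
    for cut in cutoffs:
        idx = np.flatnonzero(scores[:, component] > cut)
        curve.append(per_type_accuracy(predict(mask_genes(expr, idx)), labels, n_types))
    return np.array(curve)   # (n_cutoffs, n_types)

# Toy check: gene 0 has an outlying weight column, and a stand-in classifier relies on it.
W = np.zeros((4, 5))
W[:, 0] = 10.0
scores = gene_pc_scores(W)
scores = scores * np.sign(scores[0, 0])   # orient PC1 so that gene 0 scores positive

def stand_in_predict(X):
    """Hypothetical model: calls a cell type 1 whenever gene 0 is expressed."""
    return (X[:, 0] > 0).astype(int)

expr = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0, 0.0]])
labels = np.array([1, 0])
curve = accuracy_curve(stand_in_predict, expr, labels, scores, cutoffs=[100.0, 0.0], n_types=2)
```

With the high cutoff no gene is removed and both types are recognized; lowering the cutoff to 0 removes gene 0, so accuracy for type 1 collapses while type 0 is unaffected, which is the pattern the dotted line in panel a is described as marking.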

Extended Data Fig. 3 The identification of the core gene set responsible for recognition of CD14+ monocytes in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for CD14+ monocyte identification: at this cutoff the recognition accuracy for CD14+ monocytes drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the CD14+ monocyte core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the CD14+ monocyte core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 3d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 4 The identification of the core gene set responsible for recognition of CD4+ T cells in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for CD4+ T cell identification: at this cutoff the recognition accuracy for CD4+ T cells drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the CD4+ T cell core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the CD4+ T cell core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 4d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 5 The identification of the core gene set responsible for recognition of dendritic cells in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for dendritic-cell identification: at this cutoff the recognition accuracy for dendritic cells drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the dendritic-cell core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the dendritic-cell core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 5d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 6 The identification of the core gene set responsible for recognition of FCGR3A+ monocytes in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for FCGR3A+ monocyte identification: at this cutoff the recognition accuracy for FCGR3A+ monocytes drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the FCGR3A+ monocyte core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the FCGR3A+ monocyte core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 6d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 7 The identification of the core gene set responsible for recognition of megakaryocytes in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for megakaryocyte identification: at this cutoff the recognition accuracy for megakaryocytes drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the megakaryocyte core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the megakaryocyte core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 7d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 8 The identification of the core gene set responsible for recognition of NK cells in hPBMC.

a, Cell-type recognition accuracy as genes ranked above a sliding cutoff on the principal component score are excluded from the inputs of the scCapsNet model. Each cell type's accuracy curve is shown in a distinct colour. The dotted line marks the cutoff that defines the core genes responsible for NK cell identification: at this cutoff the recognition accuracy for NK cells drops to nearly 0, whereas the accuracy for every other cell type decreases only slightly. b, Two-dimensional PCA of the weight matrix of the corresponding primary capsule. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. c, Prediction accuracy for each cell type before and after masking the NK cell core genes. d, Heatmaps of the averaged coupling coefficient matrices for the hPBMC dataset after the NK cell core genes are removed from the inputs of the scCapsNet model. The heatmaps show, in order, the matrices for B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells. In each heatmap, rows represent type capsules and columns represent primary capsules. e, The revised overall heatmap of the combining matrix of averaged coupling coefficients. For each input cell type, the combining matrix contains the row from Extended Data Fig. 8d of the type capsule whose recognition type matches that input type.

Extended Data Fig. 9 Identification of the core gene set responsible for recognition of one cell type in hRBC.

a, Two-dimensional PCA of the weight matrix for primary capsule 5 in the model trained on the mRBC dataset. Each dot represents a gene, ranked by its principal component score; the core genes are marked in blue. b, Comparison of the prediction accuracy for this cell type before and after masking the core genes. c, Heatmaps of the revised matrices of averaged coupling coefficients for the hRBC dataset after the core genes are removed from the inputs of the scCapsNet model. In each heatmap, rows represent type capsules and columns represent primary capsules. d, Revised overall heatmap of the combining matrix of averaged coupling coefficients. The combining matrix contains, from each heatmap in Extended Data Fig. 9c, the effective type capsule row, that is, the row whose recognized type matches the type of the input single cells.

Extended Data Fig. 10 Well-studied cell-type-associated genes in the core gene sets relevant to distinct subcellular types.

The scatter plots depict, in order, the two-dimensional PCA of the column vectors of the weight matrices fully connecting the inputs to primary capsules 10, 1, 2, 4, 8, 14, 6 and 16. These define the groups of core genes (blue dots) that contribute to the identification of B cells, CD14+ monocytes, CD4+ T cells, CD8+ T cells, dendritic cells, FCGR3A+ monocytes, megakaryocytes and NK cells, respectively. Several well-studied cell-type-associated genes are shown as coloured stars with the gene name underneath; each star's colour indicates the cell type with which that gene is associated.
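
The PCA in these panels treats each gene's column of an input-to-primary-capsule weight matrix as one sample. A minimal sketch follows. The weight shape (capsule dimension × genes) and the ranking rule in the hypothetical `rank_genes` (distance from the origin in the two-dimensional PC space) are assumptions, since the captions do not state how the principal component score is computed.

```python
import numpy as np


def gene_pc_scores(W: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project each gene (one column of W) onto the top principal components.

    W: capsule_dim x n_genes weights connecting gene inputs to one primary capsule.
    Returns n_genes x n_components coordinates.
    """
    G = W.T                   # one row per gene
    G = G - G.mean(axis=0)    # centre the gene vectors before PCA
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    return G @ vt[:n_components].T


def rank_genes(coords: np.ndarray) -> np.ndarray:
    """Hypothetical ranking: genes farthest from the origin in PC space first."""
    return np.argsort(-np.linalg.norm(coords, axis=1))
```

Under this assumed rule, the first entries of `rank_genes(gene_pc_scores(W))` would be candidate core genes for the cell type routed through that primary capsule.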

Supplementary information

Supplementary Tables

Supplementary Table 1. Core genes in the hPBMC dataset identified by scCapsNet.
Supplementary Table 2. Results of GO enrichment analysis and Reactome pathway analysis.
Supplementary Table 3. Summary of all the scRNA-seq datasets used.

About this article

Cite this article

Wang, L., Nie, R., Yu, Z. et al. An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nat Mach Intell 2, 693–703 (2020). https://doi.org/10.1038/s42256-020-00244-4

Received: 03 May 2020
Accepted: 28 September 2020
Published: 02 November 2020
Issue Date: November 2020
DOI: https://doi.org/10.1038/s42256-020-00244-4