2021 | OriginalPaper | Chapter

Finding the Centre: Compositional Asymmetry in High-Throughput Sequencing Datasets

Authors: Jia R. Wu, Jean M. Macklaim, Briana L. Genge, Gregory B. Gloor

Published in: Advances in Compositional Data Analysis

Publisher: Springer International Publishing

Abstract

High-throughput sequencing datasets comprise millions of reads of genomic data and can be modelled as count compositions. These data are used for transcription profiles, microbial diversity, or relative cellular abundance in culture. The data are sparse and high dimensional. Moreover, they are often unbalanced, i.e. there is often systematic variation between groups due to the presence or absence of features, and this variation is important to the biological interpretation of the data. The imbalance causes samples in the comparison groups to exhibit different centres, which contributes to false positive and false negative identifications. Here, we extend the centred log-ratio transformation method used for the comparison of differential relative abundance between two groups in a Bayesian compositional context. We demonstrate the pathology in modelled and real unbalanced experimental designs to show how this causes both false negative and false positive inference. We examined four approaches to identify denominator features, and tested them with different proportions of modelled asymmetry; two were relatively robust and are recommended. We recommend the ‘LVHA’ transformation for asymmetric transcriptome datasets, and the ‘IQLR’ method for all other datasets when using the ALDEx2 tool available on Bioconductor.
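
To illustrate why the choice of denominator features matters, the following is a minimal Python sketch and not the ALDEx2 implementation. ALDEx2 is an R package that works on Dirichlet Monte Carlo instances of the counts; here a fixed pseudocount of 0.5 stands in for that step as a simplifying assumption. The function names `clr` and `iqlr_features` and the toy count table are hypothetical. The sketch assumes that an IQLR-style denominator is built from the features whose centred log-ratio variance across samples falls between the first and third quartiles.

```python
import math
from statistics import pvariance, quantiles


def clr(counts, denom_idx=None, pseudocount=0.5):
    """Log-ratio transform one sample.

    The centre is the mean log (log of the geometric mean) of the features
    listed in denom_idx, or of all features when denom_idx is None.
    The pseudocount is an assumption used here to avoid log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    idx = list(range(len(logs))) if denom_idx is None else list(denom_idx)
    centre = sum(logs[i] for i in idx) / len(idx)
    return [v - centre for v in logs]


def iqlr_features(samples, pseudocount=0.5):
    """Return indices of features whose CLR variance across samples lies
    between the first and third quartiles of all feature variances."""
    transformed = [clr(s, pseudocount=pseudocount) for s in samples]
    n_features = len(samples[0])
    variances = [
        pvariance([t[j] for t in transformed]) for j in range(n_features)
    ]
    q1, _, q3 = quantiles(variances, n=4, method="inclusive")
    return [j for j, v in enumerate(variances) if q1 <= v <= q3]


# Toy table (rows are samples, columns are features). The last feature is
# absent in the first two samples and abundant in the last two, which
# makes the two groups asymmetric.
samples = [
    [100, 120, 80, 90, 110, 0],
    [105, 115, 85, 95, 100, 0],
    [100, 118, 82, 92, 108, 5000],
    [98, 122, 79, 91, 112, 5200],
]

denom = iqlr_features(samples)

# Between-group shift of feature 0, whose raw count is the same in the
# first and third samples, under each choice of denominator.
shift_all = clr(samples[2])[0] - clr(samples[0])[0]
shift_iqlr = clr(samples[2], denom)[0] - clr(samples[0], denom)[0]
print("denominator features:", denom)
print("shift with all features as denominator:", round(shift_all, 3))
print("shift with IQLR-style denominator:", round(shift_iqlr, 3))
```

In this toy table the asymmetric feature moves the all-feature centre between groups, so an unchanged feature appears to shift; restricting the denominator to mid-variance features reduces that apparent shift.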

DOI: https://doi.org/10.1007/978-3-030-71175-7_17
