2015 | OriginalPaper | Book chapter

A New Bioinformatic Pipeline to Address the Most Common Requirements in RNA-seq Data Analysis

Written by: Osvaldo Graña, Miriam Rubio-Camarillo, Florentino Fdez-Riverola, David G. Pisano, Daniel Glez-Peña

Published in: 9th International Conference on Practical Applications of Computational Biology and Bioinformatics

Publisher: Springer International Publishing

Abstract

Many bioinformatic programs have been developed to analyze data from RNA-seq experiments. These programs are widely used and often included in computational pipelines. Nevertheless, there does not seem to be a precise definition of what constitutes a proper workflow for this kind of data. Here we present a new workflow that addresses the most common requirements of RNA-seq analysis and is implemented as an automated pipeline for efficient and complete evaluation.

Metadata
Title
A New Bioinformatic Pipeline to Address the Most Common Requirements in RNA-seq Data Analysis
Written by
Osvaldo Graña
Miriam Rubio-Camarillo
Florentino Fdez-Riverola
David G. Pisano
Daniel Glez-Peña
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-19776-0_13