Published in: Datenbank-Spektrum 3/2015

1 November 2015 | FOCUS ARTICLE

Genome sequence analysis with MonetDB

A case study on Ebola virus diversity

Authors: Robin Cijvat, Stefan Manegold, Martin Kersten, Gunnar W. Klau, Alexander Schönhuth, Tobias Marschall, Ying Zhang



Abstract

Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing a genome costs little time and money, but yields terabytes of data that must be stored and analyzed. Biologists often face excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose an approach based on a database management system (DBMS) to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.

Footnotes
3
For this use case, we do not benefit from the read-oriented storage that MonetDB/BAM uses. However, [2] shows many use cases that do.
 
References
1. Beerenwinkel N et al (2012) Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Front Microbiol 3:329
2. Cijvat R (2014) Bridging the gap between big genome data analysis and database management systems. Master’s thesis, CWI and Utrecht University
3. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. Paper presented at the 6th Symposium on Operating System Design and Implementation, San Francisco, December 2004
4. Dorok S et al (2014) Toward Efficient Variant Calling Inside Main-Memory Database Systems. BIOKDD-DEXA Workshops, pp. 41–45
5. Gire SK et al (2014) Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345(6202):1369–1372
6. Kargin Y, Kersten ML, Manegold S, Pirk H (2015) The DBMS - your big data sommelier. Proceedings of IEEE International Conference on Data Engineering 2015 (ICDE 31)
7. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
8. Li H et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
9. Manegold S et al (2009) Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2):1648–1653
10. Pavlo A et al (2009) A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD, pp. 165–178
11. Quinlan A, Hall I (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842
12. Röhm U, Blakeley JA (2009) Data management for high-throughput genomics. CIDR, pp. 97–111
13. Schapranow MP, Plattner H (2013) HIG - An in-memory database platform enabling real-time analyses of genome data. BigData, pp. 691–696
14. Schatz MC, Langmead B (2013) The DNA data deluge. IEEE Spectrum 50(7):28–33
15. Toepfer A et al (2014) Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 10(3):e1003515
16. Volchkov VE et al (1999) Characterization of the L gene and 5' trailer region of Ebola virus. J Gen Virol 80(Pt2):355–362
17. Wolstencroft K et al (2013) The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res 41(Web Server issue):W557–W561
Metadata
Title
Genome sequence analysis with MonetDB
A case study on Ebola virus diversity
Authors
Robin Cijvat
Stefan Manegold
Martin Kersten
Gunnar W. Klau
Alexander Schönhuth
Tobias Marschall
Ying Zhang
Publication date
1 November 2015
Publisher
Springer Berlin Heidelberg
Published in
Datenbank-Spektrum / Issue 3/2015
Print ISSN: 1618-2162
Electronic ISSN: 1610-1995
DOI
https://doi.org/10.1007/s13222-015-0198-x
