
2018 | Original Paper | Chapter

Quality Assessment of High-Throughput DNA Sequencing Data via Range Analysis

Authors : Ali Fotouhi, Mina Majidi, M. Oğuzhan Külekci

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

Several recent studies have addressed the quality assessment of sequencing data. These efforts have largely focused on reporting statistical parameters of the distribution of the quality scores and/or the base calls in a FASTQ file. We investigate another dimension of quality assessment, motivated by the fact that reads containing long intervals with few errors improve the performance of post-processing tools in downstream analysis. The quality assessment procedures proposed in this study therefore analyze the segments of reads that are above a certain quality. We define an interval of a read to be of desired quality when it contains at most k quality scores less than or equal to a threshold value v, where k and v are provided by the user. We present an algorithm to detect these ranges and introduce new metrics computed from their lengths. These metrics include mean values of the following per-read quantities, computed over the fragments in each read that satisfy the k and v parameters: the longest, shortest, average, and cubic-average fragment lengths, the coefficient of variation of the fragment lengths, and the number of segments. We also provide QASDRA, a new software tool for quality assessment of sequencing data via range analysis, available at https://github.com/ali-cp/QASDRA.git. QASDRA creates a quality assessment report for an input FASTQ file according to the user-specified k and v parameters. It can also filter out reads according to the introduced metrics.

Metadata

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-78723-7_37
