Skip to main content
Top

2018 | OriginalPaper | Chapter

Efficient Approximate Subsequence Matching Using Hybrid Signatures

Authors : Tao Qiu, Xiaochun Yang, Bin Wang, Yutong Han, Siyao Wang

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we focus on the problem of approximate subsequence matching, also called the read mapping problem in genomics, which is finding similar subsequences (A subsequence refers to a substring which has consecutive characters) of a query (DNA subsequence) from a reference genome under a user-specified similarity threshold k. Existing methods first extract subsequences from a query to generate signatures, then produce candidate positions using the generated signatures, and finally verify these candidate positions to obtain the true mapping positions. However, there exist two main issues in these works: (1) producing many candidate positions; and (2) generating large numbers of signatures, among which many signatures are redundant. To address the above two issues, we propose a novel filtering technique, called hybrid signatures, which can achieve a better balance between the filtering ability of signatures and the overhead of producing candidate positions. Accordingly, we devise an adaptive algorithm to produce candidate positions using hybrid signatures. Finally, the experimental results on real-world genomic sequences show that our method outperforms state-of-the-art methods in query efficiency.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ahmadi, A., Behm, A., Honnalli, N., Li, C., Xie, X.: Hobbes: optimized gram-based methods for efficient read alignment. Nucleic Acids Res. 40, e41 (2012)CrossRef Ahmadi, A., Behm, A., Honnalli, N., Li, C., Xie, X.: Hobbes: optimized gram-based methods for efficient read alignment. Nucleic Acids Res. 40, e41 (2012)CrossRef
2.
go back to reference Kim, J., Li, C., Xie, X.: Improving read mapping using additional prefix grams. BMC Bioinf. 15(1), 42 (2014)CrossRef Kim, J., Li, C., Xie, X.: Improving read mapping using additional prefix grams. BMC Bioinf. 15(1), 42 (2014)CrossRef
3.
go back to reference Kim, J., Li, C., Xie, X.: Hobbes3: dynamic generation of variable-length signatures for efficient approximate subsequence mappings. In: ICDE 2016. IEEE (2016) Kim, J., Li, C., Xie, X.: Hobbes3: dynamic generation of variable-length signatures for efficient approximate subsequence mappings. In: ICDE 2016. IEEE (2016)
4.
go back to reference Yang, X., Wang, B., Li, C., Wang, J., Xie, X.: Efficient direct search on compressed genomic data. In: ICDE 2013, Brisbane, Australia, 8–12 April 2013, pp. 961–972 (2013) Yang, X., Wang, B., Li, C., Wang, J., Xie, X.: Efficient direct search on compressed genomic data. In: ICDE 2013, Brisbane, Australia, 8–12 April 2013, pp. 961–972 (2013)
5.
go back to reference Yang, X., Wang, Y., Wang, B., Wang, W.: Local filtering: improving the performance of approximate queries on string collections. In: SIGMOD 2015, pp. 377–392 (2015) Yang, X., Wang, Y., Wang, B., Wang, W.: Local filtering: improving the performance of approximate queries on string collections. In: SIGMOD 2015, pp. 377–392 (2015)
6.
go back to reference Qin, J., Wang, W., Xiao, C., Lu, Y., Lin, X., Wang, H.: Asymmetric signature schemes for efficient exact edit similarity query processing. ACM Trans. Database Syst. 38(3), 16 (2013)MathSciNetCrossRef Qin, J., Wang, W., Xiao, C., Lu, Y., Lin, X., Wang, H.: Asymmetric signature schemes for efficient exact edit similarity query processing. ACM Trans. Database Syst. 38(3), 16 (2013)MathSciNetCrossRef
7.
go back to reference Wang, J., Yang, X., Wang, B., Liu, C.: LS-Join: local similarity join on string collections. IEEE Trans. Knowl. Data Eng. 29(9), 1928–1942 (2017)CrossRef Wang, J., Yang, X., Wang, B., Liu, C.: LS-Join: local similarity join on string collections. IEEE Trans. Knowl. Data Eng. 29(9), 1928–1942 (2017)CrossRef
8.
go back to reference Siragusa, E., Weese, D., Reinert, K.: Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Res. 41, e78 (2013)CrossRef Siragusa, E., Weese, D., Reinert, K.: Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Res. 41, e78 (2013)CrossRef
9.
go back to reference Cheng, H., Jiang, H., Yang, J., Xu, Y., Shang, Y.: BitMapper: an efficient all-mapper based on bit-vector computing. BMC Bioinf. 16, 192 (2016)CrossRef Cheng, H., Jiang, H., Yang, J., Xu, Y., Shang, Y.: BitMapper: an efficient all-mapper based on bit-vector computing. BMC Bioinf. 16, 192 (2016)CrossRef
10.
go back to reference Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol. 10, r25 (2009)CrossRef Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol. 10, r25 (2009)CrossRef
11.
go back to reference Langmead, B., Salzberg, S.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)CrossRef Langmead, B., Salzberg, S.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)CrossRef
12.
go back to reference Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)CrossRef Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)CrossRef
13.
go back to reference Newkirk, D., Biesinger, J., Chon, A., Yokomori, K.: AREM: aligning short reads from ChIP-sequencing by expectation maximization. J. Comput. Biol. 18, 1495–1505 (2011)MathSciNetCrossRef Newkirk, D., Biesinger, J., Chon, A., Yokomori, K.: AREM: aligning short reads from ChIP-sequencing by expectation maximization. J. Comput. Biol. 18, 1495–1505 (2011)MathSciNetCrossRef
14.
go back to reference Roberts, A., Pachter, L.: Streaming fragment assignment for realtime analysis of sequencing experiments. Nat. Methods 10, 71–73 (2013)CrossRef Roberts, A., Pachter, L.: Streaming fragment assignment for realtime analysis of sequencing experiments. Nat. Methods 10, 71–73 (2013)CrossRef
Metadata
Title
Efficient Approximate Subsequence Matching Using Hybrid Signatures
Authors
Tao Qiu
Xiaochun Yang
Bin Wang
Yutong Han
Siyao Wang
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-91452-7_39

Premium Partner