
2020 | OriginalPaper | Chapter

Association Matrix Method and Its Applications in Mining DNA Sequences


Abstract

Many mining algorithms have been proposed for business big data such as market baskets, but they are neither effective nor efficient for mining DNA sequences, each of which typically has a small alphabet but a very long length. This paper designs a compact data structure called the Association Matrix and gives an algorithm specifically for mining long DNA sequences. The Association Matrix is a novel in-memory data structure that is compact enough to handle very long DNA sequences within limited memory. Based on the Association Matrix structure, we design algorithms for efficiently mining key segments from DNA sequences. We also present our related experiments and results.
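The abstract does not define the Association Matrix's layout. The sketch below is a minimal illustration that assumes a hypothetical definition. In this version, the matrix counts ordered base pairs (x, y) in which y occurs exactly `gap` positions after x. It shows why a small alphabet keeps such a structure compact: for DNA the matrix is always 4 x 4 regardless of sequence length, and it can be filled in a single streaming pass. The names `build_association_matrix`, `frequent_pairs`, `gap`, and `min_support` are illustrative assumptions, not the chapter's API. Ranking pairs by count is only a stand-in for the chapter's key-segment mining.

```python
from __future__ import annotations

from collections import deque
from itertools import product
from typing import Iterable

ALPHABET = "ACGT"  # assumed DNA alphabet; other symbols (e.g. N) are skipped
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def build_association_matrix(bases: Iterable[str], gap: int = 1) -> list[list[int]]:
    """Hypothetical association matrix: matrix[x][y] counts how often base y
    occurs exactly `gap` positions after base x.

    `bases` may be any iterable of single characters, so a very long sequence
    can be streamed without holding it in memory; only a window of `gap`
    symbols and the fixed-size matrix are kept.
    """
    if gap < 1:
        raise ValueError("gap must be >= 1")
    matrix = [[0] * len(ALPHABET) for _ in ALPHABET]
    window: deque[str] = deque(maxlen=gap)  # the last `gap` symbols seen
    for base in bases:
        base = base.upper()
        if len(window) == gap:
            earlier = window[0]  # symbol exactly `gap` positions back
            if earlier in INDEX and base in INDEX:
                matrix[INDEX[earlier]][INDEX[base]] += 1
        window.append(base)  # maxlen drops the oldest symbol automatically
    return matrix


def frequent_pairs(matrix: list[list[int]], min_support: int) -> list[tuple[str, int]]:
    """Return (pair, count) for pairs with count >= min_support,
    most frequent first, ties broken alphabetically."""
    hits = [
        (ALPHABET[r] + ALPHABET[c], matrix[r][c])
        for r, c in product(range(len(ALPHABET)), repeat=2)
        if matrix[r][c] >= min_support
    ]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    m = build_association_matrix("ACGTAC")
    print(frequent_pairs(m, 2))  # prints [('AC', 2)]
```

Because the function accepts any iterable of characters, a long sequence read from disk in chunks could be passed through `itertools.chain.from_iterable(chunks)`, and memory use would stay bounded by the 4 x 4 matrix plus the `gap`-sized window.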


Metadata
Title: Association Matrix Method and Its Applications in Mining DNA Sequences
Author: Guojun Mao
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-20454-9_15
