2020 | OriginalPaper | Chapter

Association Matrix Method and Its Applications in Mining DNA Sequences

Author: Guojun Mao

Published in: Advances in Artificial Intelligence, Software and Systems Engineering

Publisher: Springer International Publishing


Abstract

Many mining algorithms have been proposed for business big data such as market baskets, but they are neither effective nor efficient for mining DNA sequences, each of which typically has a small alphabet but a very large length. This paper designs a compact data structure called the Association Matrix and gives an algorithm specifically for mining long DNA sequences. The Association Matrix is a novel in-memory data structure that is compact enough to handle very long DNA sequences in limited memory. Based on the Association Matrix, we design algorithms for efficiently mining key segments from DNA sequences. We also report related experiments and their results.
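The abstract does not define the Association Matrix, so the following is only a minimal sketch of the general idea it points to: summarizing an arbitrarily long sequence over a small alphabet in a fixed-size matrix. It assumes, purely for illustration, a 4 x 4 matrix counting how often one base follows another at a given gap; the names `pair_count_matrix`, `frequent_pairs`, `gap` and `min_count` are hypothetical and are not taken from the chapter.

```python
from itertools import product

ALPHABET = "ACGT"  # the small DNA alphabet mentioned in the abstract


def pair_count_matrix(sequence, gap=1):
    """Count how often base x is followed by base y exactly `gap` positions later.

    Hypothetical helper: an illustrative fixed-size summary,
    not the chapter's Association Matrix definition.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = [[0] * len(ALPHABET) for _ in ALPHABET]
    for left, right in zip(sequence, sequence[gap:]):
        if left in index and right in index:  # skip ambiguous symbols such as N
            matrix[index[left]][index[right]] += 1
    return matrix


def frequent_pairs(matrix, min_count):
    """Return (pair, count) entries with count >= min_count, most frequent first."""
    hits = [
        (ALPHABET[i] + ALPHABET[j], matrix[i][j])
        for i, j in product(range(len(ALPHABET)), repeat=2)
        if matrix[i][j] >= min_count
    ]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dna = "ACGTACGTTTACG"
    counts = pair_count_matrix(dna)
    print(frequent_pairs(counts, min_count=2))
```

The matrix here has 16 cells regardless of sequence length, which is the kind of memory bound the abstract alludes to; whether the chapter's structure stores counts, positions, or something else is not stated in the abstract.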



Title: Association Matrix Method and Its Applications in Mining DNA Sequences
Author: Guojun Mao
Publisher: Springer International Publishing
Book: Advances in Artificial Intelligence, Software and Systems Engineering
Print ISBN: 978-3-030-20453-2

Electronic ISBN: 978-3-030-20454-9

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-20454-9_15
