Skip to main content
Erschienen in: Knowledge and Information Systems 3/2014

01.03.2014 | Regular Paper

DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences

verfasst von: Vance Chiang-Chi Liao, Ming-Syan Chen

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Scientific progress in recent years has led to the generation of huge amounts of biological data, most of which remains unanalyzed. Mining the data may provide insights into various realms of biology, such as finding co-occurring biosequences, which are essential for biological data mining and analysis. Data mining techniques like sequential pattern mining may reveal implicitly meaningful patterns among the DNA or protein sequences. If biologists hope to unlock the potential of sequential pattern mining in their field, it is necessary to move away from traditional sequential pattern mining algorithms, because they have difficulty handling a small number of items and long sequences in biological data, such as gene and protein sequences. To address the problem, we propose an approach called Depth-First SPelling (DFSP) algorithm for mining sequential patterns in biological sequences. The algorithm’s processing speed is faster than that of PrefixSpan, its leading competitor, and it is superior to other sequential pattern mining algorithms for biological sequences.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Achar A, Laxman S, Sastry PS (2011) A unified view of the apriori-based algorithms for frequent episode discovery. Knowl Inf Syst 31: 223–250 Achar A, Laxman S, Sastry PS (2011) A unified view of the apriori-based algorithms for frequent episode discovery. Knowl Inf Syst 31: 223–250
2.
Zurück zum Zitat Agrawal R and Srikant R (1995) Mining sequential patterns. Proceedings of the 11th international conference on data, engineering, pp 3–14 Agrawal R and Srikant R (1995) Mining sequential patterns. Proceedings of the 11th international conference on data, engineering, pp 3–14
3.
Zurück zum Zitat Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
4.
Zurück zum Zitat Aseervatham S, Osmani A, Viennet E (2006) bitSPADE: A lattice-based sequential pattern mining algorithm using bitmap representation. In: Proceedings of 6th international conference on data mining, pp 792–797 Aseervatham S, Osmani A, Viennet E (2006) bitSPADE: A lattice-based sequential pattern mining algorithm using bitmap representation. In: Proceedings of 6th international conference on data mining, pp 792–797
5.
Zurück zum Zitat Ayres J, Gehrke J, Yiu T, Flannick J (2002), Sequential pattern mining using a bitmap representation. In: Proceedings of 8th ACM SIGKDD international conference on knowledge discovery and data mining, pp 429–435, July 2002 Ayres J, Gehrke J, Yiu T, Flannick J (2002), Sequential pattern mining using a bitmap representation. In: Proceedings of 8th ACM SIGKDD international conference on knowledge discovery and data mining, pp 429–435, July 2002
6.
Zurück zum Zitat Bajcsy P, Han J, Liu L, Young J (2004) Survey of biodata analysis from a data mining perspective. Wang JTL, Zaki MJ, Toivonen HTT, and Shasha D (eds) Data Mining in Bioinformatics, Chapter 2. Springer, pp 9–39 Bajcsy P, Han J, Liu L, Young J (2004) Survey of biodata analysis from a data mining perspective. Wang JTL, Zaki MJ, Toivonen HTT, and Shasha D (eds) Data Mining in Bioinformatics, Chapter 2. Springer, pp 9–39
8.
Zurück zum Zitat Chen M-S, Han J, Yu PS (1996) Data mining: an overview from database perspective. IEEE Trans Knowl Data Eng 5(1):866–883CrossRef Chen M-S, Han J, Yu PS (1996) Data mining: an overview from database perspective. IEEE Trans Knowl Data Eng 5(1):866–883CrossRef
9.
Zurück zum Zitat Chen Y-C, Peng W-C, Lee S-Y (2011) CEMiner—An efficient algorithms for mining closed patterns from interval-based data. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). Vancouver, Canada, pp 121–130, Dec 11–14 Chen Y-C, Peng W-C, Lee S-Y (2011) CEMiner—An efficient algorithms for mining closed patterns from interval-based data. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). Vancouver, Canada, pp 121–130, Dec 11–14
10.
Zurück zum Zitat Cheng H, Yan X, Han J (2004) Incspan: incremental mining of sequential patterns in large database. In: Proceedings of 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 527–532 Cheng H, Yan X, Han J (2004) Incspan: incremental mining of sequential patterns in large database. In: Proceedings of 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 527–532
11.
Zurück zum Zitat Han J (2002) How can data mining help bio-data analysis? In: Proceedings of the workshop on data mining in bioinformatics (BIOKDD’02 with SIGKDD’02 conference. Edmonton, Canada), pp 1–4 Han J (2002) How can data mining help bio-data analysis? In: Proceedings of the workshop on data mining in bioinformatics (BIOKDD’02 with SIGKDD’02 conference. Edmonton, Canada), pp 1–4
12.
Zurück zum Zitat Han J, Kamber M (2000) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco Han J, Kamber M (2000) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
13.
Zurück zum Zitat Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) Freespan: frequent pattern-projected sequential pattern mining. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 355–359 Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) Freespan: frequent pattern-projected sequential pattern mining. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 355–359
14.
Zurück zum Zitat Han J, Pei J, Yan X (2005) Sequential pattern mining by pattern-growth: principles and extensions. In: Chu WW, lIN TY (eds) Recent advances in data mining and granular computing. Springer, Berlin, pp 183–220 Han J, Pei J, Yan X (2005) Sequential pattern mining by pattern-growth: principles and extensions. In: Chu WW, lIN TY (eds) Recent advances in data mining and granular computing. Springer, Berlin, pp 183–220
15.
Zurück zum Zitat Hirosawa M, Totoki Y, Hoshida M, Ishikawa M (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Bioinformatics 11:13–18CrossRef Hirosawa M, Totoki Y, Hoshida M, Ishikawa M (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Bioinformatics 11:13–18CrossRef
16.
Zurück zum Zitat Ho C-C, Li H-F, Kuo F-F, Lee S-Y (2006) Incremental mining of sequential patterns over a stream sliding window. In: Proceedings of IEEE international workshop on mining evolving and streaming data (IWMESD-2006), pp 677–681, Dec 2006 Ho C-C, Li H-F, Kuo F-F, Lee S-Y (2006) Incremental mining of sequential patterns over a stream sliding window. In: Proceedings of IEEE international workshop on mining evolving and streaming data (IWMESD-2006), pp 677–681, Dec 2006
17.
Zurück zum Zitat Hsu C-M, Chen C-Y, Liu B-J (2006) MAGIIC-PRO: detecting functional signatures by efficient discovery of long patterns in protein sequences. Nucleic Acids Res, W356–W361 Hsu C-M, Chen C-Y, Liu B-J (2006) MAGIIC-PRO: detecting functional signatures by efficient discovery of long patterns in protein sequences. Nucleic Acids Res, W356–W361
18.
Zurück zum Zitat Hsu C-M, Chen CY, Liu BJ, Huang CC, Laio MH, Lin CC, Wu TL (2007) Identification of hot regions in protein-protein interactions by sequential pattern mining. BMC Bioinform 8(Suppl. 5):S8. doi:10.1186/1471-2105-8-S5-S8 CrossRef Hsu C-M, Chen CY, Liu BJ, Huang CC, Laio MH, Lin CC, Wu TL (2007) Identification of hot regions in protein-protein interactions by sequential pattern mining. BMC Bioinform 8(Suppl. 5):S8. doi:10.​1186/​1471-2105-8-S5-S8 CrossRef
19.
Zurück zum Zitat Huang J-W, Tseng C-Y, Ou J-C, Chen M-S (2008) A General Model for Sequential Pattern Mining with a Progressive Database. IEEE Trans Knowl Data Eng, 20: 1153–1167, 11 Feb 2008 Huang J-W, Tseng C-Y, Ou J-C, Chen M-S (2008) A General Model for Sequential Pattern Mining with a Progressive Database. IEEE Trans Knowl Data Eng, 20: 1153–1167, 11 Feb 2008
20.
Zurück zum Zitat Jones N, Pevzner P (2004) An introduction to bioinformatics algorithms. MIT Press, Cambridge Jones N, Pevzner P (2004) An introduction to bioinformatics algorithms. MIT Press, Cambridge
21.
Zurück zum Zitat Lin M-Y, Lee S-Y (2004) Incremental update on sequential patterns in large databases by implicit merging and efficient counting. Inf Syst 29(5):385–404CrossRefMathSciNet Lin M-Y, Lee S-Y (2004) Incremental update on sequential patterns in large databases by implicit merging and efficient counting. Inf Syst 29(5):385–404CrossRefMathSciNet
22.
Zurück zum Zitat Luo C, Chung SM (2005) Efficient mining of maximal sequential patterns using multiple samples. In: Proceedings of the 5th SIAM international conference on data mining (SDM’05), pp 415–426 Luo C, Chung SM (2005) Efficient mining of maximal sequential patterns using multiple samples. In: Proceedings of the 5th SIAM international conference on data mining (SDM’05), pp 415–426
23.
Zurück zum Zitat Mampaey M, Tatti N, Vreeken J (2011) Tell me what I need to know: succinctly summarizing data with itemsets. In: Proceedings of 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 573–581 Mampaey M, Tatti N, Vreeken J (2011) Tell me what I need to know: succinctly summarizing data with itemsets. In: Proceedings of 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 573–581
24.
Zurück zum Zitat Marascu A, Masseglia F (2006) Mining sequential patterns from data streams: a centroid approach. J Intell Inf Syst (JIIS) 27(3):291–307 Marascu A, Masseglia F (2006) Mining sequential patterns from data streams: a centroid approach. J Intell Inf Syst (JIIS) 27(3):291–307
25.
Zurück zum Zitat Nguyen S, Sun X, Orlowska M (2005) Improvements of IncSpan: incremental mining of sequential patterns in large database. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, pp 442–451 Nguyen S, Sun X, Orlowska M (2005) Improvements of IncSpan: incremental mining of sequential patterns in large database. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, pp 442–451
26.
Zurück zum Zitat Parthasarathy S, Zaki MJ, Ogihara M, Dwarkadas S (1999) Incremental and interactive sequence mining. In: Proceedings of the 8th international conference on information and, knowledge management, pp 251–258 Parthasarathy S, Zaki MJ, Ogihara M, Dwarkadas S (1999) Incremental and interactive sequence mining. In: Proceedings of the 8th international conference on information and, knowledge management, pp 251–258
27.
Zurück zum Zitat Pei J, Han J, Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng, pp 1424–1440, Oct 2004 Pei J, Han J, Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng, pp 1424–1440, Oct 2004
28.
Zurück zum Zitat Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings of the 2001 IEEE international conference on data mining (ICDM’01), San Jose, California, pp 441–448, Nov 29–Dec 2 Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings of the 2001 IEEE international conference on data mining (ICDM’01), San Jose, California, pp 441–448, Nov 29–Dec 2
29.
Zurück zum Zitat Pei J, Han J, Mortazavi-Asl B, Zhu H (2000) Mining access patterns efficiently from Web logs. In: Proceedings of the 2000 Pacific-Asia conference on knowledge discovery and data mining (PAKDD’00). Kyoto, Japan, April, pp 396–407 Pei J, Han J, Mortazavi-Asl B, Zhu H (2000) Mining access patterns efficiently from Web logs. In: Proceedings of the 2000 Pacific-Asia conference on knowledge discovery and data mining (PAKDD’00). Kyoto, Japan, April, pp 396–407
30.
Zurück zum Zitat Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160 Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160
31.
Zurück zum Zitat Rajpathak D, Chougule R, Bandyopadhyay P (2011) A domain-specific decision support system for knowledge discovery using association and text mining. Knowl Inf Syst, pp 405–432 Rajpathak D, Chougule R, Bandyopadhyay P (2011) A domain-specific decision support system for knowledge discovery using association and text mining. Knowl Inf Syst, pp 405–432
32.
Zurück zum Zitat Salam A, Sikandar Hayat Khayal M (2011) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst, pp 57–86 Salam A, Sikandar Hayat Khayal M (2011) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst, pp 57–86
33.
Zurück zum Zitat Senkul P, Salin S (2011) Improving pattern quality in web usage mining by using semantic information. Knowl Inf Syst, pp 527–541 Senkul P, Salin S (2011) Improving pattern quality in web usage mining by using semantic information. Knowl Inf Syst, pp 527–541
34.
Zurück zum Zitat Srikant R and Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of 5th international conference extending database technology (EDBT), vol 1057, Springer, pp 3–17 Srikant R and Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of 5th international conference extending database technology (EDBT), vol 1057, Springer, pp 3–17
35.
Zurück zum Zitat Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: Proceedings of the 20th international conference on data, engineering, pp 79–91 Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: Proceedings of the 20th international conference on data, engineering, pp 79–91
36.
Zurück zum Zitat Wang X, Li G, Jiang G, Shi Z (2011) Semantic trajectory-based event detection and event pattern mining. Knowl Inf Syst, pp 1–25 Wang X, Li G, Jiang G, Shi Z (2011) Semantic trajectory-based event detection and event pattern mining. Knowl Inf Syst, pp 1–25
37.
Zurück zum Zitat Wang K, Xu Y, Yu J (2004) Scalable sequential pattern mining for biological sequences. In: Proceedings of conference information and, knowledge management, pp 178–187 Wang K, Xu Y, Yu J (2004) Scalable sequential pattern mining for biological sequences. In: Proceedings of conference information and, knowledge management, pp 178–187
38.
Zurück zum Zitat Yan X, Han J, Afshar R (2003) Clospan: Mining closed sequential patterns in large datasets. In: Proceedings of the 3rd SIAM international conference on data mining (SDM’03), pp 166–177, May 2003 Yan X, Han J, Afshar R (2003) Clospan: Mining closed sequential patterns in large datasets. In: Proceedings of the 3rd SIAM international conference on data mining (SDM’03), pp 166–177, May 2003
39.
Zurück zum Zitat Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment, In: Proceedings 2002 ACM-SIGMOD I international conference. Management of data (SIGMOD ’02), pp 406–417, June 2002 Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment, In: Proceedings 2002 ACM-SIGMOD I international conference. Management of data (SIGMOD ’02), pp 406–417, June 2002
40.
Zurück zum Zitat Zaki MJ (1998) Efficient enumeration of frequent sequences. In: Proceedings of the 7th international conference on information and knowledge management, pp 68–75, Nov 1998 Zaki MJ (1998) Efficient enumeration of frequent sequences. In: Proceedings of the 7th international conference on information and knowledge management, pp 68–75, Nov 1998
41.
Zurück zum Zitat Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Machine learning 42(1/2):31–60CrossRefMATH Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Machine learning 42(1/2):31–60CrossRefMATH
Metadaten
Titel
DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences
verfasst von
Vance Chiang-Chi Liao
Ming-Syan Chen
Publikationsdatum
01.03.2014
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2014
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-012-0602-x

Weitere Artikel der Ausgabe 3/2014

Knowledge and Information Systems 3/2014 Zur Ausgabe