Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 4/2019

15.11.2017 | Original Article

Two-stage pruning method for gram-based categorical sequence clustering

verfasst von: Liang Yuan, Wenjian Wang, Lifei Chen

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Gram-based vector space model has been extensively applied to categorical sequence clustering. However, there is a general lack of an efficient method to determine the length of grams and to identify redundant and non-significant grams involved in the model. In this paper, a variable-length gram model is proposed, different from previous studies mainly focused on the fixed-length grams of sequences. The variable-length grams are obtained using a two-stage pruning method aimed at selecting the irredundant and significant subsequences from the prefix trees, created from the fixed-length grams with an initially large length. A robust partitioning algorithm is then defined for categorical sequence clustering on the normalized representation model using variable-length grams collected from the pruned trees. Experimental results on real-world sequence sets from various domains are given to demonstrate the performance of the proposed methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Xing Z, Pei J, Keogh E (2010) A brief survey on sequence classification. ACM SIGKDD Explor 12(1):40–48CrossRef Xing Z, Pei J, Keogh E (2010) A brief survey on sequence classification. ACM SIGKDD Explor 12(1):40–48CrossRef
2.
Zurück zum Zitat Kelil A, Wang S (2008) SCS: a new similarity measure for categorical sequences. In: Proceedings of the IEEE ICDM, pp 343–352 Kelil A, Wang S (2008) SCS: a new similarity measure for categorical sequences. In: Proceedings of the IEEE ICDM, pp 343–352
3.
Zurück zum Zitat Aggarwal CC (2015) Data mining: the textbook. Springer, BerlinMATH Aggarwal CC (2015) Data mining: the textbook. Springer, BerlinMATH
5.
Zurück zum Zitat Cao F, Yu L, Huang J, Liang J (2017) K-mw-modes: an algorithm for clustering categorical matrix-object data. Appl Soft Comput 57:605–614CrossRef Cao F, Yu L, Huang J, Liang J (2017) K-mw-modes: an algorithm for clustering categorical matrix-object data. Appl Soft Comput 57:605–614CrossRef
7.
Zurück zum Zitat Chen L (2014) EM-type method for measuring graph dissimilarity. Int J Mach Learn Cybern 5:625–633CrossRef Chen L (2014) EM-type method for measuring graph dissimilarity. Int J Mach Learn Cybern 5:625–633CrossRef
8.
Zurück zum Zitat Herranz J, Nin J, Sol\(\acute{e}\) M (2011) Optimal symbol alignment distance: a new distance for sequences of symbols. IEEE Trans Knowl Data Eng 23:1541–1554 Herranz J, Nin J, Sol\(\acute{e}\) M (2011) Optimal symbol alignment distance: a new distance for sequences of symbols. IEEE Trans Knowl Data Eng 23:1541–1554
9.
Zurück zum Zitat Song K, Ren J, Reinert G, Deng M, Waterman MS, Sun F (2014) New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. Brief Bioinf 15(3):343–353CrossRef Song K, Ren J, Reinert G, Deng M, Waterman MS, Sun F (2014) New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. Brief Bioinf 15(3):343–353CrossRef
10.
Zurück zum Zitat Wei D, Jiang Q, Wei Y, Wang S (2012) A novel hierarchical clustering algorithm for gene sequences. BMC Bioinf 13:174CrossRef Wei D, Jiang Q, Wei Y, Wang S (2012) A novel hierarchical clustering algorithm for gene sequences. BMC Bioinf 13:174CrossRef
11.
Zurück zum Zitat Yang J, Wang W (2003) CLUSEQ: efficient and effective sequence clustering. In: Proceedings of the IEEE ICDE, pp 101–112 Yang J, Wang W (2003) CLUSEQ: efficient and effective sequence clustering. In: Proceedings of the IEEE ICDE, pp 101–112
12.
Zurück zum Zitat Xiong T, Wang S, Jiang Q, Huang JZ (2014) A novel variable-order Markov model for clustering categorical sequences. IEEE Trans Knowl Data Eng 26:2339–2353CrossRef Xiong T, Wang S, Jiang Q, Huang JZ (2014) A novel variable-order Markov model for clustering categorical sequences. IEEE Trans Knowl Data Eng 26:2339–2353CrossRef
13.
Zurück zum Zitat Sbakan YC, Kurt B, Cemgil AT, Sankurc B (2014) Probabilistic sequence clustering with spectral learning. Digital Signal Process 29:1–19MathSciNetCrossRef Sbakan YC, Kurt B, Cemgil AT, Sankurc B (2014) Probabilistic sequence clustering with spectral learning. Digital Signal Process 29:1–19MathSciNetCrossRef
14.
Zurück zum Zitat Fink GA (2008) Markov models for pattern recognition: from theory to applications. Springer, New York, Berlin HeidelbergMATH Fink GA (2008) Markov models for pattern recognition: from theory to applications. Springer, New York, Berlin HeidelbergMATH
15.
Zurück zum Zitat Namiki Y, Ishida T, Akiyama Y (2013) Acceleration of sequence clustering using longest common subsequence filtering. BMC Bioinf 14(Suppl 8):S7CrossRef Namiki Y, Ishida T, Akiyama Y (2013) Acceleration of sequence clustering using longest common subsequence filtering. BMC Bioinf 14(Suppl 8):S7CrossRef
16.
Zurück zum Zitat Basu T, Murthy CA (2016) A supervised term selection technique for effective text categorization. Int J Mach Learn Cybern 7(5):877–892CrossRef Basu T, Murthy CA (2016) A supervised term selection technique for effective text categorization. Int J Mach Learn Cybern 7(5):877–892CrossRef
17.
Zurück zum Zitat Domeniconi C, Gunopulos S, Ma S, Yan B, Razgan MA, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Disc 14(1):63–97MathSciNetCrossRef Domeniconi C, Gunopulos S, Ma S, Yan B, Razgan MA, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Disc 14(1):63–97MathSciNetCrossRef
18.
Zurück zum Zitat Yuan L, Hong Z, Chen L, Cai Q (2016) Clustering categorical sequences with variable-length tuples representation, In: Proceedings of the KSEM, pp 15–27 Yuan L, Hong Z, Chen L, Cai Q (2016) Clustering categorical sequences with variable-length tuples representation, In: Proceedings of the KSEM, pp 15–27
19.
Zurück zum Zitat Bezdek JC (1998) Pattern recognition in handbook of fuzzy computation. IOP Publishing Ltd, Bristol Bezdek JC (1998) Pattern recognition in handbook of fuzzy computation. IOP Publishing Ltd, Bristol
20.
Zurück zum Zitat Wu D, Ren J (2017) Sequence clustering algorithm based on weighted vector identification. Int J Mach Learn Cybern 8(3):731–738CrossRef Wu D, Ren J (2017) Sequence clustering algorithm based on weighted vector identification. Int J Mach Learn Cybern 8(3):731–738CrossRef
21.
Zurück zum Zitat Loiselle S, Rouat J, Pressnitzer D, Thorpe S (2005) Exploration of rank order coding with spiking neural networks for speech recognition. Proc IEEE IJCNN 4:2076–2080 Loiselle S, Rouat J, Pressnitzer D, Thorpe S (2005) Exploration of rank order coding with spiking neural networks for speech recognition. Proc IEEE IJCNN 4:2076–2080
Metadaten
Titel
Two-stage pruning method for gram-based categorical sequence clustering
verfasst von
Liang Yuan
Wenjian Wang
Lifei Chen
Publikationsdatum
15.11.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 4/2019
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-017-0744-y

Weitere Artikel der Ausgabe 4/2019

International Journal of Machine Learning and Cybernetics 4/2019 Zur Ausgabe

Neuer Inhalt