2016 | Original Paper | Book Chapter

Clustering Categorical Sequences with Variable-Length Tuples Representation

Authors: Liang Yuan, Zhiling Hong, Lifei Chen, Qiang Cai

Published in: Knowledge Science, Engineering and Management

Publisher: Springer International Publishing

Abstract

Clustering categorical sequences remains a difficult problem because efficient representation models for sequences are lacking. Unlike existing models, which mainly rely on fixed-length tuple representations, this paper proposes a new representation model based on variable-length tuples. The variable-length tuples are obtained by pruning: a suffix tree is first built for fixed-length tuples with a large memory length, and redundant tuples are then deleted from the tree according to an entropy-based measure of tuple redundancy. A partitioning algorithm for clustering categorical sequences is then defined on the normalized representation built from the tuples collected from the pruned tree. Experimental studies on six real-world sequence sets show the effectiveness and suitability of the proposed method for subsequence-based clustering.
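The pipeline outlined in the abstract (collect tuples, prune redundant ones, normalize, partition) can be illustrated with a rough Python sketch. It is not the authors' algorithm: the abstract does not give the redundancy measure, the normalization, or the partitioning objective, so every concrete choice here is an assumption. A flat `Counter` stands in for the suffix tree, a tuple is treated as redundant when the entropy of the symbol that follows it differs from that of its one-symbol-shorter suffix by less than a hypothetical `threshold`, vectors are normalized to sum to 1, and plain k-means is used as the partitioning step. All function names are illustrative.

```python
from collections import Counter
from math import log2


def count_tuples(sequences, max_len):
    """Count every contiguous tuple of length 1..max_len (flat stand-in for a suffix tree)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq)):
            for n in range(1, min(max_len, len(seq) - i) + 1):
                counts[seq[i:i + n]] += 1
    return counts


def next_symbol_entropy(counts, context, alphabet):
    """Shannon entropy (in bits) of the symbol that follows `context`."""
    follow = [counts[context + a] for a in alphabet]
    total = sum(follow)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in follow if c > 0)


def prune_tuples(sequences, max_len, threshold):
    """Keep single symbols, plus longer tuples whose successor entropy differs
    from that of their suffix by at least `threshold` (assumed redundancy test)."""
    alphabet = sorted({s for seq in sequences for s in seq})
    # Count one extra symbol so successors of length-max_len tuples are observed.
    counts = count_tuples(sequences, max_len + 1)
    kept = []
    for t in counts:
        if len(t) > max_len:
            continue
        if len(t) == 1:
            kept.append(t)
            continue
        gain = abs(next_symbol_entropy(counts, t[1:], alphabet)
                   - next_symbol_entropy(counts, t, alphabet))
        if gain >= threshold:
            kept.append(t)
    return sorted(kept, key=lambda t: (len(t), t))


def represent(seq, tuples):
    """Frequency vector over the kept tuples, normalized to sum to 1 (assumed normalization)."""
    raw = [sum(1 for i in range(len(seq) - len(t) + 1) if seq[i:i + len(t)] == t)
           for t in tuples]
    total = sum(raw)
    return [r / total if total else 0.0 for r in raw]


def kmeans(vectors, k, iters=20):
    """Plain Lloyd-style k-means with evenly spaced seeds (assumed partitioning step)."""
    centers = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j: sum((x - c) ** 2 for x, c in zip(v, centers[j])))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: two "ab"-style and two "cd"-style sequences (made-up data).
toy = ["ababababab", "abababab", "ccddccddcc", "ddccddccdd"]
kept = prune_tuples(toy, max_len=3, threshold=0.1)
labels = kmeans([represent(s, kept) for s in toy], k=2)
```

On this toy input the strictly alternating "ab" sequences contribute only single symbols, because a longer context never changes what comes next, while the "cd" sequences retain some length-2 tuples. The kept set therefore mixes tuple lengths, which is the sense in which the representation is variable-length.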

Metadata
Title
Clustering Categorical Sequences with Variable-Length Tuples Representation
Authors
Liang Yuan
Zhiling Hong
Lifei Chen
Qiang Cai
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-47650-6_2
