Published in: Cluster Computing 3/2019

9 January 2018

Mining distinguishing subsequence patterns with nonoverlapping condition

Authors: Youxi Wu, Yuehua Wang, Jingyu Liu, Ming Yu, Jing Liu, Yan Li


Abstract

Distinguishing subsequence pattern mining aims to discover the differences between sequence databases of different categories and to express the characteristics of each class. It plays an important role in biomedicine, feature information selection, time-series classification, and other areas. Existing methods consider only whether a pattern appears in a sequence; they ignore both the number of occurrences of the pattern in the sequence and the proportion of the pattern in the entire sequence database, which hinders the discovery of distinguishing patterns when there are many irrelevant occurrences. To address this, we propose an algorithm for mining distinguishing subsequence patterns under the nonoverlapping condition. We count nonoverlapping occurrences, which effectively reduces the number of irrelevant or redundant occurrences and gives a more meaningful measure of how often a pattern occurs. We also use a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features and carry out classification experiments on DNA sequences and time-series data with two classes. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
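To make the contrast between presence-based and occurrence-based counting concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes one common reading of the nonoverlapping condition (a sequence position may not be reused at the same pattern position by two occurrences) and assumes gap constraints given by the hypothetical parameters min_gap and max_gap. The function names and the toy sequences are illustrative only. The sketch uses plain backtracking with a greedy leftmost search, so its count is a lower bound on the maximum number of nonoverlapping occurrences; the paper instead uses a Nettree to avoid backtracking.

```python
from typing import List, Optional, Set, Tuple


def find_occurrence(seq: str, pattern: str, min_gap: int, max_gap: int,
                    used: List[Set[int]]) -> Optional[Tuple[int, ...]]:
    """Leftmost occurrence whose j-th position is not already in used[j]."""
    m = len(pattern)

    def extend(j: int, prev: int) -> Optional[Tuple[int, ...]]:
        if j == m:
            return ()
        if j == 0:
            candidates = range(len(seq))
        else:
            # Between min_gap and max_gap characters may be skipped.
            candidates = range(prev + 1 + min_gap,
                               min(len(seq), prev + 2 + max_gap))
        for pos in candidates:
            if seq[pos] == pattern[j] and pos not in used[j]:
                rest = extend(j + 1, pos)
                if rest is not None:
                    return (pos,) + rest
        return None  # dead end: the caller backtracks to its next candidate

    return extend(0, -1)


def count_nonoverlapping(seq: str, pattern: str,
                         min_gap: int = 0, max_gap: int = 0) -> int:
    """Greedy count of occurrences that never share a position at the
    same pattern index (assumed reading of the nonoverlapping condition)."""
    if not pattern:
        return 0
    used: List[Set[int]] = [set() for _ in pattern]
    count = 0
    while True:
        occ = find_occurrence(seq, pattern, min_gap, max_gap, used)
        if occ is None:
            return count
        for j, pos in enumerate(occ):
            used[j].add(pos)
        count += 1


def presence_support(db: List[str], pattern: str) -> int:
    """Number of sequences that contain the pattern at least once."""
    return sum(1 for s in db if count_nonoverlapping(s, pattern) > 0)


def occurrence_support(db: List[str], pattern: str) -> int:
    """Total greedy nonoverlapping occurrence count over the database."""
    return sum(count_nonoverlapping(s, pattern) for s in db)


# Toy two-class data (made up for illustration).
class_a = ["ATATAT", "ATAT"]
class_b = ["ATCCCC", "CCAT"]

print(presence_support(class_a, "AT"), presence_support(class_b, "AT"))      # 2 2
print(occurrence_support(class_a, "AT"), occurrence_support(class_b, "AT"))  # 5 2
```

In this toy data the pattern AT is present in every sequence of both classes, so presence alone cannot separate them, whereas the occurrence counts (5 versus 2) differ. Note also that with min_gap = max_gap = 1 the pattern AA has two nonoverlapping occurrences in ATATAT under the assumed condition, because position 2 is used once as the second pattern character and once as the first.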


Metadata
Title
Mining distinguishing subsequence patterns with nonoverlapping condition
Authors
Youxi Wu
Yuehua Wang
Jingyu Liu
Ming Yu
Jing Liu
Yan Li
Publication date
9 January 2018
Publisher
Springer US
Published in
Cluster Computing / Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1671-0
