Skip to main content
Erschienen in: International Journal of Parallel Programming 2/2018

19.01.2017

Hierarchical Pattern Mining with the Automata Processor

verfasst von: Ke Wang, Elaheh Sadredini, Kevin Skadron

Erschienen in: International Journal of Parallel Programming | Ausgabe 2/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Mining complex patterns with hierarchical structures becomes more and more important to understand the underlying information in large and unstructured databases. When compared with a set-mining problem or a string-mining problem, the computation complexity to recognize a pattern with hierarchical structure, and the large associated search space, make hierarchical pattern mining (HPM) extremely expensive on conventional processor architectures. We propose a flexible, hardware-accelerated framework for mining hierarchical patterns with Apriori-based algorithms, which leads to multi-pass pruning strategies but exposes massive parallelism. Under this framework, we implemented two widely used HPM techniques, sequential pattern mining (SPM) and disjunctive rule mining (DRM) on the Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). Two automaton-design strategies for matching and counting different types of hierarchical patterns, called linear design and reduction design, are proposed in this paper. To generalize automaton structure for SPM, the linear design strategy is proposed by flattening sequential patterns to plain strings to produce automaton design space and to minimize the overhead of reconfiguration. Up to 90\(\times \) and 29\(\times \) speedups are achieved by the AP-accelerated algorithm on six real-world datasets, when compared with the optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on a multicore CPU by up to 452\(\times \) and 49\(\times \) speedups. The AP advantage grows further with larger datasets. For DRM, the reduction design strategy is adopted by applying reduction operation of AND, with on-chip Boolean units, on several parallel sub-structures for recognizing disjunctive items. This strategy allows implicit OR reduction on alternative items within a disjunctive item by utilizing bit-wise parallelism feature of the on-chip state units. The experiments show up to 614\(\times \) speedups of the proposed CPU-AP DRM solution over a sequential CPU algorithm on two real-world datasets. The experiments also show significant increase of CPU matching-and-counting time when increasing d-rule size or the number of alternative items. However, in practical cases, the AP solution runs hundreds of times faster in matching and counting than the CPU solution, and keeps constant processing time despite the increasing complexity of disjunctive rules.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
3.
Zurück zum Zitat Aggarwal, C.C., Han, J. (eds.): Frequent Pattern Mining. Springer, Cham (2014)MATH Aggarwal, C.C., Han, J. (eds.): Frequent Pattern Mining. Springer, Cham (2014)MATH
4.
Zurück zum Zitat Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proc. of the International Conference on Data Engineering (ICDE), IEEE, pp. 3–14 (1995) Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proc. of the International Conference on Data Engineering (ICDE), IEEE, pp. 3–14 (1995)
5.
Zurück zum Zitat Agrawal, R., Imieliński ,T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of SIGMOD ’93 (1993) Agrawal, R., Imieliński ,T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of SIGMOD ’93 (1993)
6.
Zurück zum Zitat Bo, C. et al.: Entity resolution acceleration using microns automata processor. In: Proc. of the International Conference on Big Data (BigData) (2016) Bo, C. et al.: Entity resolution acceleration using microns automata processor. In: Proc. of the International Conference on Big Data (BigData) (2016)
7.
Zurück zum Zitat Chiang, D.A., Wang, Y.F., Wang, Y.H., Chen, Z.Y., Hsu, M.H.: Mining disjunctive consequent association rules. Appl. Soft Comput. 11(2), 2129–2133 (2011)CrossRef Chiang, D.A., Wang, Y.F., Wang, Y.H., Chen, Z.Y., Hsu, M.H.: Mining disjunctive consequent association rules. Appl. Soft Comput. 11(2), 2129–2133 (2011)CrossRef
8.
Zurück zum Zitat Cong, S., Han, J., Hoeflinger, J., Padua, D.: A sampling-based framework for parallel data mining. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM (2005) Cong, S., Han, J., Hoeflinger, J., Padua, D.: A sampling-based framework for parallel data mining. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM (2005)
9.
Zurück zum Zitat Dlugosch, P., et al.: An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Trans. Parallel Distrib. Syst. 25(12), 3088–3098 (2014)CrossRef Dlugosch, P., et al.: An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Trans. Parallel Distrib. Syst. 25(12), 3088–3098 (2014)CrossRef
10.
Zurück zum Zitat Fang, W. et al.: Frequent itemset mining on graphics processors. In: Proc. International Workshop on Data Management on New Hardware (DaMoN) (2009) Fang, W. et al.: Frequent itemset mining on graphics processors. In: Proc. International Workshop on Data Management on New Hardware (DaMoN) (2009)
11.
Zurück zum Zitat Fournier-Viger, P., et al.: Spmf: a Java open-source pattern mining library. J. Mach. Learn. Res. 15, 3569–3573 (2014)MATH Fournier-Viger, P., et al.: Spmf: a Java open-source pattern mining library. J. Mach. Learn. Res. 15, 3569–3573 (2014)MATH
12.
Zurück zum Zitat Guralnik, V., Karypis, G.: Parallel tree-projection-based sequence mining algorithms. Parallel Comput. 30(4), 443–472 (2004)CrossRef Guralnik, V., Karypis, G.: Parallel tree-projection-based sequence mining algorithms. Parallel Comput. 30(4), 443–472 (2004)CrossRef
13.
Zurück zum Zitat Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proc. of SIGMOD ’00, ACM (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proc. of SIGMOD ’00, ACM (2000)
14.
Zurück zum Zitat Hryniów, K.: Parallel pattern mining-application of gsp algorithm for graphics processing units. In: Proc. of the International Carpathian Control Conference (ICCC), IEEE, pp. 233–236 (2012) Hryniów, K.: Parallel pattern mining-application of gsp algorithm for graphics processing units. In: Proc. of the International Carpathian Control Conference (ICCC), IEEE, pp. 233–236 (2012)
15.
Zurück zum Zitat Nanavati, A.A., Chitrapura, K.P., Joshi, S., Krishnapuram, R.: Mining generalised disjunctive association rules. In: Proc. of the Tenth International Conference on Information and Knowledge Management (CIKM), ACM, New York, NY, USA, pp. 482–489 (2001) Nanavati, A.A., Chitrapura, K.P., Joshi, S., Krishnapuram, R.: Mining generalised disjunctive association rules. In: Proc. of the Tenth International Conference on Information and Knowledge Management (CIKM), ACM, New York, NY, USA, pp. 482–489 (2001)
16.
Zurück zum Zitat Noyes, H.: Micron automata processor architecture: Reconfigurable and massively parallel automata processing. In: Proc. of the Fifth International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, keynote presentation (2014) Noyes, H.: Micron automata processor architecture: Reconfigurable and massively parallel automata processing. In: Proc. of the Fifth International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, keynote presentation (2014)
17.
Zurück zum Zitat Pei, J., et al.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. (TKDE) 16(11), 1424–1440 (2004)CrossRef Pei, J., et al.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. (TKDE) 16(11), 1424–1440 (2004)CrossRef
18.
Zurück zum Zitat Roy, I., Aluru, S.: Discovering motifs in biological sequences using the micron automata processor. IEEE/ACM Trans. Comput. Biol. Bioinform. 13(1), 99–111 (2016)CrossRef Roy, I., Aluru, S.: Discovering motifs in biological sequences using the micron automata processor. IEEE/ACM Trans. Comput. Biol. Bioinform. 13(1), 99–111 (2016)CrossRef
19.
Zurück zum Zitat Sampaio, M.C., Cardoso, O.H.B., Santos, GPD., Hattori, L.: Mining disjunctive association rules (2008) Sampaio, M.C., Cardoso, O.H.B., Santos, GPD., Hattori, L.: Mining disjunctive association rules (2008)
20.
Zurück zum Zitat Shintani, T., Kitsuregawa, M.: Mining algorithms for sequential patterns in parallel: Hash based approach. In: Proc. of the Second Pacific\(-\)Asia Conference on Knowledge Discovery and Data mining, pp. 283–294 (1998) Shintani, T., Kitsuregawa, M.: Mining algorithms for sequential patterns in parallel: Hash based approach. In: Proc. of the Second Pacific\(-\)Asia Conference on Knowledge Discovery and Data mining, pp. 283–294 (1998)
21.
Zurück zum Zitat Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Proc. of the International Conference on Extending Database Technology (EDBT) (1996) Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Proc. of the International Conference on Extending Database Technology (EDBT) (1996)
22.
Zurück zum Zitat Wang, K., Qi, Y., Fox, J., Stan, M., Skadron, K.: Association rule mining with the micron automata processor. In: Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015) Wang, K., Qi, Y., Fox, J., Stan, M., Skadron, K.: Association rule mining with the micron automata processor. In: Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015)
23.
Zurück zum Zitat Wang, K., Sadredini, E., Skadron, K.: Sequential pattern mining with the micron automata processor. In: Proc. of the ACM International Conference on Computing Frontiers, ACM, New York, NY, USA, CF ’16 (2016a) Wang, K., Sadredini, E., Skadron, K.: Sequential pattern mining with the micron automata processor. In: Proc. of the ACM International Conference on Computing Frontiers, ACM, New York, NY, USA, CF ’16 (2016a)
24.
Zurück zum Zitat Wang, M.H., et al.: Using the automata processor for fast pattern recognition in high energy physics experimentsa proof of concept. Nucl. Instrum. Methods Phys. Res. Sect. A: Accel., Spectrom., Detect. Assoc. Equip. 832, 219–230 (2016b)CrossRef Wang, M.H., et al.: Using the automata processor for fast pattern recognition in high energy physics experimentsa proof of concept. Nucl. Instrum. Methods Phys. Res. Sect. A: Accel., Spectrom., Detect. Assoc. Equip. 832, 219–230 (2016b)CrossRef
25.
Zurück zum Zitat Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. (TKDE) 12(3), 372–390 (2000)CrossRef Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. (TKDE) 12(3), 372–390 (2000)CrossRef
26.
Zurück zum Zitat Zaki, M.J.: Parallel sequence mining on shared-memory machines. J. Parallel Distrib. Comput. 61(3), 401–426 (2001a)CrossRefMATH Zaki, M.J.: Parallel sequence mining on shared-memory machines. J. Parallel Distrib. Comput. 61(3), 401–426 (2001a)CrossRefMATH
27.
Zurück zum Zitat Zaki, M.J.: Spade: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001b)CrossRefMATH Zaki, M.J.: Spade: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001b)CrossRefMATH
28.
Zurück zum Zitat Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. 66(1), 94–117 (2013)CrossRef Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. 66(1), 94–117 (2013)CrossRef
29.
Zurück zum Zitat Zu, Y. et al.: GPU-based NFA implementation for memory efficient high speed regular expression matching. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM, pp. 129–140 (2012) Zu, Y. et al.: GPU-based NFA implementation for memory efficient high speed regular expression matching. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM, pp. 129–140 (2012)
Metadaten
Titel
Hierarchical Pattern Mining with the Automata Processor
verfasst von
Ke Wang
Elaheh Sadredini
Kevin Skadron
Publikationsdatum
19.01.2017
Verlag
Springer US
Erschienen in
International Journal of Parallel Programming / Ausgabe 2/2018
Print ISSN: 0885-7458
Elektronische ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0489-y

Weitere Artikel der Ausgabe 2/2018

International Journal of Parallel Programming 2/2018 Zur Ausgabe