Skip to main content

01.10.2013

Accelerating frequent itemset mining on graphics processing units

verfasst von: Fan Zhang, Yan Zhang, Jason D. Bakos

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” This implementation is optimized to achieve high performance on a heterogeneous platform consisting of a shared memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalent Class Clustering (Eclat) method, in which a partial breadth-first search is utilized to exploit maximum parallelism while being constrained by the available memory capacity. In our approach, the vertical transaction lists are represented using a “bitset” representation and operated using wide bitwise operations across multiple threads on a GPU. We evaluate our approach using four NVIDIA Tesla GPUs and observed a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8:962–969 CrossRef Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8:962–969 CrossRef
2.
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc of 20th intl conf on VLDB, pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc of 20th intl conf on VLDB, pp 487–499
3.
Zurück zum Zitat Ansari E, Dastghaibifard G (2008) Distributed frequent itemset mining using trie data structure. Int J Comput Sci 35(3):377–381 Ansari E, Dastghaibifard G (2008) Distributed frequent itemset mining using trie data structure. Int J Comput Sci 35(3):377–381
4.
Zurück zum Zitat Aouad LM, Na L-k (2007) Distributed frequent itemsets mining in heterogeneous platforms. J Eng Comput Arch 1(2), ISSN: 1934–7197 Aouad LM, Na L-k (2007) Distributed frequent itemsets mining in heterogeneous platforms. J Eng Comput Arch 1(2), ISSN: 1934–7197
6.
Zurück zum Zitat Bodon F (2005) A trie-based APRIORI implementation for mining frequent item sequences. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, OSDM ’05. ACM Press, New York, pp 56–65 CrossRef Bodon F (2005) A trie-based APRIORI implementation for mining frequent item sequences. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, OSDM ’05. ACM Press, New York, pp 56–65 CrossRef
7.
Zurück zum Zitat Borgelt C (2003) Efficient implementations of apriori and eclat. In: Proc 1st IEEE ICDM workshop on frequent item set mining implementations (FIMI 2003), pp 90–99 Borgelt C (2003) Efficient implementations of apriori and eclat. In: Proc 1st IEEE ICDM workshop on frequent item set mining implementations (FIMI 2003), pp 90–99
8.
Zurück zum Zitat Borgelt C, Kruse R (2002) Induction of association rules: apriori implementation. In: 15th conference on computational statistics, pp 395–400 Borgelt C, Kruse R (2002) Induction of association rules: apriori implementation. In: 15th conference on computational statistics, pp 395–400
9.
Zurück zum Zitat Burdick D, Calimlim M (2001) Mafia: a maximal frequent itemset algorithm for transactional databases. In: Proceedings 17th international conference on data engineering, pp 443–452 CrossRef Burdick D, Calimlim M (2001) Mafia: a maximal frequent itemset algorithm for transactional databases. In: Proceedings 17th international conference on data engineering, pp 443–452 CrossRef
10.
Zurück zum Zitat Craus M (2008) A new parallel algorithm for the frequent itemset mining problem. In: International symposium on parallel and distributed computing, 2008, ISPDC ’08, pp 165–170 CrossRef Craus M (2008) A new parallel algorithm for the frequent itemset mining problem. In: International symposium on parallel and distributed computing, 2008, ISPDC ’08, pp 165–170 CrossRef
11.
Zurück zum Zitat Fang W, Lu M (2009) Frequent itemset mining on graphics processors. In: Proceedings of the fifth international workshop on data management on new hardware, DaMoN ’09. ACM Press, New York, pp 34–42 CrossRef Fang W, Lu M (2009) Frequent itemset mining on graphics processors. In: Proceedings of the fifth international workshop on data management on new hardware, DaMoN ’09. ACM Press, New York, pp 34–42 CrossRef
12.
Zurück zum Zitat Fiat A, Shporer S (2003) AIM: another itemset miner. In: IEEE ICDM workshop on frequent itemset mining implementations (FIMI’03) Fiat A, Shporer S (2003) AIM: another itemset miner. In: IEEE ICDM workshop on frequent itemset mining implementations (FIMI’03)
13.
Zurück zum Zitat Goethals B, Zaki MJ (2004) Advances in frequent itemset mining implementations: report on fimi’03. ACM SIGKDD Explor Newsl 6(1):109–117 CrossRef Goethals B, Zaki MJ (2004) Advances in frequent itemset mining implementations: report on fimi’03. ACM SIGKDD Explor Newsl 6(1):109–117 CrossRef
14.
Zurück zum Zitat Han J, Pei J (2004) Mining frequent patterns without candidate generation: a Frequent-Pattern tree approach. Data Min Knowl Discov 8:53–87 MathSciNetCrossRef Han J, Pei J (2004) Mining frequent patterns without candidate generation: a Frequent-Pattern tree approach. Data Min Knowl Discov 8:53–87 MathSciNetCrossRef
15.
Zurück zum Zitat Kosters WA, Pijls W (2003) APRIORI, a depth first implementation. In: Proc of the workshop on frequent itemset mining implementations Kosters WA, Pijls W (2003) APRIORI, a depth first implementation. In: Proc of the workshop on frequent itemset mining implementations
16.
Zurück zum Zitat Liu L, Li E (2007) Optimization of frequent itemset mining on Multiple-Core processor. In: VLDB ’07, pp 1275–1285 Liu L, Li E (2007) Optimization of frequent itemset mining on Multiple-Core processor. In: VLDB ’07, pp 1275–1285
17.
Zurück zum Zitat NVIDIA (2011) NVIDIA CUDA compute unified device architecture programming guide. NVIDIA, Santa Clara NVIDIA (2011) NVIDIA CUDA compute unified device architecture programming guide. NVIDIA, Santa Clara
18.
Zurück zum Zitat Parthasarathy S, Zaki MJ (1996) Parallel data mining for association rules on shared-memory multiprocessors. In: Proc Supercomputing’96, pp 43–64 Parthasarathy S, Zaki MJ (1996) Parallel data mining for association rules on shared-memory multiprocessors. In: Proc Supercomputing’96, pp 43–64
19.
Zurück zum Zitat Pramudiono I, Kitsuregawa M (2003) Parallel FP-Growth on PC cluster. In: Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 2637. Springer, Berlin/Heidelberg, pp 467–473 CrossRef Pramudiono I, Kitsuregawa M (2003) Parallel FP-Growth on PC cluster. In: Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 2637. Springer, Berlin/Heidelberg, pp 467–473 CrossRef
20.
Zurück zum Zitat Salvatore O, Claudio L (2003) kdci: a multi-strategy algorithm for mining frequent sets. In: Goethals B, Zaki MJ (eds) FIMI 03, frequent itemset mining implementations. Proceedings of the ICDM 2003 workshop on frequent itemset mining implementations, 19 December 2003, Melbourne, Florida, USA, CEUR-WS.org, CEUR workshop proceedings, vol 90 Salvatore O, Claudio L (2003) kdci: a multi-strategy algorithm for mining frequent sets. In: Goethals B, Zaki MJ (eds) FIMI 03, frequent itemset mining implementations. Proceedings of the ICDM 2003 workshop on frequent itemset mining implementations, 19 December 2003, Melbourne, Florida, USA, CEUR-WS.org, CEUR workshop proceedings, vol 90
21.
Zurück zum Zitat Sucahyo YG, Gopalan RP (2003) Efficiently mining frequent patterns from dense datasets using a cluster of computers. In: Australian conference on artificial intelligence’03, pp 233–244 Sucahyo YG, Gopalan RP (2003) Efficiently mining frequent patterns from dense datasets using a cluster of computers. In: Australian conference on artificial intelligence’03, pp 233–244
22.
Zurück zum Zitat Ye Y, Chiang C (2006) A parallel apriori algorithm for frequent itemsets mining. In: Fourth international conference on software engineering research, management and applications, pp 87–94 Ye Y, Chiang C (2006) A parallel apriori algorithm for frequent itemsets mining. In: Fourth international conference on software engineering research, management and applications, pp 87–94
23.
Zurück zum Zitat Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proc SIGKDD, pp 326–335 Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proc SIGKDD, pp 326–335
24.
Zurück zum Zitat Zaki MJ, Parthasarathyi S (1997) New algorithms for fast discovery of association rules. In: 3rd intl conf on knowledge discovery and data mining. AAAI Press, Menlo Park, pp 283–286 Zaki MJ, Parthasarathyi S (1997) New algorithms for fast discovery of association rules. In: 3rd intl conf on knowledge discovery and data mining. AAAI Press, Menlo Park, pp 283–286
25.
Zurück zum Zitat Zhang F, Zhang Y, Bakos J (2012) Gpapriori: Gpu-accelerated frequent itemset mining. In: IEEE international conference on cluster computing, pp 590–594 Zhang F, Zhang Y, Bakos J (2012) Gpapriori: Gpu-accelerated frequent itemset mining. In: IEEE international conference on cluster computing, pp 590–594
Metadaten
Titel
Accelerating frequent itemset mining on graphics processing units
verfasst von
Fan Zhang
Yan Zhang
Jason D. Bakos
Publikationsdatum
01.10.2013
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2013
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-0887-x