Skip to main content
Erschienen in: The Journal of Supercomputing 2/2016

01.02.2016

A sparse memory allocation data structure for sequential and parallel association rule mining

verfasst von: Ömer M. Soysal, Eera Gupta, Harisha Donepudi

Erschienen in: The Journal of Supercomputing | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we present a sparse memory allocation data structure for sequential and parallel data mining. We explored three algorithms utilizing the proposed data structure: MASP-tree, apriori-TID, and FP-growth. We modified the data structure of apriori-TID and FP-growth algorithms to reduce memory allocation cost. Five data sets are used for comparison. The results show that the modified apriori-TID has a higher speed-up than the modified FP-growth when the proposed data structure is used. A maximum speed-up of 3.42 is observed when MASP algorithm is tested.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Agrawal A, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Chile, pp 487–499 Agrawal A, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Chile, pp 487–499
2.
Zurück zum Zitat Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6):962–969CrossRef Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6):962–969CrossRef
3.
Zurück zum Zitat Appice A, Ceci M, Turi A, Malerba D (2011) A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets. Intell Data Anal 15:69–88 Appice A, Ceci M, Turi A, Malerba D (2011) A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets. Intell Data Anal 15:69–88
6.
Zurück zum Zitat Cheung DW, Lee SD, Xiao Y (2002) Effect of data skewness and workload balance in parallel data mining. IEEE Trans Knowl Data Eng 14(3):498–514CrossRef Cheung DW, Lee SD, Xiao Y (2002) Effect of data skewness and workload balance in parallel data mining. IEEE Trans Knowl Data Eng 14(3):498–514CrossRef
8.
Zurück zum Zitat Fakhrahmad SM, Dastghaibyfard G (2011) An efficient frequent pattern mining method and its parallelization in transactional databases. J Inf Sci Eng 27:511–525 Fakhrahmad SM, Dastghaibyfard G (2011) An efficient frequent pattern mining method and its parallelization in transactional databases. J Inf Sci Eng 27:511–525
9.
Zurück zum Zitat Garg R, Mishra PK (2009) Some observations of sequential, parallel and distributed association rule mining algorithms. In: International Conference on Computer and Automation Engineering, pp 336–342. doi:10.1109/ICCAE.2009.28 Garg R, Mishra PK (2009) Some observations of sequential, parallel and distributed association rule mining algorithms. In: International Conference on Computer and Automation Engineering, pp 336–342. doi:10.​1109/​ICCAE.​2009.​28
10.
11.
Zurück zum Zitat Haglin D, Mayes KR, Manning AM, Feo J, Gurd JR, Elliot M, Keane JA (2009) Factors affecting the performance of parallel mining of minimal unique itemsets on diverse architectures. Concurr Comput Pract Exp 21(9):1131–1158CrossRef Haglin D, Mayes KR, Manning AM, Feo J, Gurd JR, Elliot M, Keane JA (2009) Factors affecting the performance of parallel mining of minimal unique itemsets on diverse architectures. Concurr Comput Pract Exp 21(9):1131–1158CrossRef
12.
Zurück zum Zitat Han E-H, Karypis G, Kumar V (2000) Scalable parallel data mining for association rules. IEEE Trans Knowl Data Eng 12(3):337–352CrossRef Han E-H, Karypis G, Kumar V (2000) Scalable parallel data mining for association rules. IEEE Trans Knowl Data Eng 12(3):337–352CrossRef
13.
Zurück zum Zitat Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8:53–87CrossRefMathSciNet Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8:53–87CrossRefMathSciNet
14.
Zurück zum Zitat HSRG (2014) Highway Safety Research Group HSRG (2014) Highway Safety Research Group
15.
Zurück zum Zitat Javed A, Khokhar A (2004) Frequent pattern mining on message passing multiprocessor systems. Distrib Parallel Databases 16(3):321–334CrossRef Javed A, Khokhar A (2004) Frequent pattern mining on message passing multiprocessor systems. Distrib Parallel Databases 16(3):321–334CrossRef
16.
Zurück zum Zitat Kambadur P, Ghoting A, Gupta A, Lumsdaine A (2012) Extending task parallelism for frequent pattern mining. CoRR, abs/1211.1658. arXiv:1211.1658v1[cs.DC] Kambadur P, Ghoting A, Gupta A, Lumsdaine A (2012) Extending task parallelism for frequent pattern mining. CoRR, abs/1211.1658. arXiv:​1211.​1658v1[cs.DC]
17.
Zurück zum Zitat Kambadur P, Gupta A, Ghoting A, Avron H, Lumsdaine A (2009) PFunc: modern task parallelism for modern high performance computing. Proc Conf High Perform Comput Netw Storage Anal. doi:10.1145/1654059.1654103 Kambadur P, Gupta A, Ghoting A, Avron H, Lumsdaine A (2009) PFunc: modern task parallelism for modern high performance computing. Proc Conf High Perform Comput Netw Storage Anal. doi:10.​1145/​1654059.​1654103
19.
Zurück zum Zitat Liu L, Li E, Zhang Y, Tang Z (2007) Optimization of frequent itemset mining on multiple-core processor. In: Proceedings of the 33rd international conference on very large data bases, pp 1275–1285 Liu L, Li E, Zhang Y, Tang Z (2007) Optimization of frequent itemset mining on multiple-core processor. In: Proceedings of the 33rd international conference on very large data bases, pp 1275–1285
20.
Zurück zum Zitat Negrevergne B, Termier A, Mehaut J, Uno T (2010) Discovering closed frequent itemsets on multicore: parallelizing computations and optimizing memory accesses. In: IEEE international conference on high performance computing and simulation (HPCS), pp 521–528 Negrevergne B, Termier A, Mehaut J, Uno T (2010) Discovering closed frequent itemsets on multicore: parallelizing computations and optimizing memory accesses. In: IEEE international conference on high performance computing and simulation (HPCS), pp 521–528
21.
Zurück zum Zitat Nguyen D, Vo B, Le B (2014) Efficient strategies for parallel mining class association rules. Expert Syst Appl 41(10):4716–4729CrossRef Nguyen D, Vo B, Le B (2014) Efficient strategies for parallel mining class association rules. Expert Syst Appl 41(10):4716–4729CrossRef
22.
Zurück zum Zitat Ozkural E, Ucar B, Aykanat C (2011) Parallel frequent item set mining with selective item replication. IEEE Trans Parallel Distrib Syst 22(10):1632–1640CrossRef Ozkural E, Ucar B, Aykanat C (2011) Parallel frequent item set mining with selective item replication. IEEE Trans Parallel Distrib Syst 22(10):1632–1640CrossRef
23.
Zurück zum Zitat Shanthi MM, Irudhayaraj AA (2009) Multithreading—an efficient technique for enhancing application performance. Int J Recent Trends Eng 165–167 Shanthi MM, Irudhayaraj AA (2009) Multithreading—an efficient technique for enhancing application performance. Int J Recent Trends Eng 165–167
24.
25.
Zurück zum Zitat Sohrabi MK, Barforoush AA (2013) Parallel frequent itemset mining using systolic arrays. Knowl Based Syst 37:462–471CrossRef Sohrabi MK, Barforoush AA (2013) Parallel frequent itemset mining using systolic arrays. Knowl Based Syst 37:462–471CrossRef
26.
Zurück zum Zitat Souliou D, Pagourtzis A, Drosinos N, Tsanakas P (2006) Computing frequent itemsets in parallel using partial support trees. J Syst Softw 79(12):1735–1743CrossRef Souliou D, Pagourtzis A, Drosinos N, Tsanakas P (2006) Computing frequent itemsets in parallel using partial support trees. J Syst Softw 79(12):1735–1743CrossRef
27.
Zurück zum Zitat Soysal ÖM (2015) Association rule mining with mostly associated sequential patterns. Expert Syst Appl 42(5):2582–2592CrossRef Soysal ÖM (2015) Association rule mining with mostly associated sequential patterns. Expert Syst Appl 42(5):2582–2592CrossRef
28.
Zurück zum Zitat Strack B, DeShazo JP, Gennings C, Olmo JL, Ventura S (2014) Impact of hba1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Res Int. doi:10.1155/2014/781670 Strack B, DeShazo JP, Gennings C, Olmo JL, Ventura S (2014) Impact of hba1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Res Int. doi:10.​1155/​2014/​781670
30.
Zurück zum Zitat Yu KM, Zhou J (2010) Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst Appl 37(3):2486–2494CrossRefMathSciNet Yu KM, Zhou J (2010) Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst Appl 37(3):2486–2494CrossRefMathSciNet
31.
Zurück zum Zitat Yu K-M, Zhou J, Hong T-P, Zhou J-L (2010) A load-balanced distributed parallel mining algorithm. Expert Syst Appl 37(3):2459–2464CrossRef Yu K-M, Zhou J, Hong T-P, Zhou J-L (2010) A load-balanced distributed parallel mining algorithm. Expert Syst Appl 37(3):2459–2464CrossRef
32.
Zurück zum Zitat Zaki M, Parthasarathy S, Ogihara M (1997) Parallel algorithms for discovery of association rules. Data Min Knowl Discov 1:343–373CrossRef Zaki M, Parthasarathy S, Ogihara M (1997) Parallel algorithms for discovery of association rules. Data Min Knowl Discov 1:343–373CrossRef
33.
Zurück zum Zitat Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4):14–25CrossRef Zaki MJ (1999) Parallel and distributed association mining: a survey. IEEE Concurr 7(4):14–25CrossRef
Metadaten
Titel
A sparse memory allocation data structure for sequential and parallel association rule mining
verfasst von
Ömer M. Soysal
Eera Gupta
Harisha Donepudi
Publikationsdatum
01.02.2016
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 2/2016
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1566-x

Weitere Artikel der Ausgabe 2/2016

The Journal of Supercomputing 2/2016 Zur Ausgabe