Skip to main content
Erschienen in: Cluster Computing 2/2019

03.01.2018

Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

verfasst von: Chun-Hong Huang, Yungho Leu

Erschienen in: Cluster Computing | Sonderheft 2/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Frequent Itemset mining is time consuming for large datasets. Many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm on a cluster of personal computers. To facilitate parallel frequent itemset mining, we use prefix path based method to decompose a transactional dataset into its frequent 1-itemset sub-datasets. We called the parallel frequent itemset mining algorithm based on the frequent 1-itemset sub-dataset decomposition the single-level parallel frequent itemset mining algorithm (SLPFIM) in our PC cluster platform. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm to further decompose the time-consuming 1-itemset sub-datasets into their corresponding sub-sub-datasets. The fine granule of the sub-sub-datasets enhances the load balancing in parallel frequent itemset mining. The experimental results showed that the SLPFIM offered a maximum of 11.9x speedup over the non-parallel execution of the FP-Growth algorithm while the MLPFIM achieved a maximum of 23.1x speedup over the non-parallel execution of the FP-Growth algorithm. The experimental results also showed that the MLPFIM offered a maximum of 2.14x speedup over the SLPFIM.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. Paper presented at the ACM SIGMOD (1993) Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. Paper presented at the ACM SIGMOD (1993)
2.
Zurück zum Zitat Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29, 1–12 (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29, 1–12 (2000)
3.
Zurück zum Zitat Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000) Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
4.
Zurück zum Zitat Wur, S.-Y., Leu, Y.: An effective boolean algorithm for mining association rules in large databases. Paper presented at the sixth international conference on database systems for advanced applications, Hsinchu, Taiwan (1999) Wur, S.-Y., Leu, Y.: An effective boolean algorithm for mining association rules in large databases. Paper presented at the sixth international conference on database systems for advanced applications, Hsinchu, Taiwan (1999)
5.
Zurück zum Zitat Zaiane, O.R., El-Hajj, M., Lu, P.: Fast parallel association rule mining without candidacy generation. Paper presented at the data mining, 2001. Proceedings IEEE International Conference on ICDM 2001 (2001) Zaiane, O.R., El-Hajj, M., Lu, P.: Fast parallel association rule mining without candidacy generation. Paper presented at the data mining, 2001. Proceedings IEEE International Conference on ICDM 2001 (2001)
6.
Zurück zum Zitat Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl.-Based Syst. 20(4), 329–335 (2007) Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl.-Based Syst. 20(4), 329–335 (2007)
7.
Zurück zum Zitat Grahne, G., Zhu, J.: Efficiently using prefix-trees in mining frequent itemsets. Paper presented at the workshop frequent item set mining implementations (FIMI 2003). Melbourne (2003) Grahne, G., Zhu, J.: Efficiently using prefix-trees in mining frequent itemsets. Paper presented at the workshop frequent item set mining implementations (FIMI 2003). Melbourne (2003)
8.
Zurück zum Zitat Rácz, B.: Nonordfp: an FP-growth variation without rebuilding the FP-tree. paper presented at the 2nd international workshop on frequent itemset mining implementations (FIMI 2004), Brighton (2004) Rácz, B.: Nonordfp: an FP-growth variation without rebuilding the FP-tree. paper presented at the 2nd international workshop on frequent itemset mining implementations (FIMI 2004), Brighton (2004)
10.
Zurück zum Zitat Javed, A., Khokhar, A.: Frequent pattern mining on message passing multiprocessor systems. Distrib. Parallel Databases 16, 321–334 (2004) Javed, A., Khokhar, A.: Frequent pattern mining on message passing multiprocessor systems. Distrib. Parallel Databases 16, 321–334 (2004)
11.
Zurück zum Zitat Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. Paper presented at the DaMoN ’09 Proceedings of the fifth international workshop on data management on new hardware Providence, RI, USA, June 28–28 (2009) Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. Paper presented at the DaMoN ’09 Proceedings of the fifth international workshop on data management on new hardware Providence, RI, USA, June 28–28 (2009)
12.
Zurück zum Zitat Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. (2013) Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. (2013)
13.
Zurück zum Zitat Zhou, J., Kun-Ming, Y., Bin-Chang, W.: Parallel frequent patterns mining algorithm on GPU. Paper presented at the systems man, 10–13 Oct. 2010 (2010) Zhou, J., Kun-Ming, Y., Bin-Chang, W.: Parallel frequent patterns mining algorithm on GPU. Paper presented at the systems man, 10–13 Oct. 2010 (2010)
14.
Zurück zum Zitat Özdogan, G., Abul, O.: Task-parallel FP-growth on cluster computers. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds.) Computer and Information Sciences, vol. 62, pp. 383–388. Springer, Dordrecht (2010) Özdogan, G., Abul, O.: Task-parallel FP-growth on cluster computers. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds.) Computer and Information Sciences, vol. 62, pp. 383–388. Springer, Dordrecht (2010)
15.
Zurück zum Zitat Pramudiono, I., Kitsuregawa, M.: Parallel FP-growth on PC cluster. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) Advances in Knowledge Discovery and Data Mining, vol. 2637, pp. 467–473. Springer, Berlin (2003) Pramudiono, I., Kitsuregawa, M.: Parallel FP-growth on PC cluster. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) Advances in Knowledge Discovery and Data Mining, vol. 2637, pp. 467–473. Springer, Berlin (2003)
16.
Zurück zum Zitat Yu, K.-M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37, 2486–2494 (2010) Yu, K.-M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37, 2486–2494 (2010)
17.
Zurück zum Zitat Huang, C.-H., Leu, Y.: A LINQ-based conditional pattern collection algorithm for parallel frequent itemset mining on a multi-core computer. Paper Presented at the Proceedings of the ASE BigData & SocialInformatics 2015, Kaohsiung, Taiwan (2015) Huang, C.-H., Leu, Y.: A LINQ-based conditional pattern collection algorithm for parallel frequent itemset mining on a multi-core computer. Paper Presented at the Proceedings of the ASE BigData & SocialInformatics 2015, Kaohsiung, Taiwan (2015)
18.
Zurück zum Zitat Liu, L., Li, E., Zhang, Y., Tang, Z.: Optimization of frequent itemset mining on multiple-core processor. Paper Presented at the Very Large Data Base (2007) Liu, L., Li, E., Zhang, Y., Tang, Z.: Optimization of frequent itemset mining on multiple-core processor. Paper Presented at the Very Large Data Base (2007)
19.
Zurück zum Zitat Vu, L., Alaghband, G.: Novel parallel method for mining frequent patterns on multi-core shared memory systems. Paper Presented at the Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems (2013) Vu, L., Alaghband, G.: Novel parallel method for mining frequent patterns on multi-core shared memory systems. Paper Presented at the Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems (2013)
21.
Zurück zum Zitat Farzanyar, Z., Cercone, N.: Accelerating frequent itemsets mining on the cloud: a MapReduce-based approach. Paper Presented at the proceedings of the 2013 IEEE 13th international conference on data mining workshops (2013) Farzanyar, Z., Cercone, N.: Accelerating frequent itemsets mining on the cloud: a MapReduce-based approach. Paper Presented at the proceedings of the 2013 IEEE 13th international conference on data mining workshops (2013)
22.
Zurück zum Zitat Le, Z., Zhiyong, Z., Jin, C., Junjie, L., Joshua Zhexue, H., Shengzhong, F.: Balanced parallel FP-growth with MapReduce. Paper presented at the information computing and telecommunications (YC-ICT), 2010 IEEE youth conference on, 28–30 Nov. 2010 (2010) Le, Z., Zhiyong, Z., Jin, C., Junjie, L., Joshua Zhexue, H., Shengzhong, F.: Balanced parallel FP-growth with MapReduce. Paper presented at the information computing and telecommunications (YC-ICT), 2010 IEEE youth conference on, 28–30 Nov. 2010 (2010)
23.
Zurück zum Zitat Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. Paper presented at the proceedings of the 2008 ACM conference on recommender systems, Lausanne, Switzerland (2008) Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. Paper presented at the proceedings of the 2008 ACM conference on recommender systems, Lausanne, Switzerland (2008)
24.
Zurück zum Zitat Li, N., Zeng, L., He, Q., Shi, Z.: Parallel Implementation of Apriori Algorithm Based on MapReduce. Paper presented at the Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 2012 13th ACIS International Conference on 8-10 Aug. 2012 (2012) Li, N., Zeng, L., He, Q., Shi, Z.: Parallel Implementation of Apriori Algorithm Based on MapReduce. Paper presented at the Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 2012 13th ACIS International Conference on 8-10 Aug. 2012 (2012)
25.
Zurück zum Zitat Moens, S., Aksehirli, E., Goethals, B.: Frequent Itemset Mining for Big Data. Paper presented at the Big Data, 2013 IEEE International Conference on, 6–9 Oct. 2013 (2013) Moens, S., Aksehirli, E., Goethals, B.: Frequent Itemset Mining for Big Data. Paper presented at the Big Data, 2013 IEEE International Conference on, 6–9 Oct. 2013 (2013)
26.
Zurück zum Zitat Xun, Y., Zhang, J., Qin, X., Zhao, X.: FiDoop-DP: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017) Xun, Y., Zhang, J., Qin, X., Zhao, X.: FiDoop-DP: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017)
27.
Zurück zum Zitat Chen, Lin, Junzhong, Gu: PFIN: a parallel frequent itemset mining algorithm using nodesets. Int. J. Database Theory Appl. 9(6), 81–92 (2016) Chen, Lin, Junzhong, Gu: PFIN: a parallel frequent itemset mining algorithm using nodesets. Int. J. Database Theory Appl. 9(6), 81–92 (2016)
28.
Zurück zum Zitat Ozkural, E., Ucar, B., Aykanat, C.: Parallel Frequent Item Set Mining with Selective Item Replication. IEEE Trans. Parallel Distrib. Syst. 22(10), October 20 (2011) Ozkural, E., Ucar, B., Aykanat, C.: Parallel Frequent Item Set Mining with Selective Item Replication. IEEE Trans. Parallel Distrib. Syst. 22(10), October 20 (2011)
29.
Zurück zum Zitat Joy, R., Sherly, K.K.: Parallel frequent itemset mining with spark RDD framework for disease prediction. 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016) Joy, R., Sherly, K.K.: Parallel frequent itemset mining with spark RDD framework for disease prediction. 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016)
Metadaten
Titel
Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers
verfasst von
Chun-Hong Huang
Yungho Leu
Publikationsdatum
03.01.2018
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe Sonderheft 2/2019
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1609-6

Weitere Artikel der Sonderheft 2/2019

Cluster Computing 2/2019 Zur Ausgabe