Top

Cluster Computing

Published in:

03-01-2018

Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

Authors: Chun-Hong Huang, Yungho Leu

Published in: Cluster Computing | Special Issue 2/2019

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Frequent Itemset mining is time consuming for large datasets. Many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm on a cluster of personal computers. To facilitate parallel frequent itemset mining, we use prefix path based method to decompose a transactional dataset into its frequent 1-itemset sub-datasets. We called the parallel frequent itemset mining algorithm based on the frequent 1-itemset sub-dataset decomposition the single-level parallel frequent itemset mining algorithm (SLPFIM) in our PC cluster platform. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm to further decompose the time-consuming 1-itemset sub-datasets into their corresponding sub-sub-datasets. The fine granule of the sub-sub-datasets enhances the load balancing in parallel frequent itemset mining. The experimental results showed that the SLPFIM offered a maximum of 11.9x speedup over the non-parallel execution of the FP-Growth algorithm while the MLPFIM achieved a maximum of 23.1x speedup over the non-parallel execution of the FP-Growth algorithm. The experimental results also showed that the MLPFIM offered a maximum of 2.14x speedup over the SLPFIM.

previous article The optimization and simulation analysis of the enterprise’s decision making system with the interference of the consumption strategy

next article A community detection algorithm based on multi-similarity method

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. Paper presented at the ACM SIGMOD (1993)

Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29, 1–12 (2000)

Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

Wur, S.-Y., Leu, Y.: An effective boolean algorithm for mining association rules in large databases. Paper presented at the sixth international conference on database systems for advanced applications, Hsinchu, Taiwan (1999)

Zaiane, O.R., El-Hajj, M., Lu, P.: Fast parallel association rule mining without candidacy generation. Paper presented at the data mining, 2001. Proceedings IEEE International Conference on ICDM 2001 (2001)

Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl.-Based Syst. 20(4), 329–335 (2007)

Grahne, G., Zhu, J.: Efficiently using prefix-trees in mining frequent itemsets. Paper presented at the workshop frequent item set mining implementations (FIMI 2003). Melbourne (2003)

Rácz, B.: Nonordfp: an FP-growth variation without rebuilding the FP-tree. paper presented at the 2nd international workshop on frequent itemset mining implementations (FIMI 2004), Brighton (2004)

Goethals, B., Zaki, M.J.: Advances in Frequent Itemset Mining Implementations: Report on FIMI’03. from http://fimi.cs.helsinki.fi (2003)

10.

Javed, A., Khokhar, A.: Frequent pattern mining on message passing multiprocessor systems. Distrib. Parallel Databases 16, 321–334 (2004)

11.

Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. Paper presented at the DaMoN ’09 Proceedings of the fifth international workshop on data management on new hardware Providence, RI, USA, June 28–28 (2009)

12.

Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. (2013)

13.

Zhou, J., Kun-Ming, Y., Bin-Chang, W.: Parallel frequent patterns mining algorithm on GPU. Paper presented at the systems man, 10–13 Oct. 2010 (2010)

14.

Özdogan, G., Abul, O.: Task-parallel FP-growth on cluster computers. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds.) Computer and Information Sciences, vol. 62, pp. 383–388. Springer, Dordrecht (2010)

15.

Pramudiono, I., Kitsuregawa, M.: Parallel FP-growth on PC cluster. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) Advances in Knowledge Discovery and Data Mining, vol. 2637, pp. 467–473. Springer, Berlin (2003)

16.

Yu, K.-M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37, 2486–2494 (2010)

17.

Huang, C.-H., Leu, Y.: A LINQ-based conditional pattern collection algorithm for parallel frequent itemset mining on a multi-core computer. Paper Presented at the Proceedings of the ASE BigData & SocialInformatics 2015, Kaohsiung, Taiwan (2015)

18.

Liu, L., Li, E., Zhang, Y., Tang, Z.: Optimization of frequent itemset mining on multiple-core processor. Paper Presented at the Very Large Data Base (2007)

19.

Vu, L., Alaghband, G.: Novel parallel method for mining frequent patterns on multi-core shared memory systems. Paper Presented at the Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems (2013)

20.

Hadoop. (2015). Hadoop. from http://zh.wikipedia.org/wiki/Apache_Hadoop

21.

Farzanyar, Z., Cercone, N.: Accelerating frequent itemsets mining on the cloud: a MapReduce-based approach. Paper Presented at the proceedings of the 2013 IEEE 13th international conference on data mining workshops (2013)

22.

Le, Z., Zhiyong, Z., Jin, C., Junjie, L., Joshua Zhexue, H., Shengzhong, F.: Balanced parallel FP-growth with MapReduce. Paper presented at the information computing and telecommunications (YC-ICT), 2010 IEEE youth conference on, 28–30 Nov. 2010 (2010)

23.

Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. Paper presented at the proceedings of the 2008 ACM conference on recommender systems, Lausanne, Switzerland (2008)

24.

Li, N., Zeng, L., He, Q., Shi, Z.: Parallel Implementation of Apriori Algorithm Based on MapReduce. Paper presented at the Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 2012 13th ACIS International Conference on 8-10 Aug. 2012 (2012)

25.

Moens, S., Aksehirli, E., Goethals, B.: Frequent Itemset Mining for Big Data. Paper presented at the Big Data, 2013 IEEE International Conference on, 6–9 Oct. 2013 (2013)

26.

Xun, Y., Zhang, J., Qin, X., Zhao, X.: FiDoop-DP: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017)

27.

Chen, Lin, Junzhong, Gu: PFIN: a parallel frequent itemset mining algorithm using nodesets. Int. J. Database Theory Appl. 9(6), 81–92 (2016)

28.

Ozkural, E., Ucar, B., Aykanat, C.: Parallel Frequent Item Set Mining with Selective Item Replication. IEEE Trans. Parallel Distrib. Syst. 22(10), October 20 (2011)

29.

Joy, R., Sherly, K.K.: Parallel frequent itemset mining with spark RDD framework for disease prediction. 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016)

30.

Goethals, B. (2015). FIMI repository. from http://fimi.ua.ac.be/data/

31.

Borgelt, C. (2015). Christian Borgelt’s Web Pages from http://www.borgelt.net/

Title: Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers
Authors: Chun-Hong Huang
Yungho Leu
Publication date: 03-01-2018
Publisher: Springer US
Published in: Cluster Computing / Issue Special Issue 2/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1609-6

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Special Issue 2/2019

Grid Connected Photovoltaic Power Generation Control Method Based on Swarm Intelligence Algorithm

On the sub-low contact velocity control of rendezvous docking for smart world

Optimal design of transportation distance in logistics supply chain model based on data mining algorithm

Utilization of compact genetic algorithm for optimal shortest path selection to improve the throughput in mobile Ad-Hoc networks

Closed-loop subspace identification of multivariable dynamic errors-in-variables models based on ORT

A study on supervised machine learning algorithm to improvise intrusion detection systems for mobile ad hoc networks

Premium Partner