nach oben

Cluster Computing

Erschienen in:

03.01.2018

Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

verfasst von: Chun-Hong Huang, Yungho Leu

Erschienen in: Cluster Computing | Sonderheft 2/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Frequent Itemset mining is time consuming for large datasets. Many parallel frequent itemset mining algorithms have been proposed to speed up the mining process. This paper presents a parallel frequent itemset mining algorithm on a cluster of personal computers. To facilitate parallel frequent itemset mining, we use prefix path based method to decompose a transactional dataset into its frequent 1-itemset sub-datasets. We called the parallel frequent itemset mining algorithm based on the frequent 1-itemset sub-dataset decomposition the single-level parallel frequent itemset mining algorithm (SLPFIM) in our PC cluster platform. To mitigate the bottleneck caused by time-consuming 1-itemset sub-datasets, we propose a multi-level parallel frequent itemset mining (MLPFIM) algorithm to further decompose the time-consuming 1-itemset sub-datasets into their corresponding sub-sub-datasets. The fine granule of the sub-sub-datasets enhances the load balancing in parallel frequent itemset mining. The experimental results showed that the SLPFIM offered a maximum of 11.9x speedup over the non-parallel execution of the FP-Growth algorithm while the MLPFIM achieved a maximum of 23.1x speedup over the non-parallel execution of the FP-Growth algorithm. The experimental results also showed that the MLPFIM offered a maximum of 2.14x speedup over the SLPFIM.

Vorheriger Artikel The optimization and simulation analysis of the enterprise’s decision making system with the interference of the consumption strategy

Nächster Artikel A community detection algorithm based on multi-similarity method

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. Paper presented at the ACM SIGMOD (1993)

Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29, 1–12 (2000)

Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

Wur, S.-Y., Leu, Y.: An effective boolean algorithm for mining association rules in large databases. Paper presented at the sixth international conference on database systems for advanced applications, Hsinchu, Taiwan (1999)

Zaiane, O.R., El-Hajj, M., Lu, P.: Fast parallel association rule mining without candidacy generation. Paper presented at the data mining, 2001. Proceedings IEEE International Conference on ICDM 2001 (2001)

Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl.-Based Syst. 20(4), 329–335 (2007)

Grahne, G., Zhu, J.: Efficiently using prefix-trees in mining frequent itemsets. Paper presented at the workshop frequent item set mining implementations (FIMI 2003). Melbourne (2003)

Rácz, B.: Nonordfp: an FP-growth variation without rebuilding the FP-tree. paper presented at the 2nd international workshop on frequent itemset mining implementations (FIMI 2004), Brighton (2004)

Goethals, B., Zaki, M.J.: Advances in Frequent Itemset Mining Implementations: Report on FIMI’03. from http://fimi.cs.helsinki.fi (2003)

10.

Javed, A., Khokhar, A.: Frequent pattern mining on message passing multiprocessor systems. Distrib. Parallel Databases 16, 321–334 (2004)

11.

Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. Paper presented at the DaMoN ’09 Proceedings of the fifth international workshop on data management on new hardware Providence, RI, USA, June 28–28 (2009)

12.

Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. (2013)

13.

Zhou, J., Kun-Ming, Y., Bin-Chang, W.: Parallel frequent patterns mining algorithm on GPU. Paper presented at the systems man, 10–13 Oct. 2010 (2010)

14.

Özdogan, G., Abul, O.: Task-parallel FP-growth on cluster computers. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds.) Computer and Information Sciences, vol. 62, pp. 383–388. Springer, Dordrecht (2010)

15.

Pramudiono, I., Kitsuregawa, M.: Parallel FP-growth on PC cluster. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) Advances in Knowledge Discovery and Data Mining, vol. 2637, pp. 467–473. Springer, Berlin (2003)

16.

Yu, K.-M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37, 2486–2494 (2010)

17.

Huang, C.-H., Leu, Y.: A LINQ-based conditional pattern collection algorithm for parallel frequent itemset mining on a multi-core computer. Paper Presented at the Proceedings of the ASE BigData & SocialInformatics 2015, Kaohsiung, Taiwan (2015)

18.

Liu, L., Li, E., Zhang, Y., Tang, Z.: Optimization of frequent itemset mining on multiple-core processor. Paper Presented at the Very Large Data Base (2007)

19.

Vu, L., Alaghband, G.: Novel parallel method for mining frequent patterns on multi-core shared memory systems. Paper Presented at the Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems (2013)

20.

Hadoop. (2015). Hadoop. from http://zh.wikipedia.org/wiki/Apache_Hadoop

21.

Farzanyar, Z., Cercone, N.: Accelerating frequent itemsets mining on the cloud: a MapReduce-based approach. Paper Presented at the proceedings of the 2013 IEEE 13th international conference on data mining workshops (2013)

22.

Le, Z., Zhiyong, Z., Jin, C., Junjie, L., Joshua Zhexue, H., Shengzhong, F.: Balanced parallel FP-growth with MapReduce. Paper presented at the information computing and telecommunications (YC-ICT), 2010 IEEE youth conference on, 28–30 Nov. 2010 (2010)

23.

Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. Paper presented at the proceedings of the 2008 ACM conference on recommender systems, Lausanne, Switzerland (2008)

24.

Li, N., Zeng, L., He, Q., Shi, Z.: Parallel Implementation of Apriori Algorithm Based on MapReduce. Paper presented at the Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 2012 13th ACIS International Conference on 8-10 Aug. 2012 (2012)

25.

Moens, S., Aksehirli, E., Goethals, B.: Frequent Itemset Mining for Big Data. Paper presented at the Big Data, 2013 IEEE International Conference on, 6–9 Oct. 2013 (2013)

26.

Xun, Y., Zhang, J., Qin, X., Zhao, X.: FiDoop-DP: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017)

27.

Chen, Lin, Junzhong, Gu: PFIN: a parallel frequent itemset mining algorithm using nodesets. Int. J. Database Theory Appl. 9(6), 81–92 (2016)

28.

Ozkural, E., Ucar, B., Aykanat, C.: Parallel Frequent Item Set Mining with Selective Item Replication. IEEE Trans. Parallel Distrib. Syst. 22(10), October 20 (2011)

29.

Joy, R., Sherly, K.K.: Parallel frequent itemset mining with spark RDD framework for disease prediction. 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016)

30.

Goethals, B. (2015). FIMI repository. from http://fimi.ua.ac.be/data/

31.

Borgelt, C. (2015). Christian Borgelt’s Web Pages from http://www.borgelt.net/

Titel: Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers
verfasst von: Chun-Hong Huang
Yungho Leu
Publikationsdatum: 03.01.2018
Verlag: Springer US
Erschienen in: Cluster Computing / Ausgabe Sonderheft 2/2019
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1609-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Sonderheft 2/2019

Soft computing approaches based bookmark selection and clustering techniques for social tagging systems

Fuzzy TOPSIS analytic hierarchy process based cooperative development of cultural soft power

Autoregressive moving average model based relationship identification between exchange rate and export trade

Urban public art and interaction design strategy based on digital technology

A trust based security framework with anonymous authentication system using multiple attributes in decentralized cloud

Method for analyzing the strength of sandstone with three intermittent fractures based on discrete fracture model as an approximation