BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Authors: Kang-Wook Chon, Min-Soo Kim

Published in: Cluster Computing | Issue 3/2018

Published: 9 February 2018

Abstract

Frequent itemset mining is widely used as a fundamental data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed of sequential mining methods. However, the existing MapReduce-based methods still scale poorly because of high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting using only transaction chunks and bitwise operations, without generating or shuffling intermediate data. As a result, BIGMiner achieves very high scalability, with no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments on large-scale datasets of up to 6.5 billion transactions, we show that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems.
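
The idea of counting support from equal-sized chunks with bitwise operations can be illustrated with a minimal single-machine sketch. This is not the authors' implementation and involves no MapReduce: the function names (`build_chunks`, `chunk_support`, `support`), the per-item bitmap layout, and the toy data are all assumptions made for illustration. Each chunk stores one bitmap per item, and the partial support of an itemset in a chunk is the number of set bits in the bitwise AND of its item bitmaps.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical layout: item -> bitmap, where bit i is set if the
# i-th transaction of the chunk contains the item.
Chunk = Dict[str, int]


def build_chunks(transactions: List[FrozenSet[str]], chunk_size: int) -> List[Chunk]:
    """Split the database into chunks of chunk_size transactions
    (the last chunk may be smaller) and build per-item bitmaps."""
    chunks: List[Chunk] = []
    for start in range(0, len(transactions), chunk_size):
        chunk: Chunk = {}
        for offset, transaction in enumerate(transactions[start:start + chunk_size]):
            for item in transaction:
                chunk[item] = chunk.get(item, 0) | (1 << offset)
        chunks.append(chunk)
    return chunks


def chunk_support(chunk: Chunk, itemset: List[str]) -> int:
    """Partial support in one chunk: AND the item bitmaps, count set bits."""
    if not itemset:
        return 0
    combined = chunk.get(itemset[0], 0)
    for item in itemset[1:]:
        combined &= chunk.get(item, 0)
    return bin(combined).count("1")


def support(chunks: List[Chunk], itemset: Iterable[str]) -> int:
    """Total support: sum of the partial supports over all chunks."""
    items = list(itemset)
    return sum(chunk_support(chunk, items) for chunk in chunks)


# Toy database, assumed for illustration only.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
]
chunks = build_chunks(transactions, chunk_size=2)
print(support(chunks, ["a", "c"]))
```

Because each chunk yields only a single integer per candidate itemset, the per-chunk results can be summed independently, which is the property that makes a chunk-based scheme attractive for distribution in this sketch.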

Title: BIGMiner: a fast and scalable distributed frequent pattern miner for big data
Authors: Kang-Wook Chon, Min-Soo Kim
Publication date: 9 February 2018
Publisher: Springer US
Published in: Cluster Computing / Issue 3/2018
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-018-1812-0
