Skip to main content
Erschienen in: Cluster Computing 3/2018

09.02.2018

BIGMiner: a fast and scalable distributed frequent pattern miner for big data

verfasst von: Kang-Wook Chon, Min-Soo Kim

Erschienen in: Cluster Computing | Ausgabe 3/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Frequent itemset mining is widely used as a fundamental data mining technique. Recently, there have been proposed a number of MapReduce-based frequent itemset mining methods in order to overcome the limits on data size and speed of mining that sequential mining methods have. However, the existing MapReduce-based methods still do not have a good scalability due to high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting only based on transaction chunks and bitwise operations without generating and shuffling intermediate data. As a result, BIGMiner achieves very high scalability due to no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments using large-scale datasets of up to 6.5 billion transactions, we have shown that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwal, C.C., Han, J.: Frequent Pattern Mining. Springer, New York (2014)CrossRef Aggarwal, C.C., Han, J.: Frequent Pattern Mining. Springer, New York (2014)CrossRef
7.
Zurück zum Zitat Buehrer, G., de Oliveira, R.L., Fuhry, D., Parthasarathy, S.: Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In: 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 1071–1082. IEEE (2015) Buehrer, G., de Oliveira, R.L., Fuhry, D., Parthasarathy, S.: Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In: 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 1071–1082. IEEE (2015)
8.
Zurück zum Zitat Cheng, L., Kotoulas, S.: Efficient skew handling for outer joins in a cloud computing environment. IEEE Transactions on Cloud Computing (2015) Cheng, L., Kotoulas, S.: Efficient skew handling for outer joins in a cloud computing environment. IEEE Transactions on Cloud Computing (2015)
9.
Zurück zum Zitat Cheng, L., Kotoulas, S., Ward, T.E., Theodoropoulos, G.: Robust and skew-resistant parallel joins in shared-nothing systems. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1399–1408. ACM (2014) Cheng, L., Kotoulas, S., Ward, T.E., Theodoropoulos, G.: Robust and skew-resistant parallel joins in shared-nothing systems. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1399–1408. ACM (2014)
10.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
11.
Zurück zum Zitat Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. In: DaMon, pp. 34–42. ACM (2009) Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. In: DaMon, pp. 34–42. ACM (2009)
13.
Zurück zum Zitat Gonen, Y., Gudes, E.: An improved mapreduce algorithm for mining closed frequent itemsets. In: 2016 IEEE International Conference on Software Science, Technology and Engineering (SWSTE), pp. 77–83. IEEE (2016) Gonen, Y., Gudes, E.: An improved mapreduce algorithm for mining closed frequent itemsets. In: 2016 IEEE International Conference on Software Science, Technology and Engineering (SWSTE), pp. 77–83. IEEE (2016)
14.
Zurück zum Zitat Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)MATH Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)MATH
15.
Zurück zum Zitat Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: ACM SIGMOD Record. vol. 29, pp. 1–12. ACM (2000) Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: ACM SIGMOD Record. vol. 29, pp. 1–12. ACM (2000)
16.
Zurück zum Zitat Kovacs, F., Illés, J.: Frequent itemset mining on hadoop. In: 2013 IEEE 9th International Conference on Computational Cybernetics (ICCC), pp. 241–245. IEEE (2013) Kovacs, F., Illés, J.: Frequent itemset mining on hadoop. In: 2013 IEEE 9th International Conference on Computational Cybernetics (ICCC), pp. 241–245. IEEE (2013)
17.
Zurück zum Zitat Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. In: RecSys, pp. 107–114. ACM (2008) Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: parallel fp-growth for query recommendation. In: RecSys, pp. 107–114. ACM (2008)
18.
Zurück zum Zitat Li, N., Zeng, L., He, Q., Shi, Z.: Parallel implementation of apriori algorithm based on mapreduce. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), pp. 236–241. IEEE (2012) Li, N., Zeng, L., He, Q., Shi, Z.: Parallel implementation of apriori algorithm based on mapreduce. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), pp. 236–241. IEEE (2012)
19.
Zurück zum Zitat Li, X., Han, J., Gonzalez, H.: High-dimensional olap: a minimal cubing approach. In: PVLDB, pp. 528–539. VLDB Endowment (2004) Li, X., Han, J., Gonzalez, H.: High-dimensional olap: a minimal cubing approach. In: PVLDB, pp. 528–539. VLDB Endowment (2004)
20.
Zurück zum Zitat Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on mapreduce. In: ICUIMC, p. 76. ACM (2012) Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on mapreduce. In: ICUIMC, p. 76. ACM (2012)
21.
Zurück zum Zitat Lin, W., Alvarez, S.A., Ruiz, C.: Efficient adaptive-support association rule mining for recommender systems. Data Min. Knowl. Discov. 6(1), 83–105 (2002)MathSciNetCrossRef Lin, W., Alvarez, S.A., Ruiz, C.: Efficient adaptive-support association rule mining for recommender systems. Data Min. Knowl. Discov. 6(1), 83–105 (2002)MathSciNetCrossRef
22.
Zurück zum Zitat Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: Webdocs: a real-life huge transactional dataset. In: FIMI, vol. 126 (2004) Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: Webdocs: a real-life huge transactional dataset. In: FIMI, vol. 126 (2004)
23.
Zurück zum Zitat Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Big Data, pp. 111–118. IEEE (2013) Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Big Data, pp. 111–118. IEEE (2013)
24.
Zurück zum Zitat Sandvig, J.J., Mobasher, B., Burke, R.: Robustness of collaborative recommendation based on association rule mining. In: Recsys, pp. 105–112. ACM (2007) Sandvig, J.J., Mobasher, B., Burke, R.: Robustness of collaborative recommendation based on association rule mining. In: Recsys, pp. 105–112. ACM (2007)
25.
Zurück zum Zitat Schlegel, B.: Frequent itemset mining on multiprocessor systems. Dissertation, Technischen Universit\(\ddot{a}\)t Dresden (2013) Schlegel, B.: Frequent itemset mining on multiprocessor systems. Dissertation, Technischen Universit\(\ddot{a}\)t Dresden (2013)
26.
Zurück zum Zitat Sethi, K.K., Ramesh, D.: Hfim: a spark-based hybrid frequent itemset mining algorithm for big data processing. J. Supercomput. 1–17 (2017) Sethi, K.K., Ramesh, D.: Hfim: a spark-based hybrid frequent itemset mining algorithm for big data processing. J. Supercomput. 1–17 (2017)
27.
Zurück zum Zitat Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on mapreduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)CrossRef Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on mapreduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)CrossRef
28.
Zurück zum Zitat Xun, Y., Zhang, J., Qin, X., Zhao, X.: Fidoop-dp: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017)CrossRef Xun, Y., Zhang, J., Qin, X., Zhao, X.: Fidoop-dp: data partitioning in frequent itemset mining on hadoop clusters. IEEE Trans. Parallel Distrib. Syst. 28(1), 101–114 (2017)CrossRef
30.
Zurück zum Zitat Yu, H., Wen, J., Wang, H., Jun, L.: An improved apriori algorithm based on the boolean matrix and hadoop. Procedia Eng. 15, 1827–1831 (2011)CrossRef Yu, H., Wen, J., Wang, H., Jun, L.: An improved apriori algorithm based on the boolean matrix and hadoop. Procedia Eng. 15, 1827–1831 (2011)CrossRef
31.
Zurück zum Zitat Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. KDD. 97, 283–286 (1997) Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. KDD. 97, 283–286 (1997)
34.
Zurück zum Zitat Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel fp-growth with mapreduce. In: 2010 IEEE Youth Conference on Information Computing and Telecommunications (YC-ICT), pp. 243–246. IEEE (2010) Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel fp-growth with mapreduce. In: 2010 IEEE Youth Conference on Information Computing and Telecommunications (YC-ICT), pp. 243–246. IEEE (2010)
Metadaten
Titel
BIGMiner: a fast and scalable distributed frequent pattern miner for big data
verfasst von
Kang-Wook Chon
Min-Soo Kim
Publikationsdatum
09.02.2018
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 3/2018
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-018-1812-0

Weitere Artikel der Ausgabe 3/2018

Cluster Computing 3/2018 Zur Ausgabe