2015 | OriginalPaper | Chapter

Parallel Eclat for Opportunistic Mining of Frequent Itemsets

Authors: Junqiang Liu, Yongsheng Wu, Qingfeng Zhou, Benjamin C. M. Fung, Fanghui Chen, Binxiao Yu

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

Mining frequent itemsets is an essential data mining problem. In the big data era, databases are growing so large that traditional algorithms do not scale well. One approach is to parallelize the mining algorithm, but doing so remains a challenge that has not yet been well addressed. In this paper, we propose Peclat, a MapReduce-based algorithm that parallelizes the vertical mining algorithm Eclat with three improvements. First, Peclat uses a hybrid vertical data format to represent the data, which saves both space and time in the mining process. Second, Peclat adopts the pruning technique from the Apriori algorithm to improve the efficiency of breadth-first search. Third, Peclat employs an ordering of itemsets that helps balance the workloads. Extensive experiments demonstrate that Peclat significantly outperforms the existing MapReduce-based algorithms.
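For readers unfamiliar with vertical mining, the following is a minimal single-machine sketch of the classic Eclat idea (tidset intersection) together with an Apriori-style subset check. It is not the Peclat algorithm: the abstract does not specify the hybrid data format, the MapReduce decomposition, or the itemset ordering, so none of those are modeled here. The function names `to_vertical`, `eclat`, and `passes_apriori_check` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to its tidset: the ids of the transactions containing it."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Plain depth-first Eclat on tidsets (illustrative assumption, not Peclat).
    """
    vertical = to_vertical(transactions)
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs; each tidset is already the
        # tidset of prefix + item and is known to meet min_support.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support counting by intersection
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, least frequent first.
    start = sorted(
        ((item, tids) for item, tids in vertical.items()
         if len(tids) >= min_support),
        key=lambda pair: (len(pair[1]), pair[0]),
    )
    extend(frozenset(), start)
    return frequent


def passes_apriori_check(candidate, frequent_smaller):
    """Apriori pruning: a k-itemset can be frequent only if every one of its
    (k-1)-subsets is already known to be frequent."""
    k = len(candidate)
    return all(frozenset(subset) in frequent_smaller
               for subset in combinations(candidate, k - 1))
```

In this sketch support is counted purely by intersecting tidsets, so the transaction database is scanned only once, when the vertical layout is built. The subset check is the kind of test a breadth-first, level-by-level search can apply before computing any intersection for a candidate.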

Metadata
Title
Parallel Eclat for Opportunistic Mining of Frequent Itemsets
Authors
Junqiang Liu
Yongsheng Wu
Qingfeng Zhou
Benjamin C. M. Fung
Fanghui Chen
Binxiao Yu
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_27
