2015 | Original Paper | Book chapter

Parallel Eclat for Opportunistic Mining of Frequent Itemsets

Authors: Junqiang Liu, Yongsheng Wu, Qingfeng Zhou, Benjamin C. M. Fung, Fanghui Chen, Binxiao Yu

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

Mining frequent itemsets is an essential data mining problem. In the big data era, databases are becoming so large that traditional algorithms do not scale well. One approach to the issue is to parallelize the mining algorithm, which remains a challenge that has not yet been well addressed. In this paper, we propose a MapReduce-based algorithm, Peclat, that parallelizes the vertical mining algorithm Eclat with three improvements. First, Peclat uses a hybrid vertical data format to represent the data, which saves both space and time in the mining process. Second, Peclat adopts the pruning technique from the Apriori algorithm to improve the efficiency of breadth-first search. Third, Peclat employs an ordering of itemsets that helps balance the workloads. Extensive experiments demonstrate that Peclat significantly outperforms the existing MapReduce-based algorithms.
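
As a rough illustration of the vertical mining idea that Eclat is built on, the sketch below stores each item as the set of transaction ids that contain it and obtains the support of a larger itemset by intersecting those sets. It is not the Peclat algorithm from the chapter: it is sequential, uses plain tidsets rather than the hybrid vertical format, searches depth-first, and includes neither the Apriori-style pruning nor any MapReduce logic. The names `to_vertical`, `eclat`, and `min_support` are assumptions chosen for this example.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert horizontal transactions into a map: item -> set of transaction ids."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of this sketch).
    """
    vertical = to_vertical(transactions)
    # Keep only frequent single items, ordered by ascending support, then by name.
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: (len(pair[1]), pair[0]),
    )
    result = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset of prefix + item).
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[frozenset(itemset)] = len(tids)
            # Intersect with the tidsets of the later items in the same class.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    extend(frozenset(), frequent)
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(eclat(db, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Because support is just the size of an intersection, the database only needs to be scanned once to build the vertical layout; every later step works on the tidsets alone.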

Metadata
Title
Parallel Eclat for Opportunistic Mining of Frequent Itemsets
Authors
Junqiang Liu
Yongsheng Wu
Qingfeng Zhou
Benjamin C. M. Fung
Fanghui Chen
Binxiao Yu
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_27