Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 11/2019

04.01.2019 | Original Article

HashEclat: an efficient frequent itemset algorithm

verfasst von: Chunkai Zhang, Panbo Tian, Xudong Zhang, Qing Liao, Zoe L. Jiang, Xuan Wang

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 11/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency for calculating the intersection of itemsets makes it a time-consuming method, especially when the dataset has a large number of transactions. In this work, for the purpose of efficiency improvement, we proposed an approximate Eclat algorithm named HashEclat based on MinHash, which could quickly estimate the size of the intersection set, and adjust the parameters k, E and minSup to consider the tradeoff between accuracy of the mining results and execution time. The parameter k is the top-k parameter of one-permutation MinHash algorithm; the parameter E is the estimate error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result with faster speed maybe more useful than ‘exact’ result. The theoretical analysis and experiment results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemset with faster speed and less memory space.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Han J, Kamber M (2006) Data mining: concepts and techniques. Data Min Concepts Models Methods Algorithms Second Ed 5(4): 1–18MATH Han J, Kamber M (2006) Data mining: concepts and techniques. Data Min Concepts Models Methods Algorithms Second Ed 5(4): 1–18MATH
2.
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings 20th international conference very large data bases, VLDB vol, pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings 20th international conference very large data bases, VLDB vol, pp 487–499
3.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM sigmod record, pp 1–12CrossRef Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM sigmod record, pp 1–12CrossRef
4.
Zurück zum Zitat Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390CrossRef Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390CrossRef
5.
Zurück zum Zitat Heaton J (2016) Comparing dataset characteristics that favor the Apriori, Eclat or FP-growth frequent itemset mining algorithms. In: Southeast con, pp 1–7 Heaton J (2016) Comparing dataset characteristics that favor the Apriori, Eclat or FP-growth frequent itemset mining algorithms. In: Southeast con, pp 1–7
6.
Zurück zum Zitat Preiss PM, Ma R, Tai ME, Lukin A, Rispoli M, Zupancic P, Greiner M (2015) Strongly correlated quantum walks in optical lattices. Science 347(6227):1229–1233MathSciNetCrossRef Preiss PM, Ma R, Tai ME, Lukin A, Rispoli M, Zupancic P, Greiner M (2015) Strongly correlated quantum walks in optical lattices. Science 347(6227):1229–1233MathSciNetCrossRef
7.
Zurück zum Zitat Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335 Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335
8.
Zurück zum Zitat Ma Z, Yang J, Zhang T, Liu F (2016) An improved eclat algorithm for mining association rules based on increased search strategy. Int J Database Theory Appl 9(5):251–266CrossRef Ma Z, Yang J, Zhang T, Liu F (2016) An improved eclat algorithm for mining association rules based on increased search strategy. Int J Database Theory Appl 9(5):251–266CrossRef
9.
Zurück zum Zitat Xiong ZY, Chen PE, Zhang YF (2010) Improvement of eclat algorithm for association rules based on hash boolean matrix. Appl Res Comput 27(4):1323–1325 Xiong ZY, Chen PE, Zhang YF (2010) Improvement of eclat algorithm for association rules based on hash boolean matrix. Appl Res Comput 27(4):1323–1325
10.
Zurück zum Zitat Cohen H, Porat E (2010) Fast set intersection and two-patterns matching. Theor Comput Sci 411(40–42):3795–3800MathSciNetCrossRef Cohen H, Porat E (2010) Fast set intersection and two-patterns matching. Theor Comput Sci 411(40–42):3795–3800MathSciNetCrossRef
11.
Zurück zum Zitat Wang X, Wang R, Xu C (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48(2):703–715MathSciNetCrossRef Wang X, Wang R, Xu C (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48(2):703–715MathSciNetCrossRef
12.
Zurück zum Zitat Wang R, Wang X, Kwong S et al (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEEE Trans Fuzzy Syst 25(6):1460–1475CrossRef Wang R, Wang X, Kwong S et al (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEEE Trans Fuzzy Syst 25(6):1460–1475CrossRef
13.
14.
Zurück zum Zitat Cohen E, Kaplan H, Sen S (2009) Coordinated weighted sampling for estimating aggregates over multiple weight assignments. Proc VLDB Endow 2(1):646–657CrossRef Cohen E, Kaplan H, Sen S (2009) Coordinated weighted sampling for estimating aggregates over multiple weight assignments. Proc VLDB Endow 2(1):646–657CrossRef
15.
Zurück zum Zitat Teschner M, Heidelberger B, Müller M, Pomerantes D, Gross MH (2003) Optimized spatial hashing for collision detection of deformable objects. In: Vmv, pp 47–54 Teschner M, Heidelberger B, Müller M, Pomerantes D, Gross MH (2003) Optimized spatial hashing for collision detection of deformable objects. In: Vmv, pp 47–54
16.
Zurück zum Zitat Broder AZ, Charikar M, Frieze AM, Mitzenmacher M (2000) Min-wise independent permutations. J Comput Syst Sci 60(3):630–659MathSciNetCrossRef Broder AZ, Charikar M, Frieze AM, Mitzenmacher M (2000) Min-wise independent permutations. J Comput Syst Sci 60(3):630–659MathSciNetCrossRef
17.
Zurück zum Zitat Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R et al (2001) Finding interesting associations without support pruning. IEEE Trans Knowl Data Eng 13(1):64–78CrossRef Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R et al (2001) Finding interesting associations without support pruning. IEEE Trans Knowl Data Eng 13(1):64–78CrossRef
18.
Zurück zum Zitat Pagh R, Stöckel M, Woodruff DP (2014) Is min-wise hashing optimal for summarizing set intersection?. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 109–120 Pagh R, Stöckel M, Woodruff DP (2014) Is min-wise hashing optimal for summarizing set intersection?. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 109–120
19.
Zurück zum Zitat Goethals B (2002) Survey on frequent pattern mining. Univ Helsinki 63(14):47–52 Goethals B (2002) Survey on frequent pattern mining. Univ Helsinki 63(14):47–52
20.
Zurück zum Zitat Aggarwal CC, Han J (eds) (2014) Frequent pattern mining. Springer, Berlin, pp 19–23MATH Aggarwal CC, Han J (eds) (2014) Frequent pattern mining. Springer, Berlin, pp 19–23MATH
21.
Zurück zum Zitat Wang X, Zhang T, Wang R (2017) Non-iterative deep learning: incorporating restricted boltzmann machine into multilayer random weight neural networks. In: IEEE transactions on systems, man, and cybernetics: systems, IEEE early access articles, p 99 Wang X, Zhang T, Wang R (2017) Non-iterative deep learning: incorporating restricted boltzmann machine into multilayer random weight neural networks. In: IEEE transactions on systems, man, and cybernetics: systems, IEEE early access articles, p 99
22.
Zurück zum Zitat Xun Y, Zhang J, Qin X, Zhao X (2017) Fidoop-dp: data partitioning in frequent itemset mining on hadoop clusters. In: IEEE transactions on parallel and distributed systems, vol 99, pp 77–84 Xun Y, Zhang J, Qin X, Zhao X (2017) Fidoop-dp: data partitioning in frequent itemset mining on hadoop clusters. In: IEEE transactions on parallel and distributed systems, vol 99, pp 77–84
23.
Zurück zum Zitat Broder AZ, Charikar M, Frieze AM, Mitzenmacher M (1998) Min-wise independent permutations (extended abstract). In: Stoc’98 Proceedings of the Thirtieth annual acm symposium on theory of computing, vol 60, no 3, pp 327–336 Broder AZ, Charikar M, Frieze AM, Mitzenmacher M (1998) Min-wise independent permutations (extended abstract). In: Stoc’98 Proceedings of the Thirtieth annual acm symposium on theory of computing, vol 60, no 3, pp 327–336
24.
Zurück zum Zitat Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of compression and complexity of sequences, pp 21–29 Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of compression and complexity of sequences, pp 21–29
25.
Zurück zum Zitat Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets. Cambridge University Press, Cambridge, pp 7–15CrossRef Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets. Cambridge University Press, Cambridge, pp 7–15CrossRef
26.
Zurück zum Zitat Wang X, Xing HJ, Li Y et al (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654CrossRef Wang X, Xing HJ, Li Y et al (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654CrossRef
27.
Zurück zum Zitat Szmit R (2013) Locality sensitive hashing for similarity search using MapReduce on large scale data. In: Language processing and intelligent information systems, pp 171–178CrossRef Szmit R (2013) Locality sensitive hashing for similarity search using MapReduce on large scale data. In: Language processing and intelligent information systems, pp 171–178CrossRef
28.
Zurück zum Zitat Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-Hash and tf-idf weighting. In: BMVC, pp 812–815 Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-Hash and tf-idf weighting. In: BMVC, pp 812–815
29.
Zurück zum Zitat Wang H, Cao J, Shu L, Rafiei D (2013) Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis. In Proceedings of the 22nd ACM international conference on information and knowledge management, pp 1969–1978 Wang H, Cao J, Shu L, Rafiei D (2013) Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis. In Proceedings of the 22nd ACM international conference on information and knowledge management, pp 1969–1978
30.
Zurück zum Zitat Li P, Owen A, Zhang CH (2012) One permutation hashing for efficient search and learning. arXiv preprint arXiv, pp 1208–1259 Li P, Owen A, Zhang CH (2012) One permutation hashing for efficient search and learning. arXiv preprint arXiv, pp 1208–1259
31.
Zurück zum Zitat Li P, Shrivastava A, Moore J, König AC (2011) B-bit minwise hashing for large-scale learning. Comput Sci 54(8):101–109 Li P, Shrivastava A, Moore J, König AC (2011) B-bit minwise hashing for large-scale learning. Comput Sci 54(8):101–109
32.
Zurück zum Zitat Wang X, Wang R, Feng H-M, Wang H (2014) A new approach to classifier fusion based on upper integral. IEEE Trans Cybern 44(5):620–635MathSciNetCrossRef Wang X, Wang R, Feng H-M, Wang H (2014) A new approach to classifier fusion based on upper integral. IEEE Trans Cybern 44(5):620–635MathSciNetCrossRef
33.
Zurück zum Zitat Li P, Gui W (2010) b-Bit minwise hashing for estimating three-way similarities. In: International conference on neural information processing systems, pp 1387–1395 Li P, Gui W (2010) b-Bit minwise hashing for estimating three-way similarities. In: International conference on neural information processing systems, pp 1387–1395
35.
Zurück zum Zitat Huang H, Xu H, Wang X, Silamu W (2015) Maximum f1-score discriminative training criterion for automatic mispronunciation detection. IEEE/ACM Trans Audio Speech Lang Process 23(4):787–797CrossRef Huang H, Xu H, Wang X, Silamu W (2015) Maximum f1-score discriminative training criterion for automatic mispronunciation detection. IEEE/ACM Trans Audio Speech Lang Process 23(4):787–797CrossRef
36.
Zurück zum Zitat Wang X, He Y-L, Dabby D (2014) Non-naive bayesian classifiers for classification problems with continuous attributes. IEEE Trans Cybern 44(1):21–39CrossRef Wang X, He Y-L, Dabby D (2014) Non-naive bayesian classifiers for classification problems with continuous attributes. IEEE Trans Cybern 44(1):21–39CrossRef
Metadaten
Titel
HashEclat: an efficient frequent itemset algorithm
verfasst von
Chunkai Zhang
Panbo Tian
Xudong Zhang
Qing Liao
Zoe L. Jiang
Xuan Wang
Publikationsdatum
04.01.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 11/2019
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-018-00918-x

Weitere Artikel der Ausgabe 11/2019

International Journal of Machine Learning and Cybernetics 11/2019 Zur Ausgabe

Neuer Inhalt