Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 2/2021

10.08.2020 | Original Article

CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets

verfasst von: Seyed Mohsen Fatemi, Seyed Mohsen Hosseini, Ali Kamandi, Mahmood Shabankhah

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 2/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The problem of frequent itemset mining is one of the more important problems in data mining which has been extensively employed across a wide range of other relevant tasks such as market basket analysis in marketing, or text analysis in text mining applications. The majority of the deterministic frequent itemset mining algorithms which have been proposed in recent years use some sort or another of an optimal data structures to reduce the overall execution time of the algorithm. In this paper, however, we have tried instead to introduce an approximation algorithm which works by converting the problem into a clustering problem where similar transactions are grouped together. Each cluster centroid represents an itemset which may be assumed to be a candidate frequent itemsets. The validity of this assumption is simply verified by calculating the support count of these itemsets. Those who meet the min-support condition are considered to be an actual frequent itemset. As for the remaining itemsets, they are then passed to MAFIA which extract all maximal frequent itemsets therefrom. Experimentations made on several well-known and diverse datasets show that the proposed algorithm performs almost always faster, and in some cases up to 10 times faster, than the existing deterministic algorithms, and all this by retaining up to 95% of its accuracy.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Agarwal RC, Aggarwal CC, Prasad V (2001) A tree projection algorithm for generation of frequent item sets. J Parallel Distrib Comput 61(3):350–371CrossRef Agarwal RC, Aggarwal CC, Prasad V (2001) A tree projection algorithm for generation of frequent item sets. J Parallel Distrib Comput 61(3):350–371CrossRef
2.
Zurück zum Zitat Aggarwal CC, Bhuiyan MA, Al Hasan M (2014) Frequent pattern mining algorithms: a survey. In: Frequent pattern mining, pp. 19–64, Springer, New York Aggarwal CC, Bhuiyan MA, Al Hasan M (2014) Frequent pattern mining algorithms: a survey. In: Frequent pattern mining, pp. 19–64, Springer, New York
3.
Zurück zum Zitat Aggarwal CC, Yu PS (1998) Mining large itemsets for association rules. IEEE Data Eng Bull 21(1):23–31 Aggarwal CC, Yu PS (1998) Mining large itemsets for association rules. IEEE Data Eng Bull 21(1):23–31
4.
Zurück zum Zitat Aggarwal CC, Yu PS (1998) Online generation of association rules. In: Proceedings 14th International Conference on Data Engineering, IEEE, pp 402–411 Aggarwal CC, Yu PS (1998) Online generation of association rules. In: Proceedings 14th International Conference on Data Engineering, IEEE, pp 402–411
5.
Zurück zum Zitat Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI et al (1996) Fast discovery of association rules. Adv Knowl Discov Data Min 12(1):307–328 Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI et al (1996) Fast discovery of association rules. Adv Knowl Discov Data Min 12(1):307–328
6.
Zurück zum Zitat Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on VLDB, vol. 1215, pp. 487–499 Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on VLDB, vol. 1215, pp. 487–499
7.
Zurück zum Zitat Bayardo Jr RJ (1998) Efficiently mining long patterns from databases. In: ACM Sigmod Record, ACM, vol 27, pp 85–93 Bayardo Jr RJ (1998) Efficiently mining long patterns from databases. In: ACM Sigmod Record, ACM, vol 27, pp 85–93
8.
Zurück zum Zitat Bhandari A, Gupta A, Das D (2015) Improvised apriori algorithm using frequent pattern tree for real time applications in data mining. Proc Comput Sci 46:644–651CrossRef Bhandari A, Gupta A, Das D (2015) Improvised apriori algorithm using frequent pattern tree for real time applications in data mining. Proc Comput Sci 46:644–651CrossRef
9.
Zurück zum Zitat Bodon F (2003) A fast apriori implementation. In: FIMI, vol 3, p 63 Bodon F (2003) A fast apriori implementation. In: FIMI, vol 3, p 63
10.
Zurück zum Zitat Borgelt C (2003) Efficient implementations of apriori and eclat. In: FIMI’03: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations Borgelt C (2003) Efficient implementations of apriori and eclat. In: FIMI’03: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations
11.
Zurück zum Zitat Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) Mafia: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504CrossRef Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) Mafia: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504CrossRef
13.
Zurück zum Zitat Djenouri Y, Comuzzi M (2017) Combining apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem. Inf Sci 420:1–15CrossRef Djenouri Y, Comuzzi M (2017) Combining apriori heuristic and bio-inspired algorithms for solving the frequent itemsets mining problem. Inf Sci 420:1–15CrossRef
14.
Zurück zum Zitat Dunkel B, Soparkar N (1999) Data organization and access for efficient data mining. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), IEEE, pp 522–529 Dunkel B, Soparkar N (1999) Data organization and access for efficient data mining. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), IEEE, pp 522–529
15.
Zurück zum Zitat Fatemi SM, Hosseini SM, Kamandi A, Shabankhah M (2020) A clustering based approximate algorithm for mining frequent itemsets. In: Data science: from research to application, Springer, Cham, pp 226–237 Fatemi SM, Hosseini SM, Kamandi A, Shabankhah M (2020) A clustering based approximate algorithm for mining frequent itemsets. In: Data science: from research to application, Springer, Cham, pp 226–237
16.
Zurück zum Zitat Fournier-Viger P, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. Wiley Interdiscip Rev Data Min Knowl Discov 7(4):e1207CrossRef Fournier-Viger P, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. Wiley Interdiscip Rev Data Min Knowl Discov 7(4):e1207CrossRef
17.
Zurück zum Zitat Ganti V, Gehrke J, Ramakrishnan R (2001) Demon: mining and monitoring evolving data. IEEE Trans Knowl Data Eng 13(1):50–63CrossRef Ganti V, Gehrke J, Ramakrishnan R (2001) Demon: mining and monitoring evolving data. IEEE Trans Knowl Data Eng 13(1):50–63CrossRef
18.
Zurück zum Zitat Gunopulos D, Khardon R, Mannila H, Saluja S, Toivonen H, Sharma RS (2003) Discovering all most specific sentences. ACM Trans Database Syst (TODS) 28(2):140–174CrossRef Gunopulos D, Khardon R, Mannila H, Saluja S, Toivonen H, Sharma RS (2003) Discovering all most specific sentences. ACM Trans Database Syst (TODS) 28(2):140–174CrossRef
19.
Zurück zum Zitat Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, AmsterdamMATH Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, AmsterdamMATH
20.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM sigmod record, ACM, vol 29, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM sigmod record, ACM, vol 29, pp 1–12
22.
Zurück zum Zitat Heaton J (2016) Comparing dataset characteristics that favor the apriori, eclat or fp-growth frequent itemset mining algorithms. In: SoutheastCon 2016, IEEE, pp 1–7 Heaton J (2016) Comparing dataset characteristics that favor the apriori, eclat or fp-growth frequent itemset mining algorithms. In: SoutheastCon 2016, IEEE, pp 1–7
23.
Zurück zum Zitat Hornik K, Grün B, Hahsler M (2005) Rules-a computational environment for mining association rules and frequent item sets. J Stat Softw 14(15):1–25 Hornik K, Grün B, Hahsler M (2005) Rules-a computational environment for mining association rules and frequent item sets. J Stat Softw 14(15):1–25
24.
Zurück zum Zitat Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: European conference on principles of data mining and knowledge discovery, Springer, New York, pp 13–23 Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: European conference on principles of data mining and knowledge discovery, Springer, New York, pp 13–23
25.
Zurück zum Zitat Lin DI, Kedem ZM (1998) Pincer-search: a new algorithm for discovering the maximum frequent set. In: International conference on extending database technology, Springer, New York, pp 103–119 Lin DI, Kedem ZM (1998) Pincer-search: a new algorithm for discovering the maximum frequent set. In: International conference on extending database technology, Springer, New York, pp 103–119
26.
Zurück zum Zitat Negrevergne B, Dries A, Guns T, Nijssen S (2013) Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining, IEEE, pp 557–566 Negrevergne B, Dries A, Guns T, Nijssen S (2013) Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining, IEEE, pp 557–566
27.
Zurück zum Zitat Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules, vol 24, ACM Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules, vol 24, ACM
28.
Zurück zum Zitat Rathee S, Kaul M, Kashyap A (2015) R-apriori: an efficient apriori based algorithm on spark. In: Proceedings of the 8th Workshop on Ph. D. Workshop in Information and Knowledge Management, ACM, pp 27–34 Rathee S, Kaul M, Kashyap A (2015) R-apriori: an efficient apriori based algorithm on spark. In: Proceedings of the 8th Workshop on Ph. D. Workshop in Information and Knowledge Management, ACM, pp 27–34
29.
Zurück zum Zitat Rymon R (1992) Search through systematic set enumeration Rymon R (1992) Search through systematic set enumeration
30.
Zurück zum Zitat Savasere A, Omiecinski ER, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. Tech. rep., Georgia Institute of Technology Savasere A, Omiecinski ER, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. Tech. rep., Georgia Institute of Technology
32.
Zurück zum Zitat Shenoy P, Haritsa JR, Sudarshan S, Bhalotia G, Bawa M, Shah D (2000) Turbo-charging vertical mining of large databases. In: Acm Sigmod Record, vol. 29, pp 22–33, ACM Shenoy P, Haritsa JR, Sudarshan S, Bhalotia G, Bawa M, Shah D (2000) Turbo-charging vertical mining of large databases. In: Acm Sigmod Record, vol. 29, pp 22–33, ACM
33.
Zurück zum Zitat Su S, Xu S, Cheng X, Li Z, Yang F (2015) Differentially private frequent itemset mining via transaction splitting. IEEE Trans Knowl Data Eng 27(7):1875–1891CrossRef Su S, Xu S, Cheng X, Li Z, Yang F (2015) Differentially private frequent itemset mining via transaction splitting. IEEE Trans Knowl Data Eng 27(7):1875–1891CrossRef
34.
Zurück zum Zitat Toivonen H et al (1996) Sampling large databases for association rules. VLDB 96:134–145 Toivonen H et al (1996) Sampling large databases for association rules. VLDB 96:134–145
35.
Zurück zum Zitat Wang Y, Xu T, Xue S, Shen Y (2018) D2p-apriori: a deep parallel frequent itemset mining algorithm with dynamic queue. In: 2018 Tenth international conference on advanced computational intelligence (ICACI), pp. 649–654. IEEE Wang Y, Xu T, Xue S, Shen Y (2018) D2p-apriori: a deep parallel frequent itemset mining algorithm with dynamic queue. In: 2018 Tenth international conference on advanced computational intelligence (ICACI), pp. 649–654. IEEE
36.
Zurück zum Zitat Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 344–353 Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 344–353
37.
Zurück zum Zitat Yuan X (2017) An improved apriori algorithm for mining association rules. In: AIP conference proceedings, vol 1820, p 080005. AIP Publishing Yuan X (2017) An improved apriori algorithm for mining association rules. In: AIP conference proceedings, vol 1820, p 080005. AIP Publishing
38.
Zurück zum Zitat Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390CrossRef Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390CrossRef
40.
Zurück zum Zitat Zeng C, Naughton JF, Cai JY (2012) On differentially private frequent itemset mining. Proc VLDB Endowment 6(1):25–36CrossRef Zeng C, Naughton JF, Cai JY (2012) On differentially private frequent itemset mining. Proc VLDB Endowment 6(1):25–36CrossRef
41.
Zurück zum Zitat Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, Wang X (2019) Hasheclat: an efficient frequent itemset algorithm. Int J Mach Learn Cybern pp 1–14 Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, Wang X (2019) Hasheclat: an efficient frequent itemset algorithm. Int J Mach Learn Cybern pp 1–14
Metadaten
Titel
CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets
verfasst von
Seyed Mohsen Fatemi
Seyed Mohsen Hosseini
Ali Kamandi
Mahmood Shabankhah
Publikationsdatum
10.08.2020
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 2/2021
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01177-5

Weitere Artikel der Ausgabe 2/2021

International Journal of Machine Learning and Cybernetics 2/2021 Zur Ausgabe

Neuer Inhalt