International Journal of Machine Learning and Cybernetics

10 August 2020 | Original Article

CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets

Authors: Seyed Mohsen Fatemi, Seyed Mohsen Hosseini, Ali Kamandi, Mahmood Shabankhah

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2021

Abstract

Frequent itemset mining is one of the more important problems in data mining and is widely used in related tasks such as market basket analysis in marketing and text analysis in text mining. Most deterministic frequent itemset mining algorithms proposed in recent years rely on some form of optimized data structure to reduce overall execution time. In this paper, we instead introduce an approximation algorithm that converts the problem into a clustering problem in which similar transactions are grouped together. Each cluster centroid represents an itemset that is treated as a candidate frequent itemset. This assumption is verified by computing the support count of each candidate: those that meet the min-support condition are accepted as actual frequent itemsets. The remaining itemsets are passed to MAFIA, which extracts all maximal frequent itemsets from them. Experiments on several well-known and diverse datasets show that the proposed algorithm is almost always faster than existing deterministic algorithms, in some cases up to 10 times faster, while retaining up to 95% of their accuracy.
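The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Jaccard distance, the 0.5 majority threshold for turning a cluster into a centroid itemset, and the seeding from the first k distinct transactions are all assumptions made for illustration. The final MAFIA step is not implemented; the sketch only returns the candidates that would be handed to an exact maximal-itemset miner.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]


def support_count(itemset: Itemset, transactions: Sequence[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def jaccard_distance(a: Itemset, b: Itemset) -> float:
    """1 - |a ∩ b| / |a ∪ b| (assumed similarity measure, not from the paper)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def centroid_itemset(cluster: Sequence[Itemset], threshold: float = 0.5) -> Itemset:
    """Items occurring in at least `threshold` of the cluster's transactions."""
    counts: Dict[str, int] = {}
    for t in cluster:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return frozenset(i for i, c in counts.items() if c >= threshold * len(cluster))


def cluster_transactions(
    transactions: Sequence[Itemset], k: int, rounds: int = 10
) -> List[Itemset]:
    """Small k-means-style loop returning one centroid itemset per cluster."""
    # Assumed deterministic seeding: the first k distinct transactions.
    centroids = list(dict.fromkeys(transactions))[:k]
    for _ in range(rounds):
        clusters: List[List[Itemset]] = [[] for _ in centroids]
        for t in transactions:
            nearest = min(
                range(len(centroids)),
                key=lambda j: jaccard_distance(t, centroids[j]),
            )
            clusters[nearest].append(t)
        # Empty clusters keep their previous centroid.
        updated = [
            centroid_itemset(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def split_candidates(
    transactions: Sequence[Itemset], k: int, min_support: int
) -> Tuple[Set[Itemset], Set[Itemset]]:
    """Return (centroids meeting min_support, centroids left for an exact miner)."""
    candidates = {c for c in cluster_transactions(transactions, k) if c}
    frequent = {
        c for c in candidates if support_count(c, transactions) >= min_support
    }
    return frequent, candidates - frequent
```

On six toy transactions {a,b,c}, {a,b,c}, {a,b}, {d,e}, {d,e}, {d,e,f} with k = 2 and min_support = 3, this sketch accepts {d, e} directly (support 3) and leaves {a, b, c} (support 2) for the exact miner, which is the role the abstract assigns to MAFIA.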

http://fimi.uantwerpen.be/data/

https://github.com/MohsenFatemii/CL-MAX

Title: CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets
Authors: Seyed Mohsen Fatemi, Seyed Mohsen Hosseini, Ali Kamandi, Mahmood Shabankhah
Publication date: 10 August 2020
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 2/2021
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-020-01177-5
