
Knowledge and Information Systems


01.09.2016 | Regular Paper

Top-K Miner: top-K identical frequent itemsets discovery without user support threshold

Authors: Saif-Ur-Rehman, Jawad Ashraf, Asad Habib, Abdus Salam

Published in: Knowledge and Information Systems | Issue 3/2016


Abstract

Frequent itemset (FI) mining is a prime research area in association rule mining. Customary techniques find FIs or their variants either on the basis of a support threshold or by setting two generic parameters: N (the number of topmost itemsets) and \(K_{\mathrm{max}}\) (size of the itemsets). However, users cannot mine exactly the desired number of patterns, because they tune these approaches with approximate parameter settings. We propose a novel technique, top-K Miner, that does not require a support threshold, N, or \(K_{\mathrm{max}}\). Top-K Miner requires the user to specify only a single parameter, K, to find the desired number of frequent patterns, called identical frequent itemsets (IFIs). Top-K Miner uses a novel candidate production algorithm called join-FI, which uses frequent 2-itemsets to yield one or more candidate itemsets of arbitrary size. The join-FI algorithm follows a bottom-up recursive technique to construct a candidate-itemsets search tree. Finally, the Maintain-Top-K_List algorithm processes the generated candidate itemsets to produce the Top-K_List of IFIs. The proposed top-K Miner algorithm significantly outperforms the generic benchmark techniques, even when they run with ideal parameter settings.
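To make the single-parameter idea concrete, the following is a minimal brute-force sketch in Python. It is not the join-FI or Maintain-Top-K_List algorithm from the paper. It assumes that "identical frequent itemsets" means itemsets sharing the same support count, and that the top-K list holds the K highest support values together with every itemset at each value; the abstract does not spell out either point. The function name `top_k_support_groups` and the toy transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def top_k_support_groups(transactions, k):
    """Return the k highest support values, each with all itemsets sharing it.

    Brute-force baseline for illustration only (assumed reading of "IFIs");
    it enumerates every subset of every transaction, so it is exponential
    in transaction length and only suitable for toy data.
    """
    support = defaultdict(int)
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count each non-empty subset of this transaction once.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1

    # Group itemsets by their support count.
    groups = defaultdict(list)
    for itemset, count in support.items():
        groups[count].append(itemset)

    # Keep only the k largest distinct support values.
    top_counts = sorted(groups, reverse=True)[:k]
    return [(count, sorted(groups[count])) for count in top_counts]


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = top_k_support_groups(transactions, k=2)
for count, itemsets in result:
    print(count, itemsets)
```

The caller supplies only `k`; no support threshold, N, or size bound is needed. A practical miner would avoid enumerating all subsets, which is the role the abstract assigns to candidate generation from frequent 2-itemsets.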



Title: Top-K Miner: top-K identical frequent itemsets discovery without user support threshold
Authors: Saif-Ur-Rehman, Jawad Ashraf, Asad Habib, Abdus Salam
Publication date: 01.09.2016
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0907-7
