Skip to main content
Erschienen in: Knowledge and Information Systems 3/2016

01.09.2016 | Regular Paper

Top-K Miner: top-K identical frequent itemsets discovery without user support threshold

verfasst von: Saif-Ur-Rehman, Jawad Ashraf, Asad Habib, Abdus Salam

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Frequent itemsets (FIs) mining is a prime research area in association rule mining. The customary techniques find FIs or its variants on the basis of either support threshold value or by setting two generic parameters, i.e., N (topmost itemsets) and \(K_\mathrm{{max}}\) (size of the itemsets). However, users are unable to mine the absolute desired number of patterns because they tune these approaches with their approximate parameters settings. We proposed a novel technique, top-K Miner that does not require setting of support threshold, N and \(K_\mathrm{{max}}\) values. Top-K Miner requires the user to specify only a single parameter, i.e., K to find the desired number of frequent patterns called identical frequent itemsets (IFIs). Top-K Miner uses a novel candidate production algorithm called join-FI algorithm. This algorithm uses frequent 2-itemsets to yield one or more candidate itemsets of arbitrary size. The join-FI algorithm follows bottom-up recursive technique to construct candidate-itemsets-search tree. Finally, the generated candidate itemsets are manipulated by the Maintain-Top-K_List algorithm to produce Top-K_List of the IFIs. The proposed top-K Miner algorithm significantly outperforms the generic benchmark techniques even when they are running with the ideal parameters settings.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM-SIGMOD international conference on management of data (SIGMOD’93). Washington, DC, pp 207–216 Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM-SIGMOD international conference on management of data (SIGMOD’93). Washington, DC, pp 207–216
2.
Zurück zum Zitat Grahne G, Zhu J (2003) High performance mining of maximal frequent itemsets. In: Proceeding of the 2003 SIAM international workshop on high performance data mining. pp 135–143 Grahne G, Zhu J (2003) High performance mining of maximal frequent itemsets. In: Proceeding of the 2003 SIAM international workshop on high performance data mining. pp 135–143
3.
Zurück zum Zitat Lee W, Stolfo SJ, Mok KW (2000) Adaptive intrusion detection: a data mining approach. Artif Intell Rev 14(6):533–567CrossRefMATH Lee W, Stolfo SJ, Mok KW (2000) Adaptive intrusion detection: a data mining approach. Artif Intell Rev 14(6):533–567CrossRefMATH
4.
Zurück zum Zitat Pei J, Han J, Mortazavi-Asl B, Zhu H (2000) Mining access patterns efficiently from web logs. In: Proceeding of the 2000 Pacific-Asia conference on knowledge discovery and data mining. Kyoto, Japan, pp 396–407 Pei J, Han J, Mortazavi-Asl B, Zhu H (2000) Mining access patterns efficiently from web logs. In: Proceeding of the 2000 Pacific-Asia conference on knowledge discovery and data mining. Kyoto, Japan, pp 396–407
5.
Zurück zum Zitat Holt JD, Chung SM (1999) Efficient mining of association rules in text databases. In: Proceeding of the 1999 international conference on Information and knowledge management. Kansas City, Missouri, pp 234–242 Holt JD, Chung SM (1999) Efficient mining of association rules in text databases. In: Proceeding of the 1999 international conference on Information and knowledge management. Kansas City, Missouri, pp 234–242
6.
Zurück zum Zitat Klemettinen M (1999) A knowledge discovery methodology for telecommunication network alarm databases. Ph.D. thesis, University of Helsinki Klemettinen M (1999) A knowledge discovery methodology for telecommunication network alarm databases. Ph.D. thesis, University of Helsinki
7.
Zurück zum Zitat Satou K, Shibayama G, Ono T, Yamamura Y, Furuichi E, Kuhara S, Takagi T (1997) Finding associations rules on heterogeneous genome data. In: Proceeding of the 1997 Pacific symposium on biocomputing (PSB’97). Hawaii, pp 397–408 Satou K, Shibayama G, Ono T, Yamamura Y, Furuichi E, Kuhara S, Takagi T (1997) Finding associations rules on heterogeneous genome data. In: Proceeding of the 1997 Pacific symposium on biocomputing (PSB’97). Hawaii, pp 397–408
8.
Zurück zum Zitat Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceeding of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98). Seattle, WA, pp 85–93 Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceeding of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98). Seattle, WA, pp 85–93
9.
Zurück zum Zitat Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504CrossRef Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504CrossRef
10.
Zurück zum Zitat Gouda K, Zaki MJ (2005) GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min Knowl Discov 11(3):1–20MathSciNetCrossRef Gouda K, Zaki MJ (2005) GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min Knowl Discov 11(3):1–20MathSciNetCrossRef
11.
Zurück zum Zitat Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceeding of the 7th international conference on database theory (ICDT’99). Jerusalem, Israel, pp 398–416 Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceeding of the 7th international conference on database theory (ICDT’99). Jerusalem, Israel, pp 398–416
12.
Zurück zum Zitat Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceeding of the 2000 ACM-SIGMOD international workshop data mining and knowledge discovery (DMKD’00). Dallas, TX, pp 11–20 Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceeding of the 2000 ACM-SIGMOD international workshop data mining and knowledge discovery (DMKD’00). Dallas, TX, pp 11–20
13.
Zurück zum Zitat Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceeding of the 2002 SIAM international conference on data mining (SDM’02). Arlington, VA, pp 457–473 Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceeding of the 2002 SIAM international conference on data mining (SDM’02). Arlington, VA, pp 457–473
14.
Zurück zum Zitat Borgelt C, Yang X, Nogales-Cadenas R, Carmona-Saez P, Pascual-Montano A (2011) Finding closed frequent item sets by intersecting transactions. In: Proceedings of the 2011 international conference on extending database technology (EDBT-11). Sweden, Uppsala, pp 367–376 Borgelt C, Yang X, Nogales-Cadenas R, Carmona-Saez P, Pascual-Montano A (2011) Finding closed frequent item sets by intersecting transactions. In: Proceedings of the 2011 international conference on extending database technology (EDBT-11). Sweden, Uppsala, pp 367–376
15.
Zurück zum Zitat Hu T, Sung SY, Xiong H, Fu Q (2008) Discovery of maximum length frequent itemsets. Inf Sci Int J 178(1):69–87MathSciNet Hu T, Sung SY, Xiong H, Fu Q (2008) Discovery of maximum length frequent itemsets. Inf Sci Int J 178(1):69–87MathSciNet
16.
Zurück zum Zitat Zhu F, Yan X, Han J, Yu PS, Cheng H (2007) Mining colossal frequent patterns by core pattern fusion. In: Proceeding of the 2007 international conference on data engineering (ICDE’07). Istanbul, Turkey, pp 706–715 Zhu F, Yan X, Han J, Yu PS, Cheng H (2007) Mining colossal frequent patterns by core pattern fusion. In: Proceeding of the 2007 international conference on data engineering (ICDE’07). Istanbul, Turkey, pp 706–715
17.
Zurück zum Zitat Dabbiru M, Shashi M (2010) An efficient approach to colossal pattern mining. Int J Comput Sci Netw Secur (IJCSNS) 10(1):304–312 Dabbiru M, Shashi M (2010) An efficient approach to colossal pattern mining. Int J Comput Sci Netw Secur (IJCSNS) 10(1):304–312
18.
Zurück zum Zitat Sohrabi MK, Barforoush AA (2012) Efficient colossal pattern mining in high dimensional datasets. Knowl Based Syst 33:41–52CrossRef Sohrabi MK, Barforoush AA (2012) Efficient colossal pattern mining in high dimensional datasets. Knowl Based Syst 33:41–52CrossRef
19.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceeding of the 2000 ACM-SIGMOD international conference on management of data (SIGMOD’00). Dallas, TX, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceeding of the 2000 ACM-SIGMOD international conference on management of data (SIGMOD’00). Dallas, TX, pp 1–12
20.
Zurück zum Zitat Han J, Cheng H, Xin D, Yan (2007) Frequent pattern mining—current status and future directions. Data Min Knowl Discov 15(1):55–86MathSciNetCrossRef Han J, Cheng H, Xin D, Yan (2007) Frequent pattern mining—current status and future directions. Data Min Knowl Discov 15(1):55–86MathSciNetCrossRef
21.
Zurück zum Zitat Cheung YL, Fu AWC (2004) Mining frequent itemsets without support threshold: with and without item constraints. IEEE Trans Knowl Data Eng 16(9):1052–1069CrossRef Cheung YL, Fu AWC (2004) Mining frequent itemsets without support threshold: with and without item constraints. IEEE Trans Knowl Data Eng 16(9):1052–1069CrossRef
22.
Zurück zum Zitat Fu AWC, Kwong RWW, Tang J (2000) Mining N-most interesting itemsets. In: Proceedings of the 2000 international symposium on methodologies for intelligent systems. pp 59–67 Fu AWC, Kwong RWW, Tang J (2000) Mining N-most interesting itemsets. In: Proceedings of the 2000 international symposium on methodologies for intelligent systems. pp 59–67
23.
Zurück zum Zitat Ngan SC, Lam T, Wong RCW, Fu AWC (2005) Mining N-most interesting itemsets without support threshold by the COFI-tree. Int J Bus Intell Data Min 1(1):88–106CrossRef Ngan SC, Lam T, Wong RCW, Fu AWC (2005) Mining N-most interesting itemsets without support threshold by the COFI-tree. Int J Bus Intell Data Min 1(1):88–106CrossRef
24.
Zurück zum Zitat El-Hajj M, Zaïane OR (2003) COFI-tree mining: a new approach to pattern growth with reduced candidacy generation. In: Workshop on frequent itemset mining implementations (FIMI 2003) in conjunction with IEEE-ICDM El-Hajj M, Zaïane OR (2003) COFI-tree mining: a new approach to pattern growth with reduced candidacy generation. In: Workshop on frequent itemset mining implementations (FIMI 2003) in conjunction with IEEE-ICDM
25.
Zurück zum Zitat Salam A, Khayal M (2011) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):112–142 Salam A, Khayal M (2011) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):112–142
26.
Zurück zum Zitat Li Y, Lin Q, Li R, Duan D (2010) TGP: mining top-K frequent closed graph pattern without minimum support. In: Proceeding of the 2010 international conference on advanced data mining and applications (ADMA ’10). pp 537–548 Li Y, Lin Q, Li R, Duan D (2010) TGP: mining top-K frequent closed graph pattern without minimum support. In: Proceeding of the 2010 international conference on advanced data mining and applications (ADMA ’10). pp 537–548
27.
Zurück zum Zitat Xie Y, Yu PS (2010) Max-Clique: a top-down graph-based approach to frequent pattern mining. In: Proceeding of the 2010 IEEE international conference on data mining (ICDM ’10). pp 1139–1144 Xie Y, Yu PS (2010) Max-Clique: a top-down graph-based approach to frequent pattern mining. In: Proceeding of the 2010 IEEE international conference on data mining (ICDM ’10). pp 1139–1144
28.
Zurück zum Zitat Okubo Y, Haraguchi M (2012) Finding top-N colossal patterns based on clique search with dynamic update of graph. In: Proceeding of the 2012 international conference on formal concept analysis (ICFCA’12). Springer, pp 244–259 Okubo Y, Haraguchi M (2012) Finding top-N colossal patterns based on clique search with dynamic update of graph. In: Proceeding of the 2012 international conference on formal concept analysis (ICFCA’12). Springer, pp 244–259
29.
Zurück zum Zitat Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo A (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky G, Smyth P, Uthurusamy R (eds) Advances in KDD. MIT press Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo A (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky G, Smyth P, Uthurusamy R (eds) Advances in KDD. MIT press
30.
Zurück zum Zitat Holsheimer M, Kersten M, Mannila H, Toivonen H (1995) A perspective on database and data mining. In: Proceeding of the 1995 international conference on knowledge discovery and data mining (KDD’ 95). pp 150–155 Holsheimer M, Kersten M, Mannila H, Toivonen H (1995) A perspective on database and data mining. In: Proceeding of the 1995 international conference on knowledge discovery and data mining (KDD’ 95). pp 150–155
31.
Zurück zum Zitat Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 2003 ACM-SIGKDD international conference on knowledge discovery and data mining (SIGKDD’03). Washington, pp 326–335 Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 2003 ACM-SIGKDD international conference on knowledge discovery and data mining (SIGKDD’03). Washington, pp 326–335
33.
Zurück zum Zitat Shen L, Shen H, Pritchard P, Topor R (1998) Finding the N largest itemsets. In: Proceedings of international conference on data mining. pp 211–222 Shen L, Shen H, Pritchard P, Topor R (1998) Finding the N largest itemsets. In: Proceedings of international conference on data mining. pp 211–222
34.
Zurück zum Zitat Quang TM, Oyanagi S, Yamazaki K (2006) ExMiner: an efficient algorithm for mining top-K frequent patterns, ADMA 2006, LNAI 4093. pp 436–447 Quang TM, Oyanagi S, Yamazaki K (2006) ExMiner: an efficient algorithm for mining top-K frequent patterns, ADMA 2006, LNAI 4093. pp 436–447
35.
Zurück zum Zitat Wang J, Han J (2005) TFP: an efficient algorithm for mining top-K frequent closed itemsets. IEEE Trans Knowl Data Eng 17(5):652–664CrossRef Wang J, Han J (2005) TFP: an efficient algorithm for mining top-K frequent closed itemsets. IEEE Trans Knowl Data Eng 17(5):652–664CrossRef
36.
Zurück zum Zitat Hirate Y, Iwahashi E, Yamana H (2004) TF2P-growth: an efficient algorithm for mining frequent patterns without any thresholds. In: Proceedings of ICDM Hirate Y, Iwahashi E, Yamana H (2004) TF2P-growth: an efficient algorithm for mining frequent patterns without any thresholds. In: Proceedings of ICDM
Metadaten
Titel
Top-K Miner: top-K identical frequent itemsets discovery without user support threshold
verfasst von
Saif-Ur-Rehman
Jawad Ashraf
Asad Habib
Abdus Salam
Publikationsdatum
01.09.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2016
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0907-7

Weitere Artikel der Ausgabe 3/2016

Knowledge and Information Systems 3/2016 Zur Ausgabe

Premium Partner