Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 2/2006

01.09.2006

Hyperclique pattern discovery

verfasst von: Hui Xiong, Pang-Ning Tan, Vipin Kumar

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2/2006

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
It is available at http://www.almaden.ibm.com/ software/quest/resources.
 
2
This is observed on Sun Ultra 10 work station with a 440 MHz CPU and 128 Mbytes of memory.
 
3
When computing Pearson's correlation coefficient, the data mean is not subtracted.
 
4
Note that the number of items shown in Table 4 for pumsb, pumsb * are somewhat different from the numbers reported in Zaki and Hsiao (2002), because we only consider item IDs for which the count is at least one. For example, although the minimum item ID in pumsb is 0 and the maximum item ID is 7116, there are only 2113 distinct item IDs that appear in the data set.
 
5
The data set is available at http://trec.nist.gov.
 
Literatur
Zurück zum Zitat Agarwal R, Aggarwal C, Prasad V (2000) A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing Agarwal R, Aggarwal C, Prasad V (2000) A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing
Zurück zum Zitat Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. pp 207–216 Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. pp 207–216
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases
Zurück zum Zitat Bayardo R, Agrawal R, Gunopulous D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the Int'l Conference on Data Engineering Bayardo R, Agrawal R, Gunopulous D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the Int'l Conference on Data Engineering
Zurück zum Zitat Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data
Zurück zum Zitat Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data. pp 265–276 Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data. pp 265–276
Zurück zum Zitat Burdick D, Calimlim M, Gehrke J (2001) MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 2001 Int'l Conference on Data Engineering (ICDE) Burdick D, Calimlim M, Gehrke J (2001) MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 2001 Int'l Conference on Data Engineering (ICDE)
Zurück zum Zitat Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J, Yang C (2000) Finding interesting associations without support pruning. In: Proceedings of the 2000 Int'l Conference on Data Engineering (ICDE). pp 489–499 Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J, Yang C (2000) Finding interesting associations without support pruning. In: Proceedings of the 2000 Int'l Conference on Data Engineering (ICDE). pp 489–499
Zurück zum Zitat Feder T, Motwani R (1995) Clique partitions, graph compression and speeding-up algorithms. Special Issue for the STOC conference. Journal of Computer and System Sciences 51:261–272 Feder T, Motwani R (1995) Clique partitions, graph compression and speeding-up algorithms. Special Issue for the STOC conference. Journal of Computer and System Sciences 51:261–272
Zurück zum Zitat Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of the Int'l Conference on Data Engineering Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of the Int'l Conference on Data Engineering
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data
Zurück zum Zitat Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference, and prediction, Springer Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference, and prediction, Springer
Zurück zum Zitat Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Zurück zum Zitat Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Transactions on Knowledge and Data Engineering 15(1) Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Transactions on Knowledge and Data Engineering 15(1)
Zurück zum Zitat Pei J, Han J, Mao R (2000) CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery Pei J, Han J, Mao R (2000) CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
Zurück zum Zitat Reynolds HT (1977) The analysis of cross-classifications. The Free Press Reynolds HT (1977) The analysis of cross-classifications. The Free Press
Zurück zum Zitat Rijsbergen CJV (1979) Information retrieval, 2nd edn., Butterworths, London Rijsbergen CJV (1979) Information retrieval, 2nd edn., Butterworths, London
Zurück zum Zitat Tan P, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Tan P, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Zurück zum Zitat Wang K, He Y, Cheung D, Chin Y (2001) Mining confident rules without support requirement. In: Proceedings of the 2001 ACM International Conference on Information and Knowledge Management (CIKM) Wang K, He Y, Cheung D, Chin Y (2001) Mining confident rules without support requirement. In: Proceedings of the 2001 ACM International Conference on Information and Knowledge Management (CIKM)
Zurück zum Zitat Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook S (2005) Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing (PSB) Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook S (2005) Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing (PSB)
Zurück zum Zitat Xiong H, Steinbach M, Tan P-N, Kumpar V (2004) HICAP: Hierarchial clustering with pattern preservation. In: Proceedings of 2004 SIAM Int'l Conference on Data Mining (SDM). pp 279–290 Xiong H, Steinbach M, Tan P-N, Kumpar V (2004) HICAP: Hierarchial clustering with pattern preservation. In: Proceedings of 2004 SIAM Int'l Conference on Data Mining (SDM). pp 279–290
Zurück zum Zitat Xiong H, Tan P, Kumar V (2003a) Mining hyperclique patterns with confidence pruning. In: Technical Report 03-006, January, Department of computer science, University of Minnesota, Twin Cities Xiong H, Tan P, Kumar V (2003a) Mining hyperclique patterns with confidence pruning. In: Technical Report 03-006, January, Department of computer science, University of Minnesota, Twin Cities
Zurück zum Zitat Xiong H, Tan P, Kumar V (2003b) Mining strong affinity association patterns in data sets with skewed support distribution. In: Proceedings of the 3rd IEEE International Conference on Data Mining. pp 387–394 Xiong H, Tan P, Kumar V (2003b) Mining strong affinity association patterns in data sets with skewed support distribution. In: Proceedings of the 3rd IEEE International Conference on Data Mining. pp 387–394
Zurück zum Zitat Yang C, Fayyad UM, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Yang C, Fayyad UM, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Zurück zum Zitat Zaki M, Hsiao C-J (2002) CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of 2002 SIAM International Conference on Data Mining Zaki M, Hsiao C-J (2002) CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of 2002 SIAM International Conference on Data Mining
Zurück zum Zitat Zipf G (1949) Human behavior and principle of least effort: An introudction to human ecology. Addison Wesley, Cambridge, Massachusetts Zipf G (1949) Human behavior and principle of least effort: An introudction to human ecology. Addison Wesley, Cambridge, Massachusetts
Metadaten
Titel
Hyperclique pattern discovery
verfasst von
Hui Xiong
Pang-Ning Tan
Vipin Kumar
Publikationsdatum
01.09.2006
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 2/2006
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-006-0043-9

Weitere Artikel der Ausgabe 2/2006

Data Mining and Knowledge Discovery 2/2006 Zur Ausgabe

Premium Partner