nach oben

Data Mining and Knowledge Discovery

Erschienen in:

01.09.2006

Hyperclique pattern discovery

verfasst von: Hui Xiong, Pang-Ning Tan, Vipin Kumar

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2/2006

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.

Vorheriger Artikel Mining top-K frequent itemsets from data streams

Nächster Artikel Support measures for graph data*

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

It is available at http://www.almaden.ibm.com/ software/quest/resources.

This is observed on Sun Ultra 10 work station with a 440 MHz CPU and 128 Mbytes of memory.

When computing Pearson's correlation coefficient, the data mean is not subtracted.

Note that the number of items shown in Table 4 for pumsb, pumsb ^* are somewhat different from the numbers reported in Zaki and Hsiao (2002), because we only consider item IDs for which the count is at least one. For example, although the minimum item ID in pumsb is 0 and the maximum item ID is 7116, there are only 2113 distinct item IDs that appear in the data set.

The data set is available at http://trec.nist.gov.

Agarwal R, Aggarwal C, Prasad V (2000) A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. pp 207–216

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases

Bayardo R, Agrawal R, Gunopulous D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the Int'l Conference on Data Engineering

Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data

Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data. pp 265–276

Burdick D, Calimlim M, Gehrke J (2001) MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 2001 Int'l Conference on Data Engineering (ICDE)

Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J, Yang C (2000) Finding interesting associations without support pruning. In: Proceedings of the 2000 Int'l Conference on Data Engineering (ICDE). pp 489–499

Feder T, Motwani R (1995) Clique partitions, graph compression and speeding-up algorithms. Special Issue for the STOC conference. Journal of Computer and System Sciences 51:261–272

Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of the Int'l Conference on Data Engineering

Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD Int'l Conference on Management of Data

Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: Data mining, inference, and prediction, Springer

Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Transactions on Knowledge and Data Engineering 15(1)

Pei J, Han J, Mao R (2000) CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery

Reynolds HT (1977) The analysis of cross-classifications. The Free Press

Rijsbergen CJV (1979) Information retrieval, 2nd edn., Butterworths, London

Tan P, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Wang K, He Y, Cheung D, Chin Y (2001) Mining confident rules without support requirement. In: Proceedings of the 2001 ACM International Conference on Information and Knowledge Management (CIKM)

Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook S (2005) Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing (PSB)

Xiong H, Steinbach M, Tan P-N, Kumpar V (2004) HICAP: Hierarchial clustering with pattern preservation. In: Proceedings of 2004 SIAM Int'l Conference on Data Mining (SDM). pp 279–290

Xiong H, Tan P, Kumar V (2003a) Mining hyperclique patterns with confidence pruning. In: Technical Report 03-006, January, Department of computer science, University of Minnesota, Twin Cities

Xiong H, Tan P, Kumar V (2003b) Mining strong affinity association patterns in data sets with skewed support distribution. In: Proceedings of the 3rd IEEE International Conference on Data Mining. pp 387–394

Yang C, Fayyad UM, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceedings of the 1999 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Zaki M, Hsiao C-J (2002) CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of 2002 SIAM International Conference on Data Mining

Zipf G (1949) Human behavior and principle of least effort: An introudction to human ecology. Addison Wesley, Cambridge, Massachusetts

Titel: Hyperclique pattern discovery
verfasst von: Hui Xiong
Pang-Ning Tan
Vipin Kumar
Publikationsdatum: 01.09.2006
Verlag: Springer US
Erschienen in: Data Mining and Knowledge Discovery / Ausgabe 2/2006
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-006-0043-9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 2/2006

VizRank: Data Visualization Guided by Machine Learning

Support measures for graph data*

A systematic approach to the assessment of fuzzy association rules

A Model-Based Frequency Constraint for Mining Associations from Transaction Data

Mining top-K frequent itemsets from data streams

Premium Partner