2011 | OriginalPaper | Book chapter
Efficient Mining of Top Correlated Patterns Based on Null-Invariant Measures
Authors: Sangkyum Kim, Marina Barsky, Jiawei Han
Published in: Machine Learning and Knowledge Discovery in Databases
Publisher: Springer Berlin Heidelberg
Mining strong correlations from transactional databases often leads to more meaningful results than mining association rules. In such mining, null (transaction)-invariance is an important property of the correlation measures. Unfortunately, some useful null-invariant measures such as Kulczynski and Cosine, which can discover correlations even in very unbalanced cases, lack the (anti-)monotonicity property. Thus, they could be applied only to frequent itemsets, as a post-evaluation step. For large datasets and low support thresholds, this approach is computationally prohibitive. This paper presents new properties for all known null-invariant measures. Based on these properties, we develop efficient pruning techniques and design an Apriori-like algorithm, NICoMiner, for mining strongly correlated patterns directly. We develop both threshold-bounded and top-k variations of the algorithm, where top-k is used when the optimal correlation threshold is not known in advance and to give the user control over the output size. We test NICoMiner on real-life datasets from different application domains, using Cosine as an example of a null-invariant correlation measure. We show that NICoMiner outperforms the support-based approach by more than an order of magnitude, and that it is very useful for discovering top correlations in itemsets with low support.
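The two properties the abstract relies on, null-invariance and the lack of anti-monotonicity, can be illustrated on a toy database. The sketch below is not NICoMiner and does not reproduce the paper's pruning techniques. It assumes the commonly used generalized definitions, in which Cosine is the joint support divided by the geometric mean of the single-item supports and Kulczynski is the arithmetic mean of the ratios of joint support to each single-item support. The function names and the toy data are hypothetical.

```python
from math import prod

def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def cosine(transactions, itemset):
    """Joint support over the geometric mean of single-item supports (assumed definition)."""
    joint = support(transactions, itemset)
    if joint == 0:
        return 0.0
    singles = [support(transactions, [item]) for item in itemset]
    return joint / prod(singles) ** (1.0 / len(singles))

def kulczynski(transactions, itemset):
    """Arithmetic mean of joint support over each single-item support (assumed definition)."""
    joint = support(transactions, itemset)
    if joint == 0:
        return 0.0
    singles = [support(transactions, [item]) for item in itemset]
    return sum(joint / s for s in singles) / len(singles)

# Hypothetical toy database: A and B are frequent but rarely co-occur,
# while the rare item C appears only together with both of them.
db = (
    [frozenset("ABC")] * 10
    + [frozenset("A")] * 990
    + [frozenset("B")] * 990
)

# Null transactions contain none of the items under study.
padded = db + [frozenset()] * 100_000

# Null-invariance: padding with null transactions leaves both measures unchanged.
same_cosine = cosine(db, "AB") == cosine(padded, "AB")
same_kulc = kulczynski(db, "AB") == kulczynski(padded, "AB")

# No anti-monotonicity: the superset {A, B, C} scores higher than its subset {A, B}.
cosine_grows = cosine(db, "ABC") > cosine(db, "AB")
kulc_grows = kulczynski(db, "ABC") > kulczynski(db, "AB")
```

On this data, Cosine rises from 0.01 for {A, B} to roughly 0.046 for {A, B, C}, so discarding {A, B} for its low correlation would also lose a more strongly correlated superset. This is why these measures cannot be pruned in the way support is pruned in plain Apriori.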