2014 | OriginalPaper | Book chapter
RealKrimp — Finding Hyperintervals that Compress with MDL for Real-Valued Data
Authors: Jouke Witteveen, Wouter Duivesteijn, Arno Knobbe, Peter Grünwald
Published in: Advances in Intelligent Data Analysis XIII
Publisher: Springer International Publishing
The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using the MDL Principle in itemset mining, we develop RealKrimp: an MDL-based Krimp-inspired mining scheme that seeks exceptionally high-density patterns in a real-valued dataset. We review how to extend the underlying Kraft inequality, which relates probabilities to codelengths, to real-valued data. Based on this extension we introduce the RealKrimp algorithm: an efficient method to find hyperintervals that compress the real-valued dataset, without the need for pre-algorithm data discretization.
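The relation between probabilities and codelengths mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the RealKrimp algorithm and is not taken from the chapter. It only assumes the standard correspondence in which a point recorded to precision `precision` in `dims` dimensions costs roughly the negative base-2 logarithm of `density * precision**dims` bits. The function names and the example box are hypothetical.

```python
import math


def uniform_density(lower, upper):
    """Density of a uniform distribution on the box [lower, upper]."""
    volume = math.prod(u - l for l, u in zip(lower, upper))
    return 1.0 / volume


def codelength_bits(density, precision, dims):
    """Approximate bits needed to encode one point to the given precision.

    A cell of side `precision` has probability of about
    density * precision**dims; its codelength is the negative
    base-2 logarithm of that probability.
    """
    return -math.log2(density) - dims * math.log2(precision)


def saving_per_point(lower, upper, background_density, precision):
    """Bits saved per point inside the box, relative to the background code."""
    dims = len(lower)
    inside = codelength_bits(uniform_density(lower, upper), precision, dims)
    outside = codelength_bits(background_density, precision, dims)
    return outside - inside


# Hypothetical example: unit square as background (density 1)
# and a box of side 0.2 inside it (density 25).
for eps in (1e-2, 1e-4):
    print(eps, saving_per_point([0.4, 0.4], [0.6, 0.6], 1.0, eps))
```

In this sketch the saving per point is about log2(25), roughly 4.64 bits, at either precision, because the precision term cancels when two codes are compared. This suggests why a compression-based comparison of dense regions need not depend on a discretization chosen in advance.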