2013 | Original Paper | Book Chapter
Discovering Skylines of Subgroup Sets
Authors: Matthijs van Leeuwen, Antti Ukkonen
Published in: Machine Learning and Knowledge Discovery in Databases
Publisher: Springer Berlin Heidelberg
Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are hardly satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a ‘perfect’ diverse top-k cannot possibly exist, since there is an inherent trade-off between quality and diversity.
We argue that the best way to deal with the quality-diversity trade-off is to explicitly consider the Pareto front, or skyline, of non-dominated solutions, i.e. those solutions for which neither quality nor diversity can be improved without degrading the other quantity. In particular, we focus on k-pattern set mining in the context of Subgroup Discovery [6]. For this setting, we present two algorithms for the discovery of skylines: an exact algorithm and a levelwise heuristic.
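The notion of a skyline of non-dominated solutions can be illustrated with a minimal sketch. It is not either of the two algorithms from the paper; it only shows the dominance definition applied to candidate solutions that have already been scored. The function names `dominates` and `skyline` and the (quality, diversity) values are hypothetical and chosen purely for illustration, with both quantities assumed to be maximised.

```python
from typing import Iterable, List, Tuple

# A candidate solution summarised as (quality, diversity).
# Assumption: higher is better for both quantities.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def skyline(points: Iterable[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by decreasing quality."""
    front: List[Point] = []
    best_diversity = float("-inf")
    # Sweep from highest to lowest quality; a point survives only if it
    # offers strictly more diversity than every point seen before it.
    for quality, diversity in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if diversity > best_diversity:
            front.append((quality, diversity))
            best_diversity = diversity
    return front


# Made-up scores for six hypothetical subgroup sets.
candidates = [(0.9, 0.1), (0.8, 0.4), (0.7, 0.3), (0.5, 0.8), (0.5, 0.6), (0.2, 0.8)]
front = skyline(candidates)
print(front)  # [(0.9, 0.1), (0.8, 0.4), (0.5, 0.8)]
```

Each point on the resulting front represents a different quality-diversity trade-off: moving along it improves one quantity only at the expense of the other.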
We evaluate the performance of the two proposed skyline algorithms, and the accuracy of the levelwise method. Furthermore, we show that the skylines can be used for the objective evaluation of subgroup set heuristics. Finally, we show characteristics of the obtained skylines, which reveal that different quality-diversity trade-offs result in clearly different subgroup sets. Hence, the discovery of skylines is an important step towards a better understanding of ‘diverse top-k’s’.