2010 | OriginalPaper | Book Chapter
CODE: A Data Complexity Framework for Imbalanced Datasets
Authors: Cheng G. Weng, Josiah Poon
Published in: New Frontiers in Applied Data Mining
Publisher: Springer Berlin Heidelberg
Imbalanced datasets occur in many domains, such as fraud detection, cancer detection and the web, and in such domains the class of interest often concerns rarely occurring events. It is therefore important to perform well on these rare classes while maintaining reasonable overall accuracy. Although imbalanced datasets can be difficult to learn from, previous research has suggested that the skewed class distribution is not necessarily what poses problems for learning. Therefore, when learning the rare class becomes problematic, the skewed class distribution is not necessarily the cause; rather, the imbalanced distribution may just be a byproduct of some other hidden intrinsic difficulty.
This paper tries to shed some light on the issue of learning from imbalanced datasets. We propose using data complexity models to profile datasets in order to make connections with imbalanced datasets; this can potentially lead to better learning approaches. We extend our previous work with an improved implementation of the CODE framework in order to tackle a more difficult learning challenge. Despite the increased difficulty, CODE still achieves reasonable performance in profiling the data complexity of imbalanced datasets.