2012 | OriginalPaper | Book chapter
The Decomposed K-Nearest Neighbor Algorithm for Imbalanced Text Classification
Authors: Hyung-Seok Kang, Kihyo Nam, Seong-in Kim
Published in: Future Generation Information Technology
Publisher: Springer Berlin Heidelberg
As the volume of textual data has increased exponentially, so has the need to automatically classify relevant data into one of several predefined classes. Many classification methods assume that training data are evenly distributed among all classes, but in many practical applications the training data are imbalanced. Several algorithms and re-sampling methods have been proposed to overcome this imbalance problem, but they still suffer from overfitting and information loss. This paper proposes the Decomposed K-Nearest Neighbor (DCM-KNN) algorithm. In the training step, DCM-KNN decomposes the training data into a misclassified set and a correctly classified set based on the results of a traditional KNN classifier, and finds the appropriate KNN for each set. In the test step, DCM-KNN estimates whether a test instance is similar to the misclassified set or to the correctly classified set, and applies the corresponding KNN. Experimental results show that the proposed algorithm can achieve more accurate results under imbalanced conditions.
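The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the misclassified set is identified, how the "appropriate KNN" for each set is chosen, or how similarity to each set is estimated. The sketch therefore assumes (1) leave-one-out prediction with a base k to split the training data, (2) picking k for each subset from a hypothetical candidate list by leave-one-out accuracy on that subset, and (3) routing a test point to whichever subset contains its nearest training point. Euclidean distance on small dense vectors stands in for a real text representation such as TF-IDF, and the function names `knn_predict`, `fit_dcm_knn` and `predict_dcm_knn` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Assumed distance; real text pipelines often use cosine similarity on TF-IDF vectors.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(X, y, x, k, exclude=None):
    # Rank training indices by distance to x; `exclude` skips one index so the
    # same function can do leave-one-out prediction on the training set.
    order = sorted((i for i in range(len(X)) if i != exclude),
                   key=lambda i: euclidean(X[i], x))
    votes = Counter(y[i] for i in order[:k])
    # Majority vote; among equal counts, the label seen first (nearest) wins.
    return votes.most_common(1)[0][0]


def fit_dcm_knn(X, y, base_k=3, candidate_ks=(1, 3, 5, 7)):
    # Training step (assumed details): leave-one-out base-k KNN decomposes the
    # training indices into misclassified and correctly classified sets.
    mis, cor = [], []
    for i, x in enumerate(X):
        (cor if knn_predict(X, y, x, base_k, exclude=i) == y[i] else mis).append(i)

    def best_k(indices):
        # Hypothetical selection rule: the candidate k with the highest
        # leave-one-out accuracy on this subset (first one wins on ties).
        if not indices:
            return base_k
        return max(candidate_ks, key=lambda k: sum(
            knn_predict(X, y, X[i], k, exclude=i) == y[i] for i in indices))

    return {"X": X, "y": y, "mis": mis, "cor": cor,
            "k_mis": best_k(mis), "k_cor": best_k(cor)}


def predict_dcm_knn(model, x):
    X, y = model["X"], model["y"]

    def nearest(indices):
        return min((euclidean(X[i], x) for i in indices), default=math.inf)

    # Test step (assumed similarity rule): route x to the subset holding its
    # nearest training point, then apply that subset's KNN over all training data.
    use_mis = nearest(model["mis"]) < nearest(model["cor"])
    k = model["k_mis"] if use_mis else model["k_cor"]
    return knn_predict(X, y, x, k)


# Toy data: five "A" points, a far cluster of three "B" points,
# and one isolated "B" point at (2, 0) inside the "A" region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (2, 0)]
y = ["A", "A", "A", "A", "A", "B", "B", "B", "B"]
model = fit_dcm_knn(X, y)
print(model["mis"], predict_dcm_knn(model, (2.1, 0.0)), knn_predict(X, y, (2.1, 0.0), 3))
# prints: [8] B A
```

In this toy run, the isolated "B" point (index 8) is misclassified by leave-one-out 3-NN, so a query at (2.1, 0) is routed to the misclassified set's KNN and labeled "B", whereas plain 3-NN labels it "A". Whether such routing improves accuracy on real imbalanced text data is an empirical question that this sketch does not settle.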