2006 | OriginalPaper | Buchkapitel
Performance Analysis of Naϊve Bayes Classification, Support Vector Machines and Neural Networks for Spam Categorization
verfasst von : A. Cϋneyd Tantuğ, Gϋlşen Eryiğit
Erschienen in: Applied Soft Computing Technologies: The Challenge of Complexity
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Spam mail recognition is a new growing field which brings together the topic of natural language processing and machine learning as it is in essence a two class classification of natural language texts. An important feature of spam recognition is that it is a cost-sensitive classification: misclassification of a nonspam mail as spam is generally a more severe error than misclassifying a spam mail as non-spam. In order to be compared, the methods applied to this field should be all evaluated with the same corpus and within the same cost-sensitive framework. In this paper, the performances of Support Vector Machines (SVM), Neural Networks (NN) and Naϊve Bayes (NB) techniques are compared using a publicly available corpus (LINGSPAM) for different cost scenarios. The training time complexities of the methods are also evaluated. The results show that NN has significantly better performance than the two other, having acceptable training times. NB gives better results than SVM when the cost is extremely high while in all other cases SVM outperforms NB.