Published in: Soft Computing 11/2015

1 November 2015 | Methodologies and Application

Enhancement of spam detection mechanism based on hybrid \(k\)-mean clustering and support vector machine

Authors: Nadir Omer Fadl Elssied, Othman Ibrahim, Ahmed Hamza Osman


Abstract

Spam e-mail is widely regarded as a serious violation of privacy and has become a costly form of unwanted communication. Support vector machines (SVMs) have been widely used for e-mail spam classification, yet, as many studies have shown, handling very large amounts of data leads to low accuracy and long processing times. This paper proposes a hybrid approach to e-mail spam classification that combines the SVM with \(k\)-means clustering. The approach was evaluated experimentally on the standard Spambase dataset to assess its feasibility. Combining the two methods improved the SVM and thereby increased spam-classification accuracy: the SVM alone reached 96.30 %, whereas the proposed hybrid of SVM and \(k\)-means clustering reached 98.01 %. In addition, experimental results on the Spambase dataset showed that the improved SVM (ESVM) significantly outperforms the standard SVM and many other recent spam classification methods.
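The abstract does not say exactly how the two methods are combined. Purely as an illustration, the sketch below assumes one common hybridization that targets the data-volume problem named above: \(k\)-means compresses each class of the training set into a few centroids, and a linear SVM is then trained on those centroids only. It uses plain Python with no dependencies. The SVM is trained with the Pegasos sub-gradient method (linear kernel, hinge loss) rather than a kernel solver. All function names (`kmeans`, `train_linear_svm`, `hybrid_kmeans_svm`, `predict`, `make_demo_data`) and parameters (`k_per_class`, `lam`, `epochs`) are hypothetical, and this is not the authors' ESVM.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means: returns k centroids (initialised from random data points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its old centroid
        if new_centroids == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return centroids


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos training of a linear SVM (hinge loss + L2). Labels must be +1/-1.
    A constant 1.0 feature is appended so the last weight acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularisation)
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Sequence[float]) -> int:
    """Sign of the linear decision function (+1 = spam, -1 = non-spam)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def hybrid_kmeans_svm(X: List[Vector], y: List[int], k_per_class: int = 5,
                      lam: float = 0.01, epochs: int = 200) -> Vector:
    """Hypothetical hybrid: replace each class by its k-means centroids, then fit the SVM."""
    reduced_X: List[Vector] = []
    reduced_y: List[int] = []
    for label in (1, -1):
        members = [x for x, lab in zip(X, y) if lab == label]
        k = min(k_per_class, len(members))
        for c in kmeans(members, k):
            reduced_X.append(c)
            reduced_y.append(label)
    return train_linear_svm(reduced_X, reduced_y, lam=lam, epochs=epochs)


def make_demo_data(n_per_class: int = 100, seed: int = 42) -> Tuple[List[Vector], List[int]]:
    """Synthetic stand-in data: two Gaussian blobs labelled +1 and -1."""
    rng = random.Random(seed)
    X: List[Vector] = []
    y: List[int] = []
    for _ in range(n_per_class):
        X.append([rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)])
        y.append(1)
        X.append([rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)])
        y.append(-1)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    w = hybrid_kmeans_svm(X, y, k_per_class=5)
    acc = sum(predict(w, x) == lab for x, lab in zip(X, y)) / len(y)
    print(f"accuracy on synthetic data: {acc:.3f}")
```

To apply it to Spambase, each of the dataset's 57 numeric attributes would become a vector component, with spam mapped to +1 and non-spam to -1. The sketch makes no claim to reproduce the accuracies reported above; training on centroids mainly shrinks the SVM's training set, and whether that helps accuracy depends on the data.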


Metadata
Title
Enhancement of spam detection mechanism based on hybrid \(k\)-mean clustering and support vector machine
Authors
Nadir Omer Fadl Elssied
Othman Ibrahim
Ahmed Hamza Osman
Publication date
1 November 2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 11/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1479-2
