
2016 | Original Paper | Book Chapter

Comparative Study of Classification Algorithms for Spam Email Detection

Written by: Aakanksha Sharaff, Naresh Kumar Nagwani, Abhishek Dhadse

Published in: Emerging Research in Computing, Information, Communication and Applications

Publisher: Springer India


Abstract

Email spam has become a major problem. Spam messages consume storage space and network bandwidth and are of no use to the recipient. Filtering spam is difficult because spammers actively try to counter the processes carried out by the filtering mechanism. Various classification algorithms are used to classify an email as spam or non-spam (ham). This paper compares and discusses the effectiveness of four machine learning classification algorithms from different categories (probabilistic, decision tree, support vector machine, and lazy algorithms) on the basis of various performance measures, using WEKA, a data mining tool, to analyze the algorithms. The Enron dataset is taken in preprocessed form from Athens University of Economics and Business, and the J48 and BayesNet algorithms are found to perform better than SVM.
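The abstract does not list which performance measures were used. As a minimal sketch, assuming the common choices of accuracy, precision, recall, and F1 with spam as the positive class, such measures could be derived from a classifier's confusion-matrix counts as follows. The names `ConfusionCounts` and `performance_measures` are hypothetical, and the counts are made up for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary spam/ham classifier, spam being the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    fn: int  # spam wrongly let through as ham
    tn: int  # ham correctly kept as ham


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def performance_measures(c):
    """Return accuracy, precision, recall and F1 for the given counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Made-up counts for two hypothetical classifiers; NOT data from the paper.
    hypothetical = {
        "classifier_a": ConfusionCounts(tp=90, fp=10, fn=10, tn=90),
        "classifier_b": ConfusionCounts(tp=80, fp=5, fn=20, tn=95),
    }
    for name, counts in hypothetical.items():
        measures = performance_measures(counts)
        print(name, {k: round(v, 3) for k, v in measures.items()})
```

Comparing classifiers on several measures rather than accuracy alone matters for spam filtering, because wrongly flagging ham (a false positive) and missing spam (a false negative) usually carry different costs.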

Metadata
Title
Comparative Study of Classification Algorithms for Spam Email Detection
Authors
Aakanksha Sharaff
Naresh Kumar Nagwani
Abhishek Dhadse
Copyright year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2553-9_23