Published in: Soft Computing 13/2020

02-11-2019 | Methodologies and Application

Machine intelligence-based algorithms for spam filtering on document labeling

Authors: Devottam Gaurav, Sanju Mishra Tiwari, Ayush Goyal, Niketa Gandhi, Ajith Abraham



Abstract

The internet provides numerous modes of secure data transmission from one end station to another, and email is one of them. Email is popular because it is cost-effective and fast. At the same time, many undesirable emails, called spam, are generated in bulk for monetary benefit. Although people can promptly recognize an email as spam, doing so manually wastes time. Machine learning methods are therefore used to automate the classification task. Because few email spam datasets are available, constrained data and informally written text are the main issues that have caused current algorithms to fall short of expectations during classification. This paper proposes a novel spam mail detection method based on the document labeling concept, which classifies new emails as ham or spam. Naive Bayes, Decision Tree and Random Forest (RF) algorithms are used in the classification process. Three datasets are used to evaluate the proposed method. Experimental results show that RF achieves higher accuracy than the other methods.
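To make the ham/spam labeling task concrete, here is a minimal sketch of one of the classifier families the abstract names, Naive Bayes. It is not the authors' method or code. It assumes a multinomial bag-of-words model with add-one (Laplace) smoothing and a whitespace tokenizer. The toy messages and the helper names (`tokenize`, `train_naive_bayes`, `classify`) are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents and words per label.

    labeled_docs: list of (text, label) pairs, e.g. label "spam" or "ham".
    Returns (doc_counts, word_counts, vocab).
    """
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior under the model."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (label, word) pairs non-zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the paper's datasets.
TOY_DATA = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project report attached", "ham"),
]

model = train_naive_bayes(TOY_DATA)
print(classify(model, "claim your free money"))
print(classify(model, "monday project meeting"))
```

Decision Tree and Random Forest classifiers would take the same labeled documents but learn splits over word features instead of per-class word probabilities. Comparing them as the abstract describes would require real datasets and a held-out evaluation, which this sketch does not attempt.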


Metadata
Title
Machine intelligence-based algorithms for spam filtering on document labeling
Authors
Devottam Gaurav
Sanju Mishra Tiwari
Ayush Goyal
Niketa Gandhi
Ajith Abraham
Publication date
02-11-2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 13/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04473-7
