DOI: 10.1145/2663761.2664233
research-article

A study of ensemble based evolutionary classifiers for detecting unsolicited emails

Published: 5 October 2014

ABSTRACT

Identifying unsolicited email (spam) in a collection of email messages has become a challenging area of research. A robust classifier is judged not only by its accuracy but also by its false positive rate. Recently, evolutionary algorithms and ensemble methods have gained popularity in this domain. To develop an accurate and sensitive spam classifier, this research studies evolutionary-algorithm-based classifiers, namely Genetic Algorithm (GA) and Genetic Programming (GP), in combination with ensemble techniques. Two publicly available datasets (Enron and SpamAssassin) are used for testing, with the most informative features selected by the Greedy Stepwise Search algorithm. Results show that without an ensemble, GA performs better than GP, but once an ensemble of many weak classifiers is built, GP outperforms GA with significantly higher accuracy. Greedy Stepwise Feature Search is also found to be a strong feature selection method in this application domain. Ensemble-based GP proves to be good not only in classification accuracy but also in achieving a low false positive rate, which is an important criterion for building a robust spam classifier.
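The abstract combines three mechanisms: greedy stepwise feature selection, an ensemble of weak classifiers, and evaluation by accuracy and false positive rate. The Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. It assumes label 1 means spam (the positive class), so the false positive rate is FP / (FP + TN), the share of legitimate emails misclassified as spam. The ensemble is a plain unweighted majority vote, a simplified stand-in chosen for illustration, since the abstract does not name the ensemble scheme used. The subset evaluator `score` passed to the hypothetical `greedy_stepwise_forward` function is supplied by the caller; the abstract does not say which subset merit measure was used.

```python
from typing import Callable, List, Sequence, Set

# Assumed label convention: 1 = spam (positive class), 0 = legitimate email (ham).


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FP / (FP + TN): share of legitimate emails wrongly flagged as spam."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if fp + tn else 0.0


def majority_vote(predictions: List[Sequence[int]]) -> List[int]:
    """Combine binary predictions from several weak classifiers by unweighted
    majority vote. A tie counts as ham (0), which favours a low false positive rate."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def greedy_stepwise_forward(
    n_features: int, score: Callable[[Set[int]], float]
) -> Set[int]:
    """Forward greedy stepwise search (hypothetical helper): starting from the empty
    set, repeatedly add the single feature that most improves score(subset), and
    stop as soon as no addition improves it."""
    selected: Set[int] = set()
    best = score(selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            return selected
        # max() returns the first maximal entry, so ties go to the lowest feature index.
        top_score, top_feature = max(
            ((score(selected | {f}), f) for f in candidates), key=lambda g: g[0]
        )
        if top_score <= best:
            return selected
        selected.add(top_feature)
        best = top_score


def toy_score(subset: Set[int]) -> float:
    """Hypothetical subset evaluator: reward features 0 and 2, penalise others."""
    useful = {0, 2}
    return len(subset & useful) - 0.5 * len(subset - useful)


if __name__ == "__main__":
    # Toy data: three weak classifiers voting on five emails.
    y_true = [1, 0, 1, 0, 0]
    votes = [[1, 1, 1, 0, 0], [1, 0, 0, 0, 1], [0, 0, 1, 0, 0]]
    ensemble = majority_vote(votes)
    print(ensemble)  # [1, 0, 1, 0, 0]
    print(accuracy(y_true, votes[0]), false_positive_rate(y_true, votes[0]))
    print(accuracy(y_true, ensemble), false_positive_rate(y_true, ensemble))  # 1.0 0.0
    print(greedy_stepwise_forward(4, toy_score))  # {0, 2}
```

In this toy run the first weak classifier alone flags one of three legitimate emails as spam (false positive rate 1/3), while the majority vote corrects it. Breaking ties toward ham is a deliberate choice in this sketch, reflecting the abstract's emphasis on a low false positive rate as a key criterion for a robust spam classifier.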


Published in

RACS '14: Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems
October 2014
386 pages
ISBN: 9781450330602
DOI: 10.1145/2663761

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

RACS '14 paper acceptance rate: 59 of 251 submissions, 24%
Overall acceptance rate: 393 of 1,581 submissions, 25%
