ABSTRACT
Identifying unsolicited emails (spam) in a set of email files has become a challenging area of research. A robust classifier is judged not only by its accuracy but also by its false positive rate. Recently, evolutionary algorithms and ensemble-of-classifiers methods have gained popularity in this domain. To develop an accurate and sensitive spam classifier, this research studies evolutionary-algorithm-based classifiers, namely Genetic Algorithm (GA) and Genetic Programming (GP), together with ensemble techniques. Two publicly available datasets (Enron and SpamAssassin) are used for testing, using the most informative features selected by the Greedy Stepwise Search algorithm. Results show that without an ensemble, GA performs better than GP, but once an ensemble of many weak classifiers is built, GP outperforms GA with significantly higher accuracy. Greedy Stepwise feature search is also found to be a strong feature selection method in this application domain. Ensemble-based GP performs well not only in classification accuracy but also in its low false positive rate, which is considered an important criterion for building a robust spam classifier.
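The two evaluation criteria named in the abstract, accuracy and false positive rate, can be illustrated with a minimal sketch. The labels, the toy votes, and the plain majority-vote combiner below are assumptions made for illustration only; the abstract does not state which ensemble scheme the study uses, and this is not the authors' implementation.

```python
from typing import List, Sequence

SPAM, HAM = 1, 0  # assumed label encoding, not taken from the paper


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of legitimate (ham) emails that were flagged as spam."""
    ham_total = sum(1 for t in y_true if t == HAM)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == HAM and p == SPAM)
    return false_pos / ham_total if ham_total else 0.0


def majority_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Combine weak classifiers: votes[k][i] is classifier k's label for email i.

    An email is labeled spam only if a strict majority says so, which means
    ties fall back to ham (a hypothetical choice that favors few false positives).
    """
    n_classifiers = len(votes)
    return [SPAM if 2 * sum(col) > n_classifiers else HAM for col in zip(*votes)]


# Toy data (fabricated for illustration, not results from the study).
y_true = [1, 1, 0, 0, 0, 1]
votes = [
    [1, 0, 0, 1, 0, 1],  # weak classifier 0
    [1, 1, 0, 0, 0, 0],  # weak classifier 1
    [0, 1, 1, 0, 0, 1],  # weak classifier 2
]
ensemble = majority_vote(votes)

for k, pred in enumerate(votes):
    print(f"classifier {k}: acc={accuracy(y_true, pred):.2f} "
          f"fpr={false_positive_rate(y_true, pred):.2f}")
print(f"ensemble:     acc={accuracy(y_true, ensemble):.2f} "
      f"fpr={false_positive_rate(y_true, ensemble):.2f}")
```

In this toy case each weak classifier makes different mistakes, so the vote corrects them. That is only the general intuition behind ensembles and says nothing about the magnitudes reported in the study.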
Index Terms
- A study of ensemble based evolutionary classifiers for detecting unsolicited emails