Published in: Journal of Intelligent Information Systems 1/2014

01-02-2014

Cost-sensitive three-way email spam filtering

Authors: Bing Zhou, Yiyu Yao, Jigang Luo

Abstract

Email spam filtering is typically treated as a binary classification problem that can be solved by machine learning algorithms. We argue that a three-way decision approach gives users a more meaningful, precautionary way to handle their incoming emails. A three-way spam filtering system produces three email folders instead of two: a suspected folder is added so that users can further examine suspicious emails, thereby reducing the chance of misclassification. Unlike existing ternary email spam filtering systems, we focus on two less-studied issues: the computation of the thresholds required to define the three email categories, and the interpretation of the cost-sensitive characteristics of spam filtering. Instead of supplying thresholds based on an intuitive understanding of the levels of tolerance for errors, we systematically calculate them based on the decision-theoretic rough set model. A loss function is interpreted as the costs of making classification decisions, and the decision with the minimum overall cost is made. Experimental results show that the new approach reduces the error rate of misclassifying a legitimate email as spam and performs better with respect to cost sensitivity.
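
The threshold computation described in the abstract can be illustrated with a minimal sketch. It assumes the textbook decision-theoretic rough set formulas for the two thresholds (here called alpha and beta), treats spam as the positive class, and uses made-up loss values; the abstract does not state the paper's actual loss function, so every name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    """Hypothetical costs; first letter = folder chosen, second = true class.

    Folders: p = spam, b = suspected, n = legitimate.
    True class: p = email is spam, n = email is legitimate.
    """
    l_pp: float = 0.0   # filed as spam, truly spam (correct)
    l_bp: float = 1.0   # filed as suspected, truly spam
    l_np: float = 4.0   # filed as legitimate, truly spam
    l_pn: float = 10.0  # filed as spam, truly legitimate (costliest error)
    l_bn: float = 1.0   # filed as suspected, truly legitimate
    l_nn: float = 0.0   # filed as legitimate, truly legitimate (correct)


def thresholds(loss):
    """Return (alpha, beta) derived from the loss values."""
    alpha = (loss.l_pn - loss.l_bn) / (
        (loss.l_pn - loss.l_bn) + (loss.l_bp - loss.l_pp))
    beta = (loss.l_bn - loss.l_nn) / (
        (loss.l_bn - loss.l_nn) + (loss.l_np - loss.l_bp))
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("losses do not give 0 <= beta < alpha <= 1")
    return alpha, beta


def classify(p_spam, alpha, beta):
    """Pick a folder by comparing the spam probability with the thresholds."""
    if p_spam >= alpha:
        return "spam"
    if p_spam <= beta:
        return "legitimate"
    return "suspected"


def expected_costs(p_spam, loss):
    """Expected cost of each folder for an email with spam probability p_spam."""
    q = 1.0 - p_spam  # probability that the email is legitimate
    return {
        "spam": loss.l_pp * p_spam + loss.l_pn * q,
        "suspected": loss.l_bp * p_spam + loss.l_bn * q,
        "legitimate": loss.l_np * p_spam + loss.l_nn * q,
    }


def min_cost_folder(p_spam, loss):
    """Pick the folder whose expected cost is smallest."""
    costs = expected_costs(p_spam, loss)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    loss = Loss()
    alpha, beta = thresholds(loss)
    print(alpha, beta)  # 0.9 0.25 for the hypothetical losses above
    for p in (0.95, 0.5, 0.1):
        print(p, classify(p, alpha, beta), min_cost_folder(p, loss))
```

With these illustrative losses, filing a legitimate email as spam costs far more than letting spam through, which pushes alpha up to 0.9 and leaves a wide suspected band between 0.25 and 0.9. For the three sample probabilities, the threshold rule and the direct minimum-expected-cost rule select the same folder.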

Metadata
Title
Cost-sensitive three-way email spam filtering
Authors
Bing Zhou
Yiyu Yao
Jigang Luo
Publication date
01-02-2014
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0254-7
