Published in: Journal of Intelligent Information Systems 1/2014

1 February 2014

Cost-sensitive three-way email spam filtering

Authors: Bing Zhou, Yiyu Yao, Jigang Luo

Abstract

Email spam filtering is typically treated as a binary classification problem that can be solved by machine learning algorithms. We argue that a three-way decision approach gives users a more meaningful way to handle their incoming emails with caution. A three-way spam filtering system produces three email folders instead of two: a suspected folder is added so that users can examine suspicious emails further, thereby reducing the chance of misclassification. Unlike existing ternary email spam filtering systems, we focus on two less-studied issues: the computation of the thresholds required to define the three email categories, and the interpretation of the cost-sensitive characteristics of spam filtering. Instead of supplying the thresholds based on an intuitive understanding of the levels of tolerance for errors, we systematically calculate them based on the decision-theoretic rough set model. A loss function is interpreted as the cost of making classification decisions, and the decision with the minimum overall cost is made. Experimental results show that the new approach reduces the error rate of misclassifying a legitimate email as spam and performs better with respect to cost sensitivity.
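
The threshold computation mentioned in the abstract can be illustrated with a minimal sketch. It uses the threshold formulas commonly given for the decision-theoretic rough set model in the literature; the abstract itself does not state them, so treat this as an assumption about the formulation rather than the authors' exact procedure. The names `Losses`, `thresholds`, `decide`, and `min_cost_action`, the loss values in `EXAMPLE`, and the choice of legitimate mail as the target class are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    # Hypothetical container for the six loss values.
    # First letter: action (p = accept, b = move to suspected folder, n = reject).
    # Second letter: true state (p = email is in the target class, n = it is not).
    pp: float
    bp: float
    np_: float
    pn: float
    bn: float
    nn: float


def thresholds(l: Losses) -> tuple[float, float]:
    """Return (alpha, beta) from the loss values (standard DTRS form, assumed)."""
    alpha = (l.pn - l.bn) / ((l.pn - l.bn) + (l.bp - l.pp))
    beta = (l.bn - l.nn) / ((l.bn - l.nn) + (l.np_ - l.bp))
    if not beta < alpha:
        raise ValueError("loss values do not yield a non-empty suspected region")
    return alpha, beta


def decide(p: float, l: Losses) -> str:
    """Three-way decision for p = estimated probability of the target class."""
    alpha, beta = thresholds(l)
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "suspected"


def min_cost_action(p: float, l: Losses) -> str:
    """Pick the action with the smallest expected loss, for cross-checking."""
    costs = {
        "accept": l.pp * p + l.pn * (1 - p),
        "suspected": l.bp * p + l.bn * (1 - p),
        "reject": l.np_ * p + l.nn * (1 - p),
    }
    return min(costs, key=costs.get)


# Illustrative loss values only (not taken from the paper): rejecting an
# email that is in the target class is assumed to be the most costly error.
EXAMPLE = Losses(pp=0, bp=1, np_=10, pn=4, bn=1, nn=0)

if __name__ == "__main__":
    print(thresholds(EXAMPLE))  # (0.75, 0.1)
    for p in (0.9, 0.5, 0.05):
        print(p, decide(p, EXAMPLE), min_cost_action(p, EXAMPLE))
```

With these made-up losses, the thresholds follow directly from the cost ratios, and the threshold rule agrees with picking the action of minimum expected loss, which is the sense in which the thresholds are calculated rather than chosen by intuition.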

Metadata
Title
Cost-sensitive three-way email spam filtering
Authors
Bing Zhou
Yiyu Yao
Jigang Luo
Publication date
1 February 2014
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0254-7
