
2016 | OriginalPaper | Book chapter

Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

Authors: Aliaksandr Barushka, Petr Hájek

Published in: AI*IA 2016 Advances in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine learning methods such as Naïve Bayes (NB), support vector machines (SVMs), and neural networks (NNs) have been particularly effective at separating spam from non-spam messages. They automatically construct word lists and the associated weights, usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs often suffer from slow optimization that converges to a poor local minimum, and from overfitting. To overcome these problems, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and the SMS Spam Collection) with four machine learning algorithms commonly used in text classification, namely NB, SVM, MLP, and k-nearest neighbors (k-NN). We show that the RANN-ReL outperforms the other methods in terms of classification accuracy, false negative rate, and false positive rate. Notably, it classifies both the majority (legitimate) and the minority (spam) class well.
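To make the idea of a regularized network with rectified linear units concrete, the following minimal sketch runs one forward pass from a bag-of-words vector through a ReLU hidden layer to a spam probability. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which regularizer is used, so dropout is assumed here as one common choice, and the vocabulary, weights, layer sizes, and function names are all hypothetical.

```python
import math
import random


def bag_of_words(message, vocabulary):
    """Count how often each vocabulary word occurs in the message."""
    tokens = message.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]


def relu(values):
    """Rectified linear unit: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout (assumed regularizer): zero each unit with
    probability `rate` and rescale the surviving units by 1 / (1 - rate)."""
    if rate <= 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer with one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def spam_probability(message, vocabulary, params, rate=0.0, rng=None):
    """Forward pass: bag-of-words -> ReLU hidden layer -> sigmoid output.

    Use rate > 0 only during training; keep rate = 0.0 for prediction.
    """
    x = bag_of_words(message, vocabulary)
    hidden = relu(dense(x, params["W1"], params["b1"]))
    hidden = dropout(hidden, rate, rng or random.Random(0))
    (logit,) = dense(hidden, params["W2"], params["b2"])
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy vocabulary and hand-picked weights (not learned).
VOCAB = ["free", "win", "meeting"]
PARAMS = {
    "W1": [[1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[2.0, -2.0]],
    "b2": [0.0],
}
```

With these toy weights, `spam_probability("Win a FREE prize", VOCAB, PARAMS)` lies above 0.5 and `spam_probability("meeting at noon", VOCAB, PARAMS)` lies below it. A real filter would learn the weights from labeled messages and use a vocabulary of thousands of words.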

Metadata
Title
Spam Filtering Using Regularized Neural Networks with Rectified Linear Units
Authors
Aliaksandr Barushka
Petr Hájek
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49130-1_6