2016 | OriginalPaper | Book chapter

Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

Authors: Aliaksandr Barushka, Petr Hájek

Published in: AI*IA 2016 Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naïve Bayes (NB), support vector machines (SVMs), and neural networks (NNs) have been particularly effective in categorizing messages as spam or non-spam. They automatically construct word lists and the associated weights, usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs usually suffer from overfitting and from slow optimization that converges to a poor local minimum. To overcome these problems, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and SMS Spam Collection) with four machine-learning algorithms commonly used in text classification, namely NB, SVM, MLP, and k-NN. We show that the RANN-ReL outperforms the other methods in terms of classification accuracy and false negative and false positive rates. Notably, it classifies both the majority (legitimate) and minority (spam) classes well.
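The ingredients named in the abstract (bag-of-words features, a hidden layer of rectified linear units, a regularizer, and evaluation by accuracy and false positive and false negative rates) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which regularizer is used, so dropout is assumed here purely for illustration, and all function names, the vocabulary, and the weights are hypothetical.

```python
import math
import random
from collections import Counter


def bag_of_words(message, vocabulary):
    """Map a message to word counts over a fixed vocabulary (word order is ignored)."""
    counts = Counter(message.lower().split())
    return [float(counts[word]) for word in vocabulary]


def relu(values):
    """Rectified linear unit: max(0, v) applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout (assumed regularizer): zero each unit with
    probability `rate` (must be < 1) and rescale the surviving units."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def spam_score(x, W1, b1, w2, b2, rate=0.0, rng=None):
    """Forward pass: one hidden ReLU layer, then a sigmoid output in (0, 1).
    Dropout is applied only when rate > 0, i.e. during training."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) + b
                   for row, b in zip(W1, b1)])
    if rate > 0.0:
        hidden = dropout(hidden, rate, rng or random.Random(0))
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def error_rates(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate),
    treating spam as the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fp_rate, fn_rate


# Hypothetical vocabulary and hand-picked weights, for illustration only.
vocabulary = ["free", "win", "meeting"]
features = bag_of_words("Win a FREE prize win", vocabulary)
score = spam_score(features,
                   W1=[[1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]], b1=[0.0, 0.0],
                   w2=[1.0, -1.0], b2=-1.0)
print(features, round(score, 3))
```

Reporting the false positive and false negative rates separately matters because spam is the minority class: a filter can reach high accuracy while still blocking legitimate mail or letting spam through.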

http://csmining.org/index.php/enron-spam-datasets.html.

http://csmining.org/index.php/spam-assassin-datasets.html.

https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.

Title: Spam Filtering Using Regularized Neural Networks with Rectified Linear Units
Authors: Aliaksandr Barushka, Petr Hájek
Publisher: Springer International Publishing
Book: AI*IA 2016 Advances in Artificial Intelligence
Print ISBN: 978-3-319-49129-5

Electronic ISBN: 978-3-319-49130-1

Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-49130-1_6
