2018 | Original Paper | Book Chapter

A Self-training Method for Detection of Phishing Websites

Authors: Xue-peng Jia, Xiao-feng Rong

Published in: Data Mining and Big Data

Publisher: Springer International Publishing

Abstract

Phishing detection based on machine learning often lacks training data with high-confidence labels. To reduce the impact of scarce labels in the training set on phishing-detection performance, this paper proposes an improved self-training method for semi-supervised learning. It applies the divide-and-conquer principle, decomposing the original problem into a number of smaller sub-problems that are similar to the original one. We compare classification quality across supervised learning, traditional semi-supervised learning, and the proposed method using four classifiers, and we compare running time between the two semi-supervised methods. When the unlabeled dataset is divided into equal parts, the improved method can reduce running time by 50% while keeping the classification performance equal to that of the traditional self-training method. Furthermore, running time continues to fall significantly as the number of partitions of the unlabeled dataset increases. The experimental results show that the proposed improved self-training method outperforms the traditional self-training method.
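The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: traditional self-training pseudo-labels the whole unlabeled pool at once, while the divided variant splits the pool into equal parts and self-trains on each part in turn, carrying the pseudo-labels forward. Everything here is an assumption for illustration and not the authors' implementation: the nearest-centroid classifier (the paper uses four classifiers that the abstract does not name), the distance-margin confidence rule, the `threshold` value, the toy two-dimensional features, and all function names.

```python
def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the
    nearest and second-nearest centroid (assumed confidence measure)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)) ** 0.5, c)
        for c, mu in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unl, threshold=1.0, max_rounds=10):
    """Traditional self-training: repeatedly move confidently predicted
    unlabeled samples into the labeled set, then retrain."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_centroids(X, y)
        remaining = []
        for x in pool:
            label, margin = predict(model, x)
            if margin >= threshold:
                X.append(x)
                y.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing confident enough: stop
            break
        pool = remaining
    return train_centroids(X, y), X, y


def divided_self_train(X_lab, y_lab, X_unl, n_parts=2, **kwargs):
    """Assumed divide-and-conquer variant: split the unlabeled pool into
    n_parts roughly equal chunks and self-train on one chunk at a time."""
    X, y = list(X_lab), list(y_lab)
    size = max(1, -(-len(X_unl) // n_parts))  # ceiling division
    for start in range(0, len(X_unl), size):
        _, X, y = self_train(X, y, X_unl[start:start + size], **kwargs)
    return train_centroids(X, y)


# Toy data (hypothetical): label 0 = legitimate, label 1 = phishing.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = [0, 1]
X_unl = [[0.5, 0.2], [3.8, 4.1], [0.1, 0.9],
         [4.2, 3.5], [1.0, 1.0], [3.0, 3.2]]

model = divided_self_train(X_lab, y_lab, X_unl, n_parts=2)
print(predict(model, [0.3, 0.3])[0], predict(model, [3.9, 3.9])[0])
```

The sketch only illustrates the control flow of the two variants. It makes no attempt to reproduce the running-time or accuracy figures reported in the abstract, which depend on the classifiers, features, and datasets used in the paper.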

Metadata
Title
A Self-training Method for Detection of Phishing Websites
Authors
Xue-peng Jia
Xiao-feng Rong
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-93803-5_39