Published in: Neural Computing and Applications 8/2019

06-01-2018 | Original Article

Detection of phishing websites using an efficient feature-based machine learning framework

Authors: Routhu Srinivasa Rao, Alwyn Roshan Pais


Abstract

Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, including blacklist- or whitelist-based, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model was evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best among the oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested our model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites. Finally, we compared our results with the baseline models CANTINA and CANTINA+. Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
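
As a rough illustration of what "heuristic features extracted from the URL" can look like in practice, the sketch below computes a handful of simple URL properties using only the Python standard library. The function name `url_features` and the five features it returns are assumptions chosen for illustration; they are not the feature set defined in the paper, and the source-code and third-party features are not covered here.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Return a few illustrative URL heuristics (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly used warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),                  # total number of characters
        "num_dots_in_host": host.count("."),     # rough proxy for subdomain depth
        "has_at_symbol": "@" in url,             # '@' anywhere in the URL
        "host_is_ip": host_is_ip,                # host given as an IP literal
        "uses_https": parsed.scheme == "https",  # scheme check only, not certificate validity
    }
```

A feature dictionary of this kind could then be converted to a numeric vector and passed to a classifier such as a random forest; how the authors encode and combine their features is not specified in this abstract.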


Metadata
Title
Detection of phishing websites using an efficient feature-based machine learning framework
Authors
Routhu Srinivasa Rao
Alwyn Roshan Pais
Publication date
06-01-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3305-0
