Published in: Neural Computing and Applications 8/2019

06.01.2018 | Original Article

Detection of phishing websites using an efficient feature-based machine learning framework

Authors: Routhu Srinivasa Rao, Alwyn Roshan Pais

Abstract

Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, including blacklist- or whitelist-based, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. We evaluated the model using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. We repeated the experiments with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best among the oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested the model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites. Finally, we compared our results with the baseline models CANTINA and CANTINA+. The proposed technique outperformed these methods and also detected zero-day phishing attacks.
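
To make the idea of heuristic URL features concrete, the following is a minimal Python sketch of how a URL might be turned into a small feature dictionary before being passed to a classifier such as a Random Forest. The specific features (`url_length`, `dots_in_host`, `has_at_symbol`, `host_is_ip`, `uses_https`) and the function names are illustrative assumptions, not the feature set or code used in the paper; the source-code and third-party features mentioned in the abstract are not covered here.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Map a URL to a few illustrative heuristic features.

    These features are hypothetical examples chosen for this sketch;
    they are not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                  # total number of characters
        "dots_in_host": host.count("."),         # rough proxy for subdomain depth
        "has_at_symbol": "@" in url,             # '@' can hide the real host
        "host_is_ip": host_is_ip(host),          # raw IP instead of a domain name
        "uses_https": parsed.scheme == "https",  # scheme check only, not certificate validity
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@paypal"):
        print(example, url_features(example))
```

In a full pipeline, dictionaries like these would be converted to numeric vectors and combined with the other feature groups before training; that step is omitted to keep the sketch dependency-free.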

Metadata
Title
Detection of phishing websites using an efficient feature-based machine learning framework
Authors
Routhu Srinivasa Rao
Alwyn Roshan Pais
Publication date
06.01.2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3305-0
