
21-03-2022

Modeling Hybrid Feature-Based Phishing Websites Detection Using Machine Learning Techniques

Authors: Sumitra Das Guptta, Khandaker Tayef Shahriar, Hamed Alqahtani, Dheyaaldin Alsalman, Iqbal H. Sarker

Published in: Annals of Data Science | Issue 1/2024


Abstract

In this paper, we present a machine learning based approach for detecting phishing websites in real time. It uses hybrid features drawn from URLs and hyperlinks to achieve high accuracy without relying on any third-party systems. In phishing, attackers typically try to deceive internet users by disguising a webpage as a genuine official page in order to steal sensitive information such as usernames, passwords, social security numbers, and credit card details. Existing anti-phishing solutions, such as blacklist or whitelist, heuristic, and visual similarity based methods, cannot detect zero-hour phishing attacks or brand-new websites. Moreover, earlier approaches are complex and unsuitable for real-time environments because they depend on third-party sources such as search engines. Detecting recently created phishing websites in a real-time environment is therefore a major challenge in cybersecurity. To overcome these problems, this paper proposes a hybrid feature based anti-phishing strategy that extracts features from URL and hyperlink information on the client side only. We also develop a new dataset for conducting experiments with popular machine learning classification techniques. Our experimental results show that the proposed approach is more effective than traditional approaches, reaching a detection accuracy of 99.17% with the XGBoost technique.
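To make the idea of client-side hybrid features more concrete, the following is a minimal sketch of how URL based and hyperlink based features might be computed from a page's address and HTML source alone. The specific features, the function names (`url_features`, `hyperlink_features`, `hybrid_features`), and the sample URL are illustrative assumptions; the abstract does not list the paper's actual feature set.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or valueless href is recorded as an empty string.
            self.hrefs.append(dict(attrs).get("href") or "")


def url_features(url):
    """Illustrative URL-based features (assumed, not the paper's list)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
        # Crude IPv4 check: four dot-separated groups of digits.
        "host_is_ipv4": int(host.count(".") == 3 and host.replace(".", "").isdigit()),
        "uses_https": int(parsed.scheme == "https"),
    }


def hyperlink_features(url, html):
    """Illustrative hyperlink-based features from the page source only."""
    page_host = urlparse(url).hostname or ""
    collector = _AnchorCollector()
    collector.feed(html)
    total = len(collector.hrefs)
    empty = external = 0
    for href in collector.hrefs:
        target = href.strip()
        if target in ("", "#") or target.lower().startswith("javascript:"):
            empty += 1  # link that leads nowhere
            continue
        # Resolve relative links against the page URL before comparing hosts.
        link_host = urlparse(urljoin(url, target)).hostname or ""
        if link_host != page_host:
            external += 1
    return {
        "num_links": total,
        "ratio_empty_links": empty / total if total else 0.0,
        "ratio_external_links": external / total if total else 0.0,
    }


def hybrid_features(url, html):
    """Merge both feature groups into one record for a classifier."""
    return {**url_features(url), **hyperlink_features(url, html)}


if __name__ == "__main__":
    sample_url = "http://192.168.10.5/secure/login.php"  # hypothetical URL
    sample_html = (
        '<a href="#">Home</a>'
        '<a href="https://example.com/help">Help</a>'
        '<a href="/account">Account</a>'
        '<a href="javascript:void(0)">Close</a>'
    )
    print(hybrid_features(sample_url, sample_html))
```

A record like this needs no search engine or other third-party lookup, which is the property the abstract emphasizes. In a full pipeline, such records would be converted to numeric vectors and passed to a classifier such as XGBoost; that step is omitted here.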


Metadata
Publisher: Springer Berlin Heidelberg
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI: https://doi.org/10.1007/s40745-022-00379-8
