
Neural Computing and Applications

06-01-2018 | Original Article

Detection of phishing websites using an efficient feature-based machine learning framework

Authors: Routhu Srinivasa Rao, Alwyn Roshan Pais

Published in: Neural Computing and Applications | Issue 8/2019

Abstract

Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, including blacklist- or whitelist-based, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, the source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model was evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best among all oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested our model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites. Finally, we compared our results with the baseline models CANTINA and CANTINA+. Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
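The abstract names the URL as one of three feature sources. As an illustration only, the sketch below extracts a few URL heuristics that are common in the phishing-detection literature: an IP address used as the hostname, an "@" symbol, overall URL length, dots and hyphens in the hostname, and HTTPS use. The function names, feature names, and this particular selection are hypothetical assumptions, not the paper's actual feature set. The source-code and third-party features (which require fetching the page and querying external services) are omitted.

```python
import ipaddress
from urllib.parse import urlsplit


def _is_ip_host(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract a few illustrative URL heuristics (hypothetical feature set).

    Binary features are encoded as 0/1 so the result can be turned into a
    numeric vector for a classifier such as Random Forest.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased; brackets stripped for IPv6
    return {
        "ip_in_host": int(_is_ip_host(host)),    # e.g. http://192.168.0.1/login
        "at_symbol": int("@" in url),            # '@' can hide the real host
        "url_length": len(url),                  # long URLs can obscure the domain
        "dots_in_host": host.count("."),         # many subdomains, e.g. brand.com.evil.example
        "hyphen_in_host": int("-" in host),      # e.g. secure-login.example
        "uses_https": int(parts.scheme == "https"),
    }


if __name__ == "__main__":
    for u in ("http://192.168.0.1/login",
              "https://paypal.com.secure-login.example/verify"):
        print(u, list(url_features(u).values()))
```

Because Python dictionaries preserve insertion order, `list(url_features(u).values())` yields a fixed-length vector in the same feature order for every URL, which is the input form a tabular classifier expects.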


Title: Detection of phishing websites using an efficient feature-based machine learning framework
Authors: Routhu Srinivasa Rao, Alwyn Roshan Pais
Publication date: 06-01-2018
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 8/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-017-3305-0
