2016 | OriginalPaper | Chapter

Detecting Malicious URLs Using Lexical Analysis

Authors: Mohammad Saiful Islam Mamun, Mohammad Ahmad Rathore, Arash Habibi Lashkari, Natalia Stakhanova, Ali A. Ghorbani

Published in: Network and System Security

Publisher: Springer International Publishing

Abstract

The Web has long been a major platform for online criminal activity, and URLs are its main vehicle. To counter this problem, the security community has focused its efforts mostly on techniques for blacklisting malicious URLs. While successful in protecting users from known malicious domains, this approach solves only part of the problem: new malicious URLs that spring up across the Web in large numbers commonly get a head start in this race. In addition, trusted, Alexa-ranked websites may host compromised fraudulent URLs, known as defacement URLs. In this work, we explore a lightweight approach to detecting malicious URLs and categorizing them by attack type. We show that lexical analysis is effective and efficient for proactive detection of these URLs. We provide a set of features sufficient for accurate categorization and evaluate the accuracy of the approach on a set of over 110,000 URLs. We also study the effect of obfuscation techniques on malicious URLs to determine which type of obfuscation technique is used with each type of malicious URL.

Using web content as features requires downloading and analyzing page contents. Moreover, inspecting millions of URLs and their contents per unit of time may create a bottleneck.

Accessing a malicious webpage may pose a risk, since such pages may contain malicious content such as JavaScript functions.

URLs that originate from pages written in server-side scripting languages often have arguments [3]. The length of the longest variable value among the URL's arguments is calculated.
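
As an illustration of the argument-length feature described above, the following is a minimal Python sketch, not the authors' implementation. It assumes that "arguments" means the key=value pairs of the URL's query string and that lengths are measured after percent-decoding; the function name longest_argument_value_length and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def longest_argument_value_length(url: str) -> int:
    """Return the length of the longest query-argument value in `url`.

    Assumption: "arguments" are the key=value pairs of the query string.
    Returns 0 when the URL carries no arguments.
    """
    query = urlsplit(url).query
    # keep_blank_values=True keeps arguments such as "a=" in the result;
    # parse_qsl also percent-decodes values, so lengths are of decoded text.
    pairs = parse_qsl(query, keep_blank_values=True)
    return max((len(value) for _, value in pairs), default=0)


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    print(longest_argument_value_length(
        "http://example.com/page.php?id=42&session=abcdef123456"))
```

For the hypothetical URL above, the function returns 12, the length of the session value; a URL without a query string yields 0.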

1. Google Safe Browsing Transparency Report (2015). www.google.com/transparencyreport/safebrowsing/

2. Su, K.-W., et al.: Suspicious URL filtering based on logistic regression with multi-view analysis. In: 8th Asia Joint Conference on Information Security (Asia JCIS). IEEE (2013)

3. Le, A., Markopoulou, A., Faloutsos, M.: PhishDef: URL names say it all. In: Proceedings of IEEE INFOCOM. IEEE (2011)

4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

5. Thomas, K., et al.: Design and evaluation of a real-time URL spam filtering service. In: Proceedings of the IEEE Symposium on Security and Privacy (SP) (2011)

6. Ma, J., et al.: Identifying suspicious URLs: an application of large-scale online learning. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM (2009)

7. Nunan, A.E., et al.: Automatic classification of cross-site scripting in web pages using document-based and URL-based features. In: IEEE Symposium on Computers and Communications (ISCC) (2012)

8. Choi, H., Zhu, B.B., Lee, H.: Detecting malicious web links and identifying their attack types. In: Proceedings WebApps (2011)

9. Huang, D., Kai, X., Pei, J.: Malicious URL detection by dynamically mining patterns without pre-defined elements. World Wide Web 17(6), 1375–1394 (2014)

10. Chu, W., et al.: Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. In: IEEE International Conference on Communications (ICC) (2013)

11. Xu, L., et al.: Cross-layer detection of malicious websites. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy. ACM (2013)

12. Garera, S., et al.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM Workshop on Recurring Malcode (2007)

13. Radu, Vasile: Application. In: Radu, Vasile (ed.) Stochastic Modeling of Thermal Fatigue Crack Growth. ACM, vol. 1, pp. 63–70. Springer, Heidelberg (2015)

14. Kevin, M.D., Gupta, M.: Behind phishing: an examination of phisher modi operandi. In: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats (2008)

15. Ma, J., et al.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2009)

16. Davide, C., et al.: Prophiler: a fast filter for the large-scale detection of malicious web pages. In: Proceedings of the 20th International Conference on World Wide Web. ACM (2011)

17. Xiang, G., et al.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. (TISSEC) 14(2), 21 (2011)

18. Abdelhamid, N., Aladdin, A., Thabtah, F.: Phishing detection based associative classification data mining. Expert Syst. Appl. 41(3), 5948–5959 (2014)

19. Eshete, B., Villafiorita, A., Binspect, K.W.: Holistic Analysis and Detection of Malicious Web Pages. Security and Privacy in Communication Networks. Springer, Heidelberg (2012)

20. Cao, C., Caverlee, J.: Behavioral detection of spam URL sharing: posting patterns versus click patterns. In: IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2014)

21. Lin, M.-S., et al.: Malicious URL filtering: a big data application. In: IEEE International Conference on Big Data (2013)

22. Enrico, S., Bartoli, A., Medvet, E.: Detection of hidden fraudulent URLs within trusted sites using lexical features. In: Proceedings of the 18th International Conference on Availability, Reliability and Security (ARES). IEEE (2013)

23. Garera, S., Provos, N., Chew, M., Rubin, A.: A framework for detection and measurement of phishing attacks. In: Proceedings of the ACM Workshop on Recurring Malcode, pp. 1–8. ACM (2007)

24. WEBSPAM-UK2007 dataset. http://chato.cl/webspam/datasets/uk2007/

25. Malware domain dataset. http://www.malwaredomains.com/

26. OpenPhish dataset. https://openphish.com/

27. Davanzo, M., Bartoli, A.: Anomaly detection techniques for a web defacement monitoring service. Expert Syst. Appl. (ESWA) 38(10), 12521–12530 (2011)

28. Zone-h, unrestricted information. http://www.zone-h.org/

29. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)

30. Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 4, 580–585 (1985)

31. Liaw, A., Wiener, M.: Classification and regression by randomForest. R news 2(3), 18–22 (2002)

Title: Detecting Malicious URLs Using Lexical Analysis
Authors: Mohammad Saiful Islam Mamun, Mohammad Ahmad Rathore, Arash Habibi Lashkari, Natalia Stakhanova, Ali A. Ghorbani
Publisher: Springer International Publishing
Book: Network and System Security
Print ISBN: 978-3-319-46297-4
Electronic ISBN: 978-3-319-46298-1
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-46298-1_30
