2021 | OriginalPaper | Chapter

Bio-inspired Machine Learning Mechanism for Detecting Malicious URL Through Passive DNS in Big Data Platform

Authors: Saad M. Darwish, Ali E. Anber, Saleh Mesbah

Published in: Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges

Publisher: Springer International Publishing

Abstract

Malicious links serve as distribution channels for spreading malware across the Web. These links are instrumental in giving attackers partial or full control of a system. To address this, researchers have applied machine learning techniques to malicious URL detection. However, these techniques fail to identify distinguishable generic features that can define the maliciousness of a given domain. Generally, well-crafted URL features contribute considerably to the success of machine learning approaches; conversely, poor features may ruin even good detection algorithms. In addition, the complex relationships between features are not easy to spot. The work presented in this paper explores how to detect malicious Web sites from passive DNS-based features. This problem lends itself naturally to modern algorithms for selecting discriminative features in the continuously evolving distribution of malicious URLs. The suggested model therefore adapts a bio-inspired feature selection technique to choose an optimal feature set, reducing the cost and running time of the system while achieving an acceptably high recognition rate. Moreover, a two-step artificial bee colony (ABC) algorithm is used for efficient data clustering. The two approaches are incorporated within a unified framework that runs on top of Hadoop infrastructure to handle large samples of URLs. Both the experimental and statistical analyses show that the hybrid model has an advantage over some conventional algorithms for detecting malicious URL attacks. The results demonstrate that the suggested model can scale to 10 million query-answer pairs with more than 96.6% accuracy.
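
To make the idea of bio-inspired feature selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the selection algorithm, so a plain genetic algorithm over feature bitmasks is assumed here purely for illustration. The synthetic data, the nearest-centroid scorer, the per-feature penalty, and every function name (`make_data`, `centroid_accuracy`, `fitness`, `select_features`) are hypothetical; the paper's real passive DNS features, its two-step ABC clustering, and the Hadoop layer are not modelled.

```python
import random

def make_data(n=200, n_features=8, informative=(0, 3), seed=0):
    """Synthetic stand-in for passive-DNS feature vectors (hypothetical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)              # 0 = benign, 1 = malicious
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:                  # only these columns carry class signal
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0                             # an empty feature set is useless
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c]))
                for c in (0, 1)}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y)

def fitness(X, y, mask, penalty=0.01):
    """Reward accuracy, charge a small assumed cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=30, p_mut=0.1, seed=1):
    """Genetic search over bitmasks: elitism, tournaments, crossover, mutation."""
    rng = random.Random(seed)
    d = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    score = lambda m: fitness(X, y, m)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:], pop[1][:]]           # elitism: keep the two best masks
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)   # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, d)                # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

if __name__ == "__main__":
    X, y = make_data()
    best = select_features(X, y)
    print("selected mask:", best)
    print("fitness:", round(fitness(X, y, best), 3))
```

The per-feature penalty is what pushes the search toward smaller feature sets, mirroring the abstract's goal of reducing cost and running time while keeping the recognition rate acceptably high. Accuracy here is measured on the training data for brevity; a realistic evaluation would score each mask on held-out samples.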

Title: Bio-inspired Machine Learning Mechanism for Detecting Malicious URL Through Passive DNS in Big Data Platform
Authors: Saad M. Darwish, Ali E. Anber, Saleh Mesbah
Publisher: Springer International Publishing
Book: Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges
Print ISBN: 978-3-030-59337-7
Electronic ISBN: 978-3-030-59338-4
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-59338-4_9
