2022 | Original Paper | Book Chapter

Cardinal Correlated Oversampling for Detection of Malicious Web Links Using Machine Learning

Authors: M. Shyamala Devi, Uttam Gupta, Khomchand Sahu, Ranjan Jyoti Das, Santhosh Veeraraghavan Ramesh

Published in: Computer Networks and Inventive Communication Technologies

Publisher: Springer Nature Singapore

Abstract

Malicious websites are a steadily growing problem and lead to websites being blacklisted. Unauthorized websites harvest users' data and assets. Some URLs serve purely as host pages for unrelated web content, which is a sign of cyber-attacks. Detecting malicious websites remains an open task because few web characteristics reliably distinguish malicious from benign sites. To address this problem, we use machine learning techniques to detect malicious content and web links. Against this background, this paper uses a malicious webpage dataset taken from the UCI dataset repository to classify web pages as malicious or benign. The classification is carried out in five steps. First, the dataset, consisting of 21 features and 1781 records, is preprocessed with encoding, feature scaling, and missing-value handling. Second, the raw dataset is fitted to all classifiers with and without feature scaling, and the performance is analyzed. Third, the cardinality-free malicious dataset is fitted to all classifiers with and without feature scaling, and the performance is analyzed. Fourth, the correlation-free malicious dataset is fitted to all classifiers with and without feature scaling, and the performance is analyzed. Fifth, the oversampled malicious dataset is fitted to all classifiers with and without feature scaling, and the performance is analyzed in terms of precision, recall, accuracy, running time, and F-score. The implementation analysis shows that the decision tree classifier retains an accuracy of 97.1% on the raw and cardinality-reduced datasets before and after feature scaling. The random forest classifier retains an accuracy of 96.6% on the correlation-free dataset before and after feature scaling. The decision tree classifier retains an accuracy of 98.4% on the oversampled dataset before and after feature scaling. From this analysis, the decision tree classifier is found to be the most accurate on the raw, cardinality-free, and oversampled datasets.
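The three dataset reductions named in the abstract (cardinality removal, correlation removal, oversampling) can be illustrated with a minimal sketch. The abstract does not give thresholds, column names, or the oversampling method, so everything below is an assumption: the toy columns (`url`, `charset`, `url_length`, `doubled_length`, `special_chars`, `label`), the limits `max_unique=10` and `threshold=0.9`, and the use of plain random duplication of minority rows are hypothetical choices, not the authors' implementation.

```python
import random
from statistics import mean, pstdev


def drop_high_cardinality(rows, max_unique=10):
    """Remove string-valued columns with more than max_unique distinct values."""
    keep = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        is_categorical = any(isinstance(v, str) for v in values)
        if not (is_categorical and len(set(values)) > max_unique):
            keep.append(col)
    return [{c: r[c] for c in keep} for r in rows]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def drop_correlated(rows, label, threshold=0.9):
    """Drop a numeric feature if it correlates strongly with an earlier kept one."""
    numeric = [c for c in rows[0]
               if c != label and all(isinstance(r[c], (int, float)) for r in rows)]
    kept, dropped = [], set()
    for c in numeric:
        col = [r[c] for r in rows]
        if any(abs(pearson(col, [r[k] for r in rows])) > threshold for k in kept):
            dropped.add(c)
        else:
            kept.append(c)
    return [{c: v for c, v in r.items() if c not in dropped} for r in rows]


def random_oversample(rows, label, seed=0):
    """Duplicate random minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical toy data: 12 rows, 3 labelled malicious (1) and 9 benign (0).
rows = [
    {"url": f"site{i}",                       # unique per row: high cardinality
     "charset": "utf-8" if i % 2 else "iso-8859-1",
     "url_length": 10 + i,
     "doubled_length": 2 * (10 + i),          # perfectly correlated with url_length
     "special_chars": (i * 7) % 5,
     "label": 1 if i < 3 else 0}
    for i in range(12)
]

reduced = drop_correlated(drop_high_cardinality(rows), "label")
balanced = random_oversample(reduced, "label")

print(sorted(reduced[0]))
print({cls: sum(r["label"] == cls for r in balanced) for cls in (0, 1)})
```

In a real experiment, oversampling would normally be applied only to the training split, so that duplicated rows do not leak into the test set and inflate the reported accuracy.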

Metadata
Title
Cardinal Correlated Oversampling for Detection of Malicious Web Links Using Machine Learning
Authors
M. Shyamala Devi
Uttam Gupta
Khomchand Sahu
Ranjan Jyoti Das
Santhosh Veeraraghavan Ramesh
Copyright Year
2022
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-16-3728-5_36
