
International Journal of Information Security

22 December 2023 | Regular Contribution

Security bug reports classification using fasttext

Written by: Sultan S. Alqahtani

Published in: International Journal of Information Security | Issue 2/2024

Abstract

Software developers and maintainers must address security bug reports (SBRs) before they are publicly disclosed and their systems are left vulnerable to attack. Bug tracking systems may contain security-related reports that are not labeled as SBRs, which makes them hard for developers to identify. Finding unlabeled SBRs is therefore essential to help security experts identify these issues quickly and accurately. The goal of this paper is to help software developers better classify bug reports that describe security vulnerabilities as SBRs by using a fasttext classifier. Previous work has applied text analytics and machine learning to classify which bug reports are security related. We improve on that work, as shown by our analysis of five open-source projects. First, we collected a dataset of 45,940 bug reports from five software repositories (e.g., the work of Peters et al. and Shu et al.). Second, we ran a classification experiment on SBRs using fasttext classifiers that we built. Finally, we investigated the accuracy of these classifiers in identifying SBRs. Our experimental results show that our fasttext classifier achieves an average F1 score of 0.81 when used to identify SBRs. Furthermore, we examined the generalizability of identifying SBRs by applying cross-project validation, and our results show that the fasttext classifier achieves an average F1 score of 0.65. We have made our data and results available at Alqahtani (fasttext implementation, 2023. https://github.com/isultane/fasttext_classifications) to support replication of our work.
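The evaluation vocabulary used in the abstract can be illustrated with a small standard-library Python sketch. It is not the author's implementation. The label names `__label__sbr` and `__label__nsbr`, the helper names, and the placeholder project names are assumptions made for illustration; only the `__label__` prefix reflects fastText's default input format for supervised training. The sketch also assumes one common reading of cross-project validation, namely training on one project and testing on a different one.

```python
from typing import List, Tuple

# Hypothetical label names; "__label__" is fastText's default label prefix.
SBR, NSBR = "__label__sbr", "__label__nsbr"


def to_fasttext_line(text: str, is_security: bool) -> str:
    """Render one bug report as a single line of supervised training input."""
    label = SBR if is_security else NSBR
    # One example per line, so collapse newlines and repeated whitespace.
    return f"{label} {' '.join(text.split())}"


def f1_score(gold: List[bool], predicted: List[bool]) -> float:
    """F1 for the security class: harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_project_pairs(projects: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (train_project, test_project) pair with distinct projects."""
    return [(a, b) for a in projects for b in projects if a != b]


if __name__ == "__main__":
    print(to_fasttext_line("Buffer overflow\n in parser", True))
    print(f1_score([True, True, False, False], [True, False, True, False]))
    # Placeholder names standing in for the five projects.
    print(len(cross_project_pairs(["p1", "p2", "p3", "p4", "p5"])))
```

Lines produced by `to_fasttext_line` could be written to a text file and, assuming the `fasttext` Python package is installed, passed to `fasttext.train_supervised(input=...)`. An average F1 score such as those reported above would then be the mean of `f1_score` over the evaluated splits.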


https://cwe.mitre.org/.

https://docs.python.org/3/c-api/utilities.html.

https://fasttext.cc/docs/en/options.html.

1.

Floris, P., Vogt Harald, H.: How to save on software maintenance costs. Omnext white paper, vol. SOURCE 2 V (2010)

2.

Shu, R., Xia, T., Williams, L., Menzies, T.: Better security bug report classification via hyperparameter optimization (2019). https://arxiv.org/pdf/1905.06872.pdf

3.

Chawla, I., Singh, S.K.: Automatic bug labeling using semantic information from LSI. In: 2014 Seventh International Conference on Contemporary Computing (IC3), pp. 376–381 (2014). https://doi.org/10.1109/IC3.2014.6897203

4.

Bozorgi, M., Saul, L.K., Savage, S., Voelker, G.M.: Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, p. 105 (2010). https://doi.org/10.1145/1835804.1835821

5.

Peters, F., Tun, T.T., Yu, Y., Nuseibeh, B.: Text filtering and ranking for security bug report prediction. IEEE Trans. Softw. Eng. 45(6), 615–631 (2019). https://doi.org/10.1109/TSE.2017.2787653

6.

Wijayasekara, D., Manic, M., Wright, J.L., McQueen, M.: Mining bug databases for unidentified software vulnerabilities. In: 2012 5th International Conference on Human System Interactions, pp. 89–96 (2012). https://doi.org/10.1109/HSI.2012.22

7.

Wu, X., Zheng, W., Xia, X., Lo, D.: Data quality matters: a case study on data label correctness for security bug report prediction. IEEE Trans. Softw. Eng. 48(7), 2541–2556 (2022). https://doi.org/10.1109/TSE.2021.3063727

8.

Fu, W., Menzies, T.: Easy over hard: a case study on deep learning. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 49–60 (2017). https://doi.org/10.1145/3106237.3106256

9.

Liu, Z., Xia, X., Hassan, A.E., Lo, D., Xing, Z., Wang, X.: Neural-machine-translation-based commit message generation: how far are we? In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 373–384 (2018). https://doi.org/10.1145/3238147.3238190

10.

Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 427–431 (2017). https://aclanthology.org/E17-2068

11.

Ohira, M., et al.: A dataset of high impact bugs: manually-classified issue reports. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp. 518–521 (2015). https://doi.org/10.1109/MSR.2015.78

12.

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002). https://doi.org/10.1613/jair.953

13.

Abir, R., Moulay, A.A.: MalBERT: using transformers for cybersecurity and malicious software detection (2021). https://arxiv.org/pdf/2103.03806.pdf

14.

Roopak, M., Yun Tian, G., Chambers, J.: Deep learning models for cyber security in IoT networks. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0452–0457 (2019). https://doi.org/10.1109/CCWC.2019.8666588

15.

Yin, J., Tang, M., Cao, J., Wang, H.: Apply transfer learning to cybersecurity: predicting exploitability of vulnerabilities by description. Knowl. Based Syst. 210, 106529 (2020). https://doi.org/10.1016/j.knosys.2020.106529

16.

Johnson, R., Zhang, T.: Semi-supervised convolutional neural networks for text categorization via region embedding. In: Advances in Neural Information Processing Systems, vol. 28 (2015). https://proceedings.neurips.cc/paper/2015/file/acc3e0404646c57502b480dc052c4fe1-Paper.pdf

17.

Liu, J., Chang, W.-C., Wu, Y., Yang, Y.: Deep learning for extreme multi-label text classification. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 115–124 (2017). https://doi.org/10.1145/3077136.3080834

18.

Alqahtani, S.S.: fasttext implementation (2023). https://github.com/isultane/fasttext_classifications. Accessed 20 June 2023

19.

Song, Q., Guo, Y., Shepperd, M.: A comprehensive investigation of the role of imbalanced learning for software defect prediction. IEEE Trans. Softw. Eng. 45(12), 1253–1269 (2019). https://doi.org/10.1109/TSE.2018.2836442

20.

Kallis, R., Di Sorbo, A., Canfora, G., Panichella, S.: Ticket tagger: machine learning driven issue classification. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 406–409 (2019). https://doi.org/10.1109/ICSME.2019.00070

21.

Mileva, Y.M., Dallmeier, V., Burger, M., Zeller, A.: Mining trends of library usage. In: Proceedings of the Joint International and Annual ERCIM Workshops on Principles of Software Evolution (IWPSE) and Software Evolution (Evol) Workshops, pp. 57–62 (2009). https://doi.org/10.1145/1595808.1595821

22.

Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: an industrial case study. In: 7th IEEE Working Conference on Mining Software Repositories, pp. 11–20 (2010). https://doi.org/10.1109/MSR.2010.5463340

23.

Scandariato, R., Walden, J., Hovsepyan, A., Joosen, W.: Predicting vulnerable software components via text mining. IEEE Trans. Softw. Eng. 40(10), 993–1006 (2014). https://doi.org/10.1109/TSE.2014.2340398

24.

Yang, Y., Xia, X., Lo, D., Bi, T., Grundy, J., Yang, X.: Predictive models in software engineering: challenges and opportunities. ACM Trans. Softw. Eng. Methodol. 31(3), 1–72 (2022). https://doi.org/10.1145/3503509

25.

Sawadogo, A.D., Guimard, T.F., Bissyandé, Q., Kader Kaboré, J., Klein, A., Moha, N.: Early detection of security-relevant bug reports using machine learning: how far are we? arXiv:2112.10123 (2021). https://ui.adsabs.harvard.edu/abs/2021arXiv211210123S/abstract

26.

Berrar, D.: Cross-validation. In: Encyclopedia of Bioinformatics and Computational Biology, pp. 542–545. Elsevier (2019)

27.

Zhang, Z.: Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. 4(11), 218–218 (2016). https://doi.org/10.21037/atm.2016.03.37

28.

Alipour, A., Hindle, A., Stroulia, E.: A contextual approach towards more accurate duplicate bug report detection. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp. 183–192 (2013). https://doi.org/10.1109/MSR.2013.6624026

29.

Sharma, M., Bedi, P., Chaturvedi, K.K., Singh, V.B.: Predicting the priority of a reported bug using machine learning techniques and cross project validation. In: 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), pp. 539–545 (2012). https://doi.org/10.1109/ISDA.2012.6416595

30.

Peng, H., Bing, L., Yutao, M.: Towards cross-project defect prediction with imbalanced feature sets, p. 10 (2014). https://doi.org/10.48550/arXiv.1411.4228

Title: Security bug reports classification using fasttext
Written by: Sultan S. Alqahtani
Publication date: 22 December 2023
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Information Security / Issue 2/2024
Print ISSN: 1615-5262
Electronic ISSN: 1615-5270
DOI: https://doi.org/10.1007/s10207-023-00793-w
