Top

Empirical Software Engineering

Published in:

01-05-2021

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

Authors: Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies

Published in: Empirical Software Engineering | Issue 3/2021

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Background

In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort.

Goal

The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports.

Method

Our proposed method, called SWIFT, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, SWIFT uses a technique called 𝜖-dominance that learns how to avoid operations that do not significantly improve performance.

Result

When compared to recent state-of-the-art results (from FARSEC which is published in TSE’18), we find that the SWIFT’s dual optimization of both pre-processor and learner is more useful than optimizing each of them individually. For example, in a study of security bug reports from the Chromium dataset, the median recalls of FARSEC and SWIFT were 15.7% and 77.4%, respectively. For another example, in experiments with data from the Ambari project, the median recalls improved from 21.5% to 85.7% (FARSEC to SWIFT).

Conclusion

Overall, our approach can quickly optimize models that achieve better recalls than the prior state-of-the-art. These increases in recall are associated with moderate increases in false positive rates (from 8% to 24%, median). For future work, these results suggest that dual optimization is both practical and useful.

previous article Testing self-healing cyber-physical systems under uncertainty with reinforcement learning: an empirical study

next article Flair: efficient analysis of Android inter-component vulnerabilities in response to incremental changes

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

e.g. Fig. 12 of Menzies et al. (2007c) lists nine SE data mining applications with median false positive rates of 25%.

Agrawal A, Menzies T (2018) Is “Better Data” Better than “Better Data Miner”? (on the benefits of tuning SMOTE for defect prediction). In: Proceedings of the 40th international conference on software engineering, ACM, pp 1050–1061

Agrawal A, Fu W, Menzies T (2018) What is wrong with topic modeling? and how to fix it using search-based software engineering. Inf Softw Technol 98:74–88CrossRef

Agrawal A, Fu W, Chen D, Shen X, Menzies T (2019) How to “DODGE” complex software analytics. IEEE Trans Softw Eng

Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd international conference on software engineering ICSE ’11. https://doi.org/10.1145/1985793.1985795. ACM, New York, pp 1–10

Bennin KE, Keung JW, Monden A (2019) On the relative value of data resampling approaches for software defect prediction. Empir Softw Eng 24 (2):602–636CrossRef

Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305MathSciNetMATH

Bergstra JS, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems, pp 2546–2554

Biedenkapp A, Eggensperger K, Elsken T, Falkner S, Feurer M, Gargiani M, Hutter F, Klein A, Lindauer M, Loshchilov I et al (2018) Hyperparameter optimization. Artif Intell 1:35

Binkley D, Lawrie D, Morrell C (2018) The need for software specific natural language techniques. Empir Softw Eng 23(4):2398–2425CrossRef

Black PE, Badger L, Guttman B, Fong E (2016) Dramatically reducing software vulnerabilities. Report to the White House Office of Science and Technology Policy, Information Technology Laboratory

Chan S, Treleaven P, Capra L (2013) Continuous hyperparameter optimization for large-scale recommender systems. In: 2013 IEEE international conference on big data, IEEE, pp 350–358

Chen L et al (2013) R2fix: automatically generating bug fixes from bug reports. Proceedings of the 2013 IEEE 6th ICST

Deb K, Mohan M, Mishra S (2005) Evaluating the ε-domination based multi-objective evolutionary algorithm for a quick computation of pareto-optimal solutions. Evol Comput 13(4):501–525CrossRef

Deshmukh J, Podder S, Sengupta S, Dubash N, et al. (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 115–124

Di Francescomarino C, Dumas M, Federici M, Ghidini C, Maggi F M, Rizzi W, Simonetto L (2018) Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Inf Syst 74:67–83CrossRef

Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca RatonCrossRef

Feurer M, Springenberg JT, Hutter F (2015) Initializing bayesian hyperparameter optimization via meta-learning. In: Twenty-Ninth AAAI conference on artificial intelligence

Fu W, Menzies T (2017) Easy over hard: A case study on deep learning. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 49–60

Fu W, Menzies T, Shen X (2016) Tuning for software analytics: is it really necessary? Inf Softw Technol 76:135–146CrossRef

Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: An industrial case study. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 11–20

Goldberg DE (2006) Genetic algorithms. Pearson Education India

Goseva-Popstojanova K, Tyo J (2018) Identification of security related bug reports via text mining using supervised and unsupervised classification. In: 2018 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 344–355

Graham P (2004) Hackers & painters: big ideas from the computer age. O’Reilly Media, Inc

Han X, Yu T, Lo D (2018) Perflearner: learning from bug reports to understand and generate performance test frames. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. ACM, pp 17–28

Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin FB, Babu S (2011) Starfish: a self-tuning system for big data analytics. In: Cidr, vol 11, pp 261–272

Hindle A, Alipour A, Stroulia E (2016) A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Softw Eng 21 (2):368–410CrossRef

Holland JH (1992) Genetic algorithms. Sci Am 267(1):66–73CrossRef

Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: A holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 159–170

Huang Q, Xia X, Lo D (2019) Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction. Empir Softw Eng 24 (5):2823–2862CrossRef

Jalali O, Menzies T, Feather M (2008) Optimizing requirements decisions with keys. In: Proceedings of the 4th international workshop on predictor models in software engineering. ACM, pp 79–86

Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11–12):1073–1086CrossRef

Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Sys Man Cybern (4)580–585

Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 481–490

Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680MathSciNetCrossRef

Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176

Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 1–10

Lazar A, Ritchey S, Sharif B (2014) Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 308–311

Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef

Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816MathSciNetMATH

Menzies T, Shepperd M (2019) “Bad smells” in software analytics papers. Inf Softw Technol 112:35–47CrossRef

Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13CrossRef

Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007a) Problems with precision: a response to” comments on’data mining static code attributes to learn defect predictors’”. IEEE Trans Softw Eng 33(9):637–640CrossRef

Menzies T, Elrawas O, Hihn J, Feather M, Madachy R, Boehm B (2007b) The business case for automated software engineering. In: Proceedings of the Twenty-second IEEE/ACM international conference on automated software engineering ASE ’07. https://doi.org/10.1145/1321631.1321676. ACM, New York, pp 303–312

Menzies T, Greenwald J, Frank A (2007c) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Engineering (1) 2–13

Menzies T, Majumder S, Balaji N, Brey K, Fu W (2018) 500+ times faster than deep learning:(a case study exploring faster methods for text mining stackoverflow). In: 2018 IEEE/ACM 15Th international conference on mining software repositories (MSR). IEEE, pp 554–563

MITRE (2017) Common Vulnerabilities and Exposures (CVE). https://cve.mitre.org/about/terminology.html#vulnerability

Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39(4):537–551CrossRef

Nair V, Yu Z, Menzies T, Siegmund N, Apel S (2018) Finding faster configurations using flash. IEEE Trans Softw Eng

Neuhaus S, Zimmermann T (2009) The beauty and the beast: vulnerabilities in red hat’s packages. In: USENIX annual technical conference

Neuhaus S, Zimmermann T, Holler C, Zeller A (2007) Predicting vulnerable software components. In: Proceedings of the 14th ACM conference on computer and communications security. ACM, pp 529–540

Nguyen VH, Tran LMS (2010) Predicting vulnerable software components with dependency graphs. In: Proceedings of the 6th international workshop on security measurements and metrics. ACM, p 3

Novielli N, Girardi D, Lanubile F (2018) A benchmark study on sentiment analysis for software engineering research. In: 2018 IEEE/ACM 15Th international conference on mining software repositories (MSR). IEEE, pp 364–375

Ohira M, Kashiwa Y, Yamatani Y, Yoshiyuki H, Maeda Y, Limsettho N, Fujino K, Hata H, Ihara A, Matsumoto K (2015) A dataset of high impact bugs: manually-classified issue reports. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 518–521

Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Syst Appl 62:1–16CrossRef

Osman H, Ghafari M, Nierstrasz O (2017) Hyperparameter optimization to improve bug prediction accuracy. In: IEEE workshop on machine learning techniques for software quality evaluation (maLTeSQue). IEEE, pp 33–38

Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: International conference on software engineering

Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers?. In: Proceedings of the 2011 international symposium on software testing and analysis. ACM, pp 199–209

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830MathSciNetMATH

Peters F, Tun T, Yu Y, Nuseibeh B (2018) Text filtering and ranking for security bug report prediction. IEEE Trans Softw Eng:Early–Access

Scandariato R, Walden J, Hovsepyan A, Joosen W (2014) Predicting vulnerable software components via text mining. IEEE Trans Softw Eng 40(10):993–1006CrossRef

Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359MathSciNetCrossRef

Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering. IEEE Computer Society, pp 253–262

Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 321–332

Tantithamthavorn C, Hassan AE, Matsumoto K (2018) The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans Softw Eng

The Equifax Data Breach (2019) https://epic.org/privacy/data-breach/equifax/

Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 847–855

Tian Y, Lo D, Sun C (2012) Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In: 2012 19th working conference on reverse engineering. IEEE, pp 215–224

Tian Y, Lo D, Xia X, Sun C (2015) Automated prediction of bug report priority using multi-factor analysis. Empir Softw Eng 20(5):1354–1383CrossRef

Van Aken D, Pavlo A, Gordon GJ, Zhang B (2017) Automatic database management system tuning through large-scale machine learning. In: Proceedings of the 2017 ACM international conference on management of data. ACM, pp 1009–1024

Vesterstrøm J, Thomsen R (2004) A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In: Congress on evolutionary computation. IEEE

Wang L, Zeng Y, Chen T (2015) Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst Appl 42(2):855–863CrossRef

Wang Y, Xu W (2018) Leveraging deep learning with lda-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95CrossRef

WannaCry Ransomware Attack (2017) https://en.wikipedia.org/wiki/WannaCry_ransomware_attack

Wijayasekara D, Manic M, McQueen M (2014) Vulnerability identification and classification via text mining bug databases. In: IECON 2014-40th annual conference of the IEEE industrial electronics society. IEEE, pp 3612–3618

Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82CrossRef

Xia X, Lo D, Qiu W, Wang X, Zhou B (2014) Automated configuration bug report prediction using text mining. In: 2014 IEEE 38Th annual computer software and applications conference (COMPSAC). IEEE, pp 107–116

Xia X, Lo D, Shihab E, Wang X (2016) Automated bug report field reassignment and refinement prediction. IEEE Trans Reliab 65 (3):1094–1113CrossRef

Xia Y, Liu C, Li Y, Liu N (2017) A boosted decision tree approach using bayesian hyper-parameter optimization for credit scoring. Expert Syst Appl 78:225–241CrossRef

Yang X, Lo D, Huang Q, Xia X, Sun J (2016) Automated identification of high impact bug reports leveraging imbalanced learning strategies. In: 2016 IEEE 40Th annual computer software and applications conference (COMPSAC), vol 1. IEEE, pp 227–232

Yang XL, Lo D, Xia X, Huang Q, Sun JL (2017) High-impact bug report identification with imbalanced learning strategies. J Comput Sci Technol 32(1):181–198CrossRef

Yildizdan G, Baykan ÖK (2020) A novel modified bat algorithm hybridizing by differential evolution algorithm. Expert Syst Appl 141:112949CrossRef

Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 93–102

Zhang T, Yang G, Lee B, Chan AT (2015) Predicting severity of bug report by mining bug repository with concept profile. In: Proceedings of the 30th annual ACM symposium on applied computing. ACM, pp 1553–1558

Zhou Y, Sharma A (2017) Automated identification of security issues from commit messages and bug reports. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 914–919

Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evol Process 28(3):150–176CrossRef

Title: How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)
Authors: Rui Shu
Tianpei Xia
Jianfeng Chen
Laurie Williams
Tim Menzies
Publication date: 01-05-2021
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 3/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-020-09906-8

Springer Professional

Abstract

Background

Goal

Method

Result

Conclusion

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 3/2021

A privacy and security analysis of early-deployed COVID-19 contact tracing Android apps

Training students in evidence-based software engineering and systematic reviews: a systematic review and empirical study

Correction to: Wait for it: identifying “On-Hold” self-admitted technical debt

Correction to: An empirical assessment of baseline feature location techniques

Appreciation to Empirical Software Engineering reviewers of 2020

Release synchronization in software ecosystems

Premium Partner