Published in: Empirical Software Engineering 3/2021

01-05-2021

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

Authors: Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies

Abstract

Background

To keep the general public from being exposed to hackers, security bug reports need to be handled by small groups of engineers before they are widely discussed. But learning to distinguish security bug reports from other bug reports is challenging, since security bug reports may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort.

Goal

The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports.

Method

Our proposed method, called SWIFT, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, SWIFT uses a technique called 𝜖-dominance that learns how to avoid operations that do not significantly improve performance.
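One way to picture the 𝜖-dominance idea is as a tolerance test on scores: a new result only counts as progress if it beats what has already been seen by more than 𝜖 on at least one objective. The following is a minimal sketch of that test, not SWIFT's implementation. The function names, the additive form of 𝜖-dominance, the choice of two objectives (recall and one minus the false positive rate, both maximized), the sample scores, and 𝜖 = 0.05 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical score layout: (recall, 1 - false_positive_rate), both maximized.
Score = Tuple[float, float]


def epsilon_dominates(a: Score, b: Score, eps: float) -> bool:
    """True if `a` is at least as good as `b` on every objective, within `eps`."""
    return all(x + eps >= y for x, y in zip(a, b))


def update_archive(archive: List[Score], candidate: Score, eps: float) -> bool:
    """Keep `candidate` only if no archived score epsilon-dominates it.

    Returns True when the candidate counts as a significant improvement.
    """
    if any(epsilon_dominates(kept, candidate, eps) for kept in archive):
        return False  # within eps of something already seen: not worth keeping
    # Drop archived scores that the new candidate epsilon-dominates.
    archive[:] = [k for k in archive if not epsilon_dominates(candidate, k, eps)]
    archive.append(candidate)
    return True


# Illustrative scores only; these are not results from the paper.
archive: List[Score] = []
for score in [(0.60, 0.90), (0.62, 0.91), (0.80, 0.80), (0.81, 0.70)]:
    kept = update_archive(archive, score, eps=0.05)
    print(score, "kept" if kept else "skipped")
```

In this toy run the second and fourth scores are skipped because each lies within 𝜖 of an archived score on every objective. An optimizer could use that signal to stop spending evaluations on the options that produced them.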

Result

When compared to recent state-of-the-art results (from FARSEC, published in TSE’18), we find that SWIFT’s dual optimization of both pre-processor and learner is more useful than optimizing each of them individually. For example, in a study of security bug reports from the Chromium dataset, the median recalls of FARSEC and SWIFT were 15.7% and 77.4%, respectively. As another example, in experiments with data from the Ambari project, the median recall improved from 21.5% to 85.7% (FARSEC to SWIFT).

Conclusion

Overall, our approach can quickly optimize models that achieve better recalls than the prior state-of-the-art. These increases in recall are associated with moderate increases in false positive rates (from 8% to 24%, median). For future work, these results suggest that dual optimization is both practical and useful.
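For readers less familiar with the two measures quoted here, recall is TP / (TP + FN) and the false positive rate is FP / (FP + TN), where TP, FN, FP and TN are the counts of a binary confusion matrix. The short sketch below computes both. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual security bug reports that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-security bug reports that were wrongly flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts: 1000 reports, of which 20 are security bug reports.
tp, fn = 15, 5      # security reports found / missed
fp, tn = 98, 882    # other reports flagged / correctly ignored

print(f"recall = {recall(tp, fn):.2f}")
print(f"false positive rate = {false_positive_rate(fp, tn):.2f}")
```

Because security bug reports are rare, even a modest false positive rate can mean many more flagged non-security reports than true hits, which is why the two measures are reported together.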

Footnotes
1. For example, Fig. 12 of Menzies et al. (2007c) lists nine SE data mining applications with median false positive rates of 25%.

Literature
Agrawal A, Menzies T (2018) Is “Better Data” Better than “Better Data Miner”? (on the benefits of tuning SMOTE for defect prediction). In: Proceedings of the 40th international conference on software engineering, ACM, pp 1050–1061
Agrawal A, Fu W, Menzies T (2018) What is wrong with topic modeling? and how to fix it using search-based software engineering. Inf Softw Technol 98:74–88
Agrawal A, Fu W, Chen D, Shen X, Menzies T (2019) How to “DODGE” complex software analytics. IEEE Trans Softw Eng
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd international conference on software engineering ICSE ’11. https://doi.org/10.1145/1985793.1985795. ACM, New York, pp 1–10
Bennin KE, Keung JW, Monden A (2019) On the relative value of data resampling approaches for software defect prediction. Empir Softw Eng 24(2):602–636
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305
Bergstra JS, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems, pp 2546–2554
Biedenkapp A, Eggensperger K, Elsken T, Falkner S, Feurer M, Gargiani M, Hutter F, Klein A, Lindauer M, Loshchilov I et al (2018) Hyperparameter optimization. Artif Intell 1:35
Binkley D, Lawrie D, Morrell C (2018) The need for software specific natural language techniques. Empir Softw Eng 23(4):2398–2425
Black PE, Badger L, Guttman B, Fong E (2016) Dramatically reducing software vulnerabilities. Report to the White House Office of Science and Technology Policy, Information Technology Laboratory
Chan S, Treleaven P, Capra L (2013) Continuous hyperparameter optimization for large-scale recommender systems. In: 2013 IEEE international conference on big data, IEEE, pp 350–358
Chen L et al (2013) R2fix: automatically generating bug fixes from bug reports. Proceedings of the 2013 IEEE 6th ICST
Deb K, Mohan M, Mishra S (2005) Evaluating the ε-domination based multi-objective evolutionary algorithm for a quick computation of pareto-optimal solutions. Evol Comput 13(4):501–525
Deshmukh J, Podder S, Sengupta S, Dubash N, et al. (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 115–124
Di Francescomarino C, Dumas M, Federici M, Ghidini C, Maggi F M, Rizzi W, Simonetto L (2018) Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Inf Syst 74:67–83
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
Feurer M, Springenberg JT, Hutter F (2015) Initializing bayesian hyperparameter optimization via meta-learning. In: Twenty-Ninth AAAI conference on artificial intelligence
Fu W, Menzies T (2017) Easy over hard: A case study on deep learning. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 49–60
Fu W, Menzies T, Shen X (2016) Tuning for software analytics: is it really necessary? Inf Softw Technol 76:135–146
Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: An industrial case study. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 11–20
Goldberg DE (2006) Genetic algorithms. Pearson Education India
Goseva-Popstojanova K, Tyo J (2018) Identification of security related bug reports via text mining using supervised and unsupervised classification. In: 2018 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 344–355
Graham P (2004) Hackers & painters: big ideas from the computer age. O’Reilly Media, Inc
Han X, Yu T, Lo D (2018) Perflearner: learning from bug reports to understand and generate performance test frames. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. ACM, pp 17–28
Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin FB, Babu S (2011) Starfish: a self-tuning system for big data analytics. In: Cidr, vol 11, pp 261–272
Hindle A, Alipour A, Stroulia E (2016) A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Softw Eng 21(2):368–410
Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: A holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 159–170
Huang Q, Xia X, Lo D (2019) Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction. Empir Softw Eng 24(5):2823–2862
Jalali O, Menzies T, Feather M (2008) Optimizing requirements decisions with keys. In: Proceedings of the 4th international workshop on predictor models in software engineering. ACM, pp 79–86
Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11–12):1073–1086
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Sys Man Cybern (4):580–585
Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 481–490
Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176
Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 1–10
Lazar A, Ritchey S, Sharif B (2014) Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 308–311
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816
Menzies T, Shepperd M (2019) “Bad smells” in software analytics papers. Inf Softw Technol 112:35–47
Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13
Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007a) Problems with precision: a response to “Comments on ‘Data mining static code attributes to learn defect predictors’”. IEEE Trans Softw Eng 33(9):637–640
Menzies T, Elrawas O, Hihn J, Feather M, Madachy R, Boehm B (2007b) The business case for automated software engineering. In: Proceedings of the Twenty-second IEEE/ACM international conference on automated software engineering ASE ’07. https://doi.org/10.1145/1321631.1321676. ACM, New York, pp 303–312
Menzies T, Greenwald J, Frank A (2007c) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Engineering (1) 2–13
Menzies T, Majumder S, Balaji N, Brey K, Fu W (2018) 500+ times faster than deep learning: (a case study exploring faster methods for text mining stackoverflow). In: 2018 IEEE/ACM 15th international conference on mining software repositories (MSR). IEEE, pp 554–563
Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39(4):537–551
Nair V, Yu Z, Menzies T, Siegmund N, Apel S (2018) Finding faster configurations using flash. IEEE Trans Softw Eng
Neuhaus S, Zimmermann T (2009) The beauty and the beast: vulnerabilities in red hat’s packages. In: USENIX annual technical conference
Neuhaus S, Zimmermann T, Holler C, Zeller A (2007) Predicting vulnerable software components. In: Proceedings of the 14th ACM conference on computer and communications security. ACM, pp 529–540
Nguyen VH, Tran LMS (2010) Predicting vulnerable software components with dependency graphs. In: Proceedings of the 6th international workshop on security measurements and metrics. ACM, p 3
Novielli N, Girardi D, Lanubile F (2018) A benchmark study on sentiment analysis for software engineering research. In: 2018 IEEE/ACM 15th international conference on mining software repositories (MSR). IEEE, pp 364–375
Ohira M, Kashiwa Y, Yamatani Y, Yoshiyuki H, Maeda Y, Limsettho N, Fujino K, Hata H, Ihara A, Matsumoto K (2015) A dataset of high impact bugs: manually-classified issue reports. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 518–521
Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Syst Appl 62:1–16
Osman H, Ghafari M, Nierstrasz O (2017) Hyperparameter optimization to improve bug prediction accuracy. In: IEEE workshop on machine learning techniques for software quality evaluation (maLTeSQue). IEEE, pp 33–38
Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: International conference on software engineering
Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers? In: Proceedings of the 2011 international symposium on software testing and analysis. ACM, pp 199–209
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
Peters F, Tun T, Yu Y, Nuseibeh B (2018) Text filtering and ranking for security bug report prediction. IEEE Trans Softw Eng: Early Access
Scandariato R, Walden J, Hovsepyan A, Joosen W (2014) Predicting vulnerable software components via text mining. IEEE Trans Softw Eng 40(10):993–1006
Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering. IEEE Computer Society, pp 253–262
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 321–332
Tantithamthavorn C, Hassan AE, Matsumoto K (2018) The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans Softw Eng
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 847–855
Tian Y, Lo D, Sun C (2012) Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In: 2012 19th working conference on reverse engineering. IEEE, pp 215–224
Tian Y, Lo D, Xia X, Sun C (2015) Automated prediction of bug report priority using multi-factor analysis. Empir Softw Eng 20(5):1354–1383
Van Aken D, Pavlo A, Gordon GJ, Zhang B (2017) Automatic database management system tuning through large-scale machine learning. In: Proceedings of the 2017 ACM international conference on management of data. ACM, pp 1009–1024
Vesterstrøm J, Thomsen R (2004) A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In: Congress on evolutionary computation. IEEE
Wang L, Zeng Y, Chen T (2015) Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst Appl 42(2):855–863
Wang Y, Xu W (2018) Leveraging deep learning with lda-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95
Wijayasekara D, Manic M, McQueen M (2014) Vulnerability identification and classification via text mining bug databases. In: IECON 2014-40th annual conference of the IEEE industrial electronics society. IEEE, pp 3612–3618
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
Xia X, Lo D, Qiu W, Wang X, Zhou B (2014) Automated configuration bug report prediction using text mining. In: 2014 IEEE 38th annual computer software and applications conference (COMPSAC). IEEE, pp 107–116
Xia X, Lo D, Shihab E, Wang X (2016) Automated bug report field reassignment and refinement prediction. IEEE Trans Reliab 65(3):1094–1113
Xia Y, Liu C, Li Y, Liu N (2017) A boosted decision tree approach using bayesian hyper-parameter optimization for credit scoring. Expert Syst Appl 78:225–241
Yang X, Lo D, Huang Q, Xia X, Sun J (2016) Automated identification of high impact bug reports leveraging imbalanced learning strategies. In: 2016 IEEE 40th annual computer software and applications conference (COMPSAC), vol 1. IEEE, pp 227–232
Yang XL, Lo D, Xia X, Huang Q, Sun JL (2017) High-impact bug report identification with imbalanced learning strategies. J Comput Sci Technol 32(1):181–198
Yildizdan G, Baykan ÖK (2020) A novel modified bat algorithm hybridizing by differential evolution algorithm. Expert Syst Appl 141:112949
Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 93–102
Zhang T, Yang G, Lee B, Chan AT (2015) Predicting severity of bug report by mining bug repository with concept profile. In: Proceedings of the 30th annual ACM symposium on applied computing. ACM, pp 1553–1558
Zhou Y, Sharma A (2017) Automated identification of security issues from commit messages and bug reports. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 914–919
Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evol Process 28(3):150–176
Metadata
Title
How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)
Authors
Rui Shu
Tianpei Xia
Jianfeng Chen
Laurie Williams
Tim Menzies
Publication date
01-05-2021
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 3/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-020-09906-8
