Published in: Empirical Software Engineering 6/2022

1 November 2022

Predicting health indicators for open source projects (using hyperparameter optimization)

Authors: Tianpei Xia, Wei Fu, Rui Shu, Rishabh Agrawal, Tim Menzies

Abstract

Software developed on public platforms is a source of data that can be used to make predictions about those projects. While the activity of an individual developer may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes: k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) all have high error rates. That error rate can be greatly reduced using hyperparameter optimization. To the best of our knowledge, this is the largest study yet conducted that uses recent data to predict multiple health indicators of open-source projects. To facilitate open science (and replications and extensions of this work), all our materials are available online at https://github.com/arennax/Health_Indicator_Prediction.
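To make the idea of hyperparameter optimization in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a hand-written KNN regressor, synthetic data standing in for monthly project metrics, and a median relative-error score; all function names are hypothetical. The baseline uses k=5 with uniform weights (the default of scikit-learn's `KNeighborsRegressor`), and the tuner is a plain random search capped at 200 evaluations, the budget the footnotes give for random search.

```python
import random
from statistics import median


def knn_predict(train, query, k, weighted):
    """Predict a target for `query` from its k nearest training rows."""
    # `train` is a list of (features, target) pairs; distance is Euclidean.
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, y) for x, y in train
    )[:k]
    if weighted:
        # Closer neighbors count more; the epsilon avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def median_mre(train, test, k, weighted):
    """Median magnitude of relative error over `test` (lower is better)."""
    return median(
        abs(y - knn_predict(train, x, k, weighted)) / abs(y) for x, y in test
    )


def make_data(n, rng):
    """Synthetic stand-in for project metrics (an assumption, NOT the paper's data)."""
    rows = []
    for _ in range(n):
        x = (rng.uniform(0, 10), rng.uniform(0, 10))
        y = 10 + 3 * x[0] + 0.5 * x[1] ** 2 + rng.gauss(0, 1)
        rows.append((x, y))
    return rows


def random_search(train, valid, budget, rng):
    """Evaluate `budget` random settings; return (best_error, (k, weighted))."""
    best = None
    for _ in range(budget):
        setting = (rng.randint(1, 30), rng.choice([True, False]))
        err = median_mre(train, valid, *setting)
        if best is None or err < best[0]:
            best = (err, setting)
    return best


rng = random.Random(1)
train, valid, test = make_data(120, rng), make_data(60, rng), make_data(60, rng)

# Baseline: untuned default setting, scored on held-out test rows.
default_err = median_mre(train, test, k=5, weighted=False)

# Tuned: pick the setting on validation rows, then score it on the test rows.
tuned_valid_err, (best_k, best_weighted) = random_search(train, valid, 200, rng)
tuned_err = median_mre(train, test, best_k, best_weighted)

print(f"default (k=5, uniform): {default_err:.3f}")
print(f"tuned (k={best_k}, weighted={best_weighted}): {tuned_err:.3f}")
```

The setting is selected on a validation split and reported on a separate test split, so the tuned score is not flattered by the search itself. Whether tuning helps, and by how much, depends on the data; this toy example makes no claim about the paper's results.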

Footnotes
1
In a recent TSE’21 article, we explained why SE hyperparameter optimization can be so simple: SE data can be intrinsically simpler than other kinds of data and hence simpler to explore (see Figure 6d of Agrawal et al. (2021)).
 
3
We use default settings for the baselines to find out whether they can provide good prediction performance and how much room there is for hyperparameter tuning to improve on them. Using pre-selected parameter settings from the literature may introduce bias because of differences in data format or prediction task.
 
4
That is, a maximum of 200 evaluations for Random Search, Grid Search, Flash, and DE; for ASKL, the maximum runtime for each project is restricted to 15 seconds. Please see Section 5.1 for details.
 
5
In the Apache Software Foundation, projects can be canceled and “moved to the attic” (https://attic.apache.org) when they are unable to muster 3 votes for a release, lack active contributors, or are unable to fulfill their reporting duties to the Foundation.
 
References
Aggarwal K, Hindle A, Stroulia E (2014) Co-evolution of project documentation and popularity within github. In: Proceedings of the 11th working conference on mining software repositories, pp 360–363
Agrawal A, Fu W, Chen D, Shen X, Menzies T (2019) How to "DODGE" complex software analytics. IEEE Trans Softw Eng
Agrawal A, Menzies T (2018) Is "better data" better than "better data miners"?. In: 2018 IEEE/ACM 40th international conference on software engineering (ICSE), IEEE, pp 1050–1061
Agrawal A, Menzies T, Minku LL, Wagner M, Yu Z (2018) Better software analytics via "DUO": Data mining algorithms using/used-by optimizers. arXiv:1812.01550
Bao L, Xia X, Lo D, Murphy GC (2019) A large scale study of long-time contributor prediction for github projects. IEEE Trans Softw Eng
Bergstra JS, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems, pp 2546–2554
Bidoki NH, Sukthankar G, Keathley H, Garibay I (2018) A cross-repository model for predicting popularity in github. In: 2018 international conference on computational science and computational intelligence (CSCI), IEEE, pp 1248–1253
Borges H, Hora A, Valente MT (2016a) Predicting the popularity of github repositories. In: Proceedings of the 12th international conference on predictive models and data analytics in software engineering, pp 1–10
Borges H, Hora A, Valente MT (2016b) Understanding the factors that impact the popularity of github repositories. In: 2016 IEEE international conference on software maintenance and evolution (ICSME), IEEE, pp 334–344
C M, MacDonell S (2012) Evaluating prediction systems in software project estimation. IST 54(8):820–827
Chen C, Twycross J, Garibaldi JM (2017) A new accuracy measure based on bounded relative error for time series forecasting. PloS One 12:3
Chen F, Li L, Jiang J, Zhang L (2014) Predicting the number of forks for open source software project. In: Proceedings of the 2014 3rd International workshop on evidential assessment of software technologies, pp 40–47
Coelho J, Valente M T, Milen L, Silva L L (2020) Is this github project maintained? measuring the level of maintenance activity of open-source projects. Information and Software Technology 122
Cohen PR (1995) Empirical methods for artificial intelligence. MIT Press, Cambridge, MA, USA
Crowston K, Howison J (2006) Assessing the health of open source communities. Computer 39(5):89–91
Das S, Mullick S S, Suganthan P N (2016) Recent advances in differential evolution–an updated survey. Swarm and Evolutionary Computation 27:1–30
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7:1–30
Feldt R, Magazinius A (2010) Validity threats in empirical software engineering research-an initial survey. In: SEKE, pp 374–379
Feurer M, Klein A, Eggensperger K, Springenberg J T, Blum M, Hutter F (2019) Auto-sklearn: Efficient and robust automated machine learning. In: Automated Machine Learning. Springer, Cham, pp 113–134
Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion mmre. TSE 29(11):985–995
Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11(1):86–92
Fu W, Menzies T, Shen X (2016) Tuning for software analytics: Is it really necessary?. IST Journal 76:135–146
Fu W, Nair V, Menzies T (2016) Why is differential evolution better than grid search for tuning defect predictors?. arXiv:1609.02613
Georg JPL, Germonprez M (2018) Assessing open source project health
Han J, Deng S, Xia X, Wang D, Yin J (2019) Characterization and prediction of popular projects on github. In: 2019 IEEE 43rd annual computer software and applications conference (COMPSAC), IEEE, vol 1, pp 21–26
Herbold S (2017) Comments on scottknottesd in response to "an empirical comparison of model validation techniques for defect prediction models". IEEE Trans Softw Eng 43(11):1091–1094
Herbold S, Trautsch A, Grabowski J (2018) Correction of “A comparative study to benchmark cross-project defect prediction approaches”. IEEE Trans Softw Eng 45(6):632–636
Hohl P, Stupperich M, Münch J, Schneider K (2018) An assessment model to foster the adoption of agile software product lines in the automotive domain. In: 2018 IEEE international conference on engineering, technology and innovation (ICE/ITMC), IEEE, pp 1–9
Jansen S (2014) Measuring the health of open source software ecosystems: Beyond the scope of project health. Inf Softw Technol 56(11):1508–1519
Jarczyk O, Jaroszewicz S, Wierzbicki A, Pawlak K, Jankowski-Lorek M (2018) Surgical teams on github: Modeling performance of github project development processes. Inf Softw Technol 100:32–46
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German D M, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, pp 92–101
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German D M, Damian D (2016) An in-depth study of the promises and perils of mining github. Empir Softw Eng 21(5):2035–2071
Kikas R, Dumas M, Pfahl D (2016) Using dynamic and contextual features to predict issue lifetime in github projects. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR), IEEE, pp 291–302
Kitchenham B A, Pickard L M, MacDonell S G, Shepperd M J (2001) What accuracy statistics really measure. IEEE Softw 148(3):81–85
Korte M, Port D (2008) Confidence in software cost estimation results based on mmre and pred. In: PROMISE’08, pp 63–70
Krishna R, Agrawal A, Rahman A, Sobran A, Menzies T (2018) What is the connection between issues, bugs, and enhancements?. In: 2018 IEEE/ACM 40th international conference on software engineering: software engineering in practice track (ICSE-SEIP), IEEE, pp 306–315
Langdon W B, Dolado J, Sarro F, Harman M (2016) Exact mean absolute error of baseline predictor, MARP0. IST 73:16–18
Liao Z, Yi M, Wang Y, Liu S, Liu H, Zhang Y, Zhou Y (2019) Healthy or not: A way to predict ecosystem health in github. Symmetry 11(2):144
Manikas K, Hansen K M (2013) Reviewing the health of software ecosystems-a conceptual framework proposal. In: Proceedings of the 5th international workshop on software ecosystems (IWSECO), Citeseer, pp 33–44
Minku L L (2019) A novel online supervised hyperparameter tuning procedure applied to cross-company software effort estimation. Empir Softw Eng 24(5):3153–3204
Molokken K, Jorgensen M (2003) A review of software surveys on software effort estimation. In: 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings, IEEE, pp 223–230
Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating github for engineered software projects. Empir Softw Eng 22(6):3219–3253
Nagy A, Njima M, Mkrtchyan L (2010) A bayesian based method for agile software development release planning and project health monitoring. In: 2010 international conference on intelligent networking and collaborative systems, IEEE, pp 192–199
Nemenyi PB (1963) Distribution-free multiple comparisons. Princeton University
Paasivaara M, Behm B, Lassenius C, Hallikainen M (2018) Large-scale agile transformation at ericsson: a case study. Empir Softw Eng 23(5):2550–2596
Parnin C, Helms E, Atlee C, Boughton H, Ghattas M, Glover A, Holman J, Micco J, Murphy B, Savor T et al (2017) The top 10 adages in continuous deployment. IEEE Softw 34(3):86–95
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12:2825–2830
Port D, Korte M (2008) Comparative studies of the model evaluation criterion mmre and pred in software cost estimation research. In: ESEM’08, pp 51–60
Qi F, Jing X-Y, Zhu X, Xie X, Xu B, Ying S (2017) Software effort estimation based on open source projects: Case study of github. Inf Softw Technol 92:145–157
Santos A R, Kroll J, Sales A, Fernandes P, Wildt D (2016) Investigating the adoption of agile practices in mobile application development. In: ICEIS (1), pp 490–497
Sarro F, Petrozziello A, Harman M (2016) Multi-objective software effort estimation. In: ICSE, ACM, pp 619–630
Shepperd M, Cartwright M, Kadoda G (2000) On building prediction systems for software engineers. EMSE 5(3):175–182
Shrikanth NC, Menzies T (2021) The early bird catches the worm: Better early life cycle defect predictors. arXiv:2105.11082
Snoek J, Larochelle H, Adams R P (2012) Practical bayesian optimization of machine learning algorithms. arXiv:1206.2944
Stensrud E, Foss T, Kitchenham B, Myrtveit I (2003) A further empirical investigation of the relationship of mre and project size. ESE 8(2):139–161
Stewart K (2019) Personal communication
Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over cont. spaces. JoGO 11(4):341–359
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering, pp 321–332
Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Trans Softw Eng 45(7):683–711
Tu H, Menzies T (2021) Frugal: Unlocking ssl for software analytics
Tu H, Papadimitriou G, Kiran M, Wang C, Mandal A, Deelman E, Menzies T (2021) Mining workflows for anomalous data transfers. In: 2021 IEEE/ACM 18th international conference on mining software repositories (MSR), pp 1–12
Wahyudin D, Mustofa K, Schatten A, Biffl S, Tjoa A M (2007) Monitoring the “health” status of open source web-engineering projects. International Journal of Web Information Systems
Wang T, Zhang Y, Yin G, Yu Y, Wang H (2018) Who will become a long-term contributor? a prediction model based on the early phase behaviors. In: Proceedings of the Tenth Asia-Pacific symposium on internetware, pp 1–10
Weber S, Luo J (2014) What makes an open source code popular on git hub?. In: 2014 IEEE international conference on data mining workshop, IEEE, pp 851–855
Witten I H, Frank E, Hall M A (2011) Data mining: Practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
Wu G, Shen X, Li H, Chen H, Lin A, Suganthan P N (2018) Ensemble of differential evolution variants. Inf Sci 423:172–186
Wynn Jr D (2007) Assessing the health of an open source ecosystem. In: Emerging Free and Open Source Software Practices. IGI Global, pp 238–258
Xia T (2021) Principles of project health for open source software
Xia T, Shu R, Shen X, Menzies T (2020) Sequential model optimization for software effort estimation. IEEE Transactions on Software Engineering
Yu Y, Wang H, Yin G, Wang T (2016) Reviewer recommendation for pull-requests in github: What can we learn from code review and bug assignment?. Inf Softw Technol 74:204–218
Metadata
Title
Predicting health indicators for open source projects (using hyperparameter optimization)
Authors
Tianpei Xia
Wei Fu
Rui Shu
Rishabh Agrawal
Tim Menzies
Publication date
1 November 2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10171-0
