
Experience: Quality Benchmarking of Datasets Used in Software Effort Estimation

Published: 19 August 2019

Abstract

Data is a cornerstone of empirical software engineering (ESE) research and practice. It underpins numerous process and project management activities, including the estimation of development effort and the prediction of the likely location and severity of defects in code. Serious questions have been raised, however, over the quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have been noted as being especially prevalent. Other quality issues, although also potentially important, have received less attention. In this study, we assess the quality of 13 datasets that have been used extensively in research on software effort estimation. The quality issues considered in this article draw on a taxonomy that we published previously based on a systematic mapping of data quality issues in ESE. Our contributions are as follows: (1) an evaluation of the “fitness for purpose” of these commonly used datasets and (2) an assessment of the utility of the taxonomy in terms of dataset benchmarking. We also propose a template that could be used both to improve the ESE data collection/submission process and to evaluate other such datasets, contributing to enhanced awareness of data quality issues in the ESE community and, in time, the availability and use of higher-quality datasets.
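
To make the abstract's quality concerns concrete, the following is a minimal illustrative sketch, assuming Python with pandas, of the kind of dataset screening described: flagging incompleteness, outliers, and duplicate records (one possible source of noise) in a software effort estimation dataset. This is not the taxonomy-based assessment applied in the article itself; the column names ("size", "effort") and the Tukey 1.5 * IQR outlier rule are assumptions made purely for this example.

    import pandas as pd

    def quality_screen(df: pd.DataFrame, effort_col: str = "effort") -> dict:
        """Compute simple completeness, outlier, and duplication indicators."""
        report = {}

        # Incompleteness: fraction of missing values per attribute.
        report["missing_ratio"] = df.isna().mean().round(2).to_dict()

        # Outliers: projects whose effort lies outside Tukey's 1.5 * IQR fences
        # (an assumed rule of thumb, not necessarily the article's criterion).
        q1, q3 = df[effort_col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outliers = (df[effort_col] < q1 - 1.5 * iqr) | (df[effort_col] > q3 + 1.5 * iqr)
        report["effort_outliers"] = int(outliers.sum())

        # Noise proxy: exact duplicate records, which may indicate
        # accidental resubmission of the same project.
        report["duplicate_rows"] = int(df.duplicated().sum())

        return report

    # Toy example: five projects, one missing size value, one extreme effort value.
    projects = pd.DataFrame({
        "size":   [120, 300, 95, None, 410],
        "effort": [1100, 2900, 800, 1500, 40000],
    })
    print(quality_screen(projects))

Mechanical checks such as these cover only a few of the issue types in the underlying taxonomy; other quality issues the article considers may not be detectable from the data values alone.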




Published in

Journal of Data and Information Quality, Volume 11, Issue 4 (December 2019), 139 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3357606

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 19 August 2019
• Accepted: 1 April 2019
• Revised: 1 March 2019
• Received: 1 December 2018


Qualifiers

• research-article
• Research
• Refereed
