Published in: Empirical Software Engineering 2/2018

08-08-2017

Addressing problems with replicability and validity of repository mining studies through a smart data platform

Authors: Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski


Abstract

The use of empirical methods has become common in software engineering. This trend has spawned hundreds of publications whose results help to understand and improve the software development process. Due to the data-driven nature of this avenue of investigation, we identified several problems within the current state of the art that threaten the replicability and validity of the approaches. The heavy re-use of data sets in many studies may invalidate their results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders the replication of the results and thereby decreases the comparability between studies. Furthermore, many studies use small data sets comprising fewer than 10 projects, which poses a threat especially to their external validity. Even if all information about a study is available, the diversity of the tooling used can still make its replication very hard. In this paper, we discuss a potential solution to these problems: a cloud-based platform that integrates data collection and analytics. We created SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. In this article, we present SmartSHARK and discuss our experiences with its use and with the problems mentioned above. Additionally, we show how we addressed the issues that we identified during our work with SmartSHARK.


Footnotes
9
The complete source code as well as the deployment scripts are available in our public SVN: http://trex.informatik.uni-goettingen.de/svn/smartshark/. A running instance is located at the following URL: http://smartshark.informatik.uni-goettingen.de.
 
10
The company that developed the tool, Intooitus, no longer exists, and the tool is no longer available.
 
14
The default set of regular expressions includes: "defect(s)?", "patch(ing|es|ed)?", "bug(s|fix(es)?)?", "(re)?fix(es|ed|ing|age|\s?up(s)?)?", "debug(ged)?", "\#\d+", "back\s?out", "revert(ing|ed)?"
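The footnote lists the patterns but not how they are applied. A minimal Python sketch, assuming the patterns are matched case-insensitively anywhere in a commit message and that a single match suffices to label the commit as bug-fixing (both are assumptions; `DEFAULT_BUGFIX_PATTERNS` and `is_bugfix` are hypothetical names, not SmartSHARK's API):

```python
import re

# Patterns copied from the footnote. Assumption: case-insensitive substring
# search; the footnote does not state which regex flags are used.
DEFAULT_BUGFIX_PATTERNS = [
    r"defect(s)?",
    r"patch(ing|es|ed)?",
    r"bug(s|fix(es)?)?",
    r"(re)?fix(es|ed|ing|age|\s?up(s)?)?",
    r"debug(ged)?",
    r"\#\d+",          # issue references such as "#123"
    r"back\s?out",
    r"revert(ing|ed)?",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in DEFAULT_BUGFIX_PATTERNS]


def is_bugfix(message: str, patterns=_COMPILED) -> bool:
    """Hypothetical helper: True if any pattern occurs anywhere in the message."""
    return any(p.search(message) for p in patterns)


if __name__ == "__main__":
    for msg in ["Fix NPE in parser", "Add new feature", "Closes #42", "Revert r1234"]:
        print(f"{msg!r}: {is_bugfix(msg)}")
```

Because the patterns carry no word boundaries, a message such as "handle prefix option" is also labeled bug-fixing, since "prefix" contains "fix". Wrapping the patterns in `\b` would be one possible refinement, though it changes which messages are matched.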
 
15
The default assumption may be overridden by applying different strategies based on the size of the changes or other information.
 
22
There is no ksudoku-specific list. Instead, we collected the whole kde-games-devel mailing list: https://mail.kde.org/pipermail/kde-games-devel
 
23
The project is no longer available.
 
24
Note that all messages were always additionally sent to the mailing list.
 
30
This problem no longer occurs with the current version of CVSAnalY.
 
33
We use the official git library for analysis: https://libgit2.github.com/
 
34
Currently, mecoSHARK is able to detect Type-2 clones, i.e., clones that are syntactically identical except for variations in layout, comments, whitespace, type references, identifier names, and literals. Details can be found in the SourceMeter documentation (FrontEndART Ltd 2016a).
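To illustrate the Type-2 definition, the sketch below treats two code fragments as clones when their token sequences are equal after replacing identifiers with `ID`, literals with `LIT`, and dropping comments and blank lines. This is a simplified illustration, not SourceMeter's or mecoSHARK's detection algorithm: it works on Python source via the standard `tokenize` module (chosen only to keep the example self-contained), it decides whether two whole fragments match rather than searching a code base for clone pairs, and `normalized_tokens` and `is_type2_clone` are hypothetical names.

```python
import io
import keyword
import tokenize

# Tokens that only reflect comments or non-logical line breaks.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}
# Python indentation is syntactic, so keep these as structural markers
# but ignore how much whitespace they contain.
_STRUCTURAL = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE}


def normalized_tokens(source: str) -> list[str]:
    """Token sequence with identifiers and literals replaced by placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # identifier names (incl. annotation types)
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")   # numeric and string literals
        elif tok.type in _STRUCTURAL:
            tokens.append(tokenize.tok_name[tok.type])
        else:
            tokens.append(tok.string)  # keywords and operators stay as-is
    return tokens


def is_type2_clone(a: str, b: str) -> bool:
    """True if both fragments have identical normalized token sequences."""
    return normalized_tokens(a) == normalized_tokens(b)


if __name__ == "__main__":
    a = "def area(w, h):\n    return w * h  # rectangle\n"
    b = "def size(x,y):\n\n        return x*y\n"
    c = "def area(w, h):\n    return w + h\n"
    print(is_type2_clone(a, b), is_type2_clone(a, c))
```

Here `a` and `b` differ only in names, spacing, indentation width, a comment, and a blank line, so they count as Type-2 clones, while `c` changes an operator and does not. Practical detectors typically index normalized token sequences (for example by hashing) to find matching subsequences instead of comparing whole fragments with `==`.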
 
36
https://github.com/smartshark/issueSHARK
 
39
According to Google Scholar on 2017-07-06.
 
Literature
Alexandru CV, Gall HC (2015) Rapid multi-purpose, multi-commit code analysis. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE). IEEE/ACM, pp 635–638
Arcuri A, Fraser G, Galeotti JP (2015) Generating tcp/udp network data for automated unit test generation. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 155–165
Avdiienko V, Kuznetsov K, Gorla A, Zeller A, Arzt S, Rasthofer S, Bodden E (2015) Mining apps for abnormal usage of sensitive data. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 426–436
Bang L, Aydin A, Bultan T (2015) Automatically computing path complexity of programs. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 61–72
Benelallam A, Gómez A, Sunyé G, Tisi M, Launay D (2014) Neo4emf, a scalable persistence layer for emf models. In: Proceedings of the 10th European conference on modelling foundations and applications - volume 8569. Springer-Verlag New York, Inc., New York, NY, USA, pp 230–241. doi:10.1007/978-3-319-09195-2_15
Bevan J, Whitehead EJ, Kim S, Godfrey M (2005) Facilitating software evolution research with kenyon. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 177–186
Beyer D, Dangl M, Dietsch D, Heizmann M, Stahlbauer A (2015) Witness validation and stepwise testification across software verifiers. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 721–733
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 international workshop on mining software repositories. ACM, pp 137–143
Catal C, Diri B (2009) A systematic review of software fault prediction studies. Expert Syst Appl 36(4):7346–7354
Cavalcanti G, Accioly P, Borba P (2015) Assessing semistructured merge in version control systems: a replicated experiment. In: ACM/IEEE international symposium on empirical software engineering and measurement 2015 (ESEM). IEEE, pp 1–10
Claes M, Mens T, Di Cosmo R, Vouillon J (2015) A historical analysis of debian package incompatibilities. In: IEEE/ACM 12th working conference on mining software repositories 2015 (MSR). IEEE, pp 212–223
Coelho R, Almeida L, Gousios G, van Deursen A (2015) Unveiling exception handling bug hazards in android based on github and google code issues. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 134–145
Čubranić D, Murphy GC, Singer J, Booth KS (2005) Hipikat: a project memory for software development. IEEE Trans Softw Eng 31(6):446–465
Czerwonka J, Nagappan N, Schulte W (2013) CODEMINE: building a software development data analytics platform at microsoft. IEEE Softw 30(4):64–71
Devanbu P, Zimmermann T, Bird C (2016) Belief & evidence in empirical software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 108–119
Dhar A, Purandare R, Dhawan M, Rangaswamy S (2015) Clotho: saving programs from malformed strings and incorrect string-handling. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 555–566
Di Ruscio D, Kolovos DS, Korkontzelos I, Matragkas N, Vinju J (2015) Ossmeter: a software measurement platform for automatically analysing open source software projects. In: ESEC/FSE 2015 tool demonstrations track
Di Sorbo A, Panichella S, Visaggio C, Di Penta M, Canfora G, Gall H (2015) Development emails content analyzer: Intention mining in developer discussions. In: Proceedings of the IEEE/ACM 30th international conference on automated software engineering (ASE)
Draisbach U, Naumann F (2010) Dude: The duplicate detection toolkit. In: Proceedings of the international workshop on quality in databases (QDB)
Dyer R, Nguyen HA, Rajan H, Nguyen TN (2013) Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. In: Proceedings of the IEEE/ACM 35th international conference on software engineering (ICSE)
Dyer R, Nguyen HA, Rajan H, Nguyen T (2015) Boa: ultra-large-scale software repository and source code mining. ACM Transactions on Software Engineering and Methodology forthcoming
Eichberg M, Hermann B, Mezini M, Glanz L (2015) Hidden truths in dead software paths. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 474–484
Fernandez-Ramil J, Izquierdo-Cortazar D, Mens T (2009) What does it take to develop a million lines of open source code? In: Open source ecosystems: diverse communities interacting. Springer, pp 170–184
Foucault M, Palyart M, Blanc X, Murphy GC, Falleri JR (2015) Impact of developer turnover on quality in open-source software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 829–841
Gallaba K, Mesbah A, Beschastnikh I (2015) Don’t call us, we’ll call you: characterizing callbacks in javascript. In: ACM/IEEE international symposium on empirical software engineering and measurement 2015 (ESEM). IEEE, pp 1–10
German DM (2004) Mining CVS repositories, the softChange experience. Evolution 245(5,402):92–688
Giger E, Pinzger M, Gall H (2010) Predicting the fix time of bugs. In: Proceedings of the 2nd International workshop on recommendation systems for software engineering (RSSE). ACM, pp 52–56
Gong L, Pradel M, Sen K (2015) Jitprof: Pinpointing jit-unfriendly javascript code. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 357–368
González-Barahona JM, Robles G (2012) On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Empir Softw Eng 17(1-2):75–89
Gousios G, Spinellis D (2009) Alitheia core: An extensible software quality monitoring platform. In: Proceedings of the IEEE/ACM 31st international conference on software engineering (ICSE)
Gousios G, Spinellis D (2012) Ghtorrent: Github’s data from a firehose. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR). IEEE, pp 12–21
Gousios G, Kalliamvakou E, Spinellis D (2008) Measuring developer contribution from software repository data. In: Proceedings of the 2008 international working conference on mining software repositories. ACM, New York, NY, USA, MSR ’08, pp 129–132. doi:10.1145/1370750.1370781
Gousios G, Vasilescu B, Serebrenik A, Zaidman A (2014) Lean ghtorrent: Github data on demand. In: Proceedings of the 11th IEEE working conference on mining software repositories (MSR). ACM, pp 384–387
Gupta M, Sureka A, Padmanabhuni S, Asadullah AM (2015) Identifying software process management challenges: Survey of practitioners in a large global it company. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 346–356
Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304. doi:10.1109/TSE.2011.103
He P, Li B, Liu X, Chen J, Ma Y (2015) An empirical study on software defect prediction with a simplified metric set. Inf Softw Technol 59:170–190
Hermann B, Reif M, Eichberg M, Mezini M (2015) Getting to know you: towards a capability model for java. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 758–769
Herraiz I, Robles G, Amor JJ, Romera T, González Barahona JM (2006) The processes of joining in global distributed software projects. In: Proceedings of the 2006 international workshop on global software development for the practitioner. ACM, pp 27–33
Herraiz I, Gonzalez-Barahona JM, Robles G (2007) Forecasting the number of changes in eclipse using time series analysis. In: Proceedings of the 4th IEEE working conference on mining software repositories (MSR)
Honsel V, Honsel D, Herbold S, Grabowski J, Waack S (2015) Mining software dependency networks for agent-based simulation of software evolution. In: Proceedings of the 4th international workshop on software mining (SoftMine)
Howison J, Conklin MS, Crowston K (2005) Ossmole: A collaborative repository for floss research data and analyses. In: Proceedings of the 1st international conference on open source software
ISO/IEC (1998) 9241-11 Ergonomic requirements for office work with visual display terminals (VDTs). ISO/IEC 9241-14
Jermakovics A, Sillitti A, Succi G (2011) Mining and visualizing developer networks from version control systems. In: Proceedings of the 4th international workshop on cooperative and human aspects of software engineering (CHASE). ACM, New York, NY, USA, CHASE ’11, pp 24–31. doi:10.1145/1984642.1984647
Joblin M, Mauerer W, Apel S, Siegmund J, Riehle D (2015) From developer networks to verified communities: a fine-grained approach. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 563–573
Kouroshfar E, Mirakhorli M, Bagheri H, Xiao L, Malek S, Cai Y (2015) A study on the role of software architecture in the evolution and quality of software. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 246–257
Le DM, Behnamghader P, Garcia J, Link D, Shahbazian A, Medvidovic N (2015) An empirical study of architectural change in open-source software systems. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 235–245
Lin Z, Whitehead J (2015) Why power laws? An explanation from fine-grained code changes. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 68–75
Linares-Vásquez M, Bavota G, Cárdenas CEB, Oliveto R, Di Penta M, Poshyvanyk D (2015a) Optimizing energy consumption of guis in android apps: A multi-objective approach. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 143–154
Linares-Vásquez M, White M, Bernal-Cárdenas C, Moran K, Poshyvanyk D (2015b) Mining android app usages for generating actionable gui-based execution scenarios. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 111–122
Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 166–178
Makedonski P, Grabowski J (2016) Weighted Multi-factor multi-layer identification of potential causes for events of interest in software repositories. In: Proceedings of the seminar series on advanced techniques and tools for software evolution (SATToSE) 2015, forthcoming 2016
Makedonski P, Sudau F, Grabowski J (2015) Towards a model-based software mining infrastructure. ACM SIGSOFT Software Engineering Notes 40(1):1–8
Menzies T, Rees-Jones M, Krishna R, Pape C (2015) The promise repository of empirical software engineering data. http://openscience.us/repo. North Carolina State University, Department of Computer Science [accessed 22-January-2015]
Nanz S, Furia CA (2015) A comparative study of programming languages in rosetta code. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 778–788
Nguyen HV, Kästner C, Nguyen TN (2015a) Cross-language program slicing for dynamic web applications. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 369–380
Nguyen TH, Grundy J, Almorsy M (2015b) Rule-based extraction of goal-use case models from text. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 591–601
Park J, Esmaeilzadeh H, Zhang X, Naik M, Harris W (2015) Flexjava: language support for safe and modular approximate programming. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 745–757
Robles G (2010) Replicating msr: a study of the potential replicability of papers published in the mining software repositories proceedings. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 171–180
Robles G, González-Barahona JM, Cervigón C, Capiluppi A, Izquierdo-Cortázar D (2014) Estimating development effort in free/open source software projects by mining software repositories: a case study of openstack. In: Proceedings of the 11th working conference on mining software repositories. ACM, New York, NY, USA, MSR 2014, pp 222–231. doi:10.1145/2597073.2597107
Safi G, Shahbazian A, Halfond WG, Medvidovic N (2015) Detecting event anomalies in event-based systems. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 25–37
Samak M, Ramanathan MK (2015) Synthesizing tests for detecting atomicity violations. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 131–142
Scheidgen M, Zubow A, Fischer J, Kolbe TH (2012) Automated and transparent model fragmentation for persisting large models. Springer
Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215
Shi A, Yung T, Gyori A, Marinov D (2015) Comparing and combining test-suite reduction and regression test selection. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 237–247
Shull FJ, Carver JC, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13(2):211–218
Siegmund J, Siegmund N, Apel S (2015a) Views on internal and external validity in empirical software engineering. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering (ICSE), vol 1. IEEE, pp 9–19
Siegmund N, Grebhahn A, Apel S, Kästner C (2015b) Performance-influence models for highly configurable systems. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 284–294
go back to reference Sjøberg DI, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg NK, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753CrossRef Sjøberg DI, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg NK, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753CrossRef
go back to reference Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 532–543 Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 532–543
go back to reference Smith EK, Bird C, Zimmermann T (2016) Beliefs, practices, and personalities of software engineers: a survey in a large software company. In: Proceedings of the 9th international workshop on cooperative and human aspects of software engineering. ACM, pp 15–18 Smith EK, Bird C, Zimmermann T (2016) Beliefs, practices, and personalities of software engineers: a survey in a large software company. In: Proceedings of the 9th international workshop on cooperative and human aspects of software engineering. ACM, pp 15–18
go back to reference Sun X, Liu X, Li B, Duan Y, Yang H, Hu J (2016) Exploring topic models in software engineering data analysis: A survey. In: 2016 17th IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD), pp 357–362. doi:10.1109/SNPD.2016.7515925 Sun X, Liu X, Li B, Duan Y, Yang H, Hu J (2016) Exploring topic models in software engineering data analysis: A survey. In: 2016 17th IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD), pp 357–362. doi:10.​1109/​SNPD.​2016.​7515925
go back to reference Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE) Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE)
go back to reference Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190 Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190
go back to reference Thomas JJ, Cook KA (2006) A visual analytics agenda. IEEE Comput Graph Appl 26(1):10–13CrossRef Thomas JJ, Cook KA (2006) A visual analytics agenda. IEEE Comput Graph Appl 26(1):10–13CrossRef
go back to reference Trautsch F, Herbold S, Makedonski P, Grabowski J (2016) Adressing problems with external validity of repository mining studies through a smart data platform. In: Proceedings of the 13th international workshop on mining software repositories. ACM, pp 97–108 Trautsch F, Herbold S, Makedonski P, Grabowski J (2016) Adressing problems with external validity of repository mining studies through a smart data platform. In: Proceedings of the 13th international workshop on mining software repositories. ACM, pp 97–108
go back to reference Van Rysselberghe F, Demeyer S (2004) Studying software evolution information by visualizing the change history. In: Proceedings of the 20th IEEE international conference on software maintenance, 2004. IEEE, pp 328–337 Van Rysselberghe F, Demeyer S (2004) Studying software evolution information by visualizing the change history. In: Proceedings of the 20th IEEE international conference on software maintenance, 2004. IEEE, pp 328–337
go back to reference Walden J, Stuckman J, Scandariato R (2014) Predicting vulnerable components: Software metrics vs text mining. In: Proceedings of the IEEE 25th international symposium on software reliability engineering (ISSRE). IEEE, pp 23–33 Walden J, Stuckman J, Scandariato R (2014) Predicting vulnerable components: Software metrics vs text mining. In: Proceedings of the IEEE 25th international symposium on software reliability engineering (ISSRE). IEEE, pp 23–33
go back to reference Xu T, Jin L, Fan X, Zhou Y, Pasupathy S, Talwadker R (2015) Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 307–319 Xu T, Jin L, Fan X, Zhou Y, Pasupathy S, Talwadker R (2015) Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 307–319
go back to reference Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) Appcontext: Differentiating malicious and benign mobile app behaviors using context. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 303–313 Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) Appcontext: Differentiating malicious and benign mobile app behaviors using context. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 303–313
go back to reference Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing (HotCloud) Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing (HotCloud)
go back to reference Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on network system design and implementation (NSDI) Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on network system design and implementation (NSDI)
go back to reference Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 415–425 Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 415–425
Metadata
Title: Addressing problems with replicability and validity of repository mining studies through a smart data platform
Authors: Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski
Publication date: 08-08-2017
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 2/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-017-9537-x