Published in: Empirical Software Engineering 2/2018

08 August 2017

Addressing problems with replicability and validity of repository mining studies through a smart data platform

Authors: Fabian Trautsch, Steffen Herbold, Philip Makedonski, Jens Grabowski


Abstract

Empirical methods have become common in software engineering. This trend has spawned hundreds of publications whose results help to understand and improve the software development process. Because this avenue of investigation is data-driven, we identified several problems in the current state of the art that threaten the replicability and validity of approaches. The heavy reuse of data sets across many studies may invalidate their results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders replication of the results and thereby decreases the comparability between studies. Furthermore, many studies use small data sets comprising fewer than 10 projects, which poses a threat especially to their external validity. Even if all information about a study is available, the diversity of the tooling used can still make replication very hard. In this paper, we discuss a potential solution to these problems: a cloud-based platform that integrates data collection and analytics. We created SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. In this article, we present SmartSHARK and discuss our experiences with its use and with the mentioned problems. Additionally, we show how we addressed the issues that we identified during our work with SmartSHARK.


Footnotes
9
The complete source code as well as deployment scripts are available in our public SVN: http://trex.informatik.uni-goettingen.de/svn/smartshark/. A running instance is located at the following URL: http://smartshark.informatik.uni-goettingen.de.
 
10
Intooitus, the company that developed the tool, no longer exists, and the tool is no longer available.
 
14
The default set of regular expressions includes: "defect(s)?", "patch(ing|es|ed)?", "bug(s|fix(es)?)?", "(re)?fix(es|ed|ing|age|\s?up(s)?)?", "debug(ged)?", "\#\d+", "back\s?out", "revert(ing|ed)?".
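As a rough illustration of how such a keyword list might be applied, the following Python sketch checks commit messages against the patterns listed in this footnote. The patterns are copied from the footnote, with the backslashes written as ordinary `\`. Everything else is an assumption made for the example and is not taken from SmartSHARK: the function names `matching_patterns` and `looks_like_bugfix`, the case-insensitive matching, and the lookarounds that require a pattern to match a whole word.

```python
import re

# Patterns as listed in the footnote.
DEFAULT_PATTERNS = [
    r"defect(s)?",
    r"patch(ing|es|ed)?",
    r"bug(s|fix(es)?)?",
    r"(re)?fix(es|ed|ing|age|\s?up(s)?)?",
    r"debug(ged)?",
    r"\#\d+",
    r"back\s?out",
    r"revert(ing|ed)?",
]

# Assumption: match case-insensitively and only at word-like boundaries.
# Lookarounds are used instead of \b so that "#123" (which starts with a
# non-word character) is handled the same way as the keyword patterns.
COMPILED = [
    (pattern, re.compile(r"(?<!\w)(?:" + pattern + r")(?!\w)", re.IGNORECASE))
    for pattern in DEFAULT_PATTERNS
]


def matching_patterns(message):
    """Return the patterns (hypothetical helper) that occur in a commit message."""
    return [pattern for pattern, regex in COMPILED if regex.search(message)]


def looks_like_bugfix(message):
    """Return True if at least one pattern occurs in the message."""
    return bool(matching_patterns(message))


if __name__ == "__main__":
    for msg in ["Fix crash in parser", "Closes #42", "Add new feature"]:
        print(msg, "->", looks_like_bugfix(msg))
```

The whole-word restriction is a design choice of this sketch: without it, a message such as "Update prefix handling" would match the "fix" pattern. Whether the original tooling applies such a restriction is not stated in the footnote.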
 
15
The default assumption may be overridden by applying different strategies based on the size of the changes or other information.
 
22
There is no ksudoku-specific list. Instead, we collected the whole kde-games-devel mailing list: https://mail.kde.org/pipermail/kde-games-devel
 
23
The project is not available anymore.
 
24
Note that all messages were always additionally sent to the mailing list.
 
30
This problem does not occur anymore with the current version of CVSAnalY.
 
33
We use the official git library for analysis: https://libgit2.github.com/
 
34
Currently, mecoSHARK is able to detect Type-2 clones, which are clones that are syntactically identical except for variations in layout, comments, whitespace, type references, identifier names, and literals. Details can be found in the SourceMeter documentation: FrontEndART Ltd (2016a).
 
36
https://github.com/smartshark/issueSHARK
 
39
According to Google Scholar on 2017-07-06.
 
References
Alexandru CV, Gall HC (2015) Rapid multi-purpose, multi-commit code analysis. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE). IEEE/ACM, pp 635–638
Arcuri A, Fraser G, Galeotti JP (2015) Generating tcp/udp network data for automated unit test generation. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 155–165
Avdiienko V, Kuznetsov K, Gorla A, Zeller A, Arzt S, Rasthofer S, Bodden E (2015) Mining apps for abnormal usage of sensitive data. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 426–436
Bang L, Aydin A, Bultan T (2015) Automatically computing path complexity of programs. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 61–72
Benelallam A, Gómez A, Sunyé G, Tisi M, Launay D (2014) Neo4emf, a scalable persistence layer for emf models. In: Proceedings of the 10th European conference on modelling foundations and applications - volume 8569. Springer-Verlag New York, Inc., New York, NY, USA, pp 230–241. doi:10.1007/978-3-319-09195-2_15
Bevan J, Whitehead E J, Kim S, Godfrey M (2005) Facilitating software evolution research with kenyon. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 177–186
Beyer D, Dangl M, Dietsch D, Heizmann M, Stahlbauer A (2015) Witness validation and stepwise testification across software verifiers. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 721–733
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 international workshop on mining software repositories. ACM, pp 137–143
Catal C, Diri B (2009) A systematic review of software fault prediction studies. Expert Syst Appl 36(4):7346–7354
Cavalcanti G, Accioly P, Borba P (2015) Assessing semistructured merge in version control systems: a replicated experiment. In: ACM/IEEE international symposium on empirical software engineering and measurement 2015 (ESEM). IEEE, pp 1–10
Claes M, Mens T, Di Cosmo R, Vouillon J (2015) A historical analysis of debian package incompatibilities. In: IEEE/ACM 12th working conference on mining software repositories 2015 (MSR). IEEE, pp 212–223
Coelho R, Almeida L, Gousios G, van Deursen A (2015) Unveiling exception handling bug hazards in android based on github and google code issues. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 134–145
Ċubranić D, Murphy GC, Singer J, Booth KS (2005) Hipikat: a project memory for software development. IEEE Trans Softw Eng 31(6):446–465
Czerwonka J, Nagappan N, Schulte W (2013) CODEMINE: building a software development data analytics platform at microsoft. IEEE Softw 30(4):64–71
Devanbu P, Zimmermann T, Bird C (2016) Belief & evidence in empirical software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 108–119
Dhar A, Purandare R, Dhawan M, Rangaswamy S (2015) Clotho: saving programs from malformed strings and incorrect string-handling. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 555–566
Di Ruscio D, Kolovos DS, Korkontzelos I, Matragkas N, Vinju J (2015) Ossmeter: a software measurement platform for automatically analysing open source software projects. In: ESEC/FSE 2015 tool demonstrations track
Di Sorbo A, Panichella S, Visaggio C, Di Penta M, Canfora G, Gall H (2015) Development emails content analyzer: Intention mining in developer discussions. In: Proceedings of the IEEE/ACM 30th international conference on automated software engineering (ASE)
Draisbach U, Naumann F (2010) Dude: The duplicate detection toolkit. In: Proceedings of the international workshop on quality in databases (QDB)
Dyer R, Nguyen HA, Rajan H, Nguyen TN (2013) Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. In: Proceedings of the IEEE/ACM 35th international conference on software engineering (ICSE)
Dyer R, Nguyen HA, Rajan H, Nguyen T (2015) Boa: ultra-large-scale software repository and source code mining. ACM Transactions on Software Engineering and Methodology forthcoming
Eichberg M, Hermann B, Mezini M, Glanz L (2015) Hidden truths in dead software paths. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 474–484
Fernandez-Ramil J, Izquierdo-Cortazar D, Mens T (2009) What does it take to develop a million lines of open source code? In: Open source ecosystems: diverse communities interacting. Springer, pp 170–184
Foucault M, Palyart M, Blanc X, Murphy GC, Falleri JR (2015) Impact of developer turnover on quality in open-source software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 829–841
Gallaba K, Mesbah A, Beschastnikh I (2015) Don’t call us, we’ll call you: characterizing callbacks in javascript. In: ACM/IEEE international symposium on empirical software engineering and measurement 2015 (ESEM). IEEE, pp 1–10
German DM (2004) Mining CVS repositories, the softChange experience. Evolution 245(5,402):92–688
Giger E, Pinzger M, Gall H (2010) Predicting the fix time of bugs. In: Proceedings of the 2nd International workshop on recommendation systems for software engineering (RSSE). ACM, pp 52–56
Gong L, Pradel M, Sen K (2015) Jitprof: Pinpointing jit-unfriendly javascript code. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 357–368
González-Barahona JM, Robles G (2012) On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Empir Softw Eng 17(1-2):75–89
Gousios G, Spinellis D (2009) Alitheia core: An extensible software quality monitoring platform. In: Proceedings of the IEEE/ACM 31st international conference on software engineering (ICSE)
Gousios G, Spinellis D (2012) Ghtorrent: Github’s data from a firehose. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR). IEEE, pp 12–21
Gousios G, Kalliamvakou E, Spinellis D (2008) Measuring developer contribution from software repository data. In: Proceedings of the 2008 international working conference on mining software repositories. ACM, New York, NY, USA, MSR ’08, pp 129–132. doi:10.1145/1370750.1370781
Gousios G, Vasilescu B, Serebrenik A, Zaidman A (2014) Lean ghtorrent: Github data on demand. In: Proceedings of the 11th IEEE working conference on mining software repositories (MSR). ACM, pp 384–387
Gupta M, Sureka A, Padmanabhuni S, Asadullah AM (2015) Identifying software process management challenges: Survey of practitioners in a large global it company. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 346–356
Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304. doi:10.1109/TSE.2011.103
He P, Li B, Liu X, Chen J, Ma Y (2015) An empirical study on software defect prediction with a simplified metric set. Inf Softw Technol 59:170–190
Hermann B, Reif M, Eichberg M, Mezini M (2015) Getting to know you: towards a capability model for java. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 758–769
Herraiz I, Robles G, Amor JJ, Romera T, González Barahona JM (2006) The processes of joining in global distributed software projects. In: Proceedings of the 2006 international workshop on global software development for the practitioner. ACM, pp 27–33
Herraiz I, Gonzalez-Barahona JM, Robles G (2007) Forecasting the number of changes in eclipse using time series analysis. In: Proceedings of the 4th IEEE working conference on mining software repositories (MSR)
Honsel V, Honsel D, Herbold S, Grabowski J, Waack S (2015) Mining software dependency networks for agent-based simulation of software evolution. In: Proceedings of the 4th international workshop on software mining (SoftMine)
Howison J, Conklin MS, Crowston K (2005) Ossmole: A collaborative repository for floss research data and analyses. In: Proceedings of the 1st international conference on open source software
ISO/IEC (1998) 9241-11 Ergonomic requirements for office work with visual display terminals (VDTs). ISO/IEC 9241-14
Jermakovics A, Sillitti A, Succi G (2011) Mining and visualizing developer networks from version control systems. In: Proceedings of the 4th international workshop on cooperative and human aspects of software engineering (CHASE). ACM, New York, NY, USA, CHASE ’11, pp 24–31. doi:10.1145/1984642.1984647
Joblin M, Mauerer W, Apel S, Siegmund J, Riehle D (2015) From developer networks to verified communities: a fine-grained approach. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 563–573
Kouroshfar E, Mirakhorli M, Bagheri H, Xiao L, Malek S, Cai Y (2015) A study on the role of software architecture in the evolution and quality of software. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 246–257
Le DM, Behnamghader P, Garcia J, Link D, Shahbazian A, Medvidovic N (2015) An empirical study of architectural change in open-source software systems. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 235–245
Lin Z, Whitehead J (2015) Why power laws? An explanation from fine-grained code changes. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 68–75
Linares-Vásquez M, Bavota G, Cárdenas CEB, Oliveto R, Di Penta M, Poshyvanyk D (2015a) Optimizing energy consumption of guis in android apps: A multi-objective approach. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 143–154
Linares-Vásquez M, White M, Bernal-Cárdenas C, Moran K, Poshyvanyk D (2015b) Mining android app usages for generating actionable gui-based execution scenarios. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 111–122
Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 166–178
Makedonski P, Grabowski J (2016) Weighted Multi-factor multi-layer identification of potential causes for events of interest in software repositories. In: Proceedings of the seminar series on advanced techniques and tools for software evolution (SATToSE) 2015, forthcoming 2016
Makedonski P, Sudau F, Grabowski J (2015) Towards a model-based software mining infrastructure. ACM SIGSOFT Software Engineering Notes 40(1):1–8
Menzies T, Rees-Jones M, Krishna R, Pape C (2015) The promise repository of empirical software engineering data. http://openscience.us/repo. North Carolina State University, Department of Computer Science [accessed 22-January-2015]
Nanz S, Furia CA (2015) A comparative study of programming languages in rosetta code. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 778–788
Nguyen HV, Kästner C, Nguyen TN (2015a) Cross-language program slicing for dynamic web applications. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 369–380
Nguyen TH, Grundy J, Almorsy M (2015b) Rule-based extraction of goal-use case models from text. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 591–601
Park J, Esmaeilzadeh H, Zhang X, Naik M, Harris W (2015) Flexjava: language support for safe and modular approximate programming. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 745–757
Robles G (2010) Replicating msr: a study of the potential replicability of papers published in the mining software repositories proceedings. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 171–180
Robles G, González-Barahona JM, Cervigón C, Capiluppi A, Izquierdo-Cortázar D (2014) Estimating development effort in free/open source software projects by mining software repositories: a case study of openstack. In: Proceedings of the 11th working conference on mining software repositories. ACM, New York, NY, USA, MSR 2014, pp 222–231. doi:10.1145/2597073.2597107
Safi G, Shahbazian A, Halfond WG, Medvidovic N (2015) Detecting event anomalies in event-based systems. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 25–37
Samak M, Ramanathan MK (2015) Synthesizing tests for detecting atomicity violations. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 131–142
Scheidgen M, Zubow A, Fischer J, Kolbe TH (2012) Automated and transparent model fragmentation for persisting large models. Springer
Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215
Shi A, Yung T, Gyori A, Marinov D (2015) Comparing and combining test-suite reduction and regression test selection. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 237–247
Shull FJ, Carver JC, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13(2):211–218
Siegmund J, Siegmund N, Apel S (2015a) Views on internal and external validity in empirical software engineering. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering (ICSE), vol 1. IEEE, pp 9–19
Siegmund N, Grebhahn A, Apel S, Kästner C (2015b) Performance-influence models for highly configurable systems. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 284–294
Sjøberg DI, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg NK, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753
Zurück zum Zitat Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 532–543 Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 532–543
Zurück zum Zitat Smith EK, Bird C, Zimmermann T (2016) Beliefs, practices, and personalities of software engineers: a survey in a large software company. In: Proceedings of the 9th international workshop on cooperative and human aspects of software engineering. ACM, pp 15–18 Smith EK, Bird C, Zimmermann T (2016) Beliefs, practices, and personalities of software engineers: a survey in a large software company. In: Proceedings of the 9th international workshop on cooperative and human aspects of software engineering. ACM, pp 15–18
Zurück zum Zitat Sun X, Liu X, Li B, Duan Y, Yang H, Hu J (2016) Exploring topic models in software engineering data analysis: A survey. In: 2016 17th IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD), pp 357–362. doi:10.1109/SNPD.2016.7515925 Sun X, Liu X, Li B, Duan Y, Yang H, Hu J (2016) Exploring topic models in software engineering data analysis: A survey. In: 2016 17th IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD), pp 357–362. doi:10.​1109/​SNPD.​2016.​7515925
Zurück zum Zitat Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE) Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the IEEE/ACM 37th international conference on software engineering (ICSE)
Zurück zum Zitat Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190 Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 180–190
Zurück zum Zitat Thomas JJ, Cook KA (2006) A visual analytics agenda. IEEE Comput Graph Appl 26(1):10–13CrossRef Thomas JJ, Cook KA (2006) A visual analytics agenda. IEEE Comput Graph Appl 26(1):10–13CrossRef
Zurück zum Zitat Trautsch F, Herbold S, Makedonski P, Grabowski J (2016) Adressing problems with external validity of repository mining studies through a smart data platform. In: Proceedings of the 13th international workshop on mining software repositories. ACM, pp 97–108 Trautsch F, Herbold S, Makedonski P, Grabowski J (2016) Adressing problems with external validity of repository mining studies through a smart data platform. In: Proceedings of the 13th international workshop on mining software repositories. ACM, pp 97–108
Zurück zum Zitat Van Rysselberghe F, Demeyer S (2004) Studying software evolution information by visualizing the change history. In: Proceedings of the 20th IEEE international conference on software maintenance, 2004. IEEE, pp 328–337 Van Rysselberghe F, Demeyer S (2004) Studying software evolution information by visualizing the change history. In: Proceedings of the 20th IEEE international conference on software maintenance, 2004. IEEE, pp 328–337
Zurück zum Zitat Walden J, Stuckman J, Scandariato R (2014) Predicting vulnerable components: Software metrics vs text mining. In: Proceedings of the IEEE 25th international symposium on software reliability engineering (ISSRE). IEEE, pp 23–33 Walden J, Stuckman J, Scandariato R (2014) Predicting vulnerable components: Software metrics vs text mining. In: Proceedings of the IEEE 25th international symposium on software reliability engineering (ISSRE). IEEE, pp 23–33
Zurück zum Zitat Xu T, Jin L, Fan X, Zhou Y, Pasupathy S, Talwadker R (2015) Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 307–319 Xu T, Jin L, Fan X, Zhou Y, Pasupathy S, Talwadker R (2015) Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 307–319
Zurück zum Zitat Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) Appcontext: Differentiating malicious and benign mobile app behaviors using context. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 303–313 Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) Appcontext: Differentiating malicious and benign mobile app behaviors using context. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 303–313
Zurück zum Zitat Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing (HotCloud) Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on hot topics in cloud computing (HotCloud)
Zurück zum Zitat Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on network system design and implementation (NSDI) Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on network system design and implementation (NSDI)
Zurück zum Zitat Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 415–425 Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: IEEE/ACM 37th IEEE international conference on software engineering 2015 (ICSE), vol 1. IEEE, pp 415–425
Metadata
Title
Addressing problems with replicability and validity of repository mining studies through a smart data platform
Authors
Fabian Trautsch
Steffen Herbold
Philip Makedonski
Jens Grabowski
Publication date
08.08.2017
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9537-x
