1 Introduction
-
RQ1 What is the prevalence of software artifact papers at ICSE?
-
RQ2 How have the artifacts from those papers been published?
-
RQ3 What is the impact of publishing software artifacts?
-
We provide a comparison of existing open-science initiatives in terms of the properties they define for publishing software artifacts.
-
We analyzed 789 papers published in the ICSE research tracks from 2007 to 2017. Based on this analysis, we address three questions: (RQ1) how many software artifacts are reported in the papers; (RQ2) whether and how these artifacts have been published; and (RQ3) whether publishing software artifacts changed the impact (in terms of citations) of the corresponding papers.
-
From our findings, we derive lessons learned that can help authors of software artifact papers (SAPs) avoid pitfalls and that substantiate and complement existing artifact publishing guidelines (cf. Section 5.2).
-
We provide a public replication package comprising our dataset and all analysis scripts on Zenodo.5
2 Artifact Publishing Guidelines
3 Methodology
3.1 Data Acquisition
3.2 Authorship-Based Community Clustering
3.3 Classification
-
Conceptual / Guideline – Papers that introduce, for example, new concepts, guidelines, and best practices, and reason about their effectiveness based on experience or logic.
-
Empirical Study – Papers that use empirical methods to improve the understanding of the status quo rather than to assess the effectiveness of a new technique. We considered papers of the latter kind to be technical contributions (for example, if a paper evaluates a new algorithm by applying it to existing benchmarks and comparing it to other state-of-the-art techniques).
-
Experience Report – Papers that report on experiences in a subjective, non-empirical way. For example, these can be reports about the practical application of an existing method, process, or tool.
-
Technical Contribution – Papers that focus on new methods or algorithms. If such a paper uses an empirical study to evaluate its contribution, we still labeled it as a technical contribution (cf. Empirical Study).
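The tie-breaking rule between empirical studies and technical contributions can be made explicit as a small decision procedure. The following Python sketch is illustrative only: the boolean attributes in `PaperTraits` are hypothetical coding variables, and only the precedence of technical contributions over empirical studies follows from the definitions above; the order of the remaining two checks is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class PaperType(Enum):
    CONCEPTUAL_GUIDELINE = "Conceptual / Guideline"
    EMPIRICAL_STUDY = "Empirical Study"
    EXPERIENCE_REPORT = "Experience Report"
    TECHNICAL_CONTRIBUTION = "Technical Contribution"


@dataclass
class PaperTraits:
    # Hypothetical coding attributes a rater might record for each paper.
    proposes_new_technique: bool  # introduces a new method or algorithm
    uses_empirical_methods: bool  # e.g., experiments, surveys, mining studies
    subjective_report: bool       # non-empirical account of experiences


def classify_paper(traits: PaperTraits) -> PaperType:
    # A new technique takes precedence, even if it is evaluated empirically
    # (e.g., by running it on existing benchmarks).
    if traits.proposes_new_technique:
        return PaperType.TECHNICAL_CONTRIBUTION
    # Empirical methods without a new technique: studying the status quo.
    if traits.uses_empirical_methods:
        return PaperType.EMPIRICAL_STUDY
    # Assumed ordering: subjective reports before the conceptual fallback.
    if traits.subjective_report:
        return PaperType.EXPERIENCE_REPORT
    # Remaining papers argue from experience or logic.
    return PaperType.CONCEPTUAL_GUIDELINE


# A new algorithm evaluated on benchmarks is still a technical contribution.
print(classify_paper(PaperTraits(True, True, False)).value)
```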
-
Non-linked artifact: The paper does not contain a link to its artifact.
-
Linked, but non-available artifact: The paper contains a link to its artifact, but we could not download or use it from there.
-
Available artifact: The paper contains a link and we could still download or use the artifact from the referenced archive.
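The three availability classes follow from two checks per paper: does the paper contain a link, and can the artifact still be obtained from it? A minimal Python sketch of this mapping, assuming the retrieval check is passed in as a callback (`can_retrieve` is a hypothetical stand-in for the manual download-and-use attempt):

```python
from enum import Enum
from typing import Callable, Optional


class ArtifactStatus(Enum):
    NON_LINKED = "Non-linked artifact"
    LINKED_NOT_AVAILABLE = "Linked, but non-available artifact"
    AVAILABLE = "Available artifact"


def artifact_status(
    link: Optional[str],
    can_retrieve: Callable[[str], bool],
) -> ArtifactStatus:
    """Map a paper's artifact link to one of the three availability classes."""
    if not link:
        # The paper does not link to its artifact at all.
        return ArtifactStatus.NON_LINKED
    if can_retrieve(link):
        # The artifact can still be downloaded or used from the link.
        return ArtifactStatus.AVAILABLE
    # A link exists, but it is dead or the artifact is unusable.
    return ArtifactStatus.LINKED_NOT_AVAILABLE


# Hypothetical example: a link whose target turned out to be unusable.
print(artifact_status("https://example.org/tool", lambda url: False).value)
```

An automated `can_retrieve` (for example, a plain HTTP request) would be weaker than the check described above, since a responding URL does not guarantee that the artifact can still be downloaded or used.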
-
Personal: The website is a personal one (e.g., of an author or exclusively for making the software artifact available) that is not hosted by an institution or an organization. Example: https://www.jenn-doe.com/project
-
Academic / Institutional: The website is hosted by a company, university, or a project (e.g., Eclipse Marketplace) and encompasses artifacts developed by this institution or within this project. Example: https://www.atlantis-university.edu/~jenndoe/project
-
Open-Source Repositories: The website is an open-source repository hosted by an organization, such as GitHub or Bitbucket, that provides a service for developers or organizations to make source code publicly available. Example: https://code-repo.com/jenndoe/project
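Because the hosting categories depend mainly on who operates the host, a first-pass label could be derived from a URL's host name before manual review. The sketch below is a heuristic under stated assumptions, not necessarily the procedure used in this study: the lookup tables `REPOSITORY_HOSTS` and `INSTITUTIONAL_DOMAINS` are hypothetical, and every unmatched host falls back to the personal category.

```python
from enum import Enum
from urllib.parse import urlparse


class HostingType(Enum):
    PERSONAL = "Personal"
    ACADEMIC_INSTITUTIONAL = "Academic / Institutional"
    OPEN_SOURCE_REPOSITORY = "Open-Source Repositories"


# Hypothetical lookup tables; a real study would curate these by hand.
# "code-repo.com" is the placeholder host used in the example above.
REPOSITORY_HOSTS = ("github.com", "bitbucket.org", "gitlab.com", "code-repo.com")
INSTITUTIONAL_DOMAINS = ("edu", "ac.uk", "eclipse.org")


def _matches(host: str, domain: str) -> bool:
    # True for the domain itself or any of its subdomains (e.g., www.).
    return host == domain or host.endswith("." + domain)


def hosting_type(url: str) -> HostingType:
    host = (urlparse(url).hostname or "").lower()
    if any(_matches(host, d) for d in REPOSITORY_HOSTS):
        return HostingType.OPEN_SOURCE_REPOSITORY
    if any(_matches(host, d) for d in INSTITUTIONAL_DOMAINS):
        return HostingType.ACADEMIC_INSTITUTIONAL
    # Fallback assumption: any other host is treated as a personal website.
    return HostingType.PERSONAL


for url in ("https://www.jenn-doe.com/project",
            "https://www.atlantis-university.edu/~jenndoe/project",
            "https://code-repo.com/jenndoe/project"):
    print(url, "->", hosting_type(url).value)
```

Checking repository hosts first matters: a repository site is classified as such even if a (hypothetical) institutional rule would also match it.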
4 Evaluation
4.1 Prevalence of Software Artifact Papers (RQ1)
4.2 How Software Artifacts are Published (RQ2)
Year | #Software Artifact Papers | #Linked Artifacts | #Available Artifacts |
---|---|---|---|
2007 | 35 | 3 | 2 |
2008 | 39 | 13 | 5 |
2009 | 39 | 19 | 5 |
2010 | 41 | 15 | 7 |
2011 | 48 | 25 | 10 |
2012 | 73 | 33 | 20 |
2013 | 66 | 29 | 14 |
2014 | 71 | 38 | 21 |
2015 | 68 | 37 | 23 |
2016 | 71 | 39 | 25 |
2017 | 53 | 38 | 31 |
Σ | 604 | 289 | 163 |
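The table's totals and the yearly shares of linked and available artifacts can be recomputed directly from its rows. The following Python sketch copies the rows verbatim and expresses the counts as fractions of all software artifact papers; since available artifacts are by definition a subset of linked ones, it also reports the share of linked artifacts that were still available.

```python
# Rows copied from the table above:
# (year, #software artifact papers, #linked artifacts, #available artifacts)
ROWS = [
    (2007, 35, 3, 2),
    (2008, 39, 13, 5),
    (2009, 39, 19, 5),
    (2010, 41, 15, 7),
    (2011, 48, 25, 10),
    (2012, 73, 33, 20),
    (2013, 66, 29, 14),
    (2014, 71, 38, 21),
    (2015, 68, 37, 23),
    (2016, 71, 39, 25),
    (2017, 53, 38, 31),
]


def yearly_shares(rows):
    """Map each year to (linked share, available share) of its SAPs."""
    return {
        year: (linked / papers, available / papers)
        for year, papers, linked, available in rows
    }


total_papers = sum(papers for _, papers, _, _ in ROWS)
total_linked = sum(linked for _, _, linked, _ in ROWS)
total_available = sum(available for _, _, _, available in ROWS)

for year, (linked, available) in yearly_shares(ROWS).items():
    print(f"{year}: linked {linked:.1%}, available {available:.1%}")
print(f"Total: {total_papers} papers, linked {total_linked / total_papers:.1%}, "
      f"available {total_available / total_papers:.1%}")
print(f"Linked artifacts still available: {total_available / total_linked:.1%}")
```

Over all years, this yields 47.8% of software artifact papers with a linked artifact and 27.0% with an available one; 56.4% of the linked artifacts were still available.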
4.3 Impact of Publishing Artifacts (RQ3)
Criterion met | Artifact linked (mean) | Artifact linked (median) | Artifact available (mean) | Artifact available (median) | Artifact named (mean) | Artifact named (median) |
---|---|---|---|---|---|---|
no | 0.915 | 0.601 | 0.935 | 0.667 | 0.926 | 0.619 |
yes | 1.028 | 0.770 | 1.061 | 0.735 | 0.989 | 0.694 |
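The relative differences between the "yes" and "no" groups in the citation comparison above follow from simple arithmetic on the reported means and medians. The Python sketch below only restates those numbers; it says nothing about statistical significance, which would require the underlying per-paper data.

```python
# (mean, median) values copied from the citation comparison table above.
RESULTS = {
    "linked": {"no": (0.915, 0.601), "yes": (1.028, 0.770)},
    "available": {"no": (0.935, 0.667), "yes": (1.061, 0.735)},
    "named": {"no": (0.926, 0.619), "yes": (0.989, 0.694)},
}


def relative_increase(no: float, yes: float) -> float:
    """Relative change of the 'yes' group with respect to the 'no' group."""
    return yes / no - 1.0


for criterion, groups in RESULTS.items():
    (mean_no, median_no), (mean_yes, median_yes) = groups["no"], groups["yes"]
    print(f"{criterion:>9}: mean {relative_increase(mean_no, mean_yes):+.1%}, "
          f"median {relative_increase(median_no, median_yes):+.1%}")
```

For the reported values, the "yes" group is higher on every measure, with relative differences ranging from +6.8% (mean, artifact named) to +28.1% (median, artifact linked).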
5 Discussion
5.1 Reasons Not to Publish Artifacts
-
Properly publishing an artifact requires considerable initial effort (e.g., documentation, packaging) and maintenance effort (e.g., updating information when an author's affiliation changes). This takes much time and is usually not rewarded (Méndez Fernández et al. 2019).
-
Additional effort arises when authors submit their software artifacts to venues with a double-blind review process. While double-blind reviews can reduce reviewing biases (Le Goues et al. 2018) and are preferred by a majority of authors (Prechelt et al. 2018), authors then also have to anonymize their artifacts and data (Méndez Fernández et al. 2019).26
-
Researchers may be unaware of, or uninterested in, the benefits of publishing artifacts (Haupt et al. 2018).
-
Selecting a suitable software license is simple when publishing artifacts that have been developed from scratch. However, this selection can become extremely complicated when several, potentially conflicting licenses of the artifact's components must be considered (Schreiber and Haupt 2017; Almeida et al. 2017; Méndez Fernández et al. 2019).
-
Similarly, some authors and research institutions may have copyright concerns, which prevent them from publishing their artifact as open source.
-
Researchers, especially in software engineering, may also be ashamed of their source code. While we generally know what constitutes "good" software, prototypes and proofs-of-concept seldom meet these standards. Researchers may, therefore, decide not to publish such "prototypical" software artifacts.
-
Software artifacts are occasionally published after paper acceptance, for instance, because they require additional consolidation and polishing to be understandable and useful to others. Under the pressure of approaching submission deadlines, researchers may decide to postpone such "cosmetic" tasks until acceptance. While badges provide an incentive to publish artifacts together with the paper, the decision whether and when to publish artifacts remains with the authors.
-
Researchers may understandably withhold an artifact for some time if it is part of a larger project that has not yet been published in its entirety.