Poor code quality is endemic, and not just in scientific computation. It is always tempting to build something 'quick and dirty', under the assumption that it can be cleaned up later. This is especially true at the cutting edge of a field — why invest time writing beautifully engineered code from the outset, if you're not sure that what you're trying to do is even possible?

In software engineering, this is known as technical debt: by deferring issues such as code readability and maintainability, a debt is created that someone in the future might have to pay, in the extra effort needed to re-run or modify the code [1]. The point of the metaphor is not that debt is bad per se. After all, we frequently incur debt to obtain something of immediate value, for example, using a mortgage to buy a house. The point is that such debts have to be managed carefully, to prevent them spiralling out of control.

Open source policies in scholarly journals can help here. If journals ask for open code, they create a strong incentive for authors to clean up the code each time a paper is produced, rather than deferring such tasks indefinitely. As a second-order effect, such policies should encourage more scientists to take the opportunity to improve their software-building skills, through courses such as Software Carpentry (http://software-carpentry.org/).

I argue that open source policies are unlikely to usher in an era of much greater sharing and reproducibility, because there are many barriers beyond the basic requirement of being able to read the code (see Box 1). Instead, such policies have an important role to play in improving the quality of scientific software by nudging scientists to manage their technical debt more carefully.

Repeat or reproduce?

In principle, by making models and data freely available, other scientists can perform their own analysis on the data, and can re-run the code to verify results [2]. Ideally, the end result is greater collaboration, wider acceptance of results and increased trust in the scientific endeavour. In practice, none of this comes easily.

First, there is some disagreement on what it means for research to be reproducible [3]. For example, repeatability and reproducibility are often conflated in the context of scientific computing. Repeatability means the ability to re-run the same code at a later time or on a different machine. Reproducibility means the ability to recreate the results, whether by re-running the same code, or by writing a new program (see Fig. 1).

Figure 1 | Repeatability versus reproducibility.

It is possible to have each without the other. Repeatability without reproducibility — getting different results when re-running the code — can be a result of fragile code, combined with small changes in the hardware platform, the compiler or one of the ancillary tools. It is especially common in numerical simulations of chaotic phenomena, such as weather, where any change in how the code is compiled and optimized may lead to tiny rounding differences that rapidly multiply as the simulation proceeds. In meteorology and climate science, modellers handle this problem by using ensembles of runs to produce probabilistic results. Exact repeatability is extremely hard to maintain across platforms (see Box 1).
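
To see how quickly such rounding differences can overwhelm a chaotic calculation, consider a minimal sketch (illustrative only, not taken from the article): the logistic map stands in for a far more complex weather model, and a perturbation of about one unit in the last place plays the role of a compiler- or platform-induced rounding difference.

```python
# Illustrative sketch (hypothetical): a perturbation the size of a
# floating-point rounding error grows rapidly in a chaotic iteration.
# The logistic map stands in for a full numerical weather simulation.

def logistic(x, r=4.0):
    """One step of the logistic map, which is chaotic at r = 4.0."""
    return r * x * (1.0 - x)

x_ref = 0.2            # reference run
x_pert = 0.2 + 1e-15   # same initial state, perturbed by roughly 1 ulp

for step in range(1, 61):
    x_ref, x_pert = logistic(x_ref), logistic(x_pert)
    if step % 10 == 0:
        print(f"step {step:2d}: |difference| = {abs(x_ref - x_pert):.3e}")

# Within a few dozen steps the difference is of order one: the two "runs"
# no longer agree, even though the code and inputs were almost bit-for-bit
# identical.
```

In this toy example the divergence is harmless, but it mirrors why single deterministic runs of weather and climate models are so hard to repeat exactly across platforms, and why modellers turn to ensembles of runs for probabilistic results.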

Reproducibility without repeatability — the confirmation of results using different code — is the computational equivalent of a replicated experiment, the bread-and-butter of doing science. Independently reproducing computational results is a creative process that can lead to the discovery of new approaches, and generates a stronger body of scientific evidence.

At the intersection, repeating a run to get the same results is rarely interesting. It may be an important testing strategy, for example when porting code, but it doesn't yield new scientific insights. Sometimes it can be useful for different labs to re-run each other's code to make detailed comparisons of their approaches, but in practice this rarely happens. In the climate sciences, for example, such inter-comparisons are achieved more easily without the need for code sharing, not least because of the technical difficulties that come with running complex code on another machine. Instead, each lab runs its own models on standard experiments and shares the model outputs via a community databank.

As a result, at least for more complex scientific software, it is not obvious that making the code associated with a specific scientific publication available will lead to significant advances in reproducibility or to significant new insights. A sharing strategy built around modular tools might be more useful than one based on the idea of repeating computations used in published papers.

The myth of many eyes

Journal policies to ensure code availability for published papers will not, on their own, create successful open source collaborations. In practice, making code truly open source, in the sense that it can be run usefully by a wide variety of people and on a wide variety of platforms, demands a commitment that few scientists are able to make. When building scientific software, researchers usually face a choice: build something that solves a specific problem, or build a more generalized tool and invest the effort needed to grow a user community around it. These two goals tend to be mutually exclusive, and the latter approach is only worthwhile if there is clear demand for the tool. Code created to produce an individual publication rarely qualifies.

One of the hidden secrets of open source is that the vast majority of software released as open source fails to attract any kind of community at all. For example, detailed analyses of the projects on Sourceforge (http://sourceforge.net) demonstrate a clear power-law pattern of participation: a very small number of projects end up with a large number of participants, whereas the vast majority end up with one participant or even none [4]. These results suggest that for most scientists who release their code with a journal publication, there will probably be no uptake whatsoever. However, a small number of projects will attract a lot of attention, perhaps those with controversial results or exciting breakthroughs. Releasing the code is therefore a hopeful act: it allows for serendipitous discovery, even if we can't predict whether anyone will be interested.

On a different note, in the polarized context of climate research, making code available for public scrutiny holds the potential to improve trust. Certainly, if the code is not available, accusations that the science cannot be trusted are easy to make. But in reality, releasing the code makes little difference, as all but the simplest codes are impenetrable to non-experts. Unfortunately, trust via open source could also come at a price: climate scientists are often subject to politically motivated attacks, and opening up access to source code brings the potential for 'denial of service' attacks on scientific labs. A science institution usually does not have the support staff to help resolve queries if attempts to re-run the code fail. This puts scientists in a difficult position: if they do not respond, they can be exposed to pressure from the media; if they do, they could end up spending all their time resolving minor weaknesses in the code. Making code available can therefore only work on the understanding that it carries no obligation to support others in repeating the computations.

Realistic expectations

A growing number of scientific journals, including the Nature family, now encourage authors to share their software, and require them to include a statement in each paper about the availability of the code. Such policies are an important step forward for computational science, but the expectations around them should be realistic. Significant improvements in the sharing of software tools, and in making computationally based research reproducible, require much more than merely making the code available. Nevertheless, even the short-term benefits of such policies are not negligible. Asking authors to open up access to their code is likely to lead to a rapid improvement in code quality: the mere possibility that someone could read the code is a strong incentive to make it presentable (even if nobody does read it in the end).

Journal efforts to move research communities towards a norm where all code is freely available are only a first step. Building on such a culture of openness, an environment may eventually develop where small data sets and new software tools can be more readily discovered, and where reproducibility is achieved more easily.