Elsevier

Research Policy

Volume 43, Issue 9, November 2014, Pages 1621-1633
Open access to data: An ideal professed but not practised

https://doi.org/10.1016/j.respol.2014.04.008

Highlights

  • Data-sharing in economics is often professed but seldom practised.

  • We find that 80.74% of researchers do not voluntarily share their data.

  • We derive five testable hypotheses based on the literature on information-sharing.

  • We find four significant predictors of voluntary data-sharing.

  • These are tenure, author quality, the extent of mandatory data-disclosure and personal attitudes.

Abstract

Data-sharing is an essential tool for replication, validation and extension of empirical results. Using a hand-collected data set describing the data-sharing behaviour of 488 randomly selected empirical researchers, we provide evidence that most researchers in economics and management do not share their data voluntarily. We derive testable hypotheses based on the theoretical literature on information-sharing and relate data-sharing to observable characteristics of researchers. We find empirical support for the hypotheses that voluntary data-sharing significantly increases with (a) academic tenure, (b) the quality of researchers, (c) the share of published articles subject to a mandatory data-disclosure policy of journals, and (d) personal attitudes towards “open science” principles. On the basis of our empirical evidence, we discuss a set of policy recommendations.

Introduction

Sharing and granting access to data (disclosure) enables researchers to replicate, verify and expand existing works. In addition, it helps to deter and detect data-related fraud and creates incentives for compiling high-quality data and codes, thereby reducing errors. These arguments would seem to make voluntary data-sharing “a requirement for responsible scholarship” (Journal of Political Economy, 1975, p. 1296). In addition, most researchers seem to embrace the idea of self-correction and replicability in science – which is immediately associated with data-sharing (Nelson, 2009).

There is ample anecdotal evidence, however, that the ideal is not being followed in practice due to disincentives to share data (Moffitt, 2007). Voluntary data-sharing is also not part of the norms in many fields in which non-disclosure of data is to some extent an accepted practice (Tenopir et al., 2011). Furthermore, researchers have personal incentives to avoid sharing intermediate research results (Thursby et al., 2009), and the willingness to share depends on whether the information is shared privately in response to a request or publicly, e.g., in academic presentations or on the Internet (Haeussler et al., 2014). In addition, according to Anderson et al. (2008, p. 99), the “ ‘supply of’ replicable results in economics has been minimal”. As Hamermesh (2006, p. 715) claims, “economists treat replication … as an ideal to be professed but not to be practised”. However, there is very little systematic evidence on the status quo and the key drivers of data-sharing that goes beyond anecdotes or intuitive insights, neither of which is a particularly suitable foundation for editorial or public policies.

The aim of this paper is twofold. First, we provide systematic empirical evidence that data-sharing and facilitated access to data and codes in economics and management are scarce. Although the ease of sharing has increased recently and although many top-ranked journals in economics and management nowadays require data-sharing, researchers tend to publish neither data nor codes on their websites or in public data repositories. We constructed a unique data set describing the data-sharing behaviour of 488 randomly selected empirical researchers affiliated with the top 100 economics departments and top 50 business schools. In our sample, 82 empirical researchers (16.8%) sporadically share data, while only 12 (2.46%) share data in a comprehensive and clear way. The large majority, 394 (80.74%), neither share data nor provide any indication of whether or where the data is available.
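The shares quoted above follow directly from the three reported counts. The short Python sketch below recomputes them as a consistency check; the category labels are illustrative names chosen here, not the authors' variable names.

```python
# Counts of researchers per sharing category, as reported in the text.
# The dictionary keys are hypothetical labels used only for this sketch.
counts = {
    "no_sharing": 394,        # neither share data nor indicate availability
    "sporadic_sharing": 82,   # share data sporadically
    "full_sharing": 12,       # share data comprehensively and clearly
}

# The three categories should add up to the full sample.
total = sum(counts.values())

# Percentage share of each category, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
```

The three categories sum to the full sample of 488, and the rounded shares match the percentages given in the text.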

Second, based on the theoretical literature we derive testable hypotheses on data-sharing in economics and management. Our data allow us to relate voluntary data-sharing via researchers’ websites and public data repositories to observable characteristics of the respective researchers and their institutions. Our regression analysis shows that voluntary data-sharing is systematically associated with three factors: personal incentives, institutional factors and personal attitudes towards “open science”. Our empirical results confirm these theoretical predictions and show that the likelihood of voluntary data-sharing increases with (a) academic tenure, (b) the quality of researchers, (c) the share of published articles that are subject to a mandatory data-disclosure policy of journals, and (d) personal attitudes towards open science. Based on our results, we derive a set of implications for journal and university policy to address the current low sharing status and encourage data-sharing.

The debate on data-sharing has recently gained momentum due to the data fraud scandal concerning the social psychologist Diederik Stapel from Tilburg University. Considered a star scientist in the Netherlands and abroad, his studies on social behaviour attracted wide press coverage. In 2012, the Levelt Committee et al. (2012, p. 17) found that he had committed scientific fraud including “fabrication, falsification or unjustified replenishment of data” in 55 publications. They concluded that the data that underlie publications “must remain archived and be made available on request to other scientific practitioners” in order to increase detection and to discourage fraud (Levelt Committee et al., 2012, p. 58). This debate has been further pursued in Nature (2009, p. 145), whose editors claim that “research cannot flourish if data are not preserved and made accessible” and “data management should be woven into every course in science, as one of the foundations of knowledge”.

Besides fraud, the controversy over a coding error by two leading economists in their work on the negative relation between debt and growth, which has been used to justify austerity programmes around the world, has also advanced the debate on data-sharing and replication (Herndon et al., 2013, Reinhart and Rogoff, 2013).

This paper relates to two intertwined strands of literature which focus on the personal and institutional factors of data-sharing and replication in applied economics (Dewald et al., 1986, Feigenbaum and Levy, 1993, McCullough et al., 2006) and on the incentive structure for sharing of intermediate research results (Dasgupta and David, 1994, Haeussler, 2011, Haeussler et al., 2014).

The first strand of literature emphasizes that inadvertent errors do occur in empirical publications (Anderson et al., 2008, Dewald et al., 1986, McCullough et al., 2006). However, in applied economics, there is not a strong tradition of data-sharing or of replication studies for which the availability of data is a prerequisite (McCullough et al., 2006). As Anderson et al. (2008, p. 101) put it, “the incentive structure of publish-or-perish is just too strongly skewed toward irreproducibility”. Further, as the publication market for replication studies is limited, investing time and effort in writing a replication study is not an efficient use of a researcher's resources (Hamermesh, 1997, McCullough et al., 2006, Mirowski and Sklivas, 1991). Consequently, as replication of a researcher's results is highly unlikely to occur, it is also not rational to invest significant time and effort in ensuring the replicability of research results, e.g., by compiling data in a reproducible way (McCullough et al., 2006). Feigenbaum and Levy (1993) suggest that rational researchers are reluctant to share data in order to delay or even prevent attempts to replicate their results. Keeping research data secret may also be a rational strategy for the creators of the data to protect their competitive advantage as sharing of data would lower the competing researchers’ cost of recollecting the data and rewriting the code (McCullough, 2009). In particular, the original creator of research data may have an incentive to keep the data secret until their private value is fully exploited in subsequent publications (Anderson et al., 2008, Stephan, 1996). In addition, Anderson et al. (2008, p. 99) argue that the “ ‘respect for the scientific method’ is not sufficient to motivate … editors of professional journals to ensure the replicability of published results”.
While some of the top-ranked economics journals have recently introduced data availability policies to address this issue, the vast majority of journals either do not have a policy that requires authors to share their data or are reluctant to enforce it (McCullough, 2009, McCullough and Vinod, 2003). Also, Anderson et al. (2008) put forward that authors may generally be hesitant to share their data and code despite their pre-publication commitment to provide this information. This may suggest that editors, referees and readers are confident that the empirical results presented in the papers are always credible and robust. However, this is not always the case (Lacetera and Zirulia, 2011). In their seminal paper, Dewald et al. (1986) attempted to replicate 54 papers published in the Journal of Money, Credit and Banking and could replicate only two. Later, McCullough et al. (2006) tried to replicate 69 articles published in the same journal and could only replicate 14. McCullough et al. (2008) attempted to replicate 117 articles published in the Federal Reserve Bank of St. Louis Review and could only replicate nine. These findings raise serious concerns regarding the credibility and reliability of empirical work.
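To put the three replication attempts cited above on a common scale, the counts can be converted into success rates. The Python sketch below does only that arithmetic on the numbers quoted in the text; the rates themselves are derived here for illustration and are not figures reported by the cited studies.

```python
# (successful replications, attempted replications) as quoted in the text.
attempts = {
    "Dewald et al. (1986)": (2, 54),
    "McCullough et al. (2006)": (14, 69),
    "McCullough et al. (2008)": (9, 117),
}

# Replication success rate per study, in percent, rounded to two decimals.
rates = {
    study: round(100 * replicated / attempted, 2)
    for study, (replicated, attempted) in attempts.items()
}

for study, rate in rates.items():
    replicated, attempted = attempts[study]
    print(f"{study}: {replicated}/{attempted} replicated ({rate}%)")
```

On these counts, none of the three attempts replicated more than roughly a fifth of the articles examined.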

We contribute to this first strand of literature in three important aspects. First, we provide systematic empirical evidence on the status quo in economics and management with respect to voluntary data-sharing. Second, we find strong empirical support for the hypothesis that personal attitudes towards open science principles drive the choice of researchers to share data. Third, to the best of our knowledge, this is the first paper to empirically test (and find empirical support for) the hypothesis that mandatory data-disclosure induces voluntary data-sharing.

Finally, this paper relates to the above-mentioned second strand of literature on the incentive structure of sharing of intermediate research results. Haeussler et al. (2014) find that tenured academics in the life sciences are more likely to engage in information-sharing in one-on-one situations, e.g., cases in which one researcher requests specific material or data from another. However, they also find that tenure does not matter in the case of general sharing, e.g., presentations of intermediate research results at conferences or workshops. Our analysis differs from Haeussler et al. (2014) in two aspects. First, we find that tenure can also matter for data-sharing via author websites or public data repositories. Second, while Haeussler et al. (2014) adopt a survey response, we derive our results from observation of the websites of randomly selected researchers and public data repositories.

The remainder of the paper is organised as follows. In Section 2, we describe the conceptual framework and develop our research hypotheses. Section 3 presents the data and defines the variables under study. In Section 4, we run the regressions and report the results. Section 5 discusses resulting policy implications and Section 6 concludes our study.

Section snippets

Conceptual framework and research hypotheses

A researcher's decision to share data depends on the incentive structure of the priority-based reward system of science such as career incentives and scientific competition (Cohen and Walsh, 2008, Feigenbaum and Levy, 1993, Haeussler, 2011, Haeussler et al., 2014, Stephan, 1996). It also depends on institutional factors, e.g., data availability policies of journals (Anderson et al., 2008, McCullough, 2009) and on personal attitudes towards open science (Dasgupta and David, 1994, Mukherjee and

Data and definition of variables

In our data set we gather information on the extent of voluntary data-sharing and mandatory data-disclosure of 488 empirical researchers in economics, 388 of whom are affiliated with an economics department (100 with a business school). The researchers are chosen uniformly across the top 100 economics departments (four observations each) and top 50 business schools

Empirical results

This section presents and discusses the empirical evidence with respect to the two aims of this paper: first, to examine the status quo of data-sharing by empirical researchers in economics and management and, second, to analyse the extent to which voluntary data-sharing is driven by personal incentives, institutional factors and personal attitudes towards open science principles as discussed in the hypothesis section.

Policy implications

The public availability of research and government data has recently attracted widespread attention from policy-makers at the national and international level (European Commission, 2012, OECD, 2007, US House of Representatives, 2007). The European Commission (2012) suggests that open access to research data will improve the quality of research results and foster scientific progress and innovation. The OECD (2007, p. 13) stresses the importance of open availability of research data to “improve

Conclusion

The status quo in empirical research in economics and management is not to share data. Using a hand-collected data set consisting of 488 observations from randomly selected empirical researchers affiliated with the top 100 economics departments and top 50 business schools we show that most researchers, 394 (80.74%), do not share their data. In contrast, 82 empirical researchers (16.8%) sporadically share data, while only 12 (2.46%) share data regularly in a comprehensive and transparent way.

Acknowledgments

We thank Badi Baltagi, Daniel Hamermesh, Dietmar Harhoff, Fabian Herweg, Mark Schankerman, Marc Scheufen, Monika Schnitzer, Olaf Siegert, Ralf Toepfer, Sven Vlaeminck, Gert G. Wagner, Joachim Wagner, Joachim Winter and three anonymous referees for their helpful comments and suggestions. We also thank participants of the Third LERU Doctoral Summer School “Beyond Open Access: Open Education, Open Data and Open Knowledge”, Universitat de Barcelona, 8–13 July 2012, and seminar participants of the

References (77)

  • R. Agarwal et al.

    Industry or academia, basic or applied? Career choices and earnings trajectories of scientists

    Management Science

    (2013)
  • R. Allan

    Editorial: geoscience data

    Geoscience Data Journal

    (2012)
  • A.A. Alsheikh-Ali et al.

    Public availability of published research data in high-impact journals

    PLoS ONE

    (2011)
  • M. Altman et al.

    A proposed standard for the scholarly citation of quantitative data

    D-Lib Magazine

    (2007)
  • R.G. Anderson et al.

    The role of data/code archives in the future of economic research

    Journal of Economic Methodology

    (2008)
  • C.G. Begley et al.

    Drug development: raise standards for preclinical cancer research

    Nature

    (2012)
  • C.L. Borgman

    The conundrum of sharing research data

    Journal of the American Society for Information Science and Technology

    (2012)
  • D. Card et al.

    Nine facts about top journals in economics

    Journal of Economic Literature

    (2013)
  • W.G. Christie et al.

    Why do NASDAQ market makers avoid odd-eighth quotes?

    Journal of Finance

    (1994)
  • W.M. Cohen et al.

    Real impediments to academic biomedical research

  • D. Constant et al.

    What's mine is ours, or is it? A study of attitudes about information sharing

    Information Systems Research

    (1994)
  • M.J. Costello

    Motivating online publication of data

    BioScience

    (2009)
  • T. Coupé et al.

    Incentives, sorting and productivity along the career: evidence from a sample of top economists

    Journal of Law, Economics, and Organization

    (2006)
  • V.P. Crawford et al.

    Strategic information transmission

    Econometrica

    (1982)
  • G. Currie et al.

    The impact of institutional forces upon knowledge sharing in the UK NHS: the triumph of professional power and the inconsistency of policy

    Public Administration

    (2006)
  • P. Dasgupta et al.

    Toward a new economics of science

    Research Policy

    (1994)
  • W.G. Dewald et al.

    Replication in empirical economics: the Journal of Money, Credit and Banking project

    American Economic Review

    (1986)
  • P.J. DiMaggio et al.

    Introduction

  • Economic and Social Research Council

    ESRC Research Data Policy

    (2010)
  • European Commission

    Towards Better Access to Scientific Information: Boosting the Benefits of Public Investments in Research

    (2012)
  • S. Feigenbaum et al.

    The market for (ir)reproducible econometrics

    Social Epistemology

    (1993)
  • C. Ferraz et al.

    Exposing corrupt politicians: the effects of Brazil's publicly released audits on electoral outcomes

    Quarterly Journal of Economics

    (2008)
  • S.E. Fienberg et al.

    Sharing Research Data

    (1985)
  • J.L. Furman et al.

    Climbing atop the shoulders of giants: the impact of institutions on cumulative research

    American Economic Review

    (2011)
  • J.S. Gans et al.

    Is there a market for ideas?

    Industrial and Corporate Change

    (2010)
  • D.S. Hamermesh

    Viewpoint: replication in economics

    Canadian Journal of Economics

    (2006)
  • T. Herndon et al.

    Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff

  • Y. Kim et al.

    Institutional and individual influences on scientists’ data sharing practices

    Journal of Computational Science Education

    (2012)