
2021 | OriginalPaper | Chapter

Toward a FAIR Reproducible Research

Authors: Christophe Bontemps, Valérie Orozco

Published in: Advances in Contemporary Statistics and Econometrics

Publisher: Springer International Publishing

Abstract

Two major movements are actively at work to change the way research is done, shared, and reproduced. The first is the reproducible research (RR) approach, which has never been easier to implement given the current availability of tools and DIY manuals. The second is the FAIR (Findable, Accessible, Interoperable, and Reusable) approach, which aims to support the availability and sharing of research materials. We show here that despite the efforts made by researchers to improve the reproducibility of their research, the initial goals of RR remain mostly unmet. There is great demand, both within the scientific community and from the general public, for greater transparency and for trusted published results. As a scientific community, we need to reorganize the diffusion of all materials used in a study and to rethink the publication process. Researchers and journal reviewers should be able to easily use research materials for reproducibility, replicability, or reusability purposes or for exploration of new research paths. Here we present how the research process, from data collection to paper publication, could be reorganized and introduce some already available tools and initiatives. We show that even in cases in which data are confidential, journals and institutions can organize and promote “FAIR-like RR” solutions where not only the published paper but also all related materials can be used by any researcher.


Footnotes
1. We consider here that the research process starts once the data are collected and in the researcher's possession. We do not address the issue of reproducibility for data collection in experimental economics or field experiments (Bowers et al. 2017).
2. We will not discuss here the question of the precise meaning of "same results".
3. At the European level, one should mention OpenAIRE and, in France, the "Plan national pour la science ouverte" (https://www.ouvrirlascience.fr/).
4. See also Table 2 in Appendix 2 for a synthesis of the cases presented throughout the paper.
5. Other issues that we do not address directly here include the digital preservation of research data (Akers and Doty 2013) and the preservation of software (Di Cosmo and Zacchiroli 2017).
6. In these figures, for clarity, we do not illustrate the fact that researchers may share their materials themselves.
7. In 2003, H. Pesaran announced the creation of a new section of the Journal of Applied Econometrics dedicated to the replication of published empirical papers (Pesaran 2003). Since then, some journals have followed suit, leading to an increase in the number of replication papers in economics (Mueller-Langer et al. 2019). The site PubPeer (https://pubpeer.com/) also allows users to discuss and review scientific research.
8. Some useful resources facilitate the process (see https://social-science-data-editors.github.io/guidance/Verification_guidance.html). The Transparency and Openness Promotion (TOP) guidelines also propose varying levels of replication policy for journals (Nosek et al. 2015).
9. Jacoby et al. (2017) analyzed the AJPS verification policy and reported an average of 8 person-hours per manuscript to curate and replicate the analyses. The publication workflow, involving more rounds and resubmissions, is also much longer.
10. A complete list of solutions is detailed in the Registry of Research Data Repositories (http://re3data.org), a service of DataCite. In addition, CoreTrustSeal provides certification to repositories and lists the certified ones.
11. For datasets, the FAIR interoperability principle suggests the use of open formats such as CSV files instead of proprietary formats (.xls). For code, open-source software should be preferred to avoid exclusive access (Vilhuber 2019). The metadata should also follow standards (Dublin Core or DDI). References and links to related data should also be provided (Jones and Grootveld 2017).
12. The DataCite project (Brase 2009) is a popular resource for locating and precisely identifying data through a unique DOI.
13. There are many sources of confidential and nonshareable data (Christensen and Miguel 2018; Lagoze and Vilhuber 2017).
14. In France, the CASD (https://www.casd.eu/) is a single-access portal to many public data providers (INSEE, ministries, etc.). Researchers are not allowed to copy all the materials locally onto their machines, and only some types of output can be extracted.
15. The code may also contain some confidential elements. In particular, the code used for the initial data curation may contain, e.g., brand or city names and addresses.
16. Some data providers, in particular NSOs, already perform RR on their confidential data, controlling output files and code, to check for confidentiality restrictions (Lagoze and Vilhuber 2017).
17. Alter and Gonzalez (2018) suggested that, to "protect" researchers who want to use their data first (before sharing), journals can propose an "embargo".
18. A recent lawsuit involving the popular training program CrossFit arose from a paper by Smith et al. (2013) that erroneously reported an increased risk of injury among its users. Although the paper was later retracted, the impact on the researchers' careers was severe (for details, see https://retractionwatch.com/).
19. The European Research Council (ERC) recommends "to all its funded researchers that they follow best practice by retaining files of all the research data they have used during the course of their work and that they be prepared to share this data with other researchers".
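The open-format and metadata recommendations in footnote 11 can be sketched in a few lines. The following is an illustrative sketch only: the dataset, field names, and DOI are hypothetical, and the metadata record uses a minimal JSON rendering of Dublin Core elements rather than a full DDI document.

```python
import csv
import io
import json

# Hypothetical toy dataset that might otherwise live in a proprietary .xls file.
rows = [
    {"region": "Occitanie", "year": 2020, "price": 3.2},
    {"region": "Bretagne", "year": 2020, "price": 2.9},
]

# 1. Interoperable data: export to plain CSV instead of a proprietary format.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["region", "year", "price"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# 2. Standard metadata: a minimal Dublin Core-style sidecar record,
#    including an identifier and a link to related data.
metadata = {
    "dc:title": "Toy regional price dataset",
    "dc:creator": ["Doe, J."],
    "dc:date": "2020",
    "dc:format": "text/csv",
    "dc:identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "dc:relation": "URL of the related survey data would go here",
}
metadata_json = json.dumps(metadata, indent=2)

print(csv_text.splitlines()[0])  # -> region,year,price
```

In practice the CSV file and its metadata sidecar would be deposited together in a repository, which then mints the actual DOI.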
 
Literature
Akers, K. G., & Doty, J. (2013). Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation, 8(2), 5–26.
Alter, G., & Gonzalez, R. (2018). Responsible practices for data sharing. American Psychologist, 73(2), 146–156.
Baiocchi, G. (2007). Reproducible research in computational economics: Guidelines, integrated approaches, and open source software. Computational Economics, 30(1), 19–40.
Baker, M. (2016). Why scientists must share their research code. Nature News.
Benureau, F. C. Y., & Rougier, N. P. (2018). Re-run, repeat, reproduce, reuse, replicate: Transforming code into scientific contributions. Frontiers in Neuroinformatics, 11, 69.
Boker, S. M., Brick, T. R., Pritikin, J. N., Wang, Y., von Oertzen, T., Brown, D., et al. (2015). Maintained individual data distributed likelihood estimation (MIDDLE). Multivariate Behavioral Research, 50(6), 706–720.
Bowers, J., Higgins, N., Karlan, D., Tulman, S., & Zinman, J. (2017). Challenges to replication and iteration in field experiments: Evidence from two direct mail shots. American Economic Review, 107(5), 462–465.
Brase, J. (2009). DataCite - A global registration agency for research data. In 2009 4th International Conference on Cooperation and Promotion of Information Resources in Science and Technology (pp. 257–261).
Chang, A. C., & Li, P. (2017). A preanalysis plan to replicate sixty economics research papers that worked half of the time. American Economic Review, 107(5), 60–64.
Christensen, G., & Miguel, E. (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature, 56(3), 920–980.
Christensen, G., Freese, J., & Miguel, E. (2019). Transparent and reproducible social science research: How to do open science. Berkeley: University of California Press.
Christian, T.-M., Lafferty-Hess, S., Jacoby, W., & Carsey, T. (2018). Operationalizing the replication standard: A case study of the data curation and verification workflow for scholarly journals. International Journal of Digital Curation, 13(1), 114–124.
Claerbout, J. (1990). Active documents and reproducible results. SEP, 67, 139–144.
Crabtree, J. D. (2011). Odum Institute user study: Exploring the applicability of the Dataverse Network.
Crosas, M., King, G., Honaker, J., & Sweeney, L. (2015). Automating open science for big data. ANNALS of the American Academy of Political and Social Science, 659(1), 260–273.
de Leeuw, J. (2001). Reproducible research: The bottom line.
Dewald, W. G., Thursby, J. G., & Anderson, R. G. (1988). Replication in empirical economics: The Journal of Money, Credit and Banking project: Reply. American Economic Review, 78(5), 1162–1163.
Di Cosmo, R., & Zacchiroli, S. (2017). Software Heritage: Why and how to preserve software source code.
Dunn, C. S., & Austin, E. W. (1998). Protecting confidentiality in archival data resources. IASSIST Quarterly, 22(2), 16.
Duvendack, M., Palmer-Jones, R., & Reed, W. R. (2017). What is meant by "replication" and why does it encounter resistance in economics? American Economic Review, 107(5), 46–51.
Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., & Vadhan, S. (2009). On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (pp. 381–390).
Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., et al. (2017). A data citation roadmap for scholarly data repositories. bioRxiv.
Fuentes, M. (2016). Reproducible research in JASA. AMSTAT News: The Membership Magazine of the American Statistical Association, 17.
Gentleman, R., & Temple Lang, D. (2007). Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16(1), 1–23.
Gentzkow, M., & Shapiro, J. (2013). Nuts and bolts: Computing with large data. In Summer Institute 2013 Econometric Methods for High-Dimensional Data.
Van Gorp, P., & Mazanek, S. (2011). SHARE: A web portal for creating and sharing executable research papers. Procedia Computer Science, 4, 589–597.
Gouëzel, S., & Shchur, V. (2019). A corrected quantitative version of the Morse lemma. Journal of Functional Analysis, 277(4), 1258–1268.
Hurlin, C., Pérignon, C., & Stodden, V. (2014). RunMyCode.org: A novel dissemination and collaboration platform for executing published computational results. Open Science Framework.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Jacoby, W. G., Lafferty-Hess, S., & Christian, T.-M. (2017). Should journals be responsible for reproducibility?
Jones, S., & Grootveld, M. (2017). How FAIR are your data?
King, G. (2007). An introduction to the Dataverse Network as an infrastructure for data sharing. Sociological Methods & Research, 36(2), 173–199.
Knuth, D. E. (1984). Literate programming. The Computer Journal, 27, 97–111.
Knuth, D. E. (1992). Literate programming. Center for the Study of Language and Information.
Lagoze, C., & Vilhuber, L. (2017). O privacy, where art thou? Making confidential data part of reproducible research. CHANCE, 30(3), 68–72.
Leeper, T. J. (2014). Archiving reproducible research with R and Dataverse. The R Journal, 6(1).
LeVeque, R. J. (2009). Python tools for reproducible research on hyperbolic problems. Computing in Science and Engineering (CiSE), 19–27. Special issue on reproducible research.
McCullough, B. D. (2009). Open access economics journals and the market for reproducible economic research. Economic Analysis and Policy, 39(1), 117–126.
Miyakawa, T. (2020). No raw data, no science: Another possible source of the reproducibility crisis.
Mueller-Langer, F., Fecher, B., Harhoff, D., & Wagner, G. G. (2019). Replication studies in economics: How many and which papers are chosen for replication, and why? Research Policy, 48(1), 62–83.
Nature Editorial. (2013). Reducing our irreproducibility. Nature, 496, 398.
Nosek, B. A., et al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
Orozco, V., Bontemps, C., Maigne, E., Piguet, V., Hofstetter, A., Lacroix, A., et al. (2020). How to make a pie: Reproducible research for empirical economics & econometrics. Journal of Economic Surveys, 34(5), 1134–1169.
Pérignon, C., Gadouche, K., Hurlin, C., Silberman, R., & Debonnel, E. (2019). Certify reproducibility with confidential data. Science, 365(6449), 127–128.
Reinhart, C. M., & Rogoff, K. S. (2010). Growth in a time of debt. American Economic Review, 100(2), 573–578.
Rowhani-Farid, A., & Barnett, A. G. (2018). Badges for sharing data and code at Biostatistics: An observational study [version 2; peer review: 2 approved]. F1000Research, 7(90).
Sansone, S.-A., McQuilton, P., Rocca-Serra, P., Gonzalez-Beltran, A., Izzo, M., Lister, A. L., et al. (2019). FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology, 37(4), 358–367.
Science Staff. (2011). Challenges and opportunities. Science, 331(6018), 692–693.
Smith, M. M., Sommer, A. J., Starkoff, B. E., & Devor, S. T. (2013). CrossFit-based high-intensity power training improves maximal aerobic fitness and body composition. The Journal of Strength and Conditioning Research, 27(11), 3159–3172.
Sweeney, L., Crosas, M., & Bar-Sinai, M. (2015). Sharing sensitive data with confidence: The DataTags system. Technology Science.
Vilhuber, L. (2019). Report by the AEA data editor. AEA Papers and Proceedings, 109, 718–729.
Vlaeminck, S., & Herrmann, L.-K. (2015). Data policies and data archives: A new paradigm for academic publishing in economic sciences? In B. Schmidt & M. Dobreva (Eds.), New avenues for electronic publishing in the age of infinite collections and citizen science (pp. 145–155). Amsterdam: IOS Press.
Wilkinson, M., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.
Metadata
Title
Toward a FAIR Reproducible Research
Authors
Christophe Bontemps
Valérie Orozco
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-73249-3_30
