An introduction to Docker for reproducible research

Published: 20 January 2015

Abstract

As computational work becomes increasingly integral to many aspects of scientific research, computational reproducibility has become an issue of growing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straightforward than replicating physical experiments, the complex and rapidly changing nature of computing environments makes reproducing and extending such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research, such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a 'DevOps' philosophy, to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.
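The abstract's core idea, packaging an analysis together with its full computational environment, can be sketched as a short Dockerfile. This is a hedged illustration rather than an example taken from the paper: the `rocker/r-ver` versioned R base image, the `analysis.R` script name, and the pinned package are assumptions for the sake of the sketch.

```dockerfile
# Start from a versioned R base image so the R version is fixed
# (rocker/r-ver is an assumed publicly available image tag)
FROM rocker/r-ver:3.1.2

# Install a pinned set of R packages the analysis depends on
RUN R -e "install.packages('knitr', repos = 'https://cran.r-project.org')"

# Copy the analysis script into the image and run it by default
COPY analysis.R /home/analysis.R
CMD ["Rscript", "/home/analysis.R"]
```

Building this image (`docker build -t myanalysis .`) and running it (`docker run myanalysis`) re-executes the analysis in the same environment on any machine with Docker, which is the portability and versioning story the abstract describes.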



  • Published in

    ACM SIGOPS Operating Systems Review, Volume 49, Issue 1
    Special Issue on Repeatability and Sharing of Experimental Artifacts
    January 2015, 155 pages
    ISSN: 0163-5980
    DOI: 10.1145/2723872

    Copyright © 2015 Author

    Publisher: Association for Computing Machinery, New York, NY, United States
