Published in: International Journal on Digital Libraries 2/2017

20 June 2016

Key components of data publishing: using current best practices to develop a reference model for data publishing

Written by: Claire C. Austin, Theodora Bloom, Sünje Dallmeier-Tiessen, Varsha K. Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, Martina Stockhause, Jonathan Tedds, Mary Vardigan, Angus Whyte

Abstract

The availability of workflows for data publishing could have an enormous impact on researchers, research practices and publishing paradigms, as well as on funding strategies and career and research evaluations. We present the generic components of such workflows to provide a reference model for these stakeholders. The RDA-WDS Data Publishing Workflows group set out to study the current data-publishing workflow landscape across disciplines and institutions. A diverse set of workflows was examined to identify common components and standard practices, including basic self-publishing services, institutional data repositories, long-term projects, curated data repositories, and joint data journal and repository arrangements. The results of this examination have been used to derive a data-publishing reference model comprising generic components. From an assessment of the current data-publishing landscape, we highlight important gaps and challenges to consider, especially when dealing with more complex workflows and their integration into wider community frameworks. It is clear that the data-publishing landscape is varied and dynamic and that there are important gaps and challenges. The different components of a data-publishing system need to work, to the greatest extent possible, in a seamless and integrated way to support the evolution of commonly understood and utilized standards and, eventually, increased reproducibility. We therefore advocate the implementation of existing standards for repositories and all parts of the data-publishing process, and the development of new standards where necessary. Effective and trustworthy data publishing should be embedded in documented workflows. As more research communities seek to publish the data associated with their research, they can build on one or more of the components identified in this reference model.

Footnotes
1
When we use the term ‘research data’ we mean data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All digital and non-digital outputs of a research project have the potential to become research data. Research data may be experimental, observational, operational, data from a third party, from the public sector, monitoring data, processed data, or repurposed data (Research Data Canada (2015), Glossary of terms and definitions, http://dictionary.casrai.org/Category:Research_Data_Domain).
 
2
A repository (also referred to as a data repository or digital data repository) is a searchable and queryable interfacing entity that is able to store, manage, maintain, and curate Data/Digital Objects. A repository is a managed location (destination, directory or ‘bucket’) where digital data objects are registered, permanently stored, made accessible and retrievable, and curated (Research Data Alliance, Data Foundations and Terminology Working Group. http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page). Repositories preserve, manage, and provide access to many types of digital material in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the digital material to be authentic, reliable, accessible, and usable on a continuing basis (Research Data Canada (2015), Glossary of terms and definitions, http://dictionary.casrai.org/Category:Research_Data_Domain). Similarly, ‘data services’ assist organizations in the capture, storage, curation, long-term preservation, discovery, access, retrieval, aggregation, analysis, and/or visualization of scientific data, as well as in the associated legal frameworks, to support disciplinary and multidisciplinary scientific research.
 
4
For example, the Antarctic Treaty Article III states that “scientific observations and results from Antarctica shall be exchanged and made freely available”. http://www.ats.aq/e/ats_science.html.
 
7
Version control (also known as ‘revision control’ or ‘versioning’) is the control, over time, of changes to data, computer code, software, and documents. It makes it possible to revert to a previous revision, which is critical for data traceability, tracking edits, and correcting errors. TeD-T: Term definition tool. Research Data Alliance, Data Foundations and Terminology Working Group. http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page.
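The revert capability in this definition can be illustrated with a minimal Python sketch. The class `VersionedRecord` and its method names are hypothetical, chosen only for illustration; they are not taken from the RDA term definition or from any workflow analysed in the paper. The sketch assumes that reverting is recorded as a new revision, so the edit history is never rewritten and stays traceable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedRecord:
    """Hypothetical record that keeps every committed revision of a text value."""

    revisions: List[str] = field(default_factory=list)

    def commit(self, content: str) -> int:
        """Store a new revision and return its 1-based revision number."""
        self.revisions.append(content)
        return len(self.revisions)

    def current(self) -> str:
        """Return the most recent revision."""
        if not self.revisions:
            raise ValueError("no revisions committed yet")
        return self.revisions[-1]

    def revert(self, revision: int) -> int:
        """Restore an earlier revision by committing it again as the newest one.

        Assumption: earlier entries are kept, so the erroneous edit
        remains visible in the history.
        """
        if not 1 <= revision <= len(self.revisions):
            raise ValueError(f"unknown revision {revision}")
        return self.commit(self.revisions[revision - 1])


if __name__ == "__main__":
    record = VersionedRecord()
    record.commit("temperature_c=21.4")
    record.commit("temperature_c=214")  # erroneous edit
    record.revert(1)                    # correct the error by restoring revision 1
    print(record.current())
    print(len(record.revisions))
```

Appending on revert rather than deleting later revisions is one possible design choice; it keeps the full sequence of edits available for the traceability and error correction the definition mentions.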
 
9
Research Data Canada (RDC) is an organizational member of Research Data Alliance (RDA) and from the beginning has worked very closely with RDA. See: “Guidelines for the deposit and preservation of research data in Canada”, http://www.rdc-drc.ca/wp-content/uploads/Guidelines-for-Deposit-of-Research-Data-in-Canada-2015.pdf and “Research Data Repository Requirements and Features Review”, http://hdl.handle.net/10864/10892.
 
11
“Recommendation for Space Data System Practices: Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-M-2.” http://public.ccsds.org/publications/archive/650x0m2.pdf. DataCite (2015). “DataCite Metadata Schema for the Publication and Citation of Research Data”. http://dx.doi.org/10.5438/0010.
 
13
Force11 (2015). Future Of Research Communications and e-Scholarship http://www.force11.org/group/data-citation-implementation-group.
 
14
Indirect linkage or restricted access, see e.g. Open Health Data Journal, http://openhealthdata.metajnl.com.
 
16
Quality assurance: The process or set of processes used to measure and assure the quality of a product. Quality control: The process of meeting products and services to consumer expectations (Research Data Canada, 2015, Glossary of terms and definitions, http://dictionary.casrai.org/Category:Research_Data_Domain).
 
18
Defined in e.g. [18].
 
19
Program for Climate Model Diagnosis and Intercomparison. (n.d.). Coupled Model Intercomparison Project (CMIP). Retrieved November 11, 2015, from http://www-pcmdi.llnl.gov/projects/cmip/.
 
20
Approved by the data journal.
 
21
Post-publication peer review is becoming more prevalent and may ultimately strengthen the Parsons–Fox continual release paradigm. See, for instance, F1000 Research and Earth System Science Data and the latter journal’s website: http://www.earth-system-science-data.net/peer_review/interactive_review_process.html.
 
22
An example of a discipline standard is the NetCDF/CF format and metadata standard used in Earth system sciences: http://cfconventions.org/.
 
23
Intergovernmental Panel on Climate Change Data Distribution Centre (IPCC-DDC): http://ipcc-data.org.
 
24
Data Seal of Approval (DSA); Network of Expertise in long-term Storage and Accessibility of Digital Resources in Germany (NESTOR) seal/German Institute for Standardization (DIN) standard 31644; Trustworthy Repositories Audit and Certification (TRAC) criteria / International Organization for Standardization (ISO) standard 16363; and the International Council for Science World Data System (ICSU-WDS) certification.
 
27
Among the analyzed workflows, it was generally understood that data citation which properly attributes datasets to originating researchers can be an incentive for deposit of data in a form that makes the data accessible and reusable, a key to changing the culture around scholarly credit for research data.
 
31
See e.g. Open Health Data journal http://openhealthdata.metajnl.com/.
 
32
Data Citation Synthesis Group, 2014. Accessed 17 November 2015: http://www.force11.org/group/joint-declaration-data-citation-principles-final.
 
33
See Sarah Callaghan’s blogpost: Cite what you use, 24 January 2014. Accessed 24 June 2015: http://citingbytes.blogspot.co.uk/2014/01/cite-what-you-use.html.
 
35
Funders have an interest in tracking Return on Investment to assess which researchers/projects/fields are effective and whether the proposed new projects consist of new or repeated work.
 
36
Accessed 17 November 2015: http://www.ddialliance.org.
 
37
Accessed 17 November 2015: http://schema.datacite.org.
 
41
See the hiberlink Project for information on this problem and work being done to solve it: http://hiberlink.org/dissemination.html.
 
44
RDA/WDS Publishing Data Costs IG addresses this topic: http://rd-alliance.org/groups/rdawds-publishing-data-ig.html.
 
46
For example, in genomics, there is the idea of numbered “releases” of, for example, a particular animal genome, so that while refinement is ongoing it is also possible to refer to a reference dataset.
 
47
For scientific communities with high volume data, the storage of every dataset version is often too expensive. Versioning and keeping a good provenance record of the datasets are crucial for citations of such data collections. Technical solutions are being developed, e.g. by the European Persistent Identifier Consortium (EPIC).
 
49
At the time of writing, CrossRef had recently announced the concept and approximate launch date for a ‘DOI Event Tracker’, which could also have considerable implications for the perceived value of data publishing as well as for the issues around the associated metrics (Reference: http://crosstech.crossref.org/2015/03/crossrefs-doi-event-tracker-pilot.html by Geoffrey Bilder, accessed 26 October 2015).
 
References
2.
Vines, T.H., Albert, A.Y.K., Andrew, R.L., DeBarre, F., Bock, D.G., Franklin, M.T., Gilbert, K.J., Moore, J.S., Renaut, S., Rennison, D.J.: The availability of research data declines rapidly with article age. Curr. Biol. 24(1), 94–97 (2014)
6.
Borgman, C.L.: Big data, little data, no data: scholarship in the networked world. MIT Press, Cambridge (2015)
8.
Peng, R.D.: Reproducible research in computational science. Science 334(6060), 1226–1227 (2011)
12.
Stodden, V., Bailey, D.H., Borwein, J., LeVeque, R.J., Rider, W., Stein, W.: Setting the default to reproducible. Reproducibility in computational and experimental mathematics. Institute for Computational and Experimental Research in Mathematics (2013). http://icerm.brown.edu/tw12-5-rcem/icerm_report.pdf. Workshop report accessed 10 November 2015
17.
18.
Lawrence, B., Jones, C., Matthews, B., Pepler, S., Callaghan, S.: Citation and peer review of data: moving toward formal data publication. Int. J. Digital Curation (2011). doi:10.2218/ijdc.v6i2.205
19.
Callaghan, S., Murphy, F., Tedds, J., Allan, R., Kunze, J., Lawrence, R., Mayernik, M.S., Whyte, A.: Processes and procedures for data publication: a case study in the geosciences. Int. J. Digital Curation 8(1) (2013). doi:10.2218/ijdc.v8i1.253
20.
Austin, C.C., Brown, S., Fong, N., Humphrey, C., Leahey, L., Webster, P.: Research data repositories: review of current features, gap analysis, and recommendations for minimum requirements. Presented at the IASSIST Annual Conference. IASSIST Quarterly Preprint. International Association for Social Science, Information Services, and Technology. Minneapolis (2015). http://drive.google.com/file/d/0B_SRWahCB9rpRF96RkhsUnh1a00/view. Accessed 13 November 2015
21.
Yin, R.: Case study research: design and methods, 5th edn. Sage Publications, Thousand Oaks (2003)
22.
Murphy, F., Bloom, T., Dallmeier-Tiessen, S., Austin, C.C., Whyte, A., Tedds, J., Nurnberger, A., Raymond, L., Stockhause, M., Vardigan, M.: WDS-RDA-F11 Publishing Data Workflows WG Synthesis FINAL CORRECTED. Zenodo. 2015 (2015). doi:10.5281/zenodo.33899. Accessed 17 November 2015
23.
Stockhause, M., Höck, H., Toussaint, F., Lautenschlager, M.: Quality assessment concept of the World Data Center for Climate and its application to the CMIP5 data. Geosci. Model Dev. 5(4), 1023–1032 (2012). doi:10.5194/gmd-5-1023-2012
24.
Starr, J., Castro, E., Crosas, M., Dumontier, M., Downs, R.R., Duerr, R., Haak, L.L., Haendel, M., Herman, I., Hodson, S., Hourclé, J., Kratz, J.E., Lin, J., Nielsen, L.H., Nurnberger, A., Proell, S., Rauber, A., Sacchi, S., Smith, A., Taylor, M., Clark, T.: Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Comput. Sci. 1(e1) (2015). doi:10.7717/peerj-cs.1
25.
Castro, E., Garnett, A.: Building a bridge between journal articles and research data: The PKP-Dataverse Integration Project. Int. J. Digital Curation 9(1), 176–184 (2014). doi:10.2218/ijdc.v9i1.311
27.
Meehl, G.A., Moss, R., Taylor, K.E., Eyring, V., Stouffer, R.J., Bony, S., Stevens, B.: Climate Model Intercomparisons: preparing for the next phase. Eos Trans. AGU 95(9), 77 (2014). doi:10.1002/2014EO090001
28.
Bandrowski, A., Brush, M., Grethe, J.S., Haendel, M.A., Kennedy, D.N., Hill, S., Hof, P.R., Martone, M.E., Pols, M., Tan, S., Washington, N., Zudilova-Seinstra, E., Vasilevsky, N.: The Resource Identification Initiative: a cultural shift in publishing [version 1; referees: 2 approved] F1000Research 4, 134 (2015). doi:10.12688/f1000research.6555.1
29.
Brase, J., Lautenschlager, M., Sens, I.: The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite. D-Lib Mag. 21(1/2) (2015). doi:10.1045/january2015-brase
30.
Cragin, M.H., Palmer, C.L., Carlson, J.R., Witt, M.: Data sharing, small science and institutional repositories. Philos. Trans. R. Soc. A 368(1926), 4023–4038 (2010)
31.
Metadata
Title
Key components of data publishing: using current best practices to develop a reference model for data publishing
Written by
Claire C. Austin
Theodora Bloom
Sünje Dallmeier-Tiessen
Varsha K. Khodiyar
Fiona Murphy
Amy Nurnberger
Lisa Raymond
Martina Stockhause
Jonathan Tedds
Mary Vardigan
Angus Whyte
Publication date
20 June 2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2/2017
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-016-0178-2
