ABSTRACT
National Science Foundation large facilities conduct large-scale physical and natural science research. They include telescopes that survey the entire sky, gravitational wave detectors that look deep into our universe’s past, sensor-driven field sites that collect a range of biological and environmental data, and more. The Cyberinfrastructure Center of Excellence (CICoE) pilot project aims to develop a model for a center that facilitates community building, fosters knowledge sharing, and applies best practices in consulting with large facilities about their cyberinfrastructure. To accomplish this goal, the pilot began an in-depth study of how large facilities manage their data during the course of their research. Large facilities are diverse and highly complex, from the types of data they capture, to the types of equipment they use, to the types of data processing and analysis they conduct, to their policies on data sharing and use. Because of this complexity, the pilot needed a single lens through which it could frame its growing understanding of large facilities and identify the areas where it could best serve them. The pilot’s research into large facilities surfaced common themes, which enabled the creation of a data lifecycle model that captures the data management practices of large facilities. This model has enabled the pilot to organize its thinking about large facilities and to frame its support and consultation efforts around the cyberinfrastructure used during each lifecycle stage. This paper describes the model and discusses how it was applied to disaster recovery planning for a representative large facility, IceCube.