Introduction
How big is Big Data?
Exemplary Big Data archives
- whether the infrastructure contains viable data and provides flexible methods for data description and for relating various metadata characteristics (e.g., provenance);
- whether the database is well organized and algorithmically agile, and whether the user access interface is easy to navigate;
- whether the data are derived versions of raw data or the raw data themselves, with the attendant human-subjects privacy issues of "extended" consent for Big Data cohort compilation;
- whether the duties and responsibilities of stakeholders, both individuals and institutions, are clearly and precisely specified;
- whether clear curation systems governing quality control, data validation, authentication, and authorization are in place;
- whether secure data transactions are efficient and support subsequent data derivation (generation of derived data);
- whether there are pathways and penalties to ensure that requesting investigators give proper attribution to the original and subsequent collectors of the data; and
- whether and how the database addresses sociologic and bureaucratic issues germane to data sharing, under both open and restricted or tiered access.