Abstract
Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Since these new instruments have extraordinary precision, the data quality is also rapidly improving. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can handle huge datasets and detect very subtle effects at the same time: finding needles in the haystack, and finding very small haystacks that went undetected in previous measurements.
Index Terms
- Scientific data management in the coming decade