article

Scientific data management in the coming decade

Authors:
Jim Gray, Microsoft
David T. Liu, Berkeley
Maria Nieto-Santisteban, Johns Hopkins University
Alex Szalay, Johns Hopkins University
David J. DeWitt, Wisconsin
Gerd Heber, Cornell

ACM SIGMOD Record, Volume 34, Issue 4, December 2005, pp. 34–41. https://doi.org/10.1145/1107499.1107503

Published: 1 December 2005

Abstract

Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Because these new instruments have extraordinary precision, data quality is also improving rapidly. Finding the subtle effects missed by previous studies requires algorithms that can handle huge datasets and still detect very faint signals: finding needles in the haystack, and finding very small haystacks that went undetected in previous measurements.


Index Terms

Scientific data management in the coming decade
1. Information systems


Published in

ACM SIGMOD Record, Volume 34, Issue 4
December 2005
86 pages
ISSN: 0163-5808
DOI: 10.1145/1107499

Copyright © 2005 Authors

Publisher
Association for Computing Machinery
New York, NY, United States

Article Metrics
- Total citations: 335
- Total downloads: 5,793
- Downloads (last 12 months): 262
- Downloads (last 6 weeks): 32
