2011 | OriginalPaper | Book chapter

20. On the Processing of Extreme Scale Datasets in the Geosciences

Authors: Sangmi Lee Pallickara, Matthew Malensek, Shrideep Pallickara

Published in: Handbook of Data Intensive Computing

Publisher: Springer New York

Abstract

Observational measurements and model output data acquired or generated by the various research areas within the Geosciences (also known as Earth Science) span spatial scales of tens of thousands of kilometers and temporal scales from seconds to millions of years. Here, geosciences refers to the study of the atmosphere, hydrosphere, oceans, and biosphere, as well as the earth’s core. Rapid advances in sensor deployments, computational capacity, and data storage density have resulted in dramatic increases in the volume and complexity of geoscience data. Geoscientists now see the data-intensive computing approach as part of their knowledge discovery process, alongside the traditional theoretical, experimental, and computational archetypes [1]. Data-intensive computing poses unique challenges to the geoscience community, and these challenges are exacerbated by the sheer size of the datasets involved.

References
1.
T. Hey, et al., The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Washington: Microsoft Corporation, 2009.
2.
F. M. Hoffman, et al., “Multivariate Spatio-Temporal Clustering (MSTC) as a data mining tool for environmental applications,” in the iEMSs Fourth Biennial Meeting: International Congress on Environmental Modelling and Software Society (iEMSs 2008), 2008, pp. 1774–1781.
3.
F. M. Hoffman, et al., “Data Mining in Earth System,” in the International Conference on Computational Science (ICCS), 2011, pp. 1450–1455.
4.
O. J. Reichman, et al. (2011) Challenges and opportunities of open data in ecology. Science. 703–705.
5.
M. Keller, et al., “A continental strategy for the National Ecological Observatory Network,” Front. Ecol. Environ Special Issue on Continental-Scale Ecology, vol. 5, pp. 282–284, 2008.
6.
D. Schimel, et al., “NEON: A hierarchically designed national ecological network,” Front. Ecol. Environ, vol. 2, 2007.
8.
G. Percivall and C. Reed, “OGC Sensor Web Enablement Standards,” Sensors and Transducers Journal, vol. 71, pp. 698–706, 2006.
9.
MTPE EOS Reference Handbook. The EOS Project Science Office, Code 900, NASA Goddard Space Flight Center, 1995.
13.
M. M. Kuhn, et al., “Dynamic file system semantics to enable metadata optimizations in PVFS,” Concurrency and Computation: Practice and Experience, vol. 21, 2009.
14.
P. J. Braam, “Lustre: a scalable high-performance file system,” 2002.
15.
F. B. Schmuck and R. L. Haskin, “GPFS: A Shared-Disk File System for Large Computing Clusters,” in the Conference on File and Storage Technologies, 2002, pp. 231–244.
16.
J. Lofstead, et al., “Managing Variability in the IO Performance of Petascale Storage Systems,” presented at the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
17.
M. P. I. Forum, “MPI-2: Extensions to the Message-Passing Interface,” 1997.
18.
S. Ghemawat, et al., “The Google File System,” ACM SIGOPS Operating Systems Review, vol. 37, 2003.
21.
J. Li, et al., “Parallel netCDF: A high-performance scientific I/O interface,” in ACM Supercomputing (SC03), 2003.
22.
H. Abbasi, et al., “DataStager: scalable data staging services for petascale applications,” in ACM International Symposium on High Performance Distributed Computing, 2009.
23.
J. Craig Upson, et al., “The Application Visualization System: A computational environment for scientific visualization,” IEEE Computer Graphics and Applications, pp. 30–42, 1989.
25.
R. Daley, Atmospheric Data Analysis: Cambridge atmospheric and space science series, 1993.
26.
O. Wildi, Data Analysis in Vegetation Ecology: Wiley, 2010.
27.
P. Rigaux, et al., Spatial Databases with Application to GIS: Morgan Kaufmann, 2002.
28.
S. Shekhar and S. Chawla, Spatial Databases: A Tour: Prentice Hall, 2002.
29.
P. Longley, et al., Geographic Information Systems and Science, 3rd ed.: John Wiley & Sons, 2011.
30.
R. Rew and G. Davis, “NetCDF: an interface for scientific data access,” IEEE Computer Graphics and Applications, vol. 10, pp. 76–82, 1990.
32.
P. Cudre-Mauroux, et al., “A Demonstration of SciDB: A Science-Oriented DBMS,” in the 2009 VLDB Endowment, 2009.
33.
J. Buck, et al., “SciHadoop: Array-based Query Processing in Hadoop,” UCSC, 2011.
36.
D. C. Wells, et al., “FITS: A Flexible Image Transport System,” Astronomy & Astrophysics, vol. 44, pp. 363–370, 1981.
37.
P. Cornillon, et al., “OPeNDAP: Accessing data in a distributed, heterogeneous environment,” Data Science Journal, vol. 2, pp. 164–174, 2003.
38.
D. M. Karl, et al., “Building the long-term picture: U.S. JGOFS Time-series Programs,” Oceanography, pp. 6–17, 2001.
39.
P. Ramsey, “PostGIS Manual,” ed: Refractions Research.
40.
A. Guttman, “R-trees: a dynamic index structure for spatial searching,” in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data. Boston, Massachusetts: ACM, 1984, pp. 47–57.
41.
S. Tilak, et al., “The Ring Buffer Network Bus (RBNB) DataTurbine Streaming Data Middleware for Environmental Observing Systems,” in IEEE e-Science, 2007, pp. 125–133.
42.
D. N. Williams, et al., “The Earth System Grid: Enabling Access to Multi-Model Climate Simulation Data,” Bulletin of the American Meteorological Society, vol. 90, pp. 195–205, 2009.
43.
B. Domenico, et al., “Thematic Real-time Environmental Distributed Data Services (THREDDS): Incorporating Interactive Analysis Tools into NSDL,” Journal of Interactivity in Digital Libraries, vol. 2, 2002.
44.
A. Shoshani, et al., “Storage Resource Managers (SRM) in the Earth System Grid,” Earth System Grid, 2009.
45.
G. Khanna, et al., “A Dynamic Scheduling Approach for Coordinated Wide-Area Data Transfers using GridFTP,” in the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), 2008.
47.
P. G. Brown, “Overview of sciDB: large scale array storage, processing and analysis,” in Proceedings of the 2010 International Conference on Management of Data. Indianapolis, Indiana, USA: ACM, 2010, pp. 963–968.
48.
M. S. Mit, et al. (2009) Requirements for Science Data Bases and SciDB.
49.
J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, pp. 107–113, 2008.
50.
A. Akdogan, et al., “Voronoi-Based Geospatial Query Processing with MapReduce,” in 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), 2010, pp. 9–16.
51.
Y. Wang and S. Wang, “Research and implementation on spatial data storage and operation based on Hadoop platform,” in 2010 Second IITA International Conference on Geoscience and Remote Sensing (IITA-GRS), vol. 2, 2010, pp. 275–278.
54.
J. Wang, et al., “Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems,” in Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science. Portland, Oregon: ACM, 2009, pp. 12:1–12:8.
Metadata
Title
On the Processing of Extreme Scale Datasets in the Geosciences
Authors
Sangmi Lee Pallickara
Matthew Malensek
Shrideep Pallickara
Copyright year
2011
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-1415-5_20