ABSTRACT
Numerical reproducibility and stability of large-scale scientific simulations, especially climate modeling, on distributed-memory parallel computers are becoming critical issues. In particular, global summation of distributed arrays is most susceptible to rounding errors, whose propagation and accumulation cause uncertainty in the final simulation results. We analyzed several accurate summation methods and found two to be particularly effective in improving (ensuring) reproducibility and stability: Kahan's self-compensated summation and Bailey's double-double precision summation. We provide an MPI operator, MPI_SUMDD, that works with MPI collective operations to ensure a scalable implementation on a large number of processors. The final methods are particularly simple to adopt in practical codes.
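The two summation methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`naive_sum`, `kahan_sum`, `dd_add`, `dd_local_sum`, `dd_global_sum`) are hypothetical, a double-double value is assumed to be stored as a `(hi, lo)` pair of ordinary doubles, and the "processors" are simulated with plain Python lists rather than MPI. In a real MPI code, a function playing the role of `dd_add` would be registered as a user-defined reduction operator, as the abstract describes.

```python
import itertools
import math
from functools import reduce


def naive_sum(values):
    """Plain left-to-right summation; the result depends on the order."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan's compensated summation: carry the rounding error forward."""
    total = 0.0
    comp = 0.0  # low-order bits lost by the previous addition
    for x in values:
        y = x - comp            # apply the stored correction to the new term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (with opposite sign)
        total = t
    return total


def dd_add(a, b):
    """Add two double-double numbers, each given as a (hi, lo) pair."""
    a_hi, a_lo = a
    b_hi, b_lo = b
    s = a_hi + b_hi
    e = s - a_hi
    # Exact rounding error of a_hi + b_hi, plus both low-order parts.
    err = ((b_hi - e) + (a_hi - (s - e))) + a_lo + b_lo
    hi = s + err
    lo = err - (hi - s)  # renormalize so that hi carries the leading bits
    return hi, lo


def dd_local_sum(values):
    """Sum ordinary doubles into one double-double partial result."""
    acc = (0.0, 0.0)
    for x in values:
        acc = dd_add(acc, (x, 0.0))
    return acc


def dd_global_sum(chunks):
    """Simulated global sum: one partial per chunk, reduced with dd_add."""
    partials = [dd_local_sum(chunk) for chunk in chunks]
    return reduce(dd_add, partials, (0.0, 0.0))[0]


if __name__ == "__main__":
    # Many tiny terms that naive summation drops entirely.
    small = [1.0] + [1e-16] * 100_000
    print(naive_sum(small), kahan_sum(small), math.fsum(small))

    # The same four numbers in every order, split across two simulated ranks.
    data = [1e16, 1.0, -1e16, 1.0]
    orders = list(itertools.permutations(data))
    print(sorted({naive_sum(p) for p in orders}))
    print({dd_global_sum([p[:2], p[2:]]) for p in orders})
```

The second demonstration is the one that relates to reproducibility: changing the summation order or the way the data is split across chunks changes the naive result, while the double-double reduction returns the same value for this data set. The example is deliberately tiny and says nothing about the cost or accuracy of either method at scale.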
Index Terms
- Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications