This chapter deals with fundamental theories on the accuracy of numerical computation and with several important cases, somewhat different in character from the previous chapters. We must remember that the output data of a computer contain numerical errors. In particular, do not overlook the points you need to know when parallelizing codes. The pursuit of calculation speed is, of course, the central theme of this book; however, it is premised on producing correct results. This chapter introduces numerical computation with guaranteed accuracy in large-scale numerical computations, convergence-accuracy problems in parallel computing, and high-precision calculation in HPC.
For details of the specification, such as the tie-breaking rule when two floating-point numbers are equally close or the behavior of an operation approaching overflow, please consult the standard [1].
In Fortran, when conforming to the Fortran 2003 standard, the intrinsic modules IEEE_ARITHMETIC and IEEE_FEATURES make it possible to change the rounding mode. For instance, the statement CALL IEEE_SET_ROUNDING_MODE(IEEE_NEAREST) selects round-to-nearest. The mode is changed to rounding up if IEEE_NEAREST is replaced with IEEE_UP, and to rounding down if it is replaced with IEEE_DOWN.
If the computation order is changed by compiler optimization, an operation not conforming to the IEEE 754 standard may be performed, unintentionally destroying the guaranteed accuracy of the numerical computation. Therefore, to inhibit such optimization, it is necessary either to declare the affected variables with the volatile attribute stipulated in the C and Fortran 2003 standards, or, if the compiler provides options for floating-point arithmetic (-fp-model, etc.), to set them so that the generated operations conform to the standard more strictly.
When __float128 is available in addition to the standard C/C++ types, it is only necessary to switch to __float128 with a typedef. C/C++ are problematic with respect to the size and interpretation of numeric types, which may differ between implementations and architectures; e.g., “long double” can be the 80-bit x87 extended format, IEEE 754 binary64 (double precision), or IEEE 754 binary128 (quadruple precision). Moreover, some compilers on IBM Power processors implement “long double” as double-double. Even on the same 64-bit architecture, data models such as LLP64, LP64, and ILP64 differ; if the same program is compiled on the same machine and the same OS by two compilers using different data models, the two binaries may give different results (a segmentation fault usually occurs for an unintended data model).
FMA performs \(a\times b+c\) in one clock cycle: it evaluates \(a\times b + c\) exactly and rounds the result only once to double precision. It is often used for inner product calculations and matrix–matrix multiplications. The reason such hardware is implemented in recent CPUs is that, since every instruction must be processed in one clock cycle, both an adder and a multiplier must exist in the arithmetic unit; the processor would stall if this were not the case. Implementing FMA on a CPU is a good way to utilize these two units maximally, as it keeps both the adder and the multiplier in the arithmetic unit busy.
SSE and AVX stand for Streaming SIMD Extensions and Intel Advanced Vector Extensions; they can perform operations on several double-precision numbers collectively with one instruction.
IEEE Standard for Floating-Point Arithmetic, Std 754–2008 (2008)
S. Oishi, Numerical Methods with Guaranteed Accuracy (Corona-sya, 2000, Japanese)
R.E. Moore, R.B. Kearfott, M.J. Cloud, Introduction to Interval Analysis (Society for Industrial and Applied Mathematics, Philadelphia, 2009)
T. Ogita, S.M. Rump, S. Oishi, Verified solution of linear systems without directed rounding, Technical Report 2005–04 (Waseda University, Tokyo, Japan, Advanced Research Institute for Science and Engineering, 2005)
N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd edn. (SIAM Publications, Philadelphia, 2002)
S. Koshizuka, Y. Oka, Moving-particle semi-implicit method for fragmentation of incompressible fluid. Nucl. Sci. Eng. 123, 421–434 (1996)
H. Togawa, Conjugate Gradient Method (Kyoiku Shuppan, 1977, in Japanese)
IEEE, IEEE standard for floating-point arithmetic, IEEE Std 754-2008, pp. 1–70 (2008)
D.H. Bailey, J.M. Borwein, High-precision arithmetic in mathematical physics. Mathematics 3, 337–367 (2015)
G. Beliakov, Y. Matiyasevich, A parallel algorithm for calculation of large determinants with high accuracy for GPUs and MPI clusters. arXiv:1308.1536v2
N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd edn. (Society for Industrial and Applied Mathematics, Philadelphia, 2002)
H. Hasegawa, Utilizing the quadruple-precision floating-point arithmetic operation for the Krylov subspace methods, in Proceedings of the 8th SIAM Conference on Applied Linear Algebra, vol. 25 (2012)
M. Nakata, B.J. Braams, K. Fujisawa, M. Fukuda, J.K. Percus, M. Yamashita, Z. Zhao, Variational calculation of second-order reduced density matrices by strong N-representability conditions and an accurate semidefinite programming solver. J. Chem. Phys. 128, 164113 (2008)
F. Bornemann, D. Laurie, S. Wagon, J. Waldvogel, The SIAM 100-Digit Challenge: A Study in High-Accuracy Numerical Computing (Society for Industrial and Applied Mathematics, SIAM, 2004)
D.E. Knuth, Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edn. (Addison-Wesley Professional, 1997)
Y. Hida, X.S. Li, D.H. Bailey, Library for double-double and quad-double arithmetic, Technical report (Lawrence Berkeley National Laboratory, 2008)
M. Nakata, Y. Takao, S. Noda, R. Himeno, A fast implementation of matrix-matrix product in double-double precision on NVIDIA C2050 and application to semidefinite programming, in Third International Conference on Networking and Computing (ICNC) (2012)
T. Granlund, the GMP Development Team, GNU MP 6.0 Multiple Precision Arithmetic Library (Samurai Media Limited, United Kingdom, 2015)
L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, P. Zimmermann, MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw. 33, 13 (2007)
A. Enge, M. Gastineau, P. Théveny, P. Zimmermann, mpc—a library for multiprecision complex arithmetic with exact rounding, INRIA, 1.0.3 edn., Feb 2015
M. Nakata, MPACK, RIKEN, 0.8.0 edn. (2012)
M. Nakata, MPACK 0.6.7: a high precision linear algebra library. Appl. Math. 2110 (2011, in Japanese)
T. Koya, BNCpack, 0.7 edn. (Shizuoka Institute of Science and Technology, 2011)
B.N. Parlett, The Symmetric Eigenvalue Problem (Classics in Applied Mathematics) (Society for Industrial Mathematics, 1987)
Techniques Concerning Computation Accuracy (Chapter 10, Springer Singapore)