Published in: International Journal of Parallel Programming 6/2015

01-12-2015

Extending Summation Precision for Network Reduction Operations

Authors: George Michelogiannakis, Xiaoye S. Li, David H. Bailey, John Shalf

Abstract

Double-precision summation is at the core of numerous important algorithms, such as Newton–Krylov methods, and of other operations involving inner products, such as matrix multiplication and dot products. However, the effectiveness of summation is limited by the accumulation of rounding errors due to compressed representations. This is an increasing problem as modern HPC systems and data sets scale, because summations can easily involve millions or billions of operands. To reduce the impact of precision loss, researchers have proposed increased- and arbitrary-precision libraries that provide reproducible error or even bounded error accumulation for large sums. However, such libraries increase computation and communication time significantly, and do not always guarantee an exact result. In this article, we propose fixed-point representations of double-precision variables that enable arbitrarily large summations without error and provide exact and reproducible results. We call this format big integer (BigInt). Even though such formats have been studied for local processor computations, we make the case that using a fixed-point representation for distributed computation over a system-wide network is feasible, with performance comparable to that of double-precision floating-point summation. This is made possible by adding simple and inexpensive logic to modern NICs, or by using the programmable logic already found in many modern NICs, to accelerate performance on large-scale systems and avoid waking up processors.
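The core idea of the abstract, that summing fixed-point images of doubles is exact and independent of operand order, can be illustrated with a minimal host-side sketch. This is not the authors' BigInt format or their NIC logic: the choice of 1074 fractional bits, the use of Python's unbounded integers in place of a fixed-width accumulator, and the names `to_fixed`, `from_fixed`, `exact_sum`, and `reduce_chunks` are all assumptions made for illustration.

```python
import math

# Assumption: 1074 fractional bits, because every finite IEEE 754 double is an
# integer multiple of 2**-1074 (the smallest subnormal). Python's unbounded
# ints stand in for a wide fixed-width accumulator.
FRAC_BITS = 1074


def to_fixed(x: float) -> int:
    """Return the exact integer x * 2**FRAC_BITS for a finite double x."""
    if not math.isfinite(x):
        raise ValueError("only finite doubles have a fixed-point image")
    n, d = x.as_integer_ratio()  # x == n / d exactly; d is a power of two
    return n << (FRAC_BITS - (d.bit_length() - 1))


def from_fixed(acc: int) -> float:
    """Round the exact accumulator to the nearest double (a single rounding)."""
    # CPython rounds int / int correctly; it raises OverflowError if the
    # result does not fit in a double.
    return acc / (1 << FRAC_BITS)


def exact_sum(values) -> float:
    """Sum doubles without intermediate rounding error."""
    acc = 0
    for v in values:
        acc += to_fixed(v)  # integer addition: exact and associative
    return from_fixed(acc)


def reduce_chunks(chunks) -> float:
    """Hypothetical stand-in for a network reduction: each chunk plays the
    role of one node's local data, and partial sums are combined as integers."""
    partials = [sum(to_fixed(v) for v in chunk) for chunk in chunks]
    return from_fixed(sum(partials))


if __name__ == "__main__":
    data = [1e100, 1.0, -1e100]
    print("floating-point sum:", sum(data))
    print("fixed-point sum:   ", exact_sum(data))
    print("chunked reduction: ", reduce_chunks([[1e100, 1.0], [-1e100]]))
```

Because integer addition is associative, any grouping or ordering of the partial sums yields the same accumulator, which is what makes the result reproducible across different reduction trees. The only rounding happens once, when the final accumulator is converted back to a double.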

Metadata
Title
Extending Summation Precision for Network Reduction Operations
Authors
George Michelogiannakis
Xiaoye S. Li
David H. Bailey
John Shalf
Publication date
01-12-2015
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 6/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-014-0326-5
