Thermal neutrons: a possible threat for supercomputer reliability

Published in: The Journal of Supercomputing, Issue 2/2021

Publication date: 23-05-2020


Abstract

The high performance, high efficiency, and low cost of Commercial Off-The-Shelf (COTS) devices make them attractive for applications with strict reliability constraints. Today, COTS devices are adopted in high-performance computing (HPC) and in safety-critical applications such as autonomous driving. Unfortunately, the inexpensive natural boron widely used in the COTS chip manufacturing process makes these devices highly susceptible to thermal (low-energy) neutrons. In this paper, we demonstrate that thermal neutrons are a significant threat to COTS device reliability. For our study, we consider two DDR memories, an AMD APU, three NVIDIA GPUs, an Intel accelerator, and an FPGA, each executing a relevant set of algorithms. We consider different scenarios that affect the thermal neutron flux, such as weather, concrete walls and floors, and HPC liquid cooling systems. By correlating beam experiments with neutron detector data, we show that the thermal neutron FIT (Failures In Time) rate can be comparable to, or even higher than, the high-energy neutron FIT rate.
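The FIT comparison above rests on two standard radiation-testing quantities: a device cross section measured in the beam (observed errors divided by neutron fluence) and the neutron flux expected in the field. A FIT value is then the cross section times the field flux, scaled to 10^9 device-hours. The following is a minimal Python sketch assuming these JEDEC JESD89A-style definitions; the function names are hypothetical, and every error count, fluence, and flux in it is a placeholder, not a result from the paper.

```python
"""Hypothetical sketch: estimating and comparing neutron-induced FIT rates."""

FIT_DEVICE_HOURS = 1e9  # FIT = failures per 10^9 device-hours


def cross_section(errors: int, fluence_n_per_cm2: float) -> float:
    """Device cross section in cm^2: observed errors divided by beam fluence (n/cm^2)."""
    if fluence_n_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence_n_per_cm2


def fit_rate(sigma_cm2: float, flux_n_per_cm2_per_h: float) -> float:
    """FIT = cross section (cm^2) x field flux (n/cm^2/h) x 10^9 h."""
    return sigma_cm2 * flux_n_per_cm2_per_h * FIT_DEVICE_HOURS


def total_fit(fit_thermal: float, fit_high_energy: float) -> float:
    """Total neutron FIT, taken here as the sum of the two independent contributions."""
    return fit_thermal + fit_high_energy


if __name__ == "__main__":
    # All numbers below are HYPOTHETICAL placeholders, not data from the paper.
    sigma_th = cross_section(errors=50, fluence_n_per_cm2=1e10)  # thermal beam run
    sigma_he = cross_section(errors=20, fluence_n_per_cm2=1e10)  # high-energy beam run
    fit_th = fit_rate(sigma_th, flux_n_per_cm2_per_h=4.0)   # hypothetical thermal flux
    fit_he = fit_rate(sigma_he, flux_n_per_cm2_per_h=13.0)  # hypothetical high-energy flux
    print(f"thermal FIT = {fit_th:.1f}, high-energy FIT = {fit_he:.1f}")
    print(f"thermal share of total = {fit_th / total_fit(fit_th, fit_he):.0%}")
```

Because the thermal flux depends on the surroundings (weather, concrete, water-based cooling, as the abstract notes), the same thermal cross section can produce quite different FIT values at different sites; in this sketch only the flux argument would change.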

Metadata
Title: Thermal neutrons: a possible threat for supercomputer reliability
Publication date: 23-05-2020
Published in: The Journal of Supercomputing, Issue 2/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03324-9
