Published in: The Journal of Supercomputing 10/2021

29-03-2021

HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers

Authors: Tommy Dang, Ngan Nguyen, Yong Chen


Abstract

This paper presents HiperView, a visual analytics framework for monitoring and characterizing the health status of high-performance computing systems in real time through a RESTful interface. The primary objectives of this visual analytics system are: (1) to provide a graphical interface for tracking real-time health statistics of a large number of data center hosts, (2) to help users visually analyze unusual behavior in series of events that may be temporally and spatially correlated, and (3) to assist in preliminary troubleshooting and maintenance with a visual layout that reflects the hosts' actual physical locations. Two use cases were analyzed in detail to assess the effectiveness of HiperView on a medium-scale, Redfish-enabled production high-performance computing system with a total of 10 racks and 467 hosts. The visualization has been shown to offer the necessary support for system automation and control. The framework's visual components and interfaces are designed with the potential to handle larger data centers with thousands of hosts and hundreds of health services per host.


Footnotes
1
A computer room air conditioning (CRAC) unit is a device that monitors and maintains the temperature, air distribution, and humidity in a network room or data center.
 
Metadata
Title
HiperView: real-time monitoring of dynamic behaviors of high-performance computing centers
Authors
Tommy Dang
Ngan Nguyen
Yong Chen
Publication date
29-03-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-03724-5
