Parallel Computing

Volume 30, Issue 7, July 2004, Pages 817-840

The ganglia distributed monitoring system: design, implementation, and experience

https://doi.org/10.1016/j.parco.2004.04.001

Abstract

Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real-world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years.

Introduction

Over the last ten years, there has been an enormous shift in high performance computing from systems composed of small numbers of computationally massive devices [11], [12], [18], [19] to systems composed of large numbers of commodity components [3], [4], [6], [7], [9]. This architectural shift from the few to the many is causing designers of high performance systems to revisit numerous design issues and assumptions pertaining to scale, reliability, heterogeneity, manageability, and system evolution over time. With clusters now the de facto building block for high performance systems, scale and reliability have become key issues as many independently failing and unreliable components need to be continuously accounted for and managed over time. Heterogeneity, previously a non-issue when running a single vector supercomputer or an MPP, must now be designed for from the beginning, since systems that grow over time are unlikely to scale with the same hardware and software base. Manageability also becomes of paramount importance, since clusters today commonly consist of hundreds or even thousands of nodes [6], [7]. Finally, as systems evolve to accommodate growth, system configurations inevitably need to adapt. In summary, high performance systems today have sharply diverged from the monolithic machines of the past and now face the same set of challenges as large-scale distributed systems.

One of the key challenges faced by high performance distributed systems is scalable monitoring of system state. Given a large enough collection of nodes and the associated computational, I/O, and network demands placed on them by applications, failures in large-scale systems become commonplace. To deal with node attrition and to maintain the health of the system, monitoring software must be able to quickly identify failures so that they can be repaired either automatically or via out-of-band means (e.g. rebooting). In large-scale systems, interactions amongst the myriad computational nodes, network switches and links, and storage devices can be complex. A monitoring system that captures some subset of these interactions and visualizes them in interesting ways can often lead to an increased understanding of a system's macroscopic behavior. Finally, as systems scale up and become increasingly distributed, bottlenecks are likely to arise in various locations in the system. A good monitoring system can assist here as well by providing a global view of the system, which can be helpful in identifying performance problems and, ultimately, assisting in capacity planning.

Ganglia is a scalable distributed monitoring system that was built to address these challenges. It provides scalable monitoring of distributed systems at various points in the architectural design space, including large-scale clusters in a machine room and computational Grids [14], [15] consisting of federations of clusters; most recently, it has even seen application on PlanetLab [21], an open, shared planetary-scale application testbed. The system is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol [1], [10], [16], [29] to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world.
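To make the transport choice concrete, the sketch below packs and unpacks a metric sample in XDR using Python's standard xdrlib module (deprecated in Python 3.11 and removed in 3.13). The three-field layout shown is an illustrative assumption for this sketch, not gmond's actual on-the-wire packet format.

```python
import xdrlib  # standard library through Python 3.12 (removed in 3.13)

# Pack a hypothetical metric sample. The field layout is illustrative
# only; it is not gmond's actual packet format.
packer = xdrlib.Packer()
packer.pack_string(b"cpu_user")  # metric name
packer.pack_string(b"%.1f")      # printf-style format hint
packer.pack_float(12.5)          # metric value
wire = packer.get_buffer()       # compact, big-endian, 4-byte-aligned bytes

# The receiver unpacks the fields in the same order they were packed.
unpacker = xdrlib.Unpacker(wire)
name = unpacker.unpack_string().decode()
fmt = unpacker.unpack_string().decode()
value = unpacker.unpack_float()
print(name, fmt, value)  # cpu_user %.1f 12.5
```

The division of labor follows from the paper's design: XDR keeps the frequent intra-cluster traffic compact and byte-order independent, while XML is reserved for the richer, client-facing representation of aggregated state.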

This paper presents the design, implementation, and evaluation of the Ganglia distributed monitoring system along with an account of experience gained through real-world deployments on systems of widely varying scale, configurations, and target application domains. It is organized as follows. In Section 2, we describe the key challenges in building a distributed monitoring system and how they relate to different points in the system architecture space. In Section 3, we present the architecture of Ganglia, a scalable distributed monitoring system for high performance computing systems. In Section 4, we describe our implementation of Ganglia, which is currently deployed on over 500 clusters around the world. In Section 5, we present a performance analysis of our implementation along with an account of experience gained through real-world deployments of Ganglia on several large-scale distributed systems. In Section 6, we present related work, and in Section 7, we conclude the paper.

Section snippets

Distributed monitoring

In this section, we summarize the key challenges in designing a distributed monitoring system. We then discuss key characteristics of three classes of distributed systems where Ganglia is currently in use: clusters, Grids, and planetary-scale systems. Each class of systems presents a different set of constraints and requires different design decisions and trade-offs in addressing our key design challenges. While Ganglia's initial design focus was scalable monitoring on a…

Architecture

Ganglia is based on a hierarchical design targeted at federations of clusters (Fig. 1). It relies on a multicast-based listen/announce protocol [1], [10], [16], [29] to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. Within each cluster, Ganglia uses heartbeat messages on a well-known multicast address as the basis for a membership protocol. Membership is maintained by using the…
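As a concrete illustration of this style of protocol, here is a minimal sketch of a multicast-based listen/announce membership scheme in Python. The group address and port match gmond's well-known defaults (239.2.11.71:8649); the heartbeat interval, the timeout, and the use of the hostname as the announcement payload are illustrative assumptions rather than Ganglia's actual message format.

```python
import socket
import struct
import threading
import time

GROUP, PORT = "239.2.11.71", 8649      # gmond's well-known defaults
HEARTBEAT_SECS, TIMEOUT_SECS = 20, 60  # illustrative assumptions
members = {}                           # hostname -> last heartbeat time

def announce():
    """Periodically multicast this node's heartbeat to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    payload = socket.gethostname().encode()
    while True:
        sock.sendto(payload, (GROUP, PORT))
        time.sleep(HEARTBEAT_SECS)

def listen():
    """Refresh membership on each heartbeat; drop nodes that go silent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(1.0)
    while True:
        try:
            data, _ = sock.recvfrom(1500)
            members[data.decode()] = time.time()   # implicit (re)join
        except socket.timeout:
            pass
        cutoff = time.time() - TIMEOUT_SECS
        for host in [h for h, t in members.items() if t < cutoff]:
            del members[host]                      # presumed failed

threading.Thread(target=announce, daemon=True).start()
listen()
```

Because membership is implicit in the arrival of heartbeats, a rebooted node rejoins simply by announcing again, and no central configuration needs to be updated when nodes come and go.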

Implementation

The implementation consists of two daemons, gmond and gmetad, a command-line program gmetric, and a client-side library. The Ganglia monitoring daemon (gmond) provides monitoring on a single cluster by implementing the listen/announce protocol and responding to client requests with an XML representation of its monitoring data. gmond runs on every node of a cluster. The Ganglia Meta Daemon (gmetad), on the other hand, provides federation of multiple clusters. A tree of TCP connections…
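The sketch below shows the client side of this arrangement: gmond writes its entire XML state to any client that connects to its TCP port and then closes the connection. The default port 8649 and the CLUSTER/HOST/METRIC element names of gmond's XML output are assumed here; error handling is elided.

```python
import socket
import xml.etree.ElementTree as ET

def poll_gmond(host="localhost", port=8649):
    """Read the XML dump that gmond writes to each connecting client."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(8192)
            if not data:           # gmond closes the socket after the dump
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))

# Walk the tree and print one line per (host, metric) pair.
root = poll_gmond()
for cluster in root.iter("CLUSTER"):
    for host in cluster.iter("HOST"):
        for metric in host.iter("METRIC"):
            print(host.get("NAME"), metric.get("NAME"), metric.get("VAL"))
```

This dump-on-connect design keeps gmond stateless with respect to its clients: gmetad federates clusters by polling a representative gmond in each one over exactly this interface.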

Evaluation and experience

In this section, we present a quantitative analysis of Ganglia along with an account of experience gained through real-world deployments on production distributed systems. For the analysis, we measure scalability and performance overhead, using data obtained from four example systems to make this concrete. For experience, we report on key observations and lessons learned while deploying and maintaining Ganglia on several production systems. Specifically, we describe what worked well, what did…

Related work

There are a number of research and commercial efforts centered on monitoring of clusters, but only a handful that focus on scale. Supermon [27] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of point-to-point connections to gather and aggregate cluster data collected by custom kernel modules running on each cluster node. CARD [2] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of relational databases to…

Conclusion

In this paper, we presented the design, implementation, and evaluation of Ganglia, a scalable distributed monitoring system for high performance computing systems. Ganglia is based on a hierarchical design that uses a multicast-based listen/announce protocol to monitor state within clusters and a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It uses a careful balance of simple design principles and sound engineering to…

Acknowledgements

Special thanks are extended to the Ganglia Development Team for their hard work and insightful ideas. We would like to thank Bartosz Ilkowski for his performance measurements on SUNY Buffalo's 2000-node HPC cluster. Thanks also to Catalin Lucian Dumitrescu for providing useful feedback on this paper and to Steve Wagner for providing information on how Ganglia is being used at Industrial Light and Magic. This work is supported in part by National Science Foundation RI Award EIA-9802069 and NPACI.

References (31)

  • S. Brin et al., The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems (1998)
  • E. Amir, S. McCanne, R.H. Katz, An active service framework and its application to real-time multimedia transcoding,...
  • E. Anderson, D. Patterson, Extensible, scalable monitoring for clusters of computers, in: Proceedings of the 11th...
  • T.E. Anderson et al., A case for NOW (networks of workstations), IEEE Micro (1995)
  • D.J. Becker, T. Sterling, D. Savarese, J.E. Dorband, U.A. Ranawak, C.V. Packer, Beowulf: a parallel workstation for...
  • N. Boden et al., Myrinet: a gigabit-per-second local area network, IEEE Micro (1995)
  • E. Brewer, Lessons from giant-scale services, IEEE Internet Computing (2001)
  • R. Buyya, PARMON: a portable and scalable monitoring system for clusters, Software: Practice and Experience (2000)
  • A. Chien, S. Pakin, M. Lauria, M. Buchanon, K. Hane, L. Giannini, High performance virtual machines (HPVM): clusters...
  • B.N. Chun, D.E. Culler, Rexec: a decentralized, secure remote execution environment for clusters, in: Proceedings of...
  • Intel Corporation, Paragon XP/S product overview,...
  • Thinking Machines Corporation, Connection Machine CM-5 technical summary,...
  • K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, Grid information services for distributed resource sharing, in:...
  • I. Foster et al., Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications (1997)
  • I. Foster et al., The anatomy of the Grid: enabling scalable virtual organizations, International Journal of Supercomputer Applications (2001)