
CoMon: a mostly-scalable monitoring system for PlanetLab

Published: 01 January 2006

Abstract

CoMon is an evolving, mostly-scalable monitoring system for PlanetLab that has the goal of presenting environment-tailored information for both the administrators and users of the PlanetLab global testbed. In addition to passively reporting metrics provided by the operating system, CoMon also actively gathers a number of metrics useful for developers of networked systems. Using CoMon, PlanetLab administrators and users can easily spot problematic machines, where the problem may arise from the machine itself, local configuration/environment problems, or the workload running on the machine. Furthermore, users can easily observe many properties of all of the experiments running across multiple PlanetLab nodes, facilitating not only their own experiment monitoring and debugging, but also helping scale the task of finding PlanetLab problems.

In this paper we describe CoMon's design and operation, including what kinds of data are gathered, the scale of the processing involved, and the approaches we have taken to keep CoMon running. Our goal is not only to illustrate the kinds of problems faced in this environment, but also to invite others to participate, either by experimenting with the data generated by CoMon, or by building on the CoMon system itself.

Published in

ACM SIGOPS Operating Systems Review, Volume 40, Issue 1
January 2006
101 pages
ISSN: 0163-5980
DOI: 10.1145/1113361

Copyright © 2006 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
