article

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

Authors:
Robbert Van Renesse

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

,
Kenneth P. Birman

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

,
Werner Vogels

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

Authors Info & Claims

ACM Transactions on Computer Systems Volume 21 Issue 2pp 164–206https://doi.org/10.1145/762483.762485

Published:01 May 2003Publication History

ACM Transactions on Computer Systems

Abstract

Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as it evolves over time. The combination of features makes it possible to solve a wide variety of management and self-configuration problems. This paper describes the design of the system with a focus upon its scalability. After describing the Astrolabe service, we present examples of the use of Astrolabe for locating resources, publish-subscribe, and distributed synchronization in large systems. Astrolabe is implemented using a peer-to-peer protocol, and uses a restricted form of mobile code based on the SQL query language for aggregation. This protocol gives rise to a novel consistency model. Astrolabe addresses several security considerations using a built-in PKI. The scalability of the system is evaluated using both simulation and experiments; these confirm that Astrolabe could scale to thousands and perhaps millions of nodes, with information propagation delays in the tens of seconds.

References

Adjie-Winoto, W., Schwartz, E., Balakrishnan, H., and Lilley, J. 1999. The design and implementation of an Intentional Naming System. In Proceedings of the 17th ACM Symposium on Operating Systems Principles. ACM Press, Kiawah Island, SC.]] Google Scholar
Aguilera, M., Strom, R., Sturman, D., Astley, M., and Chandra, T. 1999. Matching events in a content-based subscription system. In Proceedings of the 18th ACM Symposium on Principles of Distributed Computing. Atlanta, GA.]] Google Scholar
Andersen, D., Balakrishnan, H., Kaashoek, M., and Morris, R. 2001. Resilient overlay networks. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 131--145.]] Google Scholar
Balazinska, M., Balakrishnan, H., and Karger, D. 2002. INS/Twine: A scalable peer-to-peer architecture for intentional resource discovery. In Proceedings of the 1st International Conference on Pervasive Computing (Pervasive 2002), F. Mattern and M. Naghshineh, Eds. Lecture Notes in Computer Science, vol. 2414. Springer, Zürich, Switzerland, 195--210.]] Google Scholar
Birman, K., Hayden, M., Ozkasap, O., Xiao, Z., Budiu, M., and Minsky, Y. 1999. Bimodal Multicast. ACM Trans. Comput. Syst. 17, 2 (May), 41--88.]] Google Scholar
Birman, K. P. and Joseph, T. A. 1987. Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles. Austin, TX, 123--138.]] Google Scholar
Birrell, A., Levin, R., Needham, R., and Schroeder, M. 1982. Grapevine: an exercise in distributed computing. CACM 25, 4 (Apr.), 260--274.]] Google Scholar
Bloom, B. 1970. Space/time tradeoffs in hash coding with allowable errors. CACM 13, 7 (July), 422--426.]] Google Scholar
Bonnet, P., Gehrke, J., and Seshadri, P. 2001. Towards sensor database systems. In Proceedings of the 2nd International Conference on Mobile Data Management. Hong Kong.]] Google Scholar
Carzaniga, A., Rosenblum, D., and A.L., W. 2001. Design and evaluation of a wide-area event notification service. ACM Trans. Comput. Syst. 19, 3 (Aug.), 332--383.]] Google Scholar
Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., and Terry, D. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing. Vancouver, BC, 1--12.]] Google Scholar
Golding, R. 1992. A weak-consistency architecture for distributed information services. Comput. Syst. 5, 4 (Fall), 379--405.]]Google Scholar
Golding, R., Long, D., and Wilkes, J. 1994. The REFDBMS distributed bibliographic database system. In Proceedings of Usenix'94. Santa Fe, NM, 47--62.]] Google Scholar
Gribble, S., Welsh, M., Von Behren, R., Brewer, E., Culler, D., Borisov, N., Czerwinski, S., Gummadi, R., Hill, J., Joseph, A., Katz, R., Mao, Z., Ross, S., and Zhao, B. 2001. The Ninja architecture for robust Internet-scale systems and services. Comput. Netw., Special Issue of Computer Networks on Pervasive Computing 35, 4, 473--497.]] Google Scholar
Heidemann, J., Silva, F., Intanagonwiwat, C., Govindan, R., Estrin, D., and Ganesan, D. 2001. Building efficient wireless sensor networks with low-level naming. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 146--159.]] Google Scholar
Lampson, B. 1986. Designing a global name service. In Proceedings of the 5th ACM Symposium on Principles of Distributed Computing. Calgary, Alberta.]] Google Scholar
Minsky, Y. 2002. Spreading rumors cheaply, quickly, and reliably. Ph.D. thesis, Department of Computer Science, Cornell University, Ithaca, NY.]] Google Scholar
Mockapetris, P. 1984. The Domain Name System. In Proceedings of the IFIP 6.5 International Symposium on Computer Messaging. Nottingham, UK.]] Google Scholar
Oki, B. M., Pfluegl, M., Siegel, A., and Skeen, D. 1993. The Information Bus---an architecture for extensible distributed systems. In Proceedings of the 14th ACM Symposium on Operating Systems Principles. Asheville, NC, 58--68.]] Google Scholar
Petersen, K., Spreitzer, M., Terry, D., Theimer, M., and Demers, A. 1997. Flexible update propagation for weakly consistent replication. In Proceedings of the 16th ACM Symposium on Operating Systems Principles. Saint-Malo, France, 288--301.]] Google Scholar
Radicati, S. 1994. X.500 Directory Services: Technology and Deployment. International Thomson Computer Press, London, UK.]] Google Scholar
Reese, G. 2000. Database Programming with JDBC and Java, 2nd Edition. O'Reilly.]] Google Scholar
Rowstron, A. and Druschel, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of Middleware 2001. Heidelberg, Germany.]] Google Scholar
Sanders, R. 1998. ODBC 3.5 Developer's Guide. M&T Books.]] Google Scholar
Snoeren, A., Conley, K., and Gifford, D. 2001. Mesh-based content routing using XML. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 160--173.]] Google Scholar
Stallings, W. 1993. SNMP, SNMPv2, and CMIP. Addison-Wesley.]]Google Scholar
Stoica, I., Morris, R., Karger, D., and Kaashoek, M. 1995. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the '95 Symposium on Communications Architectures & Protocols. ACM SIGCOMM, Cambridge, MA.]] Google Scholar
Tennenhouse, D., Smith, J., Sincoskie, W., Wetherall, D., and Minden, G. 1997. A survey of active network research. IEEE Communications Magazine 35, 1 (Jan.), 80--86.]]Google Scholar
Van Renesse, R. 2002. Power-aware epidemics. In Proceedings of the 21st Symposium on Reliable Distributed Systems. Osaka, Japan.]] Google Scholar
Van Renesse, R. and Dumitriu, D. 2002. Collaborative networking in an uncooperative Internet. In Proceedings of the 21st Symposium on Reliable Distributed Systems. Osaka, Japan.]] Google Scholar
Van Renesse, R., Minsky, Y., and Hayden, M. 1998. A gossip-style failure detection service. In Proceedings of Middleware'98. IFIP, The Lake District, UK, 55--70.]]Google Scholar
Van Steen, M., Hauck, F., Homburg, P., and Tanenbaum, A. 1998. Locating objects in wide-area systems. IEEE Communications Magazine 36, 1 (Jan.), 104--109.]]Google Scholar
White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Bard, C., and Joglekar, A. 2002. An integrated experimental environment for distributed systems and networks. In USENIX OSDI'02. Boston, MA.]] Google Scholar
Zhao, B., Kubiatowicz, J., and Joseph, A. 2001. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, University of California, Berkeley, Computer Science Department.]] Google Scholar

Index Terms

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

Recommendations

DECA: a hierarchical framework for DECentralized aggregation in DHTs
DSOM'06: Proceedings of the 17th IFIP/IEEE international conference on Distributed Systems: operations and management

As Structured Peer-to-Peer (P2P) Networks become popular, there is an emerging need to monitor continuously the huge number of participants in a robust and scalable manner. To this end, aggregation has emerged as a basis for the self-management of these ...
Read More
Araneola: A scalable reliable multicast system for dynamic environments

This paper presents Araneola (Araneola means ''little spider'' in Latin.), a scalable reliable application-level multicast system for highly dynamic wide-area environments. Araneola supports multi-point to multi-point reliable communication in a fully ...
Read More
Peer-to-Peer Membership Management for Gossip-Based Protocols

Gossip-based protocols for group communication have attractive scalability and reliability properties. The probabilistic gossip schemes studied so far typically assume that each group member has full knowledge of the global membership and chooses gossip ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Computer Systems Volume 21, Issue 2
May 2003
117 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/762483
Issue’s Table of Contents

Copyright © 2003 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2003
Published in tocs Volume 21, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Aggregation
epidemic protocols
failure detection
gossip
membership
publish-subscribe
scalability
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 468
  Total Citations
  View Citations
- 5,428
  Total Downloads
- Downloads (Last 12 months)73
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

DECA: a hierarchical framework for DECentralized aggregation in DHTs

Araneola: A scalable reliable multicast system for dynamic environments

Peer-to-Peer Membership Management for Gossip-Based Protocols

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

DECA: a hierarchical framework for DECentralized aggregation in DHTs

Araneola: A scalable reliable multicast system for dynamic environments

Peer-to-Peer Membership Management for Gossip-Based Protocols

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media