skip to main content
article

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

Published:01 May 2003Publication History
Skip Abstract Section

Abstract

Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as it evolves over time. The combination of features makes it possible to solve a wide variety of management and self-configuration problems. This paper describes the design of the system with a focus upon its scalability. After describing the Astrolabe service, we present examples of the use of Astrolabe for locating resources, publish-subscribe, and distributed synchronization in large systems. Astrolabe is implemented using a peer-to-peer protocol, and uses a restricted form of mobile code based on the SQL query language for aggregation. This protocol gives rise to a novel consistency model. Astrolabe addresses several security considerations using a built-in PKI. The scalability of the system is evaluated using both simulation and experiments; these confirm that Astrolabe could scale to thousands and perhaps millions of nodes, with information propagation delays in the tens of seconds.

References

  1. Adjie-Winoto, W., Schwartz, E., Balakrishnan, H., and Lilley, J. 1999. The design and implementation of an Intentional Naming System. In Proceedings of the 17th ACM Symposium on Operating Systems Principles. ACM Press, Kiawah Island, SC.]] Google ScholarGoogle Scholar
  2. Aguilera, M., Strom, R., Sturman, D., Astley, M., and Chandra, T. 1999. Matching events in a content-based subscription system. In Proceedings of the 18th ACM Symposium on Principles of Distributed Computing. Atlanta, GA.]] Google ScholarGoogle Scholar
  3. Andersen, D., Balakrishnan, H., Kaashoek, M., and Morris, R. 2001. Resilient overlay networks. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 131--145.]] Google ScholarGoogle Scholar
  4. Balazinska, M., Balakrishnan, H., and Karger, D. 2002. INS/Twine: A scalable peer-to-peer architecture for intentional resource discovery. In Proceedings of the 1st International Conference on Pervasive Computing (Pervasive 2002), F. Mattern and M. Naghshineh, Eds. Lecture Notes in Computer Science, vol. 2414. Springer, Zürich, Switzerland, 195--210.]] Google ScholarGoogle Scholar
  5. Birman, K., Hayden, M., Ozkasap, O., Xiao, Z., Budiu, M., and Minsky, Y. 1999. Bimodal Multicast. ACM Trans. Comput. Syst. 17, 2 (May), 41--88.]] Google ScholarGoogle Scholar
  6. Birman, K. P. and Joseph, T. A. 1987. Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles. Austin, TX, 123--138.]] Google ScholarGoogle Scholar
  7. Birrell, A., Levin, R., Needham, R., and Schroeder, M. 1982. Grapevine: an exercise in distributed computing. CACM 25, 4 (Apr.), 260--274.]] Google ScholarGoogle Scholar
  8. Bloom, B. 1970. Space/time tradeoffs in hash coding with allowable errors. CACM 13, 7 (July), 422--426.]] Google ScholarGoogle Scholar
  9. Bonnet, P., Gehrke, J., and Seshadri, P. 2001. Towards sensor database systems. In Proceedings of the 2nd International Conference on Mobile Data Management. Hong Kong.]] Google ScholarGoogle Scholar
  10. Carzaniga, A., Rosenblum, D., and A.L., W. 2001. Design and evaluation of a wide-area event notification service. ACM Trans. Comput. Syst. 19, 3 (Aug.), 332--383.]] Google ScholarGoogle Scholar
  11. Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., and Terry, D. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing. Vancouver, BC, 1--12.]] Google ScholarGoogle Scholar
  12. Golding, R. 1992. A weak-consistency architecture for distributed information services. Comput. Syst. 5, 4 (Fall), 379--405.]]Google ScholarGoogle Scholar
  13. Golding, R., Long, D., and Wilkes, J. 1994. The REFDBMS distributed bibliographic database system. In Proceedings of Usenix'94. Santa Fe, NM, 47--62.]] Google ScholarGoogle Scholar
  14. Gribble, S., Welsh, M., Von Behren, R., Brewer, E., Culler, D., Borisov, N., Czerwinski, S., Gummadi, R., Hill, J., Joseph, A., Katz, R., Mao, Z., Ross, S., and Zhao, B. 2001. The Ninja architecture for robust Internet-scale systems and services. Comput. Netw., Special Issue of Computer Networks on Pervasive Computing 35, 4, 473--497.]] Google ScholarGoogle Scholar
  15. Heidemann, J., Silva, F., Intanagonwiwat, C., Govindan, R., Estrin, D., and Ganesan, D. 2001. Building efficient wireless sensor networks with low-level naming. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 146--159.]] Google ScholarGoogle Scholar
  16. Lampson, B. 1986. Designing a global name service. In Proceedings of the 5th ACM Symposium on Principles of Distributed Computing. Calgary, Alberta.]] Google ScholarGoogle Scholar
  17. Minsky, Y. 2002. Spreading rumors cheaply, quickly, and reliably. Ph.D. thesis, Department of Computer Science, Cornell University, Ithaca, NY.]] Google ScholarGoogle Scholar
  18. Mockapetris, P. 1984. The Domain Name System. In Proceedings of the IFIP 6.5 International Symposium on Computer Messaging. Nottingham, UK.]] Google ScholarGoogle Scholar
  19. Oki, B. M., Pfluegl, M., Siegel, A., and Skeen, D. 1993. The Information Bus---an architecture for extensible distributed systems. In Proceedings of the 14th ACM Symposium on Operating Systems Principles. Asheville, NC, 58--68.]] Google ScholarGoogle Scholar
  20. Petersen, K., Spreitzer, M., Terry, D., Theimer, M., and Demers, A. 1997. Flexible update propagation for weakly consistent replication. In Proceedings of the 16th ACM Symposium on Operating Systems Principles. Saint-Malo, France, 288--301.]] Google ScholarGoogle Scholar
  21. Radicati, S. 1994. X.500 Directory Services: Technology and Deployment. International Thomson Computer Press, London, UK.]] Google ScholarGoogle Scholar
  22. Reese, G. 2000. Database Programming with JDBC and Java, 2nd Edition. O'Reilly.]] Google ScholarGoogle Scholar
  23. Rowstron, A. and Druschel, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of Middleware 2001. Heidelberg, Germany.]] Google ScholarGoogle Scholar
  24. Sanders, R. 1998. ODBC 3.5 Developer's Guide. M&T Books.]] Google ScholarGoogle Scholar
  25. Snoeren, A., Conley, K., and Gifford, D. 2001. Mesh-based content routing using XML. In Proceedings of the 18th ACM Symposium on Operating Systems Principles. Banff, Canada, 160--173.]] Google ScholarGoogle Scholar
  26. Stallings, W. 1993. SNMP, SNMPv2, and CMIP. Addison-Wesley.]]Google ScholarGoogle Scholar
  27. Stoica, I., Morris, R., Karger, D., and Kaashoek, M. 1995. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the '95 Symposium on Communications Architectures & Protocols. ACM SIGCOMM, Cambridge, MA.]] Google ScholarGoogle Scholar
  28. Tennenhouse, D., Smith, J., Sincoskie, W., Wetherall, D., and Minden, G. 1997. A survey of active network research. IEEE Communications Magazine 35, 1 (Jan.), 80--86.]]Google ScholarGoogle Scholar
  29. Van Renesse, R. 2002. Power-aware epidemics. In Proceedings of the 21st Symposium on Reliable Distributed Systems. Osaka, Japan.]] Google ScholarGoogle Scholar
  30. Van Renesse, R. and Dumitriu, D. 2002. Collaborative networking in an uncooperative Internet. In Proceedings of the 21st Symposium on Reliable Distributed Systems. Osaka, Japan.]] Google ScholarGoogle Scholar
  31. Van Renesse, R., Minsky, Y., and Hayden, M. 1998. A gossip-style failure detection service. In Proceedings of Middleware'98. IFIP, The Lake District, UK, 55--70.]]Google ScholarGoogle Scholar
  32. Van Steen, M., Hauck, F., Homburg, P., and Tanenbaum, A. 1998. Locating objects in wide-area systems. IEEE Communications Magazine 36, 1 (Jan.), 104--109.]]Google ScholarGoogle Scholar
  33. White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Bard, C., and Joglekar, A. 2002. An integrated experimental environment for distributed systems and networks. In USENIX OSDI'02. Boston, MA.]] Google ScholarGoogle Scholar
  34. Zhao, B., Kubiatowicz, J., and Joseph, A. 2001. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, University of California, Berkeley, Computer Science Department.]] Google ScholarGoogle Scholar

Index Terms

  1. Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

                                Recommendations

                                Comments

                                Login options

                                Check if you have access through your login credentials or your institution to get full access on this article.

                                Sign in

                                Full Access

                                PDF Format

                                View or Download as a PDF file.

                                PDF

                                eReader

                                View online with eReader.

                                eReader