Abstract
This paper describes QuickSilver, a system developed at the IBM Almaden Research Center that uses atomic transactions as a unified failure recovery mechanism for a distributed system structured around clients and servers. Transactions provide failure atomicity for related activities at a single server or at a number of independent servers. Rather than bundling transaction management into a dedicated language or recoverable object manager, QuickSilver exposes the basic commit protocol and log recovery primitives, allowing clients and servers to tailor their recovery techniques to their specific needs. Servers can implement their own log recovery protocols rather than being required to use a system-defined protocol. These decisions allow servers to make their own choices in balancing simplicity, efficiency, and recoverability.
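The idea of exposing the commit protocol while leaving recovery to each server can be illustrated with a minimal sketch. It is not QuickSilver's interface: the names `Participant`, `LoggingServer`, `VolatileServer`, and `two_phase_commit` are hypothetical, the protocol shown is a textbook two-phase commit with no coordinator logging or timeouts, and a Python list stands in for stable storage.

```python
from dataclasses import dataclass, field


class Participant:
    """Hypothetical server interface: each server chooses its own recovery technique."""

    def prepare(self, tid: int) -> bool:
        raise NotImplementedError

    def commit(self, tid: int) -> None:
        raise NotImplementedError

    def abort(self, tid: int) -> None:
        raise NotImplementedError


@dataclass
class LoggingServer(Participant):
    """A server that keeps its own redo log (a list stands in for stable storage)."""
    log: list = field(default_factory=list)
    state: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def write(self, tid: int, key: str, value: int) -> None:
        # Buffer the update until the transaction outcome is known.
        self.pending.setdefault(tid, {})[key] = value

    def prepare(self, tid: int) -> bool:
        # Record the buffered updates before voting yes.
        self.log.append(("prepare", tid, dict(self.pending.get(tid, {}))))
        return True

    def commit(self, tid: int) -> None:
        self.log.append(("commit", tid))
        self.state.update(self.pending.pop(tid, {}))

    def abort(self, tid: int) -> None:
        self.log.append(("abort", tid))
        self.pending.pop(tid, None)


@dataclass
class VolatileServer(Participant):
    """A server with no log: it only needs the outcome to release resources."""
    open_resources: set = field(default_factory=set)
    healthy: bool = True

    def prepare(self, tid: int) -> bool:
        return self.healthy

    def commit(self, tid: int) -> None:
        self.open_resources.discard(tid)

    def abort(self, tid: int) -> None:
        self.open_resources.discard(tid)


def two_phase_commit(tid: int, participants: list) -> str:
    """Phase 1 collects votes; phase 2 tells every participant the outcome."""
    votes = [p.prepare(tid) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(tid)
        return "committed"
    for p in participants:
        p.abort(tid)
    return "aborted"
```

In this sketch the coordinator knows nothing about how either server recovers: one pays for a log to make its updates durable, while the other keeps only volatile state and simply cleans up when told the outcome.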