Recovery management in QuickSilver

Published: 01 November 1987
Abstract

One price of extensibility and distribution, as implemented in QuickSilver, is a more complicated set of failure modes, and the consequent necessity of dealing with them. In traditional operating systems, services (e.g., file, display) are intrinsic pieces of the kernel. Process state is maintained in kernel tables, and the kernel contains explicit cleanup code (e.g., to close files, reclaim memory, and get rid of process images after hardware or software failures). QuickSilver, however, is structured according to the client-server model, and as in many systems of its type, system services are implemented by user-level processes that maintain a substantial amount of client process state. Examples of this state are the open files, screen windows, address space, etc., belonging to a process. Failure resilience in such an environment requires that clients and servers be aware of problems involving each other. Examples of the way one would like the system to behave include having files closed and windows removed from the screen when a client terminates, and having clients see bad return codes (rather than hanging) when a file server crashes. This motivates a number of design goals:

  • Properly written programs (especially servers) should be resilient to external process and machine failures, and should be able to recover all resources associated with failed entities.

  • Server processes should contain their own recovery code. The kernel should not make any distinction between system service processes and normal application processes.

  • To avoid the proliferation of ad-hoc recovery mechanisms, there should be a uniform system-wide architecture for recovery management.

  • A client may invoke several independent servers to perform a set of logically related activities (a unit of work) that must execute atomically in the presence of failures; that is, either all the related activities should occur or none of them should. The recovery mechanism should support this.

In QuickSilver, recovery is based on the database notion of atomic transactions, which are made available as a system service to be used by other, higher-level servers. This makes it possible to meet all of the above design goals. Software portability is important in the QuickSilver environment, dictating that transaction-based recovery be accessible to conventional programming languages rather than only to a special-purpose language such as Argus [Liskov84]. To accommodate servers with diverse recovery demands, the low-level primitives of commit coordination and log recovery are exposed directly, rather than building recovery on top of a stable-storage mechanism such as that in CPR [Attanasio87] or on recoverable objects such as those in Camelot [Spector87] or Clouds [Allchin&McKendry83].
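The all-or-nothing requirement for a unit of work that spans several servers can be illustrated with a generic two-phase commit. The sketch below is not QuickSilver's interface or protocol, which the abstract does not describe. `Participant`, `run_unit_of_work`, and the in-memory `log` list are hypothetical names introduced only for illustration, and a simulated `can_commit` flag stands in for a real server failure.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical server taking part in one unit of work."""
    name: str
    can_commit: bool = True   # simulated vote; False models a failed server
    state: str = "idle"       # idle -> prepared -> committed, or aborted
    log: list = field(default_factory=list)  # stand-in for a recovery log

    def prepare(self) -> bool:
        # Phase 1: vote. A real server would first write its log record
        # to stable storage; here the record is only appended in memory.
        if self.can_commit:
            self.state = "prepared"
            self.log.append("prepared")
        return self.can_commit

    def commit(self) -> None:
        # Phase 2, success path: make the work permanent.
        self.state = "committed"
        self.log.append("committed")

    def abort(self) -> None:
        # Phase 2, failure path: undo any work done for this unit.
        self.state = "aborted"
        self.log.append("aborted")


def run_unit_of_work(participants) -> bool:
    """Commit on every participant, or abort on every participant."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


# Usage: one server votes no, so no server commits.
file_server = Participant("file")
window_server = Participant("window", can_commit=False)
ok = run_unit_of_work([file_server, window_server])
assert not ok
assert file_server.state == "aborted" and window_server.state == "aborted"
```

The coordinator never commits on one server while aborting on another, which is the property the fourth design goal asks for. A real coordinator would also have to survive its own crash between the two phases, which is where log recovery comes in.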

References

  1. [Allchin&McKendry83] Allchin, J. E., McKendry, M. S., Synchronization and recovery of actions, Proceedings of the Second ACM Symposium on Principles of Distributed Computing (August 1983) pp. 31-44.
  2. [Attanasio87] Attanasio, C. R., CPR supervisor support for relational database facility, IBM Technical Report RC 12416 (January 1987).
  3. [Liskov84] Liskov, B., Overview of the Argus language and system, MIT Laboratory for Computer Science (February 1984).
  4. [Spector87] Spector, A., et al., Camelot: A distributed transaction facility for Mach and the Internet - An Interim Report, CMU Technical Report CMU-CS-87-129 (June 1987).
  5. [Moss85] Moss, E. B., Nested Transactions: An Approach to Reliable Distributed Computing, MIT Press (1985).
  6. [Obermarck82] Obermarck, R., Distributed deadlock detection algorithm, ACM Transactions on Database Systems, Volume 7, Number 2 (June 1982) pp. 187-208.

Published in

ACM SIGOPS Operating Systems Review, Volume 21, Issue 5, Nov. 1987, 162 pages. ISSN: 0163-5980. DOI: 10.1145/37499

SOSP '87: Proceedings of the eleventh ACM Symposium on Operating systems principles, November 1987, 162 pages. ISBN: 089791242X. DOI: 10.1145/41457

Copyright © 1987 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
