Hypervisor-based fault tolerance

Published: 01 February 1996

Abstract

Protocols to implement a fault-tolerant computing system are described. These protocols augment the hypervisor of a virtual-machine manager and coordinate a primary virtual machine with its backup. No modifications to the hardware, operating system, or application programs are required. A prototype system was constructed for HP's PA-RISC instruction-set architecture. Even though the prototype was not carefully tuned, it ran programs about a factor of 2 slower than a bare machine would.
