Abstract
Protocols to implement a fault-tolerant computing system are described. These protocols augment the hypervisor of a virtual-machine manager and coordinate a primary virtual machine with its backup. No modifications to the hardware, operating system, or application programs are required. A prototype system was constructed for HP's PA-RISC instruction-set architecture. Even though the prototype was not carefully tuned, it ran programs about a factor of 2 slower than a bare machine would.