ABSTRACT
Rebooting an operating system is a last-resort but effective recovery technique. However, system performance degrades substantially just after the reboot because the page cache held in main memory is lost. For fast performance recovery, we propose a new reboot mechanism called the warm-cache reboot. With the help of a virtual machine monitor (VMM), the warm-cache reboot preserves the page cache during the reboot and enables the operating system to restore it afterward. To ensure correct recovery, the VMM guarantees that the reused page cache is consistent with the corresponding files on disk. We have implemented the warm-cache reboot mechanism in the Xen VMM and the Linux operating system. Our experimental results showed that the warm-cache reboot reduced performance degradation just after the reboot. In addition, we confirmed that file cache corrupted by faults was not reused. The overhead of maintaining cache consistency was usually not large.
Fast and correct performance recovery of operating systems using a virtual machine monitor
VEE '11