ABSTRACT
Commodity operating systems execute core kernel subsystems in a single address space alongside hundreds of dynamically loaded extensions and device drivers. Because the kernel has no internal isolation, a vulnerability in any kernel subsystem or device driver opens a way to mount a successful attack on the entire kernel.
Historically, isolation within the kernel was prohibitively expensive due to the high cost of hardware isolation primitives. Recent CPUs, however, bring a new set of mechanisms. Extended page-table (EPT) switching with VM functions and memory protection keys (MPKs) provide memory isolation and invocations across protection-domain boundaries with overheads comparable to system calls. Unfortunately, neither MPKs nor EPT switching provides architectural support for isolating privileged ring 0 kernel code, i.e., control over privileged instructions and well-defined entry points that securely restore the state of the system on transitions between isolated domains.
Our work develops a collection of techniques for lightweight isolation of privileged kernel code. To control execution of privileged instructions, we rely on a minimal hypervisor that transparently deprivileges the system into a non-root VT-x guest. We develop a new isolation boundary that leverages EPT switching with the VMFUNC instruction. We define a set of invariants that allows us to isolate kernel components despite the kernel's intricate execution model, e.g., to isolate preemptable, concurrent interrupt handlers. To minimize the overheads of virtualization, we develop support for exitless interrupt delivery across isolated domains. We evaluate our approach by developing isolated versions of several device drivers in the Linux kernel.
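To make the EPT-switching boundary concrete, the following is a minimal user-space model in C, not the paper's implementation. It assumes a toy "physical memory" of eight pages and represents each EPT as a bitmask of accessible pages. All names (`vcpu_t`, `hv_register_domain`, `vmfunc_switch`, `call_isolated`, `toy_driver_entry`) are hypothetical. The only architectural facts it relies on are that EPTP switching is VM function 0, that the index selects one of 512 EPTP-list entries, and that an out-of-range or invalid entry causes a VM exit to the hypervisor instead of a switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EPTP_LIST_SIZE 512 /* architectural number of EPTP-list entries */
#define NUM_PAGES 8        /* toy physical memory size (assumption) */

/* Toy stand-in for an EPT: bit i set means page i is accessible. */
typedef struct {
    bool valid;
    uint8_t page_mask;
} ept_t;

/* Toy virtual CPU state (hypothetical, for illustration only). */
typedef struct {
    ept_t eptp_list[EPTP_LIST_SIZE];
    unsigned current;  /* index of the active EPT */
    unsigned vm_exits; /* failed switches that would trap to the hypervisor */
} vcpu_t;

/* Entry 0 models the EPT of the core kernel. */
void vcpu_init(vcpu_t *cpu, uint8_t kernel_mask) {
    assert(cpu != NULL);
    memset(cpu, 0, sizeof(*cpu));
    cpu->eptp_list[0].valid = true;
    cpu->eptp_list[0].page_mask = kernel_mask;
}

/* Models the hypervisor installing an EPT for an isolated domain. */
bool hv_register_domain(vcpu_t *cpu, unsigned index, uint8_t mask) {
    if (index == 0 || index >= EPTP_LIST_SIZE)
        return false;
    cpu->eptp_list[index].valid = true;
    cpu->eptp_list[index].page_mask = mask;
    return true;
}

/* Models VMFUNC leaf 0 (EPTP switching) with the given list index.
 * An out-of-range or invalid entry is counted as a VM exit. */
bool vmfunc_switch(vcpu_t *cpu, unsigned index) {
    if (index >= EPTP_LIST_SIZE || !cpu->eptp_list[index].valid) {
        cpu->vm_exits++;
        return false;
    }
    cpu->current = index;
    return true;
}

/* Would an access to the page succeed under the active EPT? */
bool can_access(const vcpu_t *cpu, unsigned page) {
    if (page >= NUM_PAGES)
        return false;
    return (cpu->eptp_list[cpu->current].page_mask >> page) & 1u;
}

/* Toy driver entry point: 0 if it can touch the page, -1 otherwise. */
int toy_driver_entry(vcpu_t *cpu, int page) {
    return can_access(cpu, (unsigned)page) ? 0 : -1;
}

/* Cross-domain call: switch in, run the entry point, switch back.
 * Returns -2 if the target domain cannot be entered. */
int call_isolated(vcpu_t *cpu, unsigned domain,
                  int (*entry)(vcpu_t *, int), int arg) {
    if (!vmfunc_switch(cpu, domain))
        return -2;
    int ret = entry(cpu, arg);
    vmfunc_switch(cpu, 0); /* return to the core kernel's EPT */
    return ret;
}
```

In this model, a driver registered with mask `0x30` can reach pages 4 and 5 but not the kernel's pages 0 through 3, and `call_isolated` always leaves the CPU on the kernel's EPT afterwards. The model omits what the abstract identifies as the hard parts: control of privileged instructions, securely restoring state at entry points, and interrupt handling across domains.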
- Lightweight kernel isolation with virtualization and VM functions