ABSTRACT
Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. We present the first study of real process races and the first system designed to detect them. Our study of hundreds of applications shows that process races are numerous, difficult to debug, and a real threat to reliability. To address this problem, we created RacePro, a system for automatically detecting these races. RacePro checks deployed systems in-vivo by recording live executions then deterministically replaying and checking them later. This approach increases checking coverage beyond the configurations or executions covered by software vendors or beta testing sites. RacePro records multiple processes, detects races in the recording among system calls that may concurrently access shared kernel objects, then tries different execution orderings of such system calls to determine which races are harmful and result in failures. To simplify race detection, RacePro models under-specified system calls based on load and store micro-operations. To reduce false positives and negatives, RacePro uses a replay and go-live mechanism to distill harmful races from benign ones. We have implemented RacePro in Linux, shown that it imposes only modest recording overhead, and used it to detect a number of previously unknown bugs in real applications caused by process races.
- All resource races studied. http://rcs.cs.columbia.edu/projects/racepro/.Google Scholar
- Launchpad Software Collaboration Platform. https://launchpad.net/.Google Scholar
- The Debian Almquist Shell. http://gondor.apana.org.au/~herbert/dash/.Google Scholar
- Upstart: an Event-Based Replacement for System V Init Scripts. http://upstart.ubuntu.com/.Google Scholar
- A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 2010. Google ScholarDigital Library
- T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic Process Groups in dOS. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 2010. Google ScholarDigital Library
- J. Chow, T. Garfinkel, and P. M. Chen. Decoupling Dynamic Program Analysis from Execution in Virtual Environments. In Proceedings of the USENIX Annual Technical Conference (USENIX '08), June 2008. Google ScholarDigital Library
- M. Chu, C. Murphy, and G. Kaiser. Distributed In Vivo Testing of Software Applications. In Proceedings of the First IEEE International Conference on Software Testing, Verification, and Validation (ICST '08), Apr. 2008. Google ScholarDigital Library
- H. Cui, J. Wu, C.-C. Tsai, and J. Yang. Stable Deterministic Multithreading through Schedule Memoization. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 2010. Google ScholarDigital Library
- G. W. Dunlap, D. G. Lucchetti, M. A. Fetterman, and P. M. Chen. Execution Replay of Multiprocessor Virtual Machines. In Proceedings of the 4th International Conference on Virtual Execution Environments (VEE '08), Mar. 2008. Google ScholarDigital Library
- D. Engler and K. Ashcraft. RacerX: Effective, Static Detection of Race Conditions and Deadlocks. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), Oct. 2003. Google ScholarDigital Library
- C. Flanagan and P. Godefroid. Dynamic Partial-Order Reduction for Model Checking Software. In Proceedings of the 32nd Annual Symposium on Principles of Programming Languages (POPL '05), Jan. 2005. Google ScholarDigital Library
- P. Fonseca, C. Li, and R. Rodrigues. Finding Complex Concurrency Bugs in Large Multi-Threaded Applications. In Proceedings of the 6th ACM European Conference on Computer Systems (EUROSYS '11), Apr. 2011. Google ScholarDigital Library
- Q. Gao, W. Zhang, Z. Chen, M. Zheng, and F. Qin. 2ndStrike: Towards Manifesting Hidden Concurrency Typestate Bugs. In Proceedings of the 16th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '11), Mar. 2011. Google ScholarDigital Library
- Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, and Z. Zhang. R2: An Application-Level Kernel for Record and Replay. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI '08), Dec. 2008. Google ScholarDigital Library
- O. Laadan, C.-C. Tsai, N. Viennot, C. Blinn, P. S. Du, J. Yang, and J. Nieh. Finding Concurrency Errors in Sequential Code---OS-level, In-vivo Model Checking of Process Races. In Proceedings of the 13th USENIX Workshop on Hot Topics in Operating Systems (HOTOS '11), May 2011. Google ScholarDigital Library
- O. Laadan, N. Viennot, and J. Nieh. Transparent, Lightweight Application Execution Replay on Commodity Multiprocessor Operating Systems. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '10), June 2010. Google ScholarDigital Library
- L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Comm. ACM, 21(7):558--565, 1978. Google ScholarDigital Library
- T. J. LeBlanc and J. M. Mellor-Crummey. Debugging Parallel Programs with Instant Replay. IEEE Trans. Comput., 36(4):471--482, 1987. Google ScholarDigital Library
- S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from Mistakes: a Comprehensive Study on Real World Concurrency Bug Characteristics. In Proceedings of the 13th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '08), Mar. 2008. Google ScholarDigital Library
- S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting Atomicity Violations via Access Interleaving Invariants. In Proceedings of the 12th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '06), Oct. 2006. Google ScholarDigital Library
- F. Mattern. Dynamic Partial-Order Reduction for Model Checking Software. In Proceedings of the 32nd Annual Symposium on Principles of Programming Languages (POPL '05), Oct. 1988.Google Scholar
- M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and Reproducing Heisenbugs in Concurrent Programs. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI '08), Dec. 2008. Google ScholarDigital Library
- M. Naik, A. Aiken, and J. Whaley. Effective Static Race Detection For Java. In Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation (PLDI '06), 2006. Google ScholarDigital Library
- S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically Classifying Benign and Harmful Data Racesallusing Replay Analysis. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI '07), June 2007. Google ScholarDigital Library
- E. B. Nightingale, D. Peek, P. M. Chen, and J. Flinn. Parallelizing Security Checks on Commodity Hardware. In Proceedings of the 13th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '08), Mar. 2008. Google ScholarDigital Library
- S. Osman, D. Subhraveti, G. Su, and J. Nieh. The Design and Implementation of Zap: A System for Migrating Computing Environments. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), Dec. 2002. Google ScholarDigital Library
- D. E. Porter, O. S. Hofmann, C. J. Rossbach, A. Benn, and E. Witchel. Operating System Transactions. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP '09), Oct. 2009. Google ScholarDigital Library
- D. P. Quigley, J. Sipek, C. P. Wright, and E. Zadok. UnionFS: User- and Community-oriented Development of a Unification Filesystem. In Proceedings of the 2006 Linux Symposium, July 2006.Google Scholar
- K. Sen. Race Directed Random Testing of Concurrent Programs. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI '08), June 2008. Google ScholarDigital Library
- S. M. Srinivasan, S. Kandula, C. R. Andrews, and Y. Zhou. Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging. In Proceedings of the USENIX Annual Technical Conference (USENIX '04), June 2004. Google ScholarDigital Library
- D. Tsafrir, T. Hertz, D. Wagner, and D. Da Silva. Portably Solving File TOCTTOU Races with Hardness Amplification. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08), Feb. 2008. Google ScholarDigital Library
- E. Tsyrklevich and B. Yee. Dynamic Detection and Prevention of Race Conditions in File Accesses. In Proceedings of the 12th Conference on USENIX Security Symposium, Aug. 2003. Google ScholarDigital Library
- University of California at Berkeley. Open-Source Software for Volunteer Computing and Grid Computing. http://boinc.berkeley.edu/.Google Scholar
- J. Wei and C. Pu. TOCTTOU Vulnerabilities in UNIX-Style File Systems: an Anatomical Study. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05), Dec. 2005. Google ScholarDigital Library
- J. Wu, H. Cui, and J. Yang. Bypassing Races in Live Applications with Execution Filters. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI '10), Oct. 2010. Google ScholarDigital Library
- M. Yabandeh, N. Knezevic, D. Kostic, and V. Kuncak. CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI '09), Apr. 2009. Google ScholarDigital Library
- Y. Yu, T. Rodeheffer, and W. Chen. RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP '05), Oct. 2005. Google ScholarDigital Library
- M. Zalewski. Delivering Signals for Fun and Profit. Bindview Corporation, 2001.Google Scholar
- W. Zhang, J. Lim, R. Olichandran, J. Scherpelz, G. Jin, S. Lu, and T. Reps. ConSeq: Detecting Concurrency Bugs through Sequential Errors. In Proceedings of the 16th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS '11), Mar. 2011. Google ScholarDigital Library
Index Terms
- Pervasive detection of process races in deployed systems
Recommendations
TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems
As other multithreaded programs, transactional memory (TM) programs are prone to race conditions. Previous work focuses on extending existing definitions of data race for lock-based applications to TM applications, which requires all transactions to be ...
Detecting harmful data races through parallel verification
Data races widely exist in concurrent programs and the harmful races have caused severe failures. To detect the harmful races, previous tools verify all the races, identifying the harmful ones. However, efficiency is affected when there are a large ...
On-the-fly race detection in multi-threaded programs
PADTAD '08: Proceedings of the 6th workshop on Parallel and distributed systems: testing, analysis, and debuggingMulti-core chips enable parallel processing for general purpose applications. Unfortunately, parallel programs may contain synchronization defects. Such defects are difficult to detect due to nondeterministic interleavings of parallel threads. Current ...
Comments