ABSTRACT
We report on the fourth in a series of studies on the reliability of application programs in the face of random input. Over the previous 15 years, we have studied the reliability of UNIX command-line and X-Window based (GUI) applications and Windows applications. In this study, we apply our fuzz testing techniques to applications running on the Mac OS X operating system. We continue to use a simple, or even simplistic, technique: unstructured black-box random testing, considering a failure to be a crash or hang. As in the previous three studies, the technique is crude but seems to be effective in locating bugs in real programs.
We tested the reliability of 135 command-line UNIX utilities and thirty graphical applications on Mac OS X by feeding random input to each. We report on application failures, meaning crashes (the program dumps core) or hangs (the program loops indefinitely), and, where source code is available, we identify the causes of these failures and categorize them.
Our testing crashed only 7% of the command-line utilities, a considerably lower failure rate than observed in almost all of the previous studies. We found the GUI-based applications to be less reliable: of the thirty that we tested, only eight did not crash or hang. Twenty others crashed, and two hung. These GUI results were noticeably worse than those of either the previous Windows (Win32) or UNIX (X-Windows) studies.
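The technique described in the abstract (feed unstructured random input to a program, then classify the run as a crash or a hang) can be illustrated with a minimal sketch. This is not the authors' tooling. The names `random_bytes` and `fuzz_once`, the five-second hang threshold, and the choice to treat "killed by a signal" as a crash are all assumptions made for illustration, and the sketch covers only command-line programs that read standard input on a POSIX system.

```python
import random
import subprocess


def random_bytes(length, seed):
    """Return `length` unstructured random bytes, reproducible from `seed`."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def fuzz_once(argv, data, timeout_s=5.0):
    """Run `argv` with `data` on stdin and classify the outcome.

    Returns "hang" if the program is still running after `timeout_s`
    seconds (an assumed threshold), "crash" if it was terminated by a
    signal, and "ok" otherwise. The program is treated as a black box:
    its output and ordinary exit status are ignored.
    """
    try:
        proc = subprocess.run(
            argv,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    # On POSIX, a negative return code means the child died from a signal
    # (for example SIGSEGV or SIGABRT), which this sketch counts as a crash.
    return "crash" if proc.returncode < 0 else "ok"
```

A hypothetical run against one utility would look like `fuzz_once(["/usr/bin/sort"], random_bytes(10_000, seed=0))`. Keeping the seed makes any failing input reproducible for later diagnosis.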
Index Terms
- An empirical study of the robustness of MacOS applications using random testing