An empirical study on configuration errors in commercial and open source systems

Published: 23 October 2011
DOI: 10.1145/2043556.2043572

ABSTRACT

Configuration errors (i.e., misconfigurations) are among the dominant causes of system failures. Their importance has inspired many research efforts on detecting, diagnosing, and fixing misconfigurations; such research would benefit greatly from a real-world characteristic study on misconfigurations. Unfortunately, few such studies have been conducted in the past, primarily because historical misconfigurations usually have not been recorded rigorously in databases.

In this work, we undertake one of the first attempts to conduct a real-world misconfiguration characteristic study. We study a total of 546 real-world misconfigurations, including 309 misconfigurations from a commercial storage system deployed at thousands of customers, and 237 from four widely used open source systems (CentOS, MySQL, Apache HTTP Server, and OpenLDAP). Our major findings include: (1) A majority of misconfigurations (70.0%~85.5%) are due to mistakes in setting configuration parameters; however, a significant number of misconfigurations are due to compatibility issues or component configurations (i.e., not parameter-related). (2) 38.1%~53.7% of parameter mistakes are caused by illegal parameters that clearly violate some format or rules, motivating the use of an automatic configuration checker to detect these misconfigurations. (3) A significant percentage (12.2%~29.7%) of parameter-based mistakes are due to inconsistencies between different parameter values. (4) 21.7%~57.3% of the misconfigurations involve configurations external to the examined system, some even on entirely different hosts. (5) A significant portion of misconfigurations can cause hard-to-diagnose failures, such as crashes, hangs, or severe performance degradation, indicating that systems should be better equipped to handle misconfigurations.
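Findings (2) and (3) suggest two kinds of checks that an automatic configuration checker might perform: per-parameter format rules and cross-parameter consistency rules. The following is a minimal sketch of that idea, not the checker or methodology used in the study. The parameter names (`port`, `log_level`, `cache_size`, `memory_limit`), the rules themselves, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical size syntax: an integer followed by a K, M, or G suffix.
_SIZE = re.compile(r"[0-9]+[KMG]")
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def _is_port(value: str) -> bool:
    """A port is legal if it is a decimal integer in 1..65535."""
    return re.fullmatch(r"[0-9]{1,5}", value) is not None and 1 <= int(value) <= 65535


def _is_size(value: str) -> bool:
    return _SIZE.fullmatch(value) is not None


# Hypothetical format rules: parameter name -> predicate on the raw string value.
FORMAT_RULES: Dict[str, Callable[[str], bool]] = {
    "port": _is_port,
    "log_level": lambda v: v in {"debug", "info", "warn", "error"},
    "cache_size": _is_size,
    "memory_limit": _is_size,
}


def to_bytes(value: str) -> int:
    """Convert a size such as '512M' to bytes (assumes the value passed _is_size)."""
    return int(value[:-1]) * _UNITS[value[-1]]


def check_config(config: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems found in `config`."""
    problems: List[str] = []
    illegal = set()

    # Format checks: each value must satisfy its own rule (finding 2).
    for name, is_legal in FORMAT_RULES.items():
        if name in config and not is_legal(config[name]):
            illegal.add(name)
            problems.append(f"illegal value for {name}: {config[name]!r}")

    # Consistency check across parameters (finding 3): the cache must fit
    # within the memory limit. Skipped if either value is missing or illegal.
    pair = ("cache_size", "memory_limit")
    if all(p in config and p not in illegal for p in pair):
        if to_bytes(config["cache_size"]) > to_bytes(config["memory_limit"]):
            problems.append(
                f"inconsistent: cache_size ({config['cache_size']}) "
                f"exceeds memory_limit ({config['memory_limit']})"
            )
    return problems


if __name__ == "__main__":
    sample = {"port": "99999", "cache_size": "2G", "memory_limit": "512M"}
    for problem in check_config(sample):
        print(problem)
```

In this sketch the format rules run first, and a consistency rule is evaluated only when every value it depends on is already known to be well formed, so a single malformed value is reported once rather than triggering a second, misleading error.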


Published in

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN: 9781450309776
DOI: 10.1145/2043556

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 131 of 716 submissions, 18%
