research-article
Open Access

Synthesizing configuration file specifications with association rule learning

Published: 12 October 2017

Abstract

System failures resulting from configuration errors are a major cause of reduced reliability in today's software systems. Although many techniques have been proposed for configuration error detection, these approaches can generally only be applied after an error has occurred. Proactively verifying configuration files is challenging because 1) software configurations are typically written in poorly structured and untyped “languages”, and 2) writing rules for configuration verification is difficult in practice. This paper presents ConfigV, a verification framework for general software configurations. The framework works in two phases: in a pre-processing stage, it automatically derives a specification, and it then checks whether a given configuration file adheres to that specification. Learning a specification takes three steps. First, ConfigV parses a training set of configuration files (not necessarily all correct) into a well-structured, probabilistically typed intermediate representation. Second, using association rule learning, ConfigV learns rules from these intermediate representations; the rules establish relationships between the keywords appearing in the files. Finally, ConfigV refines the resulting rules with rule graph analysis. ConfigV can detect various configuration errors, including ordering errors, integer correlation errors, type errors, and missing entry errors. We evaluated ConfigV by verifying public configuration files on GitHub, and we show that ConfigV detects known configuration errors in these files.
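The association rule learning step can be illustrated with a small sketch. The Python below is not ConfigV's implementation: it assumes MySQL-style `my.cnf` files in a simplified INI format, flattens each file into a set of `section.key` keywords (no value typing), and learns only pairwise rules of the form "if keyword A appears, keyword B also appears", scored by support count and confidence as in classic association rule mining. Violations of such rules correspond to the missing entry errors mentioned above. The function names `parse_cnf`, `learn_rules`, and `check`, the thresholds `min_support=2` and `min_confidence=0.9`, and the toy training files are all hypothetical.

```python
"""Hypothetical sketch: pairwise association rules over configuration keywords.

Not ConfigV's implementation. It learns only "missing entry" style rules
(A present => B present) from a toy training set of INI-style my.cnf files.
"""
from collections import Counter
from itertools import permutations


def parse_cnf(text):
    """Turn an INI-style file into a set of 'section.key' keywords (assumed format)."""
    section = ""
    keys = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line or line.startswith(";"):  # skip blanks and ';' comment lines
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        key = line.split("=", 1)[0].strip()  # bare flags have no '='
        keys.add(f"{section}.{key}")
    return keys


def learn_rules(files, min_support=2, min_confidence=0.9):
    """Return sorted rules (A, B, support, confidence): "if A appears, B appears".

    support    = number of training files containing both A and B
    confidence = support / number of files containing A
    A confidence threshold below 1.0 tolerates some incorrect training files.
    """
    single = Counter()
    pair = Counter()
    for keys in files:
        single.update(keys)
        pair.update(permutations(keys, 2))  # every ordered pair of distinct keys
    rules = []
    for (a, b), support in pair.items():
        confidence = support / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


def check(keys, rules):
    """Report (A, B) pairs where A is present but the implied B is missing."""
    return sorted({(a, b) for a, b, _, _ in rules if a in keys and b not in keys})


if __name__ == "__main__":
    # Toy training set: made-up files, not real data.
    training = [
        "[mysqld]\nslow_query_log = 1\nslow_query_log_file = /var/log/slow.log\nport = 3306\n",
        "[mysqld]\nslow_query_log = 1\nslow_query_log_file = /tmp/slow.log\n",
        "[mysqld]\nport = 3307\nslow_query_log = 1\nslow_query_log_file = /x.log\n",
        "[mysqld]\nport = 3306\n",
    ]
    rules = learn_rules([parse_cnf(t) for t in training])
    for rule in rules:
        print(rule)
    target = parse_cnf("[mysqld]\nslow_query_log = 1\nport = 3306\n")
    print(check(target, rules))
```

On this toy data the sketch keeps only the two rules linking `slow_query_log` and `slow_query_log_file`: `port` co-occurs with them in only two of three files, which falls below the 0.9 confidence threshold. The target file is then flagged for containing `slow_query_log` without `slow_query_log_file`. The learned rules reflect only the made-up training set, not MySQL's documented behavior, and the sketch omits the value typing, the ordering, integer correlation, and type rules, and the rule graph refinement described in the abstract.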

Published in

Proceedings of the ACM on Programming Languages, Volume 1, Issue OOPSLA
October 2017
1786 pages
EISSN: 2475-1421
DOI: 10.1145/3152284

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
