skip to main content
10.1145/1007512.1007539acmconferencesArticle/Chapter ViewAbstractPublication PagesisstaConference Proceedingsconference-collections
Article

Active learning for automatic classification of software behavior

Published:01 July 2004Publication History

ABSTRACT

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is on the automatic classification of program behavior using execution data. Prior work on classifiers for software engineering adopts a classical batch-learning approach. In contrast, we explore an active-learning paradigm for behavior classification. In active learning, the classifier is trained incrementally on a series of labeled data elements. Secondly, we explore the thesis that certain features of program behavior are stochastic processes that exhibit the Markov property, and that the resultant Markov models of individual program executions can be automatically clustered into effective predictors of program behavior. We present a technique that models program executions as Markov models, and a clustering method for Markov models that aggregates multiple program executions into effective behavior classifiers. We evaluate an application of active learning to the efficient refinement of our classifiers by conducting three empirical studies that explore a scenario illustrating automated test plan augmentation.

References

  1. G. Ammons, R. Bodik, and J. R. Larus. Mining specifications. In Proceedings of the 2002 ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02), pages 4--16, January 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aristotle Research Group. Aristotle: Software engineering tools, 2002. http://www.cc.gatech.edu/aristotle/.Google ScholarGoogle Scholar
  3. G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, Boston, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In Proceedings of the 26th International Conference on Software Engineering. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. A. Cohn, L. Atlas, and R. E. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201--221, 1994. Google ScholarGoogle ScholarCross RefCross Ref
  6. J. E. Cook and A. L. Wolf. Automating process discovery through event-data analysis. In Proceedings of the 17th International Conference on Software Engineering (ICSE'95), pages 73--82, January 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. W. Dickinson, D. Leon, and A. Podgurski. Finding failures by cluster analysis of execution profiles. In Proceedings of the 23rd International Conference on Software Engineering (ICSE'01), pages 339--348, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. G. Dietterich. Machine learning for sequential data: A review. In T. Caelli, editor, Structural, Syntactic, and Statistical Pattern Recognition, volume 2396 of Lecture Notes in Computer Science, pages 15--30. Springer-Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, Inc., New York, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Girolami and A. Kaban. Simplicial mixtures of markov chains: Distributed modelling of dynamic user profiles. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.Google ScholarGoogle Scholar
  11. K. C. Gross, S. McMaster, A. Porter, A. Urmanov, and L. Votta. Proactive system maintenance using software telemetry. In Proceedings of the 1st International Conference on Remote Analysis and Measurement of Software Systems (RAMSS'03), pages 24--26, May 2003.Google ScholarGoogle Scholar
  12. M. Harder, J. Mellen, and M. D. Ernst. Improving test suites via operational abstraction. In Proceedings of the 25rd International Conference on Software Engineering (ICSE'03), pages 60--71, May 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the relationship between fault-revealing test behavior and differences in program spectra. Journal of Software Testing, Verifications, and Reliability, 10(3), September 2000.Google ScholarGoogle Scholar
  14. K. Ilgun, R. A. Kemmerer, and P. A. Porras. State transition analysis: A rule-based intrusion detection approach. Software Engineering, 21(3):181--199, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Jha, K. Tan, and R. A. Maxion. Markov chains, classifiers, and intrusion detection. In Proceedings of the 14th IEEE Computer Security Foundations Workshop (CSFW'01), pages 206--219, June 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. A. Kowal. Behavior Models: Specifying User's Expectations. Prentice Hall, Englewood Cliffs, New Jersey, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. M. Mitchell. Machine Learning. McGraw-Hill, Boston, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. C. Munson and S. Elbaum. Software reliability as a function of user execution patterns. In Proceedings of the Thirty-second Annual Hawaii International Conference on System Sciences, January 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Musa. Software Reliability Engineering: More Reliable Software, Faster Development and Testing. McGraw-Hill, New York, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In Proceedings of the 25rd International Conference on Software Engineering (ICSE'03), pages 465--474, May 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. J. Prowell, C. J. Trammell, R. C. Linger, and J. H. Poore. Cleanroom Software Engineering: Technology and Process. Addison-Wesley, Reading, Mass., 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice Hall, New Jersey, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Reps, T. Ball, M. Das, and J. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. ACM Software Engineering Notes, 22(6):432--439, November 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. A. Whittaker and J. H. Poore. Markov analysis of software specifications. ACM Transactions on Software Engineering and Methodology, 2(1):93--106, January 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Active learning for automatic classification of software behavior

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
                  July 2004
                  294 pages
                  ISBN:1581138202
                  DOI:10.1145/1007512
                  • cover image ACM SIGSOFT Software Engineering Notes
                    ACM SIGSOFT Software Engineering Notes  Volume 29, Issue 4
                    July 2004
                    284 pages
                    ISSN:0163-5948
                    DOI:10.1145/1013886
                    Issue’s Table of Contents

                  Copyright © 2004 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 1 July 2004

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • Article

                  Acceptance Rates

                  Overall Acceptance Rate58of213submissions,27%

                  Upcoming Conference

                  ISSTA '24

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader