Article

Active learning for automatic classification of software behavior

Authors:
James F. Bowring

Georgia Institute of Technology, Atlanta, Georgia

Georgia Institute of Technology, Atlanta, Georgia
View Profile

,
James M. Rehg

Georgia Institute of Technology, Atlanta, Georgia

Georgia Institute of Technology, Atlanta, Georgia
View Profile

,
Mary Jean Harrold

Georgia Institute of Technology, Atlanta, Georgia

Georgia Institute of Technology, Atlanta, Georgia
View Profile

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysisJuly 2004Pages 195–205https://doi.org/10.1145/1007512.1007539

Published:01 July 2004Publication History

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis

Pages 195–205

ABSTRACT

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is on the automatic classification of program behavior using execution data. Prior work on classifiers for software engineering adopts a classical batch-learning approach. In contrast, we explore an active-learning paradigm for behavior classification. In active learning, the classifier is trained incrementally on a series of labeled data elements. Secondly, we explore the thesis that certain features of program behavior are stochastic processes that exhibit the Markov property, and that the resultant Markov models of individual program executions can be automatically clustered into effective predictors of program behavior. We present a technique that models program executions as Markov models, and a clustering method for Markov models that aggregates multiple program executions into effective behavior classifiers. We evaluate an application of active learning to the efficient refinement of our classifiers by conducting three empirical studies that explore a scenario illustrating automated test plan augmentation.

References

G. Ammons, R. Bodik, and J. R. Larus. Mining specifications. In Proceedings of the 2002 ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02), pages 4--16, January 2002. Google ScholarDigital Library
Aristotle Research Group. Aristotle: Software engineering tools, 2002. http://www.cc.gatech.edu/aristotle/.Google Scholar
G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, Boston, 1998. Google ScholarDigital Library
Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In Proceedings of the 26th International Conference on Software Engineering. Google ScholarDigital Library
D. A. Cohn, L. Atlas, and R. E. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201--221, 1994. Google ScholarCross Ref
J. E. Cook and A. L. Wolf. Automating process discovery through event-data analysis. In Proceedings of the 17th International Conference on Software Engineering (ICSE'95), pages 73--82, January 1999. Google ScholarDigital Library
W. Dickinson, D. Leon, and A. Podgurski. Finding failures by cluster analysis of execution profiles. In Proceedings of the 23rd International Conference on Software Engineering (ICSE'01), pages 339--348, May 2001. Google ScholarDigital Library
T. G. Dietterich. Machine learning for sequential data: A review. In T. Caelli, editor, Structural, Syntactic, and Statistical Pattern Recognition, volume 2396 of Lecture Notes in Computer Science, pages 15--30. Springer-Verlag, 2002. Google ScholarDigital Library
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, Inc., New York, 2001. Google ScholarDigital Library
M. Girolami and A. Kaban. Simplicial mixtures of markov chains: Distributed modelling of dynamic user profiles. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.Google Scholar
K. C. Gross, S. McMaster, A. Porter, A. Urmanov, and L. Votta. Proactive system maintenance using software telemetry. In Proceedings of the 1st International Conference on Remote Analysis and Measurement of Software Systems (RAMSS'03), pages 24--26, May 2003.Google Scholar
M. Harder, J. Mellen, and M. D. Ernst. Improving test suites via operational abstraction. In Proceedings of the 25rd International Conference on Software Engineering (ICSE'03), pages 60--71, May 2003. Google ScholarDigital Library
M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the relationship between fault-revealing test behavior and differences in program spectra. Journal of Software Testing, Verifications, and Reliability, 10(3), September 2000.Google Scholar
K. Ilgun, R. A. Kemmerer, and P. A. Porras. State transition analysis: A rule-based intrusion detection approach. Software Engineering, 21(3):181--199, 1995. Google ScholarDigital Library
S. Jha, K. Tan, and R. A. Maxion. Markov chains, classifiers, and intrusion detection. In Proceedings of the 14th IEEE Computer Security Foundations Workshop (CSFW'01), pages 206--219, June 2001. Google ScholarDigital Library
J. A. Kowal. Behavior Models: Specifying User's Expectations. Prentice Hall, Englewood Cliffs, New Jersey, 1992. Google ScholarDigital Library
T. M. Mitchell. Machine Learning. McGraw-Hill, Boston, 1997. Google ScholarDigital Library
J. C. Munson and S. Elbaum. Software reliability as a function of user execution patterns. In Proceedings of the Thirty-second Annual Hawaii International Conference on System Sciences, January 1999. Google ScholarDigital Library
J. Musa. Software Reliability Engineering: More Reliable Software, Faster Development and Testing. McGraw-Hill, New York, 1999. Google ScholarDigital Library
A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In Proceedings of the 25rd International Conference on Software Engineering (ICSE'03), pages 465--474, May 2003. Google ScholarDigital Library
S. J. Prowell, C. J. Trammell, R. C. Linger, and J. H. Poore. Cleanroom Software Engineering: Technology and Process. Addison-Wesley, Reading, Mass., 1999. Google ScholarDigital Library
L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice Hall, New Jersey, 1993. Google ScholarDigital Library
T. Reps, T. Ball, M. Das, and J. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. ACM Software Engineering Notes, 22(6):432--439, November 1997. Google ScholarDigital Library
J. A. Whittaker and J. H. Poore. Markov analysis of software specifications. ACM Transactions on Software Engineering and Methodology, 2(1):93--106, January 1996. Google ScholarDigital Library

Index Terms

Recommendations

Active learning for automatic classification of software behavior

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded. Thus it is especially suited to statistical analysis and machine learning techniques. The primary focus of this ...
Read More
Active learning for text classification with reusability

We investigate the reusability problem in active learning for text classification.The reusability problem affects active learning systems for text classification.If the consumer classifier type is known, it should be used for the selector.Local and ...
Read More
When does my program do this? learning circumstances of software behavior
ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

A program fails. Under which circumstances does the failure occur? Our Alhazenapproach starts with a run that exhibits a particular behavior and automatically determines input features associated with the behavior in question: (1) We use a grammar to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
July 2004
294 pages
ISBN:1581138202
DOI:10.1145/1007512
General Chair:
George Avrunin
University of Massachusetts, USA
,
Program Chair:
Gregg Rothermel
University of Nebraska -- Lincoln, USA
ACM SIGSOFT Software Engineering Notes Volume 29, Issue 4
July 2004
284 pages
ISSN:0163-5948
DOI:10.1145/1013886
Issue’s Table of Contents
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 July 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Markov models
machine learning
software behavior
software testing
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate58of213submissions,27%
Upcoming Conference
ISSTA '24

Sponsor:

sigsoft

33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 16 - 20, 2024

Vienna , Austria
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 146
  Total Citations
  View Citations
- 1,920
  Total Downloads
- Downloads (Last 12 months)41
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Active learning for automatic classification of software behavior

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis

ABSTRACT

References

Cited By

Index Terms

Recommendations

Active learning for automatic classification of software behavior

Active learning for text classification with reusability

When does my program do this? learning circumstances of software behavior