ABSTRACT
Test suites should test exceptional behavior to detect faults in error-handling code. However, manually written test suites tend to neglect exceptional behavior. Automatically generated test suites, on the other hand, lack test oracles that verify whether runtime exceptions are the expected behavior of the code under test.
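To make the missing-oracle problem concrete, the Java sketch below shows the decision an exceptional-behavior oracle has to make for each execution of a generated test. If the documentation predicts an exception and that exception is thrown, the test passes. If a predicted exception never appears, the test fails. Any other exception is also reported. The class `ExceptionOracleSketch`, its `Verdict` values, and the `check` method are hypothetical names chosen for illustration, not Toradocu's API. The sketch also assumes the expected exception type has already been derived from the documentation.

```java
import java.util.ArrayList;

/**
 * Hypothetical sketch (not Toradocu's API): the three-way decision an
 * exceptional-behavior oracle makes about one execution of the code under test.
 */
class ExceptionOracleSketch {

    /** Verdicts for a single execution. */
    enum Verdict { PASS, MISSING_EXPECTED_EXCEPTION, UNEXPECTED_EXCEPTION }

    /**
     * Runs {@code call} and classifies the outcome.
     *
     * @param expected the exception type the documentation requires for these inputs,
     *     or {@code null} if no documented exceptional condition holds
     * @param call the invocation of the method under test
     * @return the oracle's verdict
     */
    static Verdict check(Class<? extends Throwable> expected, Runnable call) {
        try {
            call.run();
        } catch (Throwable t) {
            // An exception is acceptable only if the documentation predicts this type (or a subtype).
            return (expected != null && expected.isInstance(t))
                    ? Verdict.PASS
                    : Verdict.UNEXPECTED_EXCEPTION;
        }
        // Normal termination is acceptable only if no exception was required.
        return expected == null ? Verdict.PASS : Verdict.MISSING_EXPECTED_EXCEPTION;
    }

    public static void main(String[] args) {
        // ArrayList(int) is documented to throw IllegalArgumentException for a negative capacity.
        System.out.println(check(IllegalArgumentException.class, () -> new ArrayList<String>(-1)));
        System.out.println(check(IllegalArgumentException.class, () -> new ArrayList<String>(10)));
        System.out.println(check(null, () -> new ArrayList<String>(10)));
    }
}
```

Run as written, `main` prints `PASS`, `MISSING_EXPECTED_EXCEPTION`, and `PASS`. Catching `Throwable` keeps the sketch short. A real harness would probably treat errors such as `OutOfMemoryError` separately.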
This paper proposes a technique that automatically creates test oracles for exceptional behaviors from Javadoc comments. The technique uses a combination of natural language processing and runtime instrumentation. Our implementation, Toradocu, can be combined with a test input generation tool. Our experimental evaluation shows that Toradocu improves the fault-finding effectiveness of EvoSuite and Randoop test suites by 8% and 16%, respectively, and reduces EvoSuite's false positives by 33%.
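The translation step can be illustrated, in a drastically simplified form, by turning the condition of a `@throws` clause into an executable predicate over a call's arguments. In this sketch a single regular expression stands in for natural language processing, so it understands only the phrasing "if <parameter> is null/negative/zero". The class name `ThrowsCommentTranslator`, the map-of-arguments representation, and the Javadoc clause in `main` are all hypothetical and are not taken from Toradocu.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.function.DoublePredicate;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: turns the condition of a {@code @throws} clause into an
 * executable predicate over the call's arguments (parameter name -> value).
 * A regular expression stands in for natural language processing.
 */
class ThrowsCommentTranslator {

    // Accepts only "if <parameterName> is null|negative|zero".
    private static final Pattern CONDITION =
            Pattern.compile("\\bif\\s+(\\w+)\\s+is\\s+(null|negative|zero)\\b");

    static Optional<Predicate<Map<String, Object>>> translate(String throwsClause) {
        Matcher m = CONDITION.matcher(throwsClause);
        if (!m.find()) {
            return Optional.empty(); // unrecognized phrasing: emit no oracle rather than guess
        }
        String param = m.group(1);
        Predicate<Map<String, Object>> condition;
        switch (m.group(2)) {
            case "null":
                condition = args -> args.containsKey(param) && args.get(param) == null;
                break;
            case "negative":
                condition = args -> isNumberWhere(args.get(param), v -> v < 0);
                break;
            default: // "zero"
                condition = args -> isNumberWhere(args.get(param), v -> v == 0);
                break;
        }
        return Optional.of(condition);
    }

    // True only if the value is numeric and satisfies the given test.
    private static boolean isNumberWhere(Object value, DoublePredicate test) {
        return value instanceof Number && test.test(((Number) value).doubleValue());
    }

    public static void main(String[] args) {
        // Hypothetical Javadoc clause; the parameter name is illustrative.
        Predicate<Map<String, Object>> throwsIae =
                translate("@throws IllegalArgumentException if initialCapacity is negative").get();
        Map<String, Object> negative = Collections.singletonMap("initialCapacity", -1);
        Map<String, Object> positive = Collections.singletonMap("initialCapacity", 10);
        System.out.println(throwsIae.test(negative)); // true: the exception is expected
        System.out.println(throwsIae.test(positive)); // false: normal termination is expected
    }
}
```

When the predicate holds for a generated input, the oracle should expect the documented exception; when it does not, the call should terminate normally. Real Javadoc rarely follows one template (for example, `ArrayList(int)` says "if the specified initial capacity is negative"), and mapping such noun phrases to parameters is the kind of variability that natural language processing is meant to handle.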
Index Terms
- Automatic generation of oracles for exceptional behaviors