ABSTRACT
Regression performance testing is an important but time- and resource-consuming phase of software development. Developers need to detect performance regressions as early as possible to reduce their negative impact and fixing cost. However, conducting regression performance testing frequently (e.g., after each commit) is prohibitively expensive. To address this issue, in this paper we propose PerfRanker, the first approach to prioritizing test cases in performance regression testing for collection-intensive software, a common type of modern software that makes heavy use of collections. Our test prioritization is based on performance impact analysis, which estimates the performance impact of a given code revision on a given test execution. Evaluation shows that our approach covers the top 3 test cases whose performance is most affected within the top 30% to 37% of prioritized test cases, in contrast to the top 65% to 79% for three baseline techniques.
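The core idea in the abstract, namely running first the tests whose performance a revision is estimated to affect most, can be sketched as follows. This is an illustrative sketch only, not the paper's actual performance impact analysis; the impact scores here are hypothetical placeholders for the estimates such an analysis would produce.

```python
def prioritize(tests, impact_score):
    """Order tests by descending estimated performance impact.

    `impact_score` maps a test name to a (hypothetical) estimate of how
    strongly the code revision affects that test's execution time.
    Tests with no estimate default to zero impact and run last.
    """
    return sorted(tests, key=lambda t: impact_score.get(t, 0.0), reverse=True)


# Made-up impact estimates for three example tests.
scores = {"test_parse": 0.8, "test_sort": 0.1, "test_merge": 0.5}
order = prioritize(["test_sort", "test_parse", "test_merge"], scores)
print(order)  # most-affected test first
```

Under a fixed testing budget, executing the prioritized prefix of the suite surfaces the largest regressions earliest, which is the property the evaluation above measures.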
Supplemental Material
Available for Download
The data set and a prototype for running the test prioritization process.