ABSTRACT
Striving for reliable software systems often results in a very large number of tests. Because no annotation scheme is in general use, finding the parts of the code that these tests were meant to assess can be a demanding task. This software engineering problem is known as test-to-code traceability. Recent research on the subject has tackled the problem with various approaches and their combinations, achieving notable results. These approaches have relied on naming conventions followed during development and on various information retrieval (IR) methods that exploit what is often referred to as conceptual information. In this work, we investigate the benefits of the textual information found in source code and its value for aiding traceability. We evaluated the natural language processing technique called Latent Semantic Indexing (LSI) against the results of the naming conventions technique on five real, medium-sized software systems. Although LSI is already used for this purpose, we extend the one-to-one traceability viewpoint to the more versatile view of LSI as a recommendation system. We found that considering the top 5 elements of the ranked list improves the results by 30% on average and makes LSI a viable alternative in projects where naming conventions are not followed systematically.
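The recommendation-system view described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain term-frequency cosine similarity stands in for LSI (which would additionally project the term vectors into a reduced-rank space via a truncated SVD before comparing them), and every name, the `FooTest`/`TestFoo` naming heuristic, and the toy data are assumptions made for illustration. The sketch ranks candidate classes for each test and measures how often the class suggested by the naming convention appears among the top n recommendations.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase words, breaking camelCase identifiers apart."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [w.lower() for w in words]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_classes(test_text, classes):
    """Return class names ordered from most to least similar to the test text."""
    query = Counter(tokens(test_text))
    scored = [(cosine(query, Counter(tokens(body))), name)
              for name, body in classes.items()]
    # Sort by descending score; ties are broken alphabetically by class name.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


def expected_by_naming(test_name):
    """Assumed naming convention: 'FooTest' or 'TestFoo' tests class 'Foo'."""
    if test_name.endswith("Test"):
        return test_name[:-4]
    if test_name.startswith("Test"):
        return test_name[4:]
    return None


def top_n_hit_rate(tests, classes, n):
    """Fraction of tests whose naming-convention target is in the top n ranks."""
    hits = total = 0
    for test_name, test_text in tests.items():
        target = expected_by_naming(test_name)
        if target is None or target not in classes:
            continue  # no usable naming-convention link for this test
        total += 1
        if target in rank_classes(test_text, classes)[:n]:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical toy corpus: class name -> text extracted from its source.
classes = {
    "Stack": "class Stack push pop peek isEmpty items",
    "Queue": "class Queue enqueue dequeue peek isEmpty items size capacity",
    "Parser": "class Parser parseInput tokenize readToken",
}
# Hypothetical tests: test class name -> text extracted from its source.
tests = {
    "StackTest": "testPushPop stack push pop assertEquals",
    "QueueTest": "testPeek peek isEmpty items assertTrue",
}

if __name__ == "__main__":
    for n in (1, 2):
        print(f"top-{n} hit rate:", top_n_hit_rate(tests, classes, n))
```

On this toy corpus the second test shares its vocabulary with two classes, so the expected class is not ranked first but is still found once more than one recommendation is considered. That is the effect the ranked-list view is meant to capture.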