research-article

Exploring the benefits of utilizing conceptual information in test-to-code traceability

Authors:
András Kicsi, University of Szeged, Szeged, Hungary
László Tóth, University of Szeged, Szeged, Hungary
László Vidács, University of Szeged, Szeged, Hungary

RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, May 2018, Pages 8–14. https://doi.org/10.1145/3194104.3194106

Published: 28 May 2018

ABSTRACT

Striving for reliability of software systems often results in immense numbers of tests. Because no annotation scheme is in general use, finding the parts of the code that these tests were meant to assess can be a demanding task. This software engineering problem is known as test-to-code traceability. Recent research on the subject has addressed it with various approaches and their combinations, achieving notable results. These approaches have relied on naming conventions followed during development and on various information retrieval (IR) methods, which exploit what is often referred to as conceptual information. In this work we investigate the benefits of the textual information located in software code and its value for aiding traceability. We evaluated the capabilities of the natural language processing technique called Latent Semantic Indexing (LSI) in light of the results of the naming conventions technique on five real, medium-sized software systems. Although LSI is already used for this purpose, we extend the one-to-one traceability viewpoint to the more versatile view of LSI as a recommendation system. We found that considering the top 5 elements in the ranked list improves the results by 30% on average and makes LSI a viable alternative in projects where naming conventions are not followed systematically.
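
The contrast between a one-to-one naming-convention match and a ranked top-N recommendation can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it substitutes plain term-frequency cosine similarity for LSI, so it shows only the ranking and top-N idea, not the latent semantic step. All function names, the tokenizer, and the toy class texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words, breaking camelCase at capital letters.

    Simplification: acronyms such as 'HTTP' are split into single letters.
    """
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", text)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def naming_convention_match(test_name, class_names):
    """Baseline: strip a leading or trailing 'Test' and require an exact class name."""
    candidate = re.sub(r"^Test|Test$", "", test_name)
    return candidate if candidate in class_names else None


def rank_classes(test_text, classes):
    """Rank production classes by textual similarity to the test, best first.

    Assumption: plain term-frequency vectors stand in for an LSI topic space.
    """
    query = Counter(tokenize(test_text))
    scored = [
        (cosine(query, Counter(tokenize(body))), name)
        for name, body in classes.items()
    ]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


def hit_at_n(ranked, expected, n):
    """True if the expected class appears among the first n recommendations."""
    return expected in ranked[:n]


if __name__ == "__main__":
    # Hypothetical toy corpus: class name -> text extracted from its source.
    classes = {
        "Stack": "class Stack push pop peek items isEmpty",
        "Queue": "class Queue enqueue dequeue items isEmpty",
        "Parser": "class Parser parse token input",
    }
    # A test whose name does not follow the naming convention.
    test_name = "CollectionBehaviourTest"
    test_text = "testPushThenPop stack push pop isEmpty"

    print(naming_convention_match(test_name, classes))
    ranked = rank_classes(test_text, classes)
    print(ranked, hit_at_n(ranked, "Stack", 5))
```

In this sketch the naming-convention baseline either finds an exact match or returns nothing, whereas the ranked list always offers candidates, which a developer can inspect down to a chosen cutoff such as the top 5.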

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Process validation
        Traceability

Published in
RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
May 2018
67 pages
ISBN: 9781450357234
DOI: 10.1145/3194104
Conference Chairs: Walter F. Tichy (Karlsruhe Institute of Technology, Germany), Leandro Minku (University of Leicester, UK)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LSI
natural language processing (nlp)
testing
traceability