DOI: 10.1145/3194104.3194106
research-article

Exploring the benefits of utilizing conceptual information in test-to-code traceability

Published: 28 May 2018

ABSTRACT

Striving for reliable software systems often results in immense numbers of tests. Because there is no generally used annotation, finding the parts of the code that these tests were meant to assess can be a demanding task. This software engineering problem is called test-to-code traceability. Recent research on the subject has tackled it with various approaches and their combinations, achieving strong results. These approaches have relied on naming conventions followed during development and have also used various information retrieval (IR) methods, often referred to as conceptual information. In this work we investigate the benefits of the textual information located in software code and its value for aiding traceability. We evaluated the capabilities of the natural language processing technique called Latent Semantic Indexing (LSI) in light of the results of the naming conventions technique on five real, medium-sized software systems. Although LSI is already used for this purpose, we extend the one-to-one traceability viewpoint to the more versatile view of LSI as a recommendation system. We found that considering the top 5 elements in the ranked list improves the results by 30% on average and makes LSI a viable alternative in projects where naming conventions are not followed systematically.
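The difference between a one-to-one link and a top-N recommendation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the class names, and the similarity scores are all hypothetical, and the scores merely stand in for what an LSI model would produce. The sketch assumes the common convention that a test class is named after the class under test with a `Test` prefix or suffix.

```python
from typing import Dict, List, Optional


def naming_convention_match(test_class: str, production_classes: List[str]) -> Optional[str]:
    """Assumed convention: 'FooTest' or 'TestFoo' tests the class 'Foo'."""
    if test_class.endswith("Test"):
        candidate = test_class[: -len("Test")]
    elif test_class.startswith("Test"):
        candidate = test_class[len("Test"):]
    else:
        return None  # the test does not follow the convention
    return candidate if candidate in production_classes else None


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring classes (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def hit_rate(rankings: Dict[str, Dict[str, float]], expected: Dict[str, str], n: int) -> float:
    """Fraction of tests whose expected class appears among the top n candidates."""
    hits = sum(1 for test, scores in rankings.items() if expected.get(test) in top_n(scores, n))
    return hits / len(rankings)


# Hypothetical similarity scores, standing in for the output of an LSI model.
rankings = {
    "StackTest": {"Stack": 0.91, "Queue": 0.40, "Node": 0.35},
    "ParserChecks": {"Lexer": 0.80, "Parser": 0.74, "Token": 0.20},
}
# Hypothetical ground truth: the class each test is meant to assess.
expected = {"StackTest": "Stack", "ParserChecks": "Parser"}

print(naming_convention_match("StackTest", ["Stack", "Queue", "Node"]))
print(naming_convention_match("ParserChecks", ["Lexer", "Parser", "Token"]))
print(hit_rate(rankings, expected, 1), hit_rate(rankings, expected, 2))
```

In this toy data `ParserChecks` does not follow the naming convention, so the name-based matcher returns nothing for it, and the similarity ranking places the expected class second rather than first. Accepting the top two candidates instead of only the first therefore recovers the link, which is the kind of gain the recommendation view is meant to capture.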

      • Published in

        RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
        May 2018
        67 pages
        ISBN: 9781450357234
        DOI: 10.1145/3194104

        Copyright © 2018 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States
