Published in: Empirical Software Engineering 2/2015

1 April 2015

An empirical study on the importance of source code entities for requirements traceability

Authors: Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, Giuliano Antoniol

Abstract

Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both), and thus creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based techniques for creating RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers’ eye movements while they verify RT links. We analyse the obtained data to identify and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers’ preferred types of SCEs, and not their locations, that attracts developers’ attention and helps them in their task of verifying RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers’ preferred types of SCEs to give more importance to these SCEs in the term weighting. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
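
The idea of giving more importance to preferred SCE types in a TF/IDF-style weighting can be illustrated with a minimal sketch. This is not the DPTF/IDF formula from the paper, which the abstract does not state. The boost values in `ENTITY_BOOST`, the function names, and the `(term, entity_type)` input format are all assumptions made for illustration, and the LSI step is omitted.

```python
import math
from collections import Counter

# Hypothetical boost factors per SCE type; the abstract gives no values.
ENTITY_BOOST = {
    "class_name": 2.0,
    "method_name": 1.5,
    "variable_name": 1.0,
    "comment": 1.0,
}


def boosted_tf(doc):
    """Term frequencies where each occurrence counts with its entity boost.

    doc is a list of (term, entity_type) pairs; unknown types count as 1.0.
    The result is normalised by the total boosted count of the document.
    """
    counts = Counter()
    for term, kind in doc:
        counts[term] += ENTITY_BOOST.get(kind, 1.0)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def idf(docs):
    """Plain inverse document frequency: log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        for term in {term for term, _ in doc}:
            doc_freq[term] += 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def boosted_tfidf(docs):
    """One {term: weight} dictionary per document (assumed scheme)."""
    inv = idf(docs)
    return [
        {term: tf * inv[term] for term, tf in boosted_tf(doc).items()}
        for doc in docs
    ]
```

With these assumed boosts, a term that occurs once as a class name receives twice the weight of a term that occurs once in a comment of the same document, while a term present in every document still receives a weight of zero, as in ordinary TF/IDF.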

Footnotes
1
In this paper, we call “source code entities” any domain-level term, implementation-level term, class name, method name, variable name, or comment found in a piece of code. Domain concepts are concepts pertaining to the use of the system by users. Implementation concepts relate to data structures, GUI elements, databases, and algorithms. For example, in the Pooka e-mail client, addAddress in the AddressBook.java class and addFocusListener in AddressEntryTextArea.java are domain-level and implementation-level concepts, respectively.
 
5
We consider any object X to be a source code class, i.e., c_i.
 
Metadata
Title
An empirical study on the importance of source code entities for requirements traceability
Authors
Nasir Ali
Zohreh Sharafi
Yann-Gaël Guéhéneuc
Giuliano Antoniol
Publication date
1 April 2015
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2015
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-014-9315-y
