Published in: Empirical Software Engineering 6/2014

01.12.2014

An experimental investigation on the effects of context on source code identifiers splitting and expansion

Authors: Latifa Guerrouj, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol

Abstract

Recent and past studies indicate that source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers from abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers who must understand identifiers by decomposing them into their component terms and mapping those terms onto dictionary, application, or domain words. When software documentation is scarce, outdated, or simply unavailable, developers must therefore rely on the available contextual information to understand the source code. This paper investigates how developers split and expand source code identifiers and, specifically, the extent to which different kinds of contextual information can support this task. In particular, we consider (i) an internal context consisting of the content of the functions and source code files in which the identifiers are located, and (ii) an external context consisting of external documentation. We conducted a family of two experiments with 63 participants, including bachelor's, master's, and Ph.D. students as well as post-docs. We randomly sampled 50 identifiers from a corpus of open source C programs and asked participants to split and expand them with and without access to internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that source code files are more helpful than the source code of the enclosing function alone, and that application-level contextual information does not help further. External sources of information help only in some circumstances. Also, in some cases participants expanded acronyms more accurately than abbreviations, although in most cases both exhibit the same level of accuracy.
Finally, the results indicate that knowledge of English has a significant effect on identifier splitting and expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that integrating identifier splitting and expansion tools into IDEs could help improve developers' productivity.
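To make the task concrete, here is a minimal Python sketch of splitting an identifier into terms and expanding short forms with the help of surrounding code. It is a toy illustration under stated assumptions, not the authors' experimental material or any tool evaluated in the study: the word list `DICTIONARY`, the expansion table `EXPANSIONS`, and the functions `split_on_separators`, `split_by_dictionary`, `context_terms`, `expand`, and `split_and_expand` are all hypothetical. Explicit separators (underscores, digits, case changes) are handled with regular expressions; separator-free tokens such as `strlen` are segmented by a fewest-words dynamic program; and an abbreviation with several candidate expansions is resolved by counting how often each candidate's words occur in the surrounding code, a crude stand-in for the internal context.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical toy word list; a real splitter would use a large English/domain dictionary.
DICTIONARY = {"get", "file", "name", "size", "max", "string", "length", "str", "len", "num"}

# Hypothetical table of short forms (abbreviations/acronyms) and candidate expansions.
EXPANSIONS = {
    "str": ["string", "stream"],
    "len": ["length"],
    "buf": ["buffer"],
    "num": ["number"],
    "fd": ["file descriptor"],
}


def split_on_separators(identifier: str) -> list[str]:
    """Split on explicit separators: underscores, digits, and case changes."""
    words = []
    for chunk in re.split(r"[_\d]+", identifier):
        # "getFileName" -> get, File, Name; "HTTPServer" -> HTTP, Server
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", chunk))
    return [w.lower() for w in words]


def split_by_dictionary(token: str, dictionary: set[str]) -> list[str]:
    """Segment a separator-free token (e.g. "strlen") into dictionary words.

    Dynamic programming over prefixes, keeping the segmentation with the fewest
    words; if no full segmentation exists, the token is returned unchanged.
    """
    best: list[list[str] | None] = [[]] + [None] * len(token)
    for end in range(1, len(token) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and token[start:end] in dictionary:
                candidate = prefix + [token[start:end]]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1] if best[-1] is not None else [token]


def context_terms(context_source: str) -> Counter:
    """Count split, lowercased terms occurring in the surrounding code."""
    terms: Counter = Counter()
    for raw in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", context_source):
        terms.update(split_on_separators(raw))
    return terms


def expand(term: str, context: Counter) -> str:
    """Pick the candidate expansion whose words occur most often in the context.

    Ties (including an empty context) fall back to the first listed candidate;
    terms with no known expansion are returned unchanged.
    """
    candidates = EXPANSIONS.get(term)
    if not candidates:
        return term
    return max(candidates, key=lambda exp: sum(context[w] for w in exp.split()))


def split_and_expand(identifier: str, context_source: str = "") -> list[str]:
    """Split an identifier, then expand each resulting term using the context."""
    context = context_terms(context_source)
    return [
        expand(token, context)
        for part in split_on_separators(identifier)
        for token in split_by_dictionary(part, DICTIONARY)
    ]


if __name__ == "__main__":
    print(split_and_expand("MAX_BUF_SIZE"))
    print(split_and_expand("strlen"))
    print(split_and_expand("str", "FILE *fp; read_stream(fp); stream"))
```

With these toy tables, `split_and_expand("MAX_BUF_SIZE")` gives `["max", "buffer", "size"]` and `split_and_expand("strlen")` gives `["string", "length"]`, while `split_and_expand("str", "FILE *fp; read_stream(fp); stream")` gives `["stream"]` because the surrounding code mentions streams. Practical splitters and expanders rely on much larger dictionaries and more careful scoring; this sketch only mirrors the shape of the task that participants performed by hand.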

Footnotes
5
Significant p-values are highlighted in boldface here and in all other tables.
Metadata
Title
An experimental investigation on the effects of context on source code identifiers splitting and expansion
Authors
Latifa Guerrouj
Massimiliano Di Penta
Yann-Gaël Guéhéneuc
Giuliano Antoniol
Publication date
01.12.2014
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2014
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-013-9260-1
