Published in: Empirical Software Engineering 1/2009

01.02.2009

An empirical analysis of information retrieval based concept location techniques in software comprehension

Authors: Brendan Cleary, Chris Exton, Jim Buckley, Michael English



Abstract

Concept location, the problem of associating human-oriented concepts with their counterpart solution-domain concepts, is a fundamental problem at the heart of software comprehension. Recent research has attempted to alleviate the concept location problem by applying methods drawn from the information retrieval (IR) community. Here we present a new approach based on a complementary IR method that also has a sound basis in cognitive theory. We compare our approach to related work through an experiment and present our conclusions. This research adapts and expands upon existing language modelling frameworks in IR for use in concept location in software systems. In doing so, it is novel in that it leverages implicit information available in system documentation. Surprisingly, empirical evaluation of this approach showed little overall performance benefit, and several possible explanations are put forward for this finding.
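The abstract does not give the model's equations. As a point of reference only, the sketch below shows a generic language-modelling retrieval baseline: query-likelihood ranking with Dirichlet-prior smoothing, applied to ranking source files against a concept query. It is not the authors' approach, which additionally exploits implicit information in system documentation. The identifier-splitting rule, the function names `tokenize` and `rank`, the toy files, and the default smoothing parameter `mu = 2000` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split source text into lower-case words, breaking camelCase,
    acronyms and snake_case identifiers (an assumed splitting rule)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return [p.lower() for p in parts]


def rank(query: str, docs: dict[str, str], mu: float = 2000.0) -> list[tuple[str, float]]:
    """Rank documents by query likelihood under a Dirichlet-smoothed
    unigram language model:

        score(Q, D) = sum over w in Q of
                      log((c(w, D) + mu * p(w | C)) / (|D| + mu))

    where p(w | C) is the term's relative frequency in the whole collection.
    """
    doc_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    total = sum(collection.values())
    if total == 0:
        return []  # nothing indexed, nothing to rank

    query_terms = tokenize(query)
    scores = {}
    for name, counts in doc_counts.items():
        length = sum(counts.values())
        score = 0.0
        for term in query_terms:
            p_collection = collection[term] / total
            if p_collection == 0.0:
                # Unseen in every document: skipped rather than yielding log(0).
                continue
            score += math.log((counts[term] + mu * p_collection) / (length + mu))
        scores[name] = score
    # Highest (least negative) log-likelihood first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "code units" standing in for a real system.
    files = {
        "Account.java": "class Account { double getBalance() { return balance; } "
                        "void deposit(double amount) {} }",
        "Printer.java": "class Printer { void printReport(String text) {} }",
    }
    for name, score in rank("account balance", files):
        print(f"{name}: {score:.3f}")
```

With the two toy files above, `Account.java` ranks ahead of `Printer.java` for the query "account balance". The parameter `mu` controls how strongly each document's term distribution is pulled toward collection statistics; for short code units such as individual methods, a smaller value may suit better, and it would need tuning on real data.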


Metadata
Title
An empirical analysis of information retrieval based concept location techniques in software comprehension
Authors
Brendan Cleary
Chris Exton
Jim Buckley
Michael English
Publication date
01.02.2009
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2009
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-008-9095-3
