Published in: Empirical Software Engineering 6/2016

09.11.2015

Large-scale information retrieval in software engineering - an experience report from industrial application

Authors: Michael Unterkalmsteiner, Tony Gorschek, Robert Feldt, Niklas Lavesson

Abstract

Software Engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to understand the extent to which IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding into its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry-grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in industry.

Footnotes
1
Note that the terms NLP and IR are used independently in the literature to describe computerized processing of text, and there is an overlap in what they comprise (Falessi et al. 2013). In the remainder of the paper we use exclusively the term IR to maintain consistency.
 
2
Grossman and Frieder (2004) provide a broader overview.
 
6
Nevertheless, standard functionality can be extended through RapidMiner's plugin mechanism.
 
8
Unfortunately, it is not possible to determine whether this potential assumption violation had an impact on the outcome of the analysis, since the raw data for this particular part (Part 1) of the case study has not been published by the authors.
 
9
They use inferential statistics to compare the best IR technique with the optimal combination of IR techniques, but this comparison is questionable since the preceding selection of best technique is based on descriptive statistics.
 
References
Abadi A, Nisenson M, Simionovici Y (2008) A Traceability Technique for Specifications. In: Proceedings 16th international conference on program comprehension (ICPC), IEEE. Amsterdam, The Netherlands, pp 103–112
Antoniol G, Canfora G, De Lucia A, Merlo E (1999) Recovering code to documentation links in OO systems. In: Proceedings 6th working conference on reverse engineering (WCRE), IEEE. Atlanta, USA, pp 136–144
Antoniol G, Canfora G, Casazza G, De Lucia A (2000) Information retrieval models for recovering traceability links between code and documentation. In: Proceedings international conference on software maintenance (ICSM), IEEE. San Jose, USA, pp 40–49
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983
Arcuri A, Fraser G (2011) On Parameter Tuning in Search Based Software Engineering. In: Proceedings 3rd international conference on search based software engineering (SSBSE), Springer. Szeged, Hungary, pp 33–47
Asuncion H, Asuncion A, Taylor R (2010) Software traceability with topic modeling. In: Proceedings 32nd international conference on software engineering (ICSE), IEEE. Cape Town, South Africa, pp 95–104
Babar MA, Lianping C, Shull F (2010) Managing Variability in Software Product Lines. IEEE Software 27(3):89–91
Basili V, Caldiera G (1995) Improve software quality by reusing knowledge and experience. Sloan Manage Rev 37(1):55–64
Bender R, Lange S (2001) Adjusting for multiple testing - when and how? J Clin Epidemiol 54(4):343–349
Berry MW, Mehzer D, Philippe B, Sameh A (2006) Parallel Algorithms for the Singular Value Decomposition. In: Handbook of Parallel Computing and Statistics, 1st edn, CRC Press
Biggers LR, Bocovich C, Capshaw R, Eddy BP, Etzkorn LH, Kraft NA (2014) Configuring latent Dirichlet allocation based feature location. Empir Softw Eng 19(3):465–500
Biggerstaff TJ, Mitbander BG, Webster D (1993) The Concept Assignment Problem in Program Understanding. In: Proceedings 15th international conference on software engineering (ICSE), IEEE. Baltimore, USA, pp 482–498
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet Allocation. J Mach Learn Res 3:993–1022
Borg M, Runeson P, Ardö A (2013) Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir Softw Eng:1–52, in print
Broy M (2006) Challenges in Automotive Software Engineering. In: Proceedings 28th international conference on software engineering (ICSE), ACM. Shanghai, China, pp 33–42
Calcote J (2010) Autotools: A Practitioner’s Guide to GNU Autoconf, Automake, and Libtool. No Starch Press
Chen L, Babar MA (2011) A systematic review of evaluation of variability management approaches in software product lines. Inf Softw Technol 53(4):344–362
Cleary B, Exton C, Buckley J, English M (2009) An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Softw Eng 14(1):93–130
Cleland-Huang J, Czauderna A, Dekhtyar A, Gotel O, Hayes J H, Keenan E, Leach G, Maletic J, Poshyvanyk D, Shin Y, Zisman A, Antoniol G, Berenbach B, Egyed A, Maeder P (2011) Grand Challenges, Benchmarks, and TraceLab: Developing Infrastructure for the Software Traceability Research Community. In: Proceedings 6th international workshop on traceability in emerging forms of software engineering (TEFSE), ACM. Honolulu, USA, pp 17–23
Clements P, Northrop L (2001) Software Product Lines: Practices and Patterns, 3rd edn. Addison-Wesley Professional, Boston
Corazza A, Di Martino S, Maggio V (2012) LINSEN: An efficient approach to split identifiers and expand abbreviations. In: Proceedings 28th international conference on software maintenance (ICSM), IEEE. Trento, Italy, pp 233–242
Crawley MJ (2007) The R Book, 1st edn. Wiley
Cullum JK, Willoughby RA (2002) Lanczos Algorithms for Large Symmetric Eigenvalue Computations: Vol. 1: Theory, 2nd edn. SIAM
De Lucia A, Fasano F, Oliveto R, Tortora G (2006) Can Information Retrieval Techniques Effectively Support Traceability Link Recovery? In: Proceedings 14th international conference on program comprehension (ICPC), IEEE. Athens, Greece, pp 307–316
De Lucia A, Fasano F, Oliveto R, Tortora G (2007) Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4)
De Lucia A, Oliveto R, Tortora G (2009) Assessing IR-based traceability recovery tools through controlled experiments. Empir Softw Eng 14(1):57–92
De Lucia A, Di Penta M, Oliveto R, Panichella A, Panichella S (2011) Improving IR-based Traceability Recovery Using Smoothing Filters. In: Proceedings 19th international conference on program comprehension (ICPC), IEEE. Kingston, Canada, pp 21–30
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by Latent Semantic Analysis. J Am Soc Inf Sci 41(6):391–407
Dit B, Guerrouj L, Poshyvanyk D, Antoniol G (2011a) Can Better Identifier Splitting Techniques Help Feature Location? In: Proceedings 19th international conference on program comprehension (ICPC), IEEE. Kingston, Canada, pp 11–20
Dit B, Revelle M, Gethers M, Poshyvanyk D (2011b) Feature location in source code: a taxonomy and survey. Journal of Software Maintenance and Evolution: Research and Practice 25(1):53–95
Dit B, Panichella A, Moritz E, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) Configuring topic models for software engineering tasks in TraceLab. In: Proceedings 7th international workshop on traceability in emerging forms of software engineering (TEFSE), IEEE. San Francisco, USA, pp 105–109
Dit B, Moritz E, Linares-Vásquez M, Poshyvanyk D, Cleland-Huang J (2014) Supporting and accelerating reproducible empirical research in software evolution and maintenance using TraceLab Component Library. Empir Softw Eng:1–39
Dumais ST (1992) LSI meets TREC: a status report. In: NIST special publication, National Institute of Standards and Technology, pp 137–152
Dyba T, Kitchenham B, Jorgensen M (2005) Evidence-based software engineering for practitioners. IEEE Softw 22(1):58–65
Ebert C, Jones C (2009) Embedded Software: Facts, Figures, and Future. Computer 42(4):42–52
Edgington E, Onghena P (2007) Randomization Tests, 4th edn. Chapman and Hall/CRC, Boca Raton, FL
Engström E, Runeson P (2011) Software product line testing - A systematic mapping study. Inf Softw Technol 53(1):2–13
Engström E, Runeson P, Skoglund M (2010) A systematic review on regression test selection techniques. Inf Softw Technol 52(1):14–30
Enslen E, Hill E, Pollock L, Vijay-Shanker K (2009) Mining source code to automatically split identifiers for software analysis. In: Proceedings 6th international working conference on mining software repositories (MSR), IEEE. Vancouver, Canada, pp 71–80
Falessi D, Cantone G, Canfora G (2013) Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements via Natural Language Processing Techniques. Transactions on Software Engineering 39(1):18–44
Feinerer I, Hornik K, Meyer D (2008) Text Mining Infrastructure in R. J Stat Softw 25(5):1–54
Feldman SI (1979) Make - a program for maintaining computer programs. Software: Practice and Experience 9(4):255–265
Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: Proceedings 28th international conference on software maintenance (ICSM), IEEE. Edmonton, Canada, pp 351–360
Gethers M, Oliveto R, Poshyvanyk D, De Lucia A (2011) On integrating orthogonal information retrieval methods to improve traceability recovery. In: Proceedings 27th international conference on software maintenance (ICSM), IEEE. Williamsburg, USA, pp 133–142
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning, 1st edn. Addison Wesley, Boston, USA
Gorschek T, Wohlin C, Carre P, Larsson S (2006) A Model for Technology Transfer in Practice. IEEE Software 23(6):88–95
Gotel O, Finkelstein CW (1994) An analysis of the requirements traceability problem. In: Proceedings 1st international conference on requirements engineering (RE), IEEE. Colorado Springs, USA, pp 94–101
Graaf B, Lormans M, Toetenel H (2003) Embedded software engineering: the state of the practice. IEEE Softw 20(6):61–69
Grant S, Cordy JR, Skillicorn DB (2013) Using heuristics to estimate an appropriate number of latent topics in source code analysis. Sci Comput Program 78(9):1663–1678
Grossman D, Frieder O (2004) Information Retrieval - Algorithms and Heuristics, The Information Retrieval Series vol 15, 2nd edn. Springer, New York
Guerrouj L, Di Penta M, Antoniol G, Guéhéneuc YG (2011) TIDIER: an identifier splitting approach using speech recognition techniques. J Softw Maint Evol Res Pract 25(6):575–599
Hernandez V, Roman JE, Vidal V (2005) SLEPc: A Scalable and Flexible Toolkit for the Solution of Eigenvalue Problems. ACM Trans Math Softw 31(3):351–362
Hill E, Fry ZP, Boyd H, Sridhara G, Novikova Y, Pollock L, Vijay-Shanker K (2008) AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools. In: Proceedings 5th international working conference on mining software repositories (MSR), ACM. Leipzig, Germany, pp 79–88
Islam M, Marchetto A, Susi A, Scanniello G (2012) A Multi-Objective Technique to Prioritize Test Cases Based on Latent Semantic Indexing. In: Proceedings 16th European conference on software maintenance and reengineering (CSMR), IEEE. Szeged, Hungary, pp 21–30
Ivarsson M, Gorschek T (2011) A method for evaluating rigor and industrial relevance of technology evaluations. Empir Softw Eng 16(3):365–395
Jiang HY, Nguyen TN, Chen IX, Jaygarl H, Chang CK (2008) Incremental Latent Semantic Indexing for Automatic Traceability Link Evolution Management. In: Proceedings 23rd international conference on automated software engineering (ASE), IEEE. L’Aquila, Italy, pp 59–68
Lavesson N, Davidsson P (2006) Quantifying the Impact of Learning Algorithm Parameter Tuning. In: Proceedings 21st national conference on artificial intelligence (AAAI), AAAI Press. Boston, USA, pp 395–400
Lawrie D, Binkley D (2011) Expanding identifiers to normalize source code vocabulary. In: Proceedings 27th IEEE international conference on software maintenance (ICSM), IEEE. Williamsburg, USA, pp 113–122
Lee J, Kang S, Lee D (2012) A Survey on Software Product Line Testing. In: Proceedings 16th international software product line conference (SPLC), ACM. Salvador, Brazil, pp 31–40
Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature Location via Information Retrieval Based Filtering of a Single Scenario Execution Trace. In: Proceedings 22nd international conference on automated software engineering (ASE), ACM. Atlanta, USA, pp 234–243
Liu TY (2011) Learning to Rank for Information Retrieval, 1st edn. Springer
Lohar S, Amornborvornwong S, Zisman A, Cleland-Huang J (2013) Improving Trace Accuracy Through Data-driven Configuration and Composition of Tracing Features. In: Proceedings 9th joint meeting on foundations of software engineering (FSE), ACM. Saint Petersburg, Russia, pp 378–388
Lormans M, van Deursen A (2006) Can LSI help reconstructing requirements traceability in design and test? In: Proceedings 10th European conference on software maintenance and reengineering (CSMR), IEEE. Bari, Italy, pp 47–56
Ludbrook J, Dudley H (1998) Why Permutation Tests are Superior to t and F Tests in Biomedical Research. Am Stat 52(2):127–132
Maletic J, Marcus A (2000) Using latent semantic analysis to identify similarities in source code to support program understanding. In: Proceedings 12th international conference on tools with artificial intelligence (ICTAI), IEEE. Vancouver, Canada, pp 46–53
Maletic J (2001) Supporting Program Comprehension Using Semantic and Structural Information. In: Proceedings 23rd international conference on software engineering (ICSE), IEEE. Toronto, Canada, pp 103–112
Maletic J, Valluri N (1999) Automatic software clustering via Latent Semantic Analysis. In: Proceedings 14th international conference on automated software engineering (ASE), IEEE. Cocoa Beach, USA, pp 251–254
Maletic J, Collard M, Marcus A (2002) Source code files as structured documents. In: Proceedings 10th international workshop on program comprehension (IWPC), IEEE. Paris, France, pp 289–292
Marcus A, Maletic J (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings 25th international conference on software engineering (ICSE), IEEE. Portland, USA, pp 125–135
Marcus A, Sergeyev A, Rajlich V, Maletic J (2004) An information retrieval approach to concept location in source code. In: Proceedings 11th working conference on reverse engineering (WCRE), IEEE. Delft, The Netherlands, pp 214–223
Mittas N, Angelis L (2008) Comparing cost prediction models by resampling techniques. J Syst Softw 81(5):616–632
Moreno L, Bandara W, Haiduc S (2013) On the Relationship between the Vocabulary of Bug Reports and Source Code. In: Proceedings 29th international conference on software maintenance (ICSM), IEEE. Eindhoven, The Netherlands, pp 452–455
Oliveto R, Gethers M, Poshyvanyk D, De Lucia A (2010) On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery. In: Proceedings 18th international conference on program comprehension (ICPC), IEEE. Braga, Portugal, pp 68–71
Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms. In: Proceedings 35th international conference on software engineering (ICSE), IEEE. San Francisco, USA, pp 522–531
Perrouin G, Sen S, Klein J, Baudry B, Le Traon Y (2010) Automated and Scalable T-wise Test Case Generation Strategies for Software Product Lines. In: Proceedings 3rd international conference on software testing, verification and validation (ICST), pp 459–468
Zurück zum Zitat Pettersson F, Ivarsson M, Gorschek T, Öhman P (2008) A practitioner’s guide to light weight software process assessment and improvement planning. The Journal of Systems and Software 81(6):972–995CrossRef Pettersson F, Ivarsson M, Gorschek T, Öhman P (2008) A practitioner’s guide to light weight software process assessment and improvement planning. The Journal of Systems and Software 81(6):972–995CrossRef
Zurück zum Zitat Pohl K, Metzger A (2006) Software Product Line Testing. Commun ACM 49 (12):78–81CrossRef Pohl K, Metzger A (2006) Software Product Line Testing. Commun ACM 49 (12):78–81CrossRef
Zurück zum Zitat Poshyvanyk D, Marcus A (2007) Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code. In: Proceedings 15th international conference on program comprehension (ICPC), IEEE. Banff, Canada, pp 37–48CrossRef Poshyvanyk D, Marcus A (2007) Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code. In: Proceedings 15th international conference on program comprehension (ICPC), IEEE. Banff, Canada, pp 37–48CrossRef
Zurück zum Zitat Poshyvanyk D, Gueheneuc YG, Marcus A, Antoniol G, Rajlich V (2006) Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification. In: Proceedings 14th international conference on program comprehension (ICPC), IEEE. Athens, Greece, pp 137–148CrossRef Poshyvanyk D, Gueheneuc YG, Marcus A, Antoniol G, Rajlich V (2006) Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification. In: Proceedings 14th international conference on program comprehension (ICPC), IEEE. Athens, Greece, pp 137–148CrossRef
Zurück zum Zitat Poshyvanyk D, Gethers M, Marcus A (2012) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol 21(4) Poshyvanyk D, Gethers M, Marcus A (2012) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol 21(4)
Zurück zum Zitat Qusef A, Bavota G, Oliveto R, De Lucia A, Binkley D (2014) Recovering test-to-code traceability using slicing and textual analysis. J Syst Softw 88:147–168CrossRef Qusef A, Bavota G, Oliveto R, De Lucia A, Binkley D (2014) Recovering test-to-code traceability using slicing and textual analysis. J Syst Softw 88:147–168CrossRef
Zurück zum Zitat Rothermel G, Harrold M (1996) Analyzing regression test selection techniques. IEEE Trans Softw Eng 22(8):529–551CrossRef Rothermel G, Harrold M (1996) Analyzing regression test selection techniques. IEEE Trans Softw Eng 22(8):529–551CrossRef
Zurück zum Zitat Salton G, Wong A, Yang CS (1975) A Vector Space Model for Automatic Indexing. Commun ACM 18(11):613–620CrossRefMATH Salton G, Wong A, Yang CS (1975) A Vector Space Model for Automatic Indexing. Commun ACM 18(11):613–620CrossRefMATH
Zurück zum Zitat Shepperd M, Bowes D, Hall T (2014) Researcher Bias: The Use of Machine Learning in Software Defect Prediction. IEEE Trans Softw Eng 40(6):603–616CrossRef Shepperd M, Bowes D, Hall T (2014) Researcher Bias: The Use of Machine Learning in Software Defect Prediction. IEEE Trans Softw Eng 40(6):603–616CrossRef
Zurück zum Zitat Sheskin DJ (2000) Handbook of Parametric and Nonparametric Statistical Procedures Second Edition, 2nd edn. Chapman and Hall/CRC, Boca RatonMATH Sheskin DJ (2000) Handbook of Parametric and Nonparametric Statistical Procedures Second Edition, 2nd edn. Chapman and Hall/CRC, Boca RatonMATH
Zurück zum Zitat Smucker MD, Allan J, Carterette B (2007) Acomparison of statistical significance tests for information retrieval evaluation. In: Proceedings 16th conference on information and knowledge management (CIKM), ACM. Lisbon, Portugal, pp 623–632 Smucker MD, Allan J, Carterette B (2007) Acomparison of statistical significance tests for information retrieval evaluation. In: Proceedings 16th conference on information and knowledge management (CIKM), ACM. Lisbon, Portugal, pp 623–632
Zurück zum Zitat Steyvers M, Griffiths T (2007) Probabilistic topic models. In: Handbook of Latent Semantic Analysis, 1st edn, Psychology Press Steyvers M, Griffiths T (2007) Probabilistic topic models. In: Handbook of Latent Semantic Analysis, 1st edn, Psychology Press
Zurück zum Zitat Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The Impact of Classifier Configuration and Classifier Combination on Bug Localization. IEEE Trans Softw Eng 39(10):1427–1443CrossRef Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The Impact of Classifier Configuration and Classifier Combination on Bug Localization. IEEE Trans Softw Eng 39(10):1427–1443CrossRef
Zurück zum Zitat Thomas SW, Hemmati H, Hassan AE, Blostein D (2014) Static test case prioritization using topic models. Empir Softw Eng 19(1):182–212CrossRef Thomas SW, Hemmati H, Hassan AE, Blostein D (2014) Static test case prioritization using topic models. Empir Softw Eng 19(1):182–212CrossRef
Zurück zum Zitat Thörn C (2010) Current state and potential of variability management practices in software-intensive SMEs: Results from a regional industrial survey. Inf Softw Technol 52(4):411–421CrossRef Thörn C (2010) Current state and potential of variability management practices in software-intensive SMEs: Results from a regional industrial survey. Inf Softw Technol 52(4):411–421CrossRef
Zurück zum Zitat Utting M, Pretschner A, Legeard B (2012) A taxonomy of model-based testing approaches. Software Testing. Verification and Reliability 22(5):297–312CrossRef Utting M, Pretschner A, Legeard B (2012) A taxonomy of model-based testing approaches. Software Testing. Verification and Reliability 22(5):297–312CrossRef
Zurück zum Zitat Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publishers, NorwellCrossRefMATH Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publishers, NorwellCrossRefMATH
Zurück zum Zitat Wohlin C, Aurum A, Angelis L, Phillips L, Dittrich Y, Gorschek T, Grahn H, Henningsson K, Kagstrom S, Low G, Rovegard P, Tomaszewski P, van Toorn C, Winter J (2012) The Success Factors Powering Industry-Academia Collaboration. IEEE Softw 29(2):67–73CrossRef Wohlin C, Aurum A, Angelis L, Phillips L, Dittrich Y, Gorschek T, Grahn H, Henningsson K, Kagstrom S, Low G, Rovegard P, Tomaszewski P, van Toorn C, Winter J (2012) The Success Factors Powering Industry-Academia Collaboration. IEEE Softw 29(2):67–73CrossRef
Zurück zum Zitat Xia X, Lo D, Wang X, Zhou B (2015) Dual analysis for recommending developers to resolve bugs. Journal of Software: Evolution and Process In Print Xia X, Lo D, Wang X, Zhou B (2015) Dual analysis for recommending developers to resolve bugs. Journal of Software: Evolution and Process In Print
Zurück zum Zitat Yoo S, Harman M (2012) Regression testing minimization, selection and prioritization: a survey. Software Testing. Verification and Reliability 22(2):67–120CrossRef Yoo S, Harman M (2012) Regression testing minimization, selection and prioritization: a survey. Software Testing. Verification and Reliability 22(2):67–120CrossRef
Zurück zum Zitat Zeimpekis D (2006) TMG: A MATLAB Toolbox for Generating Term-Document Matrices from Text Collections. In: Grouping Multidimensional Data, Springer. Berlin, Heidelberg, pp 187–210CrossRef Zeimpekis D (2006) TMG: A MATLAB Toolbox for Generating Term-Document Matrices from Text Collections. In: Grouping Multidimensional Data, Springer. Berlin, Heidelberg, pp 187–210CrossRef
Zurück zum Zitat Zhao W, Zhang L, Liu Y, Sun J, Yang F (2006) SNIAFL: Towards a static noninteractive approach to feature location. ACM Trans Softw Eng Methodol 15(2):195–226CrossRef Zhao W, Zhang L, Liu Y, Sun J, Yang F (2006) SNIAFL: Towards a static noninteractive approach to feature location. ACM Trans Softw Eng Methodol 15(2):195–226CrossRef
Zurück zum Zitat Řehůřek R (2011) Subspace Tracking for Latent Semantic Analysis. In: Proceedings 33rd European conference on advances in information retrieval (ECIR), Springer. Dublin, Ireland, pp 289–300 Řehůřek R (2011) Subspace Tracking for Latent Semantic Analysis. In: Proceedings 33rd European conference on advances in information retrieval (ECIR), Springer. Dublin, Ireland, pp 289–300
Metadata
Title
Large-scale information retrieval in software engineering - an experience report from industrial application
Authors
Michael Unterkalmsteiner
Tony Gorschek
Robert Feldt
Niklas Lavesson
Publication date
9 November 2015
Publisher
Springer US
Published in
Empirical Software Engineering, Issue 6/2016
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-015-9410-8
