Published in: Empirical Software Engineering 4/2018

25.11.2017

The need for software specific natural language techniques

Authors: Dave Binkley, Dawn Lawrie, Christopher Morrell

Abstract

For over two decades, software engineering (SE) researchers have been importing tools and techniques from information retrieval (IR). Initial results have been quite positive. For example, when applied to problems such as feature location or re-establishing traceability links, IR techniques work well on their own, and often even better when combined with more traditional source code analysis techniques such as static and dynamic analysis. Recently, however, SE researchers have become increasingly aware that IR tools and techniques are designed under assumptions that differ from those that hold for software systems. Thus, it may be beneficial to consider IR-inspired tools and techniques that are designed specifically to work with software. One aim of this work is to provide quantitative empirical evidence in support of this observation. To do so, a new technique is introduced that captures the level of difficulty found in an information need: the true, often latent, information that a searcher wants to know. The new technique is used to compare two domains: Natural Language (NL) and SE. Analysis of the data leads to three significant findings. First, the variation in the difficulty distribution of the SE information needs differs from that of the NL information needs; second, collection age plays a role in the differences between the NL collections; and finally, the retrieval model used has little impact on the results.
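
The abstract does not describe how the new difficulty measure works, so the sketch below is not the authors' technique. It only illustrates a generic IR idea the abstract builds on: how hard an information need is can be gauged by how well a retrieval model answers it. It assumes a TF-IDF cosine model (one of many possible retrieval models) and uses average precision (AP) as the effectiveness score; the toy documents, queries, relevance judgments, and helper names (`rank`, `average_precision`, etc.) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization only. Splitting source-code identifiers
    # such as camelCase names is deliberately not attempted here.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Order document indices by descending TF-IDF cosine score (ties by index)."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the collection get weight 0.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))


def average_precision(ranking, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant document."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Hypothetical toy collection: identifiers already split into words.
    docs = [
        "parse config file and load settings",
        "render button widget on screen",
        "load user settings from config",
        "network socket timeout handling",
    ]
    ranking = rank("load config settings", docs)
    print(ranking, average_precision(ranking, relevant={0, 2}))  # [2, 0, 1, 3] 1.0
```

With the same relevant documents {0, 2}, the query "socket config" ranks the unrelated networking document first and yields an AP of 7/12 (about 0.583). In this simplified view, that information need is "harder" for this model. Swapping in a different retrieval model (for example, BM25 or a smoothed language model) would only change `rank`; the AP computation stays the same.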

Metadata
Title
The need for software specific natural language techniques
Authors
Dave Binkley
Dawn Lawrie
Christopher Morrell
Publication date
25.11.2017
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 4/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9566-5
