Published in: Empirical Software Engineering 2/2018

29.06.2017

Noise in Mylyn interaction traces and its impact on developers and recommendation systems

Authors: Zéphyrin Soh, Foutse Khomh, Yann-Gaël Guéhéneuc, Giuliano Antoniol


Abstract

Interaction traces (ITs) are developers’ logs collected while developers maintain or evolve software systems. Researchers use ITs to study developers’ editing styles and recommend relevant program entities when developers perform changes on source code. However, when using ITs, they make assumptions that may not necessarily be true. This article assesses the extent to which researchers’ assumptions are true and examines noise in ITs. It also investigates the impact of noise on previous studies. This article describes a quasi-experiment collecting both Mylyn ITs and video-screen captures while 15 participants performed four realistic software maintenance tasks. It assesses the noise in ITs by comparing Mylyn ITs and the ITs obtained from the video captures. It proposes an approach to correct noise and uses this approach to revisit previous studies. The collected data show that Mylyn ITs can miss, on average, about 6% of the time spent by participants performing tasks and can contain, on average, about 85% of false edit events, which are not real changes to the source code. The approach to correct noise reveals about 45% of misclassification of ITs. It can improve the precision and recall of recommendation systems from the literature by up to 56% and 62%, respectively. Mylyn ITs include noise that biases subsequent studies and, thus, can prevent researchers from assisting developers effectively. They must be cleaned before use in studies and recommendation systems. The results on Mylyn ITs open new perspectives for the investigation of noise in ITs generated by other monitoring tools such as DFlow, FeedBag, and Mimec, and for future studies based on ITs.

Footnotes
1
Example of IT shared in December 2015 by a developer when fixing a bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=483421
 
5
The archived systems are available in the replication package.
 
8
Inactivity periods are practically impossible to monitor. Even when these periods can be identified from the videos, it is still hard to know what developers were doing (e.g., thinking vs. reading the description of the task). However, we mainly use the detection of the mouse focus to know whether developers were reading (visually exploring) the code. Sometimes, developers were reading code while scrolling. Thus, reading code and scrolling the editor could be used interchangeably. This does not affect our study, because the kind of activity mainly used in the following is the edit activity, which is easy to identify.
 
9
The original version of this article (Soh et al. 2015) includes a mistake in quantifying the proportion of false edit events, which we corrected in this version. This correction affects only the proportion of the false edit events and does not affect our conclusions.
 
References
Amann S, Proksch S, Nadi S (2016) Feedbag: an interaction tracker for visual studio. In: 24th IEEE international conference on program comprehension, ICPC 2016, pp 1–3
Bantelay F, Zanjani M, Kagdi H (2013) Comparing and combining evolutionary couplings from interactions and commits. In: 2013 20th working conference on reverse engineering (WCRE), pp 311–320
Beller M, Gousios G, Panichella A, Zaidman A (2015) When, how, and why developers (do not) test in their ides. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering, ESEC/FSE 2015, pp 179–190
Bouckaert RR, Frank E, Hall M, Kirkby R, Reutemann P, Seewald A, Scuse D (2013) WEKA Manual for Version 3-7-8
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Int Res 16(1):321–357
DeLine R, Czerwinski M, Robertson G (2005) Easing program comprehension by sharing navigation data. In: 2005 IEEE symposium on visual languages and human-centric computing, pp 241–248
Fritz T, Shepherd DC, Kevic K, Snipes W, Bräunlich C (2014) Developers’ code context models for change tasks. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014, pp 7–18
Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto KI (2007) The effects of over and under sampling on fault-prone module detection. In: First international symposium on empirical software engineering and measurement, 2007. ESEM 2007, pp 196–204
Kersten M, Murphy GC (2005) Mylar: a degree-of-interest model for ides. In: Proceedings of the 4th international conference on aspect-oriented software development, AOSD ’05, pp 159–168
Kersten M, Murphy GC (2006) Using task context to improve programmer productivity. In: Proceedings of the 14th ACM SIGSOFT/FSE, pp 1–11
Ko A, Myers B, Coblenz M, Aung H (2006) An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Trans Softw Eng 32(12):971–987
Kuhn M (2008) Building predictive models in r using the caret package. J Stat Softw 28(5):1–26
Layman LM (2009) Information needs of developers for program comprehension during software maintenance tasks. Ph.D. thesis, North Carolina State University
Layman LM, Williams LA, St. Amant R (2008) Mimec: intelligent user notification of faults in the eclipse ide. In: Proceedings of the 2008 international workshop on cooperative and human aspects of software engineering, CHASE ’08, pp 73–76
Lee S, Kang S (2013) Clustering navigation sequences to create contexts for guiding code navigation. J Syst Softw
Lee S, Kang S, Kim S, Staats M (2015) The impact of view histories on edit recommendations. IEEE Trans Softw Eng 41(3):314–330
Minelli R, Mocci A, Lanza M, Kobayashi T (2014) Quantifying program comprehension with interaction data. In: 14th international conference on quality software, QSIC 2014
Murphy GC, Kersten M, Findlater L (2006) How are java software developers using the eclipse IDE? IEEE Soft 23(4):76–83
Parnin C, Rugaber S (2011) Resumption strategies for interrupted programming tasks. Softw Qual J 19(1):5–34
Robbes R, Lanza M (2010) Improving code completion with program history. Autom Softw Eng 17(2):181–212
Robbes R, Röthlisberger D (2013) Using developer interaction data to compare expertise metrics. In: Proceedings MSR, pp 297–300
Robillard M, Walker R, Zimmermann T (2010) Recommendation systems for software engineering. IEEE Soft 27(4):80–86
Romano J, Kromrey JD, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: should we really be using t-test and cohen’s d for evaluating group differences on the nsse and other surveys? In: Annual meeting of the Florida Association of Institutional Research
Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20(1):53–65
Sanchez H, Robbes R, Gonzalez VM (2015) An empirical study of work fragmentation in software evolution tasks. In: Proceedings SANER, pp 251–260
Singer J, Elves R, Storey MA (2005) Navtracks: supporting navigation in software maintenance. In: International conference on software maintenance, pp 325–334
Soh Z, Drioul T, Rappe PA, Khomh F, Gueheneuc YG, Habra N (2015) Noises in interaction traces data and their impact on previous research studies. In: 9th International symposium on empirical software engineering and measurement. To appear
Soh Z, Khomh F, Gueheneuc YG, Antoniol G (2013) Towards understanding how developers spend their effort during maintenance activities. In: 2013 20th working conference on reverse engineering (WCRE), pp 152–161
Soh Z, Khomh F, Gueheneuc YG, Antoniol G, Adams B (2013) On the effect of program exploration on maintenance tasks. In: 2013 20th Working conference on reverse engineering (WCRE), pp 391–400
Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining, chap. 6: Association analysis: basic concepts and algorithms. Pearson
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering, ICSE ’16, pp 321–332
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers Inc
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering—an introduction. Kluwer Academic Publishers
Ying A, Robillard M (2011) The influence of the task on programmer behaviour. In: Proceedings ICPC, pp 31–40
Zanjani MB, Swartzendruber G, Kagdi H (2014) Impact analysis of change requests on source code based on interaction and commit histories. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014, pp 162–171
Zhang F, Khomh F, Zou Y, Hassan AE (2012) An empirical study of the effect of file editing patterns on software quality. In: Proceedings WCRE, pp 456–465
Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th international conference on software engineering, ICSE ’16, pp 309–320
Zimmermann T, Weißgerber P, Diehl S, Zeller A (2005) Mining version histories to guide software changes. IEEE Trans Softw Eng 31(6):429–445
Metadata
Title
Noise in Mylyn interaction traces and its impact on developers and recommendation systems
Authors
Zéphyrin Soh
Foutse Khomh
Yann-Gaël Guéhéneuc
Giuliano Antoniol
Publication date
29.06.2017
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9529-x