Abstract
As a software system evolves, its codebase changes constantly, making it difficult for developers to answer questions such as who is knowledgeable about a particular part of the code or who needs to know about a given change. In this article, we show that an externalized model of a developer's individual knowledge of code can make such questions easier to answer. We introduce a degree-of-knowledge model that automatically computes, for each source-code element in a codebase, a real value that represents a developer's knowledge of that element based on the developer's authorship and interaction data. We present evidence that both authorship and interaction data are important in characterizing a developer's knowledge of code. We report on the use of our model in case studies on expert finding, knowledge transfer, and identifying changes of interest. We show that our model improves upon an existing expertise-finding approach and can accurately identify changes of which a developer should likely be aware. We discuss how our model may provide a starting point for knowledge transfer, although more refinement is needed. Finally, we discuss the robustness of the model across multiple development sites.
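As a rough illustration of a model that maps authorship and interaction data to one real value per source-code element, the following minimal sketch may help. It is not the model from the article: the `ElementHistory` fields, the logarithmic damping, and the three weights are all assumptions chosen only to make the example runnable.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementHistory:
    """Hypothetical record of one developer's history with one code element."""
    authored_changes: int  # changes this developer committed to the element
    others_changes: int    # changes committed by other developers
    interactions: int      # selections and edits recorded in the IDE


# Placeholder weights for illustration only; they are not fitted values.
W_AUTHORED = 1.0
W_OTHERS = -0.5
W_INTERACTIONS = 0.25


def degree_of_knowledge(h: ElementHistory) -> float:
    """Combine authorship and interaction data into a single real value."""
    return (
        W_AUTHORED * math.log1p(h.authored_changes)
        + W_OTHERS * math.log1p(h.others_changes)
        + W_INTERACTIONS * math.log1p(h.interactions)
    )


def rank_experts(histories: dict[str, ElementHistory]) -> list[str]:
    """Order developers from most to least knowledgeable about one element."""
    return sorted(
        histories,
        key=lambda dev: degree_of_knowledge(histories[dev]),
        reverse=True,
    )
```

Under these assumptions, an expert-finding query reduces to calling `rank_experts` with one `ElementHistory` per developer for the element in question. Changes by other developers lower the score, which reflects the intuition that code a developer wrote may no longer match what they remember.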
Index Terms
- Degree-of-knowledge: Modeling a developer's knowledge of code