ABSTRACT
Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help address these problems.
Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (a graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework's flexibility, we built two applications, Hoozizat and Deep Intellisense, and evaluated them with engineers to show their effectiveness in addressing multiple inter-team coordination problems.
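The abstract pairs one data structure (a graph of people and artifacts) with one algorithm (regular language reachability). As an illustration of the general technique only, not Codebook's actual implementation, the sketch below runs a breadth-first search over the product of an edge-labelled graph and a DFA, returning every node reachable along a path whose label sequence the DFA accepts. The node names, the edge labels (`opened`, `mentions`, `changedBy`, `committedBy`, `reportsTo`), and the hand-written DFA standing in for a compiled regular expression are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each node maps to its outgoing (label, neighbour) edges.
Graph = Dict[str, List[Tuple[str, str]]]
# A DFA as a transition table: (state, edge label) -> next state.
Dfa = Dict[Tuple[int, str], int]


def regular_reachability(graph: Graph, dfa: Dfa, start_state: int,
                         accepting: Set[int], source: str) -> Set[str]:
    """Return every node reachable from `source` along a path whose edge
    labels spell a word accepted by the DFA.

    Breadth-first search over the product of the graph and the DFA; each
    (node, state) pair is enqueued at most once, so cycles terminate.
    """
    seen = {(source, start_state)}
    queue = deque([(source, start_state)])
    results: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            results.add(node)
        for label, neighbour in graph.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # this label cannot extend a matching path
            pair = (neighbour, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return results


# Hypothetical people/artifact graph; names and labels are illustrative only.
graph: Graph = {
    "alice": [("opened", "bug42"), ("reportsTo", "erin")],
    "bug42": [("mentions", "Parser.cs")],
    "Parser.cs": [("changedBy", "cs1017")],
    "cs1017": [("committedBy", "bob")],
    "bob": [("reportsTo", "carol")],
    "carol": [("reportsTo", "dave")],
}

# DFA for the pattern: opened mentions changedBy committedBy reportsTo*
dfa: Dfa = {
    (0, "opened"): 1,
    (1, "mentions"): 2,
    (2, "changedBy"): 3,
    (3, "committedBy"): 4,
    (4, "reportsTo"): 4,  # Kleene star: loop on the accepting state
}

print(sorted(regular_reachability(graph, dfa, 0, {4}, "alice")))
# ['bob', 'carol', 'dave']
```

Starting from alice, the query finds bob, who committed a change to a file mentioned in a bug alice opened, followed by bob's management chain. Alice's own `reportsTo` edge to erin is ignored because the start state has no transition on that label. A fuller system would compile a user-supplied regular expression into an automaton instead of writing the transition table by hand.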