DOI: 10.1145/1806799.1806821
research-article

Codebook: discovering and exploiting relationships in software repositories

Published: 01 May 2010

ABSTRACT

Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help.

Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework's flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems.
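
The pairing of a people-and-artifact graph with regular language reachability can be illustrated with a small sketch. This is not Codebook's implementation: the node names, edge labels, the `~` prefix for inverse edges, and the hand-written automaton are all assumptions made for illustration. The idea shown is a breadth-first search over pairs of (graph node, automaton state), so that only paths whose edge-label sequence matches the query pattern are followed.

```python
from collections import deque

# Hypothetical toy repository graph as (source, edge_label, target) triples.
EDGES = [
    ("alice", "authored", "commit42"),
    ("commit42", "modifies", "parser.c"),
    ("bob", "authored", "commit57"),
    ("commit57", "modifies", "parser.c"),
    ("commit57", "fixes", "bug101"),
    ("carol", "opened", "bug101"),
]


def build_adjacency(edges):
    """Index edges by source node; add '~label' inverse edges for backward walks."""
    adj = {}
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))
        adj.setdefault(dst, []).append(("~" + label, src))
    return adj


def reachable(adj, start, transitions, start_state, accepting):
    """Return nodes reachable from `start` along a path whose sequence of
    edge labels is accepted by the automaton described by `transitions`."""
    seen = {(start, start_state)}
    queue = deque(seen)
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            results.add(node)
        for label, nxt in adj.get(node, []):
            nxt_state = transitions.get((state, label))
            if nxt_state is not None and (nxt, nxt_state) not in seen:
                seen.add((nxt, nxt_state))
                queue.append((nxt, nxt_state))
    return results


# Query "who authored a commit that modifies this file?", i.e. the label
# pattern: ~modifies followed by ~authored (states 0 -> 1 -> 2).
WHO_TOUCHED = {(0, "~modifies"): 1, (1, "~authored"): 2}

ADJ = build_adjacency(EDGES)
owners = reachable(ADJ, "parser.c", WHO_TOUCHED, 0, {2})
```

With this toy data, `owners` contains `alice` and `bob`. A different question (for example, who opened a bug fixed by a commit touching the file) would only need a different transition table, which is the sense in which one data structure and one algorithm can serve several queries.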

  • Published in

    ICSE '10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
    May 2010
    627 pages
    ISBN:9781605587196
    DOI:10.1145/1806799

    Copyright © 2010 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 276 of 1,856 submissions, 15%
