DOI: 10.1145/3236024.3264598
short-paper

PyDriller: Python framework for mining software repositories

Published: 26 October 2018

ABSTRACT

Software repositories contain valuable historical information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting and growing fields within software engineering. MSR focuses on extracting and analyzing the data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repositories. In this paper, we present PyDriller, a Python framework that eases the process of mining Git repositories. We compare our tool against GitPython, the state-of-the-art Python framework, and show that PyDriller can achieve the same results with, on average, 50% fewer lines of code (LOC) and significantly lower complexity.
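
Here is a minimal standard-library sketch of what "mining Git" involves at the lowest level. It is not PyDriller's implementation or API. It shells out to the `git` command line and parses `git log --numstat` output into per-commit records with the hash, author, date, subject, and lines added or deleted per file. The names `CommitInfo`, `FileChange`, `parse_log`, and `mine` are hypothetical. The sketch assumes `git` is on the PATH and ignores details such as rename detection and C-quoted paths. A framework like PyDriller aims to replace this kind of hand-written plumbing with a higher-level way of iterating over commits.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Optional

# Field separator assumed not to occur in commit metadata (ASCII unit separator).
SEP = "\x1f"
# One header line per commit: full hash, author name, ISO 8601 author date, subject.
LOG_FORMAT = SEP.join(["%H", "%an", "%aI", "%s"])


@dataclass
class FileChange:
    path: str
    added: Optional[int]    # None for binary files, which git reports as "-"
    deleted: Optional[int]


@dataclass
class CommitInfo:
    hash: str
    author: str
    date: str
    subject: str
    files: list = field(default_factory=list)  # list of FileChange


def parse_log(text: str) -> list:
    """Parse `git log --numstat --format=<LOG_FORMAT>` output into CommitInfo records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate each header from its numstat block
        if SEP in line:
            # Header line: start a new commit record.
            h, author, date, subject = line.split(SEP, 3)
            commits.append(CommitInfo(h, author, date, subject))
        else:
            # Numstat line: "<added>\t<deleted>\t<path>" for the current commit.
            added, deleted, path = line.split("\t", 2)
            commits[-1].files.append(FileChange(
                path,
                None if added == "-" else int(added),
                None if deleted == "-" else int(deleted),
            ))
    return commits


def mine(repo_path: str) -> list:
    """Run git in `repo_path` and return one record per commit, newest first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", f"--format={LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

For example, `for c in mine("."): print(c.hash[:7], c.author, sum(f.added or 0 for f in c.files))` lists each commit with the number of lines it added. Merge commits show no numstat lines by default, so their `files` list stays empty.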

URL: https://github.com/ishepard/pydriller

Materials: https://doi.org/10.5281/zenodo.1327363

Pre-print: https://doi.org/10.5281/zenodo.1327411


    Published in

      ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
      October 2018
      987 pages
      ISBN: 9781450355735
      DOI: 10.1145/3236024

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 112 of 543 submissions, 21%

