ABSTRACT
Software repositories contain valuable historical information about the overall development of software systems. Mining software repositories (MSR) is now considered one of the most interesting and fast-growing fields within software engineering. MSR focuses on extracting and analyzing the data available in software repositories to uncover interesting, useful, and actionable information about a system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repositories. In this paper, we present PyDriller, a Python framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% fewer lines of code (LOC) and significantly lower complexity.
URL: https://github.com/ishepard/pydriller
Materials: https://doi.org/10.5281/zenodo.1327363
Pre-print: https://doi.org/10.5281/zenodo.1327411