Published in: Empirical Software Engineering, Issue 3/2019

12 October 2018

Categorizing the Content of GitHub README Files

Authors: Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, David Lo

Abstract

README files play an essential role in shaping a developer’s first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical README file as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories, and we design and evaluate a classifier and a set of features that can categorize these sections automatically. We find that information discussing the ‘What’ and ‘How’ of a repository is very common, while many README files lack information regarding the purpose and status of a repository. Our multi-label classifier, which can predict eight different categories, achieves an F1 score of 0.746. To evaluate the usefulness of the classification, we use the automatically determined classes to label sections in GitHub README files with badges and show files with and without these badges to twenty software professionals. The majority of participants perceived the automated labeling of sections based on our classifier as easing information discovery. This work enables the owners of software repositories to improve the quality of their documentation, and it has the potential to make it easier for the software development community to discover relevant information in GitHub README files.
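The abstract reports a single F1 score for a classifier that may assign several of eight categories to one section, but it does not say how that score is averaged across labels. As a minimal illustration only, the Python sketch below computes two common multi-label variants, micro-averaged and macro-averaged F1, over per-section label sets. The label names and the toy data are hypothetical and are not taken from the study's annotated corpus.

```python
from typing import Dict, List, Set, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def per_label_counts(gold: List[Set[str]], pred: List[Set[str]],
                     labels: List[str]) -> Dict[str, Counts]:
    """Count TP, FP and FN separately for every label across all sections."""
    counts: Dict[str, Counts] = {}
    for label in labels:
        tp = fp = fn = 0
        for g, p in zip(gold, pred):
            if label in g and label in p:
                tp += 1
            elif label in p:
                fp += 1
            elif label in g:
                fn += 1
        counts[label] = (tp, fp, fn)
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Pool the counts of all labels, then compute a single F1."""
    counts = list(per_label_counts(gold, pred, labels).values())
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Compute F1 per label, then take the unweighted mean over labels."""
    counts = per_label_counts(gold, pred, labels)
    return sum(f1(*c) for c in counts.values()) / len(labels)


# Hypothetical toy data: three README sections with gold and predicted label sets.
labels = ["What", "How", "Purpose"]
gold = [{"What"}, {"What", "How"}, {"Purpose"}]
pred = [{"What"}, {"How"}, {"How"}]

print(round(micro_f1(gold, pred, labels), 3))  # 0.571
print(round(macro_f1(gold, pred, labels), 3))  # 0.444
```

Micro-averaging pools the counts of all labels before computing F1, so frequent labels dominate the score; macro-averaging weights every label equally, so a rare label that is never predicted correctly (here the hypothetical "Purpose" label) pulls the score down sharply.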

Footnotes
11
We only consider README.md files in our work since these are the ones that GitHub initializes automatically. GitHub also supports further formats such as README.rst, but these are much less common and out of scope for this study.
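Footnote 11 restricts the study to README.md files, which GitHub renders as Markdown. How the authors split these files into the sections they annotated is not described in this excerpt. Purely for illustration, the Python sketch below assumes that a section starts at each ATX heading (a line beginning with one to six `#` characters followed by a space) and that `#` lines inside fenced code blocks are not headings. Setext headings (text underlined with `===` or `---`) and other Markdown edge cases are deliberately not handled, and `split_sections` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# ATX heading: up to 3 spaces of indentation, 1-6 '#', then a space/tab or end of line.
HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*))?$")
# Optional closing sequence of '#' characters, which must be preceded by whitespace.
CLOSING = re.compile(r"(?:^|[ \t]+)#+$")


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split README.md text into (heading, body) pairs.

    Text before the first heading is kept under an empty heading.
    Lines inside ``` or ~~~ fenced code blocks are never treated as headings.
    """
    sections: List[Tuple[str, List[str]]] = [("", [])]
    fence = None  # marker of the currently open fence, or None outside code blocks
    for line in markdown.splitlines():
        marker = line.lstrip()[:3]
        if marker in ("```", "~~~"):
            if fence is None:
                fence = marker  # opening fence
            elif marker == fence:
                fence = None  # matching closing fence
            sections[-1][1].append(line)
            continue
        match = HEADING.match(line) if fence is None else None
        if match:
            title = CLOSING.sub("", (match.group(2) or "").strip()).strip()
            sections.append((title, []))
        else:
            sections[-1][1].append(line)
    result = [(title, "\n".join(body).strip()) for title, body in sections]
    # Drop the empty leading pseudo-section when the file starts with a heading.
    if result and result[0] == ("", ""):
        result = result[1:]
    return result


readme = (
    "# Project\n"
    "A tool.\n"
    "\n"
    "## Install\n"
    "```sh\n"
    "# not a heading\n"
    "pip install x\n"
    "```\n"
    "## Usage ##\n"
    "Run it.\n"
)
print([title for title, _ in split_sections(readme)])  # ['Project', 'Install', 'Usage']
```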
 
13
In cases where there was perfect agreement between the two annotators, the majority vote rule simply yields the codes that both annotators agreed on.
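Footnote 13 notes that, under perfect agreement, the majority vote rule reduces to the codes both annotators chose. The excerpt does not spell out the full rule or how disagreements were resolved, so the Python sketch below assumes a simple strict-majority rule: a code is kept for a section when more than half of its annotators assigned it. The function name `majority_vote` and the example codes are hypothetical.

```python
from collections import Counter
from typing import List, Set


def majority_vote(annotations: List[Set[str]]) -> Set[str]:
    """Return the codes assigned by more than half of the annotators of one section."""
    votes = Counter(code for codes in annotations for code in codes)
    return {code for code, n in votes.items() if n > len(annotations) / 2}


# Two annotators in perfect agreement: the result is exactly the shared codes.
print(sorted(majority_vote([{"What", "How"}, {"What", "How"}])))  # ['How', 'What']

# Two annotators who disagree on one code: only the shared code survives.
print(sorted(majority_vote([{"What", "How"}, {"What"}])))  # ['What']

# A hypothetical third annotator can break a tie between the other two.
print(sorted(majority_vote([{"What"}, {"How"}, {"How"}])))  # ['How']
```

With two annotators, a strict majority requires both votes, so this rule keeps only the intersection of their codes; any step used to reconcile disagreements would have to sit on top of it.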
 
Metadata
Publisher
Springer US
Published in
Empirical Software Engineering, Issue 3/2019
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-018-9660-3
