2017 | OriginalPaper | Chapter

7. Lessons Learned from a Decade of FLOSS Data Collection

Authors: Kevin Crowston, Megan Squire

Published in: Big Data Factories

Publisher: Springer International Publishing

Abstract

In 2004 a collaborative research team based at Syracuse University and Elon University began collecting and sharing data in order to understand how free/libre open source software (FLOSS) is made. Embodying some of the same FLOSS ethos, this team created a public-facing repository for their own data and analyses and encouraged other researchers to use it and contribute to it. This chapter tells the story of how the FLOSSmole project began, where the data comes from and what we have learned from it, and how the project has grown and changed over the years. In addition to capturing snapshots of the current state of the FLOSS landscape, FLOSSmole also serves as a mirror to the larger FLOSS ecosystem, since changes in FLOSSmole’s mission and goals over the years necessarily reflect some of the cultural and technological changes taking place in FLOSS itself. As such, FLOSSmole will continue to face many challenges in the future, including the continual need to provide broader access and more sophisticated and relevant data and analyses and to do all this in a way that is sustainable and community driven.

Metadata
Title: Lessons Learned from a Decade of FLOSS Data Collection
Authors: Kevin Crowston, Megan Squire
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-59186-5_7
