ABSTRACT
Communication and coordination activities are central to large software projects, but they are difficult to observe and study in traditional (closed-source, commercial) settings because informal, direct communication predominates. OSS projects, on the other hand, use the internet as the communication medium, and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the participants' communication and coordination activities. However, several challenges must be addressed before this data can be mined effectively. Once they are addressed, we can construct social networks of email correspondents and begin to address some interesting questions. These include questions relating to participation in the email discussions; the social status of different types of OSS participants; the relationship between email activity and commit activity (in the CVS repositories); and the relationship between social status and commit activity. In this paper, we begin with a discussion of our infrastructure (including a novel use of Scientific Workflow software), then discuss our approach to mining the email archives, and finally present some preliminary results from our data analysis.