Published in: World Wide Web 6/2018

05-02-2018

SCSMiner: mining social coding sites for software developer recommendation with relevance propagation

Authors: Yao Wan, Liang Chen, Guandong Xu, Zhou Zhao, Jie Tang, Jian Wu

Abstract

With the advent of social coding sites, software development has entered a new era of collaborative work. Social coding sites (e.g., GitHub) integrate social networking and distributed version control in a unified platform to facilitate collaborative development worldwide. One unique characteristic of such sites is that the past development experience of developers recorded on them conveys implicit metrics of each developer's programming capability and expertise, which can be applied in many areas, such as software developer recruitment for IT corporations. Motivated by this intuition, we aim to develop a framework to effectively locate developers with the right coding skills. To achieve this goal, we devise a generative probabilistic expert ranking model, into which consistency among projects is incorporated as graph regularization to enhance the expert ranking, and we interpret the model from the perspective of relevance propagation. For evaluation, StackOverflow is leveraged to complement the ground truth on expertise. Finally, we implement and demonstrate a prototype system, SCSMiner, which provides an expert search service based on a real-world dataset crawled from GitHub.
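
The abstract does not give the model's equations, so the following is only a generic illustration of relevance propagation over a project graph, not the authors' model. It assumes a standard smoothing update of the form f ← α·S·f + (1 − α)·y, where S is a symmetrically normalised project-similarity matrix and y holds each project's initial relevance to a query. The function names, the similarity matrix, and the contribution weights are all hypothetical.

```python
import math


def propagate_relevance(weights, initial, alpha=0.5, iterations=50):
    """Smooth project relevance scores over a project-similarity graph.

    weights: symmetric matrix (list of lists) of non-negative similarities
             between projects (hypothetical input, not from the paper).
    initial: initial relevance of each project to the query.
    alpha:   share of each score taken from neighbouring projects.
    """
    n = len(initial)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised adjacency S = D^{-1/2} W D^{-1/2}.
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] > 0 and degree[i] > 0 and degree[j] > 0:
                norm[i][j] = weights[i][j] / math.sqrt(degree[i] * degree[j])
    scores = list(initial)
    for _ in range(iterations):
        # Each project mixes its neighbours' scores with its own initial score.
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
    return scores


def rank_developers(contributions, project_scores):
    """Rank developers by the weighted relevance of the projects they worked on.

    contributions: {developer: {project_index: contribution_weight}}.
    """
    totals = {
        dev: sum(w * project_scores[p] for p, w in projects.items())
        for dev, projects in contributions.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Toy data: projects 0 and 1 are similar, project 2 is unrelated.
similarity = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
project_scores = propagate_relevance(similarity, [1.0, 0.0, 0.0])
ranking = rank_developers(
    {"alice": {0: 1.0}, "bob": {1: 1.0}, "carol": {2: 1.0}}, project_scores
)
```

In this toy setup, only project 0 matches the query directly, yet the contributor to the similar project 1 still receives a non-zero score, while the contributor to the unrelated project 2 does not. That is the intuition behind using consistency among projects to improve a ranking.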

Metadata
Title
SCSMiner: mining social coding sites for software developer recommendation with relevance propagation
Authors
Yao Wan
Liang Chen
Guandong Xu
Zhou Zhao
Jie Tang
Jian Wu
Publication date
05-02-2018
Publisher
Springer US
Published in
World Wide Web / Issue 6/2018
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-018-0526-9
