Published in: World Wide Web 3/2020

27.01.2020

iLinker: a novel approach for issue knowledge acquisition in GitHub projects

Authors: Yang Zhang, Yiwen Wu, Tao Wang, Huaimin Wang


Abstract

Social coding facilitates the sharing of knowledge in GitHub projects. In particular, issue reports are an important form of knowledge in software development: they usually contain relevant information and can therefore be shared and linked in developers’ discussions to aid issue resolution. Linking issues to potentially related issues, i.e., issue knowledge acquisition, provides developers with more targeted resources and information when they search for and resolve issues. However, identifying and acquiring related issues is challenging in general, because in practice it is time-consuming and depends mainly on the experience and knowledge of individual developers. Acquiring related issues automatically is therefore a meaningful task that can improve the development efficiency of GitHub projects. In this paper, we formulate the problem of acquiring related issue knowledge as a recommendation problem. To solve it, we propose a novel approach, iLinker, which combines an information retrieval technique (TF-IDF) with deep learning techniques (word embedding and document embedding). Our evaluation results show that iLinker outperforms the baseline approaches in both coarse-grained and fine-grained recommendation tasks.
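
To make the recommendation framing concrete, the following is a minimal sketch of ranking candidate issues against a query issue by TF-IDF cosine similarity. It covers only the TF-IDF ingredient named in the abstract, not the word or document embeddings, and it is not the authors' implementation. The function names, the smoothed IDF formula, and the three example issue titles are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption; other variants exist).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, issues, k=2):
    """Return the indices of the k issues most similar to the query."""
    vectors = tfidf_vectors(issues + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical issue titles, invented for this example.
issues = [
    "Request times out when proxy is configured",
    "Add documentation for streaming uploads",
    "Proxy settings ignored for HTTPS request",
]
query = "HTTPS request fails behind a proxy"
print(recommend(query, issues, k=2))  # [2, 0]
```

A purely lexical ranking like this misses issues that describe the same problem in different words, which is the gap that embedding-based similarity is meant to address.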


Footnotes
5
In our dataset, developers link duplicate issues in fewer than 20% of cases. For this analysis, we randomly selected 250 link cases from the collected data of the Request project (with population = 1,110 and confidence level = 95%, the confidence interval \(\simeq\) 5.46) and manually checked the duplicate relationship between the two linked issues, following the strategy used by Ye et al. [31]. The analysis was performed by two coders (the first and third authors) separately. The inter-rater agreement between the two coders is almost perfect (Fleiss’s Kappa value [32] is 0.83). All authors reviewed and agreed on the final result.
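
The sampling figures in this footnote can be checked with a short calculation. The sketch below assumes the standard margin-of-error formula for a proportion with a finite population correction, a worst-case proportion of 0.5, and z = 1.96 for the 95% confidence level. The footnote does not state which formula was used, so this is one plausible reconstruction that happens to reproduce the reported value.

```python
import math


def margin_of_error(sample_size, population, z=1.96, p=0.5):
    """Margin of error (as a fraction) for a sampled proportion.

    Uses the normal approximation with a finite population correction.
    z=1.96 corresponds to a 95% confidence level; p=0.5 is the worst case.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction


# Figures from the footnote: 250 sampled link cases out of 1,110.
moe = margin_of_error(sample_size=250, population=1110)
print(f"{moe * 100:.2f}")  # 5.46
```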
 
6
In this study, we use the Lancaster stemmer as implemented in NLTK, because it works very well in Python programs and is a very aggressive stemming algorithm with the fastest processing speed. It greatly reduces our working set of words, which helps GitHub projects train on issue data quickly and build practical tools.
 
17
In our study, for each query issue, we calculate its metric values for NextBug and iLinker. We compute the p-value and Cliff’s delta over all query issues. We use the Bonferroni correction to counteract the impact of multiple hypothesis tests.
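
As an illustration of the two statistics named in this footnote, the sketch below computes Cliff’s delta for two samples and applies a Bonferroni-corrected threshold to a list of p-values. The per-query metric values and the p-values are hypothetical numbers invented for the example. The p-values are assumed to come from a separate significance test, which is not reproduced here.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant under the Bonferroni correction.

    With m tests, each p-value is compared against alpha / m.
    """
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical per-query metric values for the two approaches.
ilinker_scores = [0.6, 0.7, 0.8, 0.9]
nextbug_scores = [0.5, 0.6, 0.7, 0.8]
print(cliffs_delta(ilinker_scores, nextbug_scores))  # 0.4375

# Hypothetical p-values from three separate hypothesis tests.
print(bonferroni_significant([0.001, 0.02, 0.04]))  # [True, False, False]
```

Cliff’s delta ranges from -1 to 1. A value of 0 means the two samples overlap completely, and larger magnitudes indicate a larger effect.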
 
18
For each group, the Wilcoxon test results and Cliff’s delta confirm that their differences are significant and substantial.
 
References
1. Dabbish, L., Stuart, C., Tsay, J., Herbsleb, J.: Social Coding in Github: Transparency and Collaboration in an Open Software Repository. In: CSCW, pp. 1277–1286. ACM (2012)
2. Zhang, Y., Wang, H., Yin, G., et al.: Social media in GitHub: the role of @-mention in assisting software development. Sci. China Inf. Sci. 60(3), 032102 (2017)
3. Gharehyazie, M., Ray, B., Filkov, V.: Some from Here, Some from There: Cross-Project Code Reuse in Github. In: MSR, pp. 291–301. IEEE (2017)
4. Sun, C., Lo, D., Khoo, S.-C., Jiang, J.: Towards More Accurate Retrieval of Duplicate Bug Reports. In: ASE, pp. 253–262. IEEE (2011)
5. Zhou, J., Zhang, H., Lo, D.: Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports. In: ICSE, pp. 14–24. IEEE (2012)
6. Rocha, H., Valente, M. T., Marques-Neto, H., Murphy, G. C.: An Empirical Study on Recommendations of Similar Bugs. In: SANER, pp. 46–56. IEEE (2016)
7. Le, Q., Mikolov, T.: Distributed Representations of Sentences and Documents. In: ICML, pp. 1188–1196 (2014)
8. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781 (2013)
9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., Dean, J.: Distributed Representations of Words and Phrases and Their Compositionality. In: NIPS, pp. 3111–3119 (2013)
10. Xu, B., Ye, D., Xing, Z., Xia, X., Chen, G., Li, S.: Predicting Semantically Linkable Knowledge in Developer Online Forums via Convolutional Neural Network. In: ASE, pp. 51–62. ACM (2016)
11. Ye, X., Shen, H., Ma, X., Bunescu, R., Liu, C.: From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering. In: ICSE, pp. 404–415. ACM (2016)
12. Yang, X., Lo, D., Xia, X., Bao, L., Sun, J.: Combining Word Embedding with Information Retrieval to Recommend Similar Bug Reports. In: ISSRE, pp. 127–137. IEEE (2016)
13. Fan, Y., Xia, X., Lo, D., Hassan, A. E.: Chaff from the wheat: Characterizing and determining valid bug reports. IEEE Transactions on Software Engineering (2018)
14. Li, L., Ren, Z., Li, X., Zou, W., Jiang, H.: How are Issue Units Linked? Empirical Study on the Linking Behavior in GitHub. In: APSEC, pp. 386–395. IEEE (2018)
15. Zampetti, F., Ponzanelli, L., Bavota, G., Mocci, A., Penta, M. D., Lanza, M.: How Developers Document Pull Requests with External References. In: ICPC, pp. 23–33. IEEE (2017)
16. Zhang, Y., Yu, Y., Wang, H., Vasilescu, B., Filkov, V.: Within-Ecosystem Issue Linking: a Large-Scale Study of Rails. In: Software Mining, pp. 12–19. ACM (2018)
17. Zhang, Y., Wu, Y., Wang, T., et al.: A novel approach for recommending semantically linkable issues in GitHub projects. Sci. China Inf. Sci. 62(9), 199105 (2019)
18. Boisselle, V., Adams, B.: The Impact of Cross-Distribution Bug Duplicates, Empirical Study on Debian and Ubuntu. In: SCAM, pp. 131–140. IEEE (2015)
19. Blei, D. M., Ng, A. Y., Jordan, M. I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
20. Dai, A. M., Olah, C., Le, Q. V.: Document embedding with paragraph vectors. arXiv:1507.07998 (2015)
21. Crowston, K., Scozzi, B.: Bug fixing practices within free/libre open source software development teams (2008)
22. Jeong, G., Kim, S., Zimmermann, T.: Improving Bug Triage with Bug Tossing Graphs. In: ESEC/FSE, pp. 111–120. ACM (2009)
23. Xia, X., Lo, D., Ding, Y., Al-Kofahi, J. M., Nguyen, T. N., Wang, X.: Improving automated bug triaging with specialized topic model. IEEE Trans. Softw. Eng. 43(3), 272–297 (2017)
24. Yan, M., Zhang, X., Yang, D., Xu, L., Kymer, J. D.: A component recommender for bug reports using Discriminative Probability Latent Semantic Analysis. Inf. Softw. Technol. 73, 37–51 (2016)
25. Anvik, J., Hiew, L., Murphy, G. C.: Who Should Fix This Bug? In: ICSE, pp. 361–370. ACM (2006)
26. Guo, P. J., Zimmermann, T., Nagappan, N., Murphy, B.: Characterizing and Predicting Which Bugs Get Fixed: an Empirical Study of Microsoft Windows. In: ICSE, pp. 495–504. IEEE (2010)
27. Bachmann, A., Bird, C., Rahman, F., Devanbu, P., Bernstein, A.: The Missing Links: Bugs and Bug-Fix Commits. In: FSE, pp. 97–106. ACM (2010)
28. Ye, X., Bunescu, R., Liu, C.: Learning to Rank Relevant Files for Bug Reports Using Domain Knowledge. In: FSE, pp. 689–699. ACM (2014)
29. Zhang, Y., Yin, G., Wang, T., Yu, Y., Wang, H.: Evaluating Bug Severity Using Crowd-Based Knowledge: an Exploratory Study. In: Internetware, pp. 70–73. ACM (2015)
30. Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information. In: ICSE, pp. 461–470. IEEE (2008)
31. Ye, D., Xing, Z., Kapre, N.: The structure and dynamics of knowledge network in domain-specific q&a sites: a case study of stack overflow. Empir. Softw. Eng. 22(1), 375–406 (2017)
32. Landis, J. R., Koch, G. G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)
33. Paice, C.: A Word Stemmer Based on the Lancaster Stemming Algorithm. In: ACM SIGIR, pp. 56–61 (1990)
34. Kohavi, R.: A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In: IJCAI, pp. 1137–1145 (1995)
35. Hindle, A., Alipour, A., Stroulia, E.: A contextual approach towards more accurate duplicate bug report detection and ranking. Empir. Softw. Eng. 21(2), 368–410 (2016)
36. Thung, F., Kochhar, P. S., Lo, D.: Dupfinder: Integrated Tool Support for Duplicate Bug Report Detection. In: ASE, pp. 871–874. ACM (2014)
37. Tian, Y., Sun, C., Lo, D.: Improved Duplicate Bug Report Identification. In: CSMR, pp. 385–390. IEEE (2012)
38. Zhang, Y., Lo, D., Xia, X., Sun, J.-L.: Multi-factor duplicate question detection in stack overflow. J. Comput. Sci. Technol. 30(5), 981–997 (2015)
39. Zhang, W. E., Sheng, Q. Z., Tang, Z., Ruan, W.: Related Or Duplicate: Distinguishing Similar CQA Questions via Convolutional Neural Networks. In: SIGIR, pp. 1153–1156. ACM (2018)
Metadata
Title
iLinker: a novel approach for issue knowledge acquisition in GitHub projects
Authors
Yang Zhang
Yiwen Wu
Tao Wang
Huaimin Wang
Publication date
27.01.2020
Publisher
Springer US
Published in
World Wide Web / Issue 3/2020
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-019-00770-1
