Skip to main content
Top
Published in: Empirical Software Engineering 2/2023

01-03-2023

Semantically-enhanced topic recommendation systems for software projects

Authors: Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori

Published in: Empirical Software Engineering | Issue 2/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Software-related platforms such as GitHub and Stack Overflow, have enabled their users to collaboratively label software entities with a form of metadata called topics. Tagging software repositories with relevant topics can be exploited for facilitating various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility. Consequently, this improves the outcome of tasks such as browsing, searching, navigation, and organization of repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. Thus, there have been efforts on recommending topics for software projects, however, the semantic relationships among these topics have not been exploited so far. In this work, we propose two recommender models for tagging software projects that incorporate the semantic relationship among topics. Our approach has two main phases; (1) we first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. Then, (2) we build two recommender systems; The first one operates only based on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second predictive model, however, assumes there are no topics available for a repository, hence it proceeds to predict the relevant topics based on both textual information of a software project (such as its README file), and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. Through their contributions, we constructed SED-KGraph with 2,234 carefully evaluated relationships among 863 community-curated topics. Regarding the recommenders’ performance, the experiment results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of Average Success Rate and Mean Average Precision metrics, respectively. We share SED-KGraph, as a rich form of knowledge for the community to re-use and build upon. We also release the source code of our two recommender models, KGRec and KGRec+ (https://​github.​com/​mahtab-nejati/​KGRec).

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
go back to reference Alonso O, Marshall C, Najork M (2014) Crowdsourcing a subjective labeling task: a human-centered framework to ensure reliable results. Microsoft Res, Redmond, WA, USA, Tech Rep MSR-TR:2014–91 Alonso O, Marshall C, Najork M (2014) Crowdsourcing a subjective labeling task: a human-centered framework to ensure reliable results. Microsoft Res, Redmond, WA, USA, Tech Rep MSR-TR:2014–91
go back to reference Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 1247–1250 Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 1247–1250
go back to reference Cai X, Zhu J, Shen B, Chen Y (2016) Greta: graph-based tag assignment for github repositories. In: In proceedings of the 40th annual computer software and applications conference (COMPSAC). IEEE, vol 1, pp 63–72 Cai X, Zhu J, Shen B, Chen Y (2016) Greta: graph-based tag assignment for github repositories. In: In proceedings of the 40th annual computer software and applications conference (COMPSAC). IEEE, vol 1, pp 63–72
go back to reference Cao J, Du T, Shen B, Li W, Wu Q, Chen Y (2019) Constructing a knowledge base of coding conventions from online resources. In: The international conference on software engineering and knowledge engineering (SEKE), pp 5–14 Cao J, Du T, Shen B, Li W, Wu Q, Chen Y (2019) Constructing a knowledge base of coding conventions from online resources. In: The international conference on software engineering and knowledge engineering (SEKE), pp 5–14
go back to reference Chen D, Li B, Zhou C, Zhu X (2019) Automatically identifying bug entities and relations for bug analysis. In: 2019 IEEE 1st international workshop on intelligent bug fixing (IBF), pp 39–43 Chen D, Li B, Zhou C, Zhu X (2019) Automatically identifying bug entities and relations for bug analysis. In: 2019 IEEE 1st international workshop on intelligent bug fixing (IBF), pp 39–43
go back to reference Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif Intell Rev 11(6):453–482CrossRef Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif Intell Rev 11(6):453–482CrossRef
go back to reference Di Rocco J, Di Ruscio D, Di Sipio C, Nguyen P, Rubei R (2020) Topfilter: an approach to recommend relevant github topics. In: In proceedings of the 14th international symposium on empirical software engineering and measurement (ESEM). ACM, ESEM ’20, New York Di Rocco J, Di Ruscio D, Di Sipio C, Nguyen P, Rubei R (2020) Topfilter: an approach to recommend relevant github topics. In: In proceedings of the 14th international symposium on empirical software engineering and measurement (ESEM). ACM, ESEM ’20, New York
go back to reference Di Sipio C, Rubei R, Di Ruscio D, Nguyen PT (2020) A multinomial naïve bayesian (mnb) network to automatically recommend topics for github repositories. In: In proceedings of the 24th international conference on evaluation and assessment in software engineering (EASE). ACM, pp 71–80 Di Sipio C, Rubei R, Di Ruscio D, Nguyen PT (2020) A multinomial naïve bayesian (mnb) network to automatically recommend topics for github repositories. In: In proceedings of the 24th international conference on evaluation and assessment in software engineering (EASE). ACM, pp 71–80
go back to reference Dong L, Wei F, Zhou M, Xu K (2015) Question answering over freebase with multi-column convolutional neural networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (vol 1: long papers), pp 260–269 Dong L, Wei F, Zhou M, Xu K (2015) Question answering over freebase with multi-column convolutional neural networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (vol 1: long papers), pp 260–269
go back to reference Escobar-Avila J, Linares-Vásquez M, Haiduc S (2015) Unsupervised software categorization using bytecode, pp In proceedings of the 23rd international conference on program comprehension (ICPC). IEEE, pp 229–239 Escobar-Avila J, Linares-Vásquez M, Haiduc S (2015) Unsupervised software categorization using bytecode, pp In proceedings of the 23rd international conference on program comprehension (ICPC). IEEE, pp 229–239
go back to reference Fathalla S, Lange C (2018) Eventskg: a knowledge graph representation for top-prestigious computer science events metadata. In: In proceedings of the 10th international conference on computational collective intelligence (ICCCI). Springer, pp 53–63 Fathalla S, Lange C (2018) Eventskg: a knowledge graph representation for top-prestigious computer science events metadata. In: In proceedings of the 10th international conference on computational collective intelligence (ICCCI). Springer, pp 53–63
go back to reference Golder SA, Huberman BA (2006) Usage patterns of collaborative tagging systems. J Inf Sci 32(2):198–208CrossRef Golder SA, Huberman BA (2006) Usage patterns of collaborative tagging systems. J Inf Sci 32(2):198–208CrossRef
go back to reference Han Z, Li X, Liu H, Xing Z, Feng Z (2018) Deepweak: reasoning common software weaknesses via knowledge graph embedding. In: In proceedings of the 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 456–466 Han Z, Li X, Liu H, Xing Z, Feng Z (2018) Deepweak: reasoning common software weaknesses via knowledge graph embedding. In: In proceedings of the 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 456–466
go back to reference Held C, Kimmerle J, Cress U (2012) Learning by foraging: the impact of individual knowledge and social tags on web navigation processes. Comput Hum Behav 28(1):34–40CrossRef Held C, Kimmerle J, Cress U (2012) Learning by foraging: the impact of individual knowledge and social tags on web navigation processes. Comput Hum Behav 28(1):34–40CrossRef
go back to reference Izadi M, Ahmadabadi MN (2022) On the evaluation of nlp-based models for software engineering. In: 2022 IEEE/ACM 1st international workshop on natural language-based software engineering (NLBSE). IEEE computer society, USA, pp 48–50 Izadi M, Ahmadabadi MN (2022) On the evaluation of nlp-based models for software engineering. In: 2022 IEEE/ACM 1st international workshop on natural language-based software engineering (NLBSE). IEEE computer society, USA, pp 48–50
go back to reference Izadi M, Akbari K, Heydarnoori A (2022) Predicting the objective and priority of issue reports in software repositories. Empir Softw Eng 27(2):1–37CrossRef Izadi M, Akbari K, Heydarnoori A (2022) Predicting the objective and priority of issue reports in software repositories. Empir Softw Eng 27(2):1–37CrossRef
go back to reference Izadi M, Heydarnoori A, Gousios G (2021) Topic recommendation for software repositories using multi-label classification algorithms. Empir Softw Eng 26(5):1–33CrossRef Izadi M, Heydarnoori A, Gousios G (2021) Topic recommendation for software repositories using multi-label classification algorithms. Empir Softw Eng 26(5):1–33CrossRef
go back to reference Karthik S, Medvidovic N (2019) Automatic detection of latent software component relationships from online qa sites. In: Proceedings of the 7th international workshop on realizing artificial intelligence synergies in software engineering (RAISE). IEEE Press, pp 15–21 Karthik S, Medvidovic N (2019) Automatic detection of latent software component relationships from online qa sites. In: Proceedings of the 7th international workshop on realizing artificial intelligence synergies in software engineering (RAISE). IEEE Press, pp 15–21
go back to reference Li H, Li S, Sun J, Xing Z, Peng X, Liu M, Zhao X (2018) Improving api caveats accessibility by mining api caveats knowledge graph. In: In proceedings of the 34th international conference on software maintenance and evolution (ICSME), pp 183–193 Li H, Li S, Sun J, Xing Z, Peng X, Liu M, Zhao X (2018) Improving api caveats accessibility by mining api caveats knowledge graph. In: In proceedings of the 34th international conference on software maintenance and evolution (ICSME), pp 183–193
go back to reference Liu J, Zhou P, Yang Z, Liu X, Grundy J (2018) Fasttagrec: fast tag recommendation for software information sites. Autom Softw Eng 25 (4):675–701CrossRef Liu J, Zhou P, Yang Z, Liu X, Grundy J (2018) Fasttagrec: fast tag recommendation for software information sites. Autom Softw Eng 25 (4):675–701CrossRef
go back to reference Maity SK, Panigrahi A, Ghosh S, Banerjee A, Goyal P, Mukherjee A (2019) Deeptagrec: a content-cum-user based tag recommendation framework for stack overflow. In: In proceedings of the 41st european conference on information retrieval (ECIR). Springer, pp 125–131 Maity SK, Panigrahi A, Ghosh S, Banerjee A, Goyal P, Mukherjee A (2019) Deeptagrec: a content-cum-user based tag recommendation framework for stack overflow. In: In proceedings of the 41st european conference on information retrieval (ECIR). Springer, pp 125–131
go back to reference Mazrae PR, Izadi M, Heydarnoori A (2021) Automated recovery of issue-commit links leveraging both textual and non-textual data. In: 2021 IEEE international conference on software maintenance and evolution (ICSME). IEEE computer society, USA, pp 263–273 Mazrae PR, Izadi M, Heydarnoori A (2021) Automated recovery of issue-commit links leveraging both textual and non-textual data. In: 2021 IEEE international conference on software maintenance and evolution (ICSME). IEEE computer society, USA, pp 263–273
go back to reference McMillan C, Grechanik M, Poshyvanyk D (2012) Detecting similar software applications. In: In proceedings of the 34th international conference on software engineering (ICSE). IEEE, pp 364–374 McMillan C, Grechanik M, Poshyvanyk D (2012) Detecting similar software applications. In: In proceedings of the 34th international conference on software engineering (ICSE). IEEE, pp 364–374
go back to reference Reyes J, Ramírez D, Paciello J (2016) Automatic classification of source code archives by programming language: a deep learning approach. In: 2016 International conference on computational science and computational intelligence (CSCI), pp 514–519 Reyes J, Ramírez D, Paciello J (2016) Automatic classification of source code archives by programming language: a deep learning approach. In: 2016 International conference on computational science and computational intelligence (CSCI), pp 514–519
go back to reference Sun J, Xing Z, Chu R, Bai H, Wang J, Peng X (2019) Know-how in programming tasks: from textual tutorials to task-oriented knowledge graph. In: IEEE international conference on software maintenance and evolution (ICSME), pp 257–268, 09 Sun J, Xing Z, Chu R, Bai H, Wang J, Peng X (2019) Know-how in programming tasks: from textual tutorials to task-oriented knowledge graph. In: IEEE international conference on software maintenance and evolution (ICSME), pp 257–268, 09
go back to reference Sun J, Xing Z, Peng X, Xu X, Zhu L (2021) Task-oriented api usage examples prompting powered by programming task knowledge graph. In: 2021 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 448–459 Sun J, Xing Z, Peng X, Xu X, Zhu L (2021) Task-oriented api usage examples prompting powered by programming task knowledge graph. In: 2021 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 448–459
go back to reference Thung F, Lo D, Jiang L (2012) Detecting similar applications with collaborative tagging. In: In proceedings of the 28th international conference on software maintenance (ICSM). IEEE, pp 600–603 Thung F, Lo D, Jiang L (2012) Detecting similar applications with collaborative tagging. In: In proceedings of the 28th international conference on software maintenance (ICSM). IEEE, pp 600–603
go back to reference Vargas-Baldrich S, Linares-Vásquez M, Poshyvanyk D (2015) Automated tagging of software projects using bytecode and dependencies (n). In: In proceedings of the 30th international conference on automated software engineering (ASE). IEEE, pp 289–294 Vargas-Baldrich S, Linares-Vásquez M, Poshyvanyk D (2015) Automated tagging of software projects using bytecode and dependencies (n). In: In proceedings of the 30th international conference on automated software engineering (ASE). IEEE, pp 289–294
go back to reference Wagner S, Fernández DM (2015) Chapter 3 - analyzing text in software projects. In: Bird C, Menzies T, Zimmermann T (eds) The art and science of analyzing software data. Morgan Kaufmann, Boston, pp 39–72 Wagner S, Fernández DM (2015) Chapter 3 - analyzing text in software projects. In: Bird C, Menzies T, Zimmermann T (eds) The art and science of analyzing software data. Morgan Kaufmann, Boston, pp 39–72
go back to reference Wang H, Zhang F, Wang J, Zhao M, Li W, Xie X, Guo M (2018) Ripplenet: propagating user preferences on the knowledge graph for recommender systems. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM). ACM, New York, pp 417–426 Wang H, Zhang F, Wang J, Zhao M, Li W, Xie X, Guo M (2018) Ripplenet: propagating user preferences on the knowledge graph for recommender systems. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM). ACM, New York, pp 417–426
go back to reference Wang L, Sun X, Wang J, Duan Y, Li B (2017) Construct bug knowledge graph for bug resolution. In: In proceedings of the 39th international conference on software engineering companion (ICSE-C). IEEE, pp 189–191 Wang L, Sun X, Wang J, Duan Y, Li B (2017) Construct bug knowledge graph for bug resolution. In: In proceedings of the 39th international conference on software engineering companion (ICSE-C). IEEE, pp 189–191
go back to reference Wang S, Lo D, Vasilescu B, Serebrenik A (2018) Entagrec++: an enhanced tag recommendation system for software information sites. Empir Softw Eng 23(2):800–832CrossRef Wang S, Lo D, Vasilescu B, Serebrenik A (2018) Entagrec++: an enhanced tag recommendation system for software information sites. Empir Softw Eng 23(2):800–832CrossRef
go back to reference Wang T, Wang H, Yin G, Ling CX, Li X, Zou P (2014) Tag recommendation for open source software. Frontiers Comput Sci (FCS) 8 (1):69–82MathSciNetCrossRef Wang T, Wang H, Yin G, Ling CX, Li X, Zou P (2014) Tag recommendation for open source software. Frontiers Comput Sci (FCS) 8 (1):69–82MathSciNetCrossRef
go back to reference Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: 2013 10th Working conference on mining software repositories (MSR). IEEE, pp 287–296 Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: 2013 10th Working conference on mining software repositories (MSR). IEEE, pp 287–296
go back to reference Xin-Yu Wang DL, Xia X (2015) Tagcombine: recommending tags to contents in software information sites. J Comput Sci Technol 30(5):1017CrossRef Xin-Yu Wang DL, Xia X (2015) Tagcombine: recommending tags to contents in software information sites. J Comput Sci Technol 30(5):1017CrossRef
go back to reference Xu K, Reddy S, Feng Y, Huang S, Zhao D (2016) Question answering on freebase via relation extraction and textual evidence Xu K, Reddy S, Feng Y, Huang S, Zhao D (2016) Question answering on freebase via relation extraction and textual evidence
go back to reference Yang Y, Li Y, Yue Y, Wu Z, Shao W (2016) Cut: a combined approach for tag recommendation in software information sites. In: Lehner F, Fteimi N (eds) Knowledge science, engineering and management. Springer, Cham, pp 599–612 Yang Y, Li Y, Yue Y, Wu Z, Shao W (2016) Cut: a combined approach for tag recommendation in software information sites. In: Lehner F, Fteimi N (eds) Knowledge science, engineering and management. Springer, Cham, pp 599–612
go back to reference Yao X, B. Van Durme. (2014) Information extraction over structured data: question answering with freebase. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (vol 1: long papers), pp 956–966 Yao X, B. Van Durme. (2014) Information extraction over structured data: question answering with freebase. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (vol 1: long papers), pp 956–966
go back to reference Zhang E, Banovic N (2021) Method for exploring generative adversarial networks (gans) via automatically generated image galleries. In: Proceedings of the conference on human factors in computing systems (CHI), pp 1–15 Zhang E, Banovic N (2021) Method for exploring generative adversarial networks (gans) via automatically generated image galleries. In: Proceedings of the conference on human factors in computing systems (CHI), pp 1–15
go back to reference Zhang Y, Lo D, Kochhar PS, Xia X, Li Q, Sun J (2017) Detecting similar repositories on github. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 13–23 Zhang Y, Lo D, Kochhar PS, Xia X, Li Q, Sun J (2017) Detecting similar repositories on github. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 13–23
go back to reference Zhang Y, Xu FF, Li S, Meng Y, Wang X, Li Q, Han J (2019) Higitclass: keyword-driven hierarchical classification of github repositories. In: 2019 IEEE international conference on data mining (ICDM). IEEE, pp 876–885 Zhang Y, Xu FF, Li S, Meng Y, Wang X, Li Q, Han J (2019) Higitclass: keyword-driven hierarchical classification of github repositories. In: 2019 IEEE international conference on data mining (ICDM). IEEE, pp 876–885
go back to reference Zhao X, Xing Z, Kabir MA, Sawada N, Li J, Lin S (2017) Hdskg: harvesting domain specific knowledge graph from content of webpages. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER), pp 56–67 Zhao X, Xing Z, Kabir MA, Sawada N, Li J, Lin S (2017) Hdskg: harvesting domain specific knowledge graph from content of webpages. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER), pp 56–67
go back to reference Zhao Y, Wang H, Ma L, Liu Y, Li L, Grundy J (2019) Knowledge graphing git repositories: a preliminary study. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER), pp 599–603 Zhao Y, Wang H, Ma L, Liu Y, Li L, Grundy J (2019) Knowledge graphing git repositories: a preliminary study. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER), pp 599–603
go back to reference Zhou P, Liu J, Yang Z, Zhou G (2017) Scalable tag recommendation for software information sites. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 272–282 Zhou P, Liu J, Yang Z, Zhou G (2017) Scalable tag recommendation for software information sites. In: In proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 272–282
go back to reference Zou X (2020) A survey on application of knowledge graph. J Phys Conf Ser 1487(03):012016CrossRef Zou X (2020) A survey on application of knowledge graph. J Phys Conf Ser 1487(03):012016CrossRef
Metadata
Title
Semantically-enhanced topic recommendation systems for software projects
Authors
Maliheh Izadi
Mahtab Nejati
Abbas Heydarnoori
Publication date
01-03-2023
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10272-w

Other articles of this Issue 2/2023

Empirical Software Engineering 2/2023 Go to the issue

Premium Partner