
Empirical Software Engineering

Publication date: 01-03-2023

Semantically-enhanced topic recommendation systems for software projects

Authors: Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori

Published in: Empirical Software Engineering | Issue 2/2023


Abstract

Software-related platforms such as GitHub and Stack Overflow have enabled their users to collaboratively label software entities with a form of metadata called topics. Tagging software repositories with relevant topics can facilitate various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility, which in turn improves the outcome of tasks such as browsing, searching, navigating, and organizing repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. There have therefore been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. In this work, we propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first operates solely on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second assumes that no topics are available for a repository; it therefore predicts the relevant topics from both the textual information of a software project (such as its README file) and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. Through their contributions, we constructed SED-KGraph with 2,234 carefully evaluated relationships among 863 community-curated topics.
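The first recommender works only from a repository's assigned topics and the relationships in the knowledge graph. As a rough illustration of that idea (not the authors' KGRec algorithm), the Python sketch below builds a tiny hypothetical topic graph and ranks neighbouring topics with a simple spreading-activation heuristic. The edge list, relation labels, and the functions `build_graph`, `recommend_from_topics`, and `topics_mentioned_in` are assumptions made for illustration. The last function is a crude keyword-matching stand-in for the text-based setting and is far simpler than the predictive model described above.

```python
import re
from collections import defaultdict

# Hypothetical miniature knowledge graph: (head topic, relation, tail topic).
# These topics and relation labels are illustrative, not taken from SED-KGraph.
EDGES = [
    ("react", "is-a", "javascript-library"),
    ("react", "related-to", "frontend"),
    ("angular", "related-to", "frontend"),
    ("angular", "is-a", "web-framework"),
    ("django", "is-a", "web-framework"),
    ("django", "related-to", "python"),
]


def build_graph(edges):
    """Build an undirected adjacency map: topic -> set of neighbouring topics."""
    graph = defaultdict(set)
    for head, _relation, tail in edges:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def recommend_from_topics(assigned, graph, hops=1, decay=0.5, k=5):
    """Rank unassigned topics reachable within `hops` steps of the assigned ones.

    Simple spreading activation: each assigned topic starts with activation 1.0,
    and activation is multiplied by `decay` every time it crosses an edge.
    """
    scores = defaultdict(float)
    frontier = {t: 1.0 for t in assigned if t in graph}
    for _ in range(hops):
        next_frontier = defaultdict(float)
        for topic, activation in frontier.items():
            for neighbour in graph[topic]:
                next_frontier[neighbour] += activation * decay
        # Only topics the repository does not already have are recommended.
        for topic, activation in next_frontier.items():
            if topic not in assigned:
                scores[topic] += activation
        frontier = next_frontier
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:k]]


def topics_mentioned_in(text, graph):
    """Hypothetical seeding step: graph topics that appear as tokens in the text."""
    tokens = set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))
    return {topic for topic in graph if topic in tokens}
```

For example, `recommend_from_topics({"angular"}, build_graph(EDGES))` returns `["frontend", "web-framework"]`; raising `hops` to 2 also surfaces `django` and `react` with lower scores. Decaying activation at each hop is one simple way to prefer closely related topics over distant ones.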
Regarding the recommenders' performance, the experimental results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of the Average Success Rate and Mean Average Precision metrics, respectively. We share SED-KGraph as a rich form of knowledge for the community to reuse and build upon. We also release the source code of our two recommender models, KGRec and KGRec+ (https://github.com/mahtab-nejati/KGRec).
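Both reported metrics are computed per repository and then averaged. The sketch below assumes one common formulation: the success rate at k is 1 when at least one of the top-k recommended topics is correct, and the average precision at k sums the precision at each rank holding a correct topic and divides by min(k, number of correct topics). The paper's exact definitions and normalisation may differ, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def success_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one of the top-k recommendations is relevant, else 0.0."""
    return 1.0 if any(t in relevant for t in recommended[:k]) else 0.0


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Average of precision@i over ranks i <= k that hold a relevant topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, topic in enumerate(recommended[:k], start=1):
        if topic in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


def mean_metric(
    metric: Callable[[Sequence[str], Set[str], int], float],
    all_recommended: List[Sequence[str]],
    all_relevant: List[Set[str]],
    k: int,
) -> float:
    """Average a per-repository metric over all repositories."""
    scores = [metric(rec, rel, k) for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores)
```

For instance, with recommendations `[["a", "b", "c"], ["x", "y", "z"]]` and ground truth `[{"b"}, {"q"}]`, the averaged success rate at k = 3 is 0.5 and the mean average precision is 0.25.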



Publisher: Springer US
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-022-10272-w
