Published in: Software Quality Journal 2/2020

15.02.2020

An automated approach to assess the similarity of GitHub repositories

Written by: Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio

Abstract

Open source software (OSS) allows developers to study, change, and improve the code free of charge. There are several high-quality software projects which deliver stable and well-documented products. Most OSS forges sustain active user and expert communities, which in turn provide decent levels of support both in answering user questions and in repairing reported software bugs. Code reuse is an intrinsic feature of OSS, and developing a new system by leveraging existing open source components can reduce development effort; it can thus benefit at least two phases of the software life cycle, i.e., implementation and maintenance. However, to improve software quality, it is essential to develop a system by learning from well-defined, mature projects. In this sense, the ability to find similar projects that facilitate ongoing development activities is of high importance. In this paper, we address the issue of mining open source software repositories to detect similar projects, which can eventually be reused by developers. We propose CrossSim as a novel approach to model the OSS ecosystem and to compute similarities among software projects. An evaluation on a dataset collected from GitHub shows that our proposed approach outperforms three well-established baselines.

Metadata
Title
An automated approach to assess the similarity of GitHub repositories
Authors
Phuong T. Nguyen
Juri Di Rocco
Riccardo Rubei
Davide Di Ruscio
Publication date
15.02.2020
Publisher
Springer US
Published in
Software Quality Journal / Issue 2/2020
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI
https://doi.org/10.1007/s11219-019-09483-0
