Published in: Empirical Software Engineering, Issue 1/2022

01.01.2022

Information retrieval versus deep learning approaches for generating traceability links in bilingual projects

Written by: Jinfeng Lin, Yalin Liu, Jane Cleland-Huang

Abstract

Software traceability links are established between diverse artifacts of the software development process in order to support tasks such as compliance analysis, safety assurance, and requirements validation. However, practice has shown that it is difficult and costly to create and maintain trace links in non-trivially sized projects. For this reason, many researchers have proposed and evaluated automated approaches based on information retrieval and deep learning. Generating trace links automatically can also be challenging, especially in multi-national projects that include artifacts written in multiple languages. The intermingled language use can reduce the efficiency of automated tracing solutions. In this work, we analyze patterns of intermingled language that we observed in several different projects, and then comparatively evaluate different tracing algorithms. These include Information Retrieval techniques, such as the Vector Space Model (VSM), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and various models that combine mono- and cross-lingual word embeddings with the Generalized Vector Space Model (GVSM), as well as a deep-learning approach based on a BERT language model. Our experimental analysis of trace links generated for 14 Chinese-English projects indicates that our MultiLingual Trace-BERT approach performed best in large projects, with close to twice the accuracy of the best IR approach, while the IR-based GVSM with neural machine translation and a monolingual word embedding performed best on small projects.
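
To make the simplest of the IR baselines named above concrete, the following is a minimal sketch of Vector Space Model trace-link ranking: artifacts are turned into TF-IDF vectors and candidate links are ordered by cosine similarity. It is an illustration under stated assumptions, not the authors' implementation. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_trace_links`), the smoothed IDF formula, and the whitespace tokenizer are all hypothetical choices made for this example; in particular, the sketch does no Chinese word segmentation or translation, which a bilingual pipeline would need.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Chinese text would need
    # a word segmenter or translation step, which this sketch leaves out.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an illustrative choice, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_trace_links(sources, targets):
    # Score every (source, target) pair and return them best-first as
    # (source_index, target_index, similarity) tuples.
    vectors = tfidf_vectors(sources + targets)
    src_vecs = vectors[: len(sources)]
    tgt_vecs = vectors[len(sources):]
    links = [
        (i, j, cosine(s, t))
        for i, s in enumerate(src_vecs)
        for j, t in enumerate(tgt_vecs)
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


if __name__ == "__main__":
    issues = ["user login with password"]
    commits = ["fix password check in login form", "update build script"]
    for src, tgt, score in rank_trace_links(issues, commits):
        print(src, tgt, round(score, 3))
```

Because the score depends only on shared terms, an English issue and a Chinese commit message with no tokens in common would score zero here. That is the gap that translation and cross-lingual embeddings are meant to close in the approaches the abstract compares.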
Literatur
Zurück zum Zitat EF EPI (2019) EF English Proficiency Index EF EPI (2019) EF English Proficiency Index
Zurück zum Zitat Fasttext (2021) Word vectors for 157 languages ⋅ fasttext Fasttext (2021) Word vectors for 157 languages ⋅ fasttext
Zurück zum Zitat Double Blinded (2020) All information is blinded due to current submission under double blind review. the paper is available upon request to the associate editors of the msr emse special edition Double Blinded (2020) All information is blinded due to current submission under double blind review. the paper is available upon request to the associate editors of the msr emse special edition
Zurück zum Zitat Abufardeh S, Magel K (2010) The impact of global software cultural and linguistic aspects on global software development process (gsd): Issues and challenges. In: 4th International conference on new trends in information science and service science. pp 133–138 Abufardeh S, Magel K (2010) The impact of global software cultural and linguistic aspects on global software development process (gsd): Issues and challenges. In: 4th International conference on new trends in information science and service science. pp 133–138
Zurück zum Zitat Ali N, Guéhéneuc Y, Antoniol G (2013) Trustrace: Mining software repositories to improve the accuracy of requirement traceability links. IEEE Trans Softw Eng 39(5):725–741CrossRef Ali N, Guéhéneuc Y, Antoniol G (2013) Trustrace: Mining software repositories to improve the accuracy of requirement traceability links. IEEE Trans Softw Eng 39(5):725–741CrossRef
Zurück zum Zitat Almasri M, Berrut C, Chevallet J (2016) A comparison of deep learning based query expansion with pseudo-relevance feedback and mutual information. In: Advances in information retrieval - 38th European conference on IR research, ECIR 2016, Padua, Italy, March 20-23, 2016. Proceedings. pp 709–715 Almasri M, Berrut C, Chevallet J (2016) A comparison of deep learning based query expansion with pseudo-relevance feedback and mutual information. In: Advances in information retrieval - 38th European conference on IR research, ECIR 2016, Padua, Italy, March 20-23, 2016. Proceedings. pp 709–715
Zurück zum Zitat Antoniol G, Canfora G, Casazza G, Lucia AD, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Software Eng 28(10):970–983CrossRef Antoniol G, Canfora G, Casazza G, Lucia AD, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Software Eng 28(10):970–983CrossRef
Zurück zum Zitat Asuncion HU, Asuncion A, Taylor RN (2010) Software traceability with topic modeling. In: 32nd ACM/IEEE International conference on software engineering (ICSE). pp 95–104 Asuncion HU, Asuncion A, Taylor RN (2010) Software traceability with topic modeling. In: 32nd ACM/IEEE International conference on software engineering (ICSE). pp 95–104
Zurück zum Zitat Asuncion HU, Taylor RN (2012) Automated techniques for capturing custom traceability links across heterogeneous artifacts. In: Software and systems traceability. pp 129–146 Asuncion HU, Taylor RN (2012) Automated techniques for capturing custom traceability links across heterogeneous artifacts. In: Software and systems traceability. pp 129–146
Zurück zum Zitat Bird S (2006) NLTK: the natural language toolkit. In: ACL 2006, 21st International conference on computational linguistics and 44th annual meeting of the association for computational linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006 Bird S (2006) NLTK: the natural language toolkit. In: ACL 2006, 21st International conference on computational linguistics and 44th annual meeting of the association for computational linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006
Zurück zum Zitat Calefato F, Lanubile F, P Minervini and (2010) Can real-time machine translation overcome language barriers in distributed requirements engineering?. In: 2010 5th IEEE International conference on global software engineering. IEEE, pp 257–264 Calefato F, Lanubile F, P Minervini and (2010) Can real-time machine translation overcome language barriers in distributed requirements engineering?. In: 2010 5th IEEE International conference on global software engineering. IEEE, pp 257–264
Zurück zum Zitat Calefato F, Lanubile F, Prikladnicki R (2011) A controlled experiment on the effects of machine translation in multilingual requirements meetings. In: 6th IEEE International conference on global software engineering, ICGSE 2011, Helsinki, Finland, August 15-18, 2011. pp 94–102 Calefato F, Lanubile F, Prikladnicki R (2011) A controlled experiment on the effects of machine translation in multilingual requirements meetings. In: 6th IEEE International conference on global software engineering, ICGSE 2011, Helsinki, Finland, August 15-18, 2011. pp 94–102
Zurück zum Zitat Cleland-Huang J, Czauderna A, Dekhtyar A, Gotel O, Hayes JH, Keenan E, Leach G, Maletic JI, Poshyvanyk D, Shin Y, Zisman A, Antoniol G, Berenbach B, Egyed A, Mȧder P (2011) Grand challenges, benchmarks, and tracelab: developing infrastructure for the software traceability research community. In: TEFSE’11, Proceedings of the 6th International workshop on traceability in emerging forms of software engineering, May 23, 2011, Waikiki, Honolulu, HI, USA. pp 17–23 Cleland-Huang J, Czauderna A, Dekhtyar A, Gotel O, Hayes JH, Keenan E, Leach G, Maletic JI, Poshyvanyk D, Shin Y, Zisman A, Antoniol G, Berenbach B, Egyed A, Mȧder P (2011) Grand challenges, benchmarks, and tracelab: developing infrastructure for the software traceability research community. In: TEFSE’11, Proceedings of the 6th International workshop on traceability in emerging forms of software engineering, May 23, 2011, Waikiki, Honolulu, HI, USA. pp 17–23
Zurück zum Zitat Cleland-Huang J, Gotel O, Hayes JH, Mäder P, Zisman A (2014) Software traceability: trends and future directions. In: FOSE. pp 55–69 Cleland-Huang J, Gotel O, Hayes JH, Mäder P, Zisman A (2014) Software traceability: trends and future directions. In: FOSE. pp 55–69
Zurück zum Zitat Cleland-Huang J, Rahimi M, Mȧder P (2014) Achieving lightweight trustworthy traceability. In: Proceedings of the 22nd ACM SIGSOFT International symposium on foundations of software engineering, (FSE-22), Hong Kong, China, November 16 - 22, 2014. pp 849–852 Cleland-Huang J, Rahimi M, Mȧder P (2014) Achieving lightweight trustworthy traceability. In: Proceedings of the 22nd ACM SIGSOFT International symposium on foundations of software engineering, (FSE-22), Hong Kong, China, November 16 - 22, 2014. pp 849–852
Zurück zum Zitat Conneau A, Lample G, Ranzato M, Denoyer L, Jégou H. (2017) Word translation without parallel data. arXiv:1710.04087 Conneau A, Lample G, Ranzato M, Denoyer L, Jégou H. (2017) Word translation without parallel data. arXiv:1710.​04087
Zurück zum Zitat Conneau A, Lample G, Rinott R, Williams A, Bowman SR, Schwenk H, Stoyanov V (2018) Xnli: Evaluating cross-lingual sentence representations. arXiv:1809.05053 Conneau A, Lample G, Rinott R, Williams A, Bowman SR, Schwenk H, Stoyanov V (2018) Xnli: Evaluating cross-lingual sentence representations. arXiv:1809.​05053
Zurück zum Zitat Cover TM, Thomas JA (2006) Elements of information theory (Wiley series in telecommunications and signal processing). Wiley-Interscience, New York Cover TM, Thomas JA (2006) Elements of information theory (Wiley series in telecommunications and signal processing). Wiley-Interscience, New York
Zurück zum Zitat Cruz BD, Jayaraman B, Dwarakanath A, McMillan C (2017) Detecting vague words & phrases in requirements documents in a multilingual environment. In: 2017 IEEE 25th International requirements engineering conference (RE). pp 233–242. IEEE Cruz BD, Jayaraman B, Dwarakanath A, McMillan C (2017) Detecting vague words & phrases in requirements documents in a multilingual environment. In: 2017 IEEE 25th International requirements engineering conference (RE). pp 233–242. IEEE
Zurück zum Zitat Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding arXiv:1810.04805 Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding arXiv:1810.​04805
Zurück zum Zitat Dhingra B, Zhou Z, Fitzpatrick D, Muehl M, Cohen WW (2016) Tweet2vec: Character-based distributed representations for social media. In: Proceedings of the 54th annual meeting of the association for computational linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers Dhingra B, Zhou Z, Fitzpatrick D, Muehl M, Cohen WW (2016) Tweet2vec: Character-based distributed representations for social media. In: Proceedings of the 54th annual meeting of the association for computational linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 2: Short Papers
Zurück zum Zitat Fu Y (2021) Who offers the best chinese-english machine translation? a comparison of google, microsoft bing, baidu, tencent, sogou, and netease youdao Fu Y (2021) Who offers the best chinese-english machine translation? a comparison of google, microsoft bing, baidu, tencent, sogou, and netease youdao
Zurück zum Zitat Google-Research (2019) Github Repository: Multilingual Models google-research/bert Google-Research (2019) Github Repository: Multilingual Models google-research/bert
Zurück zum Zitat Gotel O, Cleland-Huang J, Huffman Hayes J, Zisman A, Egyed A, Grünbacher P., Antoniol G (2012) The quest for ubiquity: A roadmap for software and systems traceability research. In: 21st IEEE International requirements engineering conference (RE). pp 71–80 Gotel O, Cleland-Huang J, Huffman Hayes J, Zisman A, Egyed A, Grünbacher P., Antoniol G (2012) The quest for ubiquity: A roadmap for software and systems traceability research. In: 21st IEEE International requirements engineering conference (RE). pp 71–80
Zurück zum Zitat Gotel OCZ, Finkelstein A (1994) An analysis of the requirements traceability problem. In: Proceedings of the first IEEE international conference on requirements engineering, ICRE ’94, Colorado Springs, Colorado, USA, April 18-21, 1994. pp 94–101 Gotel OCZ, Finkelstein A (1994) An analysis of the requirements traceability problem. In: Proceedings of the first IEEE international conference on requirements engineering, ICRE ’94, Colorado Springs, Colorado, USA, April 18-21, 1994. pp 94–101
Zurück zum Zitat Gouws S, Bengio Y, Corrado G (2015) Bilbowa: Fast bilingual distributed representations without word alignments. In: Proceedings of the 32nd International conference on machine learning, ICML 2015, Lille, France, 6-11 July 2015. pp 748–756 Gouws S, Bengio Y, Corrado G (2015) Bilbowa: Fast bilingual distributed representations without word alignments. In: Proceedings of the 32nd International conference on machine learning, ICML 2015, Lille, France, 6-11 July 2015. pp 748–756
Zurück zum Zitat Guo J, Cheng J, Cleland-Huang J (2017) Semantically enhanced software traceability using deep learning techniques. In: Proceedings of the 39th international conference on software engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017. pp 3–14 Guo J, Cheng J, Cleland-Huang J (2017) Semantically enhanced software traceability using deep learning techniques. In: Proceedings of the 39th international conference on software engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017. pp 3–14
Zurück zum Zitat Guo J, Cleland-Huang J, Berenbach B (2013) Foundations for an expert system in domain-specific traceability. In: 21st IEEE International requirements engineering conference, RE 2013, Rio de Janeiro-RJ, Brazil, July 15-19, 2013. IEEE Computer Society, pp 42–5 Guo J, Cleland-Huang J, Berenbach B (2013) Foundations for an expert system in domain-specific traceability. In: 21st IEEE International requirements engineering conference, RE 2013, Rio de Janeiro-RJ, Brazil, July 15-19, 2013. IEEE Computer Society, pp 42–5
Zurück zum Zitat Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: The study of methods. IEEE Trans Software Eng 32(1):4–19CrossRef Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: The study of methods. IEEE Trans Software Eng 32(1):4–19CrossRef
Zurück zum Zitat Hilgert L, Lopes L, Freitas A, Vieira R, Hogetop D, Vanim A (2014) Building domain specific bilingual dictionaries. In: Proceedings of the ninth international conference on language resources and evaluation (LREC’14), 2014, Islândia Hilgert L, Lopes L, Freitas A, Vieira R, Hogetop D, Vanim A (2014) Building domain specific bilingual dictionaries. In: Proceedings of the ninth international conference on language resources and evaluation (LREC’14), 2014, Islândia
Zurück zum Zitat Hofmann T (1999) Probabilistic latent semantic indexing. In: SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, August 15-19, 1999, Berkeley, CA, USA. pp 50–57 Hofmann T (1999) Probabilistic latent semantic indexing. In: SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, August 15-19, 1999, Berkeley, CA, USA. pp 50–57
Zurück zum Zitat Jenkins J (1999) New ideographs in unicode 3.0 and beyond. In: Proceedings of the 15th international unicode conference C, vol 15. pp 1–2 Jenkins J (1999) New ideographs in unicode 3.0 and beyond. In: Proceedings of the 15th international unicode conference C, vol 15. pp 1–2
Zurück zum Zitat Johnson M, Schuster M, Le QV, Krikun M, Wu Y, Chen Z, Thorat N, Viégas F, Wattenberg M, Corrado G et al (2017) Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans Assoc Comput Linguist 5:339–351 Johnson M, Schuster M, Le QV, Krikun M, Wu Y, Chen Z, Thorat N, Viégas F, Wattenberg M, Corrado G et al (2017) Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans Assoc Comput Linguist 5:339–351
Zurück zum Zitat Jones E, Oliphant T, Peterson P et al (2001) SciPy: Open source scientific tools for Python. [Online; accessed < today >] Jones E, Oliphant T, Peterson P et al (2001) SciPy: Open source scientific tools for Python. [Online; accessed < today >]
Zurück zum Zitat Joulin A, Bojanowski P, Mikolov T, Jégou H., Grave E (2018) Loss in translation: Learning bilingual word mapping with a retrieval criterion. In: Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium, October 31 - November 4, 2018. pp 2979–2984 Joulin A, Bojanowski P, Mikolov T, Jégou H., Grave E (2018) Loss in translation: Learning bilingual word mapping with a retrieval criterion. In: Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium, October 31 - November 4, 2018. pp 2979–2984
Zurück zum Zitat Kailath T (1967) The divergence and bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol 15(1):52–60CrossRef Kailath T (1967) The divergence and bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol 15(1):52–60CrossRef
Zurück zum Zitat Khandkar SH (2009) Open coding. University of Calgary, 23:2009 Khandkar SH (2009) Open coding. University of Calgary, 23:2009
Zurück zum Zitat Krishna S, Sahay S, Walsham G (2004) Managing cross-cultural issues in global software outsourcing. Commun ACM 47(4):62–66CrossRef Krishna S, Sahay S, Walsham G (2004) Managing cross-cultural issues in global software outsourcing. Commun ACM 47(4):62–66CrossRef
Zurück zum Zitat Liu Y, Lin J, Cleland-Huang J (2020) Traceability support for multi-lingual software projects. In: Kim S, Gousios G, Nadi S, Hejderup J (eds) MSR ’20: 17th International conference on mining software repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, pp 443–454 Liu Y, Lin J, Cleland-Huang J (2020) Traceability support for multi-lingual software projects. In: Kim S, Gousios G, Nadi S, Hejderup J (eds) MSR ’20: 17th International conference on mining software repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, pp 443–454
Zurück zum Zitat Liu Y, Lin J, Zeng Q, Jiang M, Cleland-Huang J (2020) Towards semantically guided traceability. In: 2020 IEEE 28th International requirements engineering conference (RE). pp 328–333. IEEE Liu Y, Lin J, Zeng Q, Jiang M, Cleland-Huang J (2020) Towards semantically guided traceability. In: 2020 IEEE 28th International requirements engineering conference (RE). pp 328–333. IEEE
Zurück zum Zitat Lohar S, Amornborvornwong S, Zisman A, Cleland-Huang J (2013) Improving trace accuracy through data-driven configuration and composition of tracing features. In: Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. pp 378–388 Lohar S, Amornborvornwong S, Zisman A, Cleland-Huang J (2013) Improving trace accuracy through data-driven configuration and composition of tracing features. In: Joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. pp 378–388
Zurück zum Zitat Lormans M, Van Deursen A (2006) Can lsi help reconstructing requirements traceability in design and test?. In: Conference on software maintenance and reengineering (CSMR’06). IEEE, pp 10–pp Lormans M, Van Deursen A (2006) Can lsi help reconstructing requirements traceability in design and test?. In: Conference on software maintenance and reengineering (CSMR’06). IEEE, pp 10–pp
Zurück zum Zitat Lucia AD, Fasano F, Oliveto R, Tortora G (2007) Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4) Lucia AD, Fasano F, Oliveto R, Tortora G (2007) Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4)
Zurück zum Zitat Lutz B (2009) Linguistic challenges in global software development: Lessons learned in an international SW development division. In: 4th IEEE International conference on global software engineering, ICGSE 2009, Limerick, Ireland, 13-16 July, 2009. pp 249–253 Lutz B (2009) Linguistic challenges in global software development: Lessons learned in an international SW development division. In: 4th IEEE International conference on global software engineering, ICGSE 2009, Limerick, Ireland, 13-16 July, 2009. pp 249–253
Zurück zum Zitat Mȧder P, Gotel O (2012) Towards automated traceability maintenance. J Syst Softw 85(10):2205–2227CrossRef Mȧder P, Gotel O (2012) Towards automated traceability maintenance. J Syst Softw 85(10):2205–2227CrossRef
Zurück zum Zitat Meeker M, Wu L (2018) Internet trends 2018 Meeker M, Wu L (2018) Internet trends 2018
Zurück zum Zitat Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781 Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.​3781
Zurück zum Zitat Mikolov T, Grave E, Bojanowski P, Puhrsch C, Joulin A (2018) Advances in pre-training distributed word representations. In: Proceedings of the International conference on language resources and evaluation (LREC 2018) Mikolov T, Grave E, Bojanowski P, Puhrsch C, Joulin A (2018) Advances in pre-training distributed word representations. In: Proceedings of the International conference on language resources and evaluation (LREC 2018)
Zurück zum Zitat Monti J, Monteleone M, Di Buono MP, Marano F (2013) Natural language processing and big data-an ontology-based approach for cross-lingual information retrieval. In: 2013 International conference on social computing. IEEE, pp 725–731 Monti J, Monteleone M, Di Buono MP, Marano F (2013) Natural language processing and big data-an ontology-based approach for cross-lingual information retrieval. In: 2013 International conference on social computing. IEEE, pp 725–731
Zurück zum Zitat Moulin C, Sugawara K, Fujita S, Wouters L, Manabe Y (2009) Multilingual collaborative design support system. In: Proceedings of the 13th International conference on computers supported cooperative work in design, CSCWD 2009, April 22-24, 2009, Santiago, Chile. pp 312–318 Moulin C, Sugawara K, Fujita S, Wouters L, Manabe Y (2009) Multilingual collaborative design support system. In: Proceedings of the 13th International conference on computers supported cooperative work in design, CSCWD 2009, April 22-24, 2009, Santiago, Chile. pp 312–318
Zurück zum Zitat Muhr M, Kern R, Zechner M, Granitzer M (2010) External and intrinsic plagiarism detection using a cross-lingual retrieval and segmentation system. In: Notebook papers of CLEF 2010 LABs and workshops Muhr M, Kern R, Zechner M, Granitzer M (2010) External and intrinsic plagiarism detection using a cross-lingual retrieval and segmentation system. In: Notebook papers of CLEF 2010 LABs and workshops
Zurück zum Zitat Oliveto R, Gethers M, Poshyvanyk D, Lucia AD (2010) On the equivalence of information retrieval methods for automated traceability link recovery. In: The 18th IEEE International conference on program comprehension, ICPC 2010, Braga, Minho, Portugal, June 30-July 2, 2010. pp 68–71 Oliveto R, Gethers M, Poshyvanyk D, Lucia AD (2010) On the equivalence of information retrieval methods for automated traceability link recovery. In: The 18th IEEE International conference on program comprehension, ICPC 2010, Braga, Minho, Portugal, June 30-July 2, 2010. pp 68–71
Zurück zum Zitat Pawelka T, Juergens E (2015) Is this code written in english? a study of the natural language of comments and identifiers in practice. In: 2015 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 401–410 Pawelka T, Juergens E (2015) Is this code written in english? a study of the natural language of comments and identifiers in practice. In: 2015 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 401–410
Zurück zum Zitat Rath M, Rendall J, Guo JLC, Cleland-Huang J, Mȧder P (2018) Traceability in the wild: automatically augmenting incomplete trace links. In: Proceedings of the 40th international conference on software engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018. pp 834–845 Rath M, Rendall J, Guo JLC, Cleland-Huang J, Mȧder P (2018) Traceability in the wild: automatically augmenting incomplete trace links. In: Proceedings of the 40th international conference on software engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018. pp 834–845
Zurück zum Zitat Rempel P, Mäder P, Kuschke T, Cleland-Huang J (2015) Traceability gap analysis for assessing the conformance of software traceability to relevant guidelines. In: Software engineering & management 2015, Multikonferenz der GI-Fachbereiche Softwaretechnik (SWT) und Wirtschaftsinformatik, Dresden, Germany. pp 120–121 Rempel P, Mäder P, Kuschke T, Cleland-Huang J (2015) Traceability gap analysis for assessing the conformance of software traceability to relevant guidelines. In: Software engineering & management 2015, Multikonferenz der GI-Fachbereiche Softwaretechnik (SWT) und Wirtschaftsinformatik, Dresden, Germany. pp 120–121
Zurück zum Zitat Ruder S, Vuli’c I, Sogaard A (2017) A survey of cross-lingual word embedding models Ruder S, Vuli’c I, Sogaard A (2017) A survey of cross-lingual word embedding models
Zurück zum Zitat Sanh V, Debut L, Chaumond J, Wolf T (2019) Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv:1910.01108 Sanh V, Debut L, Chaumond J, Wolf T (2019) Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv:1910.​01108
Zurück zum Zitat Shin Y, Hayes JH, Cleland-Huang J (2015) Guidelines for benchmarking automated software traceability techniques. In: 8th IEEE/ACM International symposium on software and systems traceability, SST 2015, Florence, Italy, May 17, 2015. pp 61–67 Shin Y, Hayes JH, Cleland-Huang J (2015) Guidelines for benchmarking automated software traceability techniques. In: 8th IEEE/ACM International symposium on software and systems traceability, SST 2015, Florence, Italy, May 17, 2015. pp 61–67
Zurück zum Zitat Spanoudakis G, Zisman A, Pérez-Miñana E, Krause P (2004) Rule-based generation of requirements traceability relations. J Syst Softw 72(2):105–127CrossRef Spanoudakis G, Zisman A, Pérez-Miñana E, Krause P (2004) Rule-based generation of requirements traceability relations. J Syst Softw 72(2):105–127CrossRef
Zurück zum Zitat Tang G, Xia Y, Zhang M, Li H, Zheng F (2011) CLGVSM: adapting generalized vector space model to cross-lingual document clustering. In: Fifth International joint conference on natural language processing, IJCNLP 2011, Chiang Mai, Thailand, November 8-13, 2011. pp 580–588 Tang G, Xia Y, Zhang M, Li H, Zheng F (2011) CLGVSM: adapting generalized vector space model to cross-lingual document clustering. In: Fifth International joint conference on natural language processing, IJCNLP 2011, Chiang Mai, Thailand, November 8-13, 2011. pp 580–588
Zurück zum Zitat Trec-Kba, trec-kba/many-stop-words (2021) Trec-Kba, trec-kba/many-stop-words (2021)
Zurück zum Zitat Treude C, Prolo CA, Figueira Filho F (2015) Challenges in analyzing software documentation in portuguese. In: 2015 29th Brazilian symposium on software engineering. IEEE, pp 179–184 Treude C, Prolo CA, Figueira Filho F (2015) Challenges in analyzing software documentation in portuguese. In: 2015 29th Brazilian symposium on software engineering. IEEE, pp 179–184
Zurück zum Zitat Tsatsaronis G, Panagiotopoulou V (2009) A generalized vector space model for text retrieval based on semantic relatedness. In: EACL 2009, 12th conference of the european chapter of the association for computational linguistics, Proceedings of the Conference, Athens, Greece, March 30 - April 3, 2009. pp 70–78 Tsatsaronis G, Panagiotopoulou V (2009) A generalized vector space model for text retrieval based on semantic relatedness. In: EACL 2009, 12th conference of the european chapter of the association for computational linguistics, Proceedings of the Conference, Athens, Greece, March 30 - April 3, 2009. pp 70–78
Zurück zum Zitat Tsatsaronis G, Varlamis I, Vazirgiannis M (2010) Text relatedness based on a word thesaurus. J Artif Intell Res 37:1–39CrossRef Tsatsaronis G, Varlamis I, Vazirgiannis M (2010) Text relatedness based on a word thesaurus. J Artif Intell Res 37:1–39CrossRef
Zurück zum Zitat Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems. pp 5998–6008 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems. pp 5998–6008
Zurück zum Zitat Vulic I (2017) Cross-lingual syntactically informed distributed word representations. In: Proceedings of the 15th conference of the european chapter of the association for computational linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers. pp 408–414 Vulic I (2017) Cross-lingual syntactically informed distributed word representations. In: Proceedings of the 15th conference of the european chapter of the association for computational linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers. pp 408–414
Zurück zum Zitat Vulic I, Moens M (2015) Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, Santiago, Chile, August 9-13, 2015. pp 363–372 Vulic I, Moens M (2015) Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, Santiago, Chile, August 9-13, 2015. pp 363–372
Zurück zum Zitat Wada T, Iwata T (2018) Unsupervised cross-lingual word embedding by multilingual neural language models. arXiv:1809.02306 Wada T, Iwata T (2018) Unsupervised cross-lingual word embedding by multilingual neural language models. arXiv:1809.​02306
Wong SKM, Ziarko W, Raghavan VV, Wong PCN (1989) Extended boolean query processing in the generalized vector space model. Inf Syst 14(1):47–63
Wong SKM, Ziarko W, Wong PCN (1985) Generalized vector space model in information retrieval. In: Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval, Montréal, Québec, Canada, June 5-7, 1985. pp 18–25
Woolson R (2007) Wilcoxon signed-rank test. Wiley encyclopedia of clinical trials. pp 1–3
Wouters L, Kaeri Y, Sugawara K (2013) Multi-domain multi-lingual collaborative design. In: Proceedings of the 2013 IEEE 17th International conference on computer supported cooperative work in design (CSCWD), IEEE, pp 269–274
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser L, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144
Xia X, Lo D, Wang X, Zhang C, Wang X (2014) Cross-language bug localization. In: Proceedings of the 22nd International conference on program comprehension. pp 275–278
Xu B, Xing Z, Xia X, Lo D, Li S (2018) Domain-specific cross-language relevant question retrieval. Empir Softw Eng 23(2):1084–1122
Ye X, Qi Z, Massey D (2015) Learning relevance from click data via neural network based similarity models. In: 2015 IEEE International conference on big data, Big Data 2015, Santa Clara, CA. pp 801–806
Zhao T, Cao Q, Sun Q (2017) An improved approach to traceability recovery based on word embeddings. In: 24th Asia-Pacific software engineering conference, APSEC 2017, Nanjing, China, December 4-8, 2017. pp 81–89
Metadata
Title
Information retrieval versus deep learning approaches for generating traceability links in bilingual projects
Authors
Jinfeng Lin
Yalin Liu
Jane Cleland-Huang
Publication date
1 January 2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-10050-0
