Skip to main content

2020 | OriginalPaper | Buchkapitel

Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements

verfasst von : Siba Mishra, Arpit Sharma

Erschienen in: Requirements Engineering: Foundation for Software Quality

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

[Context and Motivation] Requirements glossary defines specialized and technical terms used in a requirements document. A requirements glossary helps in improving the quality and understandability of requirements documents. [Question/Problem] Manual extraction of glossary terms from a large body of requirements is an expensive and time-consuming task. This paper proposes a fundamentally new approach for automated extraction of glossary terms from large-sized requirements documents. [Principal Ideas/Result] Firstly, our technique extracts the candidate glossary terms by applying text chunking. Next, we apply a novel word embeddings based semantic filter for reducing the number of candidate glossary terms. Since word embeddings are very effective in identifying terms that are semantically very similar, this filter ensures that only domain-specific terms are present in the final set of glossary terms. We create a domain-specific reference corpus for home automation by Wikipedia crawling and use it for computing the semantic similarity scores of candidate glossary terms. We apply our technique to a large-sized requirements document, i.e., a CrowdRE dataset with around 3000 crowd-generated requirements for smart home applications. Semantic filtering reduces the number of glossary terms by 92.7%. To evaluate the quality of our extracted glossary terms we manually create the ground truth data from CrowdRE dataset and use it for computing precision and recall. Additionally, we also compute the requirements coverage of these extracted glossary terms. [Contributions] Our detailed experiments show that word embeddings based semantic filtering can be very useful for extracting glossary terms from a large body of requirements.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aguilera, C., Berry, D.M.: The use of a repeated phrase finder in requirements extraction. J. Syst. Softw. 13(3), 209–230 (1990)CrossRef Aguilera, C., Berry, D.M.: The use of a repeated phrase finder in requirements extraction. J. Syst. Softw. 13(3), 209–230 (1990)CrossRef
2.
Zurück zum Zitat Arora, C., Sabetzadeh, M., Briand, L.C., Zimmer, F.: Automated extraction and clustering of requirements glossary terms. IEEE Trans. Softw. Eng. 43(10), 918–945 (2017)CrossRef Arora, C., Sabetzadeh, M., Briand, L.C., Zimmer, F.: Automated extraction and clustering of requirements glossary terms. IEEE Trans. Softw. Eng. 43(10), 918–945 (2017)CrossRef
3.
Zurück zum Zitat Dwarakanath, A., Ramnani, R.R., Sengupta, S.: Automatic extraction of glossary terms from natural language requirements. In: 21st IEEE International Requirements Engineering Conference (RE), pp. 314–319, July 2013 Dwarakanath, A., Ramnani, R.R., Sengupta, S.: Automatic extraction of glossary terms from natural language requirements. In: 21st IEEE International Requirements Engineering Conference (RE), pp. 314–319, July 2013
4.
Zurück zum Zitat Ferrari, A., Donati, B., Gnesi, S.: Detecting domain-specific ambiguities: an NLP approach based on Wikipedia crawling and word embeddings. In: 25th IEEE International Requirements Engineering Conference Workshops (REW), pp. 393–399, September 2017 Ferrari, A., Donati, B., Gnesi, S.: Detecting domain-specific ambiguities: an NLP approach based on Wikipedia crawling and word embeddings. In: 25th IEEE International Requirements Engineering Conference Workshops (REW), pp. 393–399, September 2017
5.
Zurück zum Zitat Ferrari, A., Esuli, A., Gnesi, S.: Identification of cross-domain ambiguity with language models. In: 5th International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), pp. 31–38, August 2018 Ferrari, A., Esuli, A., Gnesi, S.: Identification of cross-domain ambiguity with language models. In: 5th International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), pp. 31–38, August 2018
6.
Zurück zum Zitat Gemkow, T., Conzelmann, M., Hartig, K., Vogelsang, A.: Automatic glossary term extraction from large-scale requirements specifications. In: 26th IEEE International Requirements Engineering Conference, pp. 412–417. IEEE Computer Society (2018) Gemkow, T., Conzelmann, M., Hartig, K., Vogelsang, A.: Automatic glossary term extraction from large-scale requirements specifications. In: 26th IEEE International Requirements Engineering Conference, pp. 412–417. IEEE Computer Society (2018)
9.
Zurück zum Zitat Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)CrossRef Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)CrossRef
10.
Zurück zum Zitat Kof, L.: Natural language processing for requirements engineering: applicability to large requirements documents. In: Workshop on Automated Software Engineering (2004) Kof, L.: Natural language processing for requirements engineering: applicability to large requirements documents. In: Workshop on Automated Software Engineering (2004)
11.
Zurück zum Zitat Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2014, pp. 2177–2185 (2014) Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2014, pp. 2177–2185 (2014)
13.
Zurück zum Zitat Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:​1301.​3781 (2013)
14.
Zurück zum Zitat Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2013, pp. 3111–3119 (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. NIPS 2013, pp. 3111–3119 (2013)
15.
Zurück zum Zitat Mishra, S., Sharma, A.: On the use of word embeddings for identifying domain specific ambiguities in requirements. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp. 234–240 (2019) Mishra, S., Sharma, A.: On the use of word embeddings for identifying domain specific ambiguities in requirements. In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp. 234–240 (2019)
16.
Zurück zum Zitat Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Toward automating crowd RE. In: 25th IEEE International Requirements Engineering Conference (RE), pp. 512–515, September 2017 Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Toward automating crowd RE. In: 25th IEEE International Requirements Engineering Conference (RE), pp. 512–515, September 2017
17.
Zurück zum Zitat Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Acquiring creative requirements from the crowd: understanding the influences of individual personality and creative potential in crowd RE. In: 24th IEEE International Requirements Engineering Conference (RE), pp. 176–185, September 2016 Murukannaiah, P.K., Ajmeri, N., Singh, M.P.: Acquiring creative requirements from the crowd: understanding the influences of individual personality and creative potential in crowd RE. In: 24th IEEE International Requirements Engineering Conference (RE), pp. 176–185, September 2016
19.
Zurück zum Zitat Pohl, K.: Requirements Engineering - Fundamentals, Principles, and Techniques, 1st edn. Springer, Heidelberg (2010) Pohl, K.: Requirements Engineering - Fundamentals, Principles, and Techniques, 1st edn. Springer, Heidelberg (2010)
21.
22.
Zurück zum Zitat Romero, F.P., Olivas, J.A., Genero, M., Piattini, M.: Automatic extraction of the main terminology used in empirical software engineering through text mining techniques. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. ESEM 2008, pp. 357–358 (2008) Romero, F.P., Olivas, J.A., Genero, M., Piattini, M.: Automatic extraction of the main terminology used in empirical software engineering through text mining techniques. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. ESEM 2008, pp. 357–358 (2008)
Metadaten
Titel
Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements
verfasst von
Siba Mishra
Arpit Sharma
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-44429-7_15