Published in: Artificial Intelligence Review 9/2023

22.02.2023

Impact of word embedding models on text analytics in deep learning environment: a review

Authors: Deepak Suresh Asudani, Naresh Kumar Nagwani, Pradeep Singh


Abstract

Selecting suitable word embedding and deep learning models is vital for achieving good results. Word embeddings are n-dimensional distributed representations of text that attempt to capture the meanings of words. Deep learning models use multiple computing layers to learn hierarchical representations of data. Word embedding combined with deep learning has received much attention and is applied in various natural language processing (NLP) tasks, such as text classification, sentiment analysis, named entity recognition, and topic modeling. This paper reviews representative methods of the most prominent word embedding and deep learning models. It presents an overview of recent research trends in NLP and a detailed account of how to use these models to achieve efficient results on text analytics tasks. The review summarizes, contrasts, and compares numerous word embedding and deep learning models and includes a list of prominent datasets, tools, APIs, and popular publications. Based on a comparative analysis of different techniques, it offers a reference for selecting a suitable word embedding and deep learning approach for text analytics tasks. The paper can serve as a quick reference on the basics, benefits, and challenges of various word representation approaches and deep learning models, their application to text analytics, and the future outlook for research. The findings of this study indicate that domain-specific word embeddings and the long short-term memory (LSTM) model can be employed to improve overall text analytics task performance.
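To make the review's concluding recommendation concrete, the sketch below shows one common way to combine a domain-specific word embedding with an LSTM classifier: train a Word2Vec model on an in-domain corpus (here with gensim) and use the resulting vectors to initialize the embedding layer of a small LSTM text classifier (here in PyTorch). This is a minimal illustration, not the authors' code; the toy corpus, labels, library choices, and hyperparameters are assumptions made for the example.

```python
# Minimal sketch: domain-specific Word2Vec embeddings feeding an LSTM classifier.
# All data and hyperparameters below are illustrative placeholders.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Toy in-domain corpus: each document is a list of tokens.
corpus = [
    ["deep", "learning", "improves", "text", "classification"],
    ["word", "embeddings", "capture", "word", "meaning"],
    ["lstm", "models", "handle", "long", "sequences"],
]
labels = torch.tensor([1.0, 0.0, 1.0])  # illustrative binary labels

# 1. Domain-specific embedding: train Word2Vec on the in-domain corpus.
emb_dim = 50
w2v = Word2Vec(sentences=corpus, vector_size=emb_dim, window=2, min_count=1, epochs=100)

# 2. Build a vocabulary (index 0 reserved for padding) and an embedding matrix.
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}
emb_matrix = np.zeros((len(vocab) + 1, emb_dim), dtype="float32")
for w, i in vocab.items():
    emb_matrix[i] = w2v.wv[w]

# 3. Encode documents as fixed-length, zero-padded index sequences.
max_len = 5
X = torch.zeros(len(corpus), max_len, dtype=torch.long)
for d, doc in enumerate(corpus):
    for t, w in enumerate(doc[:max_len]):
        X[d, t] = vocab[w]

# 4. LSTM classifier initialized with the domain-specific embeddings.
class LSTMClassifier(nn.Module):
    def __init__(self, emb_matrix, hidden=32):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(
            torch.from_numpy(emb_matrix), freeze=False, padding_idx=0)
        self.lstm = nn.LSTM(emb_matrix.shape[1], hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        _, (h_n, _) = self.lstm(self.emb(x))   # final hidden state per document
        return self.out(h_n[-1]).squeeze(-1)   # one logit per document

model = LSTMClassifier(emb_matrix)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(20):                            # tiny illustrative training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), labels)
    loss.backward()
    optimizer.step()
```

In practice the same structure scales up: the in-domain corpus replaces the toy sentences, and the embedding layer can be kept trainable (as here) or frozen, depending on how much labeled data is available.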


Metadata
Title
Impact of word embedding models on text analytics in deep learning environment: a review
Authors
Deepak Suresh Asudani
Naresh Kumar Nagwani
Pradeep Singh
Publication date
22.02.2023
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 9/2023
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-023-10419-1
