2020 | OriginalPaper | Chapter

Deep Learning Models for Representing Out-of-Vocabulary Words

Authors: Johannes V. Lochter, Renato M. Silva, Tiago A. Almeida

Published in: Intelligent Systems

Publisher: Springer International Publishing

Abstract

Communication has become increasingly dynamic with the popularization of social networks and applications that allow people to express themselves and communicate instantly. In this scenario, the quality of distributed representation models is degraded by new words that appear frequently or that derive from spelling errors. These words, which are unknown to the models and are called out-of-vocabulary (OOV) words, must be handled properly so as not to degrade the quality of natural language processing (NLP) applications, which depend on an appropriate vector representation of the texts. To better understand this problem and to find the best techniques for handling OOV words, in this study we present a comprehensive performance evaluation of deep learning models for representing OOV words. We performed an intrinsic evaluation using a benchmark dataset and an extrinsic evaluation using different NLP tasks: text categorization, named entity recognition, and part-of-speech tagging. Although the results indicated that the best technique for handling OOV words differs across tasks, Comick, a deep learning method that infers the embedding of an OOV word from its context and its morphological structure, obtained promising results.
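
To make the problem concrete, the sketch below (not the chapter's code) illustrates one widely used strategy for representing OOV words: FastText-style subword embeddings, which compose a vector for an unseen word from its character n-grams. It assumes the gensim library; the toy corpus and the misspelled word are illustrative assumptions only.

    # A minimal sketch, assuming gensim is installed: FastText-style subword
    # embeddings build a vector for an out-of-vocabulary word from its
    # character n-grams, so even an unseen misspelling gets a representation.
    from gensim.models import FastText

    corpus = [
        ["social", "networks", "let", "people", "communicate", "instantly"],
        ["spelling", "errors", "often", "produce", "unknown", "words"],
        ["vector", "representations", "support", "nlp", "applications"],
    ]

    # min_n/max_n set the character n-gram range used for subword vectors.
    model = FastText(sentences=corpus, vector_size=32, window=3,
                     min_count=1, min_n=3, max_n=5, epochs=50)

    oov = "communicatng"  # a misspelling never seen during training
    print(oov in model.wv.key_to_index)             # False: the word is OOV
    print(model.wv[oov].shape)                      # (32,): vector composed from n-grams
    print(model.wv.similarity(oov, "communicate"))  # typically high: shared n-grams
                                                    # place it near the correct word

Unlike this purely morphological composition, Comick also exploits the context in which the OOV word occurs when inferring its embedding, which is the property the evaluation above highlights.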


Footnotes

1. In this study, we use the term sample to denote an instance or a text document.
2. Transformers. Available at https://huggingface.co/transformers. Accessed on 2020/10/07 14:45:55.
3. HiCE. Available at https://github.com/acbull/HiCE. Accessed on 2020/10/07 14:45:55.
4. PyTorch GitHub. Available at https://bit.ly/2B7LS3U. Accessed on 2020/10/07 14:45:55.
5. HiCE. Available at https://github.com/acbull/HiCE. Accessed on 2020/10/07 14:45:55.
6. Keras. Available at https://keras.io/. Accessed on 2020/10/07 14:45:55.
7. TensorFlow. Available at https://www.tensorflow.org/. Accessed on 2020/10/07 14:45:55.
8. NER dataset with unusual and previously unseen entities in the context of emerging discussions.
9. NER dataset whose sentences contain technical terms from the biology domain.
10. POS tagging dataset of Twitter messages.
 
Metadata
Title: Deep Learning Models for Representing Out-of-Vocabulary Words
Authors: Johannes V. Lochter, Renato M. Silva, Tiago A. Almeida
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-61377-8_29
