Published in: International Journal of Machine Learning and Cybernetics 7/2020

17-02-2020 | Original Article

From static to dynamic word representations: a survey

Authors: Yuxuan Wang, Yutai Hou, Wanxiang Che, Ting Liu



Abstract

In the history of natural language processing (NLP), the representation of words has always been a significant research topic. In this survey, we provide a comprehensive typology of word representation models from a novel perspective: the development from static to dynamic embeddings can effectively address the polysemy problem, which has long been a great challenge in this field. The survey then covers the main evaluation metrics and applications of these word embeddings. We further discuss the development of word embeddings from static to dynamic in cross-lingual scenarios. Finally, we point out some open issues and directions for future work.
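The abstract's central contrast between static and dynamic embeddings can be made concrete with a toy sketch (not from the survey; all vectors and words are illustrative). A static model assigns one fixed vector per word type, so "bank" is identical in "river bank" and "money bank"; a dynamic model conditions each token's vector on its context, which is what contextualized models such as ELMo and BERT do with deep networks. Here the "dynamic" step is just a crude blend with the neighbouring words' vectors, enough to show why the same word type gets different representations in different contexts:

```python
# Toy static lookup table: one fixed vector per word type (illustrative values).
STATIC = {
    "bank":  [1.0, 0.0],
    "river": [0.0, 1.0],
    "money": [0.5, 0.5],
    "the":   [0.1, 0.1],
}

def static_embed(tokens):
    """Static embedding: the same vector for a word type in every context."""
    return [STATIC[t] for t in tokens]

def dynamic_embed(tokens, alpha=0.5):
    """Toy 'dynamic' embedding: blend each token's static vector with the
    mean of its neighbours' vectors, so identical word types diverge
    across contexts (a stand-in for a real contextual encoder)."""
    vecs = [STATIC[t] for t in tokens]
    out = []
    for i, v in enumerate(vecs):
        neighbours = [u for j, u in enumerate(vecs) if j != i]
        mean = [sum(c) / len(neighbours) for c in zip(*neighbours)]
        out.append([(1 - alpha) * a + alpha * b for a, b in zip(v, mean)])
    return out

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Static: "bank" cannot distinguish the two senses.
assert static_embed(s1)[2] == static_embed(s2)[2]

# Dynamic: once context is mixed in, the two occurrences of "bank" differ.
assert dynamic_embed(s1)[2] != dynamic_embed(s2)[2]
```

The polysemy problem the survey highlights is exactly the first assertion: a single static vector must average all senses of "bank", while any context-conditioned encoder, however simple, can separate them.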

Footnotes
1
These embeddings are contextualized, or dynamic, as opposed to the traditional static ones.
3
Please refer to [27] for a more detailed comparison and analysis of these distributional representation models.
4
We will use \({\varvec{C}}(w)\) to denote the distributed embedding of word w in the rest of this paper.
5
Please refer to the paper [87] for detailed results.
6
Please refer to the paper [90] for a detailed description of the pre-processing techniques.
7
Such functional tokens are also used by GPT, but are only introduced during fine-tuning.
9
Please refer to the paper [124] for implementation details.
Literature
1.
Almuhareb A (2006) Attributes in lexical acquisition. PhD thesis, University of Essex
2.
Artetxe M, Ruder S, Yogatama D (2019) On the cross-lingual transferability of monolingual representations. arXiv preprint arXiv:1910.11856
4.
Baroni M, Evert S, Lenci A (2008) Bridging the gap between semantic theory and computational simulations. In: Proc. of the ESSLLI workshop on distributional lexical semantics. FOLLI, Hamburg
5.
Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: a corpus-based semantic model based on properties and types. Cogn Sci 34:222–254
7.
Bengio Y, Ducharme R, Vincent P, Janvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
8.
Blei DM, Ng AY, Jordan MI, Lafferty J (2003) Latent Dirichlet allocation. J Mach Learn Res 3:2003
9.
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
10.
Bowman SR, Angeli G, Potts C, Manning CD (2015) A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326
11.
Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC (1992) Class-based n-gram models of natural language. Comput Linguist 18(4):467–479
12.
Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Int Res, pp 1–47
19.
Clark K, Luong MT, Khandelwal U, Manning CD, Le QV (2019b) BAM! Born-again multi-task networks for natural language understanding. In: Proc. of ACL, pp 5931–5937
20.
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
21.
Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V (2019) Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116
22.
Cui Y, Che W, Liu T, Qin B, Yang Z, Wang S, Hu G (2019) Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101
23.
Dagan I, Pereira F, Lee L (1994) Similarity-based estimation of word cooccurrence probabilities. In: Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, USA, pp 272–278. https://doi.org/10.3115/981732.981770
24.
Dai Z, Yang Z, Yang Y, Cohen WW, Carbonell J, Le QV, Salakhutdinov R (2019) Transformer-XL: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860
25.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
26.
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
27.
Dinu G, Lapata M (2010) Measuring distributional similarity in context. In: Proc. of EMNLP, pp 1162–1172
28.
Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1(3):211–218
30.
Fei-Fei L (2006) Knowledge transfer in learning to recognize visual object classes. In: International Conference on Development and Learning, pp 1–8
41.
Guo J, Che W, Wang H, Liu T (2014) Learning sense-specific word embeddings by exploiting bilingual resources. In: Proc. of COLING 2014, Dublin, Ireland, pp 497–507. https://www.aclweb.org/anthology/C14-1048
42.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2015) Cross-lingual dependency parsing based on distributed representations. In: Proc. of ACL and IJCNLP, pp 1234–1244
43.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2016a) A distributed representation-based framework for cross-lingual transfer parsing. J Artif Int Res 55(1):995–1023
44.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2016b) A representation learning framework for multi-source transfer parsing. In: Proc. of AAAI'16, Phoenix, Arizona, pp 2734–2740
45.
Hermann KM, Blunsom P (2014) Multilingual models for compositional distributed semantics. In: Proc. of ACL, pp 58–68
48.
Hou Y, Zhou Z, Liu Y, Wang N, Che W, Liu H, Liu T (2019) Few-shot sequence labeling with label dependency transfer. arXiv preprint arXiv:1906.08711
50.
Huang E, Socher R, Manning C, Ng A (2012) Improving word representations via global context and multiple word prototypes. In: Proc. of ACL (Volume 1: Long Papers), pp 873–882
51.
Huang F, Yates A (2009) Distributional representations for handling sparsity in supervised sequence-labeling. In: Proc. of ACL and IJCNLP, pp 495–503
52.
Iacobacci I, Pilehvar MT, Navigli R (2016) Embeddings for word sense disambiguation: an evaluation study. In: Proc. of ACL, pp 897–907
53.
Jarmasz M, Szpakowicz S (2003) Roget's thesaurus and semantic similarity. In: Proc. of RANLP, pp 212–219
54.
Joshi M, Chen D, Liu Y, Weld DS, Zettlemoyer L, Levy O (2019) SpanBERT: improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529
55.
Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proc. of EACL, vol 2, short papers, Valencia, Spain, pp 427–431. https://www.aclweb.org/anthology/E17-2068
59.
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proc. of NAACL, pp 260–270
60.
Lample G, Conneau A, Ranzato M, Denoyer L, Jégou H (2018) Word translation without parallel data. In: Proc. of ICLR
61.
Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942
62.
Landauer TK, Dumais ST (1997) A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev, pp 211–240
63.
Lazaridou A, Dinu G, Baroni M (2015) Hubness and pollution: delving into cross-space mapping for zero-shot learning. In: Proc. of ACL and IJCNLP, pp 270–280
65.
Liu W, Zhou P, Zhao Z, Wang Z, Ju Q, Deng H, Wang P (2019b) K-BERT: enabling language representation with knowledge graph. arXiv preprint arXiv:1909.07606
70.
McCallum A, Freitag D, Pereira FCN (2000) Maximum entropy Markov models for information extraction and segmentation. In: Proc. of ICML '00. Morgan Kaufmann, San Francisco, CA, USA, pp 591–598
72.
McRae K, Ferretti TR, Amyote L (1997) Thematic roles as verb-specific concepts. Lang Cogn Process 12(2–3):137–176
74.
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. ICLR Workshop
75.
Mikolov T, Le QV, Sutskever I (2013b) Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168
76.
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013c) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp 3111–3119
77.
Mikolov T, Yih Wt, Zweig G (2013d) Linguistic regularities in continuous space word representations. In: Proc. of NAACL-HLT, Atlanta, Georgia, pp 746–751. https://www.aclweb.org/anthology/N13-1090
78.
Miller GA (1995) WordNet: a lexical database for English. Commun ACM, pp 39–41
79.
Mnih A, Hinton G (2007) Three new graphical models for statistical language modelling. In: Proc. of ICML '07, Corvallis, Oregon, USA, pp 641–648. https://doi.org/10.1145/1273496.1273577
80.
Mnih A, Hinton GE (2009) A scalable hierarchical distributed language model. Adv Neural Inf Process Syst 21:1081–1088
85.
Padó S, Lapata M (2007) Dependency-based construction of semantic space models. Comput Linguist 33(2):161–199
91.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8)
92.
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683
93.
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. In: Proc. of EMNLP, pp 2383–2392
95.
Reisinger J, Mooney RJ (2010) Multi-prototype vector-space models of word meaning. In: Proc. of HLT-NAACL, pp 109–117
98.
Schuster T, Ram O, Barzilay R, Globerson A (2019) Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing. In: Proc. of NAACL-HLT, vol 1 (Long and Short Papers), Minneapolis, Minnesota, pp 1599–1613. https://doi.org/10.18653/v1/N19-1162
100.
Smith SL, Turban DHP, Hamblin S, Hammerla NY (2017) Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In: Proc. of ICLR
101.
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems, pp 4077–4087
102.
Song K, Tan X, Qin T, Lu J, Liu T (2019) MASS: masked sequence to sequence pre-training for language generation. In: Proc. of ICML, pp 5926–5936
103.
Su W, Zhu X, Cao Y, Li B, Lu L, Wei F, Dai J (2019) VL-BERT: pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530
104.
Sun C, Myers A, Vondrick C, Murphy K, Schmid C (2019a) VideoBERT: a joint model for video and language representation learning. arXiv preprint arXiv:1904.01766
105.
Sun Y, Wang S, Li Y, Feng S, Chen X, Zhang H, Tian X, Zhu D, Tian H, Wu H (2019b) ERNIE: enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223
106.
Sun Y, Wang S, Li Y, Feng S, Tian H, Wu H, Wang H (2019c) ERNIE 2.0: a continual pre-training framework for language understanding. arXiv preprint arXiv:1907.12412
107.
Tenney I, Das D, Pavlick E (2019) BERT rediscovers the classical NLP pipeline. In: Proc. of ACL, Florence, Italy, pp 4593–4601. https://doi.org/10.18653/v1/P19-1452
108.
Tian F, Dai H, Bian J, Gao B, Zhang R, Chen E, Liu TY (2014) A probabilistic model for learning multi-prototype word embeddings. In: Proc. of COLING 2014, Dublin, Ireland, pp 151–160. https://www.aclweb.org/anthology/C14-1016
110.
Turian J, Ratinov LA, Bengio Y (2010) Word representations: a simple and general method for semi-supervised learning. In: Proc. of ACL, Uppsala, Sweden, pp 384–394. https://www.aclweb.org/anthology/P10-1040
111.
Turney PD (2001a) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: De Raedt L, Flach P (eds) Machine learning: ECML 2001. Springer, Berlin Heidelberg, pp 491–502
112.
Turney PD (2001b) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: De Raedt L, Flach P (eds) Machine learning: ECML 2001. Springer, Berlin Heidelberg, pp 491–502
119.
Wang P, Qian Y, Soong FK, He L, Zhao H (2015) Part-of-speech tagging with bidirectional long short-term memory recurrent neural network. arXiv preprint arXiv:1510.06168
121.
Williams A, Nangia N, Bowman SR (2017) A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426
122.
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et al (2016) Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144
124.
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV (2019) XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237
125.
Liu Y (2019) Sentence-level language analysis with contextualized word embeddings. PhD thesis, Harbin Institute of Technology
126.
Zhang H, Gong Y, Yan Y, Duan N, Xu J, Wang J, Gong M, Zhou M (2019a) Pretraining-based natural language generation for text summarization. arXiv preprint arXiv:1902.09243
127.
Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2019b) BERTScore: evaluating text generation with BERT. arXiv preprint arXiv:1904.09675
130.
Zou WY, Socher R, Cer D, Manning CD (2013) Bilingual word embeddings for phrase-based machine translation. In: Proc. of EMNLP, Seattle, Washington, USA, pp 1393–1398. https://www.aclweb.org/anthology/D13-1141
Metadata
Title
From static to dynamic word representations: a survey
Authors
Yuxuan Wang
Yutai Hou
Wanxiang Che
Ting Liu
Publication date
17-02-2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 7/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01069-8
