Published in: International Journal of Machine Learning and Cybernetics 7/2020

17-02-2020 | Original Article

From static to dynamic word representations: a survey

Authors: Yuxuan Wang, Yutai Hou, Wanxiang Che, Ting Liu



Abstract

In the history of natural language processing (NLP), the representation of words has always been a significant research topic. In this survey, we provide a comprehensive typology of word representation models from a novel perspective: the development from static to dynamic embeddings can effectively address the polysemy problem, which has long been a great challenge in this field. The survey then covers the main evaluation metrics and applications of these word embeddings. We further discuss the development of word embeddings from static to dynamic in cross-lingual scenarios. Finally, we point out some open issues and directions for future work.
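The abstract's central contrast between static and dynamic embeddings can be made concrete with a toy sketch (not from the survey; all vectors and words are illustrative). A static model assigns one fixed vector per word type, so "bank" is identical in "river bank" and "money bank"; a dynamic model conditions each token's vector on its context, which is what contextualized models such as ELMo and BERT do with deep networks. Here the "dynamic" step is just a crude blend with the neighbouring words' vectors, enough to show why the same word type gets different representations in different contexts:

```python
# Toy static lookup table: one fixed vector per word type (illustrative values).
STATIC = {
    "bank":  [1.0, 0.0],
    "river": [0.0, 1.0],
    "money": [0.5, 0.5],
    "the":   [0.1, 0.1],
}

def static_embed(tokens):
    """Static embedding: the same vector for a word type in every context."""
    return [STATIC[t] for t in tokens]

def dynamic_embed(tokens, alpha=0.5):
    """Toy 'dynamic' embedding: blend each token's static vector with the
    mean of its neighbours' vectors, so identical word types diverge
    across contexts (a stand-in for a real contextual encoder)."""
    vecs = [STATIC[t] for t in tokens]
    out = []
    for i, v in enumerate(vecs):
        neighbours = [u for j, u in enumerate(vecs) if j != i]
        mean = [sum(c) / len(neighbours) for c in zip(*neighbours)]
        out.append([(1 - alpha) * a + alpha * b for a, b in zip(v, mean)])
    return out

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Static: "bank" cannot distinguish the two senses.
assert static_embed(s1)[2] == static_embed(s2)[2]

# Dynamic: once context is mixed in, the two occurrences of "bank" differ.
assert dynamic_embed(s1)[2] != dynamic_embed(s2)[2]
```

The polysemy problem the survey highlights is exactly the first assertion: a single static vector must average all senses of "bank", while any context-conditioned encoder, however simple, can separate them.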

Footnotes
1
These embeddings are contextualized, or dynamic, as opposed to the traditional static ones.
3
Please refer to [27] for a more detailed comparison and analysis of these distributional representation models.
4
We will use \({\varvec{C}}(w)\) to denote the distributed embedding of word w in the rest of this paper.
5
Please refer to the paper [87] for detailed results.
6
Please refer to the paper [90] for a detailed description of the pre-processing techniques.
7
Such functional tokens are also used by GPT, but are only introduced during fine-tuning.
9
Please refer to the paper [124] for implementation details.
Literature
1.
Almuhareb A (2006) Attributes in lexical acquisition. PhD thesis, University of Essex
2.
Artetxe M, Ruder S, Yogatama D (2019) On the cross-lingual transferability of monolingual representations. arXiv preprint arXiv:1910.11856
4.
Baroni M, Evert S, Lenci A (2008) Bridging the gap between semantic theory and computational simulations. In: Proc. of the ESSLLI workshop on distributional lexical semantics. FOLLI, Hamburg
5.
Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: a corpus-based semantic model based on properties and types. Cogn Sci 34:222–254
7.
Bengio Y, Ducharme R, Vincent P, Janvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
8.
Blei DM, Ng AY, Jordan MI, Lafferty J (2003) Latent Dirichlet allocation. J Mach Learn Res 3:2003
9.
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
10.
Bowman SR, Angeli G, Potts C, Manning CD (2015) A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326
11.
Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC (1992) Class-based n-gram models of natural language. Comput Linguist 18(4):467–479
12.
Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Int Res, pp 1–47
19.
Clark K, Luong MT, Khandelwal U, Manning CD, Le QV (2019b) BAM! Born-again multi-task networks for natural language understanding. In: Proc. of ACL, pp 5931–5937
20.
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
21.
Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V (2019) Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116
22.
Cui Y, Che W, Liu T, Qin B, Yang Z, Wang S, Hu G (2019) Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101
23.
Dagan I, Pereira F, Lee L (1994) Similarity-based estimation of word cooccurrence probabilities. In: Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, USA, pp 272–278. https://doi.org/10.3115/981732.981770
24.
Dai Z, Yang Z, Yang Y, Cohen WW, Carbonell J, Le QV, Salakhutdinov R (2019) Transformer-XL: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860
25.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
26.
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
27.
Dinu G, Lapata M (2010) Measuring distributional similarity in context. In: Proc. of EMNLP, pp 1162–1172
28.
Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1(3):211–218
30.
Fei-Fei L (2006) Knowledge transfer in learning to recognize visual object classes. In: International Conference on Development and Learning, pp 1–8
41.
Guo J, Che W, Wang H, Liu T (2014) Learning sense-specific word embeddings by exploiting bilingual resources. In: Proc. of COLING 2014, Dublin, Ireland, pp 497–507. https://www.aclweb.org/anthology/C14-1048
42.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2015) Cross-lingual dependency parsing based on distributed representations. In: Proc. of ACL and IJCNLP, pp 1234–1244
43.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2016a) A distributed representation-based framework for cross-lingual transfer parsing. J Artif Int Res 55(1):995–1023
44.
Guo J, Che W, Yarowsky D, Wang H, Liu T (2016b) A representation learning framework for multi-source transfer parsing. In: Proc. of AAAI'16, Phoenix, Arizona, pp 2734–2740
45.
Hermann KM, Blunsom P (2014) Multilingual models for compositional distributed semantics. In: Proc. of ACL, pp 58–68
48.
Hou Y, Zhou Z, Liu Y, Wang N, Che W, Liu H, Liu T (2019) Few-shot sequence labeling with label dependency transfer. arXiv preprint arXiv:1906.08711
50.
Huang E, Socher R, Manning C, Ng A (2012) Improving word representations via global context and multiple word prototypes. In: Proc. of ACL (Volume 1: Long Papers), pp 873–882
51.
Huang F, Yates A (2009) Distributional representations for handling sparsity in supervised sequence-labeling. In: Proc. of ACL and IJCNLP, pp 495–503
52.
Iacobacci I, Pilehvar MT, Navigli R (2016) Embeddings for word sense disambiguation: an evaluation study. In: Proc. of ACL, pp 897–907
53.
Jarmasz M, Szpakowicz S (2003) Roget's thesaurus and semantic similarity. In: Proc. of RANLP, pp 212–219
54.
Joshi M, Chen D, Liu Y, Weld DS, Zettlemoyer L, Levy O (2019) SpanBERT: improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529
55.
Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proc. of EACL, vol 2, short papers, Valencia, Spain, pp 427–431. https://www.aclweb.org/anthology/E17-2068
59.
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proc. of NAACL, pp 260–270
60.
Lample G, Conneau A, Ranzato M, Denoyer L, Jégou H (2018) Word translation without parallel data. In: Proc. of ICLR
61.
Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942
62.
Landauer TK, Dumais ST (1997) A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev, pp 211–240
63.
Lazaridou A, Dinu G, Baroni M (2015) Hubness and pollution: delving into cross-space mapping for zero-shot learning. In: Proc. of ACL and IJCNLP, pp 270–280
65.
Liu W, Zhou P, Zhao Z, Wang Z, Ju Q, Deng H, Wang P (2019b) K-BERT: enabling language representation with knowledge graph. arXiv preprint arXiv:1909.07606
70.
McCallum A, Freitag D, Pereira FCN (2000) Maximum entropy Markov models for information extraction and segmentation. In: Proc. of ICML '00. Morgan Kaufmann, San Francisco, CA, USA, pp 591–598
72.
McRae K, Ferretti TR, Amyote L (1997) Thematic roles as verb-specific concepts. Lang Cogn Process 12(2–3):137–176
74.
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. ICLR Workshop
75.
Mikolov T, Le QV, Sutskever I (2013b) Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168
76.
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013c) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp 3111–3119
77.
Mikolov T, Yih Wt, Zweig G (2013d) Linguistic regularities in continuous space word representations. In: Proc. of NAACL-HLT, Atlanta, Georgia, pp 746–751. https://www.aclweb.org/anthology/N13-1090
78.
Miller GA (1995) WordNet: a lexical database for English. Commun ACM, pp 39–41
79.
Mnih A, Hinton G (2007) Three new graphical models for statistical language modelling. In: Proc. of ICML '07, Corvallis, Oregon, USA, pp 641–648. https://doi.org/10.1145/1273496.1273577
80.
Mnih A, Hinton GE (2009) A scalable hierarchical distributed language model. Adv Neural Inf Process Syst 21:1081–1088
85.
Padó S, Lapata M (2007) Dependency-based construction of semantic space models. Comput Linguist 33(2):161–199
91.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8)
92.
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683
93.
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. In: Proc. of EMNLP, pp 2383–2392
95.
Reisinger J, Mooney RJ (2010) Multi-prototype vector-space models of word meaning. In: Proc. of HLT-NAACL, pp 109–117
98.
Schuster T, Ram O, Barzilay R, Globerson A (2019) Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing. In: Proc. of NAACL-HLT, vol 1 (Long and Short Papers), Minneapolis, Minnesota, pp 1599–1613. https://doi.org/10.18653/v1/N19-1162
100.
Smith SL, Turban DHP, Hamblin S, Hammerla NY (2017) Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In: Proc. of ICLR
101.
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems, pp 4077–4087
102.
Song K, Tan X, Qin T, Lu J, Liu T (2019) MASS: masked sequence to sequence pre-training for language generation. In: Proc. of ICML, pp 5926–5936
103.
Su W, Zhu X, Cao Y, Li B, Lu L, Wei F, Dai J (2019) VL-BERT: pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530
104.
Sun C, Myers A, Vondrick C, Murphy K, Schmid C (2019a) VideoBERT: a joint model for video and language representation learning. arXiv preprint arXiv:1904.01766
105.
Sun Y, Wang S, Li Y, Feng S, Chen X, Zhang H, Tian X, Zhu D, Tian H, Wu H (2019b) ERNIE: enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223
106.
Sun Y, Wang S, Li Y, Feng S, Tian H, Wu H, Wang H (2019c) ERNIE 2.0: a continual pre-training framework for language understanding. arXiv preprint arXiv:1907.12412
107.
Tenney I, Das D, Pavlick E (2019) BERT rediscovers the classical NLP pipeline. In: Proc. of ACL, Florence, Italy, pp 4593–4601. https://doi.org/10.18653/v1/P19-1452
108.
Tian F, Dai H, Bian J, Gao B, Zhang R, Chen E, Liu TY (2014) A probabilistic model for learning multi-prototype word embeddings. In: Proc. of COLING 2014, Dublin, Ireland, pp 151–160. https://www.aclweb.org/anthology/C14-1016
110.
Turian J, Ratinov LA, Bengio Y (2010) Word representations: a simple and general method for semi-supervised learning. In: Proc. of ACL, Uppsala, Sweden, pp 384–394. https://www.aclweb.org/anthology/P10-1040
111.
Turney PD (2001a) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: De Raedt L, Flach P (eds) Machine learning: ECML 2001. Springer, Berlin Heidelberg, pp 491–502
112.
Turney PD (2001b) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: De Raedt L, Flach P (eds) Machine learning: ECML 2001. Springer, Berlin Heidelberg, pp 491–502
119.
Wang P, Qian Y, Soong FK, He L, Zhao H (2015) Part-of-speech tagging with bidirectional long short-term memory recurrent neural network. arXiv preprint arXiv:1510.06168
121.
Williams A, Nangia N, Bowman SR (2017) A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426
122.
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et al (2016) Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144
124.
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV (2019) XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237
125.
Liu Y (2019) Sentence-level language analysis with contextualized word embeddings. PhD thesis, Harbin Institute of Technology
126.
Zhang H, Gong Y, Yan Y, Duan N, Xu J, Wang J, Gong M, Zhou M (2019a) Pretraining-based natural language generation for text summarization. arXiv preprint arXiv:1902.09243
127.
Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2019b) BERTScore: evaluating text generation with BERT. arXiv preprint arXiv:1904.09675
130.
Zou WY, Socher R, Cer D, Manning CD (2013) Bilingual word embeddings for phrase-based machine translation. In: Proc. of EMNLP, Seattle, Washington, USA, pp 1393–1398. https://www.aclweb.org/anthology/D13-1141
Metadata
Title
From static to dynamic word representations: a survey
Authors
Yuxuan Wang
Yutai Hou
Wanxiang Che
Ting Liu
Publication date
17-02-2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 7/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01069-8
