1 Introduction
2 Related work
2.1 Distributional semantics
2.1.1 Constructing word-vector models
Mandera et al. (2017) argued that the counting step and its associated weighting scheme could be seen as a rough approximation of conditioning or associative processes and that the dimensionality reduction step could be considered an approximation of a data reduction process performed by the brain, although “it cannot be assumed that the brain stores a perfect representation of word-context pairs or runs complex matrix decomposition algorithms in the same way as digital computers do” (ibid.). Some examples of count models are Latent Semantic Analysis (LSA) (Deerwester et al. 1990), Hyperspace Analogue to Language (HAL) (Lund and Burgess 1996), Latent Dirichlet Allocation (LDA) (Blei et al. 2003), and Hellinger PCA (Lebret and Collobert 2014).
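The count-model pipeline described above can be sketched in a few lines: a co-occurrence counting step, a PPMI weighting step, and an SVD-based dimensionality reduction step. The corpus, window size, and dimensionality below are illustrative choices, not parameters from any of the cited models.

```python
# Minimal count-model sketch: counting, PPMI weighting, SVD reduction.
# The toy corpus and the 4-dimensional target space are illustrative.
import numpy as np

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# 1. Counting step: symmetric context window of size 2.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# 2. Weighting step: positive pointwise mutual information (PPMI).
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts * total) / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. Dimensionality reduction step: truncated SVD.
u, s, _ = np.linalg.svd(ppmi)
embeddings = u[:, :4] * s[:4]   # one 4-dimensional vector per word
print(embeddings.shape)
```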
For example, the contextualized vectors of a word can be averaged over a large corpus. Alternatively, the word vector parameters from the token embedding layer in a contextualized model can be used as static embeddings. However, their experiments showed that these methods do not necessarily outperform traditional static embedding models, which is why our research focused only on the latter.
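The first of the two strategies above can be sketched as follows: every contextualized occurrence of a word type is averaged into a single static vector. The `toy_encoder` below is a stand-in for a real contextual model (e.g. BERT), not an implementation of one; only the averaging step is the point.

```python
# Averaging contextualized token vectors into static embeddings.
# `toy_encoder` is a hypothetical stand-in for a contextual model.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
base = defaultdict(lambda: rng.normal(size=8))

def toy_encoder(sentence):
    # Stand-in: each token vector is its base vector nudged by context.
    ctx = np.mean([base[w] for w in sentence], axis=0)
    return [base[w] + 0.1 * ctx for w in sentence]

corpus = [["bank", "of", "the", "river"],
          ["money", "in", "the", "bank"]]

sums = defaultdict(lambda: np.zeros(8))
counts = defaultdict(int)
for sent in corpus:
    for word, vec in zip(sent, toy_encoder(sent)):
        sums[word] += vec
        counts[word] += 1

# Static embedding of a word = mean of its contextualized vectors.
static = {w: sums[w] / counts[w] for w in sums}
print(static["bank"].shape)
```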
2.1.2 Combining word vectors
The plethora of measures available in the literature suggests that no single method is capable of adequately quantifying the similarity/relatedness between words. Therefore, combining different approaches may provide a better result. (Niraula et al. 2015, p. 200)

Agirre et al. (2009) employed a hybrid model. On the one hand, they computed a personalized PageRank vector of probability distributions over the WordNet graph for each word. On the other hand, they constructed a corpus-based vector-space model from different approaches, i.e. bag of words, context window and syntactic dependency, where the method based on context windows provided the best results for similarity and the bag-of-words representation performed best for relatedness. Finally, they demonstrated that distributional similarities can perform as well as the knowledge-based approach, and that the combination of both models using a supervised learner can exceed the performance of either model alone.
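The supervised combination idea above can be sketched as a regression over per-pair scores: a knowledge-based score and a corpus-based score are fitted jointly against gold ratings. The scores and ratings below are illustrative numbers, and least squares stands in for whatever learner is used.

```python
# Supervised combination of two similarity measures (sketch).
# Features and gold ratings are illustrative, not real data.
import numpy as np

# Per word pair: (knowledge-based score, corpus-based score).
features = np.array([[0.9, 0.7], [0.2, 0.4], [0.8, 0.9], [0.1, 0.2]])
gold = np.array([0.85, 0.30, 0.90, 0.10])   # human ratings

# Fit combination weights (plus intercept) by least squares.
X = np.hstack([features, np.ones((len(features), 1))])
w, *_ = np.linalg.lstsq(X, gold, rcond=None)

combined = X @ w   # combined similarity scores
print(np.round(combined, 2))
```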
Mapping-based approaches [...] first train monolingual word representations independently on large monolingual corpora and then seek to learn a transformation matrix that maps representations in one language to the representations of the other language. They learn this transformation from word alignments or bilingual dictionaries. (Ruder et al. 2019, p. 581)

As the geometric constellation that holds between words is similar across languages, it is possible to transform the vector space of the source language into the vector space of the target language by employing a technique such as SVD or CCA to learn a linear projection between the languages.
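The SVD route mentioned above can be sketched with the orthogonal Procrustes solution: given dictionary-aligned vector pairs, the best orthogonal projection W minimizing ||XW − Y|| is obtained from the SVD of XᵀY. The vectors below are random stand-ins for real embeddings, with the target space constructed as a rotation of the source so the recovery can be checked.

```python
# Learning a linear projection between two embedding spaces (sketch).
# X, Y simulate dictionary-aligned source/target vectors.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))              # source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y = X @ Q                                 # target = rotated source (toy setup)

# W = argmin ||XW - Y||_F over orthogonal W: SVD of X^T Y.
u, _, vt = np.linalg.svd(X.T @ Y)
W = u @ vt

print(np.allclose(X @ W, Y, atol=1e-8))   # recovers the rotation
```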
2.1.3 Word embeddings in text classification
2.2 Word associations
2.2.1 Measuring word associations
2.2.2 Evaluating word associations
Word associations are not merely propositional but tap directly into the semantic information of the mental lexicon [...]. They are considered to be free from pragmatics or the intent to communicate some organized discourse, and thought to be simply the expression of thought. (De Deyne et al. 2015, p. 1646)

For example, yellow is strongly associated with banana, but the two words rarely co-occur in discourse because most bananas are yellow, so mentioning yellow together with banana is uninformative. In their experiments, they used several standard datasets of word similarity and relatedness to evaluate external language models constructed from text corpora and internal language models constructed from a semantic graph derived from the English Small World of Words (SWOW-EN; De Deyne et al. 2019), consisting of over 12,000 cue words and 300 associations for each cue resulting from judgements from over 90,000 participants. They showed, for example, that an internal language model grounded on Word2Vec embeddings substantially outperformed an external language model grounded on a random-walk semantic graph. However, the superior performance of this internal language model is unsurprising: the model was constructed from data derived from free-association tasks and then compared with human judgements on word associations, inevitably resulting in a biased evaluation.
2.3 Ensemble application of symbolic and sub-symbolic approaches to natural language processing
3 Proposed method
3.1 Combining word embeddings
3.2 Evaluating word associations
Cue | Target | Test | Reference |
---|---|---|---|
Jazz | Music | 0.564 | 0.367 |
Champagne | Bubble | 0.291 | 0.163 |
Adult | Responsible | 0.086 | 0.041 |
Cancer | Kill | 0.488 | 0.020 |
Athlete | Player | 0.103 | 0.014 |
Cue | Target | Score |
---|---|---|
Accident | Car | 0.358 |
Accident | Crash | 0.128 |
Accident | Pain | 0.020 |
Accident | Danger | 0.014 |
Knowing that the response “read” is produced by 43% of the participants to the cue BOOK does not tell us how strong this response is in any absolute sense; it tells us only that this response is stronger than “study”, which was produced by 5.5% of the participants. Unfortunately, free association norms, like relatedness ratings, provide only ordinal measures of strength of association but, as far as we know, there are no known measures of absolute strength.

Therefore, for a group-based evaluation, the RankDCG score of the model is calculated with Equation 31, where k is the number of groups in the test dataset Q, and \({RankDCG_{G_j}}\) is the RankDCG score corresponding to the group \({G_j}\), which should be part of Q.
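The aggregation in Equation 31 can be sketched as the mean of the k per-group scores over the test dataset Q. In the sketch below, a rank correlation stands in for the per-group \({RankDCG_{G_j}}\) computation (whose definition is given in the text); only the group-averaging step is the point, and all scores are illustrative.

```python
# Group-based evaluation: mean of per-group scores (Equation 31 sketch).
# Rank correlation is a stand-in for the per-group RankDCG score.
import numpy as np

def per_group_score(gold, predicted):
    # Correlation of the two rankings (no ties in this toy data).
    rg = np.argsort(np.argsort(gold))
    rp = np.argsort(np.argsort(predicted))
    return np.corrcoef(rg, rp)[0, 1]

# Q: k = 3 groups; each pairs gold association strengths with model scores.
groups = [
    (np.array([0.36, 0.13, 0.02, 0.01]), np.array([0.30, 0.20, 0.05, 0.01])),
    (np.array([0.50, 0.10, 0.05]),       np.array([0.45, 0.12, 0.03])),
    (np.array([0.40, 0.30]),             np.array([0.20, 0.35])),
]

# Equation 31: overall score = mean of the k per-group scores.
overall = np.mean([per_group_score(g, p) for g, p in groups])
print(round(overall, 3))   # prints 0.333
```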
3.3 Computational implementation
4 Experiments
Dataset | Original | Test | Coverage (%) |
---|---|---|---|
MC | 30 | 30 | 100 |
YP | 130 | 47 | 36.15 |
RG | 65 | 65 | 100 |
MTurk-287 | 287 | 76 | 26.48 |
WS-SIM | 203 | 192 | 94.58 |
WS-REL | 252 | 233 | 92.46 |
RW | 2034 | 276 | 13.57 |
WS-ALL | 353 | 328 | 92.92 |
MTurk-771 | 771 | 769 | 99.74 |
MEN | 3000 | 1592 | 53.07 |
FAN | 63,619 | 17,204 | 27.04 |
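The Coverage column above can be computed as the share of a dataset's word pairs whose words are all present in the model vocabulary; pairs with an out-of-vocabulary word are dropped from the test set. The vocabulary and pairs below are illustrative, not the actual experimental data.

```python
# Coverage sketch: fraction of word pairs fully inside the vocabulary.
# Vocabulary and pairs are illustrative.
vocab = {"jazz", "music", "champagne", "bubble", "athlete"}

pairs = [("jazz", "music"), ("champagne", "bubble"),
         ("athlete", "player"), ("cancer", "kill")]

covered = [p for p in pairs if all(w in vocab for w in p)]
coverage = 100 * len(covered) / len(pairs)
print(f"{coverage:.2f}%")   # prints 50.00%
```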
5 Results
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
MC | .84 (.4–.6) | .84 (.8–.2) | .80 (.2–.8) | .83 (.8–.2) | .85 (.9–.1) | .85 (.9–.1) |
YP | .77 (0–1) | .77 (0–1) | .77 (0–1) | .77 (0–1) | .77 (0–1) | .77 (0–1) |
RG | .81 (.4–.6) | .83 (.6–.4) | .84 (.2–.8) | .86 (.8–.2) | .86 (.8–.2) | .87 (.8–.2) |
MTurk-287 | .76 (.6–.4) | .78 (.7–.3) | .75 (.2–.8) | .75 (.7–.3) | .85 (1–0) | .85 (.9–.1) |
WS-SIM | .78 (.4–.6) | .80 (.7–.3) | .78 (.2–.8) | .79 (.6–.4) | .84 (.8–.2) | .85 (.8–.2) |
WS-REL | .62 (.7–.3) | .64 (.8–.2) | .64 (.3–.7) | .65 (.8–.2) | .73 (.9–.1) | .74 (.9–.1) |
RW | .56 (.5–.5) | .57 (.9–.1) | .52 (.2–.8) | .52 (.7–.3) | .60 (.9–.1) | .60 (.9–.1) |
WS-ALL | .70 (.4–.6) | .72 (.8–.2) | .70 (.2–.8) | .72 (.7–.3) | .78 (.8–.2) | .79 (.9–.1) |
MTurk-771 | .70 (.4–.6) | .72 (.8–.2) | .73 (.2–.8) | .75 (.8–.2) | .75 (.9–.1) | .77 (.8–.2) |
MEN | .78 (.5–.5) | .78 (.8–.2) | .76 (.3–.7) | .77 (.8–.2) | .84 (1–0) | .84 (.9–.1) |
FAN | .32 (.4–.6) | .32 (.8–.2) | .30 (.2–.8) | .30 (.7–.3) | .36 (.7–.3) | .36 (.8–.2) |
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
MC | .83 (.4–.6) | .83 (.8–.2) | .82 (.2–.8) | .83 (.8–.2) | .84 (.9–.1) | .84 (.9–.1) |
YP | .84 (0–1) | .84 (.2–.8) | .84 (.1–.9) | .84 (.2–.8) | .84 (0–1) | .85 (.3–.7) |
RG | .81 (.3–.7) | .82 (.7–.3) | .83 (.2–.8) | .84 (.7–.3) | .86 (.7–.3) | .86 (.8–.2) |
MTurk-287 | .72 (.5–.5) | .72 (.8–.2) | .74 (.3–.7) | .74 (.8–.2) | .80 (1–0) | .80 (1–0) |
WS-SIM | .78 (.4–.6) | .79 (.8–.2) | .78 (.2–.8) | .79 (.7–.3) | .84 (.8–.2) | .84 (.9–.1) |
WS-REL | .59 (.5–.5) | .60 (.8–.2) | .66 (.3–.7) | .67 (.8–.2) | .72 (.9–.1) | .72 (.9–.1) |
RW | .51 (.4–.6) | .53 (.8–.2) | .49 (.2–.8) | .49 (.7–.3) | .57 (.9–.1) | .58 (.9–.1) |
WS-ALL | .65 (.5–.5) | .67 (.8–.2) | .69 (.2–.8) | .70 (.8–.2) | .75 (.9–.1) | .75 (.9–.1) |
MTurk-771 | .69 (.4–.6) | .70 (.7–.3) | .73 (.2–.8) | .74 (.8–.2) | .73 (.8–.2) | .75 (.8–.2) |
MEN | .76 (.5–.5) | .76 (.9–.1) | .75 (.3–.7) | .76 (.8–.2) | .82 (1–0) | .82 (1–0) |
FAN | .34 (.4–.6) | .34 (.8–.2) | .31 (.2–.8) | .32 (.7–.3) | .38 (.7–.3) | .38 (.8–.2) |
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
MC | .97 (.3–.7) | .97 (.8–.2) | .95 (.2–.8) | .95 (.8–.2) | .98 (.9–.1) | .97 (.9–.1) |
YP | .91 (.2–.8) | .94 (.4–.6) | .91 (0–1) | .91 (0–1) | .92 (.5–.5) | .93 (.6–.4) |
RG | .94 (.1–.9) | .94 (.2–.8) | .94 (.1–.9) | .94 (.1–.9) | .94 (.6–.4) | .95 (.4–.6) |
MTurk-287 | .90 (.2–.8) | .91 (.3–.7) | .90 (.6–.4) | .90 (1–0) | .90 (.7–.3) | .90 (.8–.2) |
WS-SIM | .92 (.3–.7) | .92 (.8–.2) | .92 (.1–.9) | .93 (.6–.4) | .93 (.6–.4) | .93 (.6–.4) |
WS-REL | .88 (.3–.7) | .87 (.6–.4) | .89 (.3–.7) | .89 (.8–.2) | .89 (.6–.4) | .91 (.8–.2) |
RW | .84 (.3–.7) | .83 (.9–.1) | .84 (.6–.4) | .84 (1–0) | .85 (1–0) | .85 (1–0) |
WS-ALL | .91 (.3–.7) | .91 (.6–.4) | .92 (.2–.8) | .92 (.6–.4) | .92 (.8–.2) | .93 (.8–.2) |
MTurk-771 | .90 (.3–.7) | .90 (.5–.5) | .89 (.2–.8) | .90 (.6–.4) | .87 (.6–.4) | .90 (.6–.4) |
MEN | .88 (.4–.6) | .88 (.7–.3) | .87 (.2–.8) | .88 (.7–.3) | .90 (.8–.2) | .90 (.9–.1) |
FAN | .53 (.5–.5) | .53 (.8–.2) | .53 (.2–.8) | .53 (.8–.2) | .56 (.7–.3) | .56 (.9–.1) |
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
MC | .99 (.3–.7) | .98 (.4–.6) | .98 (.2–.8) | .98 (.7–.3) | .98 (.9–.1) | .98 (.9–.1) |
YP | .96 (.2–.8) | .98 (.4–.6) | .93 (.2–.8) | .92 (1–0) | .96 (.5–.5) | .98 (.6–.4) |
RG | .96 (.1–.9) | .96 (.2–.8) | .96 (.1–.9) | .96 (.5–.5) | .96 (.6–.4) | .97 (.4–.6) |
MTurk-287 | .92 (.2–.8) | .93 (.3–.7) | .93 (.6–.4) | .93 (1–0) | .92 (.3–.7) | .92 (.2–.8) |
WS-SIM | .95 (.1–.9) | .95 (.1–.9) | .95 (.1–.9) | .96 (.6–.4) | .96 (.8–.2) | .96 (.9–.1) |
WS-REL | .91 (.3–.7) | .91 (.7–.3) | .92 (.3–.7) | .93 (.8–.2) | .92 (.6–.4) | .94 (.8–.2) |
RW | .88 (.3–.7) | .86 (.9–.1) | .89 (.6–.4) | .88 (1–0) | .87 (1–0) | .87 (.9–.1) |
WS-ALL | .94 (.1–.9) | .94 (.6–.4) | .95 (.2–.8) | .95 (.6–.4) | .95 (.8–.2) | .95 (.9–.1) |
MTurk-771 | .88 (.3–.7) | .88 (.8–.2) | .88 (.2–.8) | .89 (.7–.3) | .86 (.7–.3) | .88 (.7–.3) |
MEN | .86 (.8–.2) | .87 (.7–.3) | .85 (.2–.8) | .86 (.7–.3) | .89 (.6–.4) | .89 (.8–.2) |
FAN | .58 (.4–.6) | .59 (.8–.2) | .58 (.2–.8) | .58 (.7–.3) | .62 (.7–.3) | .61 (.9–.1) |
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
FAN (groups) | .67 (.4–.6) | .68 (.8–.2) | .69 (.3–.7) | .69 (.9–.1) | .69 (1–0) | .70 (.9–.1) |
Dataset | Word2Vec WALE-1 | Word2Vec WALE-2 | GloVe WALE-1 | GloVe WALE-2 | FastText WALE-1 | FastText WALE-2
---|---|---|---|---|---|---
FAN (groups) | .56 (.4–.6) | .56 (.8–.2) | .58 (.4–.6) | .58 (.9–.1) | .59 (.9–.1) | .59 (.9–.1)
Dataset | Spearman | Pearson | RankDCG’ | RankDCG” |
---|---|---|---|---|
MC | .83 (.9–.1) | .84 (.9–.1) | .97 (.9–.1) | .98 (.6–.4) |
YP | .75 (.1–.9) | .84 (.2–.8) | .94 (.4–.6) | .98 (.4–.6) |
RG | .86 (.7–.3) | .86 (.8–.2) | .95 (.4–.6) | .97 (.4–.6) |
MTurk-287 | .85 (.8–.2) | .80 (.9–.1) | .94 (.5–.5) | .94 (.7–.3) |
WS-SIM | .85 (.8–.2) | .84 (.9–.1) | .94 (.8–.2) | .96 (.8–.2) |
WS-REL | .73 (.9–.1) | .72 (.9–.1) | .91 (.8–.2) | .94 (.8–.2) |
RW | .60 (.9–.1) | .57 (.9–.1) | .85 (1–0) | .87 (.9–.1) |
WS-ALL | .79 (.8–.2) | .75 (.9–.1) | .93 (.8–.2) | .96 (.8–.2) |
MTurk-771 | .77 (.9–.1) | .75 (.8–.2) | .90 (.6–.4) | .88 (.8–.2) |
MEN | .84 (.9–.1) | .82 (1–0) | .90 (.8–.2) | .89 (.8–.2) |
FAN | .36 (.8–.2) | .38 (.8–.2) | .56 (.9–.1) | .61 (.9–.1) |