1 Introduction
2 The roots and consequences of bias against the poor
3 Detection of bias against the poor: materials and methods
3.1 Materials
3.1.1 Target terms and attributes
3.1.2 Word coding/embeddings
3.1.3 Pre-trained embeddings
- Google News word2vec pre-trained embedding: The Google News 300 word embedding is a pre-trained model that represents each word as a vector of 300 features (coordinates) in a 300-dimensional space. The model was trained on a Google News corpus of about 100 billion words, yielding vector representations for more than 3 million words and phrases. The underlying algorithm was proposed by Mikolov et al. (2013). The resulting model file is about 1.3 GB.
- Wikipedia GloVe pre-trained embedding: The Wikipedia GloVe word embedding is a pre-trained word representation model built with the GloVe technique, which is based on the global word–word co-occurrence matrix. The training corpus combines about 2 billion words of text from roughly 4.4 million English Wikipedia pages consolidated up to 2014 with the Gigaword 5 dataset, a comprehensive collection of news text acquired over several years by the Linguistic Data Consortium (LDC), which contributes about 4 billion words. In total, the model was trained on 6 billion tokens, all lowercased, and covers a vocabulary of 400 thousand words. Four versions are available, with vector dimensions of 50, 100, 200 and 300. The download size of the resulting model is 822 MB.
- Twitter GloVe pre-trained embedding: The Twitter GloVe word embedding is a pre-trained word representation model built with the GloVe technique, based on the global word–word co-occurrence matrix. The training corpus is a set of 2 billion English-language tweets extracted from the Twitter social network, amounting to 27 billion tokens. The model covers a vocabulary of 1.2 million words, all lowercased, and is available in 25-, 50-, 100- and 200-dimensional versions. The download size of the resulting model is 1.42 GB. A minimal sketch of how these three models could be loaded is shown after this list.
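The snippet below is a minimal loading sketch, assuming the gensim library and its public model registry (neither is named in the original text); the registry identifiers correspond to the three embeddings described above.

```python
# Hedged sketch: loading the three pre-trained embeddings via gensim's downloader.
# The registry names are gensim conventions, assumed here for illustration only.
import gensim.downloader as api

word2vec_news = api.load("word2vec-google-news-300")  # Google News, word2vec, 300-d
glove_wiki = api.load("glove-wiki-gigaword-300")      # Wikipedia 2014 + Gigaword 5, 300-d
glove_twitter = api.load("glove-twitter-200")         # Twitter, GloVe, 200-d

# Each model maps a word to its vector, e.g. a 300-dimensional array for "poor".
print(word2vec_news["poor"].shape)
```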
3.2 Methods
3.2.1 Semantic analysis of words based on vector distances
3.2.2 Cosine distance between words
3.2.3 Calculation of the dot product between words
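As an illustration of the quantities named in Sections 3.2.2 and 3.2.3, a minimal NumPy sketch (a generic formulation, not the authors' exact implementation) of the dot product, the cosine similarity and the corresponding distances is:

```python
import numpy as np

def dot_product(u, v):
    """Dot product between two word vectors (Section 3.2.3)."""
    return float(np.dot(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    """Cosine distance (Section 3.2.2): 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

def angular_distance(u, v):
    """Angle between the vectors in radians, i.e. arccos of the cosine similarity."""
    return float(np.arccos(np.clip(cosine_similarity(u, v), -1.0, 1.0)))
```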
3.2.4 Semantic relations between target and attribute words based on cosine distance
3.2.5 Identifying logical relationships (analogies) in the same context (embedding)
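For Sections 3.2.4 and 3.2.5, a hedged sketch of how attribute words could be related to the target terms "poor" and "rich", and how an analogy could be probed inside the same embedding, might look as follows (the model choice, attribute subset and analogy words are illustrative assumptions; similarity and most_similar are standard gensim KeyedVectors methods):

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # assumed embedding, for illustration

# Section 3.2.4: cosine proximity of attribute words to the target terms.
attributes = ["substandard", "dreadful", "sympathy", "politeness"]  # illustrative subset
for attr in attributes:
    sim_poor = model.similarity(attr, "poor")
    sim_rich = model.similarity(attr, "rich")
    print(f"{attr}: poor={sim_poor:.3f}, rich={sim_rich:.3f}, "
          f"closer_to_poor={int(sim_poor > sim_rich)}")

# Section 3.2.5: analogy of the form "rich is to mansion as poor is to ?",
# resolved by vector arithmetic within the same embedding.
print(model.most_similar(positive=["poor", "mansion"], negative=["rich"], topn=5))
```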
4 Results and discussion
Negative attributes | Proximity to “poor” (cosine) | Proximity to “rich” (cosine) | Relative value (1 = attribute closer to “poor”) | Relative distance to “poor” (radians) | Relative distance to “rich” (radians) | Aporophobia bias indicator (ABI) |
---|---|---|---|---|---|---|
Substandard | 0.518799 | 0.065894 | 1 | 1.025350 | 1.504854 | 0.479503 |
Dreadful | 0.496364 | 0.108623 | 1 | 1.051390 | 1.461958 | 0.410568 |
Mediocre | 0.525181 | 0.157387 | 1 | 1.017868 | 1.412751 | 0.394883 |
Inferior | 0.442338 | 0.154269 | 1 | 1.112590 | 1.415908 | 0.303316 |
Indifference | 0.295424 | 0.049471 | 1 | 1.270896 | 1.521304 | 0.250408 |
Displeasure | 0.181486 | − 0.043921 | 1 | 1.388298 | 1.614732 | 0.226433 |
Humiliating | 0.236273 | 0.013788 | 1 | 1.332267 | 1.557007 | 0.224740 |
Abhorrent | 0.177211 | − 0.034837 | 1 | 1.392643 | 1.605641 | 0.212997 |
Disgust | 0.175618 | − 0.033866 | 1 | 1.394262 | 1.604669 | 0.210406 |
Disrespect | 0.178972 | − 0.002676 | 1 | 1.390853 | 1.573472 | 0.182618 |
Disregard | 0.165259 | − 0.011534 | 1 | 1.404775 | 1.582331 | 0.177555 |
Fear | 0.174980 | 0.019890 | 1 | 1.394910 | 1.550904 | 0.155994 |
Irritation | 0.152907 | 0.011789 | 1 | 1.417287 | 1.559006 | 0.141719 |
Hostile | 0.185884 | 0.045462 | 1 | 1.383824 | 1.525318 | 0.141493 |
Rudeness | 0.176455 | 0.038615 | 1 | 1.393411 | 1.532171 | 0.138759 |
Annoyance | 0.110991 | − 0.026991 | 1 | 1.459575 | 1.597791 | 0.138215 |
Disgusting | 0.259967 | 0.133528 | 1 | 1.307807 | 1.436867 | 0.129059 |
Hostility | 0.132259 | 0.040978 | 1 | 1.438148 | 1.529806 | 0.091657 |
Rejection | 0.100165 | 0.037907 | 1 | 1.470462 | 1.532879 | 0.062416 |
Contempt | 0.091754 | 0.034602 | 1 | 1.478912 | 1.536186 | 0.057273 |
Hate | 0.166657 | 0.111664 | 1 | 1.403357 | 1.458898 | 0.055540 |
Insult | 0.150543 | 0.107800 | 1 | 1.419678 | 1.462786 | 0.043107 |
Aversion | 0.169729 | 0.132875 | 1 | 1.400240 | 1.437526 | 0.037285 |
hate act | 0.143041 | 0.111930 | 1 | 1.427262 | 1.458631 | 0.031369 |
hate speech | 0.154789 | 0.134926 | 1 | 1.415381 | 1.435456 | 0.020075 |
Antipathy | 0.082810 | 0.075422 | 1 | 1.487891 | 1.495302 | 0.007411 |
Favourable attributes | Proximity to “poor” (cosine) | Proximity to “rich” (cosine) | Relative value (1 = attribute closer to “poor”) | Relative distance to “poor” (radians) | Relative distance to “rich” (radians) | Aporophobia bias indicator (ABI) |
---|---|---|---|---|---|---|
Sympathy | 0.169531 | 0.018321 | 1 | 1.400441 | 1.552474 | 0.152032 |
Politeness | 0.132293 | 0.068439 | 1 | 1.438114 | 1.502303 | 0.064189 |
Pleasing | 0.227241 | 0.174897 | 1 | 1.341551 | 1.394995 | 0.053443 |
Goodwill | 0.088890 | 0.039868 | 1 | 1.481787 | 1.530918 | 0.049129 |
Cordiality | 0.043623 | 0.007792 | 1 | 1.527159 | 1.563004 | 0.035845 |
Happy | 0.212202 | 0.180576 | 1 | 1.356968 | 1.389223 | 0.032255 |
Fearless | 0.100959 | 0.069186 | 1 | 1.469664 | 1.501554 | 0.031889 |
Pride | 0.104457 | 0.088019 | 1 | 1.466148 | 1.482663 | 0.016514 |
Friendliness | 0.178084 | 0.175157 | 1 | 1.391756 | 1.394731 | 0.002974 |
Courageous | 1 | 1 | 0 | 0 | 0 | 0 |
Self-assurance | 1 | 1 | 0 | 0 | 0 | 0 |
Carelessness | 1 | 1 | 0 | 0 | 0 | 0 |
Defence | 1 | 1 | 0 | 0 | 0 | 0 |
Affection | 0.100301 | 0.10674 | 0 | 1.470325 | 1.463852 | − 0.006474 |
Liked | 0.125296 | 0.135883 | 0 | 1.445169 | 1.434491 | − 0.010678 |
Delight | 0.033640 | 0.045317 | 0 | 1.537149 | 1.525463 | − 0.011687 |
Desire | 0.085015 | 0.096916 | 0 | 1.485677 | 1.473728 | − 0.011949 |
Pleasant | 0.168783 | 0.187770 | 0 | 1.401201 | 1.381905 | − 0.019297 |
acceptation | 0.049464 | 0.099845 | 0 | 1.521311 | 1.470784 | − 0.050527 |
appreciation | 0.005268 | 0.075830 | 0 | 1.565527 | 1.494893 | − 0.070635 |
independence | 0.067198 | 0.141933 | 0 | 1.503546 | 1.428382 | − 0.075165 |
Love | 0.107482 | 0.184401 | 0 | 1.463105 | 1.385334 | − 0.077772 |
Delightful | 0.131124 | 0.215119 | 0 | 1.439293 | 1.353983 | − 0.085311 |
Flattery | 0.054658 | 0.140086 | 0 | 1.516110 | 1.430247 | − 0.085864 |
Friendly | 0.184168 | 0.271432 | 0 | 1.385570 | 1.295916 | − 0.089655 |
Endorsement | − 0.049720 | 0.057279 | 0 | 1.620537 | 1.513486 | − 0.107052 |
Taste | 0.147377 | 0.261997 | 0 | 1.422879 | 1.305705 | − 0.117175 |
Pleasure | − 0.005007 | 0.120311 | 0 | 1.575803 | 1.450193 | − 0.125610 |
Attractive | 0.146302 | 0.282672 | 0 | 1.423967 | 1.284217 | − 0.139750 |
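Reading the columns of the two tables, the distances in radians are the arccosine of the cosine proximities, and the aporophobia bias indicator (ABI) appears to be the angular distance to “rich” minus the angular distance to “poor”, so that positive values flag attributes sitting closer to “poor”. A minimal sketch that recomputes one table row under this reading (the model choice and helper name are illustrative assumptions, not the authors' code):

```python
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # assumed embedding, for illustration

def abi_row(attribute, model, targets=("poor", "rich")):
    """Recompute the table columns above for one attribute word."""
    sim_poor = float(model.similarity(attribute, targets[0]))
    sim_rich = float(model.similarity(attribute, targets[1]))
    ang_poor = float(np.arccos(np.clip(sim_poor, -1.0, 1.0)))  # distance in radians
    ang_rich = float(np.arccos(np.clip(sim_rich, -1.0, 1.0)))
    return {
        "proximity_poor": sim_poor,
        "proximity_rich": sim_rich,
        "closer_to_poor": int(sim_poor > sim_rich),
        "distance_poor_rad": ang_poor,
        "distance_rich_rad": ang_rich,
        "ABI": ang_rich - ang_poor,  # positive => attribute closer to "poor"
    }

print(abi_row("substandard", model))
```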