Political Behavior

23-03-2023 | Original Paper

Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys

Authors: Yamil Ricardo Velez, Ángel Saavedra Cisneros, Jose Gomez

Published in: Political Behavior

Abstract

Accurate estimation of public opinion in diverse countries requires survey questions that operate similarly across languages. We leverage 3,026 bilingual Latinos across four studies to investigate how language-of-administration affects the measurement properties of political science scales. We randomly assign bilinguals to English and Spanish survey forms and uncover item bias, also known as differential item functioning (DIF), across items measuring identity, attitudes, and political knowledge. We examine whether translation errors are the culprit, yet find that perceived translation quality does not predict the magnitude of DIF and questions with minimal text exhibit item bias. Our findings suggest otherwise similar Latinos may be estimated to possess different levels of knowledge, attitude strength, and group identification due to survey language selection. We discuss the implications of our findings on the design of multilingual surveys and the study of racial and ethnic politics more broadly.

TRAPD is an acronym for translation, review, adjudication, pre-testing, and documentation. This method has been used in developing cross-national surveys such as the European Social Survey (Jowell et al., 2007).

To our knowledge, political scientists have yet to use the bilingual experimental design to assess if multi-item scales measuring political attitudes and knowledge function similarly across languages (i.e., have similar item properties).

Students were recruited to participate in a study on Latino/a politics via instructor e-mails. A $50 Amazon.com gift card lottery was used to incentivize participation.

CloudResearch has been shown to recover treatment effects observed in other studies involving Latino samples (Velez et al., 2022).

Pre-analysis plans for these two studies can be found at https://osf.io/39j2d. In “Appendix G”, we reproduce the pre-analysis plans and describe deviations.

Upon being assigned to a language form, respondents were also randomly assigned to different prompts asking them to write about the importance of their ethnic identity. These treatments had no discernible effects on attitudes. We use the complete sample to preserve statistical power.

The survey block measuring ideology in Study 3 had a programming error that presented participants with an uneven number of items across both languages. No other scales were affected. We corrected the error in Study 4.

The final study performed randomization at the scale level, such that for each scale, participants could be assigned to one of four conditions (English items; Spanish items; English items first, Spanish second; Spanish items first, English second). This was done to counterbalance possible order effects.
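As a rough illustration of scale-level randomization, the following Python sketch draws one of the four language conditions independently for every participant and scale. The condition labels, the scale names, the function name `assign_conditions`, and the use of simple (unblocked) random assignment are all assumptions made for illustration; the source does not describe how the randomization was implemented.

```python
import random

# Hypothetical labels for the four conditions described in the note above.
CONDITIONS = [
    "english",
    "spanish",
    "english_then_spanish",
    "spanish_then_english",
]


def assign_conditions(participant_ids, scale_names, seed=0):
    """Draw one condition per participant and per scale.

    Assumes simple random assignment with equal probabilities;
    the actual study may have used a different scheme.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {
        pid: {scale: rng.choice(CONDITIONS) for scale in scale_names}
        for pid in participant_ids
    }


# Hypothetical participants and scales.
assignments = assign_conditions(
    ["p1", "p2", "p3"],
    ["political_knowledge", "panethnic_identity"],
    seed=42,
)
for pid, by_scale in assignments.items():
    print(pid, by_scale)
```

Because the draw is repeated for each scale, the same participant can answer one scale in English and another in Spanish, which is what distinguishes scale-level randomization from assigning a single language form per respondent.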

The first study randomized the order of languages. Given the large number of scales in the second and third studies, respondents in the hybrid condition always responded to English items first before Spanish items. The final study randomized the order of languages.

Study 4 used successful passage of the two-question language quiz as the inclusion criterion.

In “Appendix D”, we estimate one-parameter versions of each IRT model presented in the paper, and still detect evidence of DIF for most scales.

For the HSI study, we estimate simpler one-parameter models to avoid over-fitting, given the smaller sample size (N = 194). These models estimate a difficulty parameter for each binary item and item step difficulties for each polytomous item. Item step difficulties capture the point on the latent scale where the probability curves for two adjacent response categories intersect (e.g., the point on the latent scale at which the probability of “somewhat agree” is equal to that of “strongly agree”) (Nering & Ostini, 2011).
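The intersection property of step difficulties can be checked numerically. The sketch below assumes a partial credit parameterization (a one-parameter model for polytomous items); the source does not name the exact model, and the function name `pcm_probabilities` and the step values are hypothetical.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities under a partial credit model (assumed here).

    Category k has log-weight sum_{j<=k} (theta - delta_j); category 0
    has an empty sum, i.e. a log-weight of 0.
    """
    log_weights = [0.0]
    for delta in step_difficulties:
        log_weights.append(log_weights[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical step difficulties for a four-category item (three steps).
steps = [-1.0, 0.0, 1.5]

# At theta equal to the third step difficulty, the two highest
# categories should be equally likely.
probs = pcm_probabilities(1.5, steps)
print(probs)
print(math.isclose(probs[2], probs[3]))
```

Under this parameterization, moving the latent trait above a step difficulty makes the higher of the two adjacent categories the more likely one, which is why a shift in step difficulties across languages changes which response options are "easier" to select.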

Difficulty and discrimination have analogs in factor analysis; discrimination corresponds to factor loadings and difficulty corresponds to factor intercepts.

Given the randomization of participants to language forms, we view this as a plausible assumption.

This scale was identical to the HSI version, except for the inclusion of visual items.

We present item discrimination differences in “Appendix E”. We focus on item difficulties because they are easier to interpret and more commonly used in substantive applications of IRT.

These “mixed” cases can be observed in the anti-trans sentiment, panethnic identity, and immigration opinion scales, where the selection of lower response options is easier in English but the selection of higher response options is easier in Spanish, or vice versa.

These translators are affiliated with the Department of Latin American and Iberian Cultures at the principal investigator’s institution, and comprise the entire set of translators listed on the Spanish translation webpage.

We thank an anonymous reviewer for this suggestion.

For binary items, the number of steps is one (i.e., the probability of moving from 0 to 1), whereas for ordinal items, the number of steps is equal to $K - 1$, where $K$ represents the number of response categories. For example, four step difficulties are estimated when one is modeling a five-point scale. These step difficulties capture the probability of selecting the second response category versus the first, the third versus the second, and so on.
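One way to write this down, assuming an adjacent-category (partial credit type) parameterization that the source does not explicitly confirm, is to score the categories $0, 1, \dots, K-1$ and let $\delta_k$ denote the $k$-th step difficulty and $\theta$ the latent trait (both symbols are introduced here for illustration):

$$
\ln \frac{P(X = k \mid \theta)}{P(X = k - 1 \mid \theta)} = \theta - \delta_k, \qquad k = 1, \dots, K - 1 .
$$

With $K = 5$ this gives four equations and hence four step difficulties, $\delta_1, \dots, \delta_4$. With $K = 2$ only $\delta_1$ remains, and the expression reduces to the log-odds of answering 1 rather than 0.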

Peytcheva (2020) describes a theory of language-driven survey response that enumerates possible mechanisms underlying language effects. In this study, we are unable to empirically distinguish between these mechanisms, but future research could assess the psychological processes underlying linguistic DIF when translation errors are not responsible.

Notably, DIF is only detected for non-Latino public officials. However, in Study 3, question formats varied across Latino and non-Latino public officials, so it is unclear whether the DIF is due to question format or legislator ethnicity. In Study 4, we addressed this by presenting participants with Latino and non-Latino public officials within each question type. Consistent with Study 3, we tend to detect evidence of linguistic DIF for non-Latino officials (see Political Knowledge CR2 in Fig. 1). However, we fail to reject the null hypothesis of zero difference in mean absolute DIF between items measuring knowledge of Latino and non-Latino officials ($\beta = -.06$; SE = .20; p = .77).
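The comparison of mean absolute DIF can be sketched as follows. With a single binary indicator, the OLS slope equals the difference in group means, which is one plausible reading of the reported $\beta$. All item difficulties, the indicator coding, and the helper names below are hypothetical; the source does not report how the coefficient or its standard error was computed, and the sketch does not attempt the standard error.

```python
from statistics import mean


def absolute_dif(english, spanish):
    """Absolute difference in item difficulty between language forms."""
    return [abs(s - e) for e, s in zip(english, spanish)]


def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical item difficulties; not taken from the paper.
english = [0.2, -0.5, 1.0, 0.3, -0.1, 0.8]
spanish = [0.5, -0.4, 0.6, 0.3, 0.4, 0.7]
# Assumed coding: 1 = item about a Latino official, 0 = non-Latino official.
is_latino_official = [1, 1, 1, 0, 0, 0]

abs_dif = absolute_dif(english, spanish)
beta = ols_slope(is_latino_official, abs_dif)

latino_mean = mean(d for d, g in zip(abs_dif, is_latino_official) if g == 1)
other_mean = mean(d for d, g in zip(abs_dif, is_latino_official) if g == 0)
print(beta, latino_mean - other_mean)  # the two quantities coincide
```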

We thank an anonymous reviewer for suggesting this analysis.

Abrajano, M. (2015). Reexamining the “racial gap” in political knowledge. The Journal of Politics, 77(1), 44–54.

Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.

Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.

Awad, G. H., Hashem, H., & Nguyen, H. (2021). Identity and ethnic/racial self-labeling among Americans of Arab or Middle Eastern and North African descent. Identity, 21(2), 115–130.

Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22.

Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216.

Ervin, S., & Bower, R. T. (1952). Translation problems in international surveys. Public Opinion Quarterly, 16(4), 595–604.

Flores, A., & Coppock, A. (2018). Do bilinguals respond more favorably to candidate advertisements in English or in Spanish? Political Communication, 35(4), 612–633.

Gomez-Aguinaga, B. (2021). Messaging “en Español”: The impact of Spanish language on linked fate among bilingual Latinos. The International Journal of Press/Politics. https://doi.org/10.1177/19401612211050889

Harkness, J. A., Van de Vijver, F. J. R., Mohler, P. P., & Wiley, J. (2003). Cross-cultural survey methods (Vol. 325). Wiley-Interscience.

Harkness, J., Stange, M., Cibelli, K. L., Mohler, P., & Pennell, B.-E. (2014). Surveying cultural and linguistic minorities. In Hard-to-survey populations. Cambridge University Press.

Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test Purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1.

Hill, K. A., & Moreno, D. V. (2001). Language as a variable: English, Spanish, ethnicity, and political opinion polling in South Florida. Hispanic Journal of Behavioral Sciences, 23(2), 208–228.

Huddy, L., Mason, L., & Aarøe, L. (2015). Expressive partisanship: Campaign involvement, political emotion, and partisan identity. American Political Science Review, 109(1), 1–17.

Iyengar, S. (1993). Assessing linguistic equivalence in multilingual surveys. In Social research in developing countries: Surveys and censuses in the Third World (pp. 173–182). London: Wiley.

Jones-Correa, M., Al-Faham, H., & Cortez, D. (2018). Political (mis) behavior: Attention and lacunae in the study of Latino politics. Annual Review of Sociology, 44(1), 213–235.

Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring attitudes cross-nationally: Lessons from the European Social Survey. SAGE.

King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66.

Lee, S., & Grant, D. (2009). The effect of question order on self-rated general health status in a multilingual survey context. American Journal of Epidemiology, 169(12), 1525–1530.

Lee, T., & Pérez, E. O. (2014). The persistent connection between language-of-interview and Latino political opinion. Political Behavior, 36(2), 401–425.

Lien, P., Margaret Conway, M., & Wong, J. (2003). The contours and sources of ethnic identity choices among Asian Americans. Social Science Quarterly, 84(2), 461–481.

Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.

Nering, M. L., & Ostini, R. (2011). Handbook of polytomous item response theory models. Taylor & Francis.

Pérez, E., & Tavits, M. (2022). Voicing politics. In Voicing politics. Princeton University Press.

Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71(4), 1530–1548.

Pérez, E. O. (2011). The origins and implications of language effects in multilingual surveys: A MIMIC approach with application to Latino political attitudes. Political Analysis, 19(4), 434–454.

Pérez, E. O., & Tavits, M. (2016). Language shapes public attitudes toward gender equality. The Journal of Politics. https://doi.org/10.1086/700004

Pérez, E. O., & Tavits, M. (2017). Language shapes people’s time perspective and support for future-oriented policies. American Journal of Political Science, 61(3), 715–727.

Pérez, E. O., & Tavits, M. (2019). Language influences public attitudes toward gender equality. The Journal of Politics, 81(1), 81–93.

Peytcheva, E. (2020). The effect of language of survey administration on the response formation process. In The essential role of language in survey research (p. 1). RTI International.

Pietryka, M. T., & Macintosh, R. C. (2017). ANES scales often don’t measure what you think they measure—An ERPC2016 analysis.

Prior, M. (2014). Visual political knowledge: A different road to competence? The Journal of Politics, 76(1), 41–57.

Ramakrishnan, K., & Ahmad, F. Z. (2014). Language diversity and English proficiency (p. 27). Center for American Progress.

Ramirez, C. M., Abrajano, M. A., & Michael Alvarez, R. (2019). Using machine learning to uncover hidden heterogeneities in survey data. Scientific Reports, 9(1), 1–11.

Saavedra Cisneros, A., Carey Jr, T. E., Rogers, D. L., & Johnson, J. M. (2022). One size does not fit all: Core political values and principles across race, ethnicity, and gender. Politics, Groups, and Identities 1–20.

Sireci, S. G. (1997). Problems and issues in linking assessments across languages. Educational Measurement: Issues and Practice, 16(1), 12–19.

Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248.

Slobin, D. I. (1996). From “thought and language” to “thinking for speaking”. In Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.

Velez, Y. R., Porter, E., & Wood, T. (2022). Latino-targeted misinformation and the power of factual corrections. Journal of Politics. https://doi.org/10.1086/722345

Welch, S., Comer, J., & Steinman, M. (1973). Interviewing in a Mexican–American community: An investigation of some potential sources of response bias. The Public Opinion Quarterly, 37(1), 115–126.

Willis, G. B. (2015). The practice of cross-cultural cognitive interviewing. Public Opinion Quarterly, 79(S1), 359–395.

Wong, J. S., Karthick Ramakrishnan, S., Lee, T., Junn, J., & Wong, J. (2011). Asian American political participation: Emerging constituents and their political identities. Russell Sage Foundation.

Zavala-Rojas, D. (2018). Exploring language effects in crosscultural survey research: Does the language of administration affect answers about politics? Methods, Data, Analyses: A Journal for Quantitative Methods and Survey Methodology (MDA), 12(1), 127–150.

Title: Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys
Authors: Yamil Ricardo Velez, Ángel Saavedra Cisneros, Jose Gomez
Publication date: 23-03-2023
Publisher: Springer US
Published in: Political Behavior
Print ISSN: 0190-9320
Electronic ISSN: 1573-6687
DOI: https://doi.org/10.1007/s11109-023-09869-8