
23-03-2023 | Original Paper

Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys

Authors: Yamil Ricardo Velez, Ángel Saavedra Cisneros, Jose Gomez

Published in: Political Behavior


Abstract

Accurate estimation of public opinion in diverse countries requires survey questions that operate similarly across languages. We leverage 3,026 bilingual Latinos across four studies to investigate how language-of-administration affects the measurement properties of political science scales. We randomly assign bilinguals to English and Spanish survey forms and uncover item bias, also known as differential item functioning (DIF), across items measuring identity, attitudes, and political knowledge. We examine whether translation errors are the culprit, yet find that perceived translation quality does not predict the magnitude of DIF and questions with minimal text exhibit item bias. Our findings suggest otherwise similar Latinos may be estimated to possess different levels of knowledge, attitude strength, and group identification due to survey language selection. We discuss the implications of our findings on the design of multilingual surveys and the study of racial and ethnic politics more broadly.


Footnotes
1
TRAPD is an acronym for translation, review, adjudication, pre-testing, and documentation. This method has been used in developing cross-national surveys such as the European Social Survey (Jowell et al., 2007).
 
2
To our knowledge, political scientists have yet to use the bilingual experimental design to assess if multi-item scales measuring political attitudes and knowledge function similarly across languages (i.e., have similar item properties).
 
3
Students were recruited to participate in a study on Latino/a politics via instructor e-mails. A $50 Amazon.com gift card lottery was used to incentivize participation.
 
4
CloudResearch has been shown to recover treatment effects observed in other studies involving Latino samples (Velez et al., 2022).
 
5
Pre-analysis plans for these two studies can be found at https://osf.io/39j2d. In “Appendix G”, we reproduce the pre-analysis plans and describe deviations.
 
6
Upon being assigned to a language form, respondents were also randomly assigned to different prompts asking them to write about the importance of their ethnic identity. These treatments had no discernible effects on attitudes. We use the complete sample to preserve statistical power.
 
7
The survey block measuring ideology in Study 3 had a programming error that presented participants with an uneven number of items across both languages. No other scales were affected. We corrected the error in Study 4.
 
8
The final study performed randomization at the scale level, such that for each scale, participants could be assigned to one of four forms (English items; Spanish items; English items first, Spanish second; Spanish items first, English second). This was done to counterbalance possible order effects.
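The scale-level randomization described in this footnote can be sketched as follows. This is a minimal illustration, not the authors' survey code: the condition labels, the `assign_conditions` helper, the seed, and the choice of three scales are all assumptions made for the example, and it assumes each scale is drawn independently with equal probability.

```python
import random

# Hypothetical labels for the four forms listed in the footnote.
CONDITIONS = [
    "english_only",
    "spanish_only",
    "english_then_spanish",
    "spanish_then_english",
]


def assign_conditions(scales, rng):
    """Draw one of the four forms independently for each scale."""
    return {scale: rng.choice(CONDITIONS) for scale in scales}


# Fixed seed only so the sketch is reproducible; scale names are illustrative.
rng = random.Random(2023)
scales = ["political_knowledge", "panethnic_identity", "immigration_opinion"]
assignment = assign_conditions(scales, rng)

for scale, condition in assignment.items():
    print(f"{scale}: {condition}")
```

Because the draw happens per scale rather than per respondent, the same participant can see one scale in English and another in Spanish, which is what allows order effects to be counterbalanced within a single study.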
 
9
The first study randomized the order of languages. Given the large number of scales in the second and third studies, respondents in the hybrid condition always responded to English items first before Spanish items. The final study randomized the order of languages.
 
10
Study 4 used successful passage of the two-question language quiz as the inclusion criterion.
 
11
In “Appendix D”, we estimate one-parameter versions of each IRT model presented in the paper, and still detect evidence of DIF for most scales.
 
12
For the HSI study, we estimate simpler one-parameter models to avoid over-fitting, given the smaller sample size (N = 194). These models estimate a difficulty parameter for each binary item and item step difficulties for each polytomous item. Item step difficulties capture the point on the latent scale where probability curves for two adjacent response categories intersect (e.g., the point on the latent scale at which the probability of “somewhat agree” is equal to “strongly agree”) (Nering & Ostini, 2011).
 
13
Difficulty and discrimination have analogs in factor analysis; discrimination corresponds to factor loadings and difficulty corresponds to factor intercepts.
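To make the two parameters concrete, here is a minimal sketch of a two-parameter logistic (2PL) response function for a binary item. The 2PL form is a standard textbook parameterization assumed here for illustration; the paper's exact model specification is not reproduced on this page, and the function name and parameter values are hypothetical.

```python
import math


def two_pl(theta, a, b):
    """Probability of a positive response under a 2PL model.

    theta: respondent's position on the latent trait
    a:     discrimination (slope; the analog of a factor loading)
    b:     difficulty (location; the analog of a factor intercept)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the response probability is exactly 0.5, whatever the slope.
p_at_difficulty = two_pl(theta=0.5, a=1.7, b=0.5)

# A larger discrimination separates respondents around b more sharply.
p_steep = two_pl(theta=1.0, a=2.0, b=0.0)
p_flat = two_pl(theta=1.0, a=0.5, b=0.0)

print(p_at_difficulty, p_steep, p_flat)
```

Under this parameterization, item bias in difficulty would appear as a different `b` for the same item across language forms, while bias in discrimination would appear as a different `a`.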
 
14
Given the randomization of participants to language forms, we view this as a plausible assumption.
 
15
This scale was identical to the HSI version, except for the inclusion of visual items.
 
16
We present item discrimination differences in “Appendix E”. We focus on item difficulties because they are easier to interpret and more commonly used in substantive applications of IRT.
 
17
These “mixed” cases can be observed in the anti-trans sentiment, panethnic identity, and immigration opinion scales, where the selection of lower response options is easier in English, but the selection of higher response options is easier in Spanish, and vice versa.
 
18
These translators are affiliated with the Department of Latin American and Iberian Cultures at the principal investigator’s institution, and comprise the entire set of translators listed on the Spanish translation webpage.
 
19
We thank an anonymous reviewer for this suggestion.
 
20
For binary items, the number of steps is one (i.e., the probability of moving from 0 to 1), whereas for ordinal items, the number of steps is equal to \(K - 1\), where \(K\) represents the number of response categories. For example, four step difficulties are estimated when one is modeling a five-point scale. These step difficulties capture the probability of selecting the second response category versus the first, the third versus the second, and so on.
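The step logic can be illustrated with a short sketch that assumes a partial credit parameterization. That assumption matches the description of adjacent-category intersections in footnote 12, but the model is not named on this page; the `pcm_probs` helper and the step values below are hypothetical.

```python
import math


def pcm_probs(theta, steps):
    """Category probabilities under a partial credit model.

    steps holds the K - 1 step difficulties of an item with K categories.
    """
    # Category k's logit is the running sum of (theta - step_j) for j <= k;
    # the lowest category is the reference with a logit of 0.
    logits = [0.0]
    for step in steps:
        logits.append(logits[-1] + (theta - step))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical step difficulties for a five-point item: K - 1 = 4 steps.
steps = [-1.5, -0.5, 0.4, 1.2]

# Evaluate at theta equal to the third step difficulty.
probs = pcm_probs(0.4, steps)

# A binary item has a single step and reduces to a one-parameter logistic.
binary_probs = pcm_probs(0.0, [0.0])

print(probs)
print(binary_probs)
```

When `theta` equals a step difficulty, the two adjacent categories on either side of that step are equally likely, which is the intersection point that footnote 12 describes.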
 
21
Peytcheva (2020) describes a theory of language-driven survey response that enumerates possible mechanisms underlying language effects. In this study, we are unable to empirically distinguish between these mechanisms, but future research could assess the psychological processes underlying linguistic DIF when translation errors are not responsible.
 
22
Notably, DIF is only detected for non-Latino public officials. However, in Study 3, question formats varied across Latino and non-Latino public officials, so it is unclear whether the DIF is due to question format or legislator ethnicity. In Study 4, we addressed this by presenting participants with Latino and non-Latino public officials within each question type. Consistent with Study 3, we tend to detect evidence of linguistic DIF for non-Latino officials (see Political Knowledge CR2 in Fig. 1). However, we fail to reject the null hypothesis of zero difference in mean absolute DIF between items measuring knowledge of Latino and non-Latino officials (\(\beta = -.06\); SE = .20; p = .77).
 
23
We thank an anonymous reviewer for suggesting this analysis.
 
Literature
Abrajano, M. (2015). Reexamining the “racial gap” in political knowledge. The Journal of Politics, 77(1), 44–54.
Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.
Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.
Awad, G. H., Hashem, H., & Nguyen, H. (2021). Identity and ethnic/racial self-labeling among Americans of Arab or Middle Eastern and North African descent. Identity, 21(2), 115–130.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22.
Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1(3), 185–216.
Ervin, S., & Bower, R. T. (1952). Translation problems in international surveys. Public Opinion Quarterly, 16(4), 595–604.
Flores, A., & Coppock, A. (2018). Do bilinguals respond more favorably to candidate advertisements in English or in Spanish? Political Communication, 35(4), 612–633.
Harkness, J. A., Van de Vijver, F. J. R., Mohler, P. P., & Wiley, J. (2003). Cross-cultural survey methods (Vol. 325). Wiley-Interscience.
Harkness, J., Stange, M., Cibelli, K. L., Mohler, P., & Pennell, B.-E. (2014). Surveying cultural and linguistic minorities. In Hard-to-survey populations. Cambridge University Press.
Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test Purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1.
Hill, K. A., & Moreno, D. V. (2001). Language as a variable: English, Spanish, ethnicity, and political opinion polling in South Florida. Hispanic Journal of Behavioral Sciences, 23(2), 208–228.
Huddy, L., Mason, L., & Aarøe, L. (2015). Expressive partisanship: Campaign involvement, political emotion, and partisan identity. American Political Science Review, 109(1), 1–17.
Iyengar, S. (1993). Assessing linguistic equivalence in multilingual surveys. In Social research in developing countries: Surveys and censuses in the Third World (pp. 173–182). London: Wiley.
Jones-Correa, M., Al-Faham, H., & Cortez, D. (2018). Political (mis)behavior: Attention and lacunae in the study of Latino politics. Annual Review of Sociology, 44(1), 213–235.
Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring attitudes cross-nationally: Lessons from the European Social Survey. SAGE.
King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15(1), 46–66.
Lee, S., & Grant, D. (2009). The effect of question order on self-rated general health status in a multilingual survey context. American Journal of Epidemiology, 169(12), 1525–1530.
Lee, T., & Pérez, E. O. (2014). The persistent connection between language-of-interview and Latino political opinion. Political Behavior, 36(2), 401–425.
Lien, P., Margaret Conway, M., & Wong, J. (2003). The contours and sources of ethnic identity choices among Asian Americans. Social Science Quarterly, 84(2), 461–481.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.
Nering, M. L., & Ostini, R. (2011). Handbook of polytomous item response theory models. Taylor & Francis.
Pérez, E., & Tavits, M. (2022). Voicing politics. Princeton University Press.
Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71(4), 1530–1548.
Pérez, E. O. (2011). The origins and implications of language effects in multilingual surveys: A MIMIC approach with application to Latino political attitudes. Political Analysis, 19(4), 434–454.
Pérez, E. O., & Tavits, M. (2017). Language shapes people’s time perspective and support for future-oriented policies. American Journal of Political Science, 61(3), 715–727.
Pérez, E. O., & Tavits, M. (2019). Language influences public attitudes toward gender equality. The Journal of Politics, 81(1), 81–93.
Peytcheva, E. (2020). The effect of language of survey administration on the response formation process. In The essential role of language in survey research (p. 1). RTI International.
Pietryka, M. T., & Macintosh, R. C. (2017). ANES scales often don’t measure what you think they measure—An ERPC2016 analysis.
Prior, M. (2014). Visual political knowledge: A different road to competence? The Journal of Politics, 76(1), 41–57.
Ramakrishnan, K., & Ahmad, F. Z. (2014). Language diversity and English proficiency (p. 27). Center for American Progress.
Ramirez, C. M., Abrajano, M. A., & Michael Alvarez, R. (2019). Using machine learning to uncover hidden heterogeneities in survey data. Scientific Reports, 9(1), 1–11.
Saavedra Cisneros, A., Carey Jr, T. E., Rogers, D. L., & Johnson, J. M. (2022). One size does not fit all: Core political values and principles across race, ethnicity, and gender. Politics, Groups, and Identities, 1–20.
Sireci, S. G. (1997). Problems and issues in linking assessments across languages. Educational Measurement: Issues and Practice, 16(1), 12–19.
Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13(3), 229–248.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking”. In Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.
Welch, S., Comer, J., & Steinman, M. (1973). Interviewing in a Mexican–American community: An investigation of some potential sources of response bias. The Public Opinion Quarterly, 37(1), 115–126.
Willis, G. B. (2015). The practice of cross-cultural cognitive interviewing. Public Opinion Quarterly, 79(S1), 359–395.
Wong, J. S., Karthick Ramakrishnan, S., Lee, T., Junn, J., & Wong, J. (2011). Asian American political participation: Emerging constituents and their political identities. Russell Sage Foundation.
Zavala-Rojas, D. (2018). Exploring language effects in crosscultural survey research: Does the language of administration affect answers about politics? Methods, Data, Analyses: A Journal for Quantitative Methods and Survey Methodology (MDA), 12(1), 127–150.
Metadata
Title
Language Barriers: Causal Evidence of Linguistic Item Bias in Multilingual Surveys
Authors
Yamil Ricardo Velez
Ángel Saavedra Cisneros
Jose Gomez
Publication date
23-03-2023
Publisher
Springer US
Published in
Political Behavior
Print ISSN: 0190-9320
Electronic ISSN: 1573-6687
DOI
https://doi.org/10.1007/s11109-023-09869-8