2.1 Reducing SDB: a matter of privacy and anonymity
The increasingly ubiquitous digitization of all domains of life is opening up exciting new research options (Groves 2011; Hill et al. 2020). Recent developments such as affective computing and sentiment analysis (Cambria 2016) or automated hate-speech recognition (Greevy and Smeaton 2004; Laaksonen et al. 2020) rely on digital traces, including social media usage, to detect sensitive attitudes such as AIS. While side-stepping traditional manifestations of response bias, such innovative data sources and research techniques are in turn vexed by various kinds of bias (Sen et al. 2019), and there are as yet no agreed procedures for deriving population estimates from such data (Japec et al. 2015:872). As of today, for scholars aiming to estimate the prevalence of attitudes and behaviours in large populations, the self-report sample survey remains the foremost tool at hand (Groves 2011; Hill et al. 2020).
However, traditional survey methods are subject to manifold problems and limitations. Survey research rests on two main assumptions: first, that the sampled individuals are representative of the target population; and second, that respondents report information accurately (Groves et al. 2009). As every survey practitioner knows, accomplishing these goals is no straightforward task, as multiple sources of error may arise in the process. Some respondents may fail to understand the question or lack the information required to give a proper answer. And when questions are perceived as intrusive or embarrassing, respondents may deliberately distort their answers (Tourangeau and Yan 2007).
When faced with topics of a sensitive nature, some respondents will edit their responses in order to manage the impression they make on others or, arguably, even to deceive themselves (Paulhus 1984). The tendency “to make oneself look good in terms of prevailing cultural norms when answering to specific survey questions” (Krumpal 2013) is known as social desirability bias (SDB) or socially desirable responding and has been extensively studied by psychologists and survey methodologists. Extant research has shown that socially objectionable behaviors such as drug use, binge drinking, abortion and sexual risk-taking are usually underestimated in surveys, as are racism, sexism and other socially ill-regarded attitudes (cf. Krysan 1998; Tourangeau and Yan 2007; Krumpal 2013). SDB also explains why surveys tend to overestimate well-considered behaviors like voting, charitable giving, energy conservation, church attendance, seat belt use, and the like (Tourangeau and Yan 2007).
Mode comparison studies have consistently shown that self-administered modes of data collection usually yield more accurate answers to sensitive questions than interviewer-administered ones (Tourangeau et al. 2013; Tourangeau and Yan 2007) and that this effect is particularly strong for computerized forms of self-administration (Gnambs and Kaspar 2015; Richman et al. 1999). Yet, while privacy is generally accepted to be a necessary condition when dealing with sensitive topics, some evidence suggests that it may not be sufficient to avoid SDB, depending on the perceived level of anonymity of the survey situation (Callegaro, Manfreda, and Vehovar 2015; Tourangeau 2018; Tourangeau and Yan 2007). Several studies reveal that SDB may still be an issue for self-administered surveys that ask for respondent identification (Joinson 1999), when survey notifications are personalized (Heerwegh et al. 2005; Joinson 1999),¹ when staff remain close by during Computer-Assisted Self-Interviews (CASI) (Liu and Wang 2016), or in panel contexts which rely on prior information on and communication with respondents (Coutts and Jann 2011).
To ensure that respondents perceive guaranteed anonymity, survey methodologists have developed specialized questioning techniques which “(make) it impossible to directly link incriminating data to an individual” (Nuno and Saint John 2015).² These techniques have been used successfully to inquire about different sensitive topics, from sexual risk behaviors or drug use to employee theft and vote buying (Aronow et al. 2015; Coutts and Jann 2011). Among them, the item count technique (ICT), also known as list experiment or unmatched count technique (Imai 2011), is gaining ground among scholars trying to quantify the effect of SDB on the measurement of sensitive behaviors and attitudes (Wolter and Laier 2014).
ICT randomly assigns respondents to two experimental groups, which are then asked to report the number of behaviors they have engaged in/abstained from or the number of attitudinal items they support/reject. The list proposed to the treatment group adds the sensitive item under research to the “innocuous” list offered to the control group. By computing the difference between the average counts obtained for both groups, researchers can estimate the population proportion that supports (or rejects, as the case may be) the sensitive item net of social desirability pressures. The size of SDB is calculated by comparing this estimate to the proportion obtained with a direct question (DQ) regarding that same sensitive item (Lax, Phillips and Stollwerk 2016).
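The estimation logic just described can be sketched in a few lines of code. This is a minimal illustration with made-up numbers, not an implementation used in any of the cited studies; all variable names and data are hypothetical.

```python
def ict_estimate(control_counts, treatment_counts):
    """Prevalence of the sensitive item: difference between the
    mean item counts of the treatment and control groups."""
    mean_control = sum(control_counts) / len(control_counts)
    mean_treatment = sum(treatment_counts) / len(treatment_counts)
    return mean_treatment - mean_control


def sdb_size(ict_prevalence, dq_prevalence):
    """SDB estimate: unobtrusive (ICT) minus obtrusive (DQ) share."""
    return ict_prevalence - dq_prevalence


# Illustrative item counts for two randomized groups (not real data):
control = [1, 2, 2, 3, 1, 2, 2, 3, 2, 2]    # innocuous items only
treatment = [2, 3, 2, 3, 2, 3, 2, 2, 3, 3]  # same list + sensitive item
prevalence = ict_estimate(control, treatment)  # 2.5 - 2.0 = 0.5
bias = sdb_size(prevalence, 0.3)  # DQ yielded only 30%: SDB of 20 points
```

Here the difference in means suggests that half the population endorses the sensitive item, whereas the direct question found only 30 percent, yielding an SDB estimate of 20 percentage points.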
Numerous studies have found ICT to reduce SDB in self-administered paper questionnaires as well as in face-to-face, CATI, and CAWI surveys (Wolter and Laier 2014). However, some studies have found that even ICT may be subject to SDB and, as is the case with other survey instruments, to non-strategic respondent error (Ahlquist 2018). In some cases, ICT misreporting may be induced by a design that endangers the unobtrusive quality of the experiment: i.e. when none (floor effect) or all (ceiling effect) of the control items apply to a significant share of respondents (Blair and Imai 2012; Glynn 2013; Kuklinski, Cobb and Gilens 1997a, b). In other cases, respondents may remain suspicious of the instrument³ and offer deflated (or inflated, in the case of desirable behaviors or attitudes) item-counts to send a clear signal of disassociation from (or association with) the sensitive item. Negative ICT estimates suggest the existence of a deflation effect, whereas estimates exceeding 1 indicate an inflation effect (Zigerell 2011); still, there is scant empirical evidence to date of such distortions.
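Floor and ceiling effects of the kind just described can be screened for by inspecting the control group's distribution of item counts. The sketch below is a hypothetical diagnostic; the 5% cutoff mentioned in the comments is an illustrative choice, not a published threshold.

```python
from collections import Counter


def floor_ceiling_shares(control_counts, n_items):
    """Share of control-group respondents reporting none (floor)
    or all (ceiling) of the n_items control items."""
    tally = Counter(control_counts)
    n = len(control_counts)
    return tally.get(0, 0) / n, tally.get(n_items, 0) / n


# Hypothetical control-group counts for a 4-item list:
counts = [0, 1, 2, 2, 3, 4, 4, 1, 2, 3]
floor_share, ceiling_share = floor_ceiling_shares(counts, n_items=4)
# floor_share = 0.1 and ceiling_share = 0.2: both exceed an
# (illustrative) 5% cutoff, so for these respondents adding the
# sensitive item would no longer be unobtrusive.
```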
Extant scholarship on SDB-reducing survey techniques suggests that the list experiment generates higher AIS estimates than a direct question (H1). While there is no guarantee that unobtrusive question formats capture the true extent of AIS, the prevailing view in the literature is that ICT estimates lack discernible traces of SDB (H2).
2.2 Measuring AIS: is more always better?
Among ATII researchers, interest in innovative survey procedures such as ICT has been rather limited to date, even though race-relations research, from which ATII students have borrowed numerous conceptual and methodological blueprints (Ceobanu and Escandell 2010; Fussell 2014), has proved the worth of such procedures (Gilens, Sniderman, and Kuklinski 1998; Heerwig and McCabe 2009; Kuklinski, Cobb, and Gilens 1997a, b; Redlawsk, Tolbert, and Franko 2010; Sniderman and Carmines 1997). Instead, most ATII scholarship relies on generic survey routines or diluted focal constructs as alleged antidotes against SDB. We first review these two kinds of studies, then those which do employ state-of-the-art techniques. Table 1 provides an overview of extant measurement approaches and their application to ATII studies.
Table 1
Overview: techniques for estimating sensitive attitudes and SDB

| Approach | Techniques | Advantages | Limitations | Use in ATII research |
| --- | --- | --- | --- | --- |
| Generic survey routines | Assurance of confidentiality; non-reactive interviewing; neutral questions and scales | Reduction of SDB; easy implementation; covariates analysis | Remaining SDB unknown | Very frequent, especially in studies of policy preferences |
| Indirect survey questions | Proxy indicators | Policy-related survey items widely available; easy implementation; covariates analysis | Dilution of focal construct; false positives; magnitude of SDB unknown | Frequent in studies of AIS (cf. expansive notions of prejudice) |
| Self-administered surveys | Computer-assisted web interview (CAWI) | Reduction of SDB | Coverage bias in CAWI; remaining SDB unknown | Increasingly frequent |
| Survey experiments | Item count technique (ICT); randomized response technique (RRT) | Potential for estimating both the sensitive attitude and, by comparison with a direct question, the extent of SDB | Difficult to implement; possibly still subject to SDB; limited use in statistical models | Few antecedents, most focussing on policy preferences |
| Digital traces (“big data”) | Affective computing; pattern recognition | Real-time, low-cost; huge data volumes; “revealed preferences” (via online behaviour) | Requires data science skills; coverage and selection biases; access issues (proprietary data); scant capacity of detecting covariates | Incipient |
To keep response bias at bay, a sizable share of ATII research depends solely on generic survey routines such as confidentiality assurances, non-reactive interviewing, and non-suggestive semantics and scales. The possibility of dishonest answers is occasionally acknowledged and educational attainment flagged as a potential SDB covariate (e.g. Burns and Gimpel 2000:205), but for practical purposes, obtrusive ATII estimators are taken at face value. This run-of-the-mill approach prevails in studies aiming to explain migration policy preferences (e.g. Bohman and Hjerm 2016; Citrin et al. 1997; Hainmueller and Hiscox 2007; Hiers, Soehl, and Wimmer 2017; Sides and Citrin 2007), a thematic dimension on which large public-domain datasets, such as the European Social Survey, provide a nuanced range of head-on items. Even if unobtrusive indicators were as readily available, they would be rather ill-suited for delivering the dependent variables of explanatory models: the anonymity guarantee afforded by ICT and similar procedures comes at the price of severing any tie between individual respondents, on one hand, and scores on the sensitive item, on the other. This drawback was recently eased by the development of imputation techniques (Blair and Imai 2012; Chou, Imai and Rosenfeld 2017; Corstange 2009; Holbrook and Krosnick 2010; Imai 2011), but these entail high standard errors. Thus, from a model-optimization perspective, the aim of discerning ATII determinants is best served when all variables—including the dependent—originate in DQs. However, such models would be of limited value if respondents who candidly express unfavourable ATII were to differ substantially, in terms of sociodemographic and attitudinal profile, from those giving deceitful answers (Janus 2010). To assess this possibility, this study compares predictor models for obtrusive and unobtrusive gauges of the same attitude facet. We hypothesize that the predictors of ICT-based and DQ-based AIS estimates differ at least partially from one another (H3).
Reliance on generic quality routines treats ATII as ordinary public preferences, i.e., favorable and unfavorable views are supposed to be equally legitimate. This assumption is dubious: the shockingly swift progression from “idle chatter” to two World Wars and the Holocaust forged a generalized commitment against all forms of ethnic and racial prejudice (Allport 1954:14–15)—including disrespectful verbalizations. To the extent to which unfavourable views are thought to convey ethnic or racial overtones, survey respondents may therefore shun their manifestation. Since such connotations are especially obvious with regard to outright animosity, researchers of anti-immigrant prejudice have recognized the need for safeguards against dishonest or evasive answers. Yet, what counts as a safeguard when true population parameters are unknown? Because undesirable attitudes cannot be validated externally, the highest estimator is generally accepted as the best approximation (Höglinger and Jann 2018). Scholars of anti-immigrant prejudice have doubled down on the “more-is-better” approach by interpreting any unfavorable view regarding international migration as a telltale sign of gratuitous hostility: “most theoretical models about attitudes toward immigration share the idea that anti-immigration attitudes are a form of prejudice” (Wilkes, Guppy, and Farris 2008:303). Acceptance of this conception was fueled by notions of “symbolic” or “subtle” prejudice (Gaertner and Dovidio 1986; Kinder and Sanders 1996; Sears 1988) and by the outright equation of perceived group competition with prejudice (Bobo 1999; Quillian 1995); in contrast, classic formulations had considered such perceptions a (potentially forceful) trigger of prejudice, rather than its equivalent (Allport 1954:229–232). While inhospitable policy preferences may conceivably be less bias-prone than items regarding virulent animosity (Cea D’Ancona 2014), higher sample shares may also reflect nuanced positions toward distinct ATII facets (Ceobanu and Escandell 2010:311–13). And while it is impossible to evaluate the justifications of natives’ qualms (Esses, Jackson, and Armstrong 1998), the potential benefit of classifying any qualms as prejudice has to be weighed against the cost of conceptually eliminating the very possibility of legitimate concerns (Rinken 2016). Few studies on anti-immigrant prejudice (e.g. Hello, Scheepers, and Gijsberts 2002) employ specific gauges of animosity; instead, unwelcoming policy preferences or unfavorable impact assessments are used as indicators of “anti-foreigner sentiment” (Semyonov, Raijman, and Gorodzeisky 2006), “ethnic exclusionism” (Coenders and Scheepers 2003) or “xenophobia” (Hjerm 2007).
Such re-labelling overcomes none of the shortcomings of obtrusive measurement. As it happens, most experimental research on ATII⁴ has focused on immigration control preferences, detecting sizable SDB and thus highlighting the inadequacy of expansive notions of prejudice as a bias-reducing strategy. Janus’ (2010) CATI-based study, conducted in 2005, reveals substantially more restrictionist preferences in ICT than in direct measurement; this gap increases among liberal and well-educated respondents, suggesting that apparent pockets of tolerance derive from a heightened propensity to bias. An (2015) also observes larger differences between direct and indirect measures of restrictionism among well-educated respondents. Comparing Janus’ data with CAWI data for 2010, Creighton et al. (2015) detect more explicit opposition to immigration in 2010, whereas ICT results are similar; somewhat precipitously (since mode differences might play a role, cf. Dillman and Christian 2005), they infer a time-trend of decreasing SDB. For their part, Creighton and Jamal (2015) find more overt opposition against naturalization of Muslim than Christian immigrants, whereas ICT yields similar results; they deduce added normative pressure to appear tolerant toward Christians. Similarly, Creighton et al. (2019) observe more masked opposition to racially similar immigrants than to racially different or poorer ones. As far as we are aware, just two papers address attitude facets other than policy preferences: Knoll (2013b) finds nativism in the US to be over-reported in direct measurement as compared to ICT, suggesting that associations with patriotism trigger inverse desirability pressures, while Krumpal (2012) employs the randomized response technique (RRT) to estimate xenophobia and anti-Semitism in Germany, obtaining modest increments vis-à-vis obtrusive measurement.
Internet-based data, such as social media and internet search data, are increasingly employed to study racist or anti-immigrant attitudes and their relation with populist communication strategies and right-wing voting (e.g., Stephens-Davidowitz 2014; Heiss and Matthes 2020). Arguably, such data elude traditional manifestations of SDB and may even foment niche-specific social desirability dynamics that favor explicit expressions of AIS. However, a combination of coverage and selection biases (Japec et al. 2015; Hill et al. 2020) makes such data unsuitable (as yet) for estimating AIS prevalence and covariates across large populations.
To sum up, extant scholarship on ATII measurement suggests a fourth hypothesis: we expect AIS-related SDB to be associated with respondent characteristics that imply heightened susceptibility to normative pressures (H4). Specifically, we hypothesize significant gaps between obtrusive and unobtrusive AIS estimates, and hence SDB, among better-educated respondents (H4.1), those with a leftist ideology (H4.2), those interviewed in CATI mode (H4.3), and possibly other groups (H4.4).
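Hypotheses of this kind can be checked by computing the ICT–DQ gap separately within each subgroup. The following sketch simulates such a subgroup comparison; all parameters (prevalences, admission rates, the education covariate) are hypothetical and chosen purely to illustrate the computation, not taken from the study's data.

```python
import random

random.seed(1)


def ict_prevalence(rows):
    """Difference in mean item counts between treated and control rows."""
    treated = [count for is_treated, count in rows if is_treated]
    control = [count for is_treated, count in rows if not is_treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Simulate respondents: the better educated hold the sensitive attitude
# more often (0.5 vs 0.2) but admit it in the direct question less often.
data = []
for _ in range(20000):
    educ = random.random() < 0.5           # hypothetical binary covariate
    treated = random.random() < 0.5        # random group assignment
    holds = random.random() < (0.5 if educ else 0.2)
    count = sum(random.random() < 0.5 for _ in range(3))  # 3 control items
    if treated:
        count += holds                     # sensitive item added to list
    admits = holds and random.random() < (0.4 if educ else 0.9)
    data.append((educ, treated, count, admits))

gaps = {}
for level in (False, True):
    rows = [(t, c) for e, t, c, a in data if e == level]
    ict = ict_prevalence(rows)
    dq = sum(1 for e, t, c, a in data if e == level and a) / len(rows)
    gaps[level] = ict - dq
# In this simulation, the SDB gap is far larger among the better educated,
# the pattern that H4.1 predicts for the real data.
```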