Characteristics of the response scales’ conceptualization
Scales’ evaluative dimension | Agree–disagree (AD) Item-specific (IS) | (Brown 2004): AD scales are clearer to interpret than vague or closed-range quantifier scales (Krosnick 1999): people simply choose to agree because it seems like the expected and polite action to take (Krosnick et al. 2005): to eliminate acquiescence, avoid AD scales (Kunz 2015): AD scales are more difficult to understand and to map onto the appropriate judgement (Saris et al. 2010): AD scales show more acquiescence because of their usual presentation in batteries (Schaeffer and Presser 2003): AD scales are simpler to administer | (Alwin 2007): the reliability of AD scales is lower compared to IS scales [Wiley–Wiley reliability] → YES (Billiet and McClendon 2000): acquiescence is found in AD scales [Acquiescence bias through SEM factor] → YES (Krosnick 1991): AD scales lead to lower reliabilities than IS [Pearson product-moment test–retest correlations] → YES (Revilla and Ochoa 2015): AD scales have much lower quality than IS [True-score MTMM reliability and validity] → YES (Saris and Gallhofer 2014): AD scales have lower quality than IS [True-score MTMM reliability and validity] → YES (Saris et al. 2010): IS scales have higher quality than AD [True-score MTMM reliability and validity] → YES |
Scales’ polarity | Bipolar Unipolar | (Kunz 2015): a disadvantage of bipolar scales is that respondents are reluctant to choose negative responses | (Alwin 2007): unipolar scales have somewhat higher reliabilities than bipolar scales [Wiley–Wiley reliability] → YES |
Concept-Scale polarity agreement | Both bipolar Both unipolar Bipolar concept with Unipolar scale Unipolar concept with Bipolar scale | (Rossiter 2011): failing to distinguish between unipolar and bipolar leads to serious misinterpretations; unipolar attributes should not be measured with bipolar scales | (Saris and Gallhofer 2007): using unipolar scales for bipolar concepts does not significantly lower reliability or increase validity [True-score MTMM reliability and validity] → NO (van Doorn et al. 1982): differences in the response distributions are clear [Response style through distribution comparison] → YES |
Characteristics of the type of response scale and its length
Type of response scales | Absolute open-ended quantifier Relative open-ended quantifier Relative metric Absolute metric Dichotomous Rating Closed quantifiers Branching | (Hjermstad et al. 2011): metric scales are comparable to categorical scales; the type of scale is not the most important factor but rather the conditions related to them (Krosnick et al. 2005): dichotomous scales are clearer in meaning and require less interpretative effort, which can harm consistency, compared to rating scales (Krosnick and Fabrigar 1997): relative open-ended scales (or magnitude scaling) are a difficult method to administer which only reveals ratios among stimuli and not absolute judgments (Liu and Conrad 2016): respondents are more likely to provide rounded answers in 101-point metric scales, as an easy way out (Revilla 2015): the closed-range quantifier labels provided can influence the results if they do not represent the population distribution (Saris and Gallhofer 2014): line production (or relative metric) scales are better than relative open-ended quantifiers because rounding is avoided (Schaeffer and Bradburn 1989): magnitude estimates (or relative open-ended quantifiers) have problems related to the appropriate standard and to recoding into categorical distinctions (Schaeffer and Presser 2003): branching has the advantage of providing a large number of categories without presenting them visually (Schwarz et al. 1985): closed-range scales inform the respondent about the researcher’s expectations and add systematic bias to respondents’ reports and related judgements compared to absolute open-ended formats (Sudman and Bradburn 1983): open quantifiers are better than closed quantifiers for numerical answers, to avoid misleading the respondent (Tourangeau et al. 2000): rounded answers in open-ended quantifiers may signal an unwillingness to come up with a more exact answer and introduce systematic bias in continuous scales | (Al Baghal 2014a): numerical open-ended are as accurate as vague closed options [Rank-order correlations and regression slopes] → NO (Alwin 2007): rating scales have higher reliabilities than dichotomous but comparable to metric scales [Wiley–Wiley reliability] → YES (Cook et al. 2001): metric scales are less reliable than radio-button scales [Score reliability] → YES (Couper et al. 2006): metric scales suffer more missing data than categorical or open-ended quantifier scales [Item-nonresponse] → YES (Funke and Reips 2012): metric scales are comparable to 5p scales on item-nonresponse [Item-nonresponse] → NO (Koskey et al. 2013): absolute open-ended scales are comparable to rating scales on reliability [Cramer’s V reliability] → NO (Krosnick 1991): metric scales have lower reliability than rating scales; lower reliabilities when using dichotomous scales; branching provides higher reliabilities than rating scales [Pearson product-moment test–retest correlations] → YES (Krosnick and Berent 1993): branching improves reliability compared to no branching (rating scale) [Item reliability] → YES (Liu and Conrad 2016): non-significant differences on item-nonresponse between absolute open-ended, rating scale or metric [Item-nonresponse] → NO (Lundmark et al. 2016): dichotomous scales are less valid than rating scales [Concurrent validity] → YES (McKelvie 1978): no difference in reliability or validity between metric and rating scales [Test–retest reliability and Test validity] → NO (Miethe 1985): magnitude scaling is less credible in terms of reliability compared to rating scales [Test–retest reliability] → YES (Preston and Colman 2000): 2p scales are less reliable and valid [Test–retest reliability, Cronbach alpha and Criterion validity] → YES (Saris and Gallhofer 2007): open-ended quantifiers and metric scales have significantly higher reliability but lower validity than rating scales [True-score MTMM reliability and validity] → YES |
Response scales’ length | Minimum possible value Maximum possible value Number of categories | (Alwin 2007): the optimal number of points in a scale should be considered in relation to the polarity of the scale (Cox III 1980): there is no single number of response alternatives for a scale which is appropriate under all circumstances (Krosnick and Fabrigar 1997): the optimal length is a complex decision: too few categories may compromise the information gathered, too many compromise the clarity of meaning (Reips and Funke 2008): the optimal length of continuous scales depends on the size of the device screen (Schaeffer and Presser 2003): more categories can compromise discrimination by exceeding the capacity of respondents to make finer distinctions between the options | (Aiken 1983): reliabilities remained constant despite changing the number of categories [Internal consistency reliability] → NO (Alwin 1997): 11p scales are more reliable than 7p [True-score MTMM reliability] → YES (Alwin 2007): the use of 4p scales improves reliability in unipolar scales, while the reliability of bipolar scales is higher for 2, 3 and 5p and lowest for 7p [Wiley–Wiley reliability] → YES (Alwin and Krosnick 1991): no differences between AD with 2 and 5p; IS reliability increases from 3 to 9p, but no differences between 7 and 9p [Proportion of variance attributed to true attitudes] → YES (Andrews 1984): the number of categories has the biggest effect on data quality; more categories are better, though 3p is worse than 2p [MTMM validity, method effect and residual error] → YES (Bendig 1954): reliability is independent of the number of scale categories [Test reliability] → NO (Jacoby and Matell 1971): reliability and validity are independent of the number of points [Test–retest reliability, concurrent validity and predictive validity] → NO (Komorita and Graham 1965): reliability increases with the number of points up to 6p [Cronbach alpha] → YES (Lundmark et al. 2016): validity is higher in 7p and 11p than 2p [Concurrent validity] → YES (Matell and Jacoby 1971): reliability is independent of the number of points [Internal consistency and Test–retest reliability] → NO (McKelvie 1978): validity is slightly better for 7p than 11p, reliability is unaffected by scale length [Test–retest reliability and Test validity] → NO (Preston and Colman 2000): reliability is lower for 2, 3, 4p, higher for 7, 8, 9, 10p, and decreases with more than 10p [Test–retest reliability] → YES (Revilla and Ochoa 2015): 11p positively affects the quality of IS scales [True-score MTMM reliability and validity] → YES (Revilla et al. 2014): quality does not improve with more than 5p for AD scales [True-score MTMM reliability and validity] → YES (Rodgers et al. 1992): the number of points has the biggest effect on validity; using at least 5 to 7p gives better quality [MTMM construct validity] → YES (Saris and Gallhofer 2007): reliability can be improved by using more categories (11p) without decreasing validity [True-score MTMM reliability and validity] → YES (Saris and Gallhofer 2007): the maximum value of a continuous scale has a significant effect on reliability and validity [True-score MTMM reliability and validity] → YES (Scherpenzeel and Saris 1997): highest validity with 4, 5 or 7p [True-score MTMM validity] → YES (Weijters et al. 2010): 5-point AD scales reduce extreme response style [Extreme Response Style through log odds] → YES |
Characteristics of the response scales’ labels
Verbal labels | Fully-labelled End-points and more points labelled End- and midpoints labelled End-points only labelled Not labelled | (Alwin 2007): labels reduce ambiguity in translating subjective responses to scales’ options (Krosnick and Fabrigar 1997): verbal labels suffer from language ambiguity and are more complex to hold in memory; labelling only the endpoints is less cognitively demanding than fully labelling; verbal labels are a more natural form of expression than numbers, and labelling all points can help to clarify the meaning of the numbers (Krosnick and Presser 2010): verbal labels are advantageous because they clarify the meanings of the scale points while reducing respondent burden (Kunz 2015): labelling may increase the cognitive effort required to read and process all options, while clarifying their meaning | (Alwin 2007): fully labelled scales increase reliability significantly compared to only labelling the endpoints [Wiley–Wiley reliability] → YES (Alwin and Krosnick 1991): fully labelled scales increase reliability [Proportion of variance attributed to true attitudes] → YES (Andrews 1984): data quality is below average with all categories labelled [MTMM validity, method effect and residual error] → YES (Eutsler and Lang 2015): fully labelled scales produce fewer extreme responses [Extreme response bias through distribution comparison] → YES (Krosnick and Berent 1993): full verbal labelling improves reliability [Item reliability] → YES (Menold et al. 2014): fully labelled scales have higher reliabilities than when only the endpoints are labelled [Guttman’s lambda] → YES (Moors et al. 2014): end labelling evokes more extreme responses [Extreme response bias through latent class factor] → YES (Rodgers et al. 1992): non-verbal alternatives have lower random error [MTMM construct validity] → YES (Saris and Gallhofer 2007): the use of labels increases reliability significantly [True-score MTMM reliability and validity] → YES (Weijters et al. 2010): higher acquiescence and lower extreme scores when all categories are labelled [Acquiescence and Extreme response bias through log odds] → YES |
Verbal labels’ information | Non-conceptual Conceptual Objective Subjective Full-informative | – | (Saris and Gallhofer 2007): reliability is reduced by long labels [True-score MTMM reliability] → YES |
Quantifier labels | Vague Closed-range | (Brown 2004): AD scales are clearer to interpret than vague quantifiers (Pohl 1981): it is not clear exactly which word set provides better equal-interval scaling (Revilla 2015): closed-range scales should provide enough labels that respondents do not feel their behaviours are abnormal (Saris and Gallhofer 2014): vague quantifiers are more prone to differing interpretations than closed-range ones (Schwarz et al. 1985): respondents use labels like “usual” as standards of comparison and seem reluctant to report behaviours that are unusual in the context of the scale | (Al Baghal 2014b): vague quantifiers display higher levels of validity than numeric open-ended quantifiers [Predictive validity] → YES (Al Baghal 2014a): vague quantifiers are equal or better than open-ended quantifiers [Rank-order correlations and regression slopes] → NO |
Fixed reference points | Number of fixed reference points | (Saris and De Rooij 1988): the reference points should leave no doubt about their position on the subjective scale of the respondents (Saris and Gallhofer 2014): reference points are necessary to ensure that respondents are using the same underlying scale | (Revilla and Ochoa 2015): the use of two fixed reference points increases measurement quality slightly [True-score MTMM reliability and validity] → YES (Saris and De Rooij 1988): differences are due to the freedom respondents have when no fixed reference points are established [Response bias through distribution comparison] → YES (Saris and Gallhofer 2007): fixed reference points have a positive and significant effect on reliability and validity [True-score MTMM reliability and validity] → YES |
Order verbal labels | From negative-to-positive (N-P) From positive-to-negative (P-N) | (Christian et al. 2007b): responses vary depending on the order since it provides an additional source of information (Kunz 2015): P-N scales may tempt respondents to rush through a set of items at a faster pace | (Christian et al. 2007b): the order of the verbal labels does not produce significant differences in responses [Response style through distribution comparison] → YES (Christian et al. 2009): no primacy effect found by varying the order of the verbal labels [Satisficing bias through distribution comparison] → YES (Krebs and Hoffmeyer-Zlotnik 2010): more positive answers (primacy effect) in the P-N format, non-significant evidence in the N-P format [Satisficing bias through distribution comparison] → YES (Saris and Gallhofer 2007): the order does not have a significant impact on measurement quality [True-score MTMM reliability and validity] → NO (Scherpenzeel and Saris 1997): order had little or no effect on validity and reliability [True-score MTMM reliability and validity] → NO |
Nonverbal labels | Numbers Letters Symbols None | (Christian et al. 2009): adding numbers provides an additional source of information for respondents to process before submitting an answer (Krosnick and Fabrigar 1997): numeric labels are more precise and easier to use but have no inherent meaning (Tourangeau et al. 2007): numbers help respondents to decide whether the scale is supposed to be unipolar or bipolar (Schwarz et al. 1991): use numeric labels to disambiguate the meaning of scale verbal labels; 0 to 10 numbers suggest the absence or presence of an attribute, while -5 to 5 suggest that the absence corresponds to 0 whereas the negative values refer to the presence of its opposite | (Christian et al. 2009): response style is unaffected when using scales with or without numbers [Satisficing bias through distribution comparison] → NO (Moors et al. 2014): scales with no numbers evoke more extreme responding than scales with numbers [Extreme response bias through latent class factor] → YES (Tourangeau et al. 2000): scales with no numbers are comparable to those with positive numbers [Response style through distribution comparison] → NO |
Order numerical labels | Negative-to-positive Positive-to-negative 0-to-positive 0-to-negative Positive-to-0 Negative-to-0 1 (or higher)-to-positive Positive-to-1 (or higher) | – | (Schwarz et al. 1991): differences are significant when a scale is presented with 0 to 10 values or with -5 to 5 [Response style through distribution comparison] → YES (Tourangeau et al. 2007): differences are significant when negative numerical labels are provided in comparison to when all are positive [Response style through distribution comparison] → YES (Reips 2002): different numerical labellings do not seem to influence the answering behaviours of participants [Response style through distribution comparison] → NO |
Correspondence between numerical and verbal labels | High Medium Low | (Amoo and Friedman 2001): a more negative connotation is attached to negative numbers than to positive numbers with the same verbal label (Krosnick 1999): use only verbal labels, or use numbers that reinforce the meanings of the words (Krosnick and Fabrigar 1997): numbers should be selected carefully to reinforce the meaning of the scale points (O’Muircheartaigh et al. 1995): numeric and verbal labels should provide a bipolar/unipolar framework to the respondent (Schaeffer and Presser 2003): when bipolar verbal labels are combined with bipolar numeric labels they reinforce each other and appear clearer to respondents; however, bipolar numeric labels move responses toward the positive end (Schwarz and Hippler 1995): a verbal scale with a negative numeric value suggests a more negative interpretation of the verbal scale anchor and results in more positive responses along the scale (Schwarz et al. 1991): match numeric values with the intended conceptualization of the uni- or bipolar dimension; numbers should not be selected arbitrarily because respondents use them to communicate intended meanings | (Christian et al. 2007b): low correspondence does not substantially impact the responses [Response style through distribution comparison] → NO (Rammstedt and Krebs 2007): lower reliabilities when the lower numbers correspond to higher positive labels [Test–retest reliability] → YES (Saris and Gallhofer 2007): low correspondence significantly lowers reliability [True-score MTMM reliability] → YES |
Scales’ symmetry | Symmetric Asymmetric | (Saris and Gallhofer 2014): an asymmetric scale presupposes knowledge about the opinion of the sample, otherwise it is biased | (Saris and Gallhofer 2007): symmetric scales have a positive effect on reliability and validity [True-score MTMM reliability and validity] → YES (Scherpenzeel and Saris 1997): reliability and validity are slightly higher for asymmetric scales [True-score MTMM reliability and validity] → NO |
Neutral alternative | Explicit Implicit Not provided | (Bishop 1987): midpoints attract respondents under uncertainty (Kulas and Stachowski 2009): midpoints are used when respondents are undecided, misunderstand the item, when their response is conditional, or when they have a neutral opinion (Saris and Gallhofer 2014): used so as not to force people to make a choice in a specific direction (Sturgis et al. 2014): people do appear to have positions which are neutral; omitting the midpoint will force these individuals to select an option which does not reflect their true opinion (Tourangeau et al. 2004): respondents can interpret the midpoint of a scale as the most typical value and use it as a reference point | (Alwin and Krosnick 1991): midpoints lower reliability, and are more valuable in 7p scales [Proportion of variance attributed to true attitudes] → YES (Andrews 1984): the midpoint had only a slight effect on data quality [MTMM validity, method effect and residual error] → NO (Malhotra et al. 2009): the midpoint reduces validity [Criterion validity] → YES (Saris and Gallhofer 2007): not providing a neutral category significantly improves both reliability and validity [True-score MTMM reliability and validity] → YES (Scherpenzeel and Saris 1997): an explicit midpoint has no effect on reliability but yields higher validity [True-score MTMM reliability and validity] → YES (Schuman and Presser 1981): offering the middle alternative increases the proportion of respondents in that category [Response style through distribution comparison] → YES (Weijters et al. 2010): the midpoint increases acquiescence and lowers extreme responses [Acquiescence and Extreme response bias] → YES |
“Don’t know” (DK) option | Explicit Implicit Not provided | (Alwin and Krosnick 1991): DK may be selected because respondents truly have no attitude, lack motivation, wish to avoid giving an answer, or are uncertain which exact point best represents their opinion (Dolnicar 2013): if some respondents cannot answer the question, offer an explicit DK (Gilljam and Granberg 1993): explicit DK increases the likelihood of false negatives (Krosnick et al. 2002): providing DK leads to less valid and informative data than omitting it (Krosnick et al. 2005): DK provision encourages respondents not to report undesirable or unflattering opinions (Kunz 2015): a DK option should be explicitly provided if there is good reason to believe that respondents truly have no opinion on the issue in question (Saris and Gallhofer 2014): explicit DK leads to incomplete data; better to use implicit DK | (Alwin 2007): providing an explicit DK option has a reliability comparable to not providing it [Wiley–Wiley reliability] → NO (Andrews 1984): explicit DK leads to higher data quality [MTMM validity, method effect and residual error] → YES (De Leeuw et al. 2016): explicit DK increases missing data and lowers reliability; implicit DK lowers missing data and increases reliability [Item-nonresponse and Coefficient alpha] → YES (McClendon 1991): explicit DK does not reduce acquiescence or recency responses [Acquiescence and Satisficing bias] → YES (McClendon and Alwin 1993): no support for offering DK to improve reliability [True-score reliability] → NO (Rodgers et al. 1992): lower validities when offering DK explicitly [MTMM construct validity] → YES (Saris and Gallhofer 2007): the provision of the DK option does not have a significant effect on measurement quality [True-score MTMM reliability and validity] → NO (Scherpenzeel and Saris 1997): DK explicit or implicit does not affect reliability or validity [True-score MTMM reliability and validity] → NO |
Characteristics of the response scales’ visual presentation
Types of visual response requirement | Point-selection Slider Text-box input Drop-down menu Drag-and-drop | (Buskirk et al. 2015): the box format does not give a clear sense of the range of the options (Christian et al. 2007a): numeric text-box input is better because drop-down menus are more cumbersome when a large number of possible options is listed (Christian et al. 2009): the box format is closer to how questions are asked on the telephone, where a visual display is not provided (Couper et al. 2004): drop boxes require added effort from respondents, who have to click and scroll simply to see the answer options (De Leeuw et al. 2008): drop-down menus are more burdensome for respondents (Dillman and Bowker 2001): respondents are more frustrated with drop-down menus as they require a two-step process (Funke et al. 2011): sliders are more demanding, require more hand–eye coordination than point-selection, and make it difficult to identify non-substantive responses (Kunz 2015): drag-and-drop may prevent systematic response tendencies since respondents need to spend more time (Reips 2002): the hand movement required is longer than for other types of scales (Roster et al. 2015): sliders are more fun and engaging and produce better data than point-selection scales | (Buskirk et al. 2015): differences in selecting the lowest, middle or highest options and in missing data between sliders, radio-button scales and the box format [Satisficing bias and Item-nonresponse] → YES (Christian et al. 2007b): responses are comparable between point-selection and number-box scales [Response style through distribution comparison] → NO (Christian et al. 2009): box entry has a significant impact on responses compared to point-selection [Response style bias through distribution comparison] → YES (Cook et al. 2001): sliders show no difference compared to rating scales on reliability [Score reliability] → NO (Couper et al. 2004): nonresponse was comparable between drop-down menu and point-selection [Item-nonresponse] → NO (Couper et al. 2006): more missing data in the slider than in the radio-button or text-input scale [Item-nonresponse] → YES (Kunz 2015): drag-and-drop scales suffered from higher item-nonresponse compared to radio-button scales [Item-nonresponse] → YES (Liu and Conrad 2016): item-nonresponse does not differ significantly between drop-down and text-box input [Item-nonresponse] → NO (Reips 2002): drop-down menus do not influence the answering behaviours compared to radio-button scales [Response style through distribution comparison] → NO (Roster et al. 2015): response rates between sliders and radio-button scales are non-significantly different [Item-nonresponse] → NO |
Sliders’ marker position | Left/Bottom Right/Top Middle Outside | (Funke 2016): a drawback of sliders is that item-nonresponse is difficult to identify | (Buskirk et al. 2015): more nonresponse and more selection of middle and higher response options for middle and right marker positions compared to the left marker position [Satisficing bias and item-nonresponse] → YES |
Scales’ illustrative format | Ladder Thermometer Other None | (Alwin 2007): offering a thermometer scale usually requires lengthy introductions (Krosnick and Presser 2010): thermometers and ladders may not be good measuring devices because all points cannot be labelled (Sudman and Bradburn 1983): use thermometers, ladders, telephone dials and clocks for numerical scales with many points | (Andrews and Crandall 1975): ladder scales obtained lower validity than other types of scales [Construct validity] → YES (Krosnick 1991): reliability is higher for a rating scale than for the feeling thermometer [Pearson product-moment test–retest correlations] → YES (Levin and Currie 2014): the ladder scale provided better reliability and validity scores than other scales [Pearson correlations and convergent validity] → YES (Schwarz et al. 1998): responses are significantly different whether a pyramid or an onion format is used [Response style through distribution comparison] → YES |
Scales’ layout display | Horizontal Vertical Nonlinear | (Toepoel et al. 2009): respondents are more willing to read options in the horizontal format because they read first horizontally and then vertically (Tourangeau et al. 2004): vertical scales imply more positive options at the top | (Christian et al. 2009): responses to a nonlinear layout compared to a vertical one were significantly different [Response style through distribution comparison] → YES (Toepoel et al. 2009): presenting the options in a horizontal or vertical layout results in different response distributions [Response style through distribution comparison] → YES |
Overlap between verbal and numerical labels | Overlap present Text clearly connected to categories | NS | NS |
Labels’ visual separation | Non-substantive options Neutral options End-points All options None | (Christian et al. 2009): visual separation of labels may encourage respondents to select them and may take longer for respondents to process than when all labels are evenly spaced (Tourangeau et al. 2004): separation draws attention to the separated option | (De Leeuw et al. 2016): clearly separating the DK option from the substantive responses reduces missing data and produces higher reliability [Item-nonresponse and Coefficient alpha] → YES (Christian et al. 2009): separation of the non-substantive option leads to significantly different responses; separation of the midpoint does not lead to significant differences [Response style through distribution comparison] → YES (Tourangeau et al. 2004): separation of non-substantive options affected the distribution of answers [Response style through distribution comparison] → YES |
Labels’ illustrative images | Feeling faces Other human symbols Non-human symbols None | (Emde and Fuchs 2013): faces scales are easy to format, attract attention and increase respondents’ enjoyment (Kunin 1998): faces scales have the advantage of eliminating the need to translate feelings into words; faces are easier for respondents to identify than words | (Andrews and Crandall 1975): comparable validity between faces scales and rating scales [Construct validity] → NO (Derham 2011): the emoticon scale presented significantly more non-answers than slider or point-selection scales [Item-nonresponse] → YES (Emde and Fuchs 2013): non-significant differences in the responses between the smiley scales and the radio-button design [Response style through distribution comparison] → NO |