In national and international surveys, life satisfaction is often measured by a single item. However, there is considerable debate in survey research about whether rating scales should be ordered in an ascending order (from negative to positive) or a descending order (from positive to negative). We investigate the effect of scale orientation by randomly assigning both versions in an online survey (N = 3,138). The average reported life satisfaction is 0.7 points lower when the descending version of an 11-point scale is used, as compared to the ascending scale (p < 0.001). We further test the construct validity by correlating each version of the response scale with other measures related to life satisfaction (e.g. happiness, depressive mood, and physical health). Generally speaking, the correlations of the ascending scale are significantly stronger than those of the descending scale, indicating higher validity. Moreover, we investigate the impact of horizontal versus vertical presentations of the 11-point life satisfaction answer scale. Our results indicate that there are no statistically significant differences between horizontally and vertically presented response scales. We conclude that the order of response scales should be chosen carefully, as it affects the measurement of life satisfaction. Overall, our results suggest using an ascending life satisfaction scale.
1 Introduction
In many countries, the political agenda includes improving the well-being of the country’s inhabitants. Knowledge about well-being and how it can be influenced helps policy makers to develop and evaluate various measures to improve the quality of life (Diener et al., 2009). This is also reflected in the fact that the Organisation for Economic Co-operation and Development (OECD) regularly publishes the “How’s life?” reports, which are part of the “Better Life Initiative”. According to the OECD, well-being is a multifaceted concept (OECD, 2020). Researchers capture it using a variety of objective and subjective measures (e.g. Forgeard et al., 2011). Accordingly, the “How’s life?” report documents the development of different indicators of well-being. One important component of individuals’ subjective well-being is life satisfaction (e.g. Diener et al., 1999), and there are two different ways of measuring it. First, many researchers use multi-item scales. One prominent example is the “Satisfaction with Life Scale” (SWLS), which consists of five items and uses 7-point answer scales ranging from 1 (strongly disagree) to 7 (strongly agree) (Diener et al., 1985; Pavot et al., 1991; Pavot & Diener, 1993). Second, much research relies on a single item, such as “All things considered, how satisfied are you with your life as a whole?”. The item is frequently followed by an 11-point answer scale with endpoint labeling. The single-item approach is often preferred in comprehensive multi-topic surveys and regular household panels due to time and space restrictions, and to reduce the burden on respondents (e.g. Allen et al., 2022). Additionally, research shows that the single-item life satisfaction measurement performs acceptably when compared to the multiple-item life satisfaction measurement (e.g. Cheung & Lucas, 2014; Cummins, 1995; Diener et al., 2013; Jovanović, 2016; Kahneman & Krueger, 2006).
However, even when a global single-item measurement for life satisfaction is used, various differences exist in terms of question wording and with regard to the response scales (cf. Cummins, 1995). Response scales differ concerning the number and direction of the response categories. The latter can be ordered in ascending (from negative to positive) or descending order (from positive to negative). For instance, the OECD (“How’s life?”), the European Social Survey (ESS) and the Swiss Household Panel (SHP) use 11-point ascending response scales, while the Panel Study of Income Dynamics (PSID), the Eurobarometer or the International Social Survey Programme (ISSP) apply shorter, descending answer scales. If scale order effects are an issue, then this may impede cross-national or longitudinal comparisons. In this study, we investigate whether scale direction affects the single-item measurement of life satisfaction.
Previous research on scale order effects remains inconclusive. While some studies find no differences between the scale versions, many others show scale orientation effects. Much of the existing research tests fully verbalized or short response scales (e.g. 5-point scales), and fewer studies relate to longer scales with labeled endpoints (such as the 11-point answer scale). Moreover, only a few studies have analyzed scale order effects in satisfaction questions, and the authors are not aware of any study investigating order effects relating to the question of life satisfaction. Therefore, we conducted an online experiment on the effect of scale orientation to measure life satisfaction with a single item and an 11-point response scale.
In addition to the question of whether the negative or positive pole should be mentioned first, it is important to know if the horizontal or vertical presentation of the scale also influences the measurement. This is crucial for two reasons. First, some surveys present their answer scales vertically. For instance, the World Happiness Report (Gallup World Poll) uses the Cantril Ladder question, in which the respondent is asked to imagine a vertical ladder. Second, many surveys are self-administered online surveys, providing respondents the choice to complete the surveys on their smartphones (Bosnjak et al., 2018; Gummer et al., 2023; Peterson et al., 2017). Since it is typically difficult to display longer horizontal scales on mobile devices, the scales need to be optimized for smartphones. This can be accomplished by drop-down menus or by simply presenting the scale vertically. This raises the question whether horizontal, as compared to vertical, presentations affect survey results. Thus, the aim of this study is twofold: First, we investigate if the scale orientation influences the measurement of life satisfaction. Second, we evaluate the impact of the horizontal versus vertical layout.
The remainder of the article is organized into four sections. The next section discusses previous research findings concerning the impact of ascending and descending answer scales. In section three, we describe the data and methods used in our study. Section four presents the results, and section five presents a summary and a discussion of the results.
2 Literature Review
There is an ongoing debate about how best to design questions and answer scales in surveys. There are three recent reviews discussing the effect of answer scale orientation, which is one important aspect in survey question design (Chyung et al., 2018; DeCastellarnau, 2018; Schaeffer & Dykema, 2020). All three reviews conclude that the evidence is mixed. Furthermore, DeCastellarnau (2018) highlights that correspondence between numerical values and verbal labels (e.g. 0 = “not at all”, 10 = “completely”) increases the reliability of the results. Chyung et al. (2018) recommend the use of ascending scales that start with the negative value, because descending scales would inflate the selection of positive values and skew the means upward.
2.1 Studies with no Response Scale Orientation Effect
There are only a few studies that find no response scale order effects. Weng and Cheng (2000) investigate answer scale orientation effects using structural equation modeling in a sample of about 850 students in Taiwan. They varied the order of a fully verbalized 5-point answer scale ranging from “describes me very well” (4) to “does not describe me well” (0) in order to measure personal distress. The authors use a test-retest design and randomly assigned students to four groups: two groups received the same scale version twice (ascending or descending), while the other two groups received the reversed version at the second measurement. Numbers corresponded to the labels so that the order of the numbers changed together with the labels. The results showed no scale order effects in the between-subjects design, and inconsistent results for the within-subject comparison. Accordingly, the authors conclude that the response scale order does not affect participants’ responses.
Research by Christian et al. (2007) also finds no empirical evidence for response scale effects in telephone and web surveys based on responses from about 2,000 university students. First, they conducted an experiment in which they tested whether assigning the highest (5) or lowest (1) category to the most positive answer affected response behavior. They find that the most positive answer was selected more often when associated with the highest number. However, no statistically significant mean and distribution differences were present in the web survey. Second, the authors tested response order effects using two items with a five-point answer scale, in which they always assigned the highest number to the most positive answer category. The authors observe no statistically significant differences between the ascending and descending scale version.
Keusch and Yan (2019) study answer scale orientation effects in behavioral frequency questions using a fully verbalized 5-point response scale (e.g. using the end poles “never” and “all of the time”). Based on the results of 1,700 participants in a representative U.S. online panel, the authors do not find an impact of whether the scale started or ended with the most positive option.
2.2 Studies with Mixed Response Scale Orientation Effects
More studies report mixed empirical evidence. For instance, a study by Krebs and Hoffmeyer-Zlotnik (2010) investigates scale order effects using a self-administered paper survey fielded to about 430 university students in Germany. The authors asked the students about the importance of multiple future job characteristics, with the aim of measuring two dimensions of job motivation (intrinsic and extrinsic). The 8-point response scales used endpoint labeling ranging from “not at all important” to “very important”. They varied whether the scale started or ended with the positive label. However, the numerical values attached to the response labels always ranged from 1 to 8. The results generally pointed toward a primacy effect – that is, the more frequent selection of the option presented first – especially when using the descending scale. Furthermore, the authors show that the dimensional structure of the construct was not affected by the answer scale orientation; however, there was no measurement equality concerning the factor loadings. Interestingly, the authors find statistically significant mean and variance differences only for the items measuring the intrinsic motivation dimension, which is more closely related to individuals’ personality than the extrinsic motivation dimension. Therefore, the authors conclude that possible scale order effects might be dependent on the content, and might be especially pronounced when respondents do not already have an established opinion.
Höhne and Krebs (2018) test for answer scale orientation effects in two different types of question formats, namely agree-disagree and item-specific questions, using data from 930 students. They varied the order of 5-point, fully verbalized response scales with no numeric values and they find that item-specific scales are more robust against order effects. Additionally, the authors point out that response scale order effects depend on the content of the question.
A recent study by Liu and Keusch (2017) investigates the effects of the scale direction of rating scales in face-to-face and web surveys using the American National Election Studies. They varied the order of a vertically displayed, fully verbalized 5-point scale from “agree strongly” to “disagree strongly” without numeric values for 11 items. The authors find that “agree strongly” was selected more often (acquiescence bias) when a descending scale was used in the self-administered web survey. However, they do not find an impact of the scale direction on the substantive latent class variables.
Christian et al. (2009) present the results of different scale experiments that were embedded in two online surveys fielded to a random sample of undergraduate students at Washington State University in the U.S. Besides visual spacing of midpoints or “don’t know” options, they also investigate scale order effects on different answer scales, including a 5-point satisfaction scale from “very satisfied” to “very dissatisfied”. The scale order only produced a mean difference in one out of seven satisfaction items (or one out of a total of 10 items, respectively). Based on a chi-square test, only three out of all 10 items showed significant differences in the frequency distribution, and, again, this was true only for one out of seven satisfaction items. Some of the scales were presented horizontally, and some vertically. The few items that were affected by response scale order were all presented vertically. Although the authors conclude that the response scale order did not influence the measurement systematically, they show that respondents’ burden increased with descending scales (starting with the positive end), as these led to significantly longer response times.
Hofmans et al. (2007) analyze scale order effects using a within-subjects design in a sample of 156 Dutch-speaking, highly educated Belgians who were recruited via a quota online panel. The questionnaire was about team roles and was divided into eight different subsets containing several different items. For the within-subjects design, eight items from the survey were asked twice: once on an ascending and once on a descending fully verbalized 5-point Likert scale ranging from “strongly agree” to “strongly disagree”. The authors made sure that three complete subsets were answered before an item was measured repeatedly. Considering the averages of the ratings, the authors do not find a response scale order effect. However, they find – consistent with some studies that will be presented next – that the scale orientation slightly affected the frequency distribution. Taking a closer look, the results show that the most positive option was selected more frequently when the scale was ordered in a descending order; however, the second most positive answer was selected less often. Interestingly, statistically significant differences were induced by scale orientation only for the two most positive categories, but not for the other three response options.
Finally, research by Yan et al. (2018) studies the impact of question characteristics on scale direction effects. For this, they use data from the American National Election Studies, in which half of the respondents were presented with either descending or ascending response scales throughout the whole questionnaire. The scale order experiment was applied to answer scales of different lengths. The authors find that positive answers were selected significantly more often when the response scale was ordered in a descending order (high to low) than when it was ordered in an ascending order. However, this occurred only in about a quarter of all questions. Furthermore, the results suggest that scale direction effects on survey responses were stronger for longer answer scales. In line with that result, Höhne et al. (2023) investigate scale order effects in 5- and 7-point rating scales about job motivation and achievement that were end-labeled with no numeric values and displayed vertically. They used more than 4,600 cases from the German Internet Panel. Based on analyses on the observational and latent level they find scale orientation effects only in the longer scale version, but not for all items considered.
2.3 Studies with Clear Response Scale Orientation Effects
Turning to experimental studies that do find a somewhat clearer impact of answer scale orientation on response behavior, we start with studies using between five and seven response categories and no numeric values. A study by Garbarski et al. (2019) includes a scale experiment for self-rated health measured with five verbal response categories ranging from “poor” to “excellent” (with no numeric values). Based on almost 3,000 workers on Amazon Mechanical Turk, they find an order effect in which the descending scale starting with the positive end produced higher averages and more mentions of “excellent” and “very good”. Additionally, they show that the ascending scale had higher validity with respect to theoretically related concepts, e.g., respondents’ education. Based on this result and their previous research (Garbarski et al., 2015), they suggest that starting the scale with the negative pole reduces clustering at the positive end of the scale.
A study by Smyth et al. (2019) investigates scale order effects in combination with question stem order (i.e., which of the two end poles is mentioned first in the question). They analyze eight satisfaction questions (mainly about customer service) measured on 5-point, fully verbalized scales with no numerical values ranging from “very dissatisfied” to “very satisfied”. They find that the most positive option is consistently and significantly more often selected when a descending response scale is used, although the effects are very small. However, the results for question stem order are mixed. They find only small effects for half of the items. Furthermore, they find no interaction between question stem order and response scale order.
A study by Höhne and Yan (2020) uses about 4,700 cases from the German Internet Panel to test response option order effects of vertically presented answer options. However, in this study they use a different approach, comparing a fully verbalized 5-point descending response scale with a scale that presents answer options in an inconsistent order (“it depends”, “agree strongly”, “disagree strongly”, “agree”, “disagree”). The reason for this is to test Tourangeau et al.’s (2004) “left and top means first” heuristic, thus expecting categories to be selected more often when they are presented earlier. They find that “strongly disagree” was selected more often when presented as the third category in the inconsistent order. Overall, response distributions and answer times differed significantly between the two treatments. In addition, they show that the consistent response scale correlated significantly more strongly with another related construct (higher criterion validity) than did the inconsistent-order scale. It is not very surprising that a logically inconsistent answer scale leads to different results, but the study successfully demonstrates that it is useful to include various outcomes, such as response distribution and validity, to test the impact of answer scale orientation.
Some other studies analyze scale order effects in 5- and 7-point answer scales that include numerical values that correspond to the response options. A study by Nicholls et al. (2006) investigates answer scale order effects using a fully verbalized 5-point agree/disagree response scale for 22 items measuring student satisfaction, in which the highest number (5) always indicates strong agreement. The results are based on 292 undergraduate students in Australia and show that the descending scale led to significantly higher satisfaction. Further, it was noticeable that the scale order effect was mainly driven by categories other than the mode (mostly agree). Keusch and Yan (2018) conducted a scale order experiment in a web survey in Austria. About 500 respondents rated characteristics of different brand logos and advertisements. Ratings were measured on a 6-point scale, with labeled endpoints ranging from “totally applies” to “does not apply at all”. The items and answer scales were presented horizontally in a grid. Additionally, the authors randomized the usage of numerical values in addition to the verbal labels. If numbers were used, they always ran from one to six, and only the order of the verbal labels changed. The results remained the same regardless of whether numbers were used or not. Overall, the study showed a strong scale direction effect: more values from the negative spectrum (“not applying”) were selected when an ascending scale starting with the most negative value was presented. Furthermore, the results showed a stronger scale order effect for individuals who answered the survey quickly (speeders), but the effect was also present for non-speeders. Accordingly, the authors argue that motivation might be associated with scale direction effects.
Yan et al. (2018) compare 3- and 4-point scales with 5- and 6-point scales, and Höhne et al. (2023) compare 5- and 7-point scales. Both studies conclude that answer scale orientation effects are more pronounced in longer scales. Therefore, it might be possible that the scale direction effect is stronger in 11-point answer scales, which are frequently used for the single-item life satisfaction measurement. However, only a few studies are based on such longer scales with numerical values and labeled endpoints, and they also report mixed results. For instance, Rammstedt and Krebs (2007) investigate the impact of scale orientation, considering both the order of the labels as well as the numerical values attached to the scale in a panel design. They measure a short version of the Big Five Inventory (10 items) using an 8-point response scale with labeled endpoints ranging from “strongly agree” to “strongly disagree”. Data was collected through a paper and pencil survey at a German university (N = 315) at two points in time. At time one, they included two scale versions (ascending and descending), while also changing the numerical order of the scale so that the highest value (8) was always attached to “strongly agree”. At time two, they kept the numerical order from 1 to 8, but included an ascending and descending order of the verbal labels. This led to three different scale versions. They did not find an impact of scale orientation when both verbal and numerical labels were reversed, but the results showed a scale order effect when the descending labels were counterintuitively attached to ascendingly sorted numerical values ranging from 1 to 8.
In line with these results is research from Hartley and Betts (2010). The authors conceptualized a response scale experiment that varied the order of the verbal labels, as well as the order of the attached numerical values, thus creating four experimental conditions. Participants were asked to evaluate a scientific abstract on an 11-point Likert scale with the endpoints “clear” and “unclear”, in a web-based survey. Based on the rating of 465 academic writers, they find that the scale starting with 10 and the positive pole “clear” produced significantly higher ratings, while there were no differences between the other three versions. The authors extended their research using 130 British children; however, this time they applied a scale with six categories (1 to 6) and the poles “very much” and “not at all”. They find a bias to the left when (a) the positive end of the scale (very much) was presented first, and (b) when the higher number (6) was displayed on the left side of the response scale (Betts & Hartley, 2012). These results are in line with their previous study based on adults, but less differentiated, as children’s ratings were always higher no matter which numerical order was presented. Taken together, the authors present empirical evidence for response order effects in long rating scales when verbal and numerical labels decrease correspondingly.
Finally, a study by Yan and Keusch (2015) approached answer scale orientation effects by experimentally varying whether an end-labeled 11-point scale was presented ranging from 0 to 10 or from 10 to 0. However, in both orientations 0 was the most negative option and 10 the most positive, and the only difference was which pole was mentioned first. Respondents were asked to rate the development of different countries on this 11-point scale. Using almost 500 computer assisted telephone interviews (CATI) in the United States, the authors find a scale order effect, but only for those countries that already had the highest ratings.
2.4 Summary
In summary, our literature review suggests mixed evidence with regard to scale orientation effects. Most studies that use fully verbalized 5-point answer categories report no effect. However, in longer scales with only endpoint labeling, scale orientation seems to matter more often. The differences in scale orientation sometimes relate to the mean, sometimes to extreme responses, and at other times to the whole answer distribution. Hence, scale orientation probably matters in longer answer scales with endpoint labeling. The literature suggests that descending scales (from positive to negative) produce more frequent selection of positive answers, and hence higher means. Various theoretical arguments are presented to underpin these empirical findings. First, there might be satisficing at work, suggesting that respondents select the first reasonable answer category in order to reduce cognitive effort (Krosnick, 1991). Accordingly, satisficing and speeding could be reasons why more positive answers are selected when these categories are presented first. Second, a primacy effect or a general bias to the left side could explain why scales starting with the positive pole often lead to higher means and different response distributions. Third, there is evidence that respondents interpret the first category presented as an anchoring and adjustment point (Yan & Keusch, 2015). In addition to that, acquiescent response styles – respondents’ tendency to agree with items independent of the content (Baumgartner & Steenkamp, 2001) – have been found to produce more positive answers in agree/disagree questions (Liu & Keusch, 2017). Fourth, it is often argued that people expect the positive option to be mentioned first (Tourangeau et al., 2004), so it might confuse respondents when a scale starts with the most negative option instead.
However, many international surveys and ongoing panel studies use ascending scales, so it could also be that individuals are used to these kinds of scales and are not confused by them. In any case, it has been shown that it helps respondents when verbal and numerical labels are presented in a corresponding manner (DeCastellarnau, 2018; Hartley & Betts, 2010). Respondents might be irritated when the verbal labels of a scale suggest an ascending direction but the numbers do not (cf. Rammstedt & Krebs, 2007). The question remains as to whether it is better if the scale starts with the positive end. Generally, the evidence suggests that a descending order inflates positive answers and should therefore be avoided (Chyung et al., 2018).
In terms of vertical or horizontal display of the response scales, evidence on the impact of scale order is again mixed. Christian et al. (2009) find scale order effects for the items that were presented vertically, but not horizontally. Höhne et al. (2023) observed scale order effects only in longer vertical scales. However, Liu and Keusch (2017) showed with vertical scales that the response distribution is affected but not the latent mean. In contrast, the research from Garbarski et al. (2019) finds neither a main effect of scale display nor an interaction of it with response scale order.
Our review also reveals that much research is concerned with rather short scales or fully verbalized response scales, or the use of the less specific agree/disagree scale. There is considerably less research using long, endpoint-labeled, or item-specific answer scales. Furthermore, we observe great topical variety. We did not find recent experiments investigating scale order effects in life satisfaction questions. The two most relevant studies use a 5-point satisfaction scale for various items, but only one study finds clear response order effects (Smyth et al., 2019), while the other finds them only for some items (Christian et al., 2009). Another study’s topic is student satisfaction; however, the answer scale was labeled with agree/disagree. The results indicate a clear scale order effect (Nicholls et al., 2006). As empirical evidence on scale order effects of life satisfaction is missing, we aim to provide an empirical contribution to both life satisfaction and survey research. Based on this literature review, and because we use a long response scale in which we vary verbal labels but keep numeric values ascending, we hypothesize that there is a response order effect for the measurement of life satisfaction such that the descending answer scale leads to higher means of reported life satisfaction as compared to an ascending scale. In addition to investigating the difference in the mean, we also look at differences in the distribution, and test which version of the scale has better construct validity.
3 Data and Methods
The data used stems from a university-wide student survey conducted at the University of Bern in the summer of 2022. All bachelor’s and master’s degree students were contacted via the Student Admission Office. The online questionnaire covered various topics, such as reasons for the choice of study subject, motivation, workload, employment, and well-being. Participation was incentivized through a lottery with a single prize: payment of the tuition fee for one semester (about $900). The response rate was about 25%. Overall, 3,138 students answered the life satisfaction question (after excluding 17 individuals who were older than 50 years).
Students were asked about their satisfaction with life using the global single-item measurement. The question reads as follows: “All things considered, how satisfied or dissatisfied are you with your life in general?” Students could indicate their level of satisfaction on an 11-point response scale. In order to test the impact of the scale orientation, we randomly varied the labels of the endpoints of the answer scale. 75% of respondents received an ascending version starting with 0 “not at all satisfied” and ending with 10 “completely satisfied”. This version was also used in a previous university-wide student survey in 2012. The remaining 25% of respondents received a descending scale version, starting with 0 “completely satisfied” and ending with 10 “not at all satisfied”. In both cases, the numbers ranged from 0 on the left to 10 on the right, and only the labels were changed. We chose an unequal randomization (75% versus 25%) because the survey is part of a trend study. Since we expected a scale orientation effect, we wanted to make sure that enough respondents received the same question version as in previous years. The online questionnaire was programmed using the software Tivian/Unipark. In addition to life satisfaction, the questionnaire also contained an 11-point question concerning students’ satisfaction with their studies. This is also useful from a theoretical point of view, as study satisfaction is one element that adds to students’ overall life satisfaction (e.g. Benjamin & Hollings, 1995; Lounsbury et al., 2005). The study satisfaction item read as follows: “How satisfied are you with the course of your studies?”, and was varied in the same fashion as the general life satisfaction question. We made sure that respondents received the same answer scale (ascending or descending) for both satisfaction questions.
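The unequal random assignment described above can be sketched in a few lines. This is a purely illustrative Python reconstruction (the survey itself was programmed in Tivian/Unipark); the function name and simulation are our own:

```python
import random

def assign_scale_version(p_ascending=0.75, rng=random):
    """Unequal randomization: 75% receive the ascending scale (as in
    earlier survey waves), 25% the descending scale. Illustrative only."""
    return "ascending" if rng.random() < p_ascending else "descending"

# Simulate the assignment for a large number of respondents to check
# that the realized split is close to the intended 75/25 ratio.
random.seed(42)  # reproducible illustration
draws = [assign_scale_version() for _ in range(10_000)]
share = draws.count("ascending") / len(draws)
print(f"share ascending: {share:.3f}")  # close to 0.75
```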
In addition to presenting the satisfaction questions in ascending or descending order, the response scale varied according to the type of device used to answer the survey. Students who completed the questionnaire on a computer saw the regular horizontal version of the response scale. Participants using mobile devices received a smartphone-optimized vertical display of the scale. Therefore, we can also evaluate differences in horizontal versus vertical scale presentation. However, we did not randomize this aspect as respondents decided which device type they wanted to use.
For the statistical analysis, we recoded the descending version of the response scale so that it also ranged from 0 “not at all satisfied” to 10 “completely satisfied”, which facilitates the interpretation of the results. In the next section, we present the distribution of life satisfaction, calculate a t-test to compare the group means, and use a Levene test to investigate differences in the variances.
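The recoding and the two tests described above can be sketched as follows. This is a minimal illustration on simulated responses, not the survey data; the paper's analyses were run in Stata, and the variable names here are assumptions.

```python
# Illustrative sketch of the recoding plus the mean and variance tests,
# on simulated data (not the actual survey responses).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 11-point responses (0-10). On the raw descending scale,
# 0 means "completely satisfied".
asc = rng.integers(0, 11, size=2337)
desc_raw = rng.integers(0, 11, size=801)

# Recode the descending version so both scales run from
# 0 "not at all satisfied" to 10 "completely satisfied".
desc = 10 - desc_raw

# Two-sided t-test with unequal variances (Welch's t-test).
t_stat, t_p = stats.ttest_ind(asc, desc, equal_var=False)

# Robust Levene test for equality of variances
# (Brown-Forsythe variant, centered at the median).
w_stat, w_p = stats.levene(asc, desc, center="median")

print(t_stat, t_p, w_stat, w_p)
```

Running the Levene test first is what justifies the unequal-variances t-test: if the variances differed, a pooled-variance test would be misspecified.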
Thereafter, we assess the construct validity of both scale versions by correlating them with traits that are related to life satisfaction (Cheung & Lucas, 2014; Jovanović, 2016; Jovanović & Lazić, 2020). First, the construct validity of life satisfaction is evaluated by using a measure of happiness. The correlation should be high, since life satisfaction and happiness are very similar constructs (e.g. Cheung & Lucas, 2014; Cummins, 1998; Proctor et al., 2009). We asked respondents how often they felt happy in the last four weeks on a 5-point scale ranging from 1 “never” to 5 “always”. Second, life satisfaction correlates positively with physical exercise (Grant et al., 2009; Proctor et al., 2009). Therefore, we correlate life satisfaction with the reported average number of hours of sport and active travel (walking and cycling) during a week.
Additionally, we included a variety of indicators that should be negatively correlated with life satisfaction, such as negative affect (e.g. Busseri, 2018; Diener et al., 1985; Joshanloo, 2023; Suh et al., 1998) and the amount of stress in life or school (Lombardo et al., 2018; Milas et al., 2021; Moksnes et al., 2014). We incorporated one facet of negative affect (cf. Krohne et al., 1996; Watson et al., 1988) by asking respondents how often they felt nervous during the last four weeks. The response scale ranged from 1 “never” to 5 “always”. In order to capture students’ stress level, we used an item relating to the frequency of feeling overwhelmed with studying, ranging from 1 “never” to 5 “very often”. Furthermore, previous life satisfaction research showed a negative correlation with depression and depressive moods (Koivumaa-Honkanen et al., 2004; Moksnes et al., 2014; Proctor et al., 2009) as well as with negative emotions like worries or sadness (Cheung & Lucas, 2014; Kuppens et al., 2008). We measured depressive moods using two items: the frequency of feeling depressed and of feeling discouraged during the last four weeks. Both items are measured on 5-point response scales ranging from 1 “never” to 5 “very often”. Since both items correlate strongly (r = 0.70, p < 0.0001), we constructed an additive index for depressive mood, ranging from 2 to 10, with higher values indicating more depressive feelings. A similar measure is also contained in the PHQ-8 screener, which is widely used to measure depression (Kroenke et al., 2009).
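The two-item depressive-mood index can be sketched as below. The data and item names are simulated placeholders; the step of checking the inter-item correlation before summing mirrors the procedure described above.

```python
# Sketch of the additive depressive-mood index: check that the two items
# correlate strongly, then sum them. Simulated data, illustrative names.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two 5-point items, 1 "never" .. 5 "very often".
felt_depressed = rng.integers(1, 6, size=500)
# Make the second item co-vary with the first, as observed in the survey.
felt_discouraged = np.clip(felt_depressed + rng.integers(-1, 2, size=500), 1, 5)

# Inter-item correlation; a high r justifies combining the items.
r, p = stats.pearsonr(felt_depressed, felt_discouraged)

# Additive index, ranging from 2 to 10; higher = more depressive feelings.
depressive_mood = felt_depressed + felt_discouraged
```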
In addition, life satisfaction is negatively associated with bad physical health (e.g. Lombardo et al., 2018; Proctor et al., 2009; Rogowska et al., 2021). Accordingly, we asked respondents how often they experienced physical health issues during the last semester. Again, the answer scale ranged from 1 “never” to 5 “very often”. We included five health issues: insomnia, tinnitus, visual disturbance, dizziness, and headaches. We built an additive index out of these five indicators (Cronbach’s alpha = 0.67). After recoding, the index ranged from 5 to 25, with higher scores indicating better physical health. Cases with missing values on any of the correlates of life satisfaction were excluded (case-wise deletion). The data were analyzed using Stata 18 (StataCorp, 2023).
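A minimal sketch of the five-item health index, including the reliability check via Cronbach's alpha, might look as follows. The data are simulated and the helper function is our own illustration, not the paper's code.

```python
# Cronbach's alpha and the recoded additive health index, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
# Five health-issue items (insomnia, tinnitus, etc.), 1 "never" .. 5 "very often";
# rows are respondents.
items = rng.integers(1, 6, size=(500, 5)).astype(float)

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

alpha = cronbach_alpha(items)

# Recode each item (6 - x) so higher values mean better health, then sum,
# yielding an index from 5 to 25.
health_index = (6 - items).sum(axis=1)
```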
4 Results
In total, 3,138 students answered the life satisfaction question. The experimental randomization (3:1) worked well, as 74.5% of respondents received the answer scale ordered from negative to positive, while 25.5% received the positive to negative response scale. Furthermore, the results show that the majority of students participated via computer. However, more than a quarter of respondents (26.9%) chose to answer the survey on their smartphones. The sample consists of 65.5% females, 33.6% males, and 1% of students identifying with neither female nor male gender. The mean age is 24.4 years.
4.1 Answer Scale Orientation Experiment: Ascending Versus Descending Order
Figure 1 displays the distribution of the life satisfaction variable. The left bars represent the ascending scale (negative to positive), and the right bars show a recoded version of the descending scale (positive to negative). Hence, in Figure 1, higher numbers indicate higher satisfaction. The descending answer scale version produces a larger share of responses in the range of dissatisfaction and a smaller share in the range of satisfaction than the ascending version. This is confirmed by a chi2-test, which shows that the distributions differ statistically significantly (chi2 = 140.35, df = 10, p < 0.001). The average life satisfaction is 7.25 on the ascending scale. The scale starting with the positive pole shows a mean of 6.56, which is almost 0.7 points smaller. Furthermore, the ascending scale has a smaller standard deviation (1.78) than the descending response scale (2.42). A robust Levene test for the equality of variances shows that the variances differ statistically significantly (W = 155.14, df = (1, 3136), p < 0.001). Thus, a two-sided mean comparison t-test with unequal variances is calculated. The results reveal that the means of the two life satisfaction response scales differ statistically significantly (df = 1108, t = 7.42, p < 0.001). Cohen’s d is 0.35, indicating a small difference between the means (Cohen, 1988).
Fig. 1
Distribution of life satisfaction using either an ascending or descending 11-point response scale
After showing that the two response scales produce different distributions, the question arises whether this has consequences for the construct validity. Table 1 displays the correlations of both life satisfaction measurements with various related constructs. The variables happiness, nervousness, being overwhelmed, depressive mood, and physical health all correlate with life satisfaction as expected. However, it is noticeable that the life satisfaction measure using the ascending scale consistently shows stronger associations with the other traits than the measure based on the descending scale. In these cases, the correlations differ statistically significantly according to a Wald test (Jennrich, 1970). Surprisingly, the behavioral variable “physical exercise” correlates only weakly with life satisfaction; accordingly, there is no statistically significant difference between the two response scale versions. Contrary to previous research, however, the descending scale does not lead to more responses at the positive end of the scale but to fewer. This contradicts the bias to the left and the satisficing hypothesis.
Table 1
Correlations of life satisfaction using either an ascending or descending 11-point response scale, with other related aspects
| Construct | Expected correlation | Life satisfaction, ascending (- to +) | Life satisfaction, descending (+ to -) | Significant difference? |
|---|---|---|---|---|
| Happiness | + | 0.63*** | 0.37*** | YES*** |
| Hours exercising | + | 0.07* | 0.02 | NO |
| Nervousness | - | -0.37*** | -0.22*** | YES*** |
| Overwhelmed | - | -0.30*** | -0.20*** | YES** |
| Depressive mood | - | -0.59*** | -0.36*** | YES*** |
| Physical health | + | 0.57*** | 0.23*** | YES*** |

Statistical significance level: *** p < 0.001, ** p < 0.01, * p < 0.05. The last column indicates whether the correlations differ according to a Wald test (Jennrich, 1970). Sample size, ascending scale version: 2217–2337. Sample size, descending scale version: 752–801
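The paper tests correlation differences with a Wald test (Jennrich, 1970). As a simpler, widely used alternative for two independent samples, the sketch below applies Fisher's r-to-z transformation; the correlations and sample sizes plugged in are taken from Table 1 for illustration.

```python
# Compare two independent correlations via Fisher's r-to-z transformation
# (a common alternative to the Jennrich Wald test used in the paper).
import numpy as np
from scipy import stats

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for independent samples."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)       # Fisher z-transform
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                  # two-sided p-value
    return z, p

# Happiness row of Table 1: 0.63 (ascending, n ~ 2337) vs. 0.37 (descending, n ~ 801).
z, p = compare_correlations(0.63, 2337, 0.37, 801)
```

With these inputs the difference is highly significant, matching the YES*** entry in the table.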
Study satisfaction is an important part of students’ overall life satisfaction. Indeed, both constructs correlate strongly (r = 0.605, n = 3138, p < 0.001). As we varied the answer scale order for both life and study satisfaction, we can also compare the effect of the answer scale design. Surprisingly, the correlation of life and study satisfaction is smaller when using the ascending scale (r = 0.56) than when using the descending scale (r = 0.668). Comparing the means and standard deviations of the two study satisfaction versions yields results similar to those for life satisfaction. The mean of the scale ordered from negative to positive is 6.97, while the descending scale produces a mean that is about 0.5 points smaller (6.46). Again, the standard deviation is significantly smaller with the ascending scale (1.95) than with the scale ordered from positive to negative (2.41) (W = 93.1, df = (1, 3136), p < 0.001). A mean comparison test with unequal variances confirms that the two scales have statistically significantly different means (t = 5.363, p < 0.001, Cohen’s d = 0.24). Furthermore, the difference in the distributions of study satisfaction is similar to that of life satisfaction: the two scale versions lead to statistically significantly different response distributions (chi2 = 61.62, df = 10, p < 0.001, see Figure S1 in the Appendix). The correlations of study satisfaction with the other constructs largely replicate the previous finding: the associations point in the same directions, but again the descending response scale leads to lower correlations (see Table S1 in the Appendix). In addition, the correlations of study satisfaction with the other constructs are consistently lower than those for life satisfaction, which was expected. The only exception in this respect is the correlation between being overwhelmed and study satisfaction, which is stronger than the correlation of being overwhelmed with life satisfaction.
However, this can easily be explained by the selected item, which asked about feelings of being overwhelmed during studies. Overall, the analyses of study satisfaction replicate the findings of scale order effects in the measurement of life satisfaction.
4.2 Answer Scale Orientation: Horizontal Versus Vertical Representation
Next, the impact of a horizontal versus vertical presentation of the scale is analyzed. Figure 2 displays the distribution of life satisfaction separated by survey mode. In contrast to the scale direction effect, there are only small, statistically non-significant differences in the response distribution between horizontal scales (computer) and vertical scales (smartphone) (chi2 = 11.08, df = 10, p = 0.351). Furthermore, there is no systematic variation in the response distribution. The standard deviations of the two modes do not differ statistically significantly (SD_c = 1.978, SD_s = 1.997, W = 0.005, df = (1, 3136), p = 0.9442). Additionally, both modes have very similar means (mean_c = 7.10, mean_s = 7.02, t = 0.906, p = 0.365), which suggests that vertical smartphone optimization does not have a relevant impact on the measurement of life satisfaction. Table 2 supports these findings and shows no statistically significant differences in the correlations of life satisfaction with other aspects depending on the survey mode used.
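The chi2 comparison of response distributions across the two display modes can be sketched with a contingency test on a 2 × 11 table of answer counts. The counts below are simulated, not the survey data; only the table shape (two modes by eleven scale points, hence df = 10) reflects the analysis above.

```python
# Chi-square test of independence between display mode and the 11-point
# response distribution, on simulated counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical counts of answers 0..10 for computer (horizontal)
# and smartphone (vertical) respondents.
computer = rng.multinomial(2294, np.full(11, 1 / 11))
smartphone = rng.multinomial(844, np.full(11, 1 / 11))

# 2 x 11 contingency table -> df = (2 - 1) * (11 - 1) = 10.
chi2, p, df, expected = stats.chi2_contingency(np.vstack([computer, smartphone]))
```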
As a robustness check, we also computed mean comparison tests for the survey mode (i.e. vertical versus horizontal scale display) using only either the ascending or the descending response scale. Again, the results show no significant differences in variances and means (see Figures S2 and S3 in the Appendix). The only notable pattern is that the descending scale in the smartphone group – in which the most positive option was presented on top – tends to produce a slightly lower mean (around 0.25 points lower; not statistically significant, t = 1.271, p = 0.204). This observation contrasts with the expected primacy effects and the “top means good” heuristic proposed by Tourangeau et al. (2004).
Fig. 2
Distribution of life satisfaction separated by survey mode. 0 = not at all satisfied, 10 = completely satisfied
Table 2
Correlations of life satisfaction with other related aspects, separated by survey mode
| Construct | Expected correlation | Life satisfaction, computer (horizontal) | Life satisfaction, smartphone (vertical) | Significant difference? |
|---|---|---|---|---|
| Happiness | + | 0.54*** | 0.51*** | NO |
| Hours exercising | + | 0.06** | 0.02 | NO |
| Nervousness | - | -0.33*** | -0.28*** | NO |
| Overwhelmed | - | -0.26*** | -0.26*** | NO |
| Depressive mood | - | -0.51*** | -0.50*** | NO |
| Physical health | + | 0.31*** | 0.30*** | NO |

Statistical significance level: *** p < 0.001, ** p < 0.01, * p < 0.05. The last column indicates whether the correlations differ according to a Wald test (Jennrich, 1970). Sample size, computer (horizontal) version: 2191–2294. Sample size, smartphone (vertical) version: 778–844
Furthermore, we can replicate these findings with study satisfaction. There are only small, statistically non-significant differences between the vertical and horizontal response scale versions (i.e. the survey modes) (chi2 = 8.19, df = 10, p = 0.61). In addition, no statistically significant differences were found for the means (mean_c = 6.83, mean_s = 6.85, t = 0.218, p = 0.827) and standard deviations (SD_c = 2.07, SD_s = 2.15, W = 0.681, df = (1, 3136), p = 0.409) of study satisfaction. Moreover, the correlations of study satisfaction with other constructs do not differ between the two modes (see Table S2 in the Appendix). Examining the mean differences between the survey modes using only the ascending or descending scale, respectively, produces the same findings: no statistically significant differences, but a slightly lower mean (0.18 points) in the smartphone group using a vertical scale with the most positive option on top. It is important to remember that the mode used to answer the survey was not randomized: it was the result of each respondent’s personal decision. Still, these results can be interpreted as empirical evidence that the use of a horizontal versus a vertical scale display does not influence the results.
5 Summary and Discussion
The aim of this study was to provide an empirical contribution to the question of whether the response scale order affects response behavior, using a long and item-specific Likert-type answer scale with labeled endpoints. For this, we varied the order of an 11-point response scale measuring life satisfaction with a single item. Consistent with many previous results, we find that answer scale orientation does affect the response distribution and the means. However, in contrast to most other studies, we do not find evidence for the hypothesized bias to the left or satisficing behavior, under which the descending scale usually produces higher means and more positive values. Instead, we observe the opposite. Our data from a university-wide online student survey at the University of Bern (N = 3,138) show that average life satisfaction is about 0.7 points lower on the descending response scale than on the ascending scale.
We recommend using an ascending scale – ranging from negative to positive – when measuring life satisfaction on an 11-point scale, for two reasons. First, we observed stronger correlations with other related constructs using the ascending scale, suggesting higher validity. This is particularly surprising given that the related constructs were measured on descending scales, so that the descending life satisfaction scale was more similar to them, yet still performed worse. Second, a considerable number of important international surveys and panel studies (e.g. “How’s Life?” (OECD), European Social Survey (ESS), World Values Survey (WVS), Swiss Household Panel (SHP), German Socioeconomic Panel (GSOEP)) measure life satisfaction with a single item in ascending order. Hence, we recommend using the ascending scale also for comparative purposes.
Furthermore, we investigated the impact of horizontal versus vertical displays on the measurement of life satisfaction. This was possible because smartphone users were presented with an optimized vertical scale, as compared to a horizontal presentation on standard computer screens. We find no impact of the scale display on the means, distributions, or validity of the life satisfaction question. We conclude that scales can be changed from a horizontal to a vertical layout in order to adapt them to smartphone screens.
Our study also has some limitations that can be critically discussed. First, our results apply to self-administered surveys – or, more precisely, to web questionnaires – and cannot necessarily be generalized to telephone or face-to-face interviews. Second, we only investigated life (and study) satisfaction. Thus, we cannot conclude that scale order effects exist for all types of questions and content. Third, the ascending scale we used was consistent with the numerical values, such that higher numbers indicate more satisfaction, but our descending scale was not. Previous studies have found stronger scale order effects when there is a mismatch of numerical and verbal labels (e.g. Rammstedt & Krebs, 2007). This could explain why the mismatching descending scale performed worse than the ascending scale. A bipolar scale (satisfied – dissatisfied) might have mitigated our finding, since respondents might find it more logical to associate “completely dissatisfied” with a high number than to combine “not at all satisfied” with a high number. Fourth, our question wording was always the same, mentioning “satisfied” first, followed by “dissatisfied”. We did not investigate the stem order effect by changing the order of the poles in the question. At first sight this might be seen as a disadvantage of our study. If there were a stem order effect, we should have observed more positive answers on the descending scale due to the match of the order in the question and the answer scale (Smyth et al., 2019). However, the ascending scale produced a higher mean than the descending scale, which contradicts the stem order effect. Hence, these limitations do not alter our recommendation of using an ascending answer scale when measuring life satisfaction.
Declarations
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Ethics Approval and Consent to Participate
Ethical clearance was received from the Ethics Committee of the Faculty of Business, Economics and Social Sciences at the University of Bern, Switzerland (serial number: 022022).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.