3.3.1 Stated willingness to use emojis in surveys
First, to investigate whether Millennials would like to use emojis in surveys and for what type of questions, we asked respondents, after the experimental part, three separate questions about whether they would be willing to use emojis in different survey contexts: (1) in general (yes, no, I already use it, I do not know), (2) as labels for specific scales (yes, no) and (3) in open-ended questions (yes, no). Moreover, we will report the percentage of respondents indicating, in a check-all-that-apply question, that they would be willing to use emojis to express (1) emotions and feelings, (2) opinions about products, services and campaigns and (3) how much they like or dislike something. We should note that the different question format (separate questions versus check-all-that-apply) could lead to lower percentages of respondents being willing in the last three cases. The question about the use of emojis in general was not dichotomous: respondents answering “I already use it” were counted as willing, and the “I do not know” answers were treated as missing values. The exact formulation of these questions can be found in the Electronic Supplementary Material. Since all these questions were included after the experimental questions, a priming effect is possible: respondents who were asked to answer with emojis in the experimental part might report a higher willingness. The significance of the differences in willingness levels between control and treatment groups was tested using Z-tests. We found that the willingness to use emojis was significantly higher for the treatment group participants for open-ended questions (79.0% vs. 86.5%, Z = −3.67, p < 0.001) and scales (86.4% vs. 91.0%, Z = −2.69, p = 0.007). No other significant difference was found.
We will report the proportions of respondents stating that they would indeed be willing to use emojis in each case. Analyses were done by country and based only on those who answered the questions (i.e. missing values were excluded). Differences between countries were tested using Z-tests.
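To make the testing procedure concrete, the following is a minimal sketch of the two-sample Z-test of proportions used throughout this section; the counts shown are hypothetical, not the actual data:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: respondents willing to use emojis in
# open-ended questions, and respondents who answered, per group.
willing = [790, 865]     # control, treatment
answered = [1000, 1000]  # respondents who answered, per group

# Two-sample Z-test of proportions (control vs. treatment).
z, p = proportions_ztest(count=willing, nobs=answered)
print(f"Z = {z:.2f}, p = {p:.4f}")
```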
3.3.2 Stated preferences for emoji scales
Second, we studied whether Millennials would prefer to use emoji scales instead of: (1) a dichotomous Like/Dislike scale: the thumbs up/down emojis were opposed to the verbal labels “Like” and “Dislike”; (2) an emotion scale: emoji labels were compared to verbal labels and to a reduced set of PrEmo labels, a non-verbal tool used to measure consumer emotions (“Appendix 3” shows the PrEmo scale presented in the survey); (3) a satisfaction scale: smiley-emoji labels, similar to the ones used in hedonic scales for children, were compared to verbal labels ranging from “very satisfied” to “very unsatisfied”; (4) a specific check-all-that-apply scale about what respondents consider before choosing a trip destination: a wide range of verbal labels (e.g. price) was compared with the associated emoji labels representing the different reasons; and (5) a specific check-all-that-apply scale about the means of transport respondents use to get to work: verbal labels (e.g. car) were opposed to the emoji labels representing those means. Screenshots of the questions and the scales, as well as their English translations, can be found in the Electronic Supplementary Material. We will report the proportions of respondents who prefer emojis or consider them equivalent in each case. Again, analyses were done by country and based only on those who answered the questions (i.e. missing values were excluded). Differences in distributions between countries were tested using Pearson’s Chi-squared tests. Since Pearson’s Chi-squared test is sensitive to sample size, we additionally computed Cramér’s V to measure the strength of the association between scale preference and country. Cramér’s V ranges from 0 (no association) to 1 (perfect association).
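As an illustration of this test, here is a minimal sketch of the Pearson Chi-squared test and the associated Cramér's V on a preference-by-country contingency table; the table and its counts are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = countries, columns =
# "prefer emojis" / "equivalent" / "prefer verbal labels".
table = np.array([[220, 150, 130],
                  [260, 140, 100]])

chi2, p, dof, expected = chi2_contingency(table)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}, V = {v:.3f}")
```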
3.3.3 Impact on data quality, completion time and survey evaluation
To assess the impact of encouraging Millennials to use emojis in open-ended survey answers on data quality, completion time and survey evaluation, we compared the treatment and control groups on each aspect using data from the experimental part. The statistical significance of the differences between the control and treatment groups was assessed using Student’s t-tests (for averages) or two-sample Z-tests of proportions (for percentages).
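For the comparisons of averages, a minimal sketch of the Student's t-test, with hypothetical per-respondent values:

```python
from scipy.stats import ttest_ind

control = [31.2, 28.5, 40.1, 25.7, 33.9]    # hypothetical values
treatment = [35.8, 30.2, 44.6, 29.1, 38.4]  # hypothetical values

# equal_var=True gives the classic Student's t-test
t, p = ttest_ind(treatment, control, equal_var=True)
print(f"t = {t:.2f}, p = {p:.4f}")
```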
First, we focus on data quality. Different indicators of data quality have been used in previous research, such as item nonresponse, response latency and multitasking, primacy effects, non-differentiation and straight-lining (e.g. Mavletova 2013; de Bruijne and Wijnant 2014). However, many of them (e.g. non-differentiation and straight-lining) are designed for closed-ended questions (i.e. questions presenting an answer scale). Nevertheless, several indicators have also been used to assess quality in the case of open-ended questions, in particular: (1) the proportion of valid/substantive answers, or its complement, item nonresponse (e.g. Mavletova 2013; Toepoel and Lugtig 2014) and (2) the amount of information conveyed by each answer (e.g. Smyth et al. 2009; Revilla and Ochoa 2016). Item nonresponse is considered an indicator of low data quality since it suggests that respondents did not put the necessary effort into answering the question. Moreover, item nonresponse results in a loss of information, which can make estimates less efficient and can threaten the representativeness of the answers if there is a systematic bias in who decides to answer versus not. However, providing an answer is not enough: open answers can vary a lot in their content. Thus, it is important to consider other aspects as well when evaluating the quality of open answers. The amount of information is a key one: indeed, one of the main reasons for using open-ended questions is to obtain more developed answers and thus more salient and detailed information (Geer 1991). The amount of information is often measured by the length of the answers (e.g. number of characters; Mavletova 2013; Revilla and Ochoa 2016). However, the number of characters does not always reflect how much information is provided: longer answers can result from a repetitive writing style without conveying further information. Thus, some authors (see Smyth et al. 2009) proposed to assess the amount of information conveyed in open-ended responses by counting the number of themes, defined as concepts or subjects that answer the question but are independent of all other concepts. This approach seems better suited for measuring the quality of open-ended answers.
Therefore, we use two indicators to measure data quality: item nonresponse and the amount of information conveyed.
Item nonresponse was calculated as the number of experimental questions without an intelligible answer per person (out of a maximum of six). The averages for each group (treatment/control) were then compared.
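As a sketch of how this indicator can be derived, assuming a hypothetical DataFrame with one row per respondent and a flag per question marking whether the answer was intelligible:

```python
import pandas as pd

# Hypothetical data: q1_ok..q6_ok are True when the answer to that
# experimental question was judged intelligible.
df = pd.DataFrame({
    "group": ["control", "treatment", "control", "treatment"],
    "q1_ok": [True, True, False, True],
    "q2_ok": [True, False, False, True],
    "q3_ok": [True, True, True, True],
    "q4_ok": [False, True, True, True],
    "q5_ok": [True, True, False, True],
    "q6_ok": [True, True, True, False],
})

ok_cols = [f"q{i}_ok" for i in range(1, 7)]
df["nonresponse"] = (~df[ok_cols]).sum(axis=1)  # 0 to 6 per respondent

# Group averages, then compared with the t-test shown above.
print(df.groupby("group")["nonresponse"].mean())
```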
Concerning the amount of information conveyed, we adapted Smyth et al.’s (2009) approach to emojis. Hence, we computed the amount of information as the number of ideas, opinions, emotions or feelings independent from each other conveyed in each answer. For instance, the answer “I like this advert” would count as one theme, and a laughing emoji, conveying the information “it is funny”, would count as another theme. “Appendix 4” presents some real examples. This was coded manually by a researcher. Then, we took the average across all six questions and, finally, compared the averages of the two groups. To test inter-rater reliability (IRR), a second researcher coded a random 20% sample of answers for each question. We then computed Spearman’s ρ between the two coders’ theme counts for each question. Spearman’s ρ was 0.80 for question 1 (opinion), 0.89 for question 2 (emotion), 0.81, 0.79 and 0.88 for questions 3, 4 and 5 (reactions to different slogans) and 0.78 for question 6 (personality). Both coders were native Spanish speakers.
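The inter-rater reliability computation itself is straightforward; a minimal sketch with hypothetical theme counts from the two coders on the double-coded subsample:

```python
from scipy.stats import spearmanr

# Hypothetical theme counts from the two coders on the same answers.
coder1 = [2, 1, 3, 0, 2, 4, 1, 2]
coder2 = [2, 1, 2, 0, 3, 4, 1, 2]

# Spearman's rho between the two coders' counts for one question.
rho, p = spearmanr(coder1, coder2)
print(f"Spearman's rho = {rho:.2f}")
```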
Second, in order to compute the completion time, we used an adapted version of the paradata tool “SurveyFocus” (Höhne et al. 2017), which records how often and for how long respondents focus on the survey page, and therefore allows controlling for multitasking within the same device. However, we could not control for multitasking on a different device or offline. Therefore, in order to deal with outliers, for each page we applied the same method as Revilla and Ochoa (2015): substituting the values of the 1% of respondents with the highest times (considered the ones who were clearly multitasking) with the average time spent by the other 99% to answer the questions on that same page. We used the average time, and not the maximum time, of the other 99% because, following Revilla and Ochoa (2015), we believe that very long completion times do not indicate extremely slow respondents but respondents who interrupted the survey; there is then no reason to expect these respondents to spend a longer time on the page once they come back to it. We report and compare the average completion times per question.
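A minimal sketch of this outlier substitution, assuming the per-page completion times (in seconds) are stored in a pandas Series:

```python
import pandas as pd

def substitute_outliers(times: pd.Series) -> pd.Series:
    """Replace the 1% highest times by the mean of the other 99%."""
    cutoff = times.quantile(0.99)      # boundary of the slowest 1%
    core_mean = times[times <= cutoff].mean()
    return times.where(times <= cutoff, core_mean)

# Hypothetical page times in seconds; 950.0 is a likely interruption.
times = pd.Series([12.3, 25.1, 18.7, 30.2, 22.4, 950.0])
clean = substitute_outliers(times)
print(clean.mean())
```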
Third, to measure the survey evaluation, we used two indicators: the level of satisfaction (the proportion of respondents saying that they liked (a lot) answering the experimental questions) and perceived usability (the proportion of respondents saying that it was (very) easy to fill in the experimental questions).
Finally, we conducted linear and logistic regressions for the treatment group to determine the impact of the number of emojis used in open-ended questions on different dependent variables: the amount of information conveyed and the completion time in seconds (multivariable linear regressions), and the stated enjoyment (liked (a lot) = 1, the rest = 0) and usability ((very) easy = 1, the rest = 0) (logistic regressions). We expect that the higher the number of emojis used in the experimental part, the higher the quality, the completion time and the survey evaluation levels. In addition, we included some control variables. First, we included socio-demographic variables: gender (women = 1) and country of residence (Mexico = 1). Moreover, we included the average number of hours of internet use per day through the smartphone (from 1 = “less than 30 min” to 7 = “5h01 or more”), expecting that more familiarity with the internet may be related to a higher number of emojis used, as well as to a higher fluency in answering online surveys, making it easier and quicker for respondents to complete the survey. Furthermore, personality traits could be related to the number of emojis used: for instance, extroverted, creative and lazy individuals may use more emojis when given the opportunity, for different reasons. This, in turn, can lead them to convey more information, as well as to enjoy the experience more. Thus, we included the personality traits of extroversion, creativity and laziness (composite scores ranging from −9 to 9). In addition, we included whether others were present while respondents completed the survey (others = 1). We expect respondents answering when others are present to convey less information and to find it more difficult, as well as to be more prone to use emojis to answer in an easier and faster way. Finally, we included the stated number of emojis sent weekly (from 1 = “none” to 12 = “more than 100”), expecting that this would affect how much respondents like having to use emojis. This number might not accurately measure real emoji use because of human memory limitations and recall issues when reporting online behaviors (e.g. Revilla et al. 2017). However, it can give an estimate of respondents’ perceived use of emojis, a proxy for their actual behavior.
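To illustrate this final step, a minimal sketch of the linear and logistic regressions using statsmodels; the DataFrame, variable names and generated values are all hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200  # hypothetical treatment-group size

# All variable names and values below are hypothetical.
df = pd.DataFrame({
    "n_emojis": rng.poisson(3, n),                 # emojis used in open answers
    "woman": rng.integers(0, 2, n),
    "mexico": rng.integers(0, 2, n),
    "hours_internet": rng.integers(1, 8, n),       # 1-7 scale
    "extroversion": rng.integers(-9, 10, n),       # composite score, -9 to 9
    "creativity": rng.integers(-9, 10, n),
    "laziness": rng.integers(-9, 10, n),
    "others_present": rng.integers(0, 2, n),
    "emojis_sent_weekly": rng.integers(1, 13, n),  # 1-12 scale
})
df["info_conveyed"] = 1 + 0.3 * df["n_emojis"] + rng.normal(0, 1, n)
df["liked"] = rng.integers(0, 2, n)                # 1 = liked (a lot)

X = sm.add_constant(df.drop(columns=["info_conveyed", "liked"]).astype(float))
linear = sm.OLS(df["info_conveyed"], X).fit()         # multivariable linear regression
logit = sm.Logit(df["liked"].astype(float), X).fit()  # logistic regression
print(linear.params["n_emojis"], logit.params["n_emojis"])
```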