We undertook this work to assess the human side of the human–robot interaction equation in research. Driven by critical scholarship within and beyond engineering and computer science, we asked: Are HRI participants WEIRD? However, we realized that this question alone was insufficient to assess how “weird” HRI research populations might be. We thus went a step further: Are HRI participant samples strange in other ways? Are they diverse? Our systematic review reveals that HRI participants are indeed WEIRD and may also not be as diverse as expected. Even so, we must consider these results in light of another major finding of this work: the state of reporting in HRI. We now turn to discussing each of these findings and potential ways forward, as well as limitations of our own work and trajectories for future work.
5.1 WEIRD and Diversity Patterns in HRI Participants
HRI participants are in the majority WEIRD or EIRD, located in or holding nationalities of primarily Western nations, especially the US, with the notable exceptions of Japan and Korea (the EIRD nations). Given the global powerhouse that is the US, this is not unexpected. Moreover, Japan and, to some extent, Korea have long been heralded as technology-forward nations. Japan, in particular, is globally recognized for its contributions to robotics [
221]. We thus might not be surprised to find over-representation in HRI samples from these countries. A related pattern on the diversity factors side is apparent anglocentrism. A rather “weird” aspect of WEIRD research is the lack of reporting on race, ethnicity, and cultural identity or background. In particular, “the West” seems to be a code for people of certain characteristics, although this is not explicitly represented in the WEIRD framework. Recent critical race scholarship would argue that these characteristics are whiteness; English ability, if not nativity; and Anglo-Saxon heritage. Our sample indicates that this could be true for HRI research. At the same time, we should be mindful of other possibilities. Some factors may be sensitive, such as matters of disability or the body, and researchers may be uncertain whether and how to ask. As we will discuss, institutions may also place limits and barriers on demographic data collection that mask a true desire on the part of researchers to capture this data for the purpose of reporting on representation. Moreover, we do not have a formal way by which to capture and represent most of this information, which we discuss next. Yet, a shift is occurring, with recent work tackling race, ethnicity, and identity in HRI spaces head-on. But we cannot just work on “robots and race”; we also need to capture the “race and humans” element in our samples.
HRI samples are also “uncanny” in other ways related to factors of diversity. Robots are physical, for the most part, and robots that interact with humans often do so through physical means (but not always, as is the case with conversational robots). Yet, participant embodiment was vastly under-considered. This is despite a surge of research on factors of the body, especially approach distance [
222], medical robots that lift people [
223,
224], and recognition that the relative size of the robot can instill comfort or discomfort [
152]. Moreover, how disability plays out and what characteristics of the body relate to disability were almost never considered. We also found evidence of exclusion based on researcher assumptions of ability and “health,” as well as explicit exclusion based on bodily features and/or disability. We urge our fellow researchers to include people of all configurations, unless they have a good reason not to. This may mean recognizing that the research is disabling in some way and correcting it. If certain bodies or embodiments need to be excluded, clear and fair reasoning should be provided. We should never exclude based on convenience.
Virtually all researchers have taken an ideologically neutral approach, relying on an assumed foundation of beliefs, attitudes, opinions, and values. This tended to occur even when researchers were conducting research on morality and ethics. Yet, the relationship between ideology and beliefs and the research at hand may be difficult to determine. At the very least, we encourage researchers who study robots and ethics, law, morals, beliefs, religion, spirituality, and other ideological topics to capture the relevant demographics and attitudes, incorporate these in their analyses, and report on them faithfully. At the same time, we should consider the ethics of asking about personal ethics. We should avoid forced disclosures, not only for the comfort and safety of participants, but also to ensure data quality. We refer to other work [
16,
225,
226] on how to navigate this sensitive aspect of reporting.
While WEIRD research has highlighted the problem of relying on undergraduate populations, this issue has been under-acknowledged in HRI research. Yet, as our critical review shows, HRI samples have been primarily made up of people who knew computers well, were students in computer science or engineering, and were familiar with robots. Familiarity can bias results, and we must take heed not to over-generalize our results, given the over-representation of “those in the know” as participants. Additionally, HRI samples tend to be young. Given the average age of undergraduate students in most nations, this is to be expected. Nevertheless, we should aim to capture a full range of human experience, across age groups, and without making assumptions of interest or ability based on age, as some in our corpus have done.
When it comes to sex, gender, sexuality, and family configuration, the expected patterns exist. Most research has relied on limited frameworks of sex and gender. While this is changing, the conflation of sex and gender and reliance on the gender binary prevail. We found almost no reporting of non-binary and transgender people, or gender diversity. This does not mean that diverse people were not included, as it was also difficult to assess how data was collected. Moreover, many relied on an “other” category, which collapses diversity and implicitly “others” people, i.e., acts as a cue that the person is atypical [
50]. We do not know whether this was a recruitment problem or a measurement problem, i.e., whether gender-diverse options were simply not offered. HRI researchers can follow recent shifts on asking about and reporting on sex/gender [
11,
47,
49,
50]. Finally, we discovered several meta-level patterns resulting from social norms and habits in HRI or in research generally that should be highlighted and challenged. Many researchers, operating from a gender binary perspective, only reported female or women counts and/or percentages. The implication is that “the rest” are male or men. This may be a matter of social norms in research reporting, arising from a legacy of women being excluded as participants, a practice normalized over time with the goal of
highlighting the recruitment of women. Even so, our analyses show that researchers did this even when more women or girls were recruited than men or boys. Moreover, there were roughly even numbers of men and women participants overall. We raise this question for the community: Why continue? For breadth and accuracy, we recommend reporting on whether sex and/or gender was captured, and then providing the counts and/or percentages for each sex (as there is a range of intersexes) and gender, making explicit note of whether diverse
gender identities were considered. This should be reported for the sake of the research goals as well as for transparency in representation, regardless of whether the data is used for main analyses.
The apparent lack of diversity has special implications for HRI research. Robots are often humanlike and social, and this matters. We draw from the Computers are Social Actors (CASA) paradigm [
227,
228], which is backed by a wealth of research over the last couple of decades [
229,
230]. In short, the research indicates that we tend to ascribe human qualities to human-like computer agents and react to them as if they were human, often without realizing it, and sometimes even when we do. (This may in fact be an argument in favour of not worrying too much about the over-representation of people familiar with robots in HRI samples: they are not necessarily immune to this phenomenon.) Robots are also expensive and typically built for WEIRD nations or the nations in which the builders are located. This can have implications beyond cost, including language support, local tech support, and so on. Moreover, the very notion of “robot” may be “WEIRD” and certainly has “WEIRD” roots. Nevertheless, robot-adjacent concepts may exist, and the robot concept itself can be adapted elsewhere. This raises important questions about research inclusion, not only for participants but also for the researchers themselves.
What can we do? In their CHI paper, Linxen et al. [
3] provide several ideas that may be appropriate for HRI venues, too: diversifying authorship; fostering the use of online research; developing methods for studying geographically-diverse samples; appreciating replications and extensions of findings; reporting and tracking the international breadth of participant samples; and identifying constraints on generalizability. We echo these suggestions with some caveats and additions. Online research, for instance, was reported in 77 studies (9.5%), and we expect this proportion to increase as a result of shifting attitudes towards research practice following the global COVID-19 pandemic [
231]. The challenge will be how to incorporate physical robots into virtual or hybrid research contexts. Other challenges remain. HRI research has not been widely conducted outside of WEIRD and EIRD nations. We imagine two opportunities here. First, WEIRD and EIRD researchers can make a concerted effort to bring on researchers, labs, companies, and institutions in non-WEIRD and non-EIRD nations as collaborators. Second, we may seek to learn from non-WEIRD and non-EIRD researchers and participants about what robots are or can be. Wealth of all kinds can be shared … including intellectual, artistic, and phenomenological wealth. We can also take up posts as outreach officers, such as for ACM and IEEE regional chapters. We further add a call for reflexivity and stricter reporting standards. Perhaps the apparent lack of diversity is partly a matter of underreporting, which could shift the results either way. We turn to this topic and propose a solution next.
5.2 A Matter of Reporting?
We have a reporting problem in HRI research. Many of our analyses, and therefore our results, were limited by insufficient reporting. This played out in a variety of ways. Some researchers simply did not report any information about participants. In some cases, there was no mention of participants in the paper, and we had to make a guess based on what was implied by other features of the research, such as the system design, data analyses, and results. Others reported some information but not all (e.g., 86% were students, but who were the rest?). Others reported information in non-standard ways or in ways that cannot be used in meta-analyses (e.g., median ages). Some information was obscure due to lack of detail (e.g., nationality or location? Does “other” mean another gender identity, that someone preferred not to say, or something else?). There was also unclear or implicit reporting (e.g., “roughly” 100 participants). If this state of affairs continues, then we will not be able to determine the extent of the underlying problems, or lack thereof, when it comes to representation and inclusion.
All of these issues are easy to fall prey to … but potentially easy to resolve, at least in theory. Indeed, the greater scientific community, notably headed by the Nature group, has recently made strides towards improving reporting by providing templates.
Other fields of study, in particular the medical and health fields, have long recognized the need for standard reporting to evaluate the relative degree of consensus on a certain intervention. PICO (Population, Intervention, Comparison, Outcomes) [
232], PICOS (PICO plus Study) [
233], and SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) [
234] are long-standing, widely used templates in these domains. Nevertheless, they are disciplinary and high-level. Moreover, HRI papers are as likely to be short papers as long papers. There may simply not be enough space to report all details. Indeed, the reported counts for the short and long papers suggest that researchers may have been forced to cut details due to space limitations. Finally, we acknowledge that there may be institutional and structural barriers to capturing and reporting participant details. For example, ethics boards may request or require a limit to the number and kind of demographic questions asked. Such barriers may not be resolvable, but they should be reported as an explanation, e.g., “The ethics board did not allow us to capture demographic factors that were not directly tied to our research questions and hypotheses.”
Keeping in mind the particularities of the HRI conference, we offer our recommendations. First, consider adopting an existing template; SPIDER may be especially reflective of most HRI research. Adapting a template or developing a new one will take time and community engagement. Future work should involve workshops and other forms of engagement, as well as testing out candidate templates. Ideally, the HRI conference will develop a standard template and provide it in the paper template. This may be especially important for short papers, which can be as few as two pages. The first page could use a template like SPIDER and the second page could be open-ended, based on the characteristics of the reported research. Second, we recommend using the WEIRD and diversity frameworks as a checklist and format for reporting. We offer the following structure for writing up results based on the clusters and intersections among WEIRD and diversity factors, with sensitive or case-dependent factors in square brackets:
Age, [sex], gender, [sexuality], [family configuration], race, ethnicity, nationality, location, education, computer-oriented education, [ideology], disability status, [body factors]
Regardless, we need to report whether our recruitment measures were successful, as well as when they were not. We need to report on failures to recruit, rather than leave them out and allow the reader to assume. For example, one study [
192] reported on a failure to recruit ideal participants: “Unfortunately, we were unable to recruit a guide dog user” (p. 107). This is clear and to the point. We urge other researchers to do the same. In a similar fashion, we acknowledge that it might not be possible, or even appropriate, to collect information concerning all diversity factors of participants. Potentially sensitive and/or uncomfortable questions on sex, sexuality, race/ethnicity, and ideology (some of which have legal ramifications in certain nations) might justifiably raise ethical concerns, especially when they are not directly linked to the research question of a study. However, to increase clarity and transparency in reporting, we believe it is important for researchers to specifically state when they choose not to collect information about diversity factors and, where possible, the reason behind their choice.