In this study, we examined twelve classification projects combining CS and ML in several scientific fields. This combination results in socio-technical epistemic systems consisting of human and non-human actors, each equipped with different amounts of knowledge and power. Our analysis suggests that no task performed by either citizens or experts can currently be handled fully by ML. The suitability of a task for ML depends on the task characteristics and the level of knowledge required to perform it. However, machine learning may yield an epistemically stratified organization, as it requires more expert knowledge and skills while still soliciting citizen scientists’ contributions. Because tasks in a CS project are temporally ordered, epistemic agency will be ascribed to different actors at different times: the epistemic agency of certain actors will come to the fore, while that of others will be obscured. Arguably, this means that narratives of ML, AI, and CS (biography, success, optimization, ideal of science, etc.) must be understood in terms of how tasks are delegated and distributed among actors and their epistemic agency as they change over time. Whether one asks “if” a project succeeds, tells the story of “how” it does, or asks when machines take over, the answer will differ depending on when and where one examines the project.
Our descriptive results raise three main issues, which we explore through perspectives informed by Science and Technology Studies (STS) and Information Systems. Specifically, we discuss the emphasis on optimization and the ideal of science it implies, the problem of induction, and the role of citizen scientists.
5.1 Emphasis on optimization to achieve greater scale, accuracy, and speed: which ideal of science?
We examined projects in various scientific fields that use CS in conjunction with ML, such as neuroscience, sociology, oceanography, environmental science, botany, life sciences, astronomy, microbiology, and medicine. Despite the variety of scientific fields, the narratives about these projects in terms of project goals, human–machine integration, use of ML, and distribution of tasks between citizens, experts, and machines, all reveal a story of optimization, scaling up, increasing accuracy, and speeding up. These stories underpin a model of data-driven science propelled by ML affordances that make it easier to sift through massive datasets to infer patterns, while renewing inductive reasoning, and making research less hypothesis-driven (Mazzocchi
2015; Hochachka et al.
2012). At the same time, there is also a growing interest in using AI in research as a means of enabling new methods, processes, management, and evaluations (Chubb et al.
2021).
The diverse projects included in this study converge on a stereotypical narrative of how ML enables epistemic agency. Why is this narrative repeated across scientific fields? Data-driven science seems to unite projects rather than separate them when they are reported for stakeholder audiences, embedding programs of action (Latour
1992), and envisioning a desired imagined future state (Glaser et al.
2021, p. 13). In this way, a scientific future is enacted, namely, one in which high-powered computing capacities enable the utilization of largely inductive ML applications that require less theoretical pre-processing of data (Glaser et al.
2021, p. 10). The shared narrative reflects a unity of science that extends beyond the epistemic agency that shapes these hybrid formats of science. As science opens up to outsiders to train the automated systems of high-capacity machines, outsiders are encouraged to participate as inductive actors, taking part in a program and performing an ideal of science that accommodates them in the research process. This does not imply that the observations made, classified, and used to train machines are not used to test hypotheses. Citizens’ access points, however, enact a process of induction, which tends to unify the sciences more than separate them. When scientists open the way for volunteers to train machines to speed up scientific processes, this is in line with inductivism’s ideals (Kasperowski et al.
2019).
Science has been associated with at least two main epistemic uncertainties known as the problem(s) of induction. Popper (
1934) suggests that although some “problems” cannot be solved, they can be managed more or less successfully. First, observations are uncertain; this uncertainty is usually addressed and managed by different technologies, standards, and protocols. Second, finite observations cannot yield universal conclusions. Observers cannot observe everything, and protocols themselves suffer from being based on finite observations; inundated with data, protocols will not be able to cope. Nowadays, ML can learn to recognize patterns in classified data that were not anticipated in the original design, and the hybrid use of CS and ML, coupled with an abundance of data, makes it possible to manage this epistemic uncertainty (Popenici and Kerr
2017). Protocols must evolve to maintain their functions as datasets grow larger and the epistemic agency of machines increases. However, when outlining an algorithm’s biography (Glaser et al.
2021), it is critical to include how much data an algorithm can handle without losing its explanatory power and its ability to make additional classifications beyond the original gold standard. Once such new data are gathered, it will be time to retrain the machine with the help of experts or citizens.
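To make this retraining dynamic concrete, the following minimal sketch is purely illustrative: the classifier, feature values, and class labels are all hypothetical and not drawn from any sampled project. A toy gold standard of labelled observations trains a nearest-centroid classifier, which is retrained when new labelled observations arrive beyond the original gold standard:

```python
# Purely illustrative sketch: a toy "gold standard" of (feature, label)
# pairs trains a nearest-centroid classifier, which is retrained when
# new labelled observations arrive. All names and values are hypothetical.
from statistics import mean

def train(gold_standard):
    """Build one centroid per class from labelled (feature, label) pairs."""
    by_label = {}
    for x, y in gold_standard:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def classify(model, x):
    """Assign the class whose centroid lies nearest to the input feature."""
    return min(model, key=lambda label: abs(model[label] - x))

# Initial gold standard, e.g. expert-labelled observations.
gold = [(0.1, "galaxy"), (0.2, "galaxy"), (0.9, "star"), (1.0, "star")]
model = train(gold)

# New labelled data beyond the original gold standard trigger retraining.
gold += [(0.5, "star"), (0.55, "star")]
model = train(gold)
```

The point of the sketch is only that the model’s “explanatory power” is bound to its training set: each batch of new labels shifts the class centroids, so the protocol embodied in the model must be re-performed rather than fixed once and for all.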
Science that incorporates outsiders such as citizen scientists in the research process is more likely to unify sciences than to divide them. Citizen science would then reflect a scientific ideal resembling some aspects of the discipline of observations through protocols, which is reminiscent of the logical empiricists’ argument for the purification of science (Kolakowski
1972). While the 1930s called for a purification of science and research, 90 years later the key terms are computer science, artificial intelligence, machine learning, open science, inclusion, mediation, transparency, democracy, and responsible research. In the 1930s, the epistemological concern was that science must close ranks to save itself; it now appears that science is encouraged to open ranks under a banner of openness and unified investigation.
5.2 Implications of optimization for the role of citizen scientists
The problem of induction. The three examples of citizen-in-the-loop in Sect.
4.2 suggest that ML is used as much as possible at a given stage of a project’s evolution, with humans stepping in to fill in the missing pieces. Depending on the current strength of ML, the temporal order in which machines and citizens take turns working on tasks differs. If ML provides wrong outputs, citizens help retrain the model; if ML is somewhat better and makes suggestions, citizens correct the model; if ML models work well, humans take care of undetected exceptional cases. If ML is perceived as producing optimized outputs, citizens may be made redundant. This is just a conjecture for which there is no direct empirical evidence, but it is worth considering. If the outputs of ML are trusted and relied upon by experts, they may influence the socio-technical assemblage that generates the data on which the same ML is trained (Faraj et al.
2018; Pachidi et al.
2021; Lebovitz et al.
2021). It is unclear whether humans will ever become redundant as ML is applied to the CS domain and, if they do, under what circumstances ML would be considered optimal. At the present stage of development, the capabilities of ML make it a scalable complement to citizens and experts, for example by structuring large amounts of unfiltered data into information or by estimating the probability of an event occurring based on input data (Ponti and Seredko
2022).
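The turn-taking described above can be sketched as a simple confidence-based routing loop. The sketch below is a hypothetical illustration, not the implementation of any sampled project; `model_predict` and `ask_citizen` are stand-ins for a trained classifier and a volunteer interface:

```python
# Hypothetical sketch of citizen-in-the-loop routing: the model handles
# high-confidence items; low-confidence items go to volunteers, whose
# labels are also collected so the model can later be retrained.
def model_predict(item):
    # Stand-in for a trained classifier returning (label, confidence).
    return ("bird", 0.95) if item.startswith("b") else ("unknown", 0.40)

def ask_citizen(item):
    # Stand-in for a volunteer classifying one item.
    return "insect"

def classify_with_humans(items, threshold=0.8):
    labels, retraining_pairs = {}, []
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= threshold:
            labels[item] = label                # machine handles easy cases
        else:
            human_label = ask_citizen(item)     # human fills missing pieces
            labels[item] = human_label
            retraining_pairs.append((item, human_label))
    return labels, retraining_pairs

labels, retrain_data = classify_with_humans(["blackbird", "ant"])
```

Note that where the threshold is set, and who sets it, is itself a delegation of epistemic agency: raising it routes more work to humans, lowering it makes more of the classification pass through the machine unchecked.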
However, rather than considering optimization from a technical perspective, we relate it to the way the “problem of induction” from observations is handled. A long-standing belief holds that inductive approaches—now emphasised by ML, which infers general rules from observations—are plagued by inescapable and insoluble problems (Popper
1934). Humans and machines cannot solve these problems, nor can some sophisticated hybrid approach combine the two. There is an important reason for this: no protocol can handle a lot of data without encountering anomalies, and these must be controlled for. Therefore, human epistemic agency will be needed for never-observed phenomena, because protocols and gold standards have a limited epistemic reach in those cases.
In this study, we focused on technologies using ML classification methods that infer patterns from training datasets consisting of labelled input–output pairs and classify new inputs into predefined output classes. These ML methods are rule-based systems that allow ML to represent experts’ know-what knowledge (Lebovitz et al.
2021). This type of knowledge is captured, for example, in the gold-standard labels used to train and validate ML models. Experts are key in the data-intensive projects we examined. Consonant with the relational notion of agency adopted in this work, we use a relational notion of expertise. Rather than being something experts possess, we define expertise as the expert’s ability to mediate between the production of knowledge and its application (Grundmann
2017). In this sense, in our sampled projects, experts define and interpret situations and set priorities for action. As mentioned in Sect.
4.2, experts are the humans mainly involved throughout the research process and in the loop, when models fail or are unreliable. It has been said that under conditions of epistemic uncertainty, official expertise and lay expertise should not be seen as antagonistic but as complementary (Funtowicz and Ravetz
1990). It is these different types of expertise, including that embedded in ML algorithms, which interact with each other to form an assemblage in a "program of action" (Latour
1992, p. 152). Therefore, we suggest viewing ML optimization as a constructed assemblage in which citizens, experts, and machines play different roles and exert epistemic agency at different points in time to pursue a CS project’s epistemic goals.
Trust in ML outputs and redundancy of citizens. It is worth considering that if ML tools are trusted and taken for granted, and experts rely on seemingly accurate ML outputs over volunteers (including expert volunteers), these outputs may influence the socio-technical assemblage that generates the data on which the tools are trained (Faraj et al.
2018; Pachidi et al.
2021; Lebovitz et al. 2021). Therefore, to consider machines as unproblematic and their output as immutable mobiles (Latour
1990) implies that citizen scientists will play only a minor role in the long run.
One can speculate that, by automating tedious tasks, ML in CS will allow citizens to focus on higher-level work. However, as Franzoni et al. (
2021) argued, “to the extent that such work is limited in volume or requires additional knowledge and resources that pose barriers for crowd participants, there is a risk that CS becomes less inclusive by focusing primarily on expert volunteers” (p. 17). While this may not be a concern from a ‘productivity’ perspective, it may limit CS’s potential to advance the non-scientific goals highlighted by the “democratization view” (Franzoni et al.
2021). Relying mainly on expert volunteers could reduce the diversity of current and future citizen scientists by diminishing their range of motivations and disengaging those citizens who want to contribute to science in their spare time and have fun, help science, or spend time outdoors (Geoghegan et al.
2016). Deriving personal meaning and value from participating is important to citizen scientists, who typically volunteer time and effort driven by intrinsic or social motivations and not for financial compensation (Sauermann and Franzoni
2015). However, how CS projects should be designed to actually cater to diverse needs and expectations still seems to be very much at a conjectural stage (Kasperowski and Hagen
2022). Suggestions to avoid disengagement and redundancy include allowing participants to contribute their task of interest even if the task can be fully automated so that their contribution can help improve ML performance; or incorporating new forms of citizen contributions to fill the gaps created by automation (Lotfian et al.
2021).
While some may regard these suggestions as motivational, they can also be seen as “over-engaging”, bordering on the unethical. In the debate over the ethical problems evoked by citizen science, “over-engagement” refers to being available for free work for science indefinitely (Kasperowski and Hagen
2022). Therefore, we argue that the opposite of becoming redundant can actually happen with the growth of ML in CS. The problem of induction may repeatedly call upon humans, both experts and citizens, in the loop of the process. The fear of AI and ML creating “undemocratic”, hierarchical, or epistemically stratified projects must of course be closely observed. However, from the perspective of epistemic agency, hierarchy or epistemic stratification could be said to occur constantly in projects at a microlevel, as different actors are endowed with more epistemic power at different moments during the course of a project.
5.3 Participation of citizens—or the lack thereof—and the use of ML
The AI industry is showing interest in developing solutions to global problems using AI in combination with citizen scientists. An example is a recent partnership between a team of IBM data scientists and the UN Environment Programme (UNEP) to overcome the challenges associated with citizen data and create a unified, global baseline dataset for measuring plastic pollution in line with UNEP’s Sustainable Development Goal 14 (Clemente
2020). Sloane (
2020) contends that ML extends the agenda of the tech industry, which is focused on scale and extraction. The use of ML may exacerbate an “extractive” approach to citizen participation (Sloane
2020), by which data collection and classification remain the primary ways for volunteers to contribute to the scientific goals of CS projects. The increasing use of ML in CS classification projects must therefore be related to issues of power dynamics and inequalities in the engagement and retention of volunteer participants.
CS aspires to be inclusive in terms of age, gender, ethnicity, geography, and social class. However, CS participation is currently skewed demographically and geographically, with biases in age, gender, ethnicity, and socioeconomic status (Pateman et al.
2021). Participants in long-term projects, such as eBird at the Cornell Lab of Ornithology, have been shown to be predominantly highly educated, upper-middle class, middle-aged or older, and white (Purcell et al.
2012). The results concerning gender composition are mixed; however, some projects show a strong bias towards men (Hobbs and White
2012; Crall et al.
2012; Raddick et al.
2009; Wright et al.
2015). Other studies indicate that some projects offer opportunities for disadvantaged groups not otherwise available (Khairunnisa et al.
2021). There is an ethical imperative to involve a diverse group of participants to inform CS projects and provide access to their benefits (Mor Barak
2020).
CS participation is skewed not only in terms of sociodemographics but also in terms of actual contributions: most are made by a few participants (Seymour and Haklay
2017). This lack of diversity in CS reflects different motivations and capacities and raises concerns about the representativeness of data and whether individual, societal, and environmental benefits are evenly distributed (Pateman et al.
2021). Diversity in participation remains a challenge, which the use of ML may exacerbate. The involvement of ML seems to be a case of what we call “designing for”, where citizens are not integrated into the design process from the beginning but are relied on to make the model (ML design) successful ex-post. It has been suggested that the involvement of citizens through CS, particularly during the research design phase, may help reduce bias in data and training annotations for AI, enable public shaping of AI, and foster a lifelong interest in science (Shanley et al.
2021). The long-standing interest among STS scholars in whether new technologies solve problems or merely manage and displace them, rendering some problems, actors, and their agency invisible or redundant, seems to reappear as ML and AI are combined with CS (Glaser et al.
2021).
We are left to wonder whether inequality in participation detracts from the promise to make science more democratic, both in terms of including more diverse people in doing science and in making science better aligned with the public interest (Strasser and Haklay
2018). However, it would run contrary to some current proponents of CS to claim that there are “objective” public interests that science can tap into continuously (cf. Brown
2009). This standpoint seems all too often to inform voices from both policy and science when expectations and social imaginaries of the availability and readiness of citizens to be mobilized into CS projects are produced. Our suggestion is that the pursuit of such interests must be viewed as acts of performance; they are made and cannot be taken for granted.
The non-linearity of the HITL. In our ANT-inspired view, the constructed assemblage including citizens, experts, and ML is a complex effect resulting from mutual interaction and feedback loops, as exemplified in Figs.
5,
6, and
7. These assemblages can be seen as complex triangular systems of citizens-experts-technology in which relationships and loops need to be repeatedly “performed” by all the actors involved, or the assemblages dissolve. ANT is not the only framework that attends to these connections and interactions. Other theoretical endeavors, such as cybernetics (Wiener
1948), have explored complex feedback processes within the networking and self-organization of systems. Both ANT and cybernetics aim to conceptualize complexity; both are sensitive to the hybrid nature of phenomena, and both emphasize system effects. However, as pointed out by Fenwick and Edwards (
2010), a core difference between cybernetics and ANT is the latter’s orientation toward contingent practices and multiplicity. ANT provides a conceptual framework for analysing how the diverse entities of a classification project—including technologies—take a role through specific situated material–semiotic redistributions of expertise and epistemic agency. Uncertainty characterizes contingent practices. Humans are unlikely to act as “controllers or processors” of classifications in a linear way in the loops, as they may not repeat or reproduce exactly the same actions given the same input. Even such seemingly fixed things as standards and protocols can be uncertain in practice. The loops we exemplified are not expected to be seamless sequences in which ML fails to identify a pattern, citizens identify it, and experts feed the correct answer into the training data. These are expectations and possibilities that ML will perform its tasks, but there is no guarantee that citizens will unquestioningly take on the assigned epistemic role and dutifully engage in checking errors or filling gaps. Nor, for that matter, does it mean that ML itself will comply with experts’ wishes. Checking the correctness of a classification can happen in multiple ways, depending on the material–semiotic network in which this task is entangled. Tracing the material–semiotic assemblage of checking data classification could reveal the continuity from one version to the other and thus the simultaneous singularity and multiplicity of the assemblage.
5.4 Limitations
This study has two main limitations. First, we relied solely on secondary sources without incorporating other methods (e.g., interviews) that could help reduce bias and compensate for the dearth of documents and their incompleteness. Our study may be limited by the use of the narratives reported in the documents, which represent the authors’ perspectives. Since most research papers tend to report on successful rather than unsuccessful projects, we are likely to have been exposed to mostly successful divisions of labour instead of those that did not work. Being aware of this potential bias, we have been careful not to use documentary evidence as a stand-in for other kinds of evidence that we could not produce using this method.
Second, our study may be limited by the small number and type of projects examined. The sample we used is purposive. Note that the selected projects reflect those that were documented at a particular moment in time, rather than being a truly representative sample of the population.