In this study, we examined twelve classification projects combining CS and ML in several scientific fields. This combination results in socio-technical epistemic systems consisting of human and non-human actors, each equipped with different amounts of knowledge and power. Our analysis suggests that no task performed by either citizens or experts can currently be handled fully by ML. The suitability of a task for ML depends on the task characteristics and the level of knowledge required to perform it. However, machine learning may yield an epistemically stratified organization, as it requires more expert knowledge and skills while still soliciting citizen scientists’ contributions. Because tasks in a CS project are temporally ordered, epistemic agency will be ascribed to different actors at different times: the epistemic agency of certain actors will come to the fore, while that of others will be obscured. Arguably, this means that narratives of ML, AI, and CS (biography, success, optimization, ideal of science, etc.) must be understood in terms of how tasks are delegated and distributed among actors and their epistemic agency as they change over time. Whether one asks “if” a project succeeds, tells the story of “how” it does, or asks when machines take over, the answer will differ depending on when and where one examines the project.
Our descriptive results raise three main issues, which we explore through perspectives informed by Science and Technology Studies (STS) and Information Systems. Specifically, we discuss the emphasis on optimization and the ideal of science it implies, the problem of induction, and the role of citizen scientists.
5.1 Emphasis on optimization to achieve greater scale, accuracy, and speed: which ideal of science?
We examined projects in various scientific fields that use CS in conjunction with ML, such as neuroscience, sociology, oceanography, environmental science, botany, life sciences, astronomy, microbiology, and medicine. Despite the variety of scientific fields, the narratives about these projects in terms of project goals, human–machine integration, use of ML, and distribution of tasks between citizens, experts, and machines, all reveal a story of optimization, scaling up, increasing accuracy, and speeding up. These stories underpin a model of data-driven science propelled by ML affordances that make it easier to sift through massive datasets to infer patterns, while renewing inductive reasoning, and making research less hypothesis-driven (Mazzocchi
2015; Hochachka et al.
2012). At the same time, there is also a growing interest in using AI in research as a means of enabling new methods, processes, management, and evaluations (Chubb et al.
2021).
The diverse projects included in this study converge on a stereotypical narrative of how ML enables epistemic agency. Why is this narrative repeated across scientific fields? Data-driven science seems to unite projects rather than separate them when they are reported for stakeholder audiences, embedding programs of action (Latour
1992), and envisioning a desired imagined future state (Glaser et al.
2021, p. 13). In this way, a scientific future is enacted, namely, one in which high-powered computing capacities enable the utilization of largely inductive ML applications that require less theoretical pre-processing of data (Glaser et al.
2021, p. 10). The shared narrative reflects a unity of science that extends beyond the epistemic agency that shapes these hybrid formats of science. As science opens up to outsiders to train the automated systems of high-capacity machines, outsiders are encouraged to participate as inductive actors, taking part in a program and performing an ideal of science that accommodates them in the research process. This does not imply that the observations made, classified, and used to train machines are not used to test hypotheses. Citizens’ access points, however, enact a process of induction, which tends to unify the sciences more than separate them. When scientists open the way for volunteers to train machines to speed up scientific processes, this is in line with inductivism’s ideals (Kasperowski et al.
2019).
Science has been associated with at least two main epistemic uncertainties known as the problem(s) of induction. Popper (
1934) suggests that although some “problems” cannot be solved, they can be managed more or less successfully. First, observations are uncertain; this uncertainty is usually addressed and managed by different technologies, standards, and protocols. Second, finite observations cannot yield universal conclusions. Observers cannot observe everything, and protocols themselves suffer from being based on finite observations; inundated with data, protocols will not be able to cope. Nowadays, ML can learn to recognize patterns in classified data that were not anticipated in the original design, and the hybrid use of CS and ML, coupled with an abundance of data, makes it possible to manage this epistemic uncertainty (Popenici and Kerr
2017). Protocols must evolve to maintain their functions as datasets grow larger and the epistemic agency of machines increases. However, when outlining an algorithm’s biography (Glaser et al.
2021), it is critical to include how much data an algorithm can handle without losing its explanatory power and its ability to make additional classifications beyond the original gold standard. Once such new data are gathered, it will be time to retrain the machine with the help of experts or citizens.
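To make this retraining dynamic concrete, the following minimal sketch is purely illustrative: the classifier, feature values, and class labels are all hypothetical and not drawn from any sampled project. A toy gold standard of labelled observations trains a nearest-centroid classifier, which is retrained when new labelled observations arrive beyond the original gold standard:

```python
# Purely illustrative sketch: a toy "gold standard" of (feature, label)
# pairs trains a nearest-centroid classifier, which is retrained when
# new labelled observations arrive. All names and values are hypothetical.
from statistics import mean

def train(gold_standard):
    """Build one centroid per class from labelled (feature, label) pairs."""
    by_label = {}
    for x, y in gold_standard:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def classify(model, x):
    """Assign the class whose centroid lies nearest to the input feature."""
    return min(model, key=lambda label: abs(model[label] - x))

# Initial gold standard, e.g. expert-labelled observations.
gold = [(0.1, "galaxy"), (0.2, "galaxy"), (0.9, "star"), (1.0, "star")]
model = train(gold)

# New labelled data beyond the original gold standard trigger retraining.
gold += [(0.5, "star"), (0.55, "star")]
model = train(gold)
```

The point of the sketch is only that the model’s “explanatory power” is bound to its training set: each batch of new labels shifts the class centroids, so the protocol embodied in the model must be re-performed rather than fixed once and for all.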
Science that incorporates outsiders such as citizen scientists in the research process is more likely to unify sciences than to divide them. Citizen science would then reflect a scientific ideal resembling some aspects of the discipline of observations through protocols, which is reminiscent of the logical empiricists’ argument for the purification of science (Kolakowski
1972). While the 1930s called for a purification of science and research, 90 years later the key terms are computer science, artificial intelligence, machine learning, open science, inclusion, mediation, transparency, democracy, and responsible research. In the 1930s, the epistemological concern was that science must close ranks to save itself; it now appears that science is encouraged to open ranks under a banner of openness and unified investigation.
5.2 Implications of optimization for the role of citizen scientists
The problem of induction. The three examples of citizen-in-the-loop in Sect.
4.2 suggest that ML is used as much as possible at a given stage of a project’s evolution, with humans stepping in to fill in the missing pieces. Depending on the current strength of ML, the temporal order in which machines and citizens take turns working on tasks differs. If ML provides wrong outputs, citizens help retrain the model; if ML is somewhat better and makes suggestions, citizens correct the model; if ML models work well, humans take care of undetected exceptional cases. If ML is perceived as producing optimized outputs, citizens may be made redundant. This is just a conjecture for which there is no direct empirical evidence, but it is worth considering. If the outputs of ML are trusted and relied upon by experts, they may influence the socio-technical assemblage that generates the data on which the same ML is trained (Faraj et al.
2018; Pachidi et al.
2021; Lebovitz et al.
2021). It is unclear whether humans will ever become redundant as ML is applied to the CS domain and, if they do, under what circumstances ML would be considered optimal. At the present stage of development, the capabilities of ML make it a scalable complement to citizens and experts, for example by structuring large amounts of unfiltered data into information or by estimating the probability of an event occurring based on input data (Ponti and Seredko
2022).
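The turn-taking described above can be sketched as a simple confidence-based routing loop. The sketch below is a hypothetical illustration, not the implementation of any sampled project; `model_predict` and `ask_citizen` are stand-ins for a trained classifier and a volunteer interface:

```python
# Hypothetical sketch of citizen-in-the-loop routing: the model handles
# high-confidence items; low-confidence items go to volunteers, whose
# labels are also collected so the model can later be retrained.
def model_predict(item):
    # Stand-in for a trained classifier returning (label, confidence).
    return ("bird", 0.95) if item.startswith("b") else ("unknown", 0.40)

def ask_citizen(item):
    # Stand-in for a volunteer classifying one item.
    return "insect"

def classify_with_humans(items, threshold=0.8):
    labels, retraining_pairs = {}, []
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= threshold:
            labels[item] = label                # machine handles easy cases
        else:
            human_label = ask_citizen(item)     # human fills missing pieces
            labels[item] = human_label
            retraining_pairs.append((item, human_label))
    return labels, retraining_pairs

labels, retrain_data = classify_with_humans(["blackbird", "ant"])
```

Note that where the threshold is set, and who sets it, is itself a delegation of epistemic agency: raising it routes more work to humans, lowering it makes more of the classification pass through the machine unchecked.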
However, rather than considering optimization from a technical perspective, we relate it to the way the “problem of induction” from observations is handled. A long-standing belief holds that inductive approaches—now emphasised by ML, which infers general rules from observations—are plagued by inescapable and insoluble problems (Popper
1934). Humans and machines cannot solve these problems, nor can some sophisticated hybrid approach combine the two. There is an important reason for this: no protocol can handle a lot of data without encountering anomalies, and these must be controlled for. Therefore, human epistemic agency will be needed for never-observed phenomena, because protocols and gold standards have a limited epistemic reach in those cases.
In this study, we focused on technologies using ML classification methods that infer patterns from training datasets consisting of labelled input–output pairs and classify new inputs into predefined output classes. These ML methods are rule-based systems that allow ML to represent experts’ know-what knowledge (Lebovitz et al.
2021). This type of knowledge is captured, for example, in the gold-standard labels used to train and validate ML models. Experts are key in the data-intensive projects we examined. Consonant with the relational notion of agency adopted in this work, we use a relational notion of expertise. Rather than being something experts possess, we define expertise as the expert’s ability to mediate between the production of knowledge and its application (Grundmann
2017). In this sense, in our sampled projects, experts define and interpret situations and set priorities for action. As mentioned in Sect.
4.2, experts are the humans mainly involved throughout the research process and in the loop, when models fail or are unreliable. It has been said that under conditions of epistemic uncertainty, official expertise and lay expertise should not be seen as antagonistic but as complementary (Funtowicz and Ravetz
1990). It is these different types of expertise, including that embedded in ML algorithms, which interact with each other to form an assemblage in a "program of action" (Latour
1992, p. 152). Therefore, we suggest viewing ML optimization as a constructed assemblage in which citizens, experts, and machines play different roles and exert epistemic agency at different points in time to pursue a CS project’s epistemic goals.
Trust in ML outputs and redundancy of citizens. It is worth considering that if ML tools are trusted and taken for granted, and experts rely on seemingly accurate ML outputs over volunteers (including expert volunteers), these outputs may influence the socio-technical assemblage that generates the data on which the tools are trained (Faraj et al.
2018; Pachidi et al.
2021; Lebovitz et al. 2021). Therefore, to consider machines as unproblematic and their output as immutable mobiles (Latour
1990) implies that citizen scientists will play only a minor role in the long run.
One can speculate that, by automating tedious tasks, ML in CS will allow citizens to focus on higher-level work. However, as Franzoni et al. (
2021) argued, “to the extent that such work is limited in volume or requires additional knowledge and resources that pose barriers for crowd participants, there is a risk that CS becomes less inclusive by focusing primarily on expert volunteers” (p. 17). While this may not be a concern from a ‘productivity’ perspective, it may limit CS’s potential to advance the non-scientific goals highlighted by the “democratization view” (Franzoni et al.
2021). Relying mainly on expert volunteers could reduce the diversity of current and future citizen scientists by diminishing their range of motivations and disengaging those citizens who want to contribute to science in their spare time and have fun, help science, or spend time outdoors (Geoghegan et al.
2016). Deriving personal meaning and value from participating is important to citizen scientists, who typically volunteer time and effort driven by intrinsic or social motivations and not for financial compensation (Sauermann and Franzoni
2015). However, how CS projects should be designed to actually cater to diverse needs and expectations still seems to be very much at a conjectural stage (Kasperowski and Hagen
2022). Suggestions to avoid disengagement and redundancy include allowing participants to contribute their task of interest even if the task can be fully automated so that their contribution can help improve ML performance; or incorporating new forms of citizen contributions to fill the gaps created by automation (Lotfian et al.
2021).
While some may regard these suggestions as motivational, they can also be seen as “over-engaging”, bordering on the unethical. In the debate over the ethical problems evoked by citizen science, “over-engagement” refers to being available for free work for science indefinitely (Kasperowski and Hagen
2022). Therefore, we argue that the opposite of becoming redundant can actually happen with the growth of ML in CS. The problem of induction may repeatedly call upon humans, both experts and citizens, in the loop of the process. The fear of AI and ML creating “undemocratic”, hierarchical, or epistemically stratified projects must of course be closely observed. However, from the perspective of epistemic agency, hierarchy or epistemic stratification could be said to occur constantly in projects at a microlevel, as different actors are endowed with more epistemic power at different moments during the course of a project.
5.3 Participation of citizens—or the lack thereof—and the use of ML
The AI industry is showing interest in developing solutions to global problems using AI in combination with citizen scientists. An example is a recent partnership between a team of IBM data scientists and the UN Environment Programme (UNEP) to overcome the challenges associated with citizen data and create a unified, global baseline dataset for measuring plastic pollution in line with UNEP’s Sustainable Development Goal 14 (Clemente
2020). Sloane (
2020) contends that ML extends the agenda of the tech industry, which is focused on scale and extraction. The use of ML may exacerbate an “extractive” approach to citizen participation (Sloane
2020), by which data collection and classification remain the primary ways for volunteers to contribute to the scientific goals of CS projects. The increasing use of ML in CS classification projects must therefore be related to issues of power dynamics and inequalities in the engagement and retention of volunteer participants.
CS aspires to be inclusive in terms of age, gender, ethnicity, geography, and social class. However, CS participation is currently skewed demographically and geographically, with biases in age, gender, ethnicity, and socioeconomic status (Pateman et al.
2021). Participants in long-term projects, such as eBird at the Cornell Lab of Ornithology, have been shown to be predominantly highly educated, upper-middle class, middle-aged or older, and white (Purcell et al.
2012). The results concerning gender composition are mixed; however, some projects show a strong bias towards men (Hobbs and White
2012; Crall et al.
2012; Raddick et al.
2009; Wright et al.
2015). Other studies indicate that some projects offer opportunities for disadvantaged groups not otherwise available (Khairunnisa et al.
2021). There is an ethical imperative to involve a diverse group of participants to inform CS projects and provide access to their benefits (Mor Barak
2020).
CS participation is skewed not only in terms of sociodemographics but also in terms of actual contributions: most are made by a few participants (Seymour and Haklay
2017). This lack of diversity in CS reflects different motivations and capacities and raises concerns about the representativeness of data and whether individual, societal, and environmental benefits are evenly distributed (Pateman et al.
2021). Diversity in participation remains a challenge, which the use of ML may exacerbate. The involvement of ML seems to be a case of what we call “designing for”, where citizens are not integrated into the design process from the beginning but are relied on to make the model (ML design) successful ex-post. It has been suggested that the involvement of citizens through CS, particularly during the research design phase, may help reduce bias in data and training annotations for AI, enable public shaping of AI, and foster a lifelong interest in science (Shanley et al.
2021). The long-standing interest among STS scholars in whether new technologies solve problems or merely manage and displace them, rendering some problems, actors, and their agency invisible or redundant, seems to reappear as ML and AI are combined with CS (Glaser et al.
2021).
We are left to wonder whether inequality in participation detracts from the promise to make science more democratic, both in terms of including more diverse people in doing science and in making science better aligned with the public interest (Strasser and Haklay
2018). However, it would run contrary to some current proponents of CS to claim that there are “objective” public interests that science can tap into continuously (cf. Brown
2009). This standpoint seems all too often to inform voices from both policy and science when expectations and social imaginaries of the availability and readiness of citizens to be mobilized into CS projects are produced. Our suggestion is that the pursuit of such interests must be viewed as acts of performance; they are made and cannot be taken for granted.
The non-linearity of the HITL. In our ANT-inspired view, the constructed assemblage including citizens, experts, and ML is a complex effect resulting from mutual interaction and feedback loops, as exemplified in Figs.
5,
6, and
7. These assemblages can be seen as complex triangular systems of citizens-experts-technology in which relationships and loops need to be repeatedly “performed” by all the actors involved, or the assemblages dissolve. ANT is not the only framework that attends to these connections and interactions. Other theoretical endeavors, such as cybernetics (Wiener
1948), have explored complex feedback processes within the networking and self-organization of systems. Both ANT and cybernetics aim to conceptualize complexity; both are sensitive to the hybrid nature of phenomena, and both emphasize system effects. However, as pointed out by Fenwick and Edwards (
2010), a core difference between cybernetics and ANT is the latter’s orientation toward contingent practices and multiplicity. ANT provides a conceptual framework for analysing how the diverse entities of a classification project—including technologies—take a role through specific situated material–semiotic redistributions of expertise and epistemic agency. Uncertainty characterizes contingent practices. Humans are unlikely to act as “controllers or processors” of classifications in a linear way in the loops, as they may not repeat or reproduce exactly the same actions given the same input. Even such seemingly fixed things as standards and protocols can be uncertain in practice. The loops we exemplified are not expected to be seamless sequences in which ML fails to identify a pattern, citizens identify it, and experts feed the correct answer into the training data. These are expectations and possibilities that ML will perform its tasks, but there is no guarantee that citizens will unquestioningly take on the assigned epistemic role and dutifully engage in checking errors or filling gaps. Nor, for that matter, does it mean that ML itself will comply with experts’ wishes. Checking the correctness of a classification can happen in multiple ways, depending on the material–semiotic network in which this task is entangled. Tracing the material–semiotic assemblage of checking data classification could reveal the continuity from one version to the other and thus the simultaneous singularity and multiplicity of the assemblage.
5.4 Limitations
This study has two main limitations. First, we relied solely on secondary sources without incorporating other methods (e.g., interviews) that could help reduce bias and compensate for the dearth of documents and their incompleteness. Our study may be limited by the use of the narratives reported in the documents, which represent the authors’ perspectives. Since most research papers tend to report on successful rather than unsuccessful projects, we are likely to have been exposed to mostly successful divisions of labour instead of those that did not work. Being aware of this potential bias, we have been careful not to use documentary evidence as a stand-in for other kinds of evidence that we could not produce using this method.
Second, our study may be limited by the small number and type of projects examined. The sample we used is purposive. Note that the selected projects reflect those that were documented at a particular moment in time, rather than being a truly representative sample of the population.