Assessing the substantive outputs of deliberative minipublics: a categorical framework for policy recommendations
- Open Access
- 01.12.2025
- Research Article
Abstract
Introduction
Over the last two decades, deliberative minipublics (DMPs), such as citizens’ assemblies (CAs), have been gaining popularity as a way to supplement traditional political decision-making, especially across the Western world (OECD, 2020). In particular, they can be considered an ever more significant element of the polycentric landscape of climate policy-making (Jordan et al., 2018). Minipublics are composed of randomly selected citizens who are tasked with developing policy recommendations on a specific issue after being presented with balanced information on the topic and engaging in deliberations with one another. They are claimed to address many of the obstacles faced nowadays by liberal democracies, especially to democratize the policy-making process and empower the perspectives of regular citizens. Moreover, they are believed to heighten the epistemic value of public decisions (Carson & Martin, 1999; Cohen, 2009; Estlund & Landemore, 2018; Setälä, 2017) by helping to overcome complexity (Dryzek, 1990) and addressing the cognitive challenges of democratic citizenship (Warren & Gastil, 2015). Finally, DMPs are expected to develop more efficient policies, thus enhancing the output legitimacy of policy-making (Papadopoulos & Warin, 2007). According to Goodin (2017), their design which combines deliberation and voting represents “the epistemically best form of democracy” as it offers the highest chances of finding the best among available responses to political issues.
On the other hand, we know that DMPs’ value for the broader sociopolitical system is ambivalent and depends on both the internal quality of the process and its fit within the context of the system as a whole (Curato & Böker, 2016; Bächtiger et al., 2020; Vrydagh, 2023). Moreover, research on other decision-making processes reveals that the perceived legitimacy of the process is strongly influenced by the favorability of the outcome (Christensen et al., 2020). However, despite CAs’ increasing relevance in policymaking, little is known about the contents of their recommendations. Therefore, we argue that they deserve more systematic attention and aim to establish this link with the following study.
As of now, few studies have examined the recommendations that result from the existing deliberative processes, and they have not proposed universally applicable ways of comparing them across cases. However, such frameworks have already been developed by comparative analyses of policy outputs produced in traditional policy-making practices and for some other types of advisory bodies. In this article, we utilize the existing approaches and adapt them to the particular nature of outputs (i.e., recommendations) developed by CAs.
To do so, we derive the applicable categories for analyzing the contents of policy outputs and policy advice from the existing literature and juxtapose them with an empirically derived list of categories developed for analyzing DMPs’ recommendations. The latter was created based on the nine interviews and 2 focus groups we conducted with various types of actors (citizens, experts, stakeholders, and public officials) engaged in 9 CAs organized in Poland. We asked them to assess the recommendations of the selected CA cases to determine which of their features they consider relevant. Finally, we applied these categories to pilot code a portion of the recommendations collected by us in a base of all recommendations produced by CAs in Poland.
As a result of our study, we provide a framework for categorizing policy recommendations produced by citizens’ assemblies (as well as other DMPs or any similar participatory processes). The framework will allow researchers in the field of democratic innovations to analyze the contents of DMPs’ substantive outputs and compare them across cases. The framework’s flexibility allows for adjusting it to various research questions, e.g., focusing only on their form and content or how they are assessed and perceived by different actors.
In the following sections of the paper we describe the existing literature on minipublics’ outputs and policy outputs and specifically explain the methodology used in this research. Next, the main findings of the study are provided: we present a categorical framework for assessing deliberative minipublics’ policy recommendations and describe briefly each of the identified categories. Finally, in the last section, we discuss our findings, including exploring areas for further studies on the subject.
Literature on minipublics’ outputs
Over the last decades, research on DMPs has focused on analyzing their internal processes, quality of deliberation, or their functions in the broader political system (e.g., Bächtiger et al., 2020; Curato & Böker, 2016). Minipublics’ outputs, i.e., their recommendations, have thus far received surprisingly little attention. They have been mainly used for measuring the policy impacts of DMPs, often conceptualized as the congruency between policy recommendations and public policies (Vrydagh, 2022). Below, we discuss what is known from the few studies that have paid more specific attention to CAs’ outputs and point out some of their limitations. In their analysis of factors relevant to the implementation of proposals from participatory processes, Font et al. (2018) found that proposal-level variables have the strongest explanatory power for the proposals’ fate. Their results demonstrate that recommendations that are more likely to be implemented do not challenge the existing practices of an administration, are less costly, receive external funding, and are supported by the local administration and civil servants. On the other hand, the number of proposals developed by a participatory process was not relevant for their implementation. The study required data on the more or less challenging nature of recommendations to be collected through interviews, and it only included Spanish cases.
Lage et al. (2023) analyzed the content of policy recommendations developed by eleven national- and EU-level CAs on environmental issues that took place in Europe by 2022. The authors found that the share of policies aligned with the so-called sufficiency strategy to achieve sustainability goals (Jungell-Michelsson & Heikkurinen, 2022) was considerably higher in CAs’ recommendations than in the existing public policies. These policy proposals were also more equally distributed between different policy sectors and more often included regulatory policy instruments than was the case in the analyzed policy documents. Therefore, citizens’ assemblies’ recommendations were found to be substantively and consistently different from policies developed by traditional policy-makers. However, the study identified significant differences between countries, which indicates the need for further research to explain the sources of this variation. Since each country was represented by only a single CA case, the differences could potentially be traced not only to local contexts of policy support but also to internal differences between the analyzed minipublics’ designs1.
For Stark et al. (2021), citizens’ deliberation can be valuable to policy-makers even when the process is consultative since it results in an observable change in policy proposal support, which is lacking in more traditional, non-deliberative forms of public consultations, and thus offers more in-depth insights about the public opinion. The preference transformation was tracked with a pre- and post-survey asking the participants about their support levels for the four predetermined policy measures. However, in real-life DMPs, participants usually develop the policy recommendations themselves, sometimes based on experts’ or stakeholders’ suggestions, rather than vote on preformulated measures put up for deliberation. This fact highlights the need for studies on DMPs’ outputs developed in a pertinent manner, i.e., by the citizens themselves, rather than merely measuring their support for the available alternatives.
It can be expected that elements of the internal design and content of deliberation might be relevant for its outputs, i.e., the formulation of and support for policy recommendations. Notably, Muradova et al. (2020) showed that expert and non-expert speakers’ effectiveness of communication, their gender, and the repetition of proposals by several speakers played a role in the final uptake of policy proposals by participants of the Irish Citizens’ Assembly. At the same time, Leino et al. (2022) found that neither the experts’ field of specialization nor the order of expert hearings impacted the development of participants’ views. Duvic-Paoli (2022) points out that the particular ways in which the work on recommendations in CAs’ processes is organized can have consequences for the nature of the resulting outputs. Among others, it can lead to incoherence between different recommendations, reinforce the existing sectoral silos in climate policies, or fail to provide decision-makers with clear or precise guidelines for policymaking. However, the paper analyzes the overall contributions of the analyzed CA cases to climate governance in broad terms and does not systematically investigate the contents of their recommendations. Therefore, more research is still needed to help us better understand the relationship between deliberative processes and their substantive outputs.
Meanwhile, providing recommendations on public policies can be considered the main aim of DMPs (OECD, 2020). It is, therefore, vital to investigate the recommendations they provide and to develop tools for comparing them across cases. In particular, comparing recommendations is key to understanding what issues they address, what they propose and to whom, and how they influence existing policies. Analyzing these issues can provide insights into how much DMPs actually change the dominant paradigms and to what extent they represent a solution to urgent climate and environmental challenges. To develop an analytical framework aimed specifically at recommendations of deliberative minipublics and similar processes, we build on literature from other subfields of political science that have been analyzing policy outputs, i.e., literature on comparative public policy and on policy advice.
Literature on policy outputs
In the broad sense of the policy cycle, we do not consider recommendations from participatory consultative processes, such as citizens’ assemblies, to represent policy outputs (Knill & Tosun, 2012; Steinebach, 2023), as they do not take effect as adopted regulations until incorporated into legislation by policy-making bodies. Instead, as proposals that reflect citizens’ preferences, they may feed into the policy cycle as a form of policy input. In more specific terms, however, recommendations represent the substantive outputs of individual deliberative processes, being the direct result of their work. What these two types of outputs have in common is that both involve policy content, such as certain policy instruments and their settings. In this article, we do not focus on the function of recommendations in policy cycles but rather on their content, and we propose a framework for their systematic comparison.
To this end, we believe it is useful to review the existing literature on policy outputs. It should be noted that, over the last few decades, issues related to environmental policy have been, along with urban planning and health policy, among the most common topics of deliberative minipublics (OECD, 2020). Therefore, when developing a method for a systematic comparison of the contents of CAs’ recommendations, we turn in particular to the literature on the outputs of environmental policy.
Analyzing policy output is a common challenge in policy research, stemming from the lack of a common definition of the object of comparison, the variety of measurements used, and the limited comparability of findings (Schaffrin et al., 2015; Steinebach, 2023). Due to the variety of existing conceptualizations, we base our framework on selected articles that propose categorizations relevant to different strands of research on environmental policy outputs.
First, we build on the comprehensive measure developed by Schaffrin et al. (2015) who joined three established perspectives on climate policy outputs: a taxonomy of ends and means of policy (Howlett & Cashore, 2009), the conceptualization of policy density and intensity (Knill et al., 2012), and types of policy instruments (Bemelmans-Videc et al., 1998; Hood, 2007; Macdonald, 2001; Steurer, 2011). The authors suggest different approaches for analyzing single policy instruments (focus on policy intensity, i.e., types of instruments and their ‘objectives’, ‘settings’, and ‘calibrations’) and for analyzing whole policy portfolios (where the number of policy instruments, i.e., policy density, is added up and weighed against the intensity measures). This suggests two possible courses of action for analyzing CA outputs: at the individual recommendation level or their whole final lists of recommendations as policy portfolios. Moreover, Schaffrin et al. point out the limitations of the existing approaches. First, earlier studies comparing the intensity of policies were primarily based on expert evaluations or media coverage, so the validity of these studies depended on the applied data collection methods. Second, they have focused mainly on the national and top-down levels of climate policy-making, demonstrating the need for research on other institutional levels of policy output.
Second, we analyzed the measurement approach for policy complexity initially proposed by Hurka et al. (2022) and further developed by Hurka et al. (2024). The measurement was developed to analyze a specific type of policy output, namely legal acts. The frameworks proposed by the authors consist of three aspects of legislative text that translate into the complexity of a policy result: structural, linguistic, and relational. Each of these aspects includes several categories of measurement, such as technicality, formulation, readability, signal uniformity, depth, embeddedness, and interdependence. Importantly, these categories do not relate to the content of the analyzed text. Instead, they rely on indicators concerning the language (e.g., its simplicity, the number of definitions used, or the uniformity of the employed terminology), the text structure (e.g., the number of levels of sections), and the number of external and internal references. Thus, the described framework can serve as a reference point for analyzing the language layer of CAs’ recommendations.

While policy-making bodies create official policy documents, advisory bodies formulate recommendations aimed at policy-makers. These recommendations represent a distinct type of process output and lend themselves to a different type of analysis. To employ the findings of the policy advice literature in building our framework, we rely on a study by Dudley et al. (2022). The authors analyzed recommendations made by the UK Climate Change Committee (CCC), a body established by the 2008 UK Climate Change Act to provide policy advice to policy-makers in national and devolved legislative bodies. Dudley et al. developed a list of categories for coding the individual recommendations found in the CCC’s annual reports. The categories include having an addressee, a sectoral focus, targets, action points (i.e., clearly formulated future actions), and being repeated across reports published over time. Moreover, they apply Fischer’s (1980, 1990, 1995) framework to assess the ambitiousness of policy recommendations, i.e., to what extent they challenge the policy status quo. One of their conclusions is that many of the CCC’s recommendations have fallen short of the normative guidance on what elements policy advice should consist of in order to be successful. However, there is a need for further research on why advisory bodies develop their recommendations the way they do and what factors influence the impacts of policy advice.
Despite the similarities with other types of advisory bodies, the distinctiveness of deliberative minipublics calls for a separate approach to analyzing their recommendations. An example of such a framework is the classification of minipublics’ recommendations proposed by Poole and Elstub (2025), which can be used to analyze the relationship between various types of recommendations and the policy impact of these minipublics. The classification was based on the earlier work of Benton and Russell (2013), who created a similar categorization for British parliamentary select committees’ recommendations. It included four categories: target, type of action required, measurability, and level of policy change. According to Poole and Elstub’s (2025) classification, minipublics’ recommendations can be targeted at a parliament, central or local government, industry, civil society, or the public. The recommendations may require these actors to undertake actions such as changes to government policy (or policy and practice), implementation of a certain approach (Poole & Elstub, 2025), legislation, funding, guidance, or disclosure of information by the government (Benton & Russell, 2013). The category of measurability, in turn, aims to assess the extent to which it is possible to determine whether a recommendation has been met, using a scale of easy, medium, and impossible. Finally, the level of policy change evaluates the degree of change expected by a recommendation in relation to the existing policy, categorized on a scale of no/small, medium, and large degree of change.
Although the described classification was developed to analyze the minipublics’ recommendations, its scope is limited, as it does not recognize many of the categories proposed in the aforementioned classifications applied to policy outputs. Therefore, the framework we present in the results section represents an advancement in linking the most recent findings on deliberative minipublics to public policy, which they increasingly co-create through their recommendations.
Methods
Our research focused on analyzing the above-mentioned literature on policy outputs alongside empirical examination of the CAs’ recommendations. In our study, we compared various categorizations found in the literature with the categories identified through the empirical analysis.
The empirical part focused on the recommendations of the Polish CAs organized in 2016–2023. In total, we analyzed nine CA cases: one from each city in Poland where they were commissioned (Gdańsk, Lublin, Łódź, Wrocław, Warszawa, Poznań, Kraków, Rzeszów) and the single national CA. In cities where more than one assembly had been organized, we randomly selected the case to be analyzed.
As the empirical part of the study, we first conducted nine in-depth interviews with three groups of actors who participated in each of the analyzed CAs and had expertise in their topics: public officials, non-state stakeholders, and independent experts (three interviews per group). Additionally, we organized two focus group interviews (FGIs), with one group of former assembly members from Kraków and one control group composed of demographically comparable residents of the same city who never took part in a CA. Conducting an FGI in this city was possible because former CA participants’ contact information remained available through cooperation with the organization that conducted the CA process and the relatively short time that had passed since the assembly concluded.
In the semi-structured individual interviews, each interviewee was asked about their assessment of the recommendations developed in the CA in which they participated, and of one additional case of a CA organized in a different Polish city, which was randomly assigned to them. They were sent both lists of recommendations in advance and handed a printout during the interview. In the case of FGIs, both groups of participants were given only the list of recommendations produced by the CA locally organized in their city. Both the individual interviewees and the FGI groups (working in smaller groups of 2–3 participants) were asked to identify the recommendations they considered to be of the highest and lowest quality, based on their own perspective, and explain the reasoning behind these choices. The interviews were conducted using prepared semi-structured scripts. All of the analyzed lists of recommendations came from the CAs’ final reports or other post-process documents.
Interviewing a variety of different stakeholders with the same set of questions was justified by theoretical claims concerning the representativeness of minipublics. In order to address the existing inequalities and improve the epistemic value of decision-making, deliberative democracy theory aims to guarantee the representation of various social groups in decision-making. This goal is reflected in the design of deliberative minipublics, which engage various types of actors representing different perspectives and interests. Therefore, we aimed to develop our framework based on assessments expressed by various groups of actors, representing different social groups and perspectives on policies and the policy-making process.
After transcribing the interviews, we inductively coded the transcripts, searching for the categories that the interviewees used to discuss and assess the CAs’ recommendations. Based on that, we created a list of categories and organized them into supercategories, thus drafting an initial categorical framework.
Afterward, based on the lists of recommendations from all analyzed cases, we created a database of 656 recommendations. We then randomly selected 10% of them (ensuring a proportional share of recommendations from each CA case) to test the framework by pilot-coding them against the initial list of categories. The pilot coding of 65 recommendations revealed the (non-)applicability of some categories, the need to modify (e.g., broaden) certain definitions, and the necessity of adding categories that were missing.
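The proportional sampling step described above can be sketched as follows. This is a minimal illustration, not the authors’ actual procedure: the case names and per-case counts are invented, and the rounding rule is our assumption.

```python
import random

def stratified_sample(recommendations, fraction=0.10, seed=42):
    """Draw a random sample with a roughly proportional share from each CA case.

    `recommendations` maps a case name to its list of recommendations.
    """
    rng = random.Random(seed)
    sample = []
    for case, items in recommendations.items():
        # round() keeps each case's share close to `fraction` of its size
        k = max(1, round(len(items) * fraction))
        sample.extend((case, r) for r in rng.sample(items, k))
    return sample

# Illustrative counts only; the real database held 656 recommendations.
db = {
    "Gdansk": [f"G{i}" for i in range(80)],
    "Krakow": [f"K{i}" for i in range(96)],
    "National": [f"N{i}" for i in range(60)],
}
pilot = stratified_sample(db)
print(len(pilot))  # roughly 10% of all recommendations, spread across cases
```

Drawing per stratum (case) rather than from the pooled database guarantees that small assemblies are not accidentally excluded from the pilot.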
Finally, we merged the categories identified in the theoretical literature with those inductively identified through interviews and our pilot coding of the recommendations, resulting in an abductively developed framework that we present below.
To test the reliability of the developed categories, the core categories of the final framework were applied again to a fresh sample of 10% of the recommendations, selected randomly. The selection followed the same procedure as for the pilot coding, but the recommendations used in the previous sample were excluded. Intercoder reliability was calculated as the share of recommendations coded identically by both coders among all 65 recommendations included in the sample. In cases where a category did not apply to all individual recommendations (e.g., the subordinate categories 1aa–ab. and 3a–k., as well as 4aa.), the denominator was adjusted accordingly. Moreover, for the two categories with non-binary values, i.e., Preciseness (5a.) and Policy sectors (4c.), we measured the continuous degree of agreement between the coders rather than a binary value. Finally, the Complexity (5e.) category was excluded from the calculation due to its aggregate nature, as was the complementary categories section, which addresses subjective quality assessments rather than replicability.
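The percentage-agreement calculation with per-category adjusted denominators can be sketched as below. The data layout and toy codings are our illustration, not the study’s data, and the sketch covers only binary categories (the continuous measures for 5a. and 4c. would need separate handling).

```python
def percent_agreement(coder_a, coder_b):
    """Share of recommendations coded identically by two coders, per category.

    Each coder maps a category code to {recommendation_id: 0/1 value}.
    A recommendation absent from a category (not applicable) is dropped
    from that category's denominator, mirroring the adjusted denominators.
    """
    results = {}
    for cat in coder_a:
        ids = set(coder_a[cat]) | set(coder_b.get(cat, {}))
        if not ids:
            continue
        same = sum(
            1 for i in ids
            if coder_a[cat].get(i) == coder_b.get(cat, {}).get(i)
        )
        results[cat] = same / len(ids)  # denominator adjusted per category
    return results

# Toy example: category "2b" applies to all four items, "1aa" only to two.
a = {"2b": {1: 1, 2: 0, 3: 1, 4: 1}, "1aa": {1: 1, 3: 0}}
b = {"2b": {1: 1, 2: 0, 3: 0, 4: 1}, "1aa": {1: 1, 3: 0}}
print(percent_agreement(a, b))  # {'2b': 0.75, '1aa': 1.0}
```

Simple percentage agreement does not correct for chance agreement; a chance-corrected coefficient such as Krippendorff’s alpha would be a natural extension.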
This final step of the research did not result in any additional modifications to the framework. The conclusions from the reliability test are discussed in the last part of the results section.
Figure 1 presents an overview of the research steps conducted in the study.
Fig. 1
Visual representation of the research process
Results
The result of our analysis is the categorical framework presented in Table 1 below. The framework comprises 32 core categories, organized into five groups (or supercategories), and two groups of complementary categories.
The categories are applied to an individual CA recommendation using a binary nominal scale, i.e., each category can be assigned a “yes” or “no” (or 0–1) value (with the exception of categories 4c. and 5e., which have different specified values). By design, any number of categories within a supercategory group (even all of them) can receive either value. The categories are generally independent, with three exceptions: (1) categories 1aa.–1ab. specify how a recommendation addresses the CA’s policy goal(s) and thus apply only if it does (i.e., 1a. has a positive value); (2) categories 6aa.–6ab. apply only when a proposed policy solution is new (i.e., 6a. is positive); they evaluate its relation to other policy measures and are mutually exclusive; (3) the whole supercategory of intervention types (3.) applies only when a recommendation suggests any policy instruments (i.e., 2b. has a positive value). We describe each of the categories in more detail below.
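The dependency rules above can be expressed as mechanical consistency checks on a coded recommendation. The sketch below is our illustration: the category identifiers follow the framework, but the data layout and helper function are assumptions, not part of the study.

```python
def validate_coding(rec):
    """Check the framework's dependency rules for one coded recommendation.

    `rec` maps category codes (e.g. "1a", "1aa", "3c") to 0/1 values.
    Returns a list of rule violations (empty if the coding is consistent).
    """
    errors = []
    # Rule 1: 1aa/1ab specify how the goal is addressed, so they need 1a = 1
    if (rec.get("1aa") or rec.get("1ab")) and not rec.get("1a"):
        errors.append("1aa/1ab set although 1a is negative")
    # Rule 2: 6aa/6ab apply only to new solutions (6a = 1) and exclude each other
    if (rec.get("6aa") or rec.get("6ab")) and not rec.get("6a"):
        errors.append("6aa/6ab set although 6a is negative")
    if rec.get("6aa") and rec.get("6ab"):
        errors.append("6aa and 6ab are mutually exclusive")
    # Rule 3: the intervention-type supercategory (3a-3k) requires 2b = 1
    instruments = [f"3{c}" for c in "abcdefghijk"]
    if any(rec.get(c) for c in instruments) and not rec.get("2b"):
        errors.append("intervention type set although 2b is negative")
    return errors

# A consistent coding: goal-compliant, a measure of type 3e, new and improving
ok = {"1a": 1, "1aa": 1, "2b": 1, "3e": 1, "6a": 1, "6ab": 1}
print(validate_coding(ok))  # []
# An inconsistent one: subordinate categories set without their parents
bad = {"1aa": 1, "2b": 0, "3c": 1}
print(validate_coding(bad))
```

Running such a check after each coding pass would surface dependency violations before they enter reliability calculations.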
Table 1
A categorical framework for assessing deliberative minipublics’ policy recommendations. The shaded rows signify categories that are dependent on an earlier category being present
| Section | Supercategory | Code | Category |
|---|---|---|---|
| Core categories | Goal | 1a. | Complies with the policy goal(s) |
| | (if 1a. = yes) | 1aa. | Addresses the causes of the problem |
| | (if 1a. = yes) | 1ab. | Addresses the consequences of the problem |
| | | 1b. | Aims for different policy goal(s) |
| | Function | 2a. | Formulates goals or objectives |
| | | 2b. | Formulates policy measures |
| | | 2c. | Specifies the implementation of policy measures |
| | Type of intervention (only if 2b. = yes) | 3a. | Prohibitions or orders |
| | | 3b. | Execution of existing regulations |
| | | 3c. | Fiscal instruments (e.g., fees and taxes) |
| | | 3d. | Incentives for non-public actors |
| | | 3e. | Education, information, communication |
| | | 3f. | Standards and best practices (incl. certification, voluntary agreements) |
| | | 3g. | Management (incl. monitoring, conducting analyses, creating strategies, plans, etc.) |
| | | 3h. | Institutional changes (incl. changes to participatory mechanisms) |
| | | 3i. | Cooperation with other entities (incl. lobbying) |
| | | 3j. | Public investments (incl. investments in existing infrastructure) |
| | | 3k. | Public services (e.g., financial, knowledge, technology) |
| | Scope of intervention | 4aa. | Temporal scope: Long-term solutions |
| | | 4ab. | Temporal scope: Long implementation perspective |
| | | 4b. | Holistic scope |
| | | 4c. | Policy sector(s) |
| | | 4da. | Addressee: Local authority |
| | | 4db. | Addressee: Regional authority |
| | | 4dc. | Addressee: National authority |
| | | 4e. | Universal beneficiary |
| | | 4f. | Universal burdened actors |
| | Linguistic form | 5a. | Precise |
| | | 5b. | Comprehensible & correct |
| | | 5c. | External references to research, data, technical standards, etc. |
| | | 5d. | External references to values |
| | | 5e. | Complexity |
| Complementary categories | Relations | 6a. | A new solution (in relation to existing regulations, practices, and plans) |
| | (if 6a. = yes) | 6aa. | New and contradicting existing measures |
| | (if 6a. = yes) | 6ab. | New and improving (e.g., clarifying, integrating) existing measures |
| | | 7a. | Duplicating other recommendations |
| | | 7b. | Contradicting other recommendations |
| | | 7c. | Total number of recommendations |
| | Quality | 8a. | Adapted to the local context |
| | | 8b. | Based on everyday experiences |
| | | 8c. | Effective |
| | | 8d. | Justified |
| | | 8e. | Priority |
| | | 8f. | Radical measure |
| | | 8g. | Realistic measure |
| | | 8h. | Up-to-date (vs. outdated) measure |
| | | 8i. | Verifiable |
Figure 2 indicates the source from which each category was derived: the literature, the interviews, both, or the authors’ conceptual work.
Fig. 2
Sources of individual categories in the framework. Elements shown in italics indicate overarching categories
Goal
This group of categories was developed based on the interviews only, as it was not present in the literature on climate policies or other advice-giving bodies and marks an aspect specific to citizens’ assemblies. Traditional decision-making bodies, as per the principle of legality, act on the basis of specific legal foundations that precisely outline their policy-making activities. The equivalent in the case of any citizens’ assembly is its guiding question (also referred to as the CA’s remit), which delineates the thematic scope of its work and, thus, the mandate of its recommendations. The difference (discussed further in the conclusions) is that not all CA recommendations will adhere to this scope, which creates the need for the following categories.
When a recommendation complies with the policy goal(s) of the process (1a.), it can address the causes of the problem (1aa.), e.g., aim to limit the sources of transport- and domestic heating-related air pollution, or address the effects of the problem (1ab.), e.g., by issuing air pollution alerts to communicate to the local population that they should avoid going outside to minimize the negative health impacts.
Depending on the process design and its particular course, the final list of recommendations might contain items that are only indirectly related to the exact remit. They can either concern different policy fields than designated (e.g., by addressing the city’s greenery rather than transportation) or be incompatible with one of the policy goals of the process (e.g., by encouraging individual car use in the city center while aiming for sustainable public transportation).
The categories are not mutually exclusive. A recommended intervention might aim for different policy goal(s) (1b.) in addition to addressing the main one. Decreasing the ticket prices for certain populations might be a climate change mitigation measure, as well as improve the accessibility of public transport. However, if a recommendation aims for different policy goals only, then it must be considered non-compliant with the mandate.
Function
The categories of functions signify whether a recommendation formulates a goal for policy and to what extent it suggests specific ways of attaining it. They correspond most clearly to the three basic policy components initially proposed by Hall (1989, 1993), i.e., goals, policy instruments, and calibrations, which were further specified by Howlett and Cashore (2009, 2014) to consist of six components in total: goals, objectives, settings, instrument logic, mechanisms, and calibrations. Our categories of function also incorporate the policy intensity measurement proposed by Schaffrin et al. (2015), who operationalized Howlett and Cashore’s objectives, settings, and calibrations to more specifically reflect the amount of resources mobilized for a given policy.
Howlett and Cashore (2009, 2014) distinguish between three different types of ends (goals, objectives, and settings) based on their varied level of abstraction. However, we know from our interviews that the actors engaged in CAs paid attention to whether recommendations formulated any policy ends and pointed out the directions for city policies, but they did not analyze such ends’ level of abstraction. For this reason and in the interest of simplicity and practicality of our framework, we opted not to differentiate between abstraction levels and propose a single category to verify whether a recommendation formulates goals or objectives (2a.). The formulation of a policy goal as more precise targets, e.g., a certain percentage of emission reduction and timescales by which it should be reached, was also one of the main categories analyzed in CCC’s policy advice by Dudley et al. (2022).
Several of our interviewees reflected on the difference between policy goals and means. They considered recommendations that not only set directions for policy (the “what?”), but also suggest with what means one should pursue these aims (the “how?”), to be more actionable and useful for the implementing authorities. This function is captured by the category of formulating policy measures (2b.). Following Howlett and Cashore’s taxonomy, this category encompasses both the concrete mechanisms (i.e., types of policy instruments, such as taxes, see next section) and certain more abstract instrument logic, as in “ticket pricing encouraging traveling with collective transport rather than cars” (Rzeszów, 348–349).
The third possible function of a recommendation is to specify the implementation of a policy measure (2c.) by adding implementation guidelines such as certain applicable rules, city areas to start with, etc. This corresponds to what Howlett and Cashore understood as calibrations. It is also the category encompassing Schaffrin et al.’s (2015) intensity measurements of implementation and its monitoring.
Policy measures, as well as their specified implementation, are where Dudley et al. (2022) would identify the recommendations’ action points, which they define as containing an active verb, such as “plan”, “evaluate”, “monitor”, etc.
Type of intervention
Based on the earlier literature comparing policy tools, Schaffrin et al. (2015) distinguish five types of policy instruments: regulatory instruments, market-based instruments, soft measures, framework policy, and public investment. Such a limited list of categories is undoubtedly useful for studies measuring aggregate climate policy outputs and comparing them across years or countries. However, each of the five broad categories covers a considerable variety of instruments that can easily be overlooked. For example, when citizens recommend a market-based regulation, do they endorse introducing fiscal instruments such as new fees or taxes (3c.) on unwanted behaviors, or do they rather gravitate towards incentives for non-public actors (3d.) for the desired actions? Benton and Russell (2013) and Poole and Elstub (2025) also distinguish more specific types of interventions, such as legislation, guidance, changing the addressee’s approach, government policy (and practice), funding, or disclosure of information. Thus, to analyze citizens’ assembly outputs, we propose an approach that is more granular than that of Schaffrin et al. (2015) and more comprehensive than that of Poole and Elstub (2025).
Certain categories included in our framework result from the fact that CA recommendations, stemming from deliberation between regular citizens, are developed in a less restricted and systematic way than conventional policy documents and might move beyond the types of policy measures usual for traditional policy-making. Beyond recommending the introduction of specific new policy measures, they might also call for improved execution of existing regulations (3b.) or the initiation of cooperation with other entities (3i.), including the launching of lobbying activities by the local authorities to influence national lawmaking.
Our list of policy intervention types captures the variety of actions recommended by CAs in more detail, providing more insight into their contents while still allowing researchers to meaningfully reduce the amount of data and track observable trends.
Scope of intervention
Another group of features that characterize policy recommendations is the scope of intervention. It comprises various categories that we both identified empirically and found in the category of scope as distinguished by Schaffrin et al. (2015) with regard to policy instruments.
When it comes to the temporal scope of the recommendations, the interviewees considered two aspects: (1) whether the solutions outlined in a recommendation were of a long-term or ad hoc nature (4aa.), and (2) whether the perspective of implementation was long or short (4ab.).
Another category of evaluation employed was a holistic or particular scope (4b.) of a recommendation with regard to different dimensions of the proposed solutions (i.e., territorial, personal, material).
Finally, we identified three categories concerning the personal scope of recommendations. The first specifies the addressee expected to implement the recommendation. It corresponds with a similar category proposed by Dudley (2022) for the evaluation of recommendations of advisory bodies, and with the target category used by Poole and Elstub (2025). Since CAs’ advice is directed at public authorities, which are the only entities that can be held accountable for implementing these recommendations, our framework distinguishes between different levels of their jurisdiction, i.e., local, regional, or national authority (4da., 4db., 4dc.). It is, however, imaginable that a CA would formulate recommendations aimed at other actors, such as civil society, businesses, or individuals (as also anticipated in Benton and Russell’s framework). Such a situation would be conveyed by assigning a negative value to each of the three public-authority categories. This would mean that the addressee is not a public actor, and the recommendation can be considered a wishful guideline rather than an implementable and trackable action point. Moreover, contrary to Benton and Russell (2013), our framework does not differentiate between executive and legislative authorities, an aspect that can easily be added if such a differentiation is relevant to the analysis.
Two other categories regarding the personal scope correspond to the actors potentially affected by the recommendations. They aim to evaluate whether (1) the group of a recommendation’s beneficiaries and (2) the group of actors burdened by its implementation are universal or limited, similar to the framework proposed by Schaffrin et al. (2015).
Finally, following Dudley (2022), we distinguish the sectoral focus category, which refers to the policy fields encompassing the areas addressed by a recommendation. Unlike other categories in our model, policy sector(s) (4c.) is an open category that should be filled with the public policy areas2 relevant to the case.
Linguistic form
In the reviewed literature, the categories concerning the linguistic form of the analyzed object were developed with reference to legal texts (Hurka et al., 2022, 2024). The authors distinguish three categories connected to the comprehensibility of a text, i.e., its formulation (understood as the simplicity of the language), readability (whether the text is accessible to the reader, considering, e.g., the length of words and sentences), and signal uniformity (whether the applied terminology is consistent). Equivalent categories were found in our interviews, where the interviewees frequently evaluated whether the discussed recommendations were comprehensible and correct (5b.). Another category frequently emerging in the interviews was the preciseness (5a.) of the recommendations, i.e., whether they clearly specified what should be done.
The aspect of preciseness is difficult to conceptualize, and the existing literature offers limited guidance. Moreover, precision means different things depending on the function of the recommendation, i.e., whether it applies to policy measures (2b.) and calibrations (2c.), or policy goals (2a.) (see Function, above). In the former two cases, the proposal should clearly specify the actions that should be undertaken. If needed for clarity, it should also specify who or what should be the object of such actions. For example, if a recommendation proposes to “use such techniques of insulating buildings, that allow to reach minimally the class A reduction(…)” [L13], but it is not clear what kinds of buildings are meant, e.g., public or private, then the recommendation is not precise enough. Different criteria apply when a recommendation formulates a policy goal rather than an action to be undertaken. In their framework inspired by Ostrom (2005, 2010), Capano and Howlett (2024) propose three elements that should be identifiable when specifying any policy goal: the target population (“who”), the expected outcome (“what”), and a time frame for achieving it (“when”). A useful rule of thumb for assessing these elements in a recommendation is to consider whether the policy goal is formulated in a way that will allow a future verification of its completion. Finally, if a recommendation consists of several elements, such as more than one policy measure, or combines policy measures and policy goals (see Complexity (5e.), below), we recommend the following scoring criteria. A recommendation’s specificity is assigned the value “1” (or “yes”) only if all its elements meet the specificity criteria. When only certain elements are specific enough, the assigned value is “0.5” (or “partially”); if no elements meet the criteria, the value is “0” (or “no”).
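The scoring rule above can be sketched as a small helper. This is a minimal illustration, assuming each element of a recommendation has already been judged specific or not; the function name and input format are ours, not part of the framework:

```python
def specificity_score(elements_specific):
    """Aggregate specificity for a multi-element recommendation.

    `elements_specific` is a list of booleans, one per element (e.g., each
    policy measure or goal), indicating whether that element meets the
    specificity criteria described above.
    Returns 1 ("yes") if all elements are specific, 0.5 ("partially") if
    only some are, and 0 ("no") if none are.
    """
    if not elements_specific:
        raise ValueError("a recommendation must contain at least one element")
    if all(elements_specific):
        return 1
    if any(elements_specific):
        return 0.5
    return 0
```

For instance, a recommendation combining one specific policy measure with one vague policy goal would receive `specificity_score([True, False])`, i.e., 0.5 ("partially").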
Two further categories proposed by Hurka et al. (2022) and Hurka et al. (2024) are the depth of the text (allowing one to measure its hierarchical level) and its technicality. Only the former was included in our framework, given its applicability to CAs’ recommendations; we refer to it as the “complexity” of recommendations. To assess complexity (5e.), we distinguish three levels, where the number of hierarchical levels depends on the number of that recommendation’s functions (see 2a.–2c.). Thus, if a recommendation formulates only goals or objectives, policy measures, or their specification, the level is 1; if it has two of these functions at the same time, the level is 2; and if it has all three functions, the level is 3 (the types of functions are described above and in Table 1).
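Under these rules, the complexity level is simply the count of functions a recommendation performs. A minimal sketch (argument names are illustrative; each corresponds to a binary coding of one function category):

```python
def complexity_level(formulates_goals, formulates_measures, specifies_implementation):
    """Complexity = number of functions (2a.-2c.) coded for a recommendation.

    Each argument is a boolean coding of the corresponding function
    category. Returns a level from 1 to 3; a recommendation with no
    function coded has no complexity level under this scheme.
    """
    level = sum([formulates_goals, formulates_measures, specifies_implementation])
    if level == 0:
        raise ValueError("a recommendation must have at least one function")
    return level
```

A recommendation that formulates a goal and a policy measure, but does not specify implementation, would thus be coded at level 2.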
Additionally, the interviewees highlighted that some of the recommendations contained references to various external sources. Mapping this category onto the classification proposed by Fischer (1980, 1990, 1995), as applied by Dudley (2022) to the recommendations of advisory bodies, we distinguish two categories in our final categorization: external references to research, data, technical standards, etc. (5c.), also including existing legal regulations, and external references to (social) values (5d.), such as safety, comfort, aesthetics, or efficiency.
Supplementary categories
Beyond the core categories of the framework presented above, we have identified a number of supplementary categories that might be relevant for specific research designs. These categories were excluded from the core list because, unlike the earlier groups, applying them requires substantial expertise in the specific policy field and/or contextual knowledge (for further details, see the discussion and conclusion section).
The first group of categories concerns various types of relations between the recommendations and other policy components, i.e., relations to existing regulations, plans, practices, and other recommendations of a given CA. The findings from the interviews suggest that a recommendation can propose a new solution (6a.) or overlap with solutions already present in other regulations. If the solution is new, the following two categories assess whether it contradicts (6aa.) or improves existing measures (6ab.). This classification corresponds with the embeddedness category proposed by Hurka et al. (2022, 2024) and the level of policy change proposed by Poole and Elstub (2025).
Similarly, further categories regarding relations with other recommendations allow us to indicate whether a certain recommendation duplicates (7a) or contradicts (7b) other proposals from the same deliberative process. In this case, the categories are related to the measures of integration (Schaffrin et al., 2015) and internal interdependence (Hurka et al., 2022, 2024) found in the literature.
Finally, following the measure of density quantifying the number of existing policy regulations (Schaffrin et al., 2015), we propose to count the total number of recommendations formulated by a certain CA. Together with the complexity (5e.) category described earlier, it serves as an indicator of the size of the list of recommendations.
The second group of categories, not included in the core framework, describes various qualities of recommendations in a more interpretive and subjective manner. While analyzing how our interviewees and focus group participants evaluated CAs’ recommendations, we identified several such categories, listed below. Rather than being applied by the researcher, they can be used for measuring policy experts’ assessments or the public’s perceptions of CA outputs.
The first category in this group is concerned with how well a certain recommendation is adapted to the local context (8a.). The interviewees emphasized that rather than relying on solutions copied from other locations, adequate recommendations should be grounded in the knowledge and understanding of local conditions to ensure their effectiveness.
Further, especially the regular citizens (i.e., the FGIs participants) appreciated recommendations that corresponded with their everyday experiences (8b.) by addressing the issues they personally encounter when living in the city.
Even though the effectiveness of recommendations was the most frequently used evaluation in our interviews, what was considered an effective measure (8c.) varied broadly and was highly context-dependent. Broadly speaking, it represented what the interviewees considered to be intuitively rational according to their knowledge and understanding, whether in the sense of effectiveness in reaching the desired results or cost- and resource-effectiveness.
The next category evaluates whether a recommendation is justified (8d.) from the perspective of certain sociopolitical values upheld by an interviewee. Policy recommendations can either be based on the same values and reinforce them or support their conflicting alternatives. In the context of our interviews, this subjective category was highlighted especially in reference to questions such as social and environmental justice, energy transformation, regulation and deregulation, decentralization and democratization of public management, space and priority given in a city to various transportation methods, or individual transportation choices.
Interviewees also distinguished how much priority (8e.) a measure should be given compared to other interventions. This assessment was usually based on whether a measure addressed the direst issues and should therefore be introduced first, or whether it would make the most significant difference.
Further, the interviewed actors often considered recommendations as either radical or moderate (8f.), which was not necessarily related to their positive or negative assessment. For example, they saw some of the measures as too radical or as not radical enough.
Another category captures the assessment of a recommendation as either a potentially realistic measure (8g.) or one whose implementation is considered rather unrealistic or even impossible due to various legal, financial, or organizational limitations.
With regard to current trends and the scientific or technological state of the art, a recommendation can be considered an up-to-date measure (8h.) or rather outdated. For example, an interviewee considered a 2018 CA’s recommendation about the desired technology for household heating stoves entirely obsolete from the perspective of the year 2024 (Warsaw, 542–546).
The last category of our framework, the verifiability (8i.) of a recommendation, refers not to the measure it advises but to the extent to which its factual implementation or lack thereof can be clearly verified and tracked. This category is comparable to the measurability category proposed by Poole and Elstub (2025) and was frequently brought up during the interviews. Interviewees who paid attention to this aspect were especially wary of public authorities exploiting insufficiently verifiable formulations to declare the successful implementation of certain recommendations without satisfying their authors’ true intentions. Verifiability can be related to a recommendation’s precise linguistic formulation (5a.), the specification of policy goals or objectives through quantifiable targets and timescales (2a., see also Dudley et al., 2022), but also the nature of the recommended interventions (3.) and its contextual and substantive aspects. While a combination of these factors can influence it, we see it as an ultimately interpretative category.
Intercoder reliability
Table 2 presents the intercoder reliability levels achieved for each of the framework’s categories following the second round of test-coding, as described in the methods section.
Table 2
Intercoder reliability measured for each of the framework’s categories included in the test-coding conducted by the authors
Core categories:

Supercategory | Code | Category | Agreement
Goal | 1a. | Complies with the policy goal(s) | 92%
Goal (if 1a. = yes) | 1aa. | Addresses the causes of the problem | 81%
Goal (if 1a. = yes) | 1ab. | Addresses the consequences of the problem | 90%
Goal | 1b. | Aims for different policy goal(s) | 60%
Function | 2a. | Formulates goals or objectives | 78%
Function | 2b. | Formulates policy measures | 86%
Function | 2c. | Specifies the implementation of policy measures | 72%
Type of intervention (only if 2b. = yes) | 3a. | Prohibitions or orders | 92%
Type of intervention | 3b. | Execution of existing regulations | 100%
Type of intervention | 3c. | Fiscal instruments (e.g., fees and taxes) | 100%
Type of intervention | 3d. | Incentives for non-public actors | 100%
Type of intervention | 3e. | Education, information, communication | 92%
Type of intervention | 3f. | Standards and best practices (incl. certification, voluntary agreements) | 97%
Type of intervention | 3g. | Management (incl. monitoring, conducting analyses, creating strategies, plans, etc.) | 92%
Type of intervention | 3h. | Institutional changes (incl. changes to participatory mechanisms) | 94%
Type of intervention | 3i. | Cooperation with other entities (incl. lobbying) | 95%
Type of intervention | 3j. | Public investments (incl. investments in existing infrastructure) | 94%
Type of intervention | 3k. | Public services (e.g., financial, knowledge, technology) | 94%
Scope of intervention | 4aa. | Temporal scope: Long-term solutions | 97%
Scope of intervention | 4ab. | Temporal scope: Long implementation perspective | 68%
Scope of intervention | 4b. | Holistic scope | 55%
Scope of intervention | 4c. | Policy sector(s) | 73%
Scope of intervention | 4da. | Addressee: Local authority | 97%
Scope of intervention | 4db. | Addressee: Regional authority | 91%
Scope of intervention | 4dc. | Addressee: National authority | 100%
Scope of intervention | 4e. | Universal beneficiary | 63%
Scope of intervention | 4f. | Universal burdened actors | 91%
Linguistic form | 5a. | Precise | 75%
Linguistic form | 5b. | Comprehensible & correct | 83%
Linguistic form | 5c. | External references to research, data, technical standards, etc. | 72%
Linguistic form | 5d. | External references to values | 72%
Linguistic form | 5e. | Complexity | –
A significant majority of the categories demonstrated a satisfying level of intercoder agreement, ranging from 72% to 90%, with several exceeding 90%. In a similarly designed, empirically oriented study, it would be standard procedure for coders to meet, discuss their dilemmas, and agree on more detailed decision-making guidelines. Including these steps during coding would further improve intercoder agreement, especially for minor discrepancies.
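If the percentages reported in Table 2 are simple agreement rates (the exact reliability measure and software are not specified here), such a rate can be computed for two coders as the share of items coded identically; a sketch:

```python
def percent_agreement(codes_a, codes_b):
    """Share (in %) of items on which two coders assigned the same value.

    `codes_a` and `codes_b` are parallel lists of codes (e.g., booleans or
    category labels), one entry per recommendation.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100 * matches / len(codes_a)
```

Note that simple percent agreement does not correct for chance; chance-corrected measures such as Cohen's kappa or Krippendorff's alpha are common alternatives when codes are unevenly distributed.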
In a few categories, however, the agreement values were noticeably lower. In the case of the category Aims for different policy goals (1b.), this can be explained by the lack of a previously established, finite list of potential additional policy goals to consider (as applied in Policy sector(s), 4c., or Type of intervention, 3., where available options were more precisely delineated). For other categories, particularly those in the supercategory Scope of intervention (4aa.–4f.), lower agreement rates suggest a need for greater familiarity with the specific policy fields and, possibly, expert consultation.
Discussion and conclusion
While the existing literature offers ways of analyzing and comparing policy outputs developed in traditional policy-making processes, attempts to investigate those produced by advisory bodies have been scarce. It is our conviction that recommendations resulting from deliberative minipublics such as citizens’ assemblies, in particular, require special attention and a dedicated approach. This seems justified not only by the epistemic claims about CAs and their growing popularity around the world but also by their unique character in the policy-making system. CAs develop their recommendations through the deliberation of diverse groups of lay citizens and are bound by broader, often cross-sectoral remits and less strict agendas than “professional” policy-making bodies. The policy goals of particular processes and the addressees of their recommendations are usually assumed, but not all recommendations will be consistent with them. Moreover, CAs might be more creative in their proposals than legislatures or even other advisory bodies, suggesting measures that go beyond administrative silos, prevailing practices of public authorities, or existing legal regulations. On the other hand, such creativity may create a risk of exceeding the scope of the CAs’ mandates. Finally, the formulations of CAs’ recommendations lack the standardization of legal documents and vary broadly between individual processes, presenting a still under-explored field for systematic comparisons.
In this article, we propose an analytical framework developed from a productive combination of an analysis of existing literature on policy outputs, including policy advice, and an empirical study of CAs’ recommendations. With this, we provide a consistent method of comparing recommendations developed in deliberative participatory processes across cases, independently of their particular contexts.
The framework includes the following core supercategories: Goal(s), Function(s), Type of intervention, Scope of intervention, and Form. They are designed to enable the coding of individual CA recommendations based on their literal formulations. Depending on the researcher’s familiarity with the given policy field, assessing recommendations against the provided category list can be conducted by the researcher or by a group of policy experts3 recruited to assess the recommendations, e.g., via a survey. Not every category needs to be applied in every analysis; the final selection of categories used in a research design will depend on its particular research questions.
Beyond the core of the framework, we identified supplementary categories that fall into two groups: Relations (6–7.) and Quality (8.). The former group comprises categories that investigate recommendations’ relations with existing regulations and with each other. Applying these categories would allow a researcher to look at the list of recommendations collectively, as the entire policy output portfolio of a particular process. At the same time, analyzing their relations with the broader policy context requires tapping into existing policy-field expertise, either that of external experts or one’s own. This is why the group is not included in the core framework: the motivation behind building this tool was to enable an analysis of recommendations by researchers without expertise in a particular policy field, such as scholars of public participation. The latter group consists of categories identified inductively from what our interviewees considered to make up “good” recommendations. These characteristics were highly subjective and linked to particular actors’ values and policy preferences and are, therefore, not suited for non-interpretative coding of recommendations. However, we imagine they can be useful for exploring various actors’ perceptions of these recommendations.
Even though various types of actors were interviewed to include their different perspectives and achieve higher data saturation, we analyzed their interviews in an explorative rather than a comparative manner. That means that this article does not provide information on which characteristics of the recommendations were relevant to particular types of actors. However, the framework offers an excellent starting point for systematically investigating such actors’ preferences, providing an assessment scheme that could be applied to large-N survey designs. Future research could also aim to understand the heuristics used by different types of actors, including the differences between former participants and non-participants. Moreover, the framework does not assume any single category to be more relevant than others. The relative importance of specific characteristics of recommendations will depend on the theoretical approach or normative assumptions applied. For example, a clear formulation of specific targets or addressees of a recommendation can be considered relevant for its subsequent implementation. On the other hand, it can be argued that the type of policy measure, or its relation to existing regulations, plays the most significant role in successful execution. Since our list of categories aims to be a flexible tool for various research questions, we do not propose any such assumptions.
The framework does not claim to be a universal measure of CAs’ recommendations. Most importantly, it does not include contextual aspects of the recommendations, e.g., whether they were developed in response to specific local challenges or the broader political debate. Such features need to be subject to further conceptualization and might require more context-specific data sources. For instance, the study by Font et al. (2018) included such contextual characteristics and found, among other things, that public servants’ and politicians’ support for recommendations predicted much higher chances of their full implementation. At the same time, gathering this kind of data required, on average, 4.6 interviews per participatory process, making a comparative study of a large number of cases highly resource-intensive. What our framework offers as an alternative is an organizing structure that allows even an individual researcher to evaluate CAs’ recommendations based on their literal content, without the need to familiarize themselves with background information. On the other hand, a more context-sensitive investigation of qualitative assessments of recommendations by local actors would be possible through a survey based on the supplementary categories we captured in the Quality (8.) group.
One limitation of the instrument is that it was developed using Polish cases only. Since CAs in Poland tend to be commissioned with a political promise of implementation by executive bodies, the involved and responsible actors pay considerable attention to the details and specific formulations of CAs’ recommendations. That offered us suitable grounds for investigating the proposals’ most relevant characteristics. Nevertheless, it would be beneficial to test this framework on recommendations from CAs across various jurisdiction levels and contexts. The same applies to other policy fields since most of the CA cases and the resulting recommendations were concerned with environmental policies.
Further research could apply the framework to track the existing practices in CAs’ recommendations’ formulations across a high number of cases and lead to developing an empirically grounded typology of recommendations. A narrower focus could be applied to recommendations’ selected aspects, such as references to social values, most often addressed sectors, or most preferred policy instruments. On a more systemic level, it would be valuable to explore the relationships between different aspects of CAs’ designs and their outputs.
Finally, considering that the UK Climate Change Committee’s advice has been shown to diverge from normative accounts of the elements required for successful policy advice (Dudley et al., 2022), analyzing CAs’ recommendations using the proposed categories and evaluating them against their subsequent policy impacts could help identify which elements are truly essential for recommendations to be impactful.
By providing our framework as a source of theoretically and empirically based analytical categories, we hope to facilitate further studies on proposals developed by citizens’ assemblies and encourage greater scholarly attention to this still understudied topic.
Acknowledgements
An earlier version of this manuscript was presented at the ECPR General Conference 2024 and at the mini-seminar on Democratic Innovations at the University of Oslo on September 19th, 2024. We thank the discussant at the former event and all the participants at the latter event for their insightful comments. We would like to express our sincere gratitude to Yves Steinebach and Elin Lerum Boasson for their recommendations on relevant literature, which significantly contributed to the improvement of our work. Moreover, we thank Yves Steinebach and Paulina Pospieszna for providing their critical feedback on the manuscript.
Declarations
Conflict of interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.