1 Introduction
In broad terms, saturation is used in qualitative research as a criterion for discontinuing data collection and/or analysis. Its origins lie in grounded theory (Glaser and Strauss 1967), but in one form or another it now commands acceptance across a range of approaches to qualitative research. Indeed, saturation is often proposed as an essential methodological element within such work. Fusch and Ness (2015: p. 1408) claim categorically that ‘failure to reach saturation has an impact on the quality of the research conducted’; Morse (2015: p. 587) notes that saturation is ‘the most frequently touted guarantee of qualitative rigor offered by authors’; and Guest et al. (2006: p. 60) refer to it as having become ‘the gold standard by which purposive sample sizes are determined in health science research.’ A number of authors refer to saturation as a ‘rule’ (Denny 2009; Sparkes et al. 2011), or an ‘edict’ (Morse 1995), of qualitative research, and it features in a number of generic quality criteria for qualitative methods (Leininger 1994; Morse et al. 2002).
However, despite having apparently attained something of the status of orthodoxy, saturation is defined within the literature in varying ways (or is sometimes undefined) and raises a number of problematic conceptual and methodological issues (Dey 1999; Bowen 2008; O’Reilly and Parker 2013). Drawing on a number of examples in the literature, this paper seeks to explore some of these issues in relation to three core questions:
- ‘What?’: in what way(s) is saturation defined?
- ‘Where and why?’: in what types of qualitative research, and for what purpose, should saturation be sought?
- ‘When and how?’: at what stage in the research is saturation sought, and how can we assess if it has been achieved?
In addressing these questions, we will explore the implications of different models of saturation (and the theoretical and methodological assumptions that underpin them) for the varying purposes saturation may serve across different qualitative approaches. In doing so, the paper will contribute to the small but growing literature that has critically examined the concept of saturation (e.g. Bowen 2008; O’Reilly and Parker 2013; Walker 2012; Morse 2015; Nelson 2016), aiming to extend the discussion around its conceptualization and use. We will argue not only for greater transparency in the reporting of saturation, as others have done (Bowen 2008; Francis et al. 2010), but also for a more thorough consideration on the part of qualitative researchers regarding how saturation relates to the research question(s) they are addressing, in addition to the theoretical and analytical approach they have adopted, with due recognition of potential inconsistencies and contradictions in its use.
2 ‘What?’—in what way(s) is saturation defined?
In their original treatise on grounded theory, Glaser and Strauss (1967: p. 61) defined saturation in these terms:
The criterion for judging when to stop sampling the different groups pertinent to a category is the category’s theoretical saturation. Saturation means that no additional data are being found whereby the sociologist can develop properties of the category. As he sees similar instances over and over again, the researcher becomes empirically confident that a category is saturated. He goes out of his way to look for groups that stretch diversity of data as far as possible, just to make certain that saturation is based on the widest possible range of data on the category.
Here, the decision to be made relates to further sampling, and the determinant of adequate sampling has to do with the degree of development of a theoretical category in the process of analysis. Saturation is therefore closely related to the notion of theoretical sampling, the idea that sampling is guided by ‘the necessary similarities and contrasts required by the emerging theory’ (Dey 1999: p. 30), and causes the researcher to ‘combine sampling, data collection and data analysis, rather than treating them as separate stages in a linear process’ (Bryman 2012: p. 18).
Also writing from a grounded theory standpoint, Urquhart (2013: p. 194) defines saturation as: ‘the point in coding when you find that no new codes occur in the data. There are mounting instances of the same codes, but no new ones’, whilst Given (2016: p. 135) considers saturation as the point at which ‘additional data do not lead to any new emergent themes’. A similar position regarding the (non)emergence of new codes or themes has been taken by others (e.g. Birks and Mills 2015; Olshansky 2015).
These definitions show a change of emphasis, and suggest a second model of saturation. Whilst the focus remains at the level of analysis, the decision to be made appears to relate to the emergence of new codes or themes, rather than the degree of development of those already identified. Moreover, Urquhart (2013) and Birks and Mills (2015) relate saturation primarily to the termination of analysis, rather than to the collection of new data.
According to Starks and Trinidad (2007: p. 1375), however, theoretical saturation occurs ‘when the complete range of constructs that make up the theory is fully represented by the data’. Whilst not wholly explicit, this definition suggests a third model of saturation with a different directional logic: not ‘given the data, do we have analytical or theoretical adequacy?’, but ‘given the theory, do we have sufficient data to illustrate it?’
If we move outside the grounded theory literature, a fourth perspective becomes apparent in which there are references to data saturation, rather than theoretical saturation (e.g. Fusch and Ness 2015). This view of saturation seems to centre on the question of how much data (usually number of interviews) is needed until nothing new is apparent, or what Sandelowski (2008: p. 875) calls ‘informational redundancy’ (e.g. Francis et al. 2010; Guest et al. 2006). Grady (1998: p. 26) provides a similar description of data saturation as the point at which:
New data tend to be redundant of data already collected. In interviews, when the researcher begins to hear the same comments again and again, data saturation is being reached… It is then time to stop collecting information and to start analysing what has been collected.
Whilst several others have defined data saturation in a similar way (e.g. Hill et al. 2014: p. 2; Middlemiss et al. 2015; Jackson et al. 2015), Legard et al. (2003) seem to adopt a narrower, more individual-oriented perspective on data saturation, whereby saturation operates not at the level of the dataset as a whole, but in relation to the data provided by an individual participant; i.e. it is achieved at a particular point within a specific interview:
Probing needs to continue until the researcher feels they have reached saturation, a full understanding of the participant’s perspective (Legard et al. 2003: p. 152).
From this perspective, the researcher’s response to the data—through which decisions are made about whether or not any new ‘information’ is being generated—is not necessarily perceived as forming part of the analysis itself. Thus, in this model, the process of saturation is located principally at the level of data collection and is thereby separated from a fuller process of data analysis, and hence from theory.
Four different models of saturation seem therefore to exist (Table 1). The first of these, rooted in traditional grounded theory, uses the development of categories and the emerging theory in the analysis process as the criterion for additional data collection, driven by the notion of theoretical sampling; using a term in common use, but with a more specific definitional focus, this model could thus be labelled as theoretical saturation. The second model takes a similar approach, but saturation focuses on the identification of new codes or themes, and is based on the number of such codes or themes rather than the completeness of existing theoretical categories. This can be termed inductive thematic saturation. In this model, saturation appears confined to the level of analysis; its implication for data collection is at best implicit. In the third model, a reversal of the preceding logic is suggested, whereby data is collected so as to exemplify theory, at the level of lower-order codes or themes, rather than to develop or refine theory. This model can be termed a priori thematic saturation, as it points to the idea of pre-determined theoretical categories and leads us away from the inductive logic characteristic of grounded theory. Finally, the fourth model (which, again aligning with the term already in common use, we will refer to as data saturation) sees saturation as a matter of identifying redundancy in the data, with no necessary reference to the theory linked to these data; saturation appears to be distinct from formal data analysis.
Table 1. Models of saturation and their principal foci in the research process
Model | Description | Principal focus |
Theoretical saturation | Relates to the development of theoretical categories; related to grounded theory methodology | Sampling |
Inductive thematic saturation | Relates to the emergence of new codes or themes | Analysis |
A priori thematic saturation | Relates to the degree to which identified codes or themes are exemplified in the data | Sampling |
Data saturation | Relates to the degree to which new data repeat what was expressed in previous data | Data collection |
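To make the contrast between two of these models more concrete, the following is a minimal sketch in Python of what each might count. It is an illustration under stated assumptions, not a procedure proposed in the literature discussed here: the function names, the representation of each interview as a list of code labels, the example codes, and the `min_instances` threshold are all hypothetical. The first function tallies how many previously unseen codes each successive interview contributes (the concern of inductive thematic saturation); the second checks whether each code in a pre-specified framework is exemplified in the data at all (the concern of a priori thematic saturation).

```python
from typing import Dict, Iterable, List


def new_codes_per_interview(coded_interviews: Iterable[Iterable[str]]) -> List[int]:
    """Count the codes each interview adds that no earlier interview contained."""
    seen = set()
    counts = []
    for codes in coded_interviews:
        fresh = set(codes) - seen  # codes not met in any earlier interview
        counts.append(len(fresh))
        seen |= fresh
    return counts


def a_priori_coverage(
    coded_interviews: Iterable[Iterable[str]],
    framework_codes: Iterable[str],
    min_instances: int = 1,  # hypothetical threshold, chosen only for illustration
) -> Dict[str, bool]:
    """Report whether each pre-specified code appears in at least min_instances interviews."""
    tallies = {code: 0 for code in framework_codes}
    for codes in coded_interviews:
        for code in set(codes):  # count each code at most once per interview
            if code in tallies:
                tallies[code] += 1
    return {code: n >= min_instances for code, n in tallies.items()}


# Hypothetical coded data: one list of code labels per interview.
interviews = [["stigma", "cost"], ["cost", "access"], ["stigma", "access"]]

print(new_codes_per_interview(interviews))               # [2, 1, 0]
print(a_priori_coverage(interviews, ["stigma", "trust"]))  # {'stigma': True, 'trust': False}
```

Neither count establishes saturation by itself. A run of zeros in the first output says only that no new code labels were applied, and the second output says nothing about how well a category is developed, which is the concern of theoretical saturation.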
Some authors appear to espouse interpretations of saturation that combine two or more of the models defined above, making its conceptualization less distinct. For example, Goulding (2005) suggests that both data and theory should be saturated within grounded theory, and Drisko (1997: p. 192) defines saturation in terms of ‘the comprehensiveness of both the data collection and analysis’. Similarly, Morse’s view of saturation seems to embody elements of both theoretical and data saturation. She links saturation with the idea of replication, in a way that suggests a process of data saturation:
However, when the domain has been fully sampled – when all data have been collected – then replication of data occurs and, with this replication… the signal of saturation (Morse 1995: p. 148).
Morse notes elsewhere that she is able to tell when her students have achieved saturation, as they begin to talk about the data in more generalized terms and ‘can readily supply examples when asked. These students know their data’ (Morse 2015: p. 588). This too suggests a form of data saturation. However, Morse also proposes that saturation is lacking when ‘there are too few examples in each category to identify the characteristics of concepts, and to develop theory’ (Morse 2015: p. 588). This perspective seems to be located firmly in the idea of theory development (as other parts of the quoted papers by Morse make clear), though a heavy emphasis is placed at the level of the data and the way in which the data exemplify theory, thereby seeming to evoke both data and theoretical saturation.
Hennink et al. (2017) go further, appearing to combine elements of all four models of saturation. They firstly identify ‘code saturation’, the point at which ‘no additional issues are identified and the codebook begins to stabilize’ (2017: p. 4), which seems to combine elements of both inductive thematic saturation and data saturation. However, within this approach saturation is discussed as relating not only to codes developed inductively, but also to a priori codes, which echoes the third model: a priori thematic saturation. They go on to distinguish ‘code saturation’ from ‘meaning saturation’; in the latter, the analyst attempts to ‘fully understand conceptual codes or the conceptual dimensions of… concrete codes’ (2017: p. 14). This focus on saturating the dimensions of codes seems more akin to theoretical saturation; however, their analysis remains at the level of codes, rather than theoretical categories developed from these codes, and Hennink et al. explicitly position their approach outside grounded theory methods.
3 ‘Where and why?’—in what types of qualitative research, and for what purpose, should saturation be sought?
Morse (2015: p. 587) takes the view that saturation is ‘present in all qualitative research’ and, as previously noted, it is commonly considered as the ‘gold standard’ for determining sample size in qualitative research, with little distinction between different types of qualitative research. We question this perspective, and would instead argue, as is suggested by the different models of saturation considered in the previous section, that saturation has differing relevance, and a different meaning, depending on the role of theory, a viewpoint somewhat supported by other commentators who have questioned its application across the spectrum of qualitative methods (Walker 2012; O’Reilly and Parker 2013; van Manen et al. 2016).
In a largely deductive approach (i.e. one that relies wholly or predominantly on applying pre-identified codes, themes or other analytical categories to the data, rather than allowing these to emerge inductively) saturation may refer to the extent to which pre-determined codes or themes are adequately represented in the data, rather like the idea of the categories being sufficiently replete with instances, or ‘examples’, of data, as suggested in the a priori thematic saturation model outlined above. Thus, in their attempt to establish an adequate sample size for saturation, Francis et al. (2010) refer explicitly to research in which conceptual categories have been pre-established through existing theory, and it is significant in this respect that they link saturation with the notion of content validity. In contrast, within a more inductive approach (e.g. the inductive thematic saturation and theoretical saturation models outlined above), saturation suggests the extent to which ‘new’ codes or themes are identified within the data, and/or the extent to which new theoretical insights are gained from the data via this process.
In both the deductive and the inductive approach, we can make sense of the role of saturation, however much it differs in each case, because the underlying approach to analysis is essentially thematic, and usually occurs in the context of interview or focus group studies involving a number of informants. It is less straightforward to identify a role for saturation in qualitative approaches that are based on a biographical or narrative approach to analysis, or that, more generally, include a specific focus on accounts of individual informants (e.g. interpretative phenomenological analysis). In such studies, analysis tends to focus more on strands within individual accounts rather than on analytical themes; these strands are essentially continuous, whereas themes are essentially recurrent. Accordingly, Marshall and Long (2010) suggest that saturation was not appropriate in their study of maternal coping processes, based on narrative methods. Elsewhere, however, a less straightforward picture emerges. Hawkins and Abrams (2007) utilized saturation in the context of a study based on life-history interviews with 39 formerly homeless mentally ill men and women. The authors state: ‘Of the 39 participants, six did not complete a second interview because they were unavailable, impaired, or the research team felt the first interview had achieved saturation’ (p. 2035), suggesting that judgments of saturation were made within each participant’s account. Power et al. (2015) adopted a story-telling approach to women’s experience of post-partum hospitalization, and recruitment continued until data saturation, which was established through ‘the repetition of responses’ (p. 372). Analysis was thematic, and it is not clear whether saturation was determined in relation to themes across participants’ stories, or within individual stories. Similarly, in a study of osteoarthritis in footballers, based on interpretative phenomenological analysis, Turner et al. (2002) employed saturation, which was defined both in terms of the emergence of themes from the analysis and a ‘consensus across views expressed’ (p. 298), which suggests that, notwithstanding the interpretative phenomenological analysis perspective adopted, saturation was sought more across than within cases. Hale et al. (2007: p. 91) argue, however, that saturation is not normally an aim in interpretative phenomenological analysis, owing to the concern to obtain ‘full and rich personal accounts’, which highlights the particular analytical focus within individual accounts in this approach, and van Manen dissociates saturation from phenomenological research more generally (van Manen et al. 2016).
Considering the various types of research in which saturation might feature helps to clarify the purposes it is intended to fulfil. When used in a deductive approach to analysis, saturation serves to demonstrate the extent to which the data instantiate previously determined conceptual categories, whereas in more inductive approaches, and grounded theory in particular, it says something about the adequacy of sampling in relation to theory development (although we have seen that there are differing accounts of how specifically this should be achieved). In narrative research, a role for saturation is harder to discern. Rather than the sufficient development of theory, it might be seen to indicate the ‘completeness’ of a biographical account. However, one could question whether the point at which a participant’s story is interpreted as being ‘complete’—having presumably conveyed everything seen to be relevant to the focus of the study—is, in fact, usefully described by the concept of saturation, given the distance that this moves us away from the operationalization of saturation in broadly thematic approaches. This might, furthermore, lead us to ask whether there is the risk of saturation losing its coherence and utility if its potential conceptualization and uses are stretched too widely.
The same issue is relevant with regard to a number of other, less obvious, purposes that have been proposed for saturation. For example, it has been claimed to demonstrate the trustworthiness of coding (Damschroder et al. 2007), but as saturation will be a direct and automatic consequence of one’s coding decisions, it is not clear how it can be an independent measure of their quality. Dubé et al. (2016) suggest that saturation says something about (though not conclusively) the ability to extrapolate findings, and Boddy (2016: p. 428) claims that ‘once saturation is reached, the results must be capable of some degree of generalisation’; this seems to move us away from the notion of the theoretical adequacy of an analysis, and the explanatory scope of a theory, toward a much more empirical sense of generalizability. The use of saturation in these two cases could perhaps indicate a degree of confusion in some studies about the meaning of saturation and its purpose, even when taking into account the differing models of saturation outlined earlier. Therefore, we would suggest that for saturation to be conceptually meaningful and practically useful there should be some limit to the purposes to which it can be applied.
5 Conclusion
This paper has offered a critical reflection on the concept of saturation and its use in qualitative research, contributing to the small body of literature that has examined the complexities of the concept and its underlying assumptions. Drawing on recent examples of its use, saturation has been discussed in relation to three key sets of questions: What? Where and why? When and how?
Extending previous literature that has highlighted the variability in the use of saturation (O’Reilly and Parker 2013; Walker 2012), we have scrutinized the different ways in which it has been operationalized in the research literature, identifying four models of saturation, each of which appears to make different core assumptions about what saturation is, and about what exactly is being saturated. These have been labelled as: theoretical saturation, inductive thematic saturation, a priori thematic saturation, and data saturation. Moving forward, the identification and recognition of these different models of saturation may aid qualitative researchers in untangling some of the inconsistencies and contradictions that characterize its use.
Saturation’s apparent position as a ‘gold standard’ in assessing quality and its near universal application in qualitative research have been previously questioned (Guest et al. 2006; O’Reilly and Parker 2013; Malterud et al. 2016). Similarly, doubts have been raised regarding its common adoption as a sole criterion of the adequacy of data collection and analysis (Charmaz 2005), or of the adequacy of theory development: ‘Elegance, precision, coherence, and clarity are traditional criteria for evaluating theory, somewhat swamped by the metaphorical emphasis on saturation’ (Dey 2007: p. 186). On the basis of such critiques, we have examined how saturation might be considered in relation to different theoretical and analytical approaches. Whilst we concur with the argument that saturation should not be afforded unquestioned status, polarization of saturation as either applicable or non-applicable to different approaches, as has been suggested (Walker 2012), may be too simplistic. Instead we propose that saturation has differing relevance, and a different meaning, depending on the role of theory, the analytic approach adopted, and so forth, and thus may usefully serve different purposes for different types of research, purposes that need to be clearly articulated by the researcher.
Whilst arguing for flexibility in terms of the purpose and use of saturation, we also suggest that there must be some limit to this range of purposes. Some of the ways in which saturation has been operationalized, we would suggest, risk stretching or diluting its meaning to the point where it becomes too widely encompassing, thereby undermining its coherence and utility.
When and how saturation may be judged to have been reached will differ depending on the type of study, as well as assumptions about whether it represents a distinct event or an ongoing process. The view of saturation as an event has been problematized by others (Strauss and Corbin 1998; Dey 1999; Nelson 2016), and we have explored the implications of conceptualizing saturation in this way, arguing that it appears to give rise to a degree of uncertainty and equivocation, in part driven by the uncertain logic of the concept itself, as a statement about the unobserved based on the observed. This uncertainty appears to give rise to inconsistencies and contradictions in its use, which we would argue could be resolved, at least in part, if saturation were to be considered as a matter of degree, rather than simply as something either attained or unattained. However, whilst considering saturation in incremental terms may increase researchers’ confidence in making claims to it, we suggest it is only through due consideration of the specific purpose for which saturation is being used, and what one is hoping to saturate, that the uncertainty around the concept can be resolved.
In highlighting and examining these areas of complexity, this paper has extended previous discussions of saturation in the literature. Whilst consideration of the concept has led some commentators to argue for the need for qualitative researchers to provide a more thorough and transparent reporting of how they achieved saturation in their research, thus allowing readers to assess the validity of this claim (Bowen 2008; Francis et al. 2010), our arguments go beyond this. We contend that there is a need not only for more transparent reporting, but also for a more thorough re-evaluation of how saturation is conceptualized and operationalized, including recognition of potential inconsistencies and contradictions in the use of the concept; this re-evaluation can be guided through attending to the four approaches we have identified and their implications for the purposes and uses of saturation. This may lead to a more consistent use of saturation, not in terms of its always being used in the same way, but in relation to consistency between the theoretical position and analytic framework adopted, allowing saturation to be used in such a way as to best meet the aims and objectives of the research. It is through consideration of such complexities in the context of specific approaches that saturation can have most value, enabling it to move away from its increasingly elevated yet uneasy position as a taken-for-granted convention of qualitative research.