A great many happiness researchers embrace the view that happiness is u-shaped in age. That assertion is evidenced in part by the way age is used as a control variable for analyses oriented to some other purpose: the overwhelmingly common practice is to use a quadratic specification, entering age together with age-squared in the model. This practice is typically justified with reference to the idea that the relationship is in fact u-shaped—in other words, taking for granted that the answer to the question is known.
The early study by Blanchflower and Oswald (2008) was challenged very quickly by Glenn (2009). Glenn considered whether the appearance of the u-shape resulted from use of ‘inappropriate’ control variables (the substance of this view is described below). Similar explorations appeared in Hellevik (2017) and Bartram (2021). In certain instances, these critiques were followed by rebuttals. A key example appeared in Blanchflower (2021), offering an analysis covering an even wider range of countries and using multiple datasets, and concluding that the age–happiness relationship is u-shaped virtually ‘everywhere’. A further critique by Bartram (2023) led to another rebuttal (Blanchflower et al., 2023), a contribution notable for a sweeping statement asserting that there are no fewer than 618 published studies finding that the age–happiness relationship is characterised by a u-shape.
More broadly, we see a range of studies that produce a range of patterns characterising the age–happiness relationship. Several studies offer support for the ‘u-shape’ finding: in addition to Blanchflower’s contributions, there is work by Beja (2018), Graham and Ruiz Pozuelo (2017), Cheng et al. (2017), and Movshuk (2011). There is a second category of studies that agree with the idea of ‘u-shape’ in the sense that happiness apparently rises after a midlife low; the difference here is that these studies discern that happiness subsequently declines as people move through older age, such that the overall pattern is a sideways ‘s-shape’ or ‘wave pattern’ (Biermann et al., 2022; Frijters & Beatton, 2012; Laaksonen, 2018). Other investigations show greater inconsistency with the ‘u-shape’: for example, Kratz and Brüderl (2021) identify a declining trend in happiness across the life course in Germany (i.e., with no post-middle-age increase). In the study of Germany by Kassenboehmer and Haisken-DeNew (2012), the life-course trend is instead found to be flat. Galambos et al. (2015) find that happiness increases among younger Canadians as they approach middle age. Another set of studies stands in direct contrast to the idea of ‘u-shapes everywhere’, finding instead that the age–happiness relationship takes a variety of different forms in different countries (Bartram, 2023; Becker & Trautmann, 2022; Bittmann, 2021; Morgan & O’Connor, 2017).
The current situation, then, amounts to a remarkable absence of consensus. Different researchers adopt a range of different approaches in their investigations. Key points of difference include: (1) whether to use control variables (in particular, controls that are influenced by age); (2) whether to use cross-sectional data or to insist more stringently on use of panel data; (3) whether to adopt a priori a quadratic specification (or some other functional form, e.g. cubic), as against starting with a non-parametric approach. We can then ask: why is there so much contention—not only about the result itself but about the appropriate methodological way to conduct an analysis intended to give us the right result?
2.1 Over-Reliance on Statistical Significance
In research on this topic as on many others, quantitative researchers commonly evaluate their results mainly with reference to whether they are statistically significant. The key point here is that, if we evaluate our results only by considering whether they are statistically significant, we run a substantial risk of drawing faulty conclusions from our analysis (compare Wasserstein et al., 2019 and Carver, 1978).
At best, statistical significance could tell us whether our results, rooted in analysis of data from a sample, are likely to characterise the population from which the sample is drawn. As conventionally understood, a p value can be used to evaluate a hypothesis of some sort. If we start with the assumption that the corresponding null hypothesis (H0) is true, the p value ‘is the probability … of [getting] a test statistic value at least as contradictory to H0 as the value actually observed’ via the data we are analysing (Agresti & Finlay, 1997, p. 157). If p is small, what many researchers then conclude is that the null hypothesis can be rejected; on that reading, our results from sample data tell us that an effect of some sort is likely to be found in the population.
We can then consider the conditions that must be met for statistical significance to succeed in giving us this information. The ‘assumptions’ described in any statistics textbook include, inter alia, having a representative sample and having confidence that the ‘error term’ is not correlated with independent variables in the model. But the more important assumption in this context is as follows: if we are going to use a linear model, then we must be confident that the relationship is in fact linear. Statistical significance (p < 0.05) does not tell us that the relationship is linear. Instead, it enables us to extrapolate effectively from sample to population only if we already know that the relationship is indeed linear.
The point is universally articulated in textbooks with reference to linear regression—but it applies just as much to situations where a different functional form is specified. The functional form relevant here is quadratic, where age plus age-squared is entered in the model. The statistical significance of these coefficients cannot be used to tell us that the relationship is in fact u-shaped. Statistical significance can be used here to extrapolate effectively from sample to population only if we already know that the relationship is u-shaped. Using statistical significance as evidence about the shape itself amounts to putting the cart before the horse.
Researchers might believe that the quadratic age coefficients would be statistically significant only if the relationship is in fact u-shaped. It is no doubt jarring to imagine that the age plus age-squared coefficients could be statistically significant if the relationship is not u-shaped. How would this be possible? Why might statistical significance mislead us in this way?
The answer is: sample size. With a sufficiently large sample, we can get statistically significant results (p < 0.05) from an analysis that imposes a particular functional form even when that functional form does not effectively represent the underlying social process. Here we can gain insight by revisiting a very basic question: how is p determined? The p value is derived from the t statistic, which results from dividing the coefficient by its standard error. The standard error is determined in part by sample size. With a larger sample, we are more likely to get p < 0.05, simply because the standard error is smaller (so, t is larger and p is smaller). This is one core reason why statistical significance is not a sufficient way of reaching substantive conclusions, certainly not when the functional form of a relationship is in question. Geerling and Diener (2020) show how use of large samples can lead to statistically significant results even when effect sizes are very small. The point here is again more jarring: using large samples, we can get statistically significant results even when those results are clearly incorrect. This point is demonstrated empirically in the analysis below.
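Separately from the empirical analysis, the logic can be sketched with a small simulation. The following is only an illustration under invented assumptions: the ‘true’ pattern (happiness falling until age 40 and then staying flat, with no later rise), the noise level, and the function names `solve_ols`, `true_mean` and `quadratic_fit` are all hypothetical and are not taken from any of the studies cited. The sketch fits the usual age plus age-squared model to data generated from a pattern that is not u-shaped, at a small and a large sample size, using a normal approximation for the p value.

```python
import math
import random


def solve_ols(rows, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    m = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    inv = [row[k:] for row in m]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


def true_mean(age):
    """Invented truth: happiness falls until 40, then stays flat (no upturn)."""
    return 7.0 + 0.03 * max(40 - age, 0)


def quadratic_fit(n, seed):
    """Simulate n people aged 20-80 and fit happiness ~ age + age^2."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        age = rng.randint(20, 80)
        a = (age - 50) / 10  # centred age in decades, for numerical stability
        rows.append([1.0, a, a * a])
        y.append(true_mean(age) + rng.gauss(0, 2.0))
    beta, se = solve_ols(rows, y)
    t = [b / s for b, s in zip(beta, se)]  # t = coefficient / standard error
    # Two-sided p value from the normal approximation to the t distribution.
    p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return beta, se, p


for n in (500, 20000):
    beta, se, p = quadratic_fit(n, seed=1)
    print(f"n={n:>6}  age^2 coef={beta[2]:.4f}  se={se[2]:.4f}  p={p[2]:.3g}")
```

Under these assumptions the standard error of the age-squared coefficient shrinks markedly at the larger sample size, and the coefficient should come out positive with p well below 0.05. Read conventionally, that would be taken as evidence of a u-shape, although the simulated data contain no later-life rise at all.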
2.2 Use of Inappropriate Control Variables
As noted above, the early study by Blanchflower and Oswald (2008) relied mainly on models that include control variables. Glenn’s critique (2009) described the use of those controls as ‘inappropriate’. Blanchflower and Oswald (2009, 2019) rejected this view, asserting that use of controls constitutes a ‘ceteris-paribus analytical approach’, ostensibly more appropriate for characterising the age–happiness relationship. This terminology needs unpacking, so that we can gain clarity about the social reality underpinning the results our analysis creates (Martin, 2018). The most effective contribution in this context is Morgan and O’Connor (2017), arguing that an analysis without controls yields results that tell us what people actually experience as they grow older (note their term ‘experienced life-cycle satisfaction’).
In many instances of quantitative research, the use of controls would of course make sense. In general, the purpose of using control variables is to mitigate the possibility of bias in our results: we want to ensure that our estimates are neither too high nor too low (as an indication of the true effect, which is unknown except via estimates using data). We might observe a correlation between height and vocabulary size, but if we conclude that getting taller leads to an increase in the number of words someone can use, we overlook the way age (among children) is the real cause of both processes. Once we control for age, we get the right estimate of the effect size of getting taller (i.e., zero).
That example works because the control (age) is an antecedent of both variables. In general, to estimate X → Y (the impact of X on Y) without bias, we need controls (W) that are antecedents of X and Y (so, W → X and W → Y) (see e.g. Pearl, 2009). A genuine problem arises when a model includes (as controls) variables that are instead influenced by the focal independent variable (X → W). If we use these ‘bad controls’ (Angrist & Pischke, 2009), we exacerbate bias in our results, rather than mitigating bias. Many researchers worry about ‘omitted variable bias’, which is indeed an important issue in general. But the possibility of ‘overcontrol bias’ is no less important (e.g. Elwert & Winship, 2014; Rohrer, 2018).
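The height and vocabulary example, where the control is a genuine antecedent (W → X and W → Y), can be sketched as a simulation. This is a minimal illustration with invented numbers: the growth rate, the vocabulary gain per year, the noise levels, and the helper names `ols_coefs`, `simulate_children` and `height_coefficients` are all hypothetical. Height is given no effect of its own on vocabulary, so the correct estimate of its effect is zero.

```python
import random


def ols_coefs(rows, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan solve)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[k] for row in m]


def simulate_children(n, seed):
    """Age drives both height and vocabulary; height has no effect of its own."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = rng.uniform(5, 12)                   # years
        height = 80 + 6 * age + rng.gauss(0, 5)    # cm
        vocab = 1000 * age + rng.gauss(0, 1500)    # words known
        data.append((age, height, vocab))
    return data


def height_coefficients(data):
    """Coefficient on height without and with age as a control."""
    vocab = [v for _, _, v in data]
    naive = ols_coefs([[1.0, h] for _, h, _ in data], vocab)[1]
    adjusted = ols_coefs([[1.0, h, a] for a, h, _ in data], vocab)[1]
    return naive, adjusted


naive, adjusted = height_coefficients(simulate_children(5000, seed=1))
print(f"height coefficient without age control: {naive:.1f} words per cm")
print(f"height coefficient with age control:    {adjusted:.1f} words per cm")
```

Under these assumptions the uncontrolled coefficient should be large and positive, while the coefficient with age controlled should sit close to zero. The control helps here only because age precedes both height and vocabulary.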
The relevant point in connection with the age–happiness relationship is twofold. (1) Apart from cohort and period, there are no antecedents of our focal independent variable (X) here, age (cf. Bittmann, 2021 and Kratz & Brüderl, 2021). Until they die, everyone keeps getting older, at the same rate, no matter what their other characteristics or circumstances are (Bartram, 2021). The only relevant controls are cohort and period, to address age–period–cohort concerns (see below). Other than those, there are no needed controls, i.e., variables where W → X. (2) The real problem is that age is likely to influence controls pertaining to individual characteristics and/or circumstances (so, X → W). Use of other variables as controls is very likely to lead to overcontrol bias.
We can then consider the likely direction of overcontrol bias in substantive terms. Getting older might mean loss of one’s spouse, or declining health, or reduced income. What would it mean to control for health when estimating the age–happiness relationship (as in e.g. Laaksonen, 2018)? The result for age would tell us about the way happiness changes as someone becomes one year older while health is ‘held constant’. The difficulty is that age itself does not ‘hold health constant’. Age influences people’s health; health is therefore a ‘bad control’ (X → W). If we control for health, we learn about the impact of age only for people who are fortunate enough not to experience declining health (in line with the fact that health is being held constant). The result for age then does not reflect the experience of people who do suffer from declining health. For them, ageing means becoming less healthy, which likely also means becoming less happy, relative to the happiness one might experience if health didn’t deteriorate (Jivraj et al., 2014; Steptoe, 2019). As a representation of how age affects happiness in general (especially with reference to the idea that happiness might increase after a ‘midlife low’), results from a model that includes health as a control are very likely to be upwardly biased, giving an exaggerated impression of any tendency for happiness to increase after middle age (compare Hellevik, 2017).
The same pattern can be anticipated in an analysis that includes marital status as a control. As people age, their likelihood of being widowed increases. Controlling for marital status would tell us about the impact of age only for people who are fortunate enough not to lose their spouses. What about those who do lose their spouses? For them, the loss of spouse that comes with getting older is likely to mean lower happiness, relative to the happiness they would experience if they didn’t lose their spouses (Clark et al., 2018). As a representation of how age affects happiness in general, results from a model that includes marital status as a control are very likely to be upwardly biased.
In general, getting older entails the experience of loss, for many people. If we include ‘bad controls’ (and when age is the independent variable, virtually all controls are either irrelevant or ‘bad controls’), we are likely to get upwardly biased results for post-middle-age experiences, misleading us about the impact of age on happiness in general. There might well be countervailing processes contributing also to a tendency towards increased happiness as we age. But inclusion of controls, especially for the circumstances that amount to loss, leads to overcontrol bias in a predictable direction—i.e., upwards, overstating the extent of increased happiness in later life.
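The direction of overcontrol bias described above can be illustrated with a toy simulation. It is a sketch under stated assumptions only: a linear rather than quadratic age effect, a single age-influenced variable (health), and invented coefficients in which age has a small positive direct association with happiness but also worsens health, which in turn lowers happiness. The names `ols_coefs`, `simulate_ageing` and `age_coefficients` are hypothetical. With these numbers the experienced (total) change per decade is 0.1 + 0.5 × (−0.4) = −0.1.

```python
import random


def ols_coefs(rows, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan solve)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[k] for row in m]


def simulate_ageing(n, seed):
    """Age worsens health (X -> W); health raises happiness (W -> Y)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = (rng.randint(20, 80) - 50) / 10        # centred age in decades
        health = -0.4 * a + rng.gauss(0, 1)        # invented health index
        happy = 7 + 0.1 * a + 0.5 * health + rng.gauss(0, 1.5)
        data.append((a, health, happy))
    return data


def age_coefficients(data):
    """Age coefficient without controls, and with health as a 'bad control'."""
    happy = [y for _, _, y in data]
    experienced = ols_coefs([[1.0, a] for a, _, _ in data], happy)[1]
    controlled = ols_coefs([[1.0, a, h] for a, h, _ in data], happy)[1]
    return experienced, controlled


experienced, controlled = age_coefficients(simulate_ageing(20000, seed=1))
print(f"age coefficient, no controls:          {experienced:.3f} per decade")
print(f"age coefficient, controlling health:   {controlled:.3f} per decade")
```

In this invented setup the uncontrolled coefficient should be negative (what people experience as they age), while the coefficient with health held constant should be positive. The controlled model thus suggests rising happiness with age where the simulated population as a whole experiences a decline.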
Blanchflower and Oswald’s position (e.g. 2019) is that results from a model with controls give us the ‘pure’ effect of ageing. But we need clarity on what this result has been purified of. Another way of articulating the points made above is as follows: when we control for factors that are themselves influenced by age, our result has been ‘purified’ of part of the effects of age itself. If we want to know what people actually experience as they grow older, we need to omit variables that are influenced by age.
Bartram (2023) demonstrates that the inclusion of typical ‘bad controls’ in a quadratic model of the age–happiness relationship results in a doubling of the age and age-squared coefficients. The error is a consequential one. It is especially consequential when statistical significance is used as the sole means of interpreting results and drawing substantive conclusions. Coefficients that are artificially inflated away from zero are more likely to be statistically significant. Biased results are therefore more likely to be perceived as correct results when evaluated with reference to statistical significance. The practice of using ‘bad controls’ exacerbates the tendency to misinterpret results via a focus on statistical significance.