Mortality density forecasts: An analysis of six stochastic mortality models
Introduction
The last twenty years have seen a growing range of models for forecasting mortality. Early work on stochastic models by McNown and Rogers (1989) and Lee and Carter (1992) has been followed by:
- developments on the statistical foundations by, for example, Lee and Miller (2001), Brouhns et al. (2002), Booth et al. (2002a), Czado et al. (2005), Delwarde et al. (2007), and Li et al. (2009); and
- the development of new stochastic models by Booth et al. (2002a, 2002b, 2005), Cairns et al. (2006b) (CBD), Renshaw and Haberman (2006), Hyndman and Ullah (2007), Cairns et al. (2009), Plat (2009) and Debonneuil (2010).
A number of studies have sought to draw more formal comparisons between these models. Some limit themselves to comparisons of variants of the Lee–Carter model (Lee and Miller, 2001; Booth et al., 2002a, 2002b, 2005). Hyndman and Ullah (2007) compare the out-of-sample forecasting performance of the Lee–Carter model and its Lee–Miller and Booth–Maindonald–Smith variants with a new class of multifactor models. CMI (2005, 2006, 2007) compare the Lee–Carter, Renshaw and Haberman, and P-splines models. The present authors have extended these types of analysis to a wider range of models with substantially different characteristics; this paper is one part of that endeavour.
Cairns et al. (2009) focused on quantitative and qualitative comparisons of eight stochastic mortality models (see Table 1 in Section 2), based on their general characteristics and ability to explain historical patterns of mortality. The criteria employed included: quality of fit, as measured by the Bayes information criterion (BIC); ease of implementation; parsimony; transparency; incorporation of cohort effects; ability to produce a non-trivial correlation structure between ages; robustness of parameter estimates relative to the period of data employed.
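The BIC criterion trades off quality of fit against the number of parameters. As a minimal sketch, assuming death counts D(t,x) are Poisson with mean E(t,x)·m(t,x), where E(t,x) is the exposure (an assumption made here for illustration), the log-likelihood and a BIC of the form "log-likelihood minus ½ × (number of parameters) × log(number of observations)" can be computed as below. We believe this larger-is-better sign convention matches Cairns et al. (2009), but many texts report −2 times this quantity, under which smaller is better. The function names and the log-likelihood values in the usage lines are hypothetical; 1320 cells corresponds to 30 ages (60–89) by 44 years (1961–2004).

```python
import math

def poisson_loglik(deaths, exposures, rates):
    """Log-likelihood of observed deaths, assuming independent
    D(t,x) ~ Poisson(E(t,x) * m(t,x)) in each age-year cell (flattened lists)."""
    ll = 0.0
    for d, e, m in zip(deaths, exposures, rates):
        mu = e * m  # expected number of deaths in the cell
        ll += d * math.log(mu) - mu - math.lgamma(d + 1.0)
    return ll

def bic(loglik, n_params, n_obs):
    """BIC as log-likelihood minus a parameter penalty; larger is better."""
    return loglik - 0.5 * n_params * math.log(n_obs)

# Usage: two hypothetical fits to 30 ages x 44 years = 1320 cells. The richer
# model must improve the log-likelihood by more than its extra penalty to rank higher.
ll_a, ll_b = -1000.0, -960.0
print(bic(ll_a, 100, 1320), bic(ll_b, 170, 1320))
```

Here the second fit gains 40 log-likelihood units but pays roughly 35 × log(1320) ≈ 251 units of extra penalty, so it ranks lower.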
Complementing this, Dowd et al. (2010a, 2010b) carry out a range of formal, out-of-sample backtesting and goodness-of-fit tests using mortality data for English and Welsh males. They find that some models fare better than others under some criteria, but that no single model can claim superiority under all the criteria considered. In any event, different patterns of mortality improvement in different countries mean that the models that are best for one country might not be as suitable for another. The present paper focuses on the ex ante plausibility and robustness of forecasts produced by the different models: whereas the previous works (Cairns et al., 2009; Dowd et al., 2010a, 2010b) focus on ex post quantitative aspects, we focus on ex ante qualitative aspects of the forecasts.
Building on the analyses of historical data in Cairns et al. (2009) and Dowd et al. (2010a, 2010b), we examine both the central mortality forecasts and the distribution of results around them. Specifically, we introduce a number of qualitative criteria that focus on the plausibility of forecasts made using different models.
Often in this paper, we will refer to the concept of biological reasonableness, which was first proposed by Cairns et al. (2006a). The concept is not intended to refer to criteria based on hard scientific (biological or medical) facts. Instead, it is intended to cover a wide range of subjective criteria related to biology, medicine and the environment. What the modeller needs to do is look at the results and ask: what mixture of biological factors, medical advances and environmental changes would have to occur to cause this particular set of forecasts? For example, under one particular model, the upper set of projections at age 85 in Fig. 4 looks rather more unusual than the two lower sets. Under the upper scenario, we would need a convincing biological, medical or environmental reason why age 85 mortality rates are certain to deteriorate to 1960s levels. If the modeller cannot think of any good reason why this might happen, then she must rule out the model (at least with its current method of calibration) on grounds of biological unreasonableness.
Besides biological reasonableness, we also consider the issue of the plausibility of forecast levels of uncertainty in projections at different ages. The objective here is to judge whether or not the pattern of uncertainty at different ages is consistent with historical levels of variability at different ages: we can sometimes conclude that a particular model is less plausible on the basis of forecast levels of uncertainty.
An important additional issue concerns the robustness of forecasts relative to the choice of sample period and age range. If we make a small change either to the sample period (for example, when we add in the latest mortality data) or to the age range, we would normally expect to see, with a robust model, only modest changes in the forecasts at all ages. Where a model is found to lack robustness with one sample population, there is a danger that it will lack robustness if applied to another sample population and should, therefore, either be used with great care or not used at all.
Although application of such a wide ranging set of model selection criteria will eliminate some models, we will demonstrate that mortality forecasting is no different from many other modelling problems where model risk is significant: mortality forecasters should acknowledge this fact and make use of multiple models rather than pretend that it is sufficient to make forecasts based on any single model.
We will consider qualitative assessment criteria that allow us to examine the ex ante plausibility of the forecasts generated by six stochastic mortality models, illustrating them with national population data for males aged 60–89 in England & Wales (EW), with the models estimated over the years 1961–2004. This is supplemented by a briefer discussion of forecasts for the equivalent US dataset. We focus on higher ages because our current principal research interest is the longevity risk facing pension plans and annuity providers.
We will concentrate on six of the models discussed by Cairns et al. (2009): these are labelled in Table 1 as M1, M2, M3, M5, M7 and M8. Models M2, M3, M7 and M8 include a cohort effect, and these emerged in Cairns et al. (2009) as the best fitting, in terms of BIC, of the eight models considered, on the basis of male mortality data from EW and the US for the age group under consideration. M2 is the Renshaw and Haberman (2006) extension of the original Lee–Carter model (M1), M3 is a special case of M2, and M7 and M8 are extensions of the original CBD model (M5). The original Lee–Carter and CBD models have no cohort effect and provide useful benchmarks for comparison with the four models involving cohort effects. M4 is not considered further in this study because of its low BIC and qualitative rankings for these datasets (Cairns et al., 2009, Table 3). (M4 focuses on identifying the smooth underlying trend; this means, however, that it is not as good as the other models at capturing short-term deviations from this trend.) Although M3 is a special case of M2, we include it here because it had a relatively high BIC ranking for the US data, and because it avoids a problem with the robustness of parameter estimates for M2 identified by CMI (2007), Cairns et al. (2009) and Dowd et al. (2010a, 2010b). M6 was also dropped from the original set of eight models: M6 is a special case of M7, and M7 was found to be stable and to deliver consistently better and more plausible results than M6.
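For reference, the predictors of the six models can be summarised as below. This is our own summary in the notation of Cairns et al. (2009) as we recall it, and Table 1 of that paper should be treated as authoritative (the identifiability constraints are omitted here). β_x^(i) are age effects, κ_t^(i) period effects and γ_{t−x}^(i) cohort effects; x̄ is the mean age in the sample range, σ̂_x^2 the mean of (x − x̄)^2, n_a the number of ages, and x_c a constant.

```latex
\begin{aligned}
\text{M1:}\quad & \log m(t,x) = \beta^{(1)}_x + \beta^{(2)}_x \kappa^{(2)}_t \\
\text{M2:}\quad & \log m(t,x) = \beta^{(1)}_x + \beta^{(2)}_x \kappa^{(2)}_t + \beta^{(3)}_x \gamma^{(3)}_{t-x} \\
\text{M3:}\quad & \log m(t,x) = \beta^{(1)}_x + n_a^{-1}\kappa^{(2)}_t + n_a^{-1}\gamma^{(3)}_{t-x} \\
\text{M5:}\quad & \operatorname{logit} q(t,x) = \kappa^{(1)}_t + \kappa^{(2)}_t (x-\bar{x}) \\
\text{M7:}\quad & \operatorname{logit} q(t,x) = \kappa^{(1)}_t + \kappa^{(2)}_t (x-\bar{x}) + \kappa^{(3)}_t\big((x-\bar{x})^2-\hat{\sigma}^2_x\big) + \gamma^{(4)}_{t-x} \\
\text{M8:}\quad & \operatorname{logit} q(t,x) = \kappa^{(1)}_t + \kappa^{(2)}_t (x-\bar{x}) + \gamma^{(3)}_{t-x}(x_c - x)
\end{aligned}
```

Written this way, the nesting described above is visible: setting β_x^(3) ≡ 0 in M2 recovers M1, M3 fixes both age loadings of M2 at 1/n_a, and M7 and M8 add cohort terms (and, in M7, a quadratic age term) to the CBD predictor of M5.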
The structure of the paper is as follows. In Section 2, we specify the stochastic processes needed for forecasting the term structure of mortality rates for each of the models. Results for the different models obtained using EW male mortality data are compared and contrasted in Section 3. Each model is then tested for the robustness of its forecasts in Section 4. Section 5 examines two applications of the forecast models, namely survivor indices and annuity prices, and makes additional comments on model risk and the plausibility of the forecasts. Finally, in Section 6, we summarise an analysis of US male mortality data, with the aim of drawing out features of the US data that are distinct from those of the EW data. Section 7 concludes.
Forecasting with stochastic mortality models
We take six stochastic mortality models which, on the basis of fitting to historical data, appear to be suitable candidates for forecasting future mortality at higher ages, and prepare them for forecasting. To do this, we need to specify the stochastic processes that drive the age, period and (if present) cohort effects in each model.
We define m(t,x) to be the death rate in year t at age x, and q(t,x) to be the corresponding mortality rate, with the relationship between them given by q(t,x) = 1 − exp[−m(t,x)].
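The Lee–Carter-type models (M1 to M3) work with log m(t,x), whereas the CBD-type models (M5, M7, M8) work with logit q(t,x), so forecasts from the two families have to be put on a common scale before they are compared. A minimal sketch of the conversions follows, assuming (as is standard) a constant force of mortality within each year of age, so that q = 1 − exp(−m); the function names are our own.

```python
import math

def q_from_m(m):
    """Mortality rate q(t,x) from death rate m(t,x): q = 1 - exp(-m),
    assuming a constant force of mortality over the year of age."""
    return 1.0 - math.exp(-m)

def m_from_q(q):
    """Inverse mapping: m = -log(1 - q), computed stably via log1p."""
    return -math.log1p(-q)

def logit(q):
    """Log-odds of q, the scale on which the CBD-type models are specified."""
    return math.log(q / (1.0 - q))

def inv_logit(y):
    """Inverse of logit: maps a CBD predictor value back to a rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# Usage: a death rate of 0.05 corresponds to a slightly smaller mortality rate.
print(q_from_m(0.05), m_from_q(q_from_m(0.05)))
```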
Forecasts and model comparisons
We now proceed to compare the forecasting results for EW for the nine models M1, M2A, M2B, M3A, M3B, M5, M7, M8A and M8B. (Corresponding results for US males are presented and discussed in Section 6.) To do this, we will present fan charts of the forecasts produced by the models. Each fan chart illustrates the forecast output from the stochastic mortality models by dividing the simulated densities into 5% quantile bands. Fan charts give us the opportunity to explore any distinctive visual
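As a minimal illustration of 5% quantile bands, the sketch below simulates a hypothetical single-age log death rate as a random walk with drift (a toy stand-in, not any of the nine model variants) and, for each forecast year, computes the 19 cut points that split the simulated values into twenty 5% bands; shading between successive cut points gives the fan. The function names, the starting rate, and the drift and volatility values are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_log_rate_paths(log_m0, drift, sigma, horizon, n_sims, seed=1):
    """Toy simulator (hypothetical): log m follows a random walk with drift."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_sims):
        y, path = log_m0, []
        for _ in range(horizon):
            y += drift + sigma * rng.gauss(0.0, 1.0)
            path.append(y)
        paths.append(path)
    return paths

def fan_chart_bands(paths, n_bands=20):
    """For each forecast year, return the n_bands - 1 quantile cut points
    (5%, 10%, ..., 95% when n_bands = 20) of the simulated values."""
    horizon = len(paths[0])
    return [
        statistics.quantiles([p[h] for p in paths], n=n_bands, method="inclusive")
        for h in range(horizon)
    ]

# Usage: a 20-year fan for a hypothetical age whose death rate is 0.05 today.
paths = simulate_log_rate_paths(math.log(0.05), drift=-0.02, sigma=0.02,
                                horizon=20, n_sims=2000)
bands = fan_chart_bands(paths)
print([round(math.exp(c), 4) for c in (bands[-1][0], bands[-1][9], bands[-1][-1])])
```

Because the toy process is a random walk, the bands widen roughly in proportion to the square root of the forecast horizon.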
Robustness of projections
We now assess the projections from models M1, M2B, M3B, M5, M7, M8A and M8B for robustness relative to the sample period used in estimating the model. For each model, we compare three sets of simulations:
- Scenario 1: (A) The underlying model is first fitted to mortality data from 1961 to 2004. (B) The stochastic model for the period effects and the cohort effects is then fitted to the full set of values resulting from (A) (44 κ’s and 60 γ’s).
- Scenario 2: (A) The
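One simple way to quantify sensitivity to the sample period is to refit the period-effect dynamics on different windows and compare the implied drift and central forecast. The sketch below assumes, for illustration only, that a single period effect κ_t follows a random walk with drift (the usual Lee–Carter-type assumption); the κ_t values and function names are hypothetical, and the multivariate and cohort-effect processes used for the six models are not reproduced here.

```python
import statistics

def fit_rw_drift(kappa):
    """Gaussian maximum-likelihood fit of kappa_t = kappa_{t-1} + mu + sigma * Z_t.
    mu is the mean of the first differences; sigma is their population
    standard deviation (the ML estimate, dividing by n rather than n - 1)."""
    diffs = [b - a for a, b in zip(kappa, kappa[1:])]
    return statistics.fmean(diffs), statistics.pstdev(diffs)

def central_forecast(kappa, h):
    """Central (mean) forecast of kappa h years after the last observation."""
    mu, _ = fit_rw_drift(kappa)
    return kappa[-1] + mu * h

# Usage: compare a fit on the full window with one that drops the first year.
kappa_full = [0.0, -1.0, -3.0, -4.0, -6.0]  # hypothetical fitted period effects
kappa_short = kappa_full[1:]
for label, series in (("full", kappa_full), ("short", kappa_short)):
    mu, sigma = fit_rw_drift(series)
    print(label, round(mu, 3), round(sigma, 3), round(central_forecast(series, 10), 3))
```

Dropping a single early year moves the drift from −1.5 to about −1.67 per year, which shifts the ten-year central forecast by more than 1.6 units of κ; a robust model should show only modest changes of this kind.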
Applications: survivor index and annuity price
In this section, we switch our attention from forecasts of the underlying mortality rates, q(t,x), to two “derivative” quantities that use these forecasts. The first of these is a survivor index, and the second is the price of an annuity (which is, in turn, derived from the survivor index). Forecasts of these will provide additional evidence of possible model risk.
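A minimal sketch of both derivative quantities, under conventions of our own choosing: the survivor index for a cohort is taken as the running product of one-year survival probabilities 1 − q along that cohort's diagonal, and the annuity pays 1 per year in arrears while the cohort member survives, discounted at a fixed, hypothetical interest rate. The paper's exact timing and discounting conventions are not shown in this excerpt. Applying these functions to each simulated path of q(t,x) gives a simulated distribution, and hence a fan chart, for each quantity.

```python
def survivor_index(cohort_q):
    """Survivor index S(1), ..., S(n) for one simulated path.

    cohort_q[j] is the mortality rate the cohort faces in forecast year j + 1
    (i.e. at age x0 + j). S(t) is the proportion of the initial cohort still
    alive after t years: the product of (1 - q) over the first t years."""
    s, index = 1.0, []
    for q in cohort_q:
        s *= 1.0 - q
        index.append(s)
    return index

def annuity_price(cohort_q, rate):
    """Value of 1 per year paid in arrears while alive, at a fixed interest
    rate (an assumption): sum over t of S(t) / (1 + rate) ** t."""
    return sum(s / (1.0 + rate) ** t
               for t, s in enumerate(survivor_index(cohort_q), start=1))

# Usage with a hypothetical three-year path of cohort mortality rates.
path = [0.10, 0.20, 0.50]
print(survivor_index(path))
print(annuity_price(path, 0.05))
```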
Fig. 7 shows, for each model, the fan chart of the future value of the survivor index; this measures the
Results for US males
In this section, we report briefly on a repeat analysis of US male data from 1968 to 2003. (For a more detailed discussion, see Cairns et al., 2008b.) Our aim in this repeat analysis is to see whether the conclusions that we have drawn in Sections 3, 4 and 5 are specific to the England & Wales males dataset or whether they might apply more generally to the US population
Conclusions
One of the main lessons from this investigation into forecasting with stochastic mortality models is the danger of ranking and selecting models purely on the basis of how well they fit historical data: it is quite possible for a model to give a good fit to the historical data, and still give inadequate forecasts. We propose here new qualitative criteria that focus on a model’s ability to produce plausible forecasts: biological reasonableness of forecast mortality term structures, biological
Acknowledgement
The authors would like to thank an anonymous referee for his/her helpful comments.
References
- Brouhns et al. (2002). A Poisson log-bilinear regression approach to the construction of projected life tables. Insurance: Mathematics and Economics.
- Czado et al. (2005). Bayesian Poisson log-linear mortality projections. Insurance: Mathematics and Economics.
- Dowd et al. (2010). Evaluating the goodness of fit of stochastic mortality models. Insurance: Mathematics and Economics.
- Hyndman and Ullah (2007). Robust forecasting of mortality and fertility rates: a functional data approach. Computational Statistics & Data Analysis.
- Plat (2009). On stochastic mortality modelling. Insurance: Mathematics and Economics.
- Renshaw and Haberman (2006). A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insurance: Mathematics and Economics.
- Bauwens, L., Sucarrat, G. (2008). General to specific modelling of exchange rate volatility: a forecast evaluation....
- Booth et al. (2002a). Applying Lee–Carter under conditions of variable mortality decline. Population Studies.
- Booth, H., Maindonald, J., Smith, L. (2002b). Age–time interactions in mortality projection: applying Lee–Carter to...
- Booth et al. (2005). Evaluation of the variants of the Lee–Carter method of forecasting mortality: a multi-country comparison. New Zealand Population Review.
- Cairns et al. (2006a). Pricing death: frameworks for the valuation and securitization of mortality risk. ASTIN Bulletin.
- Cairns et al. (2006b). A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. Journal of Risk and Insurance.
- Modelling and management of mortality risk: a review. Scandinavian Actuarial Journal.
1 Disclaimer: This report has been partially prepared by the Pension Advisory group, and not by any research department, of JPMorgan Chase & Co. and its subsidiaries (“JPMorgan”). Information herein is obtained from sources believed to be reliable but JPMorgan does not guarantee its completeness or accuracy. Opinions and estimates constitute JPMorgan’s judgment and are subject to change without notice. Past performance is not indicative of future results. This material is provided for informational purposes only and is not intended as a recommendation or an offer or solicitation for the purchase or sale of any security or financial instrument.