
Open Access 30-01-2025 | Original Research Paper

Fast estimation of the Renshaw-Haberman model and its variants

Authors: Yiping Guo, Johnny Siu-Hang Li

Published in: European Actuarial Journal

Abstract

In mortality modelling, cohort effects are often taken into consideration as they add insights about variations in mortality across different generations. Statistically speaking, models such as the Renshaw-Haberman model may provide a better fit to historical data compared to their counterparts that incorporate no cohort effects. However, when such models are estimated using an iterative maximum likelihood method in which parameters are updated one at a time, convergence is typically slow and may not even be reached within a reasonably established maximum number of iterations. Among others, the slow convergence problem hinders the study of parameter uncertainty through bootstrapping methods. In this paper, we propose an intuitive estimation method that minimizes the sum of squared errors between actual and fitted log central death rates. The complications arising from the incorporation of cohort effects are overcome by formulating part of the optimization as a principal component analysis with missing values. Using mortality data from various populations, we demonstrate that our proposed method produces satisfactory estimation results and is significantly more efficient compared to the traditional likelihood-based approach.

1 Introduction

One important concept in modeling and management of longevity risk is cohort effects. In the context of longevity risk, cohort effects refer to the impact of a person’s birth year or generation on their health and mortality outcomes. The significance of cohort effects has long been recognized by demographers [18, 34] and actuaries [33].
Cohort effects can be attributed to various factors such as changes in lifestyle, medical advancements, etc. Their strength varies across geographical regions, although it is widely acknowledged that they are particularly strong in the United Kingdom, where the “golden generation” who were born in the early 1930s experienced significantly higher mortality improvement. It is important to note that cohort effects are not purely historical. For example, in 2022, New Zealand announced an anti-smoking law that bans the sale of tobacco to anyone born on or after January 1, 2009.1 Shortly after New Zealand’s announcement, the UK also unveiled plans to phase out smoking for younger generations. Such legislation is expected to result in cohort effects in mortality improvement, as it prevents younger generations from being exposed to the negative effects of tobacco.
To incorporate cohort effects into stochastic mortality modeling, [28] extend the seminal work of [25] to develop the Renshaw-Haberman model. It adds to the original Lee-Carter model a bi-linear term, which captures the variation of mortality across years-of-birth and the interaction between such variation and age. It is also closely connected to the classical age-period-cohort (APC) model [18], as it degenerates to the APC model when some of its age-specific parameters are eliminated. When estimated to historical mortality data, the model is able to absorb part of the remaining variation that is not captured by models with age and period (time-related) effects only, leaving residuals that exhibit a more random pattern. Recently, the Renshaw-Haberman model has been generalized to incorporate socioeconomic differences in mortality [31], making it applicable to an even wider range of insurance and pension applications.
In the literature, including the original work of [28], the Renshaw-Haberman model is often estimated with maximum likelihood (ML). When fitting the Renshaw-Haberman with ML, a log-likelihood function is derived on the basis of a distributional assumption, typically Poisson, made on observed death counts; then, parameter estimates are obtained by maximizing the log-likelihood function. Given that the Renshaw-Haberman model has a large number of parameters, the maximization is customarily performed with an iterative Newton–Raphson method, in which parameters are updated one batch at a time. Unfortunately, ML estimation for the Renshaw-Haberman model is slow and sometimes unstable. Depending on the dataset in question, the iterative algorithm may not even converge given the desired convergence criterion. This problem is noted by a number of researchers, including [7, 8] and [15, 16].
While a slow convergence might be acceptable if model estimation is a one-off task, it may render applications that require repeated model estimation time-prohibitive. Such applications include the following.
  • Assessment of parameter uncertainty via bootstrapping
    Any model-based mortality projection is subject to parameter uncertainty, as the parameters used for extrapolating future death rates are estimates rather than exact. One way to gauge parameter uncertainty is bootstrapping [3, 10, 23]. In a bootstrap, a large number of pseudo datasets are generated by, for example, resampling residuals (residual bootstrapping); then, the model is re-estimated using the pseudo datasets. The procedure results in empirical distributions of model parameters, from which parameter uncertainty can be inferred. The bootstrapping procedure involves a large number of (re-)estimations, and cannot be executed in practice if the estimation is slow.
  • Calculation of Solvency Capital Requirements
    Under Solvency II, solvency capital requirement (SCR) is based on the Value-at-Risk at a 99.5% confidence level over a one-year horizon [36]. In lieu of the prescribed standard formula, an insurer may opt to calculate SCR by simulating from an approved internal model. Taking re-calibration risk2 [5] into account, the simulation procedure for estimating longevity Value-at-Risk encompasses the following steps: (1) simulate \(M_1\) mortality scenarios in one year from a model that is fitted to historical data; (2) for each mortality scenario, re-estimate the model to an updated dataset that includes the simulated mortality scenario, and use the re-estimated model to simulate \(M_2\) sample paths of mortality (for year 2 and beyond), from which the expected value of the liability at the end of year 1 can be calculated. Step (2) yields a distribution of liabilities at the end of year 1, which can be used to infer the 99.5% Value-at-Risk. Typically, \(M_1\) is large, so that the procedure includes a large number of model re-estimations.
  • Identification of ultimate mortality improvement rates
    In recent years, two-dimensional mortality improvement scales have been promulgated by major actuarial professional organizations [see, e.g., 29]. A two-dimensional mortality improvement scale is composed of relatively high short-term scale factors, which are blended into lower long-term (ultimate) scale factors through an interpolative mid-term scale. One possible way to estimate the ultimate scale factors is to fit a parametric model to absorb all transient period and cohort effects that are present in the historical data, leaving a long-term pattern from which the ultimate scale factors can be inferred [27]. This method requires the modeller to experiment with different model structures, some of which, ideally, include multiple age-cohort interaction terms. A slow convergence rate plagues the use of this method; in particular, it hinders the consideration of models with additional age-cohort interaction terms.
So far as we are aware, two attempts have been made to mitigate the estimation issues of the Renshaw-Haberman model. The first attempt is made by [28], who consider a number of restricted versions of the Renshaw-Haberman model which may take less time to estimate given that they have fewer free parameters. Most notably, they propose the H1 model, which still incorporates cohort effects but assumes that such effects do not interact with age. The second attempt is made by [21], who argue that the problem of slow convergence is due possibly to an approximate identification issue that is applicable to the Renshaw-Haberman model. To mitigate the issue, they recommend imposing an additional parameter constraint to stabilize the estimation process and enhance algorithmic robustness. It is noteworthy that both approaches are based on a reduction in parameter space. As such, they improve estimation efficiency at the expense of goodness-of-fit to the historical data.
In this paper, we attack the problem of estimation efficiency for the Renshaw-Haberman model from a different angle. Instead of building on the commonly used maximum likelihood approach, we consider a least squares method in which parameters are estimated by minimizing the sum of squared errors between the actual and fitted log central death rates. The idea of using a least squares approach to estimate stochastic mortality models is not new. As a matter of fact, when the original Lee-Carter and Cairns-Blake-Dowd models were first proposed, the authors estimated them with least squares methods [6, 25].
It is not straightforward to efficiently estimate the Renshaw-Haberman model with a least squares method. This is because the model involves an additional (year-of-birth) dimension that is not orthogonal to the age and time dimensions, rendering the efficient singular value decomposition (SVD) technique that is used for fitting the original Lee-Carter model inapplicable. Recently, [30] has discussed the least squares implementation of the generalized APC model as a Gaussian generalised linear model, but it may have similar convergence issues as the classic MLE estimation. To overcome the optimization challenge, we develop an alternating minimization scheme which sequentially updates one group of parameters at a time. We also formulate the update of the age-cohort component in the model as a principal component analysis (PCA) problem with missing values, so that it can be accomplished effectively using an iterative SVD algorithm. Using data from various national populations, we demonstrate that our proposed least squares method significantly outperforms the ML approach in terms of estimation efficiency, without sacrificing goodness-of-fit to historical data.
Our proposed least squares method offers several advantages over the ML approach. First, given the same convergence criterion, our proposed method takes less computation time. We argue that the improvement in estimation efficiency is due to a sharper objective function, and empirically verify this argument with a numerical experiment. Second, unlike the ML approach, the objective function in our proposed approach is not built on a specific distributional assumption, thereby avoiding the potential problems associated with choosing such an assumption. Finally, our proposed method can be implemented seamlessly with the two methods that are previously proposed by [28] and [21] to further improve estimation efficiency.
The remainder of this paper is organized as follows. Section 2 presents an overview of the Lee-Carter model, with a focus on the estimation methods for the model that are relevant to this study. Section 3 reviews the Renshaw-Haberman model and its estimation challenges. Section 4 details our proposed method, including its motivation, theoretical support, and execution. Section 5 explains how our proposed method can be implemented simultaneously with the two methods that are previously proposed by [28] and [21]. Section 6 documents the numerical experiments that validate the advantages of our proposed method. Finally, concluding remarks are provided in Sect. 7.

2 The Lee-Carter model

2.1 Specification

This section presents a concise review of the Lee-Carter model [25], with a focus on two commonly used methods for estimating the model. We let \(m_{x,t}\) be the central rate of death for age x and year t, and \(y_{x,t}:=\log (m_{x,t})\) for notational convenience. The Lee-Carter model assumes that
$$\begin{aligned} y_{x,t}:=\log (m_{x,t})=a_x+b_xk_t+\varepsilon _{x,t}, \end{aligned}$$
(2.1)
where \(a_x\) and \(b_x\) are age-specific parameters, \(k_t\) is a time-varying index, and \(\varepsilon _{x,t}\) is the error term. In the model, \(a_x\) captures ‘age effects’ (the age pattern of mortality), \(k_t\) captures ‘period effects’ (changes in the overall mortality level over time), and \(b_x\) measures the interaction between age and period effects. Throughout this paper, we assume that the data set in question covers p ages, \(x\in [x_1,\cdots ,x_p]\), and n calendar years, \(t\in [t_1,\cdots ,t_n]\).
The Lee-Carter model is subject to an identifiability problem. It can be shown that two parameter constraints are required to stipulate parameter uniqueness. In the literature (including the original work of [25]), the following two parameter constraints are typically imposed:
$$\begin{aligned} \sum _{x=x_1}^{x_p}b_x=1 \quad \text {and} \quad \sum _{t=t_1}^{t_n}k_t=0. \end{aligned}$$
(2.2)

2.2 Least squares estimation

In their original work, [25] estimated (2.1) using a least squares approach, in which parameter estimates are chosen such that they minimize the sum of squared errors between the observed and fitted log central mortality rates. In more detail, let us rewrite the model in vector form as follows:
$$\begin{aligned} \varvec{y}_{t} = \varvec{a}+\varvec{b}k_t + \varvec{\varepsilon }_t, \end{aligned}$$
(2.3)
where \(\varvec{y}_{t}=(y_{x_1,t},\cdots ,y_{x_p,t})^T\), \(\varvec{a}=(a_{x_1},\cdots ,a_{x_p})^T\), \(\varvec{b}=(b_{x_1},\cdots ,b_{x_p})^T\) and \(\varvec{\varepsilon }_t=(\varepsilon _{x_1,t},\cdots ,\varepsilon _{x_p,t})^T\). In using the least squares approach, the estimates of \(\varvec{a}\), \(\varvec{b}\) and \(\varvec{k}\) are obtained by solving the following optimization:
$$\begin{aligned} \min _{\varvec{a},\varvec{b},\varvec{k}}\sum _{x,t}(y_{x,t}-(a_x+b_xk_t))^2= \min _{\varvec{a},\varvec{b},\varvec{k}}\sum _{t}\Vert \varvec{y}_t-(\varvec{a}+\varvec{b}k_t)\Vert _2^2, \end{aligned}$$
(2.4)
where \(\varvec{k}=(k_{t_1},\cdots ,k_{t_n})^T\) and \(\Vert \cdot \Vert _2\) denotes the Euclidean norm (or \(L^2\)-norm) of a vector. When the identification constraints specified in (2.2) are applied, the optimization problem specified in (2.4) is equivalent to a special case of principal component analysis (PCA) with one principal component. Its solution can thus be obtained by performing a singular value decomposition (SVD) on the mean-centered log mortality data matrix, \(\varvec{Y}-\bar{\varvec{Y}}:=(\varvec{y}_{t_1}-\bar{\varvec{y}},\cdots ,\varvec{y}_{t_n}-\bar{\varvec{y}})\), where \(\bar{\varvec{Y}}=(\bar{\varvec{y}},\cdots ,\bar{\varvec{y}})\) and \(\bar{\varvec{y}}=\frac{1}{n}\sum _{t=t_1}^{t_n} \varvec{y}_t\). The solution has the following closed form:
$$\begin{aligned} \hat{\varvec{a}}=\bar{\varvec{y}}, \quad \hat{\varvec{b}}=\frac{\varvec{u}}{\textbf{1}^T\varvec{u}}, \quad \hat{\varvec{k}}= (\textbf{1}^T\varvec{u}) \cdot (\varvec{Y}-\bar{\varvec{Y}})^T \varvec{u}, \end{aligned}$$
(2.5)
where \(\varvec{u}\) is the first left-singular vector of \(\varvec{Y}-\bar{\varvec{Y}}\) and \(\textbf{1}=(1,\cdots ,1)^T\). In this solution, the term \(\varvec{1}^T\varvec{u}\) normalizes the standard PCA solution due to the imposed constraint \(\sum _xb_x=1\). Additionally, it is easy to check that the constraint \(\sum _t k_t=0\) is also met. We refer readers to [2] and [17] for a comprehensive overview of the theory of PCA.
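To make the closed-form solution (2.5) concrete, the following sketch (written in Python with NumPy, our choice rather than anything prescribed by the paper) fits the Lee-Carter model by the SVD-based least squares approach; the function name and the assumption of a complete \(p\times n\) input matrix of log central death rates are ours.

```python
import numpy as np

def fit_lee_carter_ls(Y):
    """Least squares Lee-Carter fit via SVD, following (2.5).

    Y: p x n NumPy array of log central death rates y_{x,t}
       (rows = ages x_1..x_p, columns = calendar years t_1..t_n).
    """
    a_hat = Y.mean(axis=1)                       # a_x: row means of Y
    Yc = Y - a_hat[:, None]                      # mean-centred matrix Y - Y_bar
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    u = U[:, 0]                                  # first left-singular vector
    scale = u.sum()                              # 1^T u, from the constraint sum_x b_x = 1
    b_hat = u / scale
    k_hat = scale * (Yc.T @ u)                   # sums to zero because Yc is row-centred
    return a_hat, b_hat, k_hat
```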

2.3 Maximum likelihood estimation

In contrast to the least squares approach, maximum likelihood estimation (MLE) requires a distributional assumption. Estimation of the Lee-Carter model using maximum likelihood was first accomplished by [35], who assumes that the observed death count in each age-time cell follows a Poisson distribution. We let \(D_{x,t}\) be the observed number of deaths for age x and year t, and \(N_{x,t}\) be the corresponding exposure-to-risk. The method of Poisson-MLE assumes that
$$\begin{aligned} D_{x,t}\sim \text {Poisson}(N_{x,t}m_{x,t}), \text { with }\log (m_{x,t})=a_x+b_xk_t. \end{aligned}$$
(2.6)
Parameter estimates are obtained by maximizing the following log-likelihood function:
$$\begin{aligned} \ell (\varvec{a},\varvec{b},\varvec{k})=\sum _{x,t}\left( D_{x,t}(a_x+b_xk_t)-N_{x,t}e^{a_x+b_xk_t} \right) +\text {constant}. \end{aligned}$$
(2.7)
The optimization problem can be solved via an iterative Newton–Raphson method [14].
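For illustration only, the sketch below implements one standard cyclic (one-batch-at-a-time) updating scheme of this kind; the starting values, the update order and all names are our assumptions rather than a reproduction of the routine in [14].

```python
import numpy as np

def fit_lee_carter_poisson(D, N, max_iter=10000, tol=1e-8):
    """A sketch of a cyclic Newton-Raphson scheme for the Poisson Lee-Carter
    model (2.6)-(2.7). D and N are p x n arrays of observed deaths and
    exposures; starting values are crude guesses."""
    p, n = D.shape
    log_m = np.log(np.maximum(D, 0.5) / N)                 # crude log rates for starting values
    a = log_m.mean(axis=1)
    b = np.full(p, 1.0 / p)
    k = (log_m - a[:, None]).sum(axis=0)                   # rough starting k_t (sums to zero)
    ll_old = -np.inf
    for _ in range(max_iter):
        Dhat = N * np.exp(a[:, None] + np.outer(b, k))
        a += (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)                                     # update a_x
        Dhat = N * np.exp(a[:, None] + np.outer(b, k))
        k += ((D - Dhat) * b[:, None]).sum(axis=0) / (Dhat * b[:, None] ** 2).sum(axis=0)  # update k_t
        Dhat = N * np.exp(a[:, None] + np.outer(b, k))
        b += ((D - Dhat) * k[None, :]).sum(axis=1) / (Dhat * k[None, :] ** 2).sum(axis=1)  # update b_x
        a, k = a + b * k.mean(), k - k.mean()               # re-impose sum_t k_t = 0
        s = b.sum(); b, k = b / s, k * s                    # re-impose sum_x b_x = 1
        eta = a[:, None] + np.outer(b, k)
        ll = np.sum(D * eta - N * np.exp(eta))              # log-likelihood (2.7) up to a constant
        if np.isfinite(ll_old) and abs(ll - ll_old) < tol * abs(ll_old):
            break
        ll_old = ll
    return a, b, k
```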

3 The Renshaw-Haberman model

3.1 Specification

The focus of this paper is the Renshaw-Haberman model [28], which extends the Lee-Carter model by incorporating cohort effects. The Renshaw-Haberman model assumes that
$$\begin{aligned} y_{x,t}:=\log (m_{x,t})=a_x+b_xk_t+c_x\gamma _{t-x}+\varepsilon _{x,t}. \end{aligned}$$
(3.1)
In the above, \(\gamma _{t-x}\) is an index that is linked to year-of-birth \((t -x)\), thereby capturing cohort effects. Parameter \(c_x\) captures the sensitivity of the log central death rate at each age to cohort effects. The interpretations of \(a_x\), \(b_x\) and \(k_t\) in (3.1) are the same as those in the Lee-Carter model.
The Renshaw-Haberman model is also subject to an identifiability problem. In addition to the two constraints specified in (2.2), two constraints on \(c_x\) and \(\gamma _{t-x}\) are needed. Following [28], the additional constraints we use are
$$\begin{aligned} \sum _{x=x_1}^{x_p}c_x=1, \quad \text{ and } \quad \sum _{t-x=t_1-x_p}^{t_n-x_1}\gamma _{t-x}=0. \end{aligned}$$
(3.2)
It is worth noting that the constraint for \(\gamma _{t-x}\) may be formulated differently. For instance, as mentioned by [28], another possible choice is \(\gamma _{t_1-x_p}=0\). We choose to use \(\sum _{t-x=t_1-x_p}^{t_n-x_1}\gamma _{t-x}=0\), because it is commonly adopted in the literature [e.g., 8] and used in the StMoMo package in R [32]. The choice of the constraints makes no difference to the goodness-of-fit.

3.2 Estimation

Estimation of the Renshaw-Haberman model is well known to be challenging. While the Lee-Carter model can be estimated readily using a least squares approach, a parallel least squares method for estimating the Renshaw-Haberman model remains largely unexplored in the literature. The least squares solution to the Renshaw-Haberman estimation problem is not easy to obtain, because the incorporation of cohort effects expands the dimension of the problem. This challenge is succinctly described by [11]:
“Under the Lee-Carter original approach, one might consider modelling the crude death rate with cohort effects as follows:
$$\begin{aligned} \log (\tilde{m}_{x,t})=\alpha _x+\beta _x\kappa _{t}+\beta _x^{\gamma }\gamma _{t-x}+\varepsilon _{x,t}. \end{aligned}$$
However the dimension of the cohort index would cause difficulty for the SVD estimation approach."
In the literature, the Renshaw-Haberman model is often estimated using maximum likelihood. Assuming Poisson death counts, the log-likelihood function for the Renshaw-Haberman model is given by
$$\begin{aligned} \ell (\varvec{a},\varvec{b},\varvec{k},\varvec{c},\varvec{\gamma })=\sum _{x,t}\left( D_{x,t}(a_x+b_xk_t+c_x\gamma _{t-x})-N_{x,t}e^{a_x+b_xk_t+c_x\gamma _{t-x}} \right) +\text {constant}, \end{aligned}$$
(3.3)
where \(\varvec{c}=(c_{x_1},\cdots ,c_{x_p})\) and \(\varvec{\gamma }=(\gamma _{t_1-x_p},\cdots ,\gamma _{t_n-x_1})\). This objective function is maximized through an iterative Newton–Raphson method to obtain parameter estimates. Although Poisson-MLE is technically feasible for the Renshaw-Haberman model, computational efficiency represents a significant concern to users. It is widely reported that Poisson-MLE for the Renshaw-Haberman model takes a lot of iterations to converge [7, 8, 15, 16]. The problem is investigated more deeply by [9], who emphasized the importance of using appropriate starting values in the estimation process. More recently, [30] attempt to obtain least squares estimates of Renshaw-Haberman parameters by changing the distributional assumption in MLE to Gaussian.3 However, their attempt still entails a computationally demanding iterative Newton–Raphson algorithm, and does not aim to solve the convergence problem.
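For concreteness, the log-likelihood (3.3) can be evaluated as in the sketch below; the argument names and the indexing of \(\varvec{\gamma }\) by year-of-birth are assumptions made for this illustration.

```python
import numpy as np

def rh_log_likelihood(a, b, k, c, gamma, D, N, x_ages, t_years):
    """Poisson log-likelihood (3.3) of the Renshaw-Haberman model, up to an
    additive constant. gamma has length n + p - 1 and is indexed by
    year-of-birth s = t - x, from t_1 - x_p to t_n - x_1; all inputs are
    NumPy arrays."""
    s0 = t_years[0] - x_ages[-1]                       # earliest year-of-birth, t_1 - x_p
    X, T = np.meshgrid(x_ages, t_years, indexing="ij")
    cohort_idx = (T - X) - s0                          # position of each cell's cohort in gamma
    eta = a[:, None] + np.outer(b, k) + c[:, None] * gamma[cohort_idx]
    return np.sum(D * eta - N * np.exp(eta))
```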

3.3 Existing methods for expediting estimation

So far as we are aware, there have been two major attempts to expedite estimation for the Renshaw-Haberman model. These methods are reviewed in this subsection.

3.3.1 The H1 model

[28] attempt to improve estimation efficiency by simplifying the structure of the Renshaw-Haberman model. Specifically, they consider setting \(c_x\) in the original Renshaw-Haberman Model to 1/p, where p represents the number of ages covered by the data set. The resulting model, given by
$$\begin{aligned} y_{x,t}:=\log (m_{x,t})=a_x+b_xk_t+\frac{1}{p}\gamma _{t-x}+\varepsilon _{x,t}, \end{aligned}$$
(3.4)
is often referred to as the H1 model, and is further discussed by [16]. The H1 model may be further reduced by setting \(b_x=1/p\). This further simplification would result in the classical age-period-cohort (APC) model [18]:
$$\begin{aligned} y_{x,t}:=\log (m_{x,t})=a_x+\frac{1}{p}k_t+\frac{1}{p}\gamma _{t-x}+\varepsilon _{x,t}. \end{aligned}$$
(3.5)
Reducing the model structure may result in a faster convergence; however, a reduced model structure may no longer provide an adequate fit.

3.3.2 The Hunt-Villegas method

[21] argue that the slow convergence of the MLE for the Renshaw-Haberman model is due possibly to an approximate identifiability issue.
Specifically, [21] show that if \(k_t\) in (3.1) follows a perfect straight line, then there exists an approximately invariant parameter transformation. In other words, parameters are not unique even if the four parameter constraints specified in (2.2) and (3.2) are imposed.
Empirically, the estimates of \(k_t\) typically exhibit a steady downward trend due to mortality improvements, but the trend is not perfectly linear. As such, this identification problem is ‘approximate’ rather than ‘exact’. The approximate identification problem means that there exist different sets of parameters that lead to different allocations between the time effect and the cohort effect but approximately the same fit to the historical data. This phenomenon could potentially make the optimization procedure slow and unstable.
To resolve the approximate identifiability issue, [21] suggest imposing an additional constraint:
$$\begin{aligned} \sum _{s=t_1-x_p}^{t_n-x_1}(s-\bar{s})\gamma _s=0, \end{aligned}$$
(3.6)
where \(\bar{s}\) represents the average year-of-birth over the years-of-birth covered by the data set. This constraint ensures that \(\gamma _s\) does not follow a linear trend over the years-of-birth covered by the data set. To see why this is true, we can treat (3.6) as a requirement that the sample covariance between \(\gamma _s\) (which has a zero mean due to another identifiability constraint) and year-of-birth s is zero. [20] mention that the additional constraint also carries demographic significance. Specifically, it is conceivable that \(\gamma _s\) is approximately trendless, because systematic changes in mortality over time should have been captured by \(k_t\).
Imposing the additional constraint can mitigate the approximate identification issue. It also shrinks the parameter space that the Newton–Raphson algorithm has to cover, thereby stabilizing and accelerating the optimization.
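As a small illustration (not part of the Hunt-Villegas method itself), the left-hand side of (3.6) is simply the unnormalized sample covariance between the fitted \(\gamma _s\) and year-of-birth s, and can be checked as follows.

```python
import numpy as np

def hv_constraint_lhs(gamma, s_vals):
    """Left-hand side of the Hunt-Villegas constraint (3.6): the unnormalized
    sample covariance between gamma_s and year-of-birth s. A value near zero
    means the fitted cohort index carries no linear trend."""
    s_vals = np.asarray(s_vals, dtype=float)
    return np.sum((s_vals - s_vals.mean()) * np.asarray(gamma))
```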

4 The proposed method

4.1 Motivation

The existing methods for expediting Renshaw-Haberman estimation are both based on the MLE framework, and therefore the requirement of a distributional assumption (which may turn out to be wrong) remains. Also, both methods rely on a reduction in parameter space, so that they improve estimation efficiency at the expense of goodness-of-fit.
The aforementioned limitations motivate us to tackle the estimation challenge from a different angle. Specifically, we develop a least squares approach for the Renshaw-Haberman model, in which computational efficiency is achieved through some closed-form SVD solutions. The proposed approach has the following merits:
  • A sharper objective function
    Compared to MLE, the proposed least squares method is based on a different objective function. We show empirically in Sect. 6 that the objective function in our proposed method is sharper, thereby resulting in a faster convergence. Also, the objective function is optimized in part by some closed-form SVD solutions, so that our proposed method is more computationally efficient.
  • Less reliant on distributional assumptions
    The MLE approach requires a strict distributional assumption. The commonly used Poisson death count assumption is not without criticism. For instance, the over-dispersion problem arising from population heterogeneity would render the Poisson assumption inappropriate. Although this problem may be mitigated by assuming a more flexible death count distribution, such as the negative binomial [26], the number of parameters would increase and consequently model estimation may be even slower. In contrast, in our proposed method, the objective function, defined as the sum of squared differences between observed and fitted log mortality rates, is not developed upon any specific distributional assumption.
  • Seamless integration with existing methods for expediting estimation
    The proposed estimation method applies to not only the original Renshaw-Haberman model but also its reduced versions including the H1 model. It can also be implemented with the Hunt-Villegas method to further improve computational efficiency.

4.2 Main optimization: alternating minimization

When a least squares approach is used to estimate the Renshaw-Haberman model, the optimization problem can be expressed as
$$\begin{aligned} \min _{\varvec{a},\varvec{b},\varvec{k},\varvec{c},\varvec{\gamma }}\sum _{x,t}(y_{x,t}-(a_x+b_xk_t+c_x\gamma _{t-x}))^2. \end{aligned}$$
(4.1)
The parameter constraints specified in (2.2) and (3.2) are imposed to stipulate a unique solution.
Unlike the least squares optimization problem (2.4) for the Lee-Carter model, (4.1) is a much more challenging problem that cannot be easily solved. To overcome the estimation challenge, we can consider an alternating minimization strategy, in which the shape parameter vector \(\varvec{a}\), the age-period component \((\varvec{b},\varvec{k})\) and the age-cohort component \((\varvec{c},\varvec{\gamma })\) are updated in turns, while other components are held fixed.
The core iteration is outlined in Algorithm 1. Each cycle of iteration is composed of several steps. Step 2 is simple, as it updates the shape vector \(\varvec{a}\) through an explicit averaging formula. Step 3 is also straightforward as it just fits a Lee-Carter structure, through a SVD, to the residual after the removal of the shape vector and age-cohort effects. Note that the updated values of \(k_t\) from Step 3 sum to zero, because the input residual matrix \([y_{x,t}-a_x-c_x\gamma _{t-x}]_{x,t}\) is row-centered following the implementation of (4.2) in Step 2.
Step 4, however, represents a much more complex optimization challenge, because it cannot be directly translated into a traditional PCA problem. Therefore, efficiently solving (4.5) in Step 4 is a crucial component of our proposed method. In the next subsection, we show that Step 4 can be formulated as a PCA problem with missing values, which can be efficiently solved in an iterative manner.
Finally, Step 5 requires a convergence criterion. In this paper, convergence is achieved when the relative change in the objective function, as defined by (4.1), falls below a pre-determined small threshold. Algorithm 1 always converges, since each of Steps 2–4 in the algorithm consistently decreases the objective function, and the objective function (\(L^2\) error) is inherently bounded below by zero.
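The following sketch shows one way the alternating minimization described above could be organized in code, using the SVD solution of Sect. 2.2 for \((\varvec{b},\varvec{k})\) and the routine iterative_svd_cohort sketched in Sect. 4.3 below for \((\varvec{c},\varvec{\gamma })\). The explicit averaging formula for \(\varvec{a}\) reflects our reading of the row-centering described above, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def cohort_matrix(c, gamma, x_ages, t_years):
    """p x n matrix with (x, t) entry c_x * gamma_{t-x}."""
    X, T = np.meshgrid(x_ages, t_years, indexing="ij")
    return c[:, None] * gamma[(T - X) - (t_years[0] - x_ages[-1])]

def fit_rh_ls(Y, x_ages, t_years, delta=1e-8, max_iter=1000):
    """A sketch of the alternating minimization for the least squares
    problem (4.1); Y is the p x n matrix of log central death rates."""
    p, n = Y.shape
    a, b, k = Y.mean(axis=1), np.full(p, 1.0 / p), np.zeros(n)
    c, gamma = np.full(p, 1.0 / p), np.zeros(n + p - 1)
    obj_old = np.inf
    for _ in range(max_iter):
        cg = cohort_matrix(c, gamma, x_ages, t_years)
        a = (Y - cg).mean(axis=1)                          # Step 2: a_x row-centres Y - c*gamma
        R = Y - a[:, None] - cg                            # Step 3: rank-one SVD fit, as in (2.5)
        U, S, Vt = np.linalg.svd(R, full_matrices=False)
        u = U[:, 0]
        b, k = u / u.sum(), u.sum() * S[0] * Vt[0]
        c, gamma = iterative_svd_cohort(                   # Step 4: PCA with missing values (Sect. 4.3)
            Y - a[:, None] - np.outer(b, k), x_ages, t_years)
        cg = cohort_matrix(c, gamma, x_ages, t_years)
        obj = np.sum((Y - a[:, None] - np.outer(b, k) - cg) ** 2)
        if np.isfinite(obj_old) and abs(obj_old - obj) < delta * obj_old:
            break                                          # Step 5: relative-change criterion
        obj_old = obj
    return a, b, k, c, gamma
```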

4.3 Updating \(\varvec{c}\) and \(\varvec{\gamma }\): PCA with Missing Values via an Iterative SVD

In this subsection, we develop a method to overcome the optimization challenge in Step 4 of Algorithm 1.
First, let us explain in more detail the optimization challenge we are facing. Let \(z_{x,t}:=y_{x,t}-a_x-b_xk_t\) be the input for the sub-optimization problem in Step 4. We may arrange the values of \(z_{x,t}\) in a \(p\times n\) age-period (age-time) matrix as follows:
$$\begin{aligned} \varvec{Z}_{ap}=\begin{pmatrix} z_{x_1,t_1} & z_{x_1,t_2} & \cdots & z_{x_1,t_n}\\ z_{x_2,t_1} & z_{x_2,t_2} & \cdots & z_{x_2,t_n}\\ \vdots & \vdots & \ddots & \vdots \\ z_{x_p,t_1} & z_{x_p,t_2} & \cdots & z_{x_p,t_n} \end{pmatrix}. \end{aligned}$$
(4.7)
We are unable to update the age-cohort component \((\varvec{c},\varvec{\gamma })\) by applying a SVD directly to this age-period matrix, because age and cohort are not orthogonal in \(\varvec{Z}_{ap}\).
To solve the sub-optimization, we first rearrange the input values in a \(p\times (n+p-1)\) age-cohort data matrix:
$$\begin{aligned} \varvec{Z}_{ac}=\begin{pmatrix} \times & \cdots & \times & z_{x_1,t_1} & \cdots & z_{x_1,t_n}\\ \times & \cdots & z_{x_2,t_1} & \cdots & z_{x_2,t_n} & \times \\ \vdots & \iddots & \iddots & \iddots & \iddots & \vdots \\ z_{x_p,t_1} & \cdots & z_{x_p,t_n} & \times & \cdots & \times \end{pmatrix}. \end{aligned}$$
(4.8)
In \(\varvec{Z}_{ac}\), each \(\times \) represents a missing value; these missing values arise because the oldest and youngest cohorts are not completely observed. For instance, for the youngest cohort of individuals who are born in year \(t_n - x_1\), only one observed value \((z_{x_1, t_n})\) is available. Similar arrangements of mortality data are also made by researchers such as [1]. In the spirit of this rearrangement, the sub-optimization problem can be expressed as
$$\begin{aligned} \min _{\varvec{c},\varvec{\gamma }}\sum _{x,s\in \mathcal {O}}(z_{x,s}-c_x\gamma _{s})^2, \end{aligned}$$
(4.9)
where \(s:=t-x\) represents year-of-birth and \(\mathcal {O}\) is the set of indices of the observed values.
If \(\varvec{Z}_{ac}\) contains no missing value, then we can solve (4.9) readily by applying a SVD to \(\varvec{Z}_{ac}\). However, given the presence of missing values, the sub-optimization boils down to a first-order PCA with missing values.
Handling missing values in PCA is a complex problem in statistics and machine learning. The technical challenges entailed are discussed in the work of [22]. The same paper also provides a comprehensive review of the classical algorithms for PCA with missing values. In the modern statistics and machine learning literature, there exist advanced techniques for handling missing data in PCA, such as matrix completion with a nuclear norm regularization [12]. However, such advanced techniques are designed for extremely large-scale and sparse matrices. Additionally, their primary goal is to predict the missing values (matrix completion) rather than finding the optimal least squares solution.
In this study, we utilize a method called the iterative SVD algorithm. This algorithm begins with an imputation of the missing values, typically with row-wise means of the observed values in the input matrix. This creates an approximate complete matrix, to which a PCA can be applied to obtain singular vectors (parameter estimates). Then, a PCA reconstruction is employed to generate an improved imputation of the missing values. The process is repeated until convergence is achieved. The implementation of the iterative SVD algorithm in the context of our research is presented in Algorithm 2.
While the iterative SVD algorithm appears to be a suitable method for solving PCA with missing values, it is not immediately clear why this algorithm addresses our specific \(L^2\) minimization problem with missing values. To elucidate this, in the Appendix, we prove that the iterative SVD algorithm minimizes the target loss function specified in (4.9), and that the iterative SVD algorithm always converges.
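A possible implementation of this iterative SVD step is sketched below; the rearrangement into the age-cohort matrix, the initial row-mean imputation and the stopping rule follow the description above, while the function name and defaults are ours rather than the authors' code.

```python
import numpy as np

def iterative_svd_cohort(Z_res, x_ages, t_years, tol=1e-8, max_iter=1000):
    """Rank-one PCA with missing values for the sub-problem (4.9).

    Z_res: p x n age-period residual matrix with entries y_{x,t} - a_x - b_x k_t.
    Returns (c, gamma) with sum_x c_x = 1; the remaining constraint on gamma
    can be re-imposed afterwards via the usual identifiability transformation."""
    p, n = Z_res.shape
    s0 = t_years[0] - x_ages[-1]                          # earliest year-of-birth t_1 - x_p
    Z_ac = np.full((p, n + p - 1), np.nan)                # age-cohort matrix (4.8), NaN for missing
    for i in range(p):
        for j in range(n):
            Z_ac[i, (t_years[j] - x_ages[i]) - s0] = Z_res[i, j]
    observed = ~np.isnan(Z_ac)

    # initial imputation: row-wise means of the observed entries
    Z_fill = np.where(observed, Z_ac, np.nanmean(Z_ac, axis=1)[:, None])
    fit_old = np.zeros_like(Z_fill)
    for _ in range(max_iter):
        U, S, Vt = np.linalg.svd(Z_fill, full_matrices=False)
        fit = S[0] * np.outer(U[:, 0], Vt[0])             # rank-one PCA reconstruction
        Z_fill = np.where(observed, Z_ac, fit)            # re-impute the missing cells only
        if np.max(np.abs(fit - fit_old)) < tol:
            break
        fit_old = fit

    u = U[:, 0]
    c = u / u.sum()                                       # impose sum_x c_x = 1
    gamma = u.sum() * S[0] * Vt[0]                        # so that c_x * gamma_s reproduces the fit
    return c, gamma
```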

5 Integrating the proposed method with the existing methods

We may implement our proposed method with one or both of the existing methods (H1 and Hunt-Villegas) to further boost estimation speed.

5.1 Implementing with the H1 Model

Our proposed method can be applied to the variants of the Renshaw-Haberman model, including the H1 model discussed in Sect. 3. For the H1 model, least squares estimation can be formulated as the following optimization problem:
$$\begin{aligned} \min _{\varvec{a},\varvec{b},\varvec{k},\varvec{\gamma }}\sum _{x,t}\left( y_{x,t}-(a_x+b_xk_t+\frac{1}{p}\gamma _{t-x})\right) ^2, \end{aligned}$$
(5.1)
and the following three identification constraints can be used to stipulate parameter uniqueness:
$$\begin{aligned} \sum _{x=x_1}^{x_p}b_x=1, \quad \sum _{t=t_1}^{t_n}k_t=0,\quad \sum _{t-x=t_1-x_p}^{t_n-x_1}\gamma _{t-x}=0. \end{aligned}$$
(5.2)
The main algorithm of our proposed method for the H1 model is identical to Algorithm 1 for the Renshaw-Haberman model, except that \(c_x\) is always set to 1/p. Interestingly, as explained below, further computational simplifications can be achieved when our proposed method is applied to the H1 model.
For the H1 model, we can update \(\varvec{\gamma }\) using explicit formulas, thereby eliminating the need for iterative algorithms. To explain, we first express the sub-optimization problem for updating \(\varvec{\gamma }\) (Step 4 in Algorithm 1) in the H1 model as follows:
$$\begin{aligned} \min _{\varvec{\gamma }}\sum _{x,t}\left( \underbrace{(y_{x,t}-a_x-b_xk_t)}_{\text {given}}-\frac{1}{p}\gamma _{t-x}\right) ^2. \end{aligned}$$
(5.3)
The above can be rewritten in an age-cohort dimension as
$$\begin{aligned} \min _{\varvec{\gamma }}\sum _{x,s\in \mathcal {O}}\left( z_{x,s}-\frac{1}{p}\gamma _{s}\right) ^2, \end{aligned}$$
(5.4)
where \(z_{x,s}:=y_{x,s}-a_x-b_xk_s\) denotes the residual from Step 3 in Algorithm 1, \(s:=t-x\) represents year-of-birth, and \(\mathcal {O}\) is the set of the indices for the observed values in \(\varvec{Z}_{ac}\) (the matrix of \(z_{x,s}\) in age-cohort dimension).
Noticing that (5.4) is separable, we can rewrite it as:
$$\begin{aligned} \min _{\varvec{\gamma }}\sum _s \sum _{x \in \mathcal {O}_s}\left( z_{x,s}-\frac{1}{p}\gamma _{s}\right) ^2, \end{aligned}$$
(5.5)
where \(\mathcal {O}_s\) denotes the set of the indices for the observed values in column s of \(\varvec{Z}_{ac}\). Note that this convenient separability does not hold for the general Renshaw-Haberman model with \(c_x\ne 1/p\), since each summand \((z_{x,s}-c_x\gamma _s)^2\) depends on both age x and year-of-birth s.
The separability enables us to solve the target optimization problem (5.1) by solving the following for each s:
$$\begin{aligned} \min _{\gamma _s}\sum _{x\in \mathcal {O}_s}\left( z_{x,s}-\frac{1}{p}\gamma _{s}\right) ^2. \end{aligned}$$
(5.6)
For a given s, (5.6) is a simple linear regression with no intercept and a slope of
$$\begin{aligned} \hat{\gamma }_s= \frac{p}{n_s}\sum _{x \in \mathcal {O}_s}z_{x,s}, \end{aligned}$$
(5.7)
where \(n_s:=|\mathcal {O}_s|\) is the cardinality of \(\mathcal {O}_s\). Applying (5.7) for every year-of-birth s covered by the data set yields an update of \(\varvec{\gamma }\).
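In code, the update (5.7) amounts to a column-wise average over the observed cells of \(\varvec{Z}_{ac}\); the sketch below assumes that the age-cohort matrix and its observation mask have already been constructed (as in the Sect. 4.3 sketch), and the names are illustrative.

```python
import numpy as np

def update_gamma_h1(Z_ac, observed):
    """Closed-form update (5.7) for the H1 model:
    gamma_s = (p / n_s) * sum_{x in O_s} z_{x,s}.
    Z_ac: p x (n + p - 1) age-cohort residual matrix (missing cells may be NaN);
    observed: boolean mask of the non-missing cells."""
    p = Z_ac.shape[0]
    col_sums = np.where(observed, Z_ac, 0.0).sum(axis=0)   # sum of observed residuals per cohort
    n_s = observed.sum(axis=0)                              # n_s = |O_s|
    return p * col_sums / n_s
```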

5.2 Implementing with the Hunt-Villegas method

This subsection explains how our proposed method can be utilized with the H1 model and the Hunt-Villegas method.
Recall that the Hunt-Villegas method originates from an approximate identifiability problem of the Renshaw-Haberman model and its variants. For the H1 model, [21] show that if \(k_t\) follows a perfect straight line, i.e., \(k_t = K(t-\bar{t})\), where K is a constant that is less than zero and \(\bar{t}=(t_n + t_1)/2\) represents the mid-point of the calibration window, then there exists the following invariant transformation that is equivalent to \(\{a_x,b_x,k_t,\gamma _{s}\}\):
$$\begin{aligned} \left\{ a_x+\frac{g}{p}(x-\bar{x}),\frac{K}{K-g}b_x-\frac{g}{p(K-g)},\frac{K-g}{K}k_t,\gamma _s +g(s-\bar{s}) \right\} , \end{aligned}$$
(5.8)
where \(\bar{x} = (x_p + x_1)/2\) represents the mid-point of the age range under consideration and g is a real constant. In practice, the trend in \(k_t\) is close to but not perfectly linear, so that an approximate identifiability problem exists. This approximate identifiability problem may adversely affect convergence of the estimation algorithm. [21] propose to mitigate the approximate identifiability problem by imposing the extra constraint specified in (3.6).
[21] proposed a modified Newton–Raphson method to impose (3.6) in Poisson ML estimation of model parameters. Specifically, in each iteration of the Newton–Raphson algorithm, they determine the values of K and g in the invariant transformation such that (3.6) is satisfied; then, the invariant transformation is applied to adjust the parameter estimates.
However, it turns out that the modified Newton–Raphson method is not applicable to our alternating minimization scheme. To explain, let us suppose that in one iteration we have updated the value of \(\varvec{\gamma }\) using the closed-form solution provided in (5.7). This update is guaranteed to decrease the value of the overall objective function specified in (5.1). However, if we adjust the estimates of \(\varvec{a}\), \(\varvec{b}\), \(\varvec{k}\), and \(\varvec{\gamma }\) using the approximate invariant transformation specified in (5.8) to make (3.6) hold, then the resulting estimates may lead to a higher (less optimal) value of (5.1), since the transformation is only approximately (rather than exactly) invariant. If the value of the objective function increases in some iterations, the alternating minimization algorithm may diverge.
We propose to incorporate the additional constraint specified in (3.6) by using a Lagrange multiplier in the update of \(\varvec{\gamma }\) in model H1. Incorporating (3.6), we aim to solve the following constrained optimization problem in the update of \(\varvec{\gamma }\):
$$\begin{aligned} \min _{\varvec{\gamma }}\sum _{x,s\in \mathcal {O}}\left( z_{x,s}-\frac{1}{p}\gamma _{s}\right) ^2,\quad \text {s.t. } \sum _{s= t_1-x_p}^{t_n-x_1} \gamma _s(s-\bar{s})=0. \end{aligned}$$
(5.9)
Then, the Lagrangian can be written as:
$$\begin{aligned} \mathcal {L}(\varvec{\gamma },\lambda )=\sum _{x,s\in \mathcal {O}}\left( z_{x,s}-\frac{1}{p}\gamma _{s}\right) ^2+2\lambda \sum _{s= t_1-x_p}^{t_n-x_1}\gamma _s(s-\bar{s}), \end{aligned}$$
(5.10)
where \(\lambda \) represents the Lagrange multiplier.4
Unlike the unconstrained case, in which the objective function is separable, the minimization of (5.10) is non-separable because the Lagrange multiplier \(\lambda \) applies to all years-of-birth \(s = t_1-x_p, \ldots , t_n-x_1\). To obtain the solution to (5.9), we derive the first-order partial derivatives of \(\mathcal {L}(\varvec{\gamma },\lambda )\) with respect to \(\varvec{\gamma }\) and \(\lambda \), and set them to zero:
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \gamma _s}= & -\frac{2}{p}\cdot \sum _{x \in \mathcal {O}_s}\left( z_{x,s}-\frac{1}{p}\gamma _s\right) +2\lambda (s-\bar{s})=0, \quad s= t_1-x_p, \ldots , t_n-x_1. \end{aligned}$$
(5.11)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \lambda }= & \sum _{s= t_1-x_p}^{t_n-x_1} \gamma _s(s-\bar{s})=0. \end{aligned}$$
(5.12)
From (5.11), we obtain the following expression of \(\gamma _s\) in terms of \(\lambda \):
$$\begin{aligned} \gamma _s=\frac{p}{n_s}\cdot \left[ \left( \sum _{x \in \mathcal {O}_s}z_{x,s}\right) -p\lambda (s-\bar{s}) \right] , \end{aligned}$$
(5.13)
for \(s = t_1-x_p, \ldots , t_n-x_1\). Plugging (5.13) into (5.12), we get the optimal solution for \(\lambda \):
$$\begin{aligned} \hat{\lambda }=\frac{1}{p}\cdot \left[ \sum _{s= t_1-x_p}^{t_n-x_1} \frac{(s-\bar{s})^2}{n_s} \right] ^{-1} \cdot \sum _{s= t_1-x_p}^{t_n-x_1} \left[ \frac{s-\bar{s}}{n_s}\cdot \sum _{x\in \mathcal {O}_s}z_{x,s} \right] . \end{aligned}$$
(5.14)
Plugging (5.14) back into (5.13) gives the solution to \(\gamma _s\) for \(s = t_1-x_p, \ldots , t_n-x_1\).
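Putting (5.13) and (5.14) together, the constrained update can be computed in a few lines; as before, the argument names are assumptions made for this sketch rather than the authors' implementation.

```python
import numpy as np

def update_gamma_h1_hv(Z_ac, observed, s_vals):
    """Constrained gamma update for the H1 model under the Hunt-Villegas
    constraint (3.6), following (5.13)-(5.14). s_vals lists the years-of-birth
    t_1 - x_p, ..., t_n - x_1 in the column order of Z_ac."""
    p = Z_ac.shape[0]
    s_dev = np.asarray(s_vals, dtype=float)
    s_dev = s_dev - s_dev.mean()                              # s - s_bar
    col_sums = np.where(observed, Z_ac, 0.0).sum(axis=0)      # sum_{x in O_s} z_{x,s}
    n_s = observed.sum(axis=0)
    lam = np.sum(s_dev / n_s * col_sums) / (p * np.sum(s_dev ** 2 / n_s))   # (5.14)
    return (p / n_s) * (col_sums - p * lam * s_dev)                          # (5.13)
```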

6 Numerical illustrations

In this section, we present various experiments to illustrate our proposed least squares method for estimating the Renshaw-Haberman model. The data used are obtained from [19]. They cover a calibration window of 1950–2019 and an age range of 60–89. All of the experiments are performed using a desktop with an Intel Core i9-10900 CPU at 2.80 GHz, 16 GB of RAM, and Windows 11 Education (64-bit).
All estimation methods under consideration involve an iterative procedure. While it is usual to base the convergence criterion of an iterative procedure on the absolute change in the objective function in each iteration, we consider the relative change instead, because the objective functions of Poisson ML estimation and the proposed least squares estimation have rather different magnitudes. Basing the convergence criterion on relative changes allows us to compare the two streams of estimation methods more fairly.
We use \(\delta \) to represent the tolerance level used in the main estimation algorithms. The choice of \(\delta \) is admittedly subjective. The StMoMo package, by default, uses a tolerance level of \(10^{-4}\) and the absolute change in the log-likelihood function as the convergence criterion when fitting the Renshaw-Haberman model. Considering the size of the datasets we are using, the values of the maximized log-likelihood functions (when models are fitted using Poisson MLE) have a magnitude of \(10^4\). Since we are basing our convergence criterion on relative changes, the baseline value of \(\delta \) is set to \(10^{-8}\) to match the standard used in the StMoMo package.
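For clarity, the stopping rule referred to throughout is of the following form (a minimal sketch).

```python
def has_converged(obj_old, obj_new, delta=1e-8):
    """Relative-change criterion applied to either the log-likelihood or the
    L2 objective: stop once the change is small relative to the previous value."""
    return abs(obj_new - obj_old) <= delta * abs(obj_old)
```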

6.1 Comparing least squares with Poisson MLE

We first compare the following three methods for fitting the Renshaw-Haberman model:
  • RH-MLE: The Renshaw-Haberman model estimated with Poisson MLE;
  • RH-MLE-HV: The Renshaw-Haberman model estimated with Poisson MLE and the Hunt-Villegas method;
  • RH-LS: The Renshaw-Haberman model estimated with our proposed least squares method.
The baseline results are obtained using the data from the male populations of England and Wales (E&W) and the US. These data sets are considered in prominent works on stochastic mortality modelling [e.g., 7, 8].
The results are summarized in Table 1, from which we observe that RH-LS consumes significantly less computation time compared to RH-MLE. Reductions in computational time are over 90% in general. Figure 1 shows that for a given data set, the parameter estimates from RH-LS and RH-MLE are highly similar. It is not surprising that the parameter estimates from the two estimation methods are not identical, because they are based on different objective functions. As expected, RH-LS (which minimizes the \(L^2\) error) yields a lower (less preferred) log-likelihood but a smaller \(L^2\) error compared to RH-MLE.
We also observe from Table 1 that RH-MLE-HV takes shorter computation times relative to RH-MLE, and is comparable to RH-LS in terms of computational efficiency. However, it is important to note that RH-MLE-HV results in less desirable \(L^2\) errors and log-likelihoods compared to both RH-MLE and RH-LS, because RH-MLE-HV entails an additional constraint which makes the model more restrictive. In other words, RH-MLE-HV improves computational efficiency at the expense of goodness-of-fit.
Table 1
\(L^2\) errors, log-likelihood values, and computation times for RH-MLE, RH-LS and RH-MLE-HV, based on the E&W male and US male datasets

Data | Metric | RH-MLE | RH-LS | RH-MLE-HV
E&W male | \(L^2\) error | 0.578 | \(\varvec{0.565}\) | 0.581
E&W male | Log-likelihood | \(\varvec{-12843}\) | \(-12890\) | \(-12853\)
E&W male | Computing time (seconds) | 336.68 | 38.72 | \(\varvec{20.39}\)
US male | \(L^2\) error | 0.472 | \(\varvec{0.465}\) | 0.473
US male | Log-likelihood | \(\varvec{-17736}\) | \(-17828\) | \(-17744\)
US male | Computing time (seconds) | 228.72 | \(\varvec{10.44}\) | 25.95
Table 2
Computation times (seconds) for RH-MLE, RH-LS and RH-MLE-HV, based on all of the ten datasets under consideration

Data | RH-MLE | RH-LS | RH-MLE-HV
E&W male | 336.68 | 38.72 | \(\varvec{20.39}\)
E&W female | 132.21 | 36.28 | \(\varvec{10.26}\)
US male | 228.72 | \(\varvec{10.44}\) | 25.95
US female | 30.23 | \(\varvec{26.18}\) | 42.41
Australia male | 53.10 | \(\varvec{5.76}\) | 18.28
Australia female | 59.12 | \(\varvec{10.81}\) | 12.76
Canada male | 118.51 | \(\varvec{16.17}\) | 29.98
Canada female | 59.02 | \(\varvec{11.71}\) | 43.31
The Netherlands male | 101.22 | \(\varvec{13.58}\) | 15.34
The Netherlands female | 39.25 | 24.52 | \(\varvec{5.62}\)
Table 3
Computation times (seconds) for RH-MLE, RH-LS and RH-MLE-HV, based on all of the ten datasets, with an age range of 0–89

Data | RH-MLE | RH-LS | RH-MLE-HV
E&W male | 891.71 | \(\varvec{22.23}\) | 204.25
E&W female | 262.38 | \(\varvec{31.31}\) | 166.38
US male | 3210.81 | 210.22 | \(\varvec{32.03}\)
US female | 1320.94 | 280.12 | \(\varvec{199.01}\)
Australia male | 2015.08 | 212.40 | \(\varvec{32.53}\)
Australia female | 715.67 | 194.17 | \(\varvec{12.22}\)
Canada male | 1188.53 | \(\varvec{73.21}\) | 132.10
Canada female | 128.22 | \(\varvec{22.89}\) | 68.99
The Netherlands male | 601.22 | 116.12 | \(\varvec{49.32}\)
The Netherlands female | 900.29 | \(\varvec{98.23}\) | 148.42
To demonstrate the consistency in computation time reduction, we compare RH-LS with RH-MLE using eight alternative data sets: E&W female, US female, Australia male and female, Canada male and female, and the Netherlands male and female. Reported in Table 2, the results show that RH-LS takes a significantly shorter computation time compared to RH-MLE for all of the ten datasets under consideration. While it is more computationally efficient, our proposed method preserves goodness-of-fit in the sense that it results in smaller \(L^2\) errors and similar log-likelihood values compared to the Poisson ML approach.
As a robustness check, we repeat the numerical experiment using a wider age range of 0–89. The resulting computation times for all of the ten datasets under consideration are presented in Table 3. As expected, both estimation methods take longer to converge due to the increased parameter space introduced by the expanded age range. For RH-MLE, the increase is significantly higher, leading to computation times ranging from approximately ten minutes to one hour per estimation. This level of fitting speed renders tasks that require repeated model re-fitting, such as uncertainty estimation via bootstrapping, practically infeasible. In contrast, for RH-LS, the computation times needed for the extended age range remain modest.

6.2 Sharpness of objective functions

One may wonder why the proposed least squares method is more computationally efficient than the Poisson maximum likelihood approach, while producing a comparable goodness-of-fit. In this sub-section, we attempt to account for the superiority of our proposed approach by considering the sharpness of the objective functions used in each of the candidate estimation methods.
In a study of maximum likelihood estimation of various stochastic mortality models, [8] mentioned that “the likelihood function will be close to flat in certain dimensions.” As a result of such flatness, over the iterative estimation process, parameter estimates tend to stray around the region of the parameter space over which the resulting log-likelihood values are similar, thereby resulting in a slow convergence. It follows that a faster convergence can be achieved if the objective function is sharper, in the sense that the change in its value in each iteration of the estimation algorithm tends to be larger.
Should there be flatness in certain dimensions of the objective function, parameter estimates tend to be sensitive to the tolerance level \(\delta \) used in the iterative estimation process. To compare the sharpness of the objective functions for RH-MLE and RH-LS, we estimate the Renshaw-Haberman model to the US male dataset with Poisson MLE and with the proposed least squares method, for different tolerance levels: \(10^{-6}\), \(10^{-7}\) and \(10^{-8}\).
Figure 3 reveals that parameter estimates obtained from Poisson MLE are quite sensitive to the tolerance level. The reduction in tolerance level does not materially improve the log-likelihood value, but comes with a substantially longer computation time as reported in Table 4. In contrast, Fig. 4 shows that the parameter estimates obtained from the proposed least squares method are more robust with respect to the tolerance level. Additionally, compared to Poisson MLE, the increase in computational time as the tolerance level reduces is moderate, as also shown in Table 4. These outcomes suggest that the objective function for the proposed method is sharper, offering a reason as to why the proposed method is more computationally efficient.
Table 4
\(L^2\) errors, log-likelihoods, and computing times for RH-MLE and RH-LS when three different tolerance levels are used, US male

Metric | Tolerance level | RH-MLE | RH-LS
\(L^2\) error | \(10^{-6}\) | 0.4734 | 0.4661
\(L^2\) error | \(10^{-7}\) | 0.4727 | 0.4654
\(L^2\) error | \(10^{-8}\) | 0.4720 | 0.4652
Log-likelihood | \(10^{-6}\) | \(-17749\) | \(-17828\)
Log-likelihood | \(10^{-7}\) | \(-17745\) | \(-17827\)
Log-likelihood | \(10^{-8}\) | \(-17737\) | \(-17827\)
Time (seconds) | \(10^{-6}\) | 12.28 | 3.96
Time (seconds) | \(10^{-7}\) | 20.67 | 4.92
Time (seconds) | \(10^{-8}\) | 283.87 | 11.28

6.3 Implementing with the H1 model

In this sub-section, we implement our proposed least squares estimation method with the H1 model and/or the Hunt-Villegas method to further boost estimation efficiency. The following four settings are considered:
  • H1-MLE: The H1 model estimated with Poisson MLE;
  • H1-MLE-HV: The H1 model estimated with Poisson MLE plus the Hunt-Villegas method;
  • H1-LS: The H1 model estimated with the proposed least squares method;
  • H1-LS-HV: The H1 model estimated with the proposed least squares method plus the Hunt-Villegas method.
Table 5 presents the results for the four settings above, derived from the E&W male and US male datasets. Comparing the results for H1-MLE (Table 5) and RH-MLE (Table 1), we notice that using the H1 model (a reduced version of the original Renshaw-Haberman model) helps reduce computation time. Nevertheless, compared to the full Renshaw-Haberman model, the H1 model yields lower log-likelihood values and higher \(L^2\) errors, suggesting that it produces a reduced goodness-of-fit. This outcome is expected, as the H1 model is a restricted version of the Renshaw-Haberman model with p fewer parameters.
On the other hand, from Table 5 we observe that the computation times for H1-LS are significantly less than those for H1-MLE, suggesting that the proposed least squares estimation method also offers an improvement in estimation efficiency when a restricted version of the Renshaw-Haberman model is considered. Finally, from Table 5 we notice that H1-LS-HV requires the least computation time among all settings under consideration. For the US male dataset, H1-LS-HV takes just slightly over 2 seconds, which is less than 1% of the computation time required when we estimate the original Renshaw-Haberman model with Poisson MLE.
Table 5
\(L^2\) errors, log-likelihoods, and computation times for H1-MLE, H1-LS, H1-MLE-HV, and H1-LS-HV, E&W male and US male datasets

Data | Metric | H1-MLE | H1-LS | H1-MLE-HV | H1-LS-HV
E&W male | \(L^2\) error | 0.682 | \(\varvec{0.663}\) | 0.721 | 0.697
E&W male | Log-likelihood | \(\varvec{-13149}\) | \(-13208\) | \(-13247\) | \(-13321\)
E&W male | Computation time (seconds) | 62.25 | 9.79 | 19.05 | \(\varvec{3.16}\)
US male | \(L^2\) error | 0.557 | \(\varvec{0.545}\) | 0.557 | 0.545
US male | Log-likelihood | \(\varvec{-18692}\) | \(-18817\) | \(-18699\) | \(-18816\)
US male | Computation time (seconds) | 138.88 | 2.56 | 18.30 | \(\varvec{2.12}\)
For a more comprehensive analysis, we study the four settings with the eight alternative datasets considered in Sect. 6.1. Tabulated in Table 6, the results indicate the superiority of H1-LS over H1-MLE in terms of computational efficiency for all of the eight datasets under consideration. We also observe from Table 6 that for certain datasets, such as US female, Canada female, and the Netherlands female, fitting the H1 model is very time consuming (even though the H1 model is a restricted version of the original Renshaw-Haberman model), suggesting convergence issues that are possibly caused by the approximate identification problem discussed in Sect. 3.3.2. In these cases, using the Hunt-Villegas method could significantly reduce the computation time, and switching from Poisson MLE to the proposed least squares approach could lower the computation time even more.
Table 6
Computation times (seconds) for H1-MLE, H1-LS, H1-MLE-HV, and H1-LS-HV, based on all of the ten datasets under consideration

Data | H1-MLE | H1-LS | H1-MLE-HV | H1-LS-HV
E&W male | 62.25 | 9.79 | 19.05 | \(\varvec{3.16}\)
E&W female | 58.23 | 1.12 | 7.58 | \(\varvec{0.97}\)
US male | 138.88 | 2.56 | 18.30 | \(\varvec{2.12}\)
US female | 702.34 | 89.93 | 3.58 | \(\varvec{1.01}\)
Australia male | 18.23 | 7.24 | 12.09 | \(\varvec{2.55}\)
Australia female | 9.12 | 5.58 | 6.61 | \(\varvec{1.68}\)
Canada male | 18.51 | \(\varvec{2.69}\) | 15.23 | 3.42
Canada female | 162.26 | 78.62 | 7.28 | \(\varvec{2.33}\)
The Netherlands male | 21.22 | 2.76 | 9.38 | \(\varvec{1.59}\)
The Netherlands female | 425.69 | 85.51 | 2.85 | \(\varvec{0.36}\)

6.4 Quantifying parameter uncertainty

One important application of stochastic mortality models is quantifying the uncertainty involved in mortality projections. A part of such uncertainty is parameter risk, which arises because parameters used in projecting future mortality are only estimates rather than known constants.
In this subsection, we demonstrate the advantage of our proposed method in the context of parameter uncertainty quantification. To this end, we consider a residual bootstrapping method, originally proposed by [23] and discussed in the review paper of [24]. The residual bootstrapping method is implemented using the following algorithm (a code sketch follows the list):
1. Estimate the Renshaw-Haberman model to the original dataset, and calculate the residuals of fit as \( y_{x,t}-\hat{y}_{x,t}\) for all x in the age range and t in the calibration window.
2. Sample the residuals calculated in Step 1 with replacement, and generate a pseudo dataset by adding the sampled residuals to the fitted log mortality rates \(\hat{y}_{x,t}\) over the entire age range and calibration window.
3. Fit the model to the pseudo dataset produced in the immediately preceding step. A collection of parameter estimates is then obtained.
4. Repeat Steps 2 and 3 M times, where M is a large integer, say 1000. This step yields, for each parameter in the Renshaw-Haberman model, an empirical distribution of parameter estimates. From the empirical distribution, measures of parameter uncertainty such as standard errors can be calculated.
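A minimal sketch of this bootstrap is given below; fit_model stands for any estimation routine (for example, an implementation of RH-LS such as the sketch in Sect. 4.2) that returns the fitted log rates and a parameter vector, and its signature is an assumption of ours rather than part of the paper.

```python
import numpy as np

def residual_bootstrap_se(Y, fit_model, M=1000, seed=1):
    """Residual bootstrap of Sect. 6.4: resample residuals with replacement,
    rebuild pseudo datasets, re-estimate, and report standard errors.
    Y: p x n matrix of observed log central death rates.
    fit_model: callable returning (fitted_log_rates, parameter_vector)."""
    rng = np.random.default_rng(seed)
    fitted, params = fit_model(Y)                      # Step 1: fit to the original data
    residuals = (Y - fitted).ravel()
    draws = []
    for _ in range(M):                                 # Steps 2-4: M pseudo datasets
        resampled = rng.choice(residuals, size=Y.shape, replace=True)
        _, params_b = fit_model(fitted + resampled)    # re-estimate on the pseudo data
        draws.append(params_b)
    return np.asarray(draws).std(axis=0)               # bootstrap standard errors
```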
We implement the algorithm with RH-MLE, RH-LS and RH-MLE-HV, using the E&W male and US male datasets (with an age range of 60–89). The resulting standard errors for a subset of parameters are presented in Tables 7 and 8, respectively. It is observed that the three estimation methods produce standard errors of similar orders of magnitude.
Although the three estimation methods produce similar standard errors, they demand significantly different amounts of time to produce such standard errors. Since the residual bootstrapping algorithm entails M re-estimations, the runtime of the algorithm is directly proportional to the amount of time needed for each estimation. When the US male dataset is considered and M is set to 1000, the amounts of time required by RH-MLE and RH-LS are 3812 min (2.65 days) and 174 min, respectively. This result underscores the advantage of our proposed estimation method in practical applications that involve repeated model estimation.
Table 7
Standard errors of selected parameter estimates for RH-MLE, RH-LS and RH-MLE-HV, based on the E&W male dataset

Parameter | RH-MLE | RH-LS | RH-MLE-HV
\(a_{60}\) | 0.0114 | 0.0048 | 0.0039
\(a_{75}\) | 0.0072 | 0.0063 | 0.0065
\(a_{89}\) | 0.0034 | 0.0049 | 0.0054
\(b_{60}\) | 0.0006 | 0.0007 | 0.0008
\(b_{75}\) | 0.0003 | 0.0003 | 0.0002
\(b_{89}\) | 0.0011 | 0.0010 | 0.0011
\(k_{1950}\) | 0.3835 | 0.3710 | 0.3803
\(k_{1985}\) | 0.1954 | 0.1834 | 0.1805
\(k_{2019}\) | 0.3585 | 0.3622 | 0.3315
\(c_{60}\) | 0.0013 | 0.0012 | 0.0015
\(c_{75}\) | 0.0004 | 0.0004 | 0.0004
\(c_{89}\) | 0.0012 | 0.0018 | 0.0013
\(\gamma _{1861}\) | 1.9783 | 1.3747 | 1.9718
\(\gamma _{1910}\) | 0.3678 | 0.3166 | 0.3650
\(\gamma _{1959}\) | 0.7992 | 0.8069 | 0.8508
Table 8
Standard errors of selected parameter estimates for RH-MLE, RH-LS and RH-MLE-HV, based on the US male dataset

Parameter | RH-MLE | RH-LS | RH-MLE-HV
\(a_{60}\) | 0.0096 | 0.0066 | 0.0038
\(a_{75}\) | 0.0058 | 0.0051 | 0.0045
\(a_{89}\) | 0.0201 | 0.0096 | 0.0053
\(b_{60}\) | 0.0010 | 0.0007 | 0.0009
\(b_{75}\) | 0.0004 | 0.0003 | 0.0003
\(b_{89}\) | 0.0015 | 0.0010 | 0.0012
\(k_{1950}\) | 0.9492 | 0.3231 | 0.2767
\(k_{1985}\) | 0.1819 | 0.1875 | 0.1743
\(k_{2019}\) | 0.9811 | 0.4943 | 0.5510
\(c_{60}\) | 0.0012 | 0.0011 | 0.0015
\(c_{75}\) | 0.0005 | 0.0007 | 0.0006
\(c_{89}\) | 0.0024 | 0.0027 | 0.0027
\(\gamma _{1861}\) | 1.5197 | 0.7801 | 0.8251
\(\gamma _{1910}\) | 0.3313 | 0.2789 | 0.2901
\(\gamma _{1959}\) | 1.8623 | 0.8211 | 0.6808

7 Conclusion, discussion, and limitations

In this paper, we introduce a least squares method for estimating the Renshaw-Haberman model. Our proposed approach obtains parameter estimates by minimizing the total \(L^2\) error, which measures the sum of squared errors between the observed and fitted log central mortality rates. To overcome the optimization challenge, we develop an alternating minimization scheme which sequentially updates one group of parameters at a time. We also formulate the update of the age-cohort component as a PCA problem with missing values, so that it can be accomplished effectively using an iterative SVD algorithm.
Through a number of numerical experiments, we demonstrate that our proposed method significantly outperforms the traditional Poisson MLE in terms of computation time, while producing a better goodness-of-fit in terms of \(L^2\) error and a similar goodness-of-fit in terms of log-likelihood.
Our proposed method can be applied to the H1 model, a reduced version of the Renshaw-Haberman model that is designed to improve estimation efficiency. It can also be implemented in tandem with the Hunt-Villegas method, which reduces computation time through an extra parameter constraint. Our numerical experiments indicate that computation time can be reduced further if our proposed method is used with the H1 model and/or the Hunt-Villegas method.
We have argued that the least squares method is less dependent on distributional assumptions, as the objective function that yields the parameter estimates is not formulated on the basis of any explicit distributional assumption. Nevertheless, it should be noted that even though no explicit distributional assumption is made, the end result produced by the least squares approach is equivalent to that obtained from an MLE under the assumption that log central death rates are normally distributed with the same variance across the entire age range and calibration window. From this angle, MLE may be regarded as more flexible than the least squares approach, as it permits the user to specify different distributions for log death rates (or death counts) and to incorporate a variance structure that depends on age and/or time. For instance, in Poisson-MLE, the variance of the death count in an age-time cell equals the expected number of deaths in that cell implied by the model in question, thereby incorporating heteroskedasticity.
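To make this equivalence explicit (a standard derivation, stated here with \(\hat{y}_{x,t}(\varvec{\theta })\) denoting the fitted log rate as a function of the parameter vector \(\varvec{\theta }\)), assume that the errors are independent Gaussian with common variance \(\sigma ^2\); the log-likelihood of the observed log central death rates is then
$$\begin{aligned} \ell (\varvec{\theta },\sigma ^2)=\sum _{x,t}\left[ -\frac{1}{2}\log (2\pi \sigma ^2)-\frac{\left( y_{x,t}-\hat{y}_{x,t}(\varvec{\theta })\right) ^2}{2\sigma ^2}\right] , \end{aligned}$$
so that, for any fixed \(\sigma ^2\), maximizing \(\ell \) over \(\varvec{\theta }\) is equivalent to minimizing the total \(L^2\) error \(\sum _{x,t}(y_{x,t}-\hat{y}_{x,t}(\varvec{\theta }))^2\).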
The (implicit) assumption of uniform variance across age and time is admittedly significant, particularly when the data set includes older ages for which variances (of empirical log central death rates) are higher due to reduced exposure counts [4]. One way to mitigate this limitation is to weight the squared errors with their corresponding exposure counts, an approach that was considered by [35] for estimating the original Lee-Carter model. Unfortunately, it is not straightforward to extend the least squares approach proposed in this paper to incorporate weights. This is because unlike an unweighted PCA with missing values, which can be efficiently solved using an iterative SVD algorithm (as discussed in Section 4.3), a weighted PCA with missing values is significantly more complex and is proven to be NP-hard [13]. Future research is warranted to develop an appropriate algorithm for solving the weighted PCA with missing values problem entailed in estimating \(c_x\) and \(\gamma _{t-x}\) in the Renshaw-Haberman model.
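To make the weighting idea concrete (our notation, introduced for illustration only: \(E_{x,t}\) denotes the exposure count of age-time cell (x, t)), the exposure-weighted criterion alluded to above would take the form
$$\begin{aligned} L_w(\varvec{\theta })=\sum _{x,t}E_{x,t}\left( y_{x,t}-\hat{y}_{x,t}(\varvec{\theta })\right) ^2, \end{aligned}$$
whose age-cohort update corresponds to a weighted, rather than unweighted, PCA with missing values; this is precisely the step whose complexity is discussed above.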
In Section 6.4, we demonstrate that our proposed estimation method can be implemented with a residual bootstrap to generate measures of parameter uncertainty in mortality forecasting. Without the improvement in efficiency brought by our proposed estimation method, the residual bootstrap would have taken a few days to complete. Similar benefits are also seen in the application to solvency capital requirement calculations under Solvency II. When re-calibration risk is taken into consideration, the solvency capital requirement of a liability can be obtained with the following algorithm:
1.
simulate \(M_1\) realizations of mortality in the following year;
 
2.
for each of the \(M_1\) realizations obtained from the previous step,
(a)
expand the original data set to include the realization of mortality in the following year;
 
(b)
re-estimate the mortality model with the updated dataset;
 
(c)
using the re-estimated model, simulate \(M_2\) sample paths of mortality (for year 2 and beyond);
 
(d)
calculate the expected value of the liability at the end of year 1 using the \(M_2\) sample paths;
 
 
3.
obtain an empirical distribution of liabilities at the end of year 1;
 
4.
calculate the solvency risk capital (Value-at-Risk at 99.5% confidence level) from the empirical distribution.
 
The outer loop of the algorithm above entails \(M_1\) model re-estimations. Typically, \(M_1\) needs to be large enough (say 5000) so that the tail risk measure can be prudently estimated. Assuming \(M_1 = 5000\) and the US male dataset, the re-estimations alone would take 13.2 days when RH-MLE is used, but only 14.5 h when our proposed RH-LS is used.
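The nested structure of the algorithm can be sketched as follows. The routines simulate_one_year, refit_model, simulate_paths and liability_value are placeholders for the user's mortality simulator, estimation routine (e.g., RH-LS) and liability model; they, and the matrix orientation (ages in rows, calendar years in columns), are our assumptions rather than specifications from the paper.

```python
import numpy as np

def solvency_capital(data, simulate_one_year, refit_model,
                     simulate_paths, liability_value, M1=5000, M2=1000):
    """Nested simulation of the 99.5% VaR of the year-1 liability,
    allowing for re-calibration risk (Steps 1-4 above).

    data : matrix of observed log central death rates
           (ages in rows, calendar years in columns).
    """
    liabilities = []
    for _ in range(M1):                               # Step 1: outer scenarios
        new_year = simulate_one_year(data)            # next-year mortality realization
        expanded = np.column_stack([data, new_year])  # Step 2(a): append the new year
        model = refit_model(expanded)                 # Step 2(b): re-estimate the model
        paths = simulate_paths(model, M2)             # Step 2(c): inner sample paths
        liabilities.append(                           # Step 2(d): expected liability
            np.mean([liability_value(path) for path in paths]))
    # Steps 3-4: empirical distribution and its 99.5% quantile
    return np.quantile(liabilities, 0.995)
```

Replacing refit_model with a faster estimator is what drives the runtime reduction quoted above.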
Further, the computational efficiency provided by our estimation method enables researchers to consider extensions of the Lee-Carter model with multiple period and cohort effects, such as the following generalization of the Renshaw-Haberman model:
$$\begin{aligned} \log (m_{x,t})=a_x+\sum _{i=1}^Pb^{(i)}_xk^{(i)}_t+\sum _{j=1}^Qc^{(j)}_x\gamma ^{(j)}_{t-x}+\varepsilon _{x,t}, \end{aligned}$$
(7.1)
so that P period effects and Q cohort effects are captured. When estimated with the traditional ML method, this generalization would be subject to even more severe convergence problems than the Renshaw-Haberman model, as it comes with a larger parameter space. However, the computational efficiency of our proposed method is unaffected by the generalized model structure, as estimates of the additional model parameters can be obtained easily by replacing the first-order SVD in Steps 3 and 4 of Algorithm 1 with a P-order SVD and a Q-order SVD, respectively.
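To illustrate the higher-order update, the sketch below extracts Q age-cohort factors from a completed age-cohort matrix via a truncated SVD; the function name, matrix orientation and variable names are ours, and setting Q = 1 recovers the single cohort term of the Renshaw-Haberman model.

```python
import numpy as np

def rank_q_factors(Z_ac: np.ndarray, Q: int):
    """Extract Q age-cohort factors from a completed age-cohort matrix.

    Z_ac : matrix with rows indexed by age x and columns by year of
           birth s = t - x (missing cells already imputed).
    Q    : number of cohort effects to extract.
    Returns (C, Gamma) such that Z_ac is approximated by C @ Gamma,
    i.e. column j of C holds the age sensitivities c_x^{(j)} and row j
    of Gamma holds the cohort indices gamma_s^{(j)}.
    """
    U, S, Vt = np.linalg.svd(Z_ac, full_matrices=False)
    C = U[:, :Q] * S[:Q]     # age sensitivities scaled by singular values
    Gamma = Vt[:Q, :]        # cohort indices
    return C, Gamma
```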
The model specified in equation (7.1) may be used to identify long-term (ultimate) scale factors in two-dimensional mortality improvement scales, a mortality projection method that has been promulgated by major actuarial professional organizations in recent years. Defined as mortality improvement rates that are not subject to any transient period and cohort effects, such scale factors may be derived from the model specified in equation (7.1), with P and Q that are chosen in such a way that transient period and cohort effects are fully filtered by the model structure. The identification of P and Q in equation (7.1) is in principle similar to that for a GARCH(P, Q) process in the context of time-series analysis. We leave the implementation of this generalized model to future research.

Declarations

Conflict of interest

The authors have declared no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: theoretical properties of the iterative SVD algorithm

In this appendix, we provide further technical details about the iterative SVD algorithm in Algorithm 2.
Recall that the objective of the optimization problem is to update the age-cohort parameters \((\varvec{c},\varvec{\gamma })\) by minimizing the target loss function:
$$\begin{aligned} L(\varvec{c},\varvec{\gamma })=\sum _{x,s\in \mathcal {O}}(z_{x,s}-c_x\gamma _{s})^2, \end{aligned}$$
(A.1)
where \(s=t-x\) represents year-of-birth and \(\mathcal {O}\) is the set of the indexes of the observed values. For notational convenience, we use \(\hat{z}_{x,s}(\varvec{\theta }):=c_x\gamma _{s}\) to denote the estimator of observation \(z_{x,s}\) as a function of the model parameters \(\varvec{\theta }=(\varvec{c},\varvec{\gamma })\). We can then regard the optimization as the problem of finding \(\varvec{\theta }\) that minimizes the \(L^2\) loss function of the observed data:
$$\begin{aligned} L_{obs}(\varvec{\theta })=\sum _{x,s\in \mathcal {O}}(z_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2. \end{aligned}$$
(A.2)
The iterative SVD algorithm alternates between imputing the missing values in the data matrix \(\varvec{Z}_{ac}\) and performing an SVD on the resulting complete matrix. Let us write the \(L^2\) loss functions of the complete data and of the missing data as
$$\begin{aligned} L_{tot}(\varvec{\theta },\tilde{\varvec{z}})=L_{obs}(\varvec{\theta })+L_{mis}(\varvec{\theta },\tilde{\varvec{z}}) \end{aligned}$$
(A.3)
and
$$\begin{aligned} L_{mis}(\varvec{\theta },\tilde{\varvec{z}})=\sum _{x,s\notin \mathcal {O}}(\tilde{z}_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2, \end{aligned}$$
(A.4)
respectively, where \(\tilde{z}_{x,s}\) is the imputed value of the missing observation \(z_{x,s}\). The iterative SVD algorithm minimizes the total \(L^2\) error (A.3), viewed as a function \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\), with respect to both the model parameters \(\varvec{\theta }\) and the set of imputed values \(\tilde{\varvec{z}}=\{\tilde{z}_{x,s}\,|\,x,s\notin \mathcal {O}\}\).
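As an illustration only (Algorithm 2 in the main text remains the authoritative statement; the variable names and stopping rule below are ours), the alternating scheme can be sketched as follows: impute the missing cells, take a rank-1 SVD of the completed matrix, re-impute the missing cells with the rank-1 reconstruction, and repeat until the observed-data \(L^2\) error stabilizes.

```python
import numpy as np

def iterative_svd_rank1(Z, observed_mask, tol=1e-8, max_iter=500):
    """Rank-1 PCA with missing values via alternating imputation and SVD.

    Z             : age-cohort matrix; entries outside the observed set O
                    may hold any placeholder value (e.g. NaN)
    observed_mask : boolean matrix, True where z_{x,s} is observed
    Returns (c, gamma) with Z approximated by outer(c, gamma) on O.
    """
    Z_filled = np.where(observed_mask, Z, 0.0)   # initial imputation (zeros)
    prev_loss = np.inf
    for _ in range(max_iter):
        # Rank-1 SVD of the completed matrix (cf. Step 2 of Algorithm 2)
        U, S, Vt = np.linalg.svd(Z_filled, full_matrices=False)
        c, gamma = U[:, 0] * S[0], Vt[0, :]
        Z_hat = np.outer(c, gamma)
        # Re-impute the missing cells with the reconstruction (cf. Step 3)
        Z_filled = np.where(observed_mask, Z, Z_hat)
        # Observed-data L^2 loss, i.e. L_obs in (A.2)
        loss = np.sum((Z[observed_mask] - Z_hat[observed_mask]) ** 2)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return c, gamma
```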

Convergence of the iterative SVD algorithm

We first show that the iterative SVD algorithm always converges. To this end, observe that the algorithm can be represented as the following alternating minimization procedure, in which neither step can increase the non-negative loss \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\); the sequence of loss values is therefore non-increasing and bounded below by zero, and hence converges:
1.
For a fixed \(\tilde{\varvec{z}}\), let \(\varvec{\theta }^{*}\) be the minimizer of \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\) with respect to \(\varvec{\theta }\):
$$\begin{aligned} \varvec{\theta }^{*}&=\arg \min _{\varvec{\theta }}L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\nonumber \\&=\arg \min _{\varvec{\theta }}\left[ \sum _{x,s\in \mathcal {O}}(z_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2+\sum _{x,s\notin \mathcal {O}}(\tilde{z}_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2\right] \nonumber \\&= \arg \min _{\varvec{\theta }}\left[ \sum _{x,s}(z^{\prime }_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2\right] , \end{aligned}$$
(A.5)
where \(z^{\prime }_{x,s}\) equals \(z_{x,s}\) for \(x,s\in \mathcal {O}\) and \(\tilde{z}_{x,s}\) for \(x,s\notin \mathcal {O}\). Since \(\hat{z}_{x,s}(\varvec{\theta }):=c_x\gamma _{s}\), the minimizer \(\varvec{\theta }^{*}\) can be found by performing a PCA on the approximated complete matrix \(\varvec{Z}_{ac}\), as described in Step 2 of Algorithm 2.
 
2.
For a fixed \(\varvec{\theta }\), let \(\tilde{\varvec{z}}^{*}\) be the minimizer of \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\) with respect to \(\tilde{\varvec{z}}\):
$$\begin{aligned} \tilde{\varvec{z}}^{*}&=\arg \min _{\tilde{\varvec{z}}}L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\nonumber \\&=\arg \min _{\tilde{\varvec{z}}}\left[ L_{mis}(\varvec{\theta },\tilde{\varvec{z}})+\text {constant}\right] \nonumber \\&= \arg \min _{\tilde{\varvec{z}}}\left[ \sum _{x,s\notin \mathcal {O}}(\tilde{z}_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2+\text {constant}\right] . \end{aligned}$$
(A.6)
It is easy to see that the minimum is achieved when \(\tilde{z}_{x,s}=\hat{z}_{x,s}(\varvec{\theta })\), so that \(\tilde{\varvec{z}}^{*}=\{\hat{z}_{x,s}(\varvec{\theta })\,|\,x,s\notin \mathcal {O}\}\). This solution is exactly the same as imputing the missing values by the PCA reconstruction \(\hat{z}_{x,s}(\varvec{\theta })\) with parameters \(\varvec{\theta }\), as described in Step 3 of Algorithm 2.
 

The iterative SVD algorithm minimizes the target loss function

We next show that the iterative SVD algorithm minimizes the target loss function (A.2). More precisely, the minimizer \(\varvec{\theta }^{*}\) obtained by iteratively minimizing the total \(L^2\) error \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\) is equivalent to the one obtained by directly minimizing the \(L^2\) error \(L_{obs}(\varvec{\theta })\) of the observed data.
Following (A.6), we have
$$\begin{aligned} \tilde{\varvec{z}}^{*}=\arg \min _{\tilde{\varvec{z}}}L_{tot}(\varvec{\theta },\tilde{\varvec{z}})=\arg \min _{\tilde{\varvec{z}}}\left[ \sum _{x,s\notin \mathcal {O}}(\tilde{z}_{x,s}-\hat{z}_{x,s}(\varvec{\theta }))^2\right] , \end{aligned}$$
(A.7)
which immediately implies that
$$\begin{aligned} \frac{\partial L_{tot}}{\partial \tilde{\varvec{z}}}\Bigg |_{\tilde{\varvec{z}}=\tilde{\varvec{z}}^{*}}=0 \end{aligned}$$
(A.8)
and
$$\begin{aligned} L_{mis}(\varvec{\theta },\tilde{\varvec{z}}^{*})=\sum _{x,s\notin \mathcal {O}}(\hat{z}_{x,s}(\varvec{\theta })-\hat{z}_{x,s}(\varvec{\theta }))^2=0. \end{aligned}$$
(A.9)
Therefore, we can obtain
$$\begin{aligned} L_{tot}(\varvec{\theta },\tilde{\varvec{z}}^{*})=L_{obs}(\varvec{\theta })+L_{mis}(\varvec{\theta },\tilde{\varvec{z}}^{*})=L_{obs}(\varvec{\theta }), \end{aligned}$$
(A.10)
and consequentially,
$$\begin{aligned} \frac{dL_{obs}}{d\varvec{\theta }}=\frac{dL_{tot}}{d\varvec{\theta }}\Bigg |_{\tilde{\varvec{z}}=\tilde{\varvec{z}}^{*}}=\frac{\partial L_{tot}}{\partial \varvec{\theta }}\Bigg |_{\tilde{\varvec{z}}=\tilde{\varvec{z}}^{*}}+\underbrace{\frac{\partial L_{tot}}{\partial \tilde{\varvec{z}}}\Bigg |_{\tilde{\varvec{z}}=\tilde{\varvec{z}}^{*}}}_{=0 \text { from }(A.8)} \cdot \frac{\partial \tilde{\varvec{z}}}{\partial \varvec{\theta }}= \frac{\partial L_{tot}}{\partial \varvec{\theta }}\Bigg |_{\tilde{\varvec{z}}=\tilde{\varvec{z}}^{*}}, \end{aligned}$$
(A.11)
which shows that the \(\varvec{\theta }\)-component of the minimizer of \(L_{tot}(\varvec{\theta },\tilde{\varvec{z}})\) coincides with the minimizer of \(L_{obs}(\varvec{\theta })\), and hence that the iterative SVD algorithm implicitly solves the optimization problem specified in (4.9).
Footnotes
1
The law was repealed in early 2024 for economic reasons.
 
2
Re-calibration risk arises because model parameter estimates may become different if the model in question is fitted to an updated data set.
 
3
The relationship between least squares and a Gaussian assumption in MLE is discussed in Sect. 7.
 
4
We multiply \(\lambda \) by two for computational convenience. The use of \(2\lambda \) instead of \(\lambda \) makes no difference in the final solution.
 
Literature
1. Basellini U, Camarda CG (2022) Lee-Carter Cohort Mortality Forecasts. Paper presented at the 2022 European Population Conference
2. Bishop CM, Nasrabadi NM (2006) Pattern recognition and machine learning. Springer, New York
3. Brouhns N, Denuit M, Van Keilegom I (2005) Bootstrapping the Poisson log-bilinear model for mortality forecasting. Scand Actuar J 3:212–224
4. Brouhns N, Denuit M, Vermunt JK (2002) A Poisson log-bilinear regression approach to the construction of projected lifetables. Insur Math Econ 31(3):373–393
6. Cairns AJ, Blake D, Dowd K (2006) A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. J Risk Insur 73(4):687–718
7. Cairns AJ, Blake D, Dowd K, Coughlan GD, Epstein D, Khalaf-Allah M (2011) Mortality density forecasts: an analysis of six stochastic mortality models. Insur Math Econ 48(3):355–367
8. Cairns AJ, Blake D, Dowd K, Coughlan GD, Epstein D, Ong A, Balevich I (2009) A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North Am Actuar J 13(1):1–35
10. D'Amato V, Haberman S, Russolillo M (2012) The stratified sampling bootstrap for measuring the uncertainty in mortality forecasts. Methodol Comput Appl Probab 14:135–148
11. Fung MC, Peters GW, Shevchenko PV (2018) Cohort effects in mortality modelling: a Bayesian state-space approach. Ann Actuar Sci 13(1):109–144
12. Gillis N, Glineur F (2011) Low-rank matrix approximation with weights or missing data is NP-hard. SIAM J Matrix Anal Appl 32(4):1149–1165
13. Goodman LA (1979) Simple models for the analysis of association in cross-classifications having ordered categories. J Am Stat Assoc 74(367):537–552
14.
15.
16. Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
17. Hobcraft J, Menken J, Preston S (1982) Age, period, and cohort effects in demography: a review. Popul Index 48(1):35–55
18. Human Mortality Database (2023) Max Planck Institute for Demographic Research, University of California, Berkeley, and French Institute for Demographic Studies. Available at www.mortality.org
20.
21. Ilin A, Raiko T (2010) Practical approaches to principal component analysis in the presence of missing values. J Mach Learn Res 11:1957–2000
22. Koissi MC, Shapiro AF, Högnäs G (2006) Evaluating and extending the Lee-Carter model for mortality forecasting: bootstrap confidence interval. Insur Math Econ 38(1):1–20
23. Li J (2014) A quantitative comparison of simulation strategies for mortality projection. Ann Actuar Sci 8(2):281–297
24. Lee RD, Carter LR (1992) Modeling and forecasting US mortality. J Am Stat Assoc 87(419):659–671
25. Li JS-H, Hardy MR, Tan KS (2009) Uncertainty in mortality forecasting: an extension to the classical Lee-Carter approach. ASTIN Bull 39:137–164
26. Li JS-H, Zhou R, Liu Y, Graziani G, Hall RD, Haid J, Peterson A, Pinzur L (2020) Drivers of mortality dynamics: identifying age/period/cohort components of historical US mortality improvements. North Am Actuar J 24(2):228–250
27. Mazumder R, Hastie T, Tibshirani R (2010) Spectral regularization algorithms for learning large incomplete matrices. J Mach Learn Res 11:2287–2322
28. Renshaw AE, Haberman S (2006) A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insur Math Econ 38(3):556–570
30. SriDaran D, Sherris M, Villegas AM, Ziveyi J (2022) A group regularization approach for constructing generalized age-period-cohort mortality projection models. ASTIN Bull J IAA 52(1):247–289
31. Villegas AM, Haberman S (2014) On the modeling and forecasting of socioeconomic mortality differentials: an application to deprivation and mortality in England. North Am Actuar J 18(1):168–193
32. Villegas A, Kaishev VK, Millossovich P (2018) StMoMo: an R package for stochastic mortality modelling. J Stat Softw 84(3):1–38
33. Willets RC (2004) The cohort effect: insights and explanations. British Actuar J 10(4):833–877
34. Wilmoth JR (1990) Variation in vital rates by age, period, and cohort. Sociol Methodol 20:295–335
35. Wilmoth JR (1993) Computational methods for fitting and extrapolating the Lee-Carter model of mortality change. Technical report, Department of Demography, University of California, Berkeley
36. Zhou R, Wang Y, Kaufhold K, Li JS-H, Tan KS (2014) Modeling period effects in multi-population mortality models: applications to Solvency II. North Am Actuar J 18(1):150–167