rnorm()
function. Indeed, had this work been done in the 1970s, or perhaps the 1670s when national (versus natural) tontines were first launched, that might have been sufficient. But in the early twenty-first century, being normal doesn’t cut it, and means with standard deviations aren’t enough. In this chapter I explain (i) why that assumption is problematic and (ii) how to implement more robust models of portfolio investment returns. I begin by returning to higher moments, such as skewness and kurtosis, and their important role in testing data for normality. I then examine historical returns from the stock market during the prior half century and discuss how they deviate from normality. The chapter concludes by using a technique called statistical bootstrapping to simulate modern tontine dividends, as an alternative construction method for the critical portfolio return matrix PORET
. Overall the pedagogical objective is to help readers answer the basic question: Does all this complexity make a difference?
6.1 Statement of the Historical Problem
rnorm(TH,EXR,SDR)
, where TH
was the number of years (e.g. 30 years), EXR
was the mean return (e.g. 4% per year), and SDR
was the standard deviation (e.g. 3% per year). That assumption, which is also labelled Gaussian or simply the bell curve, should be familiar to readers from portfolio or option pricing theory. It states that future market investment returns can be described fully and exclusively in terms of their mean and standard deviation, a.k.a. the first two moments. Alas, bell curves are certainly convenient to work with, but they are a fiction in markets.
6.2 Measuring Skewness and Kurtosis
rnorm(.)
method and display some summary statistics of those artificial returns. The plan is to compare those (artificial, laboratory grown) numbers to actual historical returns from the SP500 total return index, the most widely known and quoted stock index in the world. To do this higher-moment analysis, you might want to install a new package that computes statistical moments into the RStudio environment. Install it via the command install.packages("moments")
and then load it with library(moments)
before you start your script. In the end though, once the warning lights have stopped flashing and the proper boxes have been ticked, you should get results that look like this next script. If you get error messages for the skewness
and the kurtosis
command, go back and try loading the library again. Alternatively, you can compute those two manually (without the moments package) based on the formulas in Chap. 2, which I’ll revisit in a moment.
set.seed(1693)
rv<-rnorm(612,0.0085,0.044)
mean(rv)
> 0.007751903
sd(rv)
> 0.04484798
median(rv)
> 0.00664067
skewness(rv)
> 0.04799061
kurtosis(rv)
> 2.607695
set.seed(1000)
, the sample skewness is −0.04792736 (on the other side of zero) and the sample kurtosis is 2.904078 (getting closer to 3). That’s noise for you.
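To get a feel for how large that noise is, one can repeat the experiment over many independent samples. The following is a minimal sketch and not part of the original script: the helper names skew_raw and kurt_raw and the choice of 2,000 replications are assumptions made for illustration, and the helpers reproduce the moment-based formulas in base R so that the moments package is not needed. Under normality, the large-sample standard errors are roughly sqrt(6/N) for skewness and sqrt(24/N) for kurtosis.

```r
# Hypothetical helpers (not from the chapter): moment-based skewness
# and kurtosis, dividing all sums by N.
skew_raw <- function(x) { d <- x - mean(x); mean(d^3) / mean(d^2)^(3/2) }
kurt_raw <- function(x) { d <- x - mean(x); mean(d^4) / mean(d^2)^2 }

set.seed(1693)
n_rep <- 2000   # assumed number of repeated samples
n_obs <- 612    # same sample size as the monthly data

# Each replication draws a fresh normal sample and records its moments.
sk <- replicate(n_rep, skew_raw(rnorm(n_obs, 0.0085, 0.044)))
ku <- replicate(n_rep, kurt_raw(rnorm(n_obs, 0.0085, 0.044)))

# Spread of the estimates versus the large-sample approximations.
c(sd_skewness = sd(sk), approx = sqrt(6 / n_obs))
c(sd_kurtosis = sd(ku), approx = sqrt(24 / n_obs))
```

With 612 observations, a sample skewness of plus or minus 0.05 is therefore well within one standard error of zero.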
set.seed(1693)
rv<-rnorm(612,0.0085,0.044)
# Sample Mean
mrv<-sum(rv)/(length(rv))
# Third Cumulant Defined
m3<-sum((rv-mrv)^3)/length(rv)
# Second Cumulant Defined
m2<-(sum((rv-mrv)^2)/(length(rv)))
# Fisher’s Coefficient of Skewness
m3/m2^(3/2)
> 0.04799061
skewness(rv)
> 0.04799061
set.seed(1693)
rv<-rnorm(612,0.0085,0.044)
# Sample Mean
mrv<-sum(rv)/(length(rv))
# Fourth Central Moment Defined
m4<-sum((rv-mrv)^4)/length(rv)
# Raw Standard Deviation Defined
sdrv<-sqrt(sum((rv-mrv)^2)/(length(rv)))
# Coefficient of Kurtosis
m4/sdrv^4
> 2.607695
kurtosis(rv)
> 2.607695
sd(.)
function built into R computes the sample standard deviation, not the above-noted (and christened) raw standard deviation. The difference between them will be well known to readers from basic statistics, but I’ll reiterate here that the sample estimate divides the sum of squared deviations by (N − 1), whereas what I called raw divides by the original N. The following script should clarify any confusion in this matter. In particular, pay attention to the first few digits after the decimal point, which in both cases represent a volatility of 4.5% per month. Indeed, the differences are small (to an empiricist).
set.seed(1693)
rv<-rnorm(612,0.0085,0.044)
# Sample Mean
mrv<-sum(rv)/(length(rv))
# Raw Standard Deviation
sqrt(sum((rv-mrv)^2)/(length(rv)))
> 0.04481132
# Sample Standard Deviation
sqrt(sum((rv-mrv)^2)/(length(rv)-1))
> 0.04484798
# As computed by R.
sd(rv)
> 0.04484798
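The two estimates differ only by the factor sqrt((N − 1)/N), so either can be recovered from the other. Here is a short check of that identity, offered as a sketch on the same simulated vector; the variable names raw_sd and rescaled_sd are introduced only for this illustration.

```r
set.seed(1693)
rv <- rnorm(612, 0.0085, 0.044)
N  <- length(rv)

# Raw standard deviation: divide the sum of squares by N.
raw_sd <- sqrt(mean((rv - mean(rv))^2))

# Sample standard deviation from sd(), rescaled to the raw version.
rescaled_sd <- sd(rv) * sqrt((N - 1) / N)

# The two agree up to floating-point error.
all.equal(raw_sd, rescaled_sd)
```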
6.3 Continuously Compounded vs. Effective Annual
PORET[i,j]
matrix, versus the continuously compounded return, which is computed via log(1+PORET[i,j])
. During the last few pages and chapters, I might have sloppily used the term returns when referring to either of these two variables, even though they are technically different from each other. One is an effective annual rate (EAR), and one is a type of annual percentage rate (APR).
PORET[i,j]
begins with a normally distributed random vector with mean EXR
and standard deviation SDR
, a.k.a. the volatility. The number e, approximately 2.718, is then raised to the power of that vector using the built-in exp(.)
function. Finally, the numerical value of 1 is subtracted from that exponentiated value to convert the entire vector into an effective annual rate. My point here is that the exponentiation process will distort the higher moments that I introduced and explained in the prior section, and especially the skewness. Here is a detailed numerical example that should help illuminate this insight.
set.seed(1693)
rv0<-rnorm(100000,0.04,0.06)
rv1<-exp(rv0)-1
# Compare Means
mean(rv0)
> 0.03991409
mean(rv1)
> 0.04258999
# Compare Deviations
sd(rv0)
> 0.0599022
sd(rv1)
> 0.0624973
# Compare Skewness
skewness(rv0)
> -0.005790745
skewness(rv1)
> 0.1724074
# Compare Kurtosis
kurtosis(rv0)
> 2.9789
kurtosis(rv1)
> 3.033556
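The sample values above can be compared against the closed-form moments of a (shifted) LogNormal variable. The following is a sketch using the standard LogNormal formulas with the same parameters, 0.04 and 0.06; the names mu, sigma, w and theory are introduced only for this illustration.

```r
mu    <- 0.04   # mean of the continuously compounded return
sigma <- 0.06   # volatility of the continuously compounded return
w     <- exp(sigma^2)

# Theoretical moments of exp(X) - 1 when X is normal(mu, sigma).
# Subtracting 1 shifts the mean but leaves sd, skewness and kurtosis unchanged.
theory <- c(
  mean = exp(mu + sigma^2 / 2) - 1,
  sd   = exp(mu + sigma^2 / 2) * sqrt(w - 1),
  skew = (w + 2) * sqrt(w - 1),
  kurt = w^4 + 2 * w^3 + 3 * w^2 - 3
)
round(theory, 4)
```

The theoretical skewness is positive for any nonzero volatility, so the sign seen in the simulated effective annual returns is a property of the exponentiation and not of the random draw.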
rv0
is normally distributed, but rv1
is created by exponentiating rv0
and then subtracting one.
rv0
is indeed very close to 4%, but the mean of the effective annual return vector is higher than 4%, as indeed it should be. Also, the standard deviation of the continuously compounded return is close to 6%, but for the effective annual return it’s higher than 6%, once again as it should be. More importantly, the skewness of the continuously compounded return is slightly negative but zero for two digits after the decimal point, again consistent with the symmetry (and zero theoretical skewness) of the normal distribution. But, when the rv0
vector is exponentiated to create the rv1
vector, which recall is the basis for the PORET[i,j]
matrix, raising e to the power of these values induces positive statistical skewness. This is important to remember: skewness in effective annual investment returns is not, by itself, a sign of abnormality. Finally, the kurtosis of the effective annual return rv1
is just slightly higher than the corresponding value for the continuously compounded return rv0
, although both are quite close to the anticipated value of 3. No big change there.
PORET[i,j]
matrix and the foundation for the simulation algorithm will have a small amount of skewness due to the exponentiation, even if the underlying return generating process was perfectly normal. Second, if you (blindly) compute the sample standard deviation of any PORET[i,j]
matrix—regardless of how it was obtained—it will likely be just a tad higher than the standard deviation SDR
used to generate the underlying returns, and what I have called volatility. Third, and just as importantly, if-and-when I happen to be sloppy and use the term expected return or standard deviation without specifying whether I’m referring to the continuously compounded rv0
or the effective annual rv1
, I should apologize in advance. But, more often than not I’ll be referring to the former rv0
and not the latter rv1
. Or, using the language of R, it’s the second and third arguments in the rnorm(.)
function. Go back to Chap. 2 for a refresher.
6.4 Quantile Plots of Investment Returns
rv
data using the qqnorm(rv)
and then qqline(rv)
command, which adds the diagonal line. What this picture is telling us by virtue of the points falling almost perfectly on the diagonal is that our sample (612 normal numbers generated using the 1693 seed) appears normally distributed. Ok, yes, in the lower-left corner a few data points are above the diagonal and in the upper-right corner a few points are under the diagonal, but this is consistent with the sample kurtosis value being under (the theoretical, perfect) 3. And for those readers who now worry about the accuracy of rnorm(.)
, if you want to convince yourself R’s random number generator is functioning properly, generate 612,000 monthly returns and run the same procedure. It might take a while to sort and plot them, but the points will all rest nicely on the diagonal and barely a dot will deviate. So, I might be a bit cavalier with confidence intervals here, but the point is to (i) develop some quick and easy intuition for detecting (or not being able to detect) normality and (ii) remember that it very much depends on the sample size.
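The effect of sample size can also be summarized numerically, without looking at the picture. The sketch below is an illustration and not part of the original script: it calls qqnorm(.) with plot.it = FALSE to retrieve the theoretical and sample quantiles, and then uses their correlation as a rough measure of how tightly the points hug a straight line. The variable names are assumptions made for this example.

```r
set.seed(1693)
small <- rnorm(612,    0.0085, 0.044)   # the sample size used in the text
large <- rnorm(612000, 0.0085, 0.044)   # one thousand times larger

# plot.it = FALSE returns the plotted coordinates without drawing anything.
qq_small <- qqnorm(small, plot.it = FALSE)
qq_large <- qqnorm(large, plot.it = FALSE)

# Correlation between theoretical (x) and sample (y) quantiles.
cor_small <- cor(qq_small$x, qq_small$y)
cor_large <- cor(qq_large$x, qq_large$y)
c(n_612 = cor_small, n_612000 = cor_large)
```

A value very close to 1 corresponds to points resting on the diagonal; the larger sample should sit even closer to 1 than the smaller one.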
6.5 Serial Autocorrelation of Returns
acf(rv)
in R.
acf(rv)$acf[1:5]
> 1.000000 0.029472 0.000302 -0.025429 -0.020377
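To judge whether autocorrelations of this size are distinguishable from zero, they can be compared against the approximate 95% band of plus or minus 1.96/sqrt(N), which applies to an uncorrelated series. The following sketch assumes that band and uses plot = FALSE to suppress the chart; the names rho and band are introduced only for this illustration.

```r
set.seed(1693)
rv <- rnorm(612, 0.0085, 0.044)

# Autocorrelations at lags 1 to 4 (element 1 is lag zero, always equal to 1).
rho <- acf(rv, plot = FALSE)$acf[2:5]

# Approximate 95% band for an uncorrelated series of this length.
band <- qnorm(0.975) / sqrt(length(rv))

round(band, 4)
abs(rho) < band   # TRUE means indistinguishable from zero at this level
```

For 612 observations the band is roughly plus or minus 0.08, so values of 0.03 or smaller in absolute terms are well inside it.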
6.6 The SP500 Total Return
SP500TR
. Simple importing can be done with a menu command in the graphical RStudio environment as well. In fact, the exact syntax you must use depends on where (on your desktop) you have downloaded and stored the CSV file. You also might want to ensure there are no extraneous entries or cells in the downloaded CSV. All these minor irritants can create error messages. Also, every time you see a + at the start of a line in the script below, it means that this line should be appended to the previous line.
library(readr)
SP500TR <- read_csv("~/SP500TR.csv",
+ col_types = cols(MONTH = col_date(format = "%m/%d/%Y"),
+ RETURN = col_number()))
SP500TR
saved in memory, that is loaded into your environment, a simple calculation of the mean and standard deviation of the log one-plus returns should yield the following two numbers, which should further help clear up the rationale for my rather odd selection of parameters earlier in this chapter.
mean(log(1+SP500TR$RETURN))
> 0.008501958
sd(log(1+SP500TR$RETURN))
> 0.04436831
rv
and the SP500TR share the first two (log) moments, but what about the 3rd and 4th moments?
skewness(log(1+SP500TR$RETURN))
> -0.6962339
kurtosis(log(1+SP500TR$RETURN))
> 5.503288
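One common way to combine these two numbers into a single normality check is the Jarque-Bera statistic, JB = (N/6)(S² + (K − 3)²/4), which is approximately chi-squared with two degrees of freedom when the data are normal. The sketch below is an addition for illustration: it plugs in the skewness and kurtosis printed above together with N = 612, and the variable names are assumptions.

```r
n <- 612          # number of monthly observations
S <- -0.6962339   # sample skewness reported above
K <- 5.503288     # sample kurtosis reported above

# Jarque-Bera statistic built from the two higher moments.
JB <- (n / 6) * (S^2 + (K - 3)^2 / 4)

# Upper-tail probability under a chi-squared distribution with 2 d.f.
p_value <- pchisq(JB, df = 2, lower.tail = FALSE)

c(JB = JB, p_value = p_value)
```

The statistic comes out near 209, far beyond the 5% critical value of about 5.99, so normality of the historical log returns is rejected by this test.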
rv_history<-log(1+SP500TR$RETURN)
qqnorm(rv_history,main="QQplot: Is the SP500TR Normal?")
qqline(rv_history)
acf(.)
command to display the autocorrelation function in R. Using the randomly simulated (612) investment returns, those serial correlation values were within the 95% confidence interval of zero—which is exactly what you would anticipate based on the underlying generating process. But what about the total monthly returns from the SP500 index over the last half century? Do they display any serial correlation? Do broad stock indices continue to go up (or down) after they have gone up (or down)? Is there (negative) momentum in monthly returns?
acf(log(1+SP500TR$RETURN))$acf[2]
, which is the autocorrelation using a lag of one month, the resulting number is indeed positive with a point estimate of 0.0365. Remember, that should be interpreted as a correlation between this month and last month. But technically, one can’t reject the null hypothesis that its value is indeed zero—given that it falls within the confidence intervals noted in Fig. 6.5.
6.7 Path Forward for Deviations from LogNormality
PORET[i,j]
matrix. But how exactly? Is it economically material? The only way to actually find out is to simulate tontine values in which the first two moments—a.k.a. mean and variance—remain the same, but the higher moments are modified. What I will now describe is a procedure that can be used to replace the PORET[i,j]
matrix, as opposed to a completely different modern tontine simulation procedure. This will limit the surgery (in the code) and the overall work involved. There are two different ways to do this, and I will describe both. The first one is rather basic, and that is to use an external program or economic forecasting engine to create the 10,000 paths required by PORET[i,j]
. From a business management point of view, this implies having someone other than the (tontine) quant who is designing the algorithm take responsibility for generating those asset return scenarios. I have nothing more to say about that first approach, other than to remind readers that PORET[i,j]
should reflect an asset allocation that is suitable for the clientele investing in the modern tontine. The second approach is more organic, the historical bootstrap, and involves using a new function in R.
6.8 Basic Historical Bootstrap
sample(.)
command within R, which is a rather powerful tool for simulating forward-looking investment returns. The following script samples 4 numbers from the entire vector of 612 numbers (monthly returns) stored in the SP500TR$RETURN
dataset. For variety, I will use another seed to generate this sample.
set.seed(1)
sample(SP500TR$RETURN,4)
> 0.0294 -0.0601 0.0876 -0.0364
set.seed
command ensures that everyone gets the same four numbers, no different than setting the seed before generating random numbers. In the above case, R selected the four numbers listed in the results. The first (random) number was a gain of 2.94% in the month, the second was a loss of 6.01% in a month, the third was a gain of 8.76%, and the final sample monthly return was a loss of 3.64%. Now, to look clever, I could have also sampled from the actual months themselves that are part of the SP500TR
dataset. In this case the syntax would have been:
set.seed(1)
sample(SP500TR$MONTH,4)
> "1980-09-30" "2012-05-31" "2009-03-31" "1994-11-30"
SP500TR$RETURN[SP500TR$MONTH=="1980-09-30"]
> 0.0294
PORET[i,j]
matrix with a new one. First, since we will be sampling 360 months (from a total of 612) we probably want to sample-with-replacement and allow for multiple picks and repetitions of a given month. Generally the arguments for sampling with (or without) replacement are rather deep and philosophical in the context of forward-looking investment returns, but once again this isn’t the venue for such debates. The key is that we must use a slightly modified version of the sample(.)
command in R. Here is the next step on the path to creating the modified PORET[i,j]
matrix. I will generate one possible path for the 30 years, using our original familiar 1693 seed.
set.seed(1693)
path<-sample(SP500TR$RETURN,360,replace = TRUE)
> summary(path)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.215400 -0.017200 0.009600 0.007188 0.037675 0.110400
SP500TR$MONTH[SP500TR$RETURN==-0.2154]
> "1987-10-30"
SP500TR$MONTH[SP500TR$RETURN==0.1104]
> "1984-08-31"
path
values into a large matrix to replace PORET[i,j]
, it might be worthwhile to examine the statistical properties of the (log) one-plus investment return of this one path
just created. The following script that computes the four moments should be familiar by now but is yet another check on your results. Confirm these numbers as well.
mean(log(1+path))
> 0.006101569
sd(log(1+path))
> 0.04651308
skewness(log(1+path))
> -1.152608
kurtosis(log(1+path))
> 6.991457
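The practical effect of the replace = TRUE argument is that the same historical month can appear more than once in a simulated path. The sketch below illustrates this with a stand-in vector, since the SP500TR file itself is not reproduced here: hist_stub is a hypothetical placeholder of 612 simulated normal returns, not the historical data, and the other names are likewise assumptions.

```r
set.seed(1693)

# Hypothetical stand-in for the 612 historical monthly returns.
hist_stub <- rnorm(612, 0.0085, 0.044)

# Draw 360 months with and without replacement.
with_rep    <- sample(hist_stub, 360, replace = TRUE)
without_rep <- sample(hist_stub, 360, replace = FALSE)

# Count how many picks repeat an earlier pick within each path.
n_dup_with    <- sum(duplicated(with_rep))
n_dup_without <- sum(duplicated(without_rep))
c(with_replacement = n_dup_with, without_replacement = n_dup_without)
```

Sampling without replacement never repeats a month, which also caps the usable horizon at 612 months; sampling with replacement removes that cap.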
6.9 Monthly to Annual
set.seed(1693)
# Generate Vector of 360 Monthly Investment Returns.
vector1<-rnorm(360,0.01,0.05)
# Create Cumulative Investment Values.
vector2<-cumprod(1+vector1)
# Create Vector of Annual Values.
vector3<-vector2[seq(0, length(vector2), 12)]
vector3<-append(vector3,1,0)
# Extract the Annual Returns.
vector4<-vector3[-1]/vector3[-length(vector3)]-1
round(vector4,6)
[1] 0.073581 0.121747 0.411016 0.580719 -0.022159
[6] -0.053635 -0.150397 0.089884 0.254365 0.222345
[11] 0.411773 -0.183681 0.102055 0.117695 0.226608
[16] 0.319082 0.128694 0.015303 0.064975 0.399443
[21] -0.022303 0.084906 0.411599 0.035348 0.005110
[26] -0.274844 0.030719 0.176580 0.269325 0.343089
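The same annual returns can be obtained without cumulative products, by arranging the monthly growth factors into a matrix with twelve rows and multiplying down each column. This is an alternative sketch, not the chapter's script, and the names annual_alt and annual_cum are introduced only to compare the two routes.

```r
set.seed(1693)
vector1 <- rnorm(360, 0.01, 0.05)   # 360 monthly returns, as in the text

# Route 1: reshape into 12 rows (months) by 30 columns (years).
# R fills matrices column by column, so each column holds one calendar year.
annual_alt <- apply(matrix(1 + vector1, nrow = 12), 2, prod) - 1

# Route 2: the cumulative-value approach described in the text.
cum_value  <- cumprod(1 + vector1)
year_end   <- c(1, cum_value[seq(12, length(cum_value), by = 12)])
annual_cum <- year_end[-1] / year_end[-length(year_end)] - 1

# Both routes give the same 30 annual returns.
all.equal(annual_alt, annual_cum)
```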
path
into annual numbers.
ANPATH<-function(hist_month_data,TH){
path<-sample(hist_month_data,TH*12,replace = TRUE)
path2<-cumprod(1+path)
path3<-path2[seq(0, length(path2), 12)]
path3<-append(path3,1,0)
path4<-path3[-1]/path3[-length(path3)]-1
path4}
ANPATH
function. First, I am sampling from the historical monthly return data that is captured in the first argument of this function. In particular, since TH
is measured in years, I’m sampling TH *12
months with replacement. That is the first operational line in the script. Then, once that path of monthly values has been created, the remainder of the script converts from monthly to annual, all as explained above. Here is one particular run of the ANPATH(.)
function.
set.seed(1693)
rv<-ANPATH(SP500TR$RETURN,30)
> summary(rv)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.30730 -0.01739 0.11606 0.09000 0.20254 0.42745
prod(1+rv)
> 8.994063
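The product above says that one dollar grew to roughly nine dollars over the 30 years of this particular path. As a small worked calculation using only that printed number, the terminal value can be converted into an annualized (geometric) growth rate and its continuously compounded equivalent; the variable names are assumptions made for this illustration.

```r
terminal <- 8.994063   # value of prod(1 + rv) printed above
years    <- 30

# Annualized effective (geometric) growth rate over the horizon.
geo_rate <- terminal^(1 / years) - 1

# Equivalent continuously compounded rate.
cc_rate <- log(terminal) / years

round(c(geometric = geo_rate, continuous = cc_rate), 4)
```

The geometric rate of about 7.6% sits below the 9% arithmetic mean reported by summary(rv), as it must whenever returns vary from year to year.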
PORET[i,j]
. Here is that final piece, of which the loop is lifted directly from the script.
TH<-30; N<-10000
set.seed(1693)
PORET<-matrix(nrow=N,ncol=TH)
for (i in 1:N){
PORET[i,]<-ANPATH(SP500TR$RETURN,TH)
}
mean(log(1+PORET[,10]))
> 0.1018501
sd(log(1+PORET[,10]))
> 0.1528116
skewness(log(1+PORET[,10]))
> -0.2019967
kurtosis(log(1+PORET[,10]))
> 3.238949
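Year 10 is only one column of the matrix, and the same four moments can be computed for every year in a single pass. The sketch below shows one way to do so under stated assumptions: moments4 is a hypothetical helper written in base R, and PORET_stub is a LogNormal stand-in (mean 10%, volatility 15%) used here only because the historical file is not reproduced; with the real data one would pass the bootstrapped PORET matrix instead.

```r
# Hypothetical helper: mean, sample sd, and moment-based skewness and kurtosis.
moments4 <- function(x) {
  d <- x - mean(x)
  c(mean = mean(x),
    sd   = sd(x),
    skew = mean(d^3) / mean(d^2)^(3/2),
    kurt = mean(d^4) / mean(d^2)^2)
}

set.seed(1693)
TH <- 30; N <- 10000

# Stand-in return matrix (an assumption, not the historical bootstrap).
PORET_stub <- matrix(exp(rnorm(N * TH, 0.10, 0.15)) - 1, nrow = N, ncol = TH)

# One column of moments per year; rows are mean, sd, skew, kurt.
by_year <- apply(log(1 + PORET_stub), 2, moments4)
round(by_year[, c(1, 10, 30)], 4)

# Overall sample average of the continuously compounded returns.
r_hat <- mean(log(1 + PORET_stub))
r_hat
```

Scanning across the columns is a quick way to confirm that no single year is driving the reported skewness or kurtosis.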
PORET[i,j]
matrix, one that resembles the historical SP500 total return index—with negative skewness and higher kurtosis—which will be used to generate tontine fund values.
6.10 How Do Higher Moments Affect Tontine Payouts?
PORET[i,j]
matrix is defined in terms of the rnorm(.)
function. Instead, replace that variable with the modified PORET
numbers and then generate the modern tontine simulation code over a 30-year horizon with the usual Gompertz mortality parameters. The only remaining tricky and sticky point is to figure out what discount rate r to use. That determines the initial tontine dividends as well as the factor used to distribute cash in later years. Recall that when returns are simulated from a LogNormal distribution and the fund incurs no expenses or extra costs, the discount rate is the expected continuously compounded investment return (CC-IR).
PORET
matrix is generated exogenously, we must manually compute the sample average and hope that using that as r stabilizes the dividends. For the PORET[i,j]
matrix computed above, the arithmetic average was approximately 10%, which is what I’ll use for the r value in this one simulation. The expectation is that, as in the LogNormal case, using that value of r will assure a stable tontine dividend profile.
PORET[i,j]
matrix, implemented in the standardized tontine dashboard. The first item to notice, when compared against dashboards from prior chapters, is that the median modern tontine dividend is much higher than in previously reported simulations. The initial payout of $12,550 per year far exceeds the $7,074 estimated and displayed in the dashboard of Chap. 5. This near doubling of the dividend is driven entirely by the (much) higher discount rate of 10% versus the 4% used in Chap. 5. Now, it’s not that I have changed my mind about a suitable discount rate for setting the initial tontine payout. In fact, the number 10% is woefully inappropriate and much too high. Rather, what’s driving the results and this choice of r is that the underlying assets of the tontine fund are entirely invested in the (historical) SP500 index, which experienced a growth rate of 10% over time. I had no choice but to discount and value the embedded temporary life annuity at 10% if I wanted stable tontine dividends when the assets are allocated to SP500TR.
set.seed(1693)
Modern Tontine (MoTo) Fund: Simulation Dashboard
Lifetime Income with Refundable Death Benefit
DIVIDENDS: End of Year Number…
| Statistical Outcome | T = 1 | T = 5 | T = 10 | T = 20 | T = 30 |
|---|---|---|---|---|---|
| 1 pct. (worst) | $12,550 | $5,276 | $3,232 | $1,247 | $0 |
| 25 pct. | $12,550 | $9,940 | $8,693 | $7,179 | $5,363 |
| Median | $12,550 | $12,619 | $12,637 | $12,475 | $11,856 |
| 75 pct. | $12,550 | $16,045 | $18,205 | $21,471 | $24,650 |
| 99 pct. (best) | $12,550 | $27,697 | $42,803 | $76,144 | $143,983 |
| St. Dev. | $0 | $4,783 | $8,271 | $15,464 | $30,633 |
Assumptions: Historical SP500TR Bootstrap with: \(r = 10.0\%\)
N = 10000, TH = 30 | Gompertz: x = 65, m = 90 and b = 10. | Investors: \(GL_0 = 1000\) with \(f_0 = \$100{,}000\).
PORET
matrix underlying Table 6.1, the skewness (of the year 10 investment return, for example) was −0.20 and the kurtosis was 3.24.
PORET
matrices share the same first two moments (at least approximately) but differ in their skewness and kurtosis. Can you see any differences in results between a parametric simulation (with LogNormal returns) and a non-parametric one (with historical bootstrap returns)? Pay specific attention to the standard deviation of the dividends in year T = 10, as an example.
set.seed(1693)
Modern Tontine (MoTo) Fund: Simulation Dashboard
Lifetime Income with Refundable Death Benefit
DIVIDENDS: End of Year Number…
| Statistical Outcome | T = 1 | T = 5 | T = 10 | T = 20 | T = 30 |
|---|---|---|---|---|---|
| 1 pct. (worst) | $12,550 | $5,517 | $3,316 | $1,274 | $240 |
| 25 pct. | $12,550 | $9,912 | $8,499 | $7,037 | $5,095 |
| Median | $12,550 | $12,486 | $12,261 | $11,960 | $10,850 |
| 75 pct. | $12,550 | $15,626 | $17,665 | $20,257 | $22,485 |
| 99 pct. (best) | $12,550 | $28,684 | $40,990 | $69,499 | $129,557 |
| St. Dev. | $0 | $4,740 | $7,931 | $14,343 | $29,199 |
Assumptions: Financial: \(r = 10.0\%, \nu = 10.0\%, \sigma = 15.0\%\)
N = 10000, TH = 30 | Gompertz: x = 65, m = 90 and b = 10. | Investors: \(GL_0 = 1000\) with \(f_0 = \$100{,}000\).
6.11 Conclusion: How Much Should We Worry?
6.12 Test Yourself
PORET[i,j]
matrix in which 40% of the tontine fund assets are placed in a fund resembling the historical SP500 total return index, and the remaining 60% of the fund is invested in fixed income bonds that are normally distributed with a mean return of 2%, and volatility of 4%, per annum. There is no need to generate the entire modern tontine simulation, but please report the summary mean, standard deviation, skewness and kurtosis of the PORET[i,j]
matrix in the tenth year of the fund.
PORET[i,j]
matrix in which investment returns are based entirely on the historical SP500 total return, but the monthly returns are both floored and capped. The floor and cap are located at the lower and upper 15th percentiles, respectively. What this means is that the worst 15% of months are not experienced by the fund, in exchange for sacrificing the best 15% of months. To be very clear, your bootstrap procedure should still use all 612 months, but replace the extreme returns with their floored and capped values. Again, there is no need to simulate tontine dividends. Rather, report the summary statistics (a.k.a. moments) of this PORET[i,j]
matrix in the 10th year.
PORET
matrix have on the skewness and kurtosis of TONDV[,10]
. This question looks at how PORET
volatility affects the skewness and kurtosis of TONDV[,10]
. Generate a PORET
matrix where volatility is 0, and the expected annual return and discount rate are 10%. Use this returns matrix to simulate modern tontine dividends and measure the skewness and kurtosis of the dividends in the 10th year.