1 Introduction
The concept of characteristics-based parametric portfolio policies introduced by Brandt et al. (
2009) provides an attractive reduction technique for determining optimal portfolio choices. By directly modeling optimal portfolio weights as functions of observable characteristics, one avoids the cumbersome task of determining how underlying economic and financial variables affect the multitude of moments of the return distributions that, according to classical portfolio theory, drive optimal portfolio weights. However, without providing a micro-foundation of parametric portfolio choice, Brandt et al. (
2009) cannot identify the limitations of their approach.
A potential micro-foundation is provided by Koijen and Yogo (
2019). In their widely acclaimed contribution, these authors develop an asset pricing model with flexible heterogeneity in asset demand across investors. However, they provide it for a special utility function only, namely logarithmic utility, which implies a relative risk aversion of 1. Their framework is especially useful for modeling non-atomic investors such as large institutions and pension funds. It requires short selling constraints, however, so that optimal portfolio choice reduces to characteristics-based demand when returns exhibit a factor structure. This allows them to construct and apply an instrumental variable estimator to deal with the endogeneity of demand and asset prices. Finally, these authors illustrate the power of their approach using US stock market data and investor holding data from 1980 to 2017.
Therefore, this article extends the approach of Koijen and Yogo (
2019) to allow for more general preferences and thereby to provide a richer and more robust micro-foundation of Brandt et al. (
2009) for potential applications in portfolio choice. First, in contrast to Koijen and Yogo (
2019) we allow for general constant relative risk aversion (relative risk aversion parameter
\(\gamma \in \mathbb {R}_{>0}\)) rather than imposing logarithmic utility (
\(\gamma = 1\)). Second, we extend the analysis to the case of constant absolute risk aversion. Doing so allows us to connect the demand system approach directly to the parametric portfolio approach of Brandt et al. (
2009). Third, we add the analysis in the absence of short selling constraints in order to analyze and evaluate the empirical relevance of this restriction. In this regard, we identify potential problems of the parametric portfolio policy for sufficiently low levels of constant relative risk aversion in recent US data. Fourth, we show how a shrinkage device can be included in a simple way to “stabilize” the investment strategies and to improve performance in empirical data. Fifth, we show the existence of equilibrium in an economy with heterogeneous agents specifically for the cases of constant absolute risk aversion (CARA, also considered recently in Koijen et al.
2023) and constant relative risk aversion (CRRA) preferences. This result still holds if a (not necessarily proper) subset of the agents applies the shrinkage device proposed in this article.
Finally, we illustrate the performance of these extensions (in terms of the certainty equivalent and the Sharpe ratio) using US stock market data.
The basic insights are the following:
- We find that parametric portfolio policies (see Brandt et al.
2009) can be derived as optimal portfolio policies only under very restrictive assumptions. Typically, optimal portfolio investments differ from solutions to the characteristics-based approach.
- The case of
constant absolute risk aversion generates relatively simple solutions because of the absence of wealth effects. We demonstrate that our optimal strategies with shrinkage outperform parametric portfolio policies and a simple 1/
N investment strategy (see, e.g., DeMiguel et al.
2009).
- In the case of constant relative risk aversion, technical pitfalls have to be avoided by imposing restrictions on domains or adapting objective functions for the region of large losses. The necessity of such restrictions is demonstrated empirically using S&P 500 stocks for the USA in the period from 1995 to 2013, especially for low levels of relative risk aversion. Overall, we find that the performance of the “constant relative risk aversion adaptations” is relatively poor for low levels of relative risk aversion \(\gamma\). However, the performance improves for higher levels, both in sample and out of sample. We observe that for moderate and higher \(\gamma\) our optimal strategies with shrinkage outperform parametric portfolio policies and a simple 1/N investment strategy. For large \(\gamma\) the differences in performance become small.
The demand systems approach can be interpreted as a reduction technique to explain asset prices as a function of a few exogenous characteristics.
Such a reduction technique is expected to reduce numerical complexity and to enhance robustness. Obviously the validity of such a procedure depends on the true underlying economic structure.
Our insights are particularly useful for popular machine learning algorithms (see, e.g., Nagel
2021), since they allow prior economic knowledge to be fused with big data on asset prices and further underlying information sources. Our analysis identifies potential and empirically relevant pitfalls and provides solutions to such challenges for algorithmic portfolio optimization. In particular, and in contrast with Nagel (
2021), where ridge regression is used to predict returns, we propose an algorithm that shrinks toward some specific portfolio weights such as the 1/
N-portfolio.
The paper is organized as follows: Section
2 provides a literature review. Section
3 presents the basic model. Section
4 presents asset demand based on constant absolute risk aversion (CARA preferences) and develops the conditions for the parametric portfolio policy as an optimal solution to the portfolio investment problem. Section
5 analyzes CRRA-preferences. Section
6 presents asset prices derived in general asset market equilibrium for both the CARA and the CRRA preferences discussed in the preceding sections. Section
7 presents an empirical evaluation of the pricing theories on a sample of 100 S&P 500 stocks. This section also provides robust empirical evidence of potential pitfalls for the unchecked parametric portfolio approach. Section
8 concludes. The Appendix contains a section on the properties of the empirical data, while the Supplementary Material provides further technical details.
2 Literature and relations to machine learning
As already stated in the Introduction, our approach yields optimal portfolio weights given some characteristics (abbreviated
\({\textbf{x}}_{\text{it}}\) in the later parts of this article). In the following sections, we also investigate whether these optimal rules are equal to or at least approximately correspond to the parametric portfolio approach of Brandt et al. (
2009). In addition, we observe that the optimal rules show poor out-of-sample performance, at least in the empirical data set considered in this article. To improve on this issue, a quadratic penalty function will be included.
Our paper is not the first to discuss issues related to the missing micro-foundations of the parametric portfolio policy approach. Ammann et al. (
2016) show that the parametric portfolio policy approach implies unrealistically large amounts of implied short sales and provide conditions to render the approach more empirically appealing, and more in line with the empirical findings of Medeiros et al. (
2014). Our contribution complements these earlier studies by providing a micro-foundation for the parametric portfolio policy approach in a factor setting.
We apply this approach to an S&P 500 sub-sample of 100 assets for the period 1979-2013 and compare it to the optimal solution implied by the micro-founded model. Other closely related work is Hjalmarsson and Manchev (
2012), who consider the special case of mean–variance preferences. We also compare the results with the ad hoc heuristics of the 1/
N-rule (see, e.g., DeMiguel et al.
2009).
Further reduction techniques and methods to stabilize and improve estimates and/or forecasts are tools recently provided in the machine learning literature (for an overview, see, e.g., Nagel
2021). For example, in Nagel (
2021, Chapter 4) ridge regression is applied to improve the forecasting performance of a predictive regression model, where a fairly large set of explanatory variables is used to predict asset returns. Then, these forecasts are used for portfolio allocation.
Also Kelly et al. (
2021) use ridge regression techniques to forecast asset returns by using a large set of predictors. The authors also connect ridge regression to the Moore–Penrose pseudo-inverse (which corresponds to the case where the shrinkage parameter becomes small). In addition, the authors consider the case where the number of regression parameters becomes large and use random matrix theory to obtain asymptotic results (further theoretical results are provided in Hastie et al.
2022). In their empirical analysis, CRSP-data were used. The authors show that using a bulk of “plausibly relevant predictors” in combination with “rich nonlinear models” improves return forecasting and portfolio returns. Nonparametric regression in combination with shrinkage is applied to portfolio allocation in Freyberger et al. (
2020).
Alternatively, neural networks—in particular reinforcement learning—can be used to directly optimize the objective function of an investor (see, e.g., Cong et al.
2020). The parametric portfolio approach of Brandt and Santa-Clara (
2006); Brandt et al. (
2009) can be seen as a special case of this machine learning approach (by considering a small number of predictors as well as a linear dependence structure).
In this article, we augment our objective function (that is, either CARA or CRRA expected utility) by a quadratic penalty term. In contrast with Kelly et al. (
2021), Nagel (
2021) and many other “machine learning in finance” papers cited there, the number of predictors remains small in our analysis. The main reason for this is to keep our approach comparable to Brandt et al. (
2009) and DeMiguel et al. (
2009), that is, to examine whether we can improve on 1/
N or parametric portfolio policies even with a small number of predictor variables.
We show that for CARA utility our optimization problem corresponds exactly to the optimization problem arising in ridge regression. For constant relative risk aversion, we show that a second-order Taylor series approximation of the utility function turns the optimization problem into a ridge regression problem. Our approach allows shrinking the portfolio weights toward weights chosen by the investor (such as the equally weighted portfolio). We observe that in our empirical data the implementation of the characteristics-based approach of Koijen and Yogo (
2019) requires the application of shrinkage methods to stabilize and improve out-of-sample performance.
As is well known in the literature (see, e.g., James et al.
2017, p. 226), the ridge regression estimator corresponds to the posterior mean of the vector of regression parameters in a Bayesian regression model with normally distributed noise and a normal prior on the regression parameters (e.g., with prior mean
\(\breve{{\textbf{w}}}_t\) and covariance matrix
\(\frac{1}{c_p} {\textbf{I}}_n\) in Section
5). In our analysis, the vector of regression parameters corresponds to our portfolio weights. The ridge regression methodology makes it easy to integrate a-priori information on portfolio weights. The stronger the prior on these weights, the more we shrink toward the a-priori weights chosen by the investor. One prominent example is the equally weighted portfolio discussed in DeMiguel et al. (
2009). Hence, in contrast with the machine learning approaches discussed above, our approach directly integrates a-priori information on investment weights.
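The shrinkage idea described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the estimator used in this article: a generic least-squares problem is penalized by the squared distance of the coefficient vector from a target `w0`, and the function name `ridge_toward_target`, the simulated data and the penalty weight `c` are all hypothetical. In the Bayesian reading, `w0` plays the role of the prior mean and `c` absorbs the ratio of noise variance to prior variance.

```python
import numpy as np

def ridge_toward_target(X, y, w0, c):
    """Minimize ||y - X w||^2 + c * ||w - w0||^2 in closed form.

    The first-order condition gives (X'X + c I) w = X'y + c w0.
    """
    n = X.shape[1]
    lhs = X.T @ X + c * np.eye(n)
    rhs = X.T @ y + c * w0
    return np.linalg.solve(lhs, rhs)

# Simulated data (purely illustrative, not the data set of the article).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.4, 0.2]) + 0.1 * rng.normal(size=60)

w0 = np.full(5, 1 / 5)                                 # target: equal weights
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]           # no shrinkage
w_weak = ridge_toward_target(X, y, w0, c=1e-8)         # close to least squares
w_strong = ridge_toward_target(X, y, w0, c=1e8)        # close to the target
```

With a weak penalty the solution stays near the unpenalized estimate, while a strong penalty pulls it toward the chosen target, which mirrors the role of the prior strength described in the text.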
3 Model and assumptions
We consider an economy in discrete time
\(t\in \mathbb {N}\). Denote the one-period return (or yield) of security
i from period
t to
\(t+1\) as
\(r_{it+1}\) and the gross returns as
\(R_{it+1} := 1+r_{it+1}\),
\(i=1,\dots ,N\), where
N is the number of risky assets traded.
In the case a risk-free asset is traded, we apply the index \(i=0\), its return is \(r_{ft+1}\), and the total number of assets is \(n=N+1\) (in sums the summation index 0 is used for the risk-free asset) otherwise \(n=N\). Denote the share price of asset i in period t by \(P_{\text{it}}\) and the number of traded shares by \(S_{\text{it}}\). Accordingly, the market value of equity of asset i is given by \(P_{\text{it}} S_{\text{it}}\) and aggregate market capitalization reads \(\sum _{i =1}^{N} P_{\text{it}} S_{\text{it}}\), while \({\textbf{P}}_t := \left( P_{1t},\dots , P_{Nt} \right) ^{ {{\,\mathrm{\top }\,}}}\) denotes the vector of share prices.
For a given set of weights
\(w_{\text{it}} \in \mathbb {R}\),
\(i=0,1,\dots ,N\), the portfolio return is
\(r_{pt+1} := \sum _{i = (N-n)+1}^{N} w_{\text{it}} r_{it+1}\), with
\(R_{pt+1} := 1 + r_{pt+1}\) denoting the portfolio’s gross return.
We collect observed characteristics in
\({\textbf{x}}_{\text{it}} \in \mathbb {R}^k\),
\(i= (N-n)+1,\dots ,N\), where
\({\textbf{x}}_{\text{it}}\) could contain endogenous, predetermined and/or exogenous variables. For the risk-free asset,
\({\textbf{x}}_{ft}\) is known in period
\(t-1\) (
\(\mathcal {F}_{t-1}\)-measurable in mathematical terms).
In particular, we assume that market equity (in the empirical data a stationary transformation of market equity) of asset
i, that is,
\(P_{\text{it}} S_{\text{it}}\), is contained in
\({\textbf{x}}_{\text{it}}\). Following Koijen and Yogo (
2019), let
\(\breve{{\textbf{x}} }_{\text{it}}\) contain these observed variables as well as unobserved variables. We explicitly assume that prior investment weights or amounts invested are not contained in
\(\breve{{\textbf{x}} }_{\text{it}}\). Let
$$\begin{aligned} {\textbf{y}}_{\text{it}} := \left( \begin{array}{c} \breve{{\textbf{x}}}_{\text{it}} \\ vech \left( \breve{{\textbf{x}}}_{\text{it}} \breve{{\textbf{x}}}_{\text{it}}^{{{\,\mathrm{\top }\,}}} \right) \\ \vdots \end{array} \right) \in \mathbb {R}^{k_y} \end{aligned}$$
(1)
collect terms obtained from raising the elements of
\(\breve{{\textbf{x}}}_{\text{it}}\) to the powers
\(j=1,2, \dots\). Then, we assume that returns follow from
$$\begin{aligned} \underbrace{ \left( \begin{array}{c} R_{ft} \\ R_{1t}\\ \vdots \\ R_{Nt} \end{array} \right) }_{{\textbf{R}}_{t}}= & \underbrace{ \left( \begin{array}{c} a_{f }\\ a_{1 }\\ \vdots \\ a_{ N } \end{array} \right) }_{ {\textbf{a}} } + \underbrace{ \left( \begin{array}{cccc} {\textbf{A}}_f & {\textbf{0}} & \cdots & {\textbf{0}} \\ {\textbf{0}} & {\textbf{A}}_{1} & \cdots & {\textbf{0}}\\ \vdots & & \ddots & \vdots \\ {\textbf{0}} & \cdots & {\textbf{0}} & {\textbf{A}}_{N } \\ \end{array} \right) }_{{\textbf{A}} } \underbrace{ \left( \begin{array}{c} {\textbf{y}}_{ft} \\ {\textbf{y}}_{1t}\\ \vdots \\ {\textbf{y}}_{Nt} \end{array} \right) }_{ {\textbf{y}}_{t} \in \mathbb {R}^{n k_y} } + \underbrace{ \left( \begin{array}{c} 0 \\ \varepsilon _{1t } \\ \vdots \\ \varepsilon _{N t } \end{array} \right) }_{ \tilde{{\varvec{\varepsilon }}}_t} \ . \end{aligned}$$
(2)
\({\textbf{A}}_{j}\),
\(j=(N-n)+1,\dots ,N\), are
\(1 \times k_y\)-dimensional matrices. In the case a risk-free asset is traded,
\(\varepsilon _{f t} = {0}\), and
\(R_{ft} = a_{ f } + {\textbf{A}}_{f} {\textbf{y}}_{ft}\) is already known in period
\(t-1\). The vector of noise terms
\(\tilde{{\varvec{\varepsilon }}}_t\) contains the
N-dimensional subvector
\({{\varvec{\varepsilon }}}_t\) affecting the risky assets. Its expectation is zero and its covariance matrix is
\(\mathbf {\Sigma }\).
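To make the structure of (1) and (2) concrete, the following minimal sketch builds \({\textbf{y}}_{\text{it}}\) from a characteristics vector and simulates one cross-section of risky gross returns. It rests on several assumptions that are not part of the model description: the expansion in (1) is truncated at second order, all characteristics are treated as observed, the noise is drawn from a normal distribution, and all numerical values and names (`vech`, `build_y`, `cond_mean`) are hypothetical.

```python
import numpy as np

def vech(M):
    """Half-vectorization: lower triangle of a square matrix, stacked column by column."""
    return M.T[np.triu_indices(M.shape[0])]

def build_y(x):
    """Stack x and vech(x x'), i.e. the expansion in (1) truncated at second order."""
    return np.concatenate([x, vech(np.outer(x, x))])

x_example = np.array([1.0, 2.0])
y_example = build_y(x_example)           # k_y = k + k(k+1)/2 = 5 entries

# Hypothetical toy economy: N risky assets with k characteristics each.
rng = np.random.default_rng(0)
N, k = 4, 2
X = rng.normal(size=(N, k))              # row i holds the characteristics of asset i
Y = np.vstack([build_y(x) for x in X])   # row i is y_it
k_y = Y.shape[1]

a = np.full(N, 1.0)                      # intercepts a_i (illustrative values)
A = 0.01 * rng.normal(size=(N, k_y))     # row i is the 1 x k_y matrix A_i
Sigma = 0.02 * np.eye(N) + 0.005         # unrestricted (non-diagonal) covariance

# Block-diagonal A times the stacked y_t reduces to row-wise inner products.
cond_mean = a + np.einsum("ij,ij->i", A, Y)
eps = rng.multivariate_normal(np.zeros(N), Sigma)
R = cond_mean + eps                      # one draw of the risky gross returns
```

Because \({\textbf{A}}\) is block diagonal, the conditional mean of asset i only involves its own row of `Y`, whereas the noise terms may be correlated across assets through `Sigma`.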
To slightly simplify the analysis and in contrast with Koijen and Yogo (
2019), we do not impose a factor structure on the covariance matrix
\(\mathbf {\Sigma }\); however, this simplifying assumption can be relaxed in a straightforward way. [The above paragraphs imply that Assumption 1 of Koijen and Yogo (
2019) holds by our model assumptions.] To simplify the notation, an additional index for an investor is only included if necessary. Next we impose
By (ii), we exclude constant characteristics and collinearities among the elements of
\({\textbf{y}}_t\). That is, no characteristic is redundant. The stronger assumption of a positive definite (conditional) covariance matrix of risky returns is imposed in Section
4 only to obtain a unique optimal investment strategy. Part (i) is important for the empirical implementation of the model, since it avoids technical problems with possibly non-stationary regressors. By (iii), the conditional expectation is affine in
\({\textbf{y}}_{\text{it}}\). That is, the conditional expectation of the return of asset
i is
\(a_{i} + {\textbf{A}}_{i} {\textbf{y}}_{\text{it}}\).
We consider a sequence of myopic investment problems. There are no trading costs. In each period
t,
\(t=1,2,\dots\), an investor is endowed with wealth
\(e_{t} > 0\). This wealth can be invested into
n alternatives. Portfolio optimization traditionally involves the optimal determination of those weights
\(w_{\text{it}}\) (or amounts invested into asset
i,
\(\phi _{\text{it}}\)) with respect to a utility function, potential endowment and trading constraints.
\(\phi _{\text{it}} = e_t w_{\text{it}}\) is the amount invested in monetary units in asset
i, while
\({\textbf{w}}_{t} := \left( w_{1t},\dots ,w_{Nt} \right) ^{{{\,\mathrm{\top }\,}}} \in \mathbb {R}^{N}\) and
\({\varvec{\phi }}_{t} :=\left( \phi _{1t},\dots ,\phi _{Nt} \right) ^{{{\,\mathrm{\top }\,}}} \in \mathbb {R}^{N}\) are the investment weights and the amounts invested into the risky assets, respectively. Let
\(\mathcal {W} \subset \mathbb {R}^{n}\) and
\(\mathcal {W}_{\phi } \subset \mathbb {R}^{n}\) denote the sets of feasible strategies. Hence,
\((w_{ft},{\textbf{w}}_t)^{{{\,\mathrm{\top }\,}}} \in \mathcal {W}\) or equivalently
\((\phi _{ft}, {\varvec{\phi }}_t)^{{{\,\mathrm{\top }\,}}} \in \mathcal {W}_{\phi }\) [if no risk-free asset is traded, the sets
\(\mathcal {W}\) and
\(\mathcal {W}_{\phi }\) are such that
\(w_{ft}=0\) and
\(\phi _{ft}=0\)]. Preferences of a typical (or representative) investor are specified by the expected utility (conditional on the information in period
t) over gross portfolio returns
\(R_{pt+1} = \sum _{i=(N-n)+1}^{N} w_{\text{it}} R_{it+1}\), resulting in the optimization problem
$$\begin{aligned} \max _{ (w_{ft}, {\textbf{w}}_t)^{{{\,\mathrm{\top }\,}}} \in \mathcal {W}} \mathbb {E}_t \left( {u} \left( e_t R_{pt+1} \right) \right)= & \max _{ (w_{ft}, {\textbf{w}}_t)^{{{\,\mathrm{\top }\,}}} \in \mathcal {W}} \mathbb {E}_t \left( {u} \left( E_{t+1} \right) \right) = \max _{(w_{ft}, {\textbf{w}}_t)^{{{\,\mathrm{\top }\,}}} \in \mathcal {W}} \mathbb {E}_t \left( u \left( e_t \left( 1 + \sum _{i = (N-n)+1}^{N} w_{\text{it}} r_{it+1} \right) \right) \right) \ , \end{aligned}$$
(3)
where
\(u(\cdot )\) is a strictly monotone increasing Bernoulli utility function defined on the domain
\(\mathbb {D} \subset \mathbb {R}\) and
\(e_t\) the wealth invested in period
t. We assume that
\(e_t\),
\(t=1,2,\dots\), are already given or fixed before any portfolio optimization is performed. Hence, in the optimization problem (
3), the
\(e_t\) invested are deterministic. Given a vector of investment weights
\({\textbf{w}}_{t} := \left( w_{1t},\dots ,w_{N t} \right) ^{{{\,\mathrm{\top }\,}}} \in \mathbb {R}^{N}\) into the risky assets, the
\(t+1\) period wealth is
\({E}_{t+1} = e_{t} {\textbf{w}}_{t}^{{{\,\mathrm{\top }\,}}} {\textbf{R}}_{t+1}\) if no risk-free asset is traded. If a risk-free asset is traded (or deposits and lending in cash are allowed), its gross return is
\(R_{ft+1} \ge 0\);
\(w_{ft}\) is the corresponding proportion of the wealth invested into the risk-free asset at period
t. In the case a risk-free asset is traded
\({E}_{t+1} = e_{t} {\textbf{w}}_{t}^{{{\,\mathrm{\top }\,}}} {\textbf{R}}_{t+1} + e_t w_{ft} R_{ft+1}\). To jointly consider both cases, we write
\({E}_{t+1} = e_{t} {\textbf{w}}_{t}^{{{\,\mathrm{\top }\,}}} {\textbf{R}}_{t+1} + e_t w_{ft} R_{ft+1}\), and assume that
\(w_{ft}=0\) if no risk-free asset is traded. As already stated above, note that the next period’s amount invested,
\(e_{t+1}\), need not be equal to the realization of
\({E}_{t+1}\). By contrast,
\(e_{t+1}>0\) is some non-random real number.
The model presented in the above paragraphs is closely related to the model of Koijen and Yogo (
2019). In contrast, however, we allow for the case without short selling constraints.
To simplify notation, we do not consider time-varying
\({\textbf{A}}\),
\({\textbf{a}}\), and
\(\mathbf {\Sigma }\).
Since both Koijen and Yogo (
2019) and our model result in optimal strategies that are affine in some of the variables considered, we relate our approach to the parametric portfolio approach proposed in Brandt et al. (
2009), where strategies considered are typically affine in some characteristics. Note that the parametric portfolio policy of Brandt et al. (
2009) reduces the dimensionality of the optimization problem by modeling a small number of drivers of the portfolio weights directly.
Often the number of drivers
\(\tilde{{\textbf{x}}}_{\text{it}}\) is very low (e.g., 3 in our empirical setting below), and only investments into risky assets are considered (
\(w_{ft}=0\)). Typically,
\(\tilde{{\textbf{x}}}_{\text{it}}\) is a vector of standardized variables. That is, we consider exogenous or predetermined variables
\({\varvec{\chi }}_{\text{it}}\) contained in
\({\textbf{x}}_{\text{it}}\). For the variables
\({\varvec{\chi }}_{\text{it}} \in \mathbb {R}^{k_{\chi }}\), we subtract the vector of sample means and multiply by the inverse of the diagonal matrix containing the sample standard deviations on the main diagonal, which results in
\(\tilde{{\textbf{x}}}_{\text{it}}\).
\({\varvec{\theta }}= (\theta _1,\dots ,\theta _{k_{\chi }})^{{{\,\mathrm{\top }\,}}}\) is a
\({k_{\chi }}\)-dimensional parameter vector in the parameter space
\(\Theta \subset \mathbb {R}^{k_{\chi }}\); if not otherwise stated
\(\Theta = \mathbb {R}^{k_{\chi }}\).
\({\varvec{\theta }}\) is assumed to be constant over time and is chosen such that expression (
3) is maximized.
Following Brandt et al. (
2009), we focus on the affine parametric portfolio policy
$$\begin{aligned} w_{\text{it}} = {\bar{w}}_{\text{it}} + \frac{1}{N} {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} \tilde{{\textbf{x}}}_{\text{it}} \ , \hbox { for all}\ i= 1,\dots ,N \ . \end{aligned}$$
(4)
In all applications, we work with
\({\bar{w}}_{\text{it}} = 1/N\), where
\(\bar{{\textbf{w}}}_t := \left( {\bar{w}}_{1t},\dots , {\bar{w}}_{Nt} \right) ^{{{\,\mathrm{\top }\,}}}\). Some further results on this form of parametric portfolio policies are provided in Supplementary Material
S.5.
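The policy (4) can be illustrated with a short sketch. It assumes that the characteristics are standardized across assets within a single period (a cross-sectional standardization in the spirit of Brandt et al. 2009), which is an assumption of this example; the function name `parametric_weights`, the simulated characteristics and the values of `theta` are hypothetical.

```python
import numpy as np

def parametric_weights(chi, theta):
    """Weights w_i = 1/N + (1/N) * theta' x_tilde_i for one period.

    chi   : (N, k_chi) array of raw characteristics, one row per asset
    theta : (k_chi,) parameter vector
    The characteristics are standardized across the N assets (assumption).
    """
    N = chi.shape[0]
    x_tilde = (chi - chi.mean(axis=0)) / chi.std(axis=0, ddof=1)
    w_bar = np.full(N, 1.0 / N)          # benchmark weights: 1/N
    return w_bar + x_tilde @ theta / N

# Simulated characteristics (illustrative only).
rng = np.random.default_rng(1)
chi = rng.normal(size=(100, 3))
theta = np.array([-1.5, 3.0, 2.0])

w = parametric_weights(chi, theta)
w_benchmark = parametric_weights(chi, np.zeros(3))
```

Under this standardization the tilts away from 1/N sum to zero across assets, so the weights still add up to one, and setting `theta` to zero recovers the 1/N portfolio.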
The vector
\(\tilde{{\textbf{x}}}_{\text{it}}\) entering the parametric portfolio policy does not need to be identical to the observed characteristics driving expected returns. To simplify notation and to provide a fair comparison between parametric strategies and some (approximately) optimal strategies obtained later, in our empirical analysis we set
\({{\textbf{x}}}_{\text{it}} = {\varvec{\chi }}_{\text{it}}\) (hence also
\(k={k}_{\chi }\)), where the standardized characteristics
\(\tilde{{\textbf{x}}}_{\text{it}}\) are assumed to be stationary. Finally, in order to compare the optimal strategies derived from our model with the affine parametric policy (
4) we define:
By using Definition
1, an ACBOV strategy is also ACB, while ACBOV requires
\(w_{\text{it}}\) or
\(\phi _{\text{it}}\) to depend only on (a subvector of)
\({\textbf{y}}_{\text{it}}\) contained in
\({\textbf{y}}_{t}\). ACB or ACBOV strategies need not be parametric in a very narrow sense, since even if the investment weights are affine in
\({\textbf{y}}_t\) or
\(\tilde{{\textbf{x}}}_{\text{it}}\), a matrix like
\(\mathbf {\Pi }_{\text{it}}\) need not follow from solving an optimization problem in some parameter vector
\({\varvec{\theta }}\). However, if
\(\mathbf {\Pi }_{\text{it}} {{\textbf{y}}}_{t}\) is equal to
\(\frac{1}{N} {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} \tilde{{\textbf{x}}}_{\text{it}}\), then the optimal strategy can be
implemented by using a parametric policy. In more detail,
\(\mathbf {\Pi }_{\text{it}} {{\textbf{y}}}_{t} = \frac{1}{N} {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} \tilde{{\textbf{x}}}_{\text{it}}\) requires a term of the form
\({\textbf{B}} {\varvec{\chi }}_{\text{it}}\), where
\({\textbf{B}}\) is a submatrix of
\(\mathbf {\Pi }_{\text{it}}\) (equal for all
i and
t). Then, by using the population mean and the population standard deviation of
\({\varvec{\chi }}_{\text{it}}\),
\({\textbf{B}} {\varvec{\chi }}_{\text{it}}\) can be expressed by means of
\(\frac{1}{N} {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} \tilde{{\textbf{x}}}_{\text{it}}\) plus a constant term.
The question whether an optimal strategy can be implemented by the reduction strategy (
4) is discussed in Sections
4 and
5. For an investor, the question arises whether an optimal strategy after performing parameter estimation is really better than a simple reduction strategy or the “1/
N-rule.” This question will be investigated in our empirical Section
7.
Next we develop asset demand for the cases of constant absolute risk aversion (Section
4) and then for constant relative risk aversion (Section
5).
4 Parametric portfolio policies with constant absolute risk aversion
In this section, we explore constant absolute risk aversion by applying a Bernoulli utility function \(u(x) = -\exp (-\rho x)\), \(x \in \mathbb {R}\), where the parameter \(\rho >0\) expresses constant absolute risk aversion, defined by \(-\frac{u''(x)}{u'(x)} = \rho\). The domain \(\mathbb {D}\) of this function is the real axis.
For the CARA case, it is easier to work with the amounts invested \({{\varvec{\phi }}}_{t}\). The weights of investments into the N risky assets follow from \({{\varvec{\phi }}}_{t}\) and \(e_t\), that is, \({\textbf{w}}_t = \frac{1}{e_t} {{\varvec{\phi }}}_{t}\).
The portfolio vector of risky investments is \({\varvec{\phi }}_{t} =\left( \phi _{1t},\dots ,\phi _{Nt} \right) ^{{{\,\mathrm{\top }\,}}} \in \mathbb {R}^N\), where \(\phi _{\text{it}}\) is the money amount invested into risky asset i at period t. The amount invested in the risk-free asset is \(\phi _{ft} = e_t - {\varvec{\phi }}_t^{{{\,\mathrm{\top }\,}}} {\textbf{1}}_{N}\) if a risk-free asset is traded, and \(\phi _{ft} =0\), \(\forall t\), otherwise. Hence, the value of the portfolio in period \(t+1\) is a random variable given by \(E_{t+1} = e_{t} \left( w_{ft} R_{ft+1} + \sum _{i=1}^{N} w_{\text{it}} R_{it+1} \right) = \phi _{ft} R_{ft+1} + \sum _{i=1}^{N} \phi _{\text{it}} R_{it+1} = \sum _{i=1}^{N} \phi _{\text{it}} R_{it+1} + \left( e_{t} - \sum _{i=1}^{N} \phi _{\text{it}} \right) R_{ft+1} = {\varvec{\phi }}_{t}^{{{\,\mathrm{\top }\,}}} {\textbf{R}}_{t+1} + \left( e_{t} - {\varvec{\phi }}_{t}^{{{\,\mathrm{\top }\,}}} {\textbf{1}}_{N} \right) R_{ft+1}\), where \({\textbf{R}}_{ t+1}\) denotes the vector of risky returns and \({\varvec{\phi }}_{t} \in \Theta = \mathbb {R}^N\). In this section, we impose:
We first analyze optimal investment strategies and then apply a shrinkage procedure.
4.1 Optimal strategy
Using the assumption of normally distributed innovations in the absence of transactions costs, we can derive conditional expected utility
$$\begin{aligned} \mathbb {E}_t (-\exp (-\rho E_{t+1}))= & -\exp \left[ -\rho e_t - \rho {\varvec{\phi }}_{t}^{{{\,\mathrm{\top }\,}}} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} \right) - {\textbf{1}}_N R_{ft+1} \right) + \frac{\rho ^2}{2} {\varvec{\phi }}_{t}^{{{\,\mathrm{\top }\,}}} \mathbb {V}_{t} ({\textbf{R}}_{t+1} ) {\varvec{\phi }}_{t} \right] \ . \end{aligned}$$
(6)
Maximizing (
6) yields the vector of optimal amounts invested into the risky assets
$$\begin{aligned} {\varvec{\phi }}_t^* \left( {\textbf{x}}_t \right)= & \underbrace{ \frac{1}{\rho } \left( \mathbb {V}_{t} \left( {\textbf{R}}_{t+1} \right) \right) ^{-1} }_{{\textbf{B}}_t} \underbrace{ \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} \right) - R_{ft+1} {\textbf{1}}_N \right) }_{{\textbf{b}}_t} \ . \end{aligned}$$
(7)
The remaining wealth
\(\phi _{ft} = e_t - {\textbf{1}}_N^{{{\,\mathrm{\top }\,}}} {\varvec{\phi }}_t^* \in \mathbb {R}\) is invested into the risk-free asset. If no risk-free asset is available, we can establish the following result:
$$\begin{aligned} {\varvec{\phi }}_t^+ \left( {\textbf{x}}_t \right)= & \frac{1}{\rho } \mathbb {V}_{t} \left( {\textbf{R}}_{t+1} \right) ^{-1} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} \right) - \frac{ \rho \left( \frac{1}{\rho } {\textbf{1}}_N^{{{\,\mathrm{\top }\,}}} \mathbb {V}_{t} \left( {\textbf{R}}_{t+1} \right) ^{-1} \mathbb {E}_t \left( {\textbf{R}}_{t+1} \right) - e_t \right) }{{\textbf{1}}_N^{{{\,\mathrm{\top }\,}}} \mathbb {V}_{t} \left( {\textbf{R}}_{t+1} \right) ^{-1} {\textbf{1}}_{N} } {\textbf{1}}_N \right) \ . \end{aligned}$$
(8)
From (
7) and (
8) we conclude:
Equations (
1),(
7) and the assumption of an unrestricted covariance matrix
\(\mathbf {\Sigma }\) show that the optimal investments into a risky asset
i depend on
\({\textbf{y}}_{1t}, \dots , {\textbf{y}}_{Nt}\). We get dependence on
\({\textbf{y}}_{\text{it}}\) only in the case of a diagonal covariance matrix. Hence, our assumption on the covariance matrix implies that in general the optimal investment strategy cannot be supported by the reduction strategy (
4).
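The closed forms for the CARA investor can be checked numerically. The sketch below is an illustration under assumptions: it takes the conditional mean `mu` and covariance `V` of the risky gross returns as given inputs, uses the mean-variance first-order conditions implied by (6), and all numbers and function names are hypothetical. It also illustrates the dependence structure discussed above by perturbing the expected return of one asset.

```python
import numpy as np

def cara_with_riskfree(mu, V, rho, R_f):
    """Amounts solving rho * V * phi = mu - R_f * 1 (risk-free asset traded)."""
    return np.linalg.solve(V, mu - R_f) / rho

def cara_without_riskfree(mu, V, rho, e_t):
    """Amounts when no risk-free asset is traded, so that 1' phi = e_t."""
    ones = np.ones_like(mu)
    Vinv_mu = np.linalg.solve(V, mu)
    Vinv_1 = np.linalg.solve(V, ones)
    # Multiplier of the budget constraint, cf. the fraction in (8).
    eta = rho * (ones @ Vinv_mu / rho - e_t) / (ones @ Vinv_1)
    return np.linalg.solve(V, mu - eta * ones) / rho

# Illustrative inputs (not estimates from the article).
mu = np.array([1.08, 1.05, 1.06])
V = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.02, 0.005],
              [0.00, 0.005, 0.03]])
rho, R_f, e_t = 2.0, 1.02, 1.0

phi_star = cara_with_riskfree(mu, V, rho, R_f)
phi_plus = cara_without_riskfree(mu, V, rho, e_t)

# Raise the expected return of asset 2 only and track the position in asset 1.
mu_shift = mu + np.array([0.0, 0.01, 0.0])
V_diag = np.diag(np.diag(V))
d_full = cara_with_riskfree(mu_shift, V, rho, R_f) - phi_star
d_diag = (cara_with_riskfree(mu_shift, V_diag, rho, R_f)
          - cara_with_riskfree(mu, V_diag, rho, R_f))
```

With the unrestricted covariance matrix the position in asset 1 reacts to news about asset 2 (`d_full[0]` is nonzero), whereas with a diagonal covariance matrix it does not (`d_diag[0]` is zero).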
This is an important difference from Koijen and Yogo (
2019), where a factor structure for the covariance matrix is assumed. Then, the Woodbury matrix identity and some algebra (see Koijen and Yogo
2019, equation (A.6)) allow one to derive an optimal strategy which is affine in
\({\textbf{y}}_{\text{it}}\). Based on some empirical literature (see, e.g., Barigozzi and Brownlees
2019, who demonstrate that after adjusting for factors some correlation (some network effects) is present), we decided to work with an unrestricted covariance matrix. The result discussed in this paragraph also remains valid for the CARA case with shrinkage as well as for the CRRA approximation provided in Section
5.
4.2 CARA utility and shrinkage
Let us now analyze the general case with or without short selling and apply a shrinkage procedure. By maximizing expected utility (
6), we get:
Figure
1 plots realized returns
\(R_{pt+1}\) for
\({\varvec{\phi }}_t^*\), the parametric policy
\({\varvec{\phi }}_{t}^{\sharp }\) and the 1/
N-strategy
\({\varvec{\phi }}_{t}^{1/N}\) [since
\(e_t=1\),
\({\varvec{\phi }}_{t}^{\sharp } ={\textbf{w}}_t^{\sharp }\) and
\({\varvec{\phi }}_{t}^{1/N} ={\textbf{w}}_t^{1/N}\);
\(w_{ft}^{\sharp } ={w}_{ft}^{1/N} =0\)]. Note that the vertical axes have different scales and the variation in the returns becomes very large with
\({\varvec{\phi }}_t^*\). In our empirical data set, this results in poor out-of-sample performance. To circumvent this problem, we augment the optimization problem by a shrinkage device. In terms of econometrics, we consider a ridge regression problem,
while in terms of finance we add an object close to a quadratic cost term. Although a cost function as described in Section S.1.1 could also be used as a kind of penalty function, we want to exclude path-dependence [as discussed in Supplementary Material
S.5.2] and shrink
\({\varvec{\phi }}_t\) or the weights
\({\textbf{w}}_t\) toward some specific values
\(\breve{{\varvec{\phi }}}_t\) or weights
\(\breve{{\textbf{w}}}_t\), respectively. This is close in spirit to approaches like Black and Litterman (
1992) that anchor an optimization in some pre-specified portfolio. This portfolio could be an 1/
N-portfolio as in our empirical example. But it could also be the market portfolio motivated by CAPM equilibrium considerations or a long-term strategic allocation specific to an (institutional) investor.
Hence, we consider a positive definite \(N \times N\) matrix \({\textbf{C}}_t\) and the punishment term \(-\frac{1}{2} \left( {\varvec{\phi }}_t - {\check{{\varvec{\phi }}}}_t \right) ^{{{\,\mathrm{\top }\,}}} {\textbf{C}}_t \left( {\varvec{\phi }}_t - {\check{{\varvec{\phi }}}}_t \right)\). With \({\check{{\varvec{\phi }}}}_t = \check{{\textbf{w}}}_t = {\textbf{0}}_{N \times 1}\), we shrink to zero, while with \({\check{{\varvec{\phi }}}}_t = e_t \frac{1}{N} {\textbf{1}}_{N \times 1}\) shrink toward the 1/N-portfolio.
By using transformed expected utility (
6), the (possible) short selling constraints
\(\phi _{\text{it}} \ge 0\), and the shrinkage device, we get
\({b}_{0t} := \rho e_t\),
\({\textbf{b}}_{t} := \rho \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} \right) - {\textbf{1}}_N R_{ft+1} \right) ^{{{\,\mathrm{\top }\,}}}\),
\({\textbf{B}}_{t} := \frac{\rho ^2}{2} \mathbb {V}_{t} ( {\textbf{R}}_{t+1} - {\textbf{1}}_N R_{ft+1} ) = \frac{\rho ^2}{2} \mathbb {V}_{t} ( {\textbf{R}}_{t+1} )\), and the Lagrangian
$$\begin{aligned} L({\varvec{\phi }}_t,\lambda _{1t},\dots ,\lambda _{Nt}) = {b}_{0t} + {\textbf{b}}_t {\varvec{\phi }}_t - \frac{1}{2} {\varvec{\phi }}_t^{{{\,\mathrm{\top }\,}}} {\textbf{B}}_t {\varvec{\phi }}_t + \sum _{i=1}^N \lambda _{\text{it}} \phi _{\text{it}} - \frac{1}{2} \left( {\varvec{\phi }}_t - {\check{{\varvec{\phi }}}}_t \right) ^{{{\,\mathrm{\top }\,}}} {\textbf{C}}_t \left( {\varvec{\phi }}_t - {\check{{\varvec{\phi }}}}_t \right) \ . \end{aligned}$$
Let
\({\varvec{\lambda }}_t : = \left( \lambda _{1t}, \dots , \lambda _{Nt} \right) ^{{{\,\mathrm{\top }\,}}}\). Taking first partial derivatives with respect to
\({\varvec{\phi }}_t\) and
\({\varvec{\lambda }}_t\), we get the Kuhn–Tucker conditions
$$\begin{aligned} \frac{\partial L ({\varvec{\phi }}_t,{\varvec{\lambda }}_{t})}{\partial {\varvec{\phi }}_{t}^{{{\,\mathrm{\top }\,}}} }= & {\textbf{b}}_t - {\textbf{B}}_t {\varvec{\phi }}_t + {\varvec{\lambda }}_t - {\textbf{C}}_t \left( {\varvec{\phi }}_t - {\check{{\varvec{\phi }}}}_t \right) = {\textbf{0}}_{N}\ , \end{aligned}$$
(9)
$$\begin{aligned} \frac{\partial L ({\varvec{\phi }}_t,{\varvec{\lambda }}_{t})}{\partial \lambda _{\text{it}} }= &\, \phi _{\text{it}} \ge 0 \ , \ i=1,\dots ,N \ , \text { and the complementary slackness conditions} \nonumber \\ 0= &\, \lambda _{\text{it}} \frac{\partial L ({\varvec{\phi }}_t,{\varvec{\lambda }}_{t})}{\partial \lambda _{\text{it}} } = \lambda _{\text{it}} \phi _{\text{it}} \ , \ i=1,\dots ,N \ . \end{aligned}$$
(10)
\(\phi _{ft} = e_t - \sum _{i=1}^{N} \phi _{\text{it}}\) in the case a risk-free asset is traded. The second-order conditions are satisfied by the quadratic structure of the optimization problem (see, e.g., Simon and Blume
1994, Chapter 19.3). If no short selling constraints are binding or if we consider an optimization problem without short selling constraints, we obtain
\({\varvec{\lambda }}_t= {\textbf{0}}_N\) and
$$\begin{aligned} {\varvec{\phi }}_t^*= & \left( {\textbf{B}}_t + {\textbf{C}}_t \right) ^{-1} \left( {\textbf{b}}_t + {\textbf{C}}_t {\check{{\varvec{\phi }}}}_t \right) \ . \end{aligned}$$
(11)
Let
\({\textbf{C}}_t\) be equal to
\(c_p {\textbf{I}}_N\), then (
11) yields
$$\begin{aligned} {\varvec{\phi }}_t^{\flat } := \frac{1}{\rho } \left( \mathbb {V}_{t} \left( \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) ^{{{\,\mathrm{\top }\,}}} \right) + c_p {\textbf{I}}_N \right) ^{-1} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) + \rho c_p {\check{{\varvec{\phi }}}}_t \right) \ . \end{aligned}$$
(12)
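As an illustration, the closed form (12) can be evaluated numerically. The following is a minimal sketch, not the implementation used for the empirical results: the function names, the symbol `sigma` (standing in for the matrix inside the inverse in (12)), and all input values are hypothetical, and the linear system is solved with a plain Gaussian elimination to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def shrunk_investments(mu_excess, sigma, rho, c_p, target):
    """Evaluate (1/rho) (sigma + c_p I)^{-1} (mu_excess + rho c_p target)."""
    n = len(mu_excess)
    lhs = [[sigma[i][j] + (c_p if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    rhs = [mu_excess[i] + rho * c_p * target[i] for i in range(n)]
    return [v / rho for v in solve(lhs, rhs)]


# Hypothetical inputs for three risky assets (not taken from the data set).
mu_excess = [0.02, 0.01, 0.015]
sigma = [[0.04, 0.01, 0.0],
         [0.01, 0.09, 0.02],
         [0.0, 0.02, 0.0625]]
rho = 2.0
target = [1.0 / 3.0] * 3  # 1/N target with e_t = 1

for c_p in (0.0, 0.2, 1e6):
    print(c_p, shrunk_investments(mu_excess, sigma, rho, c_p, target))
```

With `c_p = 0.0` the sketch returns the unshrunk solution, and for very large `c_p` the result approaches the target, which mirrors the limiting argument for (12).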
Observe that the optimal investments
\({\varvec{\phi }}_t^{\flat } \in \mathbb {R}^N\) do not depend on the wealth level
\(e_t\),
\(\phi _{ft}^{\flat } = e_t - \sum _{i=1}^{N} \phi _{\text{it}}^{\flat }\). For
\(c_p=0\), we arrive at an optimization problem without shrinkage (where
\({\varvec{\phi }}_{t}^{\flat } = {\varvec{\phi }}_{t}^{*}\)), while the larger
\(c_p\) the more we shrink toward
\({\check{{\varvec{\phi }}}}_t = e_t \check{{\textbf{w}}}_t\). To see this, note that for large
\(c_p\), the terms multiplied by
\(c_p\) dominate. Hence, for large
\(c_p\),
\({\varvec{\phi }}_t^{\flat } \approx \frac{1}{\rho } \left( c_p {\textbf{I}}_N \right) ^{-1} \left( \rho c_p {\check{{\varvec{\phi }}}}_t \right) = {\check{{\varvec{\phi }}}}_t\). Summing up, we get
Panel (b) of Figure
1 plots the realized returns when applying
\({\varvec{\phi }}_t^{\flat }\) with
\(c_p=0.2\) and shrinkage to the 1/
N-portfolio; since
\(e_t=1\) we get
\({\check{{\varvec{\phi }}}}_t = \frac{1}{N} e_t {\textbf{1}}_N = \frac{1}{N} {\textbf{1}}_N\). When looking at the scale of the ordinate, we observe that the variation in the returns
\(R_{pt}\) decreases substantially. In the case of binding short selling constraints, the system of inequalities (
9) can be transformed to a linear programming problem. However, we observed that, due to the high number of assets, the optimal weights under short selling constraints can hardly be obtained by applying standard linear programming methods.
Hence, we applied numerical tools to obtain the optimal investments
\(\phi _{\text{it}}^{\flat ,\ge 0}\) described by (
9). Here, the MATLAB function fminsearch is used, where we start the optimization routine from
\(\max \{ 0, \phi _{\text{it}}^{\flat } \}\),
\(i=1,\dots ,N\).
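To make the role of the conditions (9) and (10) concrete, the constrained problem can also be sketched in a few lines of code. Note that this sketch deliberately swaps in a different technique from the one used above: instead of the derivative-free search of fminsearch, it applies a projected Gauss–Seidel iteration that exploits the quadratic structure. The names `solve_nonnegative`, `h` (for \({\textbf{B}}_t + {\textbf{C}}_t\)) and `g` (for \({\textbf{b}}_t + {\textbf{C}}_t {\check{{\varvec{\phi }}}}_t\)) are hypothetical, and the numbers are illustrative only.

```python
def solve_nonnegative(h, g, start=None, sweeps=10_000, tol=1e-12):
    """Maximize g'phi - 0.5 phi' h phi subject to phi >= 0.

    h is assumed symmetric positive definite. Each sweep updates one
    coordinate at a time and projects it onto the non-negative half-line.
    """
    n = len(g)
    phi = [max(0.0, s) for s in start] if start is not None else [0.0] * n
    for _ in range(sweeps):
        change = 0.0
        for i in range(n):
            off = sum(h[i][j] * phi[j] for j in range(n) if j != i)
            new = max(0.0, (g[i] - off) / h[i][i])
            change = max(change, abs(new - phi[i]))
            phi[i] = new
        if change < tol:
            break
    # Multipliers implied by the first-order condition: lambda = h phi - g.
    lam = [sum(h[i][j] * phi[j] for j in range(n)) - g[i] for i in range(n)]
    return phi, lam


# Hypothetical two-asset example; the unconstrained optimum is (1, -1).
h = [[2.0, 1.0], [1.0, 2.0]]
g = [1.0, -1.0]
start = [1.0, 0.0]  # max{0, unconstrained solution}, as in the text

phi, lam = solve_nonnegative(h, g, start)
print(phi, lam)
```

In this example the clipped starting point \((1, 0)\) is not optimal; the iteration moves to \((0.5, 0)\), where the multiplier of the binding constraint is positive and complementary slackness holds.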
5 Constant relative risk aversion
Let us now focus on constant relative risk aversion (CRRA). First, this section demonstrates potential pitfalls arising from parametric portfolio policies (that is, applying (
4)) and a Bernoulli utility function defined in
\(\mathbb {R}_{>0}\). For CRRA, the Bernoulli utility function is
\(v(x) := \frac{x^{1-\gamma }}{1-\gamma }\) for
\(\gamma >0\),
\(\gamma \not =1\) and
\(\ln x\) for
\(\gamma =1\). The domain
\(\mathbb {D}\) of
v(
x) is the positive half-line
\(\mathbb {R}_{>0}\). Given Assumption
1 and the second-order condition [see (
S-4) in the Supplementary Material], expected utility is strictly concave in
\({\varvec{\theta }}\). However, for CRRA preferences in a simple binary model, examples can be constructed where the portfolio returns
\(R_{pt}\) do not remain in the domain
\(\mathbb {D} = \mathbb {R}_{>0}\) or where for a concave utility function, the first derivative always stays positive (or negative), such that only a supremum exists. Hence, no optimal
\({\varvec{\theta }}\in \mathbb {R}^k\) exists in these cases [see equation (S-4) and Gehrig et al. (
2018)]. Therefore, we obtain
In the next steps, we investigate whether Observation
3 is also relevant for real-world data. To do this, let us now apply the parametric portfolio policy approach in its original version of Brandt et al. (
2009) to US stocks that are particularly relevant for institutional investors, namely S&P 500 stocks [see Section
7.1 and Appendix
A]. Our observations cover the time span from 04/1979 to 12/2013, which amounts to
\(T=415\) and
\(N=100\).
Consider for example the strategy defined in (
4). Since CRRA utility is only defined on the domain of positive gross returns, we need to check the underlying data and potentially develop a strategy for dealing with negative gross returns.
In order to analyze whether negative portfolio returns are observed in the underlying empirical data, we pick some \({\varvec{\theta }}\in \mathbb {R}^3\) and check whether \(R_{pt+1}\) becomes negative. Indeed, in all the cases considered we observe negative \(R_{pt+1}\) for large \({\varvec{\theta }}\) (in absolute terms); one large coordinate of \({\varvec{\theta }}\) turned out to be sufficient for negative gross returns.
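The check described above can be sketched as follows. This is a hypothetical illustration, not the code used for the empirical analysis: it assumes weights of the form \(w_{\text{it}} = {\bar{w}}_{\text{it}} + {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} {\textbf{x}}_{\text{it}}\) without any further scaling, characteristics that are standardized so that each column sums to zero across assets, no risk-free position, and made-up returns for a single period.

```python
def portfolio_gross_return(theta, w_bar, x, gross_returns):
    """Gross portfolio return for weights w_i = w_bar_i + theta' x_i."""
    total = 0.0
    for w0, x_i, r in zip(w_bar, x, gross_returns):
        weight = w0 + sum(t * c for t, c in zip(theta, x_i))
        total += weight * r
    return total


# Hypothetical one-period data: four assets, three characteristics.
# Each characteristic column sums to zero, so the weights still sum to one.
w_bar = [0.25, 0.25, 0.25, 0.25]
x = [[1.0, -1.0, 0.5],
     [-1.0, 1.0, -0.5],
     [0.5, 0.5, -1.0],
     [-0.5, -0.5, 1.0]]
gross_returns = [1.10, 0.85, 1.02, 0.97]

for theta in ([0.1, 0.1, 0.1], [-5.0, 0.0, 0.0]):
    r_p = portfolio_gross_return(theta, w_bar, x, gross_returns)
    print(theta, r_p, "outside CRRA domain" if r_p <= 0 else "ok")
```

In this toy example a moderate \({\varvec{\theta }}\) keeps the gross return positive, whereas a single large coordinate is enough to push it below zero, so that CRRA utility is no longer defined.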
As demonstrated and discussed in more detail in Supplementary Material
S.4, we extend the domain to the real line by applying the utility function
$$\begin{aligned} v_{\flat } \left( e_t R_{pt+1}\right):= & {\left\{ \begin{array}{ll} v \left( e_t R_{pt+1} \right) & , \text { for } R_{pt+1} \ge \psi _{R}, \\ \left( v \left( e_t \psi _{R} \right) - v' \left( e_t \psi _{R} \right) e_t \psi _{R} \right) + v' \left( e_t \psi _{R} \right) e_t R_{pt+1} & , \text { for } R_{pt+1} < \psi _{R} , \end{array}\right. } \end{aligned}$$
(13)
where
\(\psi _{R}>0\). With (
13), we apply
\(v \left( e_t R_{pt+1} \right)\) for all
\(R_{pt+1} \ge \psi _{R}\), while for
\(R_{pt+1} < \psi _{R}\) the utility function is continued by the tangent line of v at
\(e_t \psi _{R}\). At
\(R_{pt+1} = \psi _{R}\), we get
\(v_{\flat } \left( e_t \psi _{R} \right) = v \left( e_t \psi _{R} \right) = \left( v \left( e_t \psi _{R} \right) - v' \left( e_t \psi _{R} \right) e_t \psi _{R} \right) + v' \left( e_t \psi _{R} \right) e_t \psi _{R}\). Observe that
\(v_{\flat }' \left( e_t \psi _{R} \right) =v' \left( e_t \psi _{R} \right)\) is equal to the slope, with respect to wealth \(e_t R_{pt+1}\), of the line described by
\(\left( v \left( e_t \psi _{R} \right) - v' \left( e_t \psi _{R} \right) e_t \psi _{R} \right) + v' \left( e_t \psi _{R} \right) e_t R_{pt+1}\).
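The extension of the CRRA utility function to the real line can be sketched in code. The sketch assumes that the linear branch is the tangent line of v at \(e_t \psi _{R}\), which is what continuity and a matching slope at the kink require; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def crra(x, gamma):
    """CRRA Bernoulli utility, defined for x > 0 only."""
    if x <= 0:
        raise ValueError("CRRA utility is only defined for positive wealth")
    return math.log(x) if gamma == 1 else x ** (1 - gamma) / (1 - gamma)


def crra_prime(x, gamma):
    """First derivative of CRRA utility, x^(-gamma)."""
    return x ** (-gamma)


def v_flat(wealth, gamma, e_t, psi_r):
    """CRRA utility above the kink e_t * psi_r, tangent line below it."""
    kink = e_t * psi_r
    if wealth >= kink:
        return crra(wealth, gamma)
    return crra(kink, gamma) + crra_prime(kink, gamma) * (wealth - kink)


# Hypothetical parameters: gamma = 5, e_t = 1, psi_R = 0.5.
print(v_flat(1.0, 5, 1.0, 0.5))   # CRRA branch
print(v_flat(0.5, 5, 1.0, 0.5))   # value at the kink
print(v_flat(-0.2, 5, 1.0, 0.5))  # linear branch, finite for negative wealth
```

Because the lower branch is linear, the extended function remains concave and is finite for any real wealth level, including negative gross returns.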
Using these insights, we consider the optimization problem (
3), where preferences are described by the
approximate CRRA utility function \(v_{\flat }\). We assume that a risk-free asset and
N risky assets are traded; the portfolio weights are
\({\textbf{w}}_{t} = {\varvec{\phi }}_{t}/e_t\) for the investments in the risky securities and
\(w_{ft} =\phi _{ft}/e_t\) for the risk-free asset. Hence,
\(n=N+1\).
By a Taylor series approximation of expected utility at
\(w_{ft}=1\) and
\({\textbf{w}}_{t} = \left( w_{1t},\dots , w_{Nt} \right) ^{{{\,\mathrm{\top }\,}}}= {\textbf{0}}_N\), we obtain
$$\begin{aligned} \mathbb {E}_t \left( v_{\flat } ({E}_{t+1}) \right)& =\,\mathbb {E}_t \left( v_{\flat } ({e}_{t} R_{pt+1} ) \right) = \mathbb {E}_t \left( v_{\flat } ({e}_{t} \left( R_{ft+1} + {\textbf{w}}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \right) ) \right) \nonumber \\ \approx&\underbrace{ v_{\flat } \left( {e}_{t} R_{ft+1} \right) }_{=: \alpha _{0t}} + \underbrace{ e_t v_{\flat }^{\prime } ( {e}_{t} R_{ft+1} ) \mathbb {E}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) }_{=: {\varvec{\alpha }}_t} {\textbf{w}}_t \nonumber \\&- \frac{1}{2} {\textbf{w}}_t^{{{\,\mathrm{\top }\,}}} \underbrace{ \left( - v_{\flat }^{\prime \prime } \left( {e}_{t} R_{ft+1} \right) e_t^2 \mathbb {E}_t \left( \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) ^{{{\,\mathrm{\top }\,}}} \right) \right) }_{=:\mathcal {A}_t} {\textbf{w}}_t \ . \end{aligned}$$
(14)
In the following optimization problem, we also allow for short selling constraints, more specifically
\(w_{\text{it}} \ge 0\) (see also Koijen and Yogo
2019, for a model with log-utility).
In our empirical data (see Sect.
A for more details), the out-of-sample performance of the approximately optimal strategy
\({\textbf{w}}_t = \mathcal {A}_t^{-1} {\varvec{\alpha }}_t\) is very poor and the strategy is very risky. Hence, similar to Sect.
4, we proceed with a shrinkage device. We consider a positive definite
\(N \times N\) matrix
\({\textbf{C}}_t\) and the term
\(-\frac{1}{2} \left( {\textbf{w}}_t - \check{{\textbf{w}}}_t \right) ^{{{\,\mathrm{\top }\,}}} {\textbf{C}}_t \left( {\textbf{w}}_t - \check{{\textbf{w}}}_t \right)\), where with
\(\check{{\textbf{w}}}_t = {\textbf{0}}_{N \times 1}\) we shrink to zero, while with
\(\check{{\textbf{w}}}_t = \frac{1}{N} {\textbf{1}}_{N \times 1}\) we shrink toward the 1/
N portfolio
13.
By using the expected utility approximation (
14), the short selling constraints and the shrinkage device, we get the Lagrangian
$$\begin{aligned} L({\textbf{w}}_t,\lambda _{1t},\dots ,\lambda _{Nt}) = \alpha _{0t} + {\varvec{\alpha }}_t {\textbf{w}}_t - \frac{1}{2} {\textbf{w}}_t^{{{\,\mathrm{\top }\,}}} \mathcal {A}_t {\textbf{w}}_t + \sum _{i=1}^N \lambda _{\text{it}} w_{\text{it}} - \frac{1}{2} \left( {\textbf{w}}_t - \check{{\textbf{w}}}_t \right) ^{{{\,\mathrm{\top }\,}}} {\textbf{C}}_t \left( {\textbf{w}}_t - \check{{\textbf{w}}}_t \right) \ . \end{aligned}$$
Taking first partial derivatives with respect to
\({\textbf{w}}_t\) and
\({\varvec{\lambda }}_t\), we get the Kuhn–Tucker conditions
$$\begin{aligned} \frac{\partial L ({\textbf{w}}_t,{\varvec{\lambda }}_{t})}{\partial {\textbf{w}}_{t}^{{{\,\mathrm{\top }\,}}} }= & {\varvec{\alpha }}_t - \mathcal {A}_t {\textbf{w}}_t + {\varvec{\lambda }}_t - {\textbf{C}}_t \left( {\textbf{w}}_t - \check{{\textbf{w}}}_t \right) = {\textbf{0}}_{N}\ , \end{aligned}$$
(15)
$$\begin{aligned} \frac{\partial L ({\textbf{w}}_t,{\varvec{\lambda }}_{t})}{\partial \lambda _{\text{it}}}= &\, w_{\text{it}} \ge 0 \ , \ i=1,\dots ,N \ , \text { and the complementary slackness conditions} \nonumber \\ 0= &\, \lambda _{\text{it}} \frac{\partial L ({\textbf{w}}_t,{\varvec{\lambda }}_{t})}{\partial \lambda _{\text{it}} } = \lambda _{\text{it}} w_{\text{it}} \ , \ i=1,\dots ,N \ . \end{aligned}$$
(16)
The second-order conditions are satisfied by the quadratic structure of the optimization problem. If no short selling constraints are binding or if we consider an optimization problem without short selling constraints, we obtain
\({\varvec{\lambda }}_t= {\textbf{0}}_N\) and
$$\begin{aligned} {\textbf{w}}_t= &\, \left( \mathcal {A}_t + {\textbf{C}}_t \right) ^{-1} \left( {\varvec{\alpha }}_t + {\textbf{C}}_t \check{{\textbf{w}}}_t \right) \nonumber \\= &\, \left( \mathcal {A}_t + {\textbf{C}}_t \right) ^{-1} \left( {\varvec{\alpha }}_t + \frac{ e_t R_{ft+1} v_{\flat }^{\prime } ( {e}_{t} R_{ft+1} ) }{ e_t R_{ft+1} v_{\flat }^{\prime } ( {e}_{t} R_{ft+1} )} {\textbf{C}}_t \check{{\textbf{w}}}_t \right) \ . \end{aligned}$$
(17)
Let
\(M_{APR} ({e}_{t} R_{ft+1})\) denote the relative Arrow–Pratt measure evaluated at
\({e}_{t} R_{ft+1}\). Since
\(R_{ft+1} \ge 1\) is usually close to one, we get
\(\frac{R_{ft+1} }{ M_{APR} ({e}_{t} R_{ft+1}) } \approx \frac{1}{M_{APR} ({e}_{t} R_{ft+1}) }\). In realistic scenarios,
\(\Psi _R\) can be chosen such that
\(e_{t} R_{ft+1}> \Psi _R>0\). In this case, we Taylor expand at the classical CRRA branch of the Bernoulli utility function
\(v_{\flat }\), that is,
\(\frac{x^{1-\gamma }}{1-\gamma }\).
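The step from the relative Arrow–Pratt measure to the factor \(1/\gamma\) can be spelled out as follows. This is a short worked derivation under the assumption that the expansion point lies on the CRRA branch; the shorthand \(x := e_t R_{ft+1}\) is introduced here only for illustration. The derivative formulas also cover the logarithmic case \(\gamma = 1\).

$$\begin{aligned} v_{\flat }^{\prime }(x)= &\, x^{-\gamma } \ , \quad v_{\flat }^{\prime \prime }(x) = -\gamma x^{-\gamma -1} \ , \quad M_{APR}(x) = - \frac{x \, v_{\flat }^{\prime \prime }(x)}{v_{\flat }^{\prime }(x)} = \gamma \ , \nonumber \\ \frac{e_t v_{\flat }^{\prime }(x)}{-v_{\flat }^{\prime \prime }(x) e_t^2}= &\, R_{ft+1} \frac{v_{\flat }^{\prime }(x)}{-x \, v_{\flat }^{\prime \prime }(x)} = \frac{R_{ft+1}}{M_{APR}(x)} = \frac{R_{ft+1}}{\gamma } \approx \frac{1}{\gamma } \ . \end{aligned}$$

The left-hand side of the second line is the scalar that remains in \(\mathcal {A}_t^{-1} {\varvec{\alpha }}_t\) once the second-moment matrix of excess returns is factored out, so the approximation only replaces \(R_{ft+1}\) by one.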
In the following, let
\({\textbf{C}}_t\) be a diagonal matrix such that
\({\textbf{C}}_t = \left( -v_{\flat }^{\prime \prime } \left( {e}_{t} R_{ft+1} \right) e_t^2 \right) c_p {\textbf{I}}_N\), where
\({\textbf{I}}_N\) denotes the
N-dimensional identity matrix and
\(c_p \ge 0\). Recall that
\(v_{\flat }^{\prime \prime } \le 0\) and
\(v_{\flat }^{\prime \prime } (x)<0\) for
\(x > \Psi _R\). Then, the approximation
\(\frac{R_{ft+1} }{ M_{APR} ({e}_{t} R_{ft+1}) } \approx \frac{1}{M_{APR} ({e}_{t} R_{ft+1}) }\) and (
17) result in
$$\begin{aligned}&\frac{1}{\gamma } \left( \mathbb {E}_t \left( \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) ^{{{\,\mathrm{\top }\,}}} \right) + c_p {\textbf{I}}_N \right) ^{-1} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) + \frac{ 1}{ e_t R_{ft+1} v_{\flat }^{\prime } ( {e}_{t} R_{ft+1} )} {\textbf{C}}_t \check{{\textbf{w}}}_t \right) \nonumber \\ =&\frac{1}{\gamma } \left( \mathbb {E}_t \left( \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) ^{{{\,\mathrm{\top }\,}}} \right) + c_p {\textbf{I}}_N \right) ^{-1} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) + \frac{\left( -v_{\flat }^{\prime \prime } \left( {e}_{t} R_{ft+1} \right) e_t^2 \right) }{ e_t R_{ft+1} v_{\flat }^{\prime } ( {e}_{t} R_{ft+1} ) } c_p {\textbf{I}}_N \check{{\textbf{w}}}_t \right) \nonumber \\ \approx&\frac{1}{\gamma } \left( \underbrace{ \mathbb {E}_t \left( \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) ^{{{\,\mathrm{\top }\,}}} \right) + c_p {\textbf{I}}_N }_{ \mathcal {B}_t } \right) ^{-1} \left( \mathbb {E}_t \left( {\textbf{R}}_{t+1} - R_{ft+1} {\textbf{1}}_N \right) + \gamma c_p \check{{\textbf{w}}}_t \right) =: {\textbf{w}}_t^{\flat } \ . \end{aligned}$$
(18)
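To gauge how much the approximation in (18) matters, one can compare it numerically with the first line of (17). The following is a minimal sketch under stated assumptions: the expansion point \(e_t R_{ft+1}\) lies on the CRRA branch, \({\textbf{C}}_t\) is the scaled identity matrix defined above, and the names `exact_weights`, `approx_weights`, `second_moment` as well as all input values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def exact_weights(mu, second_moment, gamma, c_p, target, e_t, r_f):
    """First line of (17) with CRRA derivatives at x = e_t * r_f."""
    n = len(mu)
    x = e_t * r_f
    v1 = x ** (-gamma)                # v'(x)
    v2 = -gamma * x ** (-gamma - 1)   # v''(x)
    k = -v2 * e_t ** 2                # common scale of A_t and C_t
    lhs = [[k * (second_moment[i][j] + (c_p if i == j else 0.0))
            for j in range(n)] for i in range(n)]
    rhs = [e_t * v1 * mu[i] + k * c_p * target[i] for i in range(n)]
    return solve(lhs, rhs)


def approx_weights(mu, second_moment, gamma, c_p, target):
    """Right-hand side of (18)."""
    n = len(mu)
    lhs = [[second_moment[i][j] + (c_p if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    rhs = [mu[i] + gamma * c_p * target[i] for i in range(n)]
    return [v / gamma for v in solve(lhs, rhs)]


# Hypothetical two-asset inputs.
mu = [0.02, 0.01]
second_moment = [[0.04, 0.01], [0.01, 0.09]]
gamma, c_p, target, e_t = 5.0, 0.2, [0.5, 0.5], 1.0

for r_f in (1.0, 1.01):
    print(r_f,
          exact_weights(mu, second_moment, gamma, c_p, target, e_t, r_f),
          approx_weights(mu, second_moment, gamma, c_p, target))
```

For a gross risk-free return of exactly one the two expressions coincide, and for values close to one the difference in this toy example stays small.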
Hence, we get
Note that a diagonal \(\mathcal {B}_t\) and the equality \(w_{\text{it}}^{\flat } = {\bar{w}}_{\text{it}} + {\varvec{\theta }}^{{{\,\mathrm{\top }\,}}} {\textbf{x}}_{\text{it}}\) are still strong requirements. Having derived demand functions under different preference specifications, we will next analyze the implications for equilibrium asset pricing.
8 Conclusions
Theory-guided reduction techniques prove particularly helpful for machine learning applications as forcefully argued by Nagel (
2021).
20 Such techniques are extremely valuable under conditions that are challenging for the determination of optimal portfolios, either because utility frontiers are rather flat or because they exhibit multiple optima. Not only in such cases does our new shrinkage device render portfolio strategies less risky and improve performance when applied to empirical data. In contrast to the bulk of methods proposed in the literature, we do not shrink (some of) the model parameters but shrink the portfolio weights toward some a-priori specified strategy.
We test our approach in simulation exercises on data for the S&P 500, the US market for large caps.
21 For the scenario when parameters actually can be estimated directly, we observe that the simple optimal shrinkage strategy proposed in this article outperforms the parametric portfolio approach of Brandt et al. (
2009), and the 1/
N-strategy, for most levels of absolute and relative risk aversion. Only for CRRA preferences with very low levels of risk aversion are the other strategies superior. For higher degrees of risk aversion, the performances of the strategies considered are quite similar.
The demand systems approach to asset pricing introduced by Koijen and Yogo (
2019) lends itself to numerous applications, such as the intermediary asset pricing theory of He and Krishnamurthy (
2013) or asset pricing with frictions more generally. In this article, our shrinkage approach also augments the demand systems approach to CARA and CRRA expected utility. We consider the cases with and without short selling constraints and show the existence of equilibrium.
Another aspect of the demand system approach is its relation to the BSV characteristics-based parametric portfolio approach (see Brandt et al.
2009) that has received a lot of interest from empirical researchers because it provides an attractive reduction technique to an otherwise complex optimization problem. From the results obtained in this article, we observe that parametric portfolio strategies can be optimal only under rather strong assumptions.
While our work, as a first step, has focused on a quasi-static analysis and evaluation, a promising next step for future research is a dynamic implementation of optimal shrinkage strategies, allowing for time dependence and tuning of the shrinkage parameter. Alternatively, different shrinkage portfolios can be investigated. For example, does it make sense to shrink toward the risk-free asset in risky periods? In the current implementation, the model parameters are estimated in the training sample and not adapted in the evaluation sample. It is tempting to experiment with rolling windows or more sophisticated dynamic models in order to improve out-of-sample performance.