3.1 Model by Vikram & Sinha (VS)
The market in the VS model is populated by
N traders. At each time step
t, a trader
i either buys (
\(S_i(t) = 1\)), sells (
\(S_i(t) = -1\)) or stays inactive (
\(S_i(t) = 0\)). The normalized net demand from all traders is then given as
\(M_t = \frac{1}{N} \sum _{i=1}^N S_i(t)\), and the price adjusts as
\(p_{t+1} = \frac{1 + M_t}{1 - M_t} p_t\). An agent’s decision to buy, sell, or stay out depends on the perceived mispricing between the current price
\(p_t\) and its running average
\(p^*_t = \langle p_t \rangle _{\tau }\), which is considered a proxy for the fundamental price of the asset. The probability that an agent trades is then given by
$$\begin{aligned} \mathbb {P}(|S_i(t)| = 1) = \exp \left( - \mu \left| \frac{p_t - p^*_t}{p^*_t} \right| \right) \end{aligned}$$
and a trading agent buys
\(S_i(t) = 1\) or sells
\(S_i(t) = -1\) at random with equal probability.
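To make the microscopic dynamics concrete, the following is a minimal Python sketch of the original agent-based VS mechanism (the paper's own implementation is the adapted Stan model in the supplement); the function name and parameter values are illustrative assumptions:

```python
# Illustrative sketch (not the authors' code) of the original VS dynamics:
# N discrete traders whose trading probability falls with perceived
# mispricing, and a multiplicative price impact of the net demand.
import numpy as np

def simulate_vs_agents(N=1000, T=2000, mu=100.0, tau=0.99, p0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    p, p_star = p0, p0
    prices = np.empty(T)
    for t in range(T):
        # P(|S_i(t)| = 1) = exp(-mu |(p - p*) / p*|)
        prob_trade = np.exp(-mu * abs((p - p_star) / p_star))
        active = rng.random(N) < prob_trade
        # active agents buy (+1) or sell (-1) with equal probability
        S = np.where(active, rng.choice([-1, 1], size=N), 0)
        M = S.mean()                            # normalized net demand
        p = (1 + M) / (1 - M) * p               # price impact
        p_star = (1 - tau) * p + tau * p_star   # running price average
        prices[t] = p
    return prices

prices = simulate_vs_agents()
returns = np.diff(np.log(prices))
```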
In order to obtain a statistical model of volatility, in particular with a continuous latent state as required for HMC sampling, we have adapted the model as follows:
-
For a large number of agents
\(N \rightarrow \infty \), the net demand
\(M_t\) converges to a Gaussian distribution with mean zero (as
\(\mathbb {E}[S_i(t)] = 0\)) and variance
\(\frac{\mathbb {P}(|S_i(t)| = 1)}{N}\).
-
Here, we have used that the agents’ trading decisions
\(S_i(t)\) are independent and
$$\begin{aligned} \mathbb {E}[S_i(t)]&= \frac{1}{2} \mathbb {P}(|S_i(t)| = 1) \cdot 1 + \frac{1}{2} \mathbb {P}(|S_i(t)| = 1) \cdot (- 1) \\&\quad + \left( 1 - \mathbb {P}(|S_i(t)| = 1) \right) \cdot 0 = 0 \\ \mathrm {Var}[S_i(t)]&= \mathbb {E}[S_i(t)^2] \\&= \frac{1}{2} \mathbb {P}(|S_i(t)| = 1) \cdot 1^2 + \frac{1}{2} \mathbb {P}(|S_i(t)| = 1) \cdot (- 1)^2 \\&\quad + \left( 1 - \mathbb {P}(|S_i(t)| = 1) \right) \cdot 0^2 \\&= \mathbb {P}(|S_i(t)| = 1). \end{aligned}$$
-
Next, considering the number of agents as unknown, we introduce a scaling parameter \(\sigma _{\max }^2\) for the variance and model the demand as \(M_t \sim \mathcal {N}(0, \sigma _{\max }^2 \mathbb {P}(|S_i(t)| = 1))\).
-
Finally, we approximate the log-return by linearizing the price impact
$$\begin{aligned} r_{t+1}&= \log \frac{p_{t+1}}{p_t} \\&= \log \frac{1 + M_t}{1 - M_t} \\&\approx 2 M_t \end{aligned}$$
where we have used that
\(\log (1 + x) \approx x\) for
\(|x| \ll 1\).
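The quality of this linearization can be verified numerically; the demand values below are illustrative:

```python
# Numeric check of log((1 + M)/(1 - M)) ≈ 2M for small net demand M.
# The relative error grows like M^2 / 3 (from the series 2(M + M^3/3 + ...)).
import numpy as np

M = np.array([0.001, 0.01, 0.05])
exact = np.log((1 + M) / (1 - M))
approx = 2 * M
rel_err = np.abs(exact - approx) / exact
```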
Overall, we arrive at the following model dynamics
$$\begin{aligned} \langle p_{t} \rangle _{\tau }&= (1 - \tau ) p_{t} + \tau \langle p_{t-1} \rangle _{\tau } \nonumber \\ \mathbb {P}(|S(t)| = 1)&= e^{- \mu \left| \log \frac{p_t}{\langle p_t \rangle _{\tau }} \right| } \nonumber \\ r_{t+1}&\sim \mathcal {N}\left( 0, \sigma _{\max }^2 \cdot 4 \mathbb {P}(|S(t)| = 1)\right) . \end{aligned}$$
(2)
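These dynamics can be sketched directly as a generative simulation; the parameter values below are illustrative assumptions, not estimates from the paper:

```python
# Sketch of the adapted VS dynamics of Eq. (2): the trading probability
# sets a time-varying volatility for Gaussian log returns.
import numpy as np

def simulate_vs_adapted(T=2000, mu=100.0, tau=0.999, sigma_max=0.02,
                        p0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    p, p_avg = p0, p0
    r = np.empty(T)
    for t in range(T):
        p_avg = (1 - tau) * p + tau * p_avg          # running price average
        prob = np.exp(-mu * abs(np.log(p / p_avg)))  # P(|S(t)| = 1)
        r[t] = rng.normal(0.0, sigma_max * np.sqrt(4 * prob))
        p *= np.exp(r[t])                            # update the price
    return r

r = simulate_vs_adapted()
```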
Note that this is a state-space model with a continuous latent state driving the time-varying volatility
\(\sigma _{t+1} = \sqrt{\sigma _{\max }^2 \cdot 4 \mathbb {P}(|S(t)| = 1)}\). Indeed, the famous GARCH(1, 1) (generalized auto-regressive conditional heteroscedastic) model (Bollerslev
1986) is of a similar form
$$\begin{aligned} \sigma _{t+1}^2&= \alpha _0 + \alpha _1 r_t^2 + \beta _1 \sigma _t^2 \nonumber \\ r_{t+1}&\sim \mathcal {N}(\mu , \sigma _{t+1}^2) \end{aligned}$$
(3)
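For comparison, Eq. (3) admits an equally short simulator; the parameter values are illustrative, chosen only to keep the process stationary (\(\alpha _1 + \beta _1 < 1\)):

```python
# Minimal GARCH(1, 1) simulator following Eq. (3), started at the
# stationary variance alpha0 / (1 - alpha1 - beta1).
import numpy as np

def simulate_garch(T=2000, alpha0=1e-6, alpha1=0.1, beta1=0.85,
                   mu=0.0, seed=0):
    rng = np.random.default_rng(seed)
    r = np.empty(T)
    var = alpha0 / (1 - alpha1 - beta1)    # stationary variance
    for t in range(T):
        r[t] = rng.normal(mu, np.sqrt(var))
        var = alpha0 + alpha1 * r[t] ** 2 + beta1 * var
    return r

r = simulate_garch()
```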
The main difference between the VS model (in our formulation) and the GARCH model is that the volatility is a function of past prices in the former and of past returns in the latter. Furthermore, being founded on an agent-based model, all parameters of the VS model are readily interpretable, namely the sensitivity
\(\mu \) of the agents to mispricing and the weighting
\(\tau \) of the running price average. In contrast, parameters in the GARCH model are motivated purely from statistical grounds and cannot easily be related to agent behaviors.
From a Bayesian perspective, Eqs. (
2) and (
3) correspond to the likelihood
\(p(\varvec{x}| \varvec{\theta })\), i.e., the conditional probability of the observed data given the model parameters. To complete the model density
\(p(\varvec{x}, \varvec{\theta })\), we need to specify a prior distribution on the parameters. The choice of a prior distribution is often considered subjective (whereas the likelihood has an aura of objectivity). Arguably, from the perspective of modeling the observed data, this distinction is of limited relevance. Instead, note that fixing the prior implicitly fixes a distribution on the data space, obtained as
\(p(\varvec{x}) = \int p(\varvec{x}, \varvec{\theta }) \hbox {d}\varvec{\theta }\) by marginalizing over the parameters. A model can be considered as misspecified when it assigns very low probability to the actual observed data. In contrast, a good model should be able to generate similar data with reasonable probability. This viewpoint is in line with Gelman et al. (
2017) who argue that the prior can only be understood in the context of the likelihood. Indeed, prior and likelihood act together in shaping the model and expressing our expectations of plausible data.
Here, we propose the use of (weakly) informative priors which take into account our knowledge about the role played by the parameters when generating data from the likelihood model. As an example, consider the parameter
\(\tau \in (0, 1)\) of Eq. (
2). While it might be natural to simply assign a uniform prior,
\(\tau \) controls the time constant of the running price average, i.e.,
$$\begin{aligned} \langle p_{t} \rangle _{\tau }&= (1 - \tau ) p_{t} + \tau \langle p_{t-1} \rangle _{\tau } \\&= (1 - \tau ) p_{t} + \tau \left( (1 - \tau ) p_{t-1} + \tau \langle p_{t-2} \rangle _{\tau } \right) \\&= (1 - \tau ) \sum _{k = 0}^{\infty } \tau ^k p_{t-k} \end{aligned}$$
as
\(\tau ^k \rightarrow 0\) for
\(k \rightarrow \infty \). Comparing this to an exponentially weighted average in continuous time, i.e.,
$$\begin{aligned} \langle p_{t} \rangle _{\rho } = \int _0^\infty \rho e^{- \rho s} p_{t - s} \hbox {d}s \end{aligned}$$
with time constant
\(\rho ^{-1}\) and exponential weighting kernel
\(k(s) = \rho e^{- \rho s}\) of unit weight, i.e.,
\(\int _0^\infty k(s) \hbox {d}s = 1\), we match
\(\tau ^k\) with
\(e^{- \rho k}\), thus interpreting
\(- \frac{1}{\log \tau }\) as the time constant
l of the running average. A
\(\hbox {Uniform}(0, 1)\) prior on
\(\tau \) then corresponds to an
\(\hbox {Inv-Gamma}(1, 1)\) on
\(l = - \frac{1}{\log \tau }\) putting considerable probability mass on very short time constants below 1 day. In order to obtain a more reasonable distribution, we parameterize the model directly in terms of the time constant
l and give it an
\(\hbox {Inv-Gamma}(2, 1000)\) prior with a mean of 1000 and infinite variance. Similarly,
\(\mu \) has been given a
\(\hbox {Gamma}(3, 0.03)\) prior which assigns more than
\(95\%\) of its probability mass to the interval [20, 250]. Together, these priors inform the VS model to stay away from the boundary
\(\mu \rightarrow 0\) or
\(\tau \rightarrow 0\) where it becomes trivial, i.e.,
\(\mathbb {P}(|S_t| = 1) \equiv 1\) and thus
\(\sigma _t \equiv \sigma _{\max }\). Accordingly, it cannot be expected to generate data exhibiting pronounced volatility clustering in this case, and indeed, in the original reference Vikram and Sinha (
2011), prices were averaged over
\(10^4\) time steps, which are well covered by the chosen prior. Implementing both models is straightforward in
Stan, and the full code is provided as supplementary material.
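The claimed correspondence between a uniform prior on \(\tau \) and an \(\hbox {Inv-Gamma}(1, 1)\) prior on the time constant \(l\) can be checked by Monte Carlo; the sample size below is arbitrary:

```python
# Monte Carlo check that tau ~ Uniform(0, 1) induces l = -1/log(tau)
# ~ Inv-Gamma(1, 1), whose CDF is P(l <= x) = exp(-1/x):
# P(l <= x) = P(tau <= exp(-1/x)) = exp(-1/x).
import numpy as np

rng = np.random.default_rng(0)
tau = rng.random(200_000)
l = -1.0 / np.log(tau)

for x in [0.5, 1.0, 2.0, 5.0]:
    empirical = np.mean(l <= x)
    analytic = np.exp(-1.0 / x)
    assert abs(empirical - analytic) < 0.01
# e.g., P(l < 1) = e^{-1} ≈ 0.37: considerable mass on time
# constants below 1 day, motivating the Inv-Gamma(2, 1000) choice.
```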
3.2 Model by Franke & Westerhoff (FW)
Franke & Westerhoff have developed a series of models and have estimated them by moment matching (Franke and Westerhoff
2012,
2011). Here, we follow their presentation and introduce the DCA–HPM model in their terminology.
In the FW model, the market is populated with two types of agents, namely fundamental and chartist traders. The fraction of fundamental traders at time step
t is denoted by
\(n_t^f \in [0, 1]\). The corresponding fraction of chartist traders is then given by
\(n_t^c = 1 - n_t^f\). The log price, denoted by
\(p_t\), adjusts to the average demand from fundamental
\(d^f\) and chartist
\(d^c\) traders as
$$\begin{aligned} p_t = p_{t-1} + \mu \left( n_{t-1}^f d_{t-1}^f + n_{t-1}^c d_{t-1}^c\right) . \end{aligned}$$
(4)
The demand is composed of a deterministic and stochastic component. It is assumed that fundamental traders react to mispricing, i.e., the difference between
\(p_t\) and the (known) fundamental price
\(p^*\), whereas chartist traders react to past price movement, i.e.,
\(p_t - p_{t-1}\). According to Franke and Westerhoff (
2012), the demand dynamics is modeled as
$$\begin{aligned} d_t^f&= \phi (p^* - p_t) + \epsilon ^f_t\quad \,\, \epsilon ^f_t \sim \mathcal {N}\left( 0, \sigma _f^2\right) \\ d_t^c&= \xi (p_t - p_{t-1}) + \epsilon _t^c \quad \epsilon _t^c \sim \mathcal {N}\left( 0, \sigma _c^2\right) \end{aligned}$$
with parameters
\(\phi , \xi > 0\) specifying the sensitivity to price differences for the fundamental and chartist traders. Note that these demands are unobserved as only their weighted sum affects the price. While such dynamics could be modeled by means of a stochastic latent state, in the present case it is possible to marginalize out the demand. As the sum of two normally distributed random variables is again normal, the combined demand gives rise to a stochastic model for the log return
\(r_t = p_t - p_{t-1}\):
$$\begin{aligned} r_t \sim \mathcal {N}\left( \mu \left( n_{t-1}^f \phi (p^* - p_{t-1}) + n_{t-1}^c \xi (p_{t-1} - p_{t-2})\right) , \; \mu ^2 \left( \left( n_{t-1}^f\right) ^2 \sigma _f^2 + \left( n_{t-1}^c\right) ^2 \sigma _c^2\right) \right) . \end{aligned}$$
(5)
The volatility
\(\sigma _t = \mu \sqrt{(n_{t-1}^f)^2 \sigma _f^2 + (n_{t-1}^c)^2 \sigma _c^2}\) now depends on the fraction of chartist versus fundamental traders and changes over time. Franke and Westerhoff (
2012) call this structured stochastic volatility, in analogy with structural models in economics, as the parameters of the agent-based model are grounded in behavioral terms and therefore economically meaningful.
The model is then completed by an update equation for the fraction of traders in each group. Here, we consider the DCA–HPM specification of Franke and Westerhoff (
2012) which is given by
$$\begin{aligned} n_t^f&= \frac{1}{1 + e^{- \beta a_{t-1}}} \end{aligned}$$
(6)
$$\begin{aligned} n_t^c&= 1 - n_t^f \nonumber \\ a_t&= \alpha _0 + \alpha _n (n_t^f - n_t^c) + \alpha _p (p^* - p_t)^2. \end{aligned}$$
(7)
The variable
\(a_t\) denotes the relative attractiveness of the fundamental over the chartist strategy. It includes a general predisposition
\(\alpha _0\) and herding
\(\alpha _n > 0\) as well as mispricing
\(\alpha _p > 0\) effects. We chose this specification for two reasons:
1.
The discrete choice approach (DCA) of Eq. (
6) leads to a smoothly differentiable model density. This eases the exploration of the posterior when sampling with the HMC algorithm.
2.
The herding
\(+\) predisposition
\(+\) misalignment (HPM) specification for the attractiveness Eq. (
7) can be computed without access to the actual demands
\(d_t^f\) and
\(d_t^c\). This is not true for the other specifications of Franke and Westerhoff (
2012) where the agents’ wealth depends on previous demands, which, in turn, leads to a stochastic volatility model where (one of) the demands has to be modeled as a stochastic latent variable. For simplicity, we have not considered this complication in the present paper.
Overall, the model dynamics is fully specified by Eqs. (
5), (
6) and (
7). The parameters of the model are given by
\(\varvec{\theta }^{FW} = (\mu , \phi , \sigma _f, \xi , \sigma _c, \beta , \alpha _0, \alpha _n, \alpha _p, p^*)\). Note that
\(\beta \) and
\(\mu \) are redundant as they simply control the scale of
\(\alpha _0, \alpha _n, \alpha _p\) and
\(\xi , \phi , \sigma _f, \sigma _c\), respectively. Thus, throughout we fix them at
\(\beta = 1\) and
\(\mu = 0.01\) as in the simulation exercise of Franke and Westerhoff (
2012).
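A simulation sketch of the fully specified dynamics follows; the parameter values are illustrative (only \(\beta = 1\) and \(\mu = 0.01\) are taken from the text), and the timing convention mirrors Eqs. (5)–(7):

```python
# Sketch of the FW DCA-HPM dynamics, Eqs. (5)-(7), with a fixed
# fundamental log price p_star; parameter values are illustrative.
import numpy as np

def simulate_fw(T=2000, mu=0.01, beta=1.0, phi=0.2, xi=0.9,
                sigma_f=0.7, sigma_c=2.0, alpha0=-0.1, alpha_n=1.8,
                alpha_p=12.0, p_star=0.0, seed=0):
    rng = np.random.default_rng(seed)
    p = np.zeros(T)     # log prices, initialized at p_star
    nf = 0.5            # initial fraction of fundamentalists
    for t in range(2, T):
        nc = 1.0 - nf
        # Eq. (5): return distribution given last period's fractions
        mean = mu * (nf * phi * (p_star - p[t-1])
                     + nc * xi * (p[t-1] - p[t-2]))
        sd = mu * np.sqrt(nf**2 * sigma_f**2 + nc**2 * sigma_c**2)
        p[t] = p[t-1] + rng.normal(mean, sd)
        # Eq. (7): attractiveness from predisposition, herding, mispricing
        a = alpha0 + alpha_n * (2 * nf - 1) + alpha_p * (p_star - p[t])**2
        # Eq. (6): discrete choice update of the fractions
        nf = 1.0 / (1.0 + np.exp(-beta * a))
    return np.diff(p)

r = simulate_fw()
```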
When estimating the model on real stock returns below, we do not know the fundamental price. When simulating from the model or matching moments as in Franke and Westerhoff (
2012), the fundamental price can be considered fixed. Yet, when fitting the model to actual return data, more flexibility is needed in order to estimate reasonable values for the unobserved fundamental price. Here, we consider two specifications of the model: First, as in the VS model presented above, the fundamental price is derived as an average of past prices. Note that, in order to be faithful and comparable to the VS model, we compute a running average of past prices and not log prices. Secondly, following Lux (
2018), we assume that the log fundamental price is time varying as a Brownian motion
$$\begin{aligned} p^*_t \sim \mathcal {N}(p^*_{t-1}, \sigma _*^2) . \end{aligned}$$
This not only introduces another parameter
\(\sigma _*\) but also turns the model into a stochastic volatility model, i.e., the volatility
\(\sigma _t\) now includes a stochastic component. To see this, note that
\(\sigma _t\) depends on
\(a_{t-2}\) via
\(n_{t-1}^f\) and the attractiveness in turn includes the stochastic fundamental log price
\(p^*_{t-2}\). Thus, the fundamental price plays a similar role as the log volatility
\(h_t\) in a classical discrete time stochastic volatility (SV) model (Kim et al.
1998)
$$\begin{aligned} h_t&\sim \mathcal {N}\left( \mu + \phi (h_{t-1} - \mu ), \sigma _h^2\right) \nonumber \\ r_t&\sim \mathcal {N}\left( 0, e^{h_t}\right) \end{aligned}$$
(8)
that we include as a benchmark alongside the GARCH(1, 1) model. Again, in contrast to the purely phenomenological dynamics in Eq. (
8), the parameters of the FW model are interpretable in terms of behavioral traits of agents. Furthermore, the model specification combines aspects of local and stochastic volatility in that its volatility
\(\sigma _t\) depends on the random fundamental price as well as past prices via Eqs. (
6) and (
7).
Nevertheless, implementing the model in
Stan is readily possible. As before, the full code of the FW model—for both specifications—is provided as a supplementary material. Interestingly, along the same lines a variant of the VS model with a random walk specification for the fundamental price can be defined. For comparison, we have also implemented this model using both specifications. Note that the time-varying log fundamental prices
\(p^*_t\) do not appear as a
T-dimensional vector (where
T denotes the number of observed time steps) in the parameter block. Instead, we have used a non-centered parameterization, i.e.,
\(p^*_t\) is computed from the innovations \(\epsilon ^*_t\) as a transformed parameter. Formally, we can express this as follows:
$$\begin{aligned} p^*_t = p^*_{t-1} + \sigma _* \epsilon ^*_t \quad \text{ where } \quad \epsilon ^*_t \sim \mathcal {N}(0, 1) \end{aligned}$$
(9)
instead of
$$\begin{aligned} p^*_t \sim \mathcal {N}(p^*_{t-1}, \sigma _*^2). \end{aligned}$$
This is a standard example of a reparameterization which does not change the model, but helps when HMC sampling as the innovation parameters
\(\epsilon _t^*\) all have unit scale and are a priori independent, no matter which variance
\(\sigma _*^2\) is currently sampled.
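The effect of the reparameterization of Eq. (9) can be illustrated outside of Stan; the function name and scale below are illustrative assumptions:

```python
# Illustration of the non-centered parameterization of Eq. (9): the
# latent random walk p*_t is reconstructed deterministically from
# unit-scale innovations eps, so a sampler works on a priori
# independent N(0, 1) quantities whatever sigma_star currently is.
import numpy as np

def noncentered_random_walk(p0, sigma_star, eps):
    # transformed "parameter": p*_t = p0 + sigma_star * sum_{s<=t} eps_s
    return p0 + sigma_star * np.cumsum(eps)

rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=1000)   # unit scale, a priori independent
p_star = noncentered_random_walk(0.0, 0.05, eps)
```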
Again, we complete the model with weakly informative priors for all parameters. As few insights are available about the proper choice of the attractiveness parameters
\(\alpha _0, \alpha _n\) and
\(\alpha _p\), we assign weakly informative heavy-tailed priors, which restrict the scale of the parameters yet allow substantially larger values.
In the case of the standard deviation parameters
\(\sigma _f, \sigma _c\) and
\(\sigma _*\), we impose stronger priors and resort to the observed data to set the proper scale. While not being purely Bayesian, this choice restricts the model to reasonable scales accounting for the fact that volatility could be measured in arbitrary units, e.g., percent per year. Figure
1 illustrates prior predictive checks, i.e., sample data generated according to the model with parameters drawn from the prior, for the FW model using a moving average specification for the fundamental price.
While the scale of the data appears well matched, the model only occasionally produces volatility clustering as pronounced as in the real data. Nevertheless, we found these priors effective in simulation studies as well as when estimating the model on stock data. Furthermore, Appendix
C contains additional robustness checks confirming the rationale behind our choice of priors.
3.3 Model by Alfarano, Lux & Wagner (ALW)
Alfarano, Lux & Wagner model a financial market populated by
N chartist traders as well as additional fundamental traders. The excess demand from the population of fundamental traders is given by
$$\begin{aligned} ED_f&= T_f (p_f - p) \end{aligned}$$
(10)
with total trading volume
\(T_f\) and log fundamental price
\(p_f\). As in the FW model, prices are exclusively considered in logarithmic terms, i.e., to simplify notation the log price is denoted by
p.
Chartist traders are either in an optimistic or pessimistic state. Optimists are assumed to buy a certain amount
\(T_c\) at each time step, while pessimists sell amount
\(T_c\). Denoting the number of optimistic traders as
n, the market sentiment
\(x = 2 \frac{n}{N} - 1 \in [-1, 1]\) is defined, and the excess demand from chartist traders is given as
$$\begin{aligned} ED_c&= N T_c x . \end{aligned}$$
(11)
Then, assuming a Walrasian pricing mechanism, log market prices are adjusted as
$$\begin{aligned} \frac{\hbox {d} p}{\hbox {d} t}&= \beta \left( ED_f + ED_c \right) \end{aligned}$$
(12)
$$\begin{aligned}&= \beta \left( T_f (p_{f,t} - p_t) + N T_c x_t \right) . \end{aligned}$$
(13)
Note that the FW model assumed a very similar log price adjustment, albeit formulated in discrete terms in Eq. (
4).
Lux (
2018) now assumes instantaneous price adjustment, i.e.,
\(\beta \rightarrow \infty \), and the market is cleared by matching fundamental and chartist demand. In this case, the market price is found to be
$$\begin{aligned} p_t&= p_{f,t} + \frac{N T_c}{T_f} x_t \end{aligned}$$
(14)
and the corresponding returns can be expressed as
$$\begin{aligned} r_t&= p_{f,t} - p_{f,t-1} + \frac{N T_c}{T_f}(x_t - x_{t-1}) \\&= \sigma _f \epsilon _{f,t} + \frac{N T_c}{T_f}(x_t - x_{t-1}) \end{aligned}$$
where the last line follows when assuming a Brownian motion for the log fundamental price, i.e.,
\(p_{f,t} = p_{f,t-1} + \sigma _f \epsilon _{f,t}\) with
\(\epsilon _{f,t} \sim \mathcal {N}(0, 1)\). From a statistical perspective, market returns are distributed as
$$\begin{aligned} r_t&\sim \mathcal {N}\left( \frac{N T_c}{T_f}(x_t - x_{t-1}), \sigma _f^2 \right) . \end{aligned}$$
(15)
In contrast to the FW model, this model does not exhibit stochastic volatility. Furthermore, we do not need to model the fundamental price. Indeed, assuming instantaneous price adjustment, market prices cannot deviate persistently from the fundamental price, and it becomes essentially an observed quantity (compare Eq.
14).
Now, the market sentiment changes according to a herding process originally proposed by Kirman. Here, each agent randomly switches from pessimistic to optimistic or vice versa with transition rates
\(\pi ^+ = a + b n\) and
\(\pi ^- = a + b (N - n)\), respectively. Parameter
a expresses a general tendency to switch state, while
b models the herding with the switching probability increasing in the size of the contrarian population. As detailed in Alfarano et al. (
2008), these transition rates lead to a sentiment dynamics with nonvanishing fluctuations even in the limit of an infinite population. The sentiment dynamics is then governed by the following Langevin equation:
$$\begin{aligned} \hbox {d}X_t&= - 2 a X_t \hbox {d}t + \sqrt{2 b (1 - X_t^2)} \hbox {d}W_t . \end{aligned}$$
(16)
In our implementation, the continuous time equation (Eq. (
16)) has been Euler-discretized with
\(\Delta t = 1\) day, i.e.,
\(x_t = x_{t-1} - 2 a x_{t-1} + \sqrt{2 b (1 - x_{t-1}^2)} \; \epsilon _t\), where
\(\epsilon _t \sim \mathcal {N}(0, 1)\), and then coupled to the observed returns via Eq. (
15). As in Lux (
2018), we have fixed
\(\frac{N T_c}{T_f}\) to one and assume weakly informative truncated standard normal priors on the remaining parameters
a,
b and
\(\sigma _f\). As illustrated in Fig.
1, the model produces return time series similar to the FW model, albeit at a somewhat larger scale. It appears that a prior favoring smaller values for
\(a\) and
\(b\) would be beneficial. The prior chosen here provides a compromise between such an informative prior and the uniform one suggested in Lux (
2018) which with high probability exhibits almost independent returns without clustering (not shown). Again, the full Stan code for the ALW model is provided as a supplementary material.
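The Euler-discretized ALW dynamics coupled to the return equation can be sketched as follows; the parameter values are illustrative (loosely in the small-\(a\), small-\(b\) regime discussed above), and the clipping of the sentiment to \([-1, 1]\) is a numerical safeguard of this sketch, not part of the model:

```python
# Sketch of the discretized ALW dynamics: Kirman-type sentiment via the
# Euler scheme of Eq. (16), coupled to returns via Eq. (15), with
# N*T_c/T_f fixed to one as in the text. Parameter values are illustrative.
import numpy as np

def simulate_alw(T=2000, a=0.0003, b=0.0014, sigma_f=0.03, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(T)   # market sentiment in [-1, 1]
    r = np.zeros(T)   # log returns
    for t in range(1, T):
        # Euler step (Delta t = 1) of dX = -2aX dt + sqrt(2b(1 - X^2)) dW
        x[t] = (x[t-1] - 2 * a * x[t-1]
                + np.sqrt(2 * b * (1 - x[t-1]**2)) * rng.normal())
        x[t] = np.clip(x[t], -1.0, 1.0)   # keep the sqrt argument valid
        # Eq. (15): returns around the sentiment change
        r[t] = rng.normal(x[t] - x[t-1], sigma_f)
    return r

r = simulate_alw()
```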