
Open Access 03.02.2020 | Original Research

A machine learning approach to univariate time series forecasting of quarterly earnings

Authors: Jan Alexander Fischer, Philipp Pohl, Dietmar Ratz

Published in: Review of Quantitative Finance and Accounting | Issue 4/2020


Abstract

We propose our quarterly earnings prediction (QEPSVR) model, which is based on epsilon support vector regression (ε-SVR), as a new univariate model for quarterly earnings forecasting. This follows the recommendations of Lorek (Adv Account 30:315–321, 2014. https://doi.org/10.1016/j.adiac.2014.09.008), who notes that although the model developed by Brown and Rozeff (J Account Res 17:179–189, 1979) (BR ARIMA) is still advocated as the premier univariate model, it may no longer be suitable for describing recent quarterly earnings series. We conduct empirical studies on recent data to compare the predictive accuracy of the QEPSVR model to that of the BR ARIMA model under a multitude of conditions. Our results show that the predictive accuracy of the QEPSVR model significantly exceeds that of the BR ARIMA model under 24 out of the 28 tested experiment conditions. Furthermore, significance is achieved under all conditions considering short forecast horizons or limited availability of historic data. We therefore advocate the use of the QEPSVR model for firms performing short-term operational planning, for recently founded companies and for firms that have restructured their business model.

1 Introduction

A company's reported quarterly earnings are an accounting figure of great significance. Quarterly earnings can be used to track performance in the context of management and debt contracts (Dechow et al. 1998), and are reflective of corporate governance (Chen et al. 2015). Isidro and Dias (2017) also show that earnings are strongly related to stock returns in volatile market conditions, while Zoubi et al. (2016) find that disaggregated earnings better explain variation in stock returns. Furthermore, differences between forecasted and actual earnings have been used to calculate a firm's market premium (Dopuch et al. 2008).
The prediction of future quarterly earnings using univariate statistical models has been the subject of extensive research. Lorek and Willinger (2011) and Lorek (2014) claim that the autoregressive integrated moving average (ARIMA) model proposed by Brown and Rozeff (1979), denoted by BR ARIMA, is the premier univariate statistical model for the prediction of quarterly earnings. Its functional form is
$$Y_{q} = Y_{q - 4} + \phi \left({Y_{q - 1} - Y_{q - 5}} \right) + \epsilon_{q} - \theta \left({\epsilon_{q - 4}} \right),$$
where \(Y_{q}\) is the earnings of a company for a quarter \(q\), \(\phi\) is an autoregressive parameter, \(\theta\) is a moving average parameter and \(\epsilon_{q}\) is the disturbance term at \(q\).
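For illustration only, the following sketch (not from the original paper; the earnings, residuals and parameter values are hypothetical) shows how a one-step-ahead forecast would be computed under this functional form once \(\phi\), \(\theta\) and the historic disturbance terms have been estimated:

```python
def br_arima_one_step(y, eps, phi, theta):
    """One-step-ahead forecast under the BR ARIMA functional form.

    y   -- historic quarterly earnings, y[-1] being the most recent quarter
    eps -- historic disturbance terms aligned with y
    phi, theta -- previously estimated AR and MA parameters
    """
    # the expected value of the current disturbance term is zero
    return y[-4] + phi * (y[-1] - y[-5]) - theta * eps[-4]

# hypothetical earnings and residuals for eight quarters
earnings = [10.2, 7.1, 8.4, 12.0, 11.1, 7.9, 9.0, 12.8]
residuals = [0.3, -0.2, 0.1, 0.4, -0.1, 0.2, 0.0, 0.5]
print(br_arima_one_step(earnings, residuals, phi=0.4, theta=0.5))
```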
While the BR ARIMA model has been praised for its predictive accuracy on historic data, Lorek (2014) expresses concerns about its ability to describe more recent quarterly earnings. This is because research suggests that the time series properties of quarterly earnings have changed significantly since the development of the BR ARIMA model four decades ago. For instance, Baginski et al. (2003) note an apparent decline in the persistence of quarterly earnings from 1967 to 2001. This could be explained by the increased prevalence of high-tech firms (Kwon and Yin 2015). Lorek and Willinger (2008) also find that for an increasing number of companies, the quarterly earnings series no longer exhibit any significant seasonality. Furthermore, Klein and Marquardt (2006) highlight an increasing frequency of negative quarterly earnings, which according to Hayn (1995) disrupts autocorrelation patterns. Lorek (2014) therefore advocates research towards developing a new univariate model for the prediction of quarterly earnings.
In this paper we introduce a new model based on epsilon support vector regression (ε-SVR) (Smola and Schölkopf 2004; Vapnik 1995), termed the quarterly earnings prediction (QEPSVR) model. We choose ε-SVR as a supervised learning algorithm because of (1) the guarantee of finding a globally optimal solution, (2) a sparse solution space, meaning only some training points contribute to the solution, and (3) the use of dot products to facilitate efficient computation of nonlinear solutions (Thissen et al. 2003). Furthermore, ε-SVR has been shown to perform well in many real-world applications (Hastie et al. 2009). Initial trials were also conducted using random forests (Breiman 2001), gradient boosting (Friedman 2001), and Gaussian processes (Rasmussen and Williams 2006) as base supervised learners for the QEPSVR model. However, utilizing ε-SVR yielded the highest predictive accuracy.
Our QEPSVR model retains the univariate character of the BR ARIMA model, meaning all predictive features are derivable from historic quarterly earnings series. Unlike the BR ARIMA model however, the QEPSVR model is fitted using the historic quarterly earnings of multiple firms. The objective of this paper is to analyze and compare the predictive accuracy of the QEPSVR model to that of the state-of-the-art BR ARIMA model under a multitude of conditions.
The rest of the paper proceeds as follows. Section 2 familiarizes the reader with the basics of ε-SVR. Section 3 then describes the QEPSVR model in detail. This is followed by an explanation of the research method for comparing the QEPSVR model to the BR ARIMA model in Sect. 4. The experimental results are discussed in Sect. 5, while Sect. 6 offers concluding remarks.

2 Epsilon support vector regression (ε-SVR)

ε-SVR (Smola and Schölkopf 2004; Vapnik 1995) is an integral part of the QEPSVR model; its key ideas are introduced in this section. Consider training data of the form \(\left\{ {\left( {x_{1} , y_{1} } \right), \ldots ,\left( {x_{n} ,y_{n} } \right)} \right\} \subset {\mathcal{X}} \times {\mathbb{R}}\), where \({\mathcal{X}}\) is the input space (e.g. \({\mathcal{X}} = {\mathbb{R}}^{d}\)), i.e. \(x_{i} \in {\mathcal{X}}\) and \(y_{i} \in {\mathbb{R}}\) for \(i = 1, \ldots ,n\). The goal of ε-SVR is to fit a model, \(f\left( x \right)\), to the training data, such that (1) the deviation of each training point \(y_{i}\) from \(f\left( {x_{i} } \right)\) is at most ε, and (2) \(f\left( x \right)\) is as flat as possible. The procedure is introduced first for the case of a linear \(f\left( x \right)\), followed by an extension to nonlinear models.
A linear function is stipulated as
$$f\left( x \right) = \langle w,x \rangle + b,$$
where \(w \in {\mathcal{X}},b \in {\mathbb{R}}\), and \(\langle w,x \rangle\) denotes the dot product of \(w\) and \(x.\) For this function, flatness refers to \(w\) being small. ε-SVR fits a linear function to training data by solving the following convex optimization problem:
$$\begin{aligned} & minimize\quad \frac{1}{2}{\parallel}w{\parallel}^{2} \\ & subject\, to\quad \left\{ {\begin{array}{*{20}l} {y_{i} - \langle w,x_{i} \rangle - b \le \upvarepsilon} \hfill \\ {\langle w,x_{i} \rangle + b - y_{i} \le\upvarepsilon} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
where \({\parallel}w{\parallel}^{2} = \langle w,w \rangle \). Since satisfying these constraints may be infeasible, slack variables \(\xi_{i}\) and \(\xi_{i}^{*}\) are introduced:
$$\begin{aligned} & minimize\quad \frac{1}{2}{\parallel}w{\parallel}^{2} + C\mathop \sum \limits_{i = 1}^{n} \left( {\xi_{i} + \xi_{i}^{*} } \right) \\ & subject \,to\quad \left\{ {\begin{array}{*{20}l} {y_{i} - \langle w,x_{i} \rangle - b \le\upvarepsilon + \xi_{i} } \hfill \\ {\langle w,x_{i} \rangle + b - y_{i} \le\upvarepsilon + \xi_{i}^{*} } \hfill \\ { \xi_{i} ,\xi_{i}^{*} \ge 0} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
Here, deviations only contribute to the total error when \(\left| {f\left( {x_{i} } \right) - y_{i} } \right| >\upvarepsilon\). The trade-off between attaining a flat \(f\) and not allowing deviations greater than ε is controlled by the constant \(C > 0\). However, the above optimization problem is usually solved in its dual formulation. This involves the construction of a Lagrange function, \(L\):
$$\begin{aligned} L = & \frac{1}{2}{\parallel}w{\parallel}^{2} + C\mathop \sum \limits_{i = 1}^{n} \left( {\xi_{i} + \xi_{i}^{*} } \right) - \mathop \sum \limits_{i = 1}^{n} \left( {\eta_{i} \xi_{i} + \eta_{i}^{*} \xi_{i}^{*} } \right) \\ & - \mathop \sum \limits_{i = 1}^{n} \alpha_{i} \left( {\upvarepsilon + \xi_{i} - y_{i} + \langle w,x_{i} \rangle + b} \right) - \mathop \sum \limits_{i = 1}^{n} \alpha_{i}^{*} \left( {\upvarepsilon + \xi_{i}^{*} + y_{i} - \langle w,x_{i} \rangle - b} \right), \\ \end{aligned}$$
where the variables \(\alpha_{i} , \alpha_{i}^{*} ,\eta_{i} , \eta_{i}^{*} \ge 0\) are Lagrange multipliers. To find the optimal solution, the partial derivatives of \(L\) with respect to the variables \(w,b,\xi_{i}\) and \(\xi_{i}^{*}\) are set to 0:
$$\begin{aligned} \partial_{b} L & = \mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i}^{*} - \alpha_{i} } \right) = 0 \\ \partial_{w} L & = w - \mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)x_{i} = 0 \\ \partial_{{\xi_{i} }} L & = C - \alpha_{i} - \eta_{i} = 0 \\ \partial_{{\xi_{i}^{*} }} L & = C - \alpha_{i}^{*} - \eta_{i}^{*} = 0 \\ \end{aligned}$$
The substitution of these equations into \(L\) gives the dual optimization problem, in which \(\eta_{i}\) and \(\eta_{i}^{*}\) vanish:
$$\begin{aligned} & maximize\quad - \frac{1}{2}\mathop \sum \limits_{i,j = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)\left( {\alpha_{j} - \alpha_{j}^{*} } \right) \langle x_{i} ,x_{j} \rangle -\upvarepsilon\mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} + \alpha_{i}^{*} } \right) + \mathop \sum \limits_{i = 1}^{n} y_{i} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) \\ & subject \,to\quad \left\{ {\begin{array}{*{20}l} {\mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) = 0} \hfill \\ {\alpha_{i} ,\alpha_{i}^{*} \in \left[ {0, C} \right]} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
The optimization problem can therefore be stipulated in terms of dot products between the training data. This observation is key for the subsequent extension of ε-SVR to nonlinear \(f\left( x \right)\). Since rearranging \(\partial_{w} L\) gives
$$w = \mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)x_{i} ,$$
it follows that \(f\left( x \right)\) can be rewritten as a linear combination of the training inputs, \(x_{i}\):
$$f\left( x \right) = \langle w,x \rangle+ b = \mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) \langle x_{i} ,x \rangle+ b.$$
ε-SVR is also capable of fitting nonlinear \(f\left( x \right)\) to training data. This requires a mapping of the input space \({\mathcal{X}}\) to another space, \({\mathcal{X}}^{*}\). The dual optimization problem specified above could then be solved for the modified training data, \(\left\{ {\left( {x_{1}^{*} , y_{1} } \right), \ldots ,\left( {x_{n}^{*} ,y_{n} } \right)} \right\} \subset {\mathcal{X}}^{*} \times {\mathbb{R}}\). However, computing dot products in \({\mathcal{X}}^{*}\) may be computationally infeasible. Instead, a kernel function \(K\) can be used to compute the dot products \(\langle x_{i}^{*} ,x_{j}^{*} \rangle\) specified in the optimization problem from \({\mathcal{X}}\) directly—avoiding the explicit transformation \({\mathcal{X}} \mapsto {\mathcal{X}}^{*}\). The dual optimization problem is thus rewritten as
$$\begin{aligned} & maximize\quad - \frac{1}{2}\mathop \sum \limits_{i,j = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)\left( {\alpha_{j} - \alpha_{j}^{*} } \right)K\left( {x_{i} ,x_{j} } \right) -\upvarepsilon\mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} + \alpha_{i}^{*} } \right) + \mathop \sum \limits_{i = 1}^{n} y_{i} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) \\ & subject \,to\quad \left\{ {\begin{array}{*{20}l} {\mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) = 0} \hfill \\ {\alpha_{i} ,\alpha_{i}^{*} \in \left[ {0, C} \right],} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
where flatness of \(f\left( x \right)\) is maximized in \({\mathcal{X}}^{*}\) rather than in \({\mathcal{X}}\). The fitted nonlinear function is stipulated as follows:
$$f\left( x \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)K\left( {x_{i} ,x} \right) + b.$$
Finally, the variable \(b\) can be estimated using the Karush–Kuhn–Tucker conditions (Karush 1939; Kuhn and Tucker 1951), which are not discussed further.
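Off-the-shelf solvers implement this dual formulation; as a hedged illustration, the sketch below fits a nonlinear ε-SVR with an RBF kernel using scikit-learn on toy data (the data and hyperparameter values are arbitrary and unrelated to the paper's experiments):

```python
import numpy as np
from sklearn.svm import SVR

# toy one-dimensional regression problem: y = sin(x) plus noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 6.0, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# epsilon-SVR with an RBF kernel; C trades off flatness against
# deviations larger than epsilon, gamma controls the kernel width
model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)

print(model.predict([[1.5]]))     # prediction at a new input
print(len(model.support_))        # number of support vectors (sparse solution)
```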

3 The QEPSVR model

The notation used in the description of the QEPSVR model is as follows. Firstly, quarterly earnings series are assigned relative integer indices. For example, if \(Y_{q} |q \in {\mathbb{Z}}\) denotes the fourth quarter (Q4) earnings for 2015 in a series \(Y\), then \(Y_{q - 5}\) denotes the Q3 earnings for 2014. The unmodified quarterly earnings series are referred to as original (\({\text{orig}} )\) series. However, differenced (\({\text{diff }}\)) and quarterly differenced (\({\text{qdiff }}\)) series are also considered in order to expose quarter-by-quarter and quarter-to-quarter earnings relationships, respectively (Lorek and Willinger 2011). Their derivation from \({\text{orig}}\) series is shown below.
$$\begin{aligned} {\text{diff}}_{q} & = orig_{q} - orig_{q - 1} \\ {\text{qdiff}}_{q} & = orig_{q} - orig_{q - 4} . \\ \end{aligned}$$
This means that for a continuous \({\text{orig }}\) series of length \(n\), the derived \({\text{diff }}\) and \({\text{qdiff}}\) series have lengths \(n - 1\) and \(n - 4\), respectively.
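A minimal sketch of this derivation (the helper name and example values are ours, not part of the paper's pseudocode):

```python
def derive_series(orig):
    """Return the diff and qdiff series derived from a continuous orig series."""
    diff = [orig[q] - orig[q - 1] for q in range(1, len(orig))]
    qdiff = [orig[q] - orig[q - 4] for q in range(4, len(orig))]
    return diff, qdiff

# hypothetical orig series of eight quarterly earnings
orig = [5.0, 6.5, 4.0, 8.0, 5.5, 7.0, 4.5, 9.0]
diff, qdiff = derive_series(orig)
assert len(diff) == len(orig) - 1 and len(qdiff) == len(orig) - 4
```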
At a high level, the QEPSVR model can be thought of as a sequence of data manipulation steps. Given a historic quarterly earnings series, the QEPSVR model first extracts several explanatory variables (features). Next, a heuristic is used to estimate the predictive power of each feature – only the top features are selected. The retained features are subsequently scaled to normalize their range. An ε-SVR model is then applied to the scaled features, yielding a (scaled) one-step-ahead quarterly earnings prediction. Finally, depending on the configuration of the ε-SVR model, this prediction is scaled back into the range of the original series. This gives the true one-step-ahead quarterly earnings prediction.
The BR ARIMA model and the QEPSVR model always return one-step-ahead predictions when evaluated. We therefore obtain multi-step-ahead predictions via a series of one-step-ahead predictions. For example, making a two-step-ahead prediction for a historic series of length \(Available Series Length \left( {ASL} \right) \in {\mathbb{N}}^{ + }\), \(orig_{ - ASL} , \ldots ,orig_{ - 1}\), requires (1) evaluating a model for \(orig_{ - ASL} , \ldots ,orig_{ - 1}\) to obtain the one-step-ahead prediction of \(orig_{0}\), denoted by \(orig_{0}^{*}\), then (2) evaluating the same model again for \(orig_{ - ASL + 1} , \ldots ,orig_{0}^{*}\), yielding the desired two-step-ahead prediction of \(orig_{1}\), denoted by \(orig_{1}^{*}\). Note that the model parameters are the same in (1) and (2), i.e. the BR ARIMA and QEPSVR models are not refitted during multi-step-ahead predictions.
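The iterative procedure can be sketched as follows; `predict_one_step` is a placeholder for whichever model is being rolled forward, not a function defined in the paper:

```python
def multi_step_forecast(model, orig, steps):
    """Roll a one-step-ahead model forward 'steps' quarters without refitting.

    Each prediction is appended to the series and the oldest observation is
    dropped, so the series passed to the model keeps its original length.
    """
    series = list(orig)
    forecasts = []
    for _ in range(steps):
        next_value = model.predict_one_step(series)   # placeholder interface
        forecasts.append(next_value)
        series = series[1:] + [next_value]
    return forecasts
```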
The QEPSVR model has two operations: fit and predict. Fitting is the process of estimating the QEPSVR model’s parameters from a set of historic quarterly earnings series, while the predict operation uses the model parameters to make one-step-ahead quarterly earnings forecasts. We describe the QEPSVR model in the context of these two operations for the remainder of this section, hereby refining the high-level approach outlined in the previous paragraph.
We define both the fitting and prediction operations of the QEPSVR model in pseudocode. The notation \(i\left[ c \right]\) refers to an item \(i\) in a collection \(c\). Furthermore, the notation \(x \leftarrow e\) denotes the assignment of the value of an expression \(e\) to a variable \(x\). All referenced functions are prefixed by the name of the module they belong to. There are four modules, \(Extraction\), \(Selection\), \(Scaling\) and \(SVR\), each of which encapsulates a specific type of data manipulation. While the predict operation is explained first, some functions and parameters referenced by both operations are explained more closely in the subsequent description of the fit operation.
The input \(origSeriesCollection\) of the predict operation is a set of \(m\) continuous \(orig\) series. All series in \(origSeriesCollection\) are of equal length \(Available Series Length\left( {ASL} \right) \in {\mathbb{N}}^{ + }\) and have the form \(orig_{ - ASL} , \ldots ,orig_{ - 1}\). The first step of the predict operation is to extract the \({\text{diff}}\) and \({\text{qdiff }}\) series corresponding to each of the \(m\) \(orig\) series in the input. This is done via the \(fTransform\) function of the \(Extraction\) module, yielding an output matrix \(X_{1} \in {\mathbb{R}}^{{m \times \left( {3 \cdot ASL - 5} \right)}}\). Each of the \(m\) rows of \(X_{1}\) contains the \(3 \cdot ASL - 5\) elements of the \(orig\), \({\text{diff}}\) and \({\text{qdiff }}\) series for a single input series. Since the elements of each row are aligned, the column indices of \(X_{1}\) are the set \(\left\{ {orig_{ - ASL} , \ldots ,orig_{ - 1} \} \cup \left\{ {{\text{diff}}_{ - ASL + 1} , \ldots ,{\text{diff}}_{ - 1} } \right\} \cup \{ {\text{qdiff}}_{ - ASL + 4} , \ldots ,{\text{qdiff}}_{ - 1} } \right\}\). These indices are also referred to as features.
The next step is to remove columns from \(X_{1}\) corresponding to features with low predictive power. This is done using the \(transform\) function of the \(Selection\) module. It requires a model parameter, \(selectedFeatures\), that specifies the set of \(k\) features to retain. Note that all model parameters in the \(modelParams\) collection are determined using the QEPSVR model’s fit operation. Subsequently, each of the remaining columns in \(X_{2} \in {\mathbb{R}}^{m \times k}\) are scaled by the \(transform\) function of the \(Scaling\) module. This produces a scaled matrix, \(X_{3} \in {\mathbb{R}}^{m \times k}\). The scaling of each column is controlled by the model parameter \(scaleParamsX\).
Next, the \(predict\) function of the \(SVR\) module uses the model parameter \(svrParams\) to estimate the value of a specific target variable for each row of \(X_{3}\). The result, \(Y_{1} \in {\mathbb{R}}^{m \times 1}\), is then subject to inverse scaling by the \(invTransform\) function of the \(Scaling\) module to produce \(Y_{2} \in {\mathbb{R}}^{m \times 1}\). Note that for a given matrix \(X\) and constant scaling parameters \(P\), the application of \(Scaling.invTransform\) to the output of \(Scaling.transform\left( {X, P} \right)\) (and vice versa) yields the original matrix, \(X\).
The final step of the predict operation is the application of the \(Extraction\) module's \(invTTransform\) function to \(Y_{2}\). This ensures that the output, \(Y_{3} \in {\mathbb{R}}^{m \times 1}\), consists of one-step-ahead predictions for the \(m\) input \({\text{orig}}\) series in \(origSeriesCollection\), since the \(SVR.predict\) function may have returned (scaled) predictions for the corresponding \({\text{diff}}\) or \({\text{qdiff}}\) series instead. The \(Extraction.invTTransform\) function is notified of this through the model parameter \(targetVar\), and uses the previously determined \(X_{1} \in {\mathbb{R}}^{{m \times \left( {3 \cdot ASL - 5} \right)}}\) to compute the following:
$$Y_{3} = \left\{ {\begin{array}{*{20}l} {Y_{2} ,} \hfill &\quad {{\text{if}} \,targetVar = orig } \hfill \\ {X_{1}^{{orig_{ - 1} }} + Y_{2} ,} \hfill &\quad {{\text{if}}\,targetVar = {\text{diff }}} \hfill \\ {X_{1}^{{orig_{ - 4} }} + Y_{2} ,} \hfill &\quad {{\text{if}} \,targetVar = {\text{qdiff }}, } \hfill \\ \end{array} } \right.$$
where the notation \(X_{1}^{{orig_{ - q} }}\) denotes the column of \(X_{1}\) corresponding to the feature \(orig_{ - q}\).
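A minimal sketch of this inverse target transformation (the function and argument names are ours; the two column arguments stand for the \(orig_{-1}\) and \(orig_{-4}\) columns of \(X_{1}\) passed in as arrays):

```python
def inverse_target_transform(Y2, orig_minus_1, orig_minus_4, target_var):
    """Map the scaled-back SVR output Y2 to one-step-ahead orig predictions."""
    if target_var == "orig":
        return Y2
    if target_var == "diff":
        return orig_minus_1 + Y2    # undo differencing against the previous quarter
    if target_var == "qdiff":
        return orig_minus_4 + Y2    # undo differencing against the same quarter last year
    raise ValueError(f"unknown target variable: {target_var}")
```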
The fit operation determines model parameters for a specified \(orig\) series length, \(ASL\), for which one-step-ahead forecasts are to subsequently be made via the predict operation. Fitting is controlled by hyperparameters (\(hyperParams\)). As for the predict operation, the input \(origSeriesCollection\) is a set of continuous \(orig\) series. For the fit operation however, each series must have a length of at least \(ASL + 1\) and lengths are not required to be equal.
The first step of the fit operation is to decompose the series in \(origSeriesCollection\). Conceptually, this is done by sliding a window of length \(ASL + 1\) along each input series, one step at a time. For example, given \(ASL = 6\), a single input series \(orig_{1} , \ldots ,orig_{8}\) would be split into two series: \(orig_{1} , \ldots ,orig_{7}\) and \(orig_{2} , \ldots ,orig_{8}\). Assume this yields a total of \(s\) series, each of the form \(orig_{ - ASL} , \ldots ,orig_{0}\). Next, the first \(ASL\) elements of each of these series are assigned to a row of the matrix \(X_{1} \in {\mathbb{R}}^{s \times ASL}\), while the last element of each series is assigned to the corresponding row in \(Y_{1} \in {\mathbb{R}}^{s \times 1}\). This decomposition is performed by the \(Extraction.windowTransform\) function.
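A sketch of this sliding-window decomposition, assuming NumPy arrays and our own function name:

```python
import numpy as np

def window_decompose(series_collection, asl):
    """Split each orig series into overlapping windows of length ASL + 1.

    Returns X1 (the first ASL elements of each window) and Y1 (the last element).
    """
    X_rows, y_rows = [], []
    for series in series_collection:
        for start in range(len(series) - asl):
            window = series[start:start + asl + 1]
            X_rows.append(window[:-1])
            y_rows.append(window[-1])
    return np.array(X_rows), np.array(y_rows).reshape(-1, 1)

# hypothetical: one series of eight quarters with ASL = 6 yields two windows
X1, Y1 = window_decompose([[1, 2, 3, 4, 5, 6, 7, 8]], asl=6)
print(X1.shape, Y1.shape)   # (2, 6) (2, 1)
```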
The \(fTransform\) function of the \(Extraction\) module also accepts \(orig\) series in matrix form (in addition to set form, as in the predict operation). It uses \(X_{1}\) to create a feature matrix \(X_{2} \in {\mathbb{R}}^{{s \times \left( {3 \cdot ASL - 5} \right)}}\). The target matrix \(Y_{2} \in {\mathbb{R}}^{s \times 1}\) is computed by the \(Extraction.tTransform\) function using \(X_{2}\), \(Y_{1}\) and the hyperparameter \(targetVar\) as follows
$$Y_{2} = \left\{ {\begin{array}{*{20}l} {Y_{1} ,} \hfill & \quad{{\text{if}}\, targetVar = orig } \hfill \\ {Y_{1} - X_{2}^{{orig_{ - 1} }} ,} \hfill &\quad {{\text{if}}\, targetVar = {\text{diff}} } \hfill \\ {Y_{1} - X_{2}^{{orig_{ - 4} }} ,} \hfill &\quad {{\text{if}} \,targetVar = {\text{qdiff }}, } \hfill \\ \end{array} } \right.$$
where \(X_{2}^{{orig_{ - q} }}\) denotes the column of \(X_{2}\) corresponding to the feature \(orig_{ - q}\). Note that since \(targetVar\) is also required as a model parameter by the predict operation, it is added to the \(modelParams\) collection returned by the fit operation.
The next step is to choose the \(k \in {\mathbb{N}}^{ + }\) features in \(X_{2}\) with the highest estimated explanatory power for predicting \(Y_{2}\). This is performed by the \(fit\) function of the \(Selection\) module. A heuristic for explanatory power is the mutual information score (Cover and Thomas 1991) of a feature with the target variable. It is defined as
$$I\left( {A;B} \right) = \mathop \int \limits_{B} \mathop \int \limits_{A} P_{{\left( {A,B} \right)}} \left( {a,b} \right)\log_{2} \frac{{P_{{\left( {A,B} \right)}} \left( {a,b} \right)}}{{P_{\left( A \right)} \left( a \right)P_{\left( B \right)} \left( b \right)}} da db,$$
where \(A\) and \(B\) are continuous random variables, \(P_{{\left( {A,B} \right)}}\) denotes the joint probability density function of \(A\) and \(B\), and \(P_{\left( A \right)}\) and \(P_{\left( B \right)}\) are the marginal probability density functions of \(A\) and \(B\), respectively. Intuitively, mutual information describes the degree of uncertainty reduction in \(A\) through knowledge of \(B\). A higher mutual information score therefore suggests greater explanatory power. Once the set of features \(selectedFeatures\) is determined, \(Selection.transform\) removes all unimportant features from \(X_{2}\) to produce \(X_{3} \in {\mathbb{R}}^{s \times k}\).
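One way to realize such a selection step is with scikit-learn's mutual information estimator; the sketch below is an assumption about a possible implementation, not the paper's own code:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_regression

def select_top_k_features(X2, Y2, k):
    """Keep the k columns of X2 with the highest mutual information with Y2."""
    selector = SelectKBest(score_func=mutual_info_regression, k=k)
    X3 = selector.fit_transform(X2, Y2.ravel())
    selected_features = selector.get_support(indices=True)   # indices of retained columns
    return X3, selected_features
```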
Subsequently, the columns of \(X_{3}\) and \(Y_{2}\) are scaled. The type of scaling is determined by the \(scaleTypeX\) and \(scaleTypeY\) hyperparameters, respectively. These are either set to none, Gaussian or quantile Gaussian. The \(fit\) function of the \(Scaling\) module determines the scaling parameters \(scaleParamsX\) and \(scaleParamsY\), which are specific to the chosen scaling type (\(scaleParamsX\) contains scaling parameters for each of the \(k\) columns of \(X_{3}\)). The \(Scaling\) module’s \(transform\) function infers the scaling type from the scaling parameters and applies the scaling to produce \(X_{4} \in {\mathbb{R}}^{s \times k}\) and \(Y_{3} \in {\mathbb{R}}^{s \times 1}\).
If the scaling type is none, the scaling parameters are an empty set. Applying this type of scaling has no effect on the input matrix. In the case of Gaussian scaling, the scaling parameters are the sample mean and sample standard deviation of each column. The \(transform\) function of the \(Scaling\) module uses these parameters to replace each element of an input matrix \(X\) with the z-score
$$z_{i,j} = \frac{{x_{i,j} - \bar{x}_{j} }}{{s_{j} }} ,$$
where \(x_{i,j}\) is the element in the ith row and jth column of \(X\), and \(\bar{x}_{j}\) and \(s_{j}\) are the sample mean and sample standard deviation of the jth column of \(X\), respectively. Conversely, calling the \(Scaling.invTransform\) function (referenced in the predict operation) with Gaussian scaling parameters has the effect of replacing each element \(x_{i,j}\) with \(s_{j} x_{i,j} + \bar{x}_{j}\).
Since Gaussian scaling depends on sample means, it is susceptible to outliers. Quantile Gaussian scaling uses a deterministic rank-based inverse normal transformation (Beasley et al. 2009), which is much more robust in the presence of outliers. The scaling parameters are a set of functions, \(\left\{ {Q_{j} } \right\}\), where \(Q_{j} :\left[ {0, 1} \right] \mapsto {\mathbb{R}}\) is a quantile function for the jth column of the matrix to be scaled. In this case, applying \(Scaling.transform\) replaces each element of an input matrix \(X\), \(x_{i,j}\), with \(\varPhi^{ - 1} \left( {Q_{j}^{ - 1} \left( {x_{i,j} } \right)} \right)\), where \(Q_{j}^{ - 1}\) is the inverse quantile function for the jth column and \(\varPhi^{ - 1}\) is the inverse cumulative distribution function of a standard normal distribution. The application of \(Scaling.invTransform\) replaces each \(x_{i,j}\) with \(Q_{j} \left( {\varPhi \left( {x_{i,j} } \right)} \right)\).
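Both scaling types have close analogues in scikit-learn (StandardScaler for Gaussian scaling and QuantileTransformer with a normal output distribution for quantile Gaussian scaling); the sketch below uses toy data and is only an illustration of the idea:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, QuantileTransformer

X = np.random.default_rng(1).lognormal(size=(200, 4))   # skewed toy features

# Gaussian scaling: per-column z-scores (sensitive to outliers)
gaussian = StandardScaler().fit(X)
X_gauss = gaussian.transform(X)

# quantile Gaussian scaling: rank-based map to a standard normal (robust)
quantile = QuantileTransformer(output_distribution="normal", n_quantiles=100).fit(X)
X_qgauss = quantile.transform(X)

# both transformers also expose the inverse mapping used by the predict operation
assert np.allclose(gaussian.inverse_transform(X_gauss), X)
```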
After the completion of scaling, the \(svrParams\) model parameter is determined using the \(fit\) function of the \(SVR\) module. The \(SVR\) module performs ε-SVR, which was introduced in Sect. 2. In this case \(X_{4}\) and \(Y_{3}\) are considered as training data of the form \(\left\{ {\left( {X_{4}^{\left( 1 \right)} , Y_{3}^{\left( 1 \right)} } \right), \ldots ,\left( {X_{4}^{\left( s \right)} , Y_{3}^{\left( s \right)} } \right)} \right\} \subset {\mathbb{R}}^{k} \times {\mathbb{R}}\), where the notation \(X_{4}^{\left( i \right)}\) denotes the ith row of \(X_{4}\). ε-SVR fits a model, \(f:{\mathbb{R}}^{k} \mapsto {\mathbb{R}}\), to this training data, i.e. it learns the mapping from each row of \(X_{4}\) to the corresponding element of \(Y_{3}\). In the context of the QEPSVR model, \(f\) has the functional form
$$f\left( X \right) = \mathop \sum \limits_{i = 1}^{s} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)K\left( {X_{4}^{\left( i \right)} ,X} \right) + b ,$$
where \(X \in {\mathbb{R}}^{k}\), \(\alpha_{i}\) and \(\alpha_{i}^{*}\) are Lagrange multipliers associated with the ith training point and b is a constant. The kernel function \(K\) is chosen to be the squared exponential (Rasmussen and Williams 2006), stipulated as
$$K\left( {X,X^{\prime}} \right) = e^{{ - \gamma {\parallel}X - X^{\prime}\parallel^2}},$$
where \(X,X^{\prime} \in {\mathbb{R}}^{k}\), \({\parallel}X - X^{\prime}{\parallel}\) denotes the Euclidean norm of \(X - X^{\prime}\), and the hyperparameter \(\gamma \in {\mathbb{R}}\) is included in the \(hyperParams\) collection provided as an input to the fit operation. As specified in Sect. 2, the value of \(b\) can be estimated using the Karush–Kuhn–Tucker conditions (Karush 1939; Kuhn and Tucker 1951), while the Lagrange multipliers are found by solving the reformulated dual optimization problem:
$$\begin{aligned} & maximize\quad - \frac{1}{2}\mathop \sum \limits_{i,j = 1}^{s} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)\left( {\alpha_{j} - \alpha_{j}^{*} } \right)K\left( {X_{4}^{\left( i \right)} ,X_{4}^{\left( j \right)} } \right) -\upvarepsilon\mathop \sum \limits_{i = 1}^{s} \left( {\alpha_{i} + \alpha_{i}^{*} } \right) + \mathop \sum \limits_{i = 1}^{s} Y_{3}^{\left( i \right)} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) \\ & subject \,to\quad \left\{ {\begin{array}{*{20}l} {\mathop \sum \limits_{i = 1}^{s} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) = 0} \hfill \\ {\alpha_{i} ,\alpha_{i}^{*} \in \left[ {0, C} \right]} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
Recall that the constant \(C > 0\) controls the trade-off between attaining a flat \(f\) and not allowing deviations greater than ε. Both ε and \(C\) are hyperparameters that must be passed to the fit operation. The set of Lagrange multipliers \(\left\{ {\left( {\alpha_{i} ,\alpha_{i}^{*} } \right)| i = 1,2, \ldots s} \right\}\), the training points \(\{ X_{4}^{\left( i \right)} | i = 1,2, \ldots s\}\) and the constant \(b\) are assigned to the collection \(svrParams\).
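In scikit-learn terms, this step amounts to fitting a standard ε-SVR with an RBF (squared exponential) kernel and reading off the dual coefficients, support vectors and intercept; the sketch below is our own illustration, not the paper's implementation:

```python
from sklearn.svm import SVR

def fit_svr(X4, Y3, C, epsilon, gamma):
    """Fit the epsilon-SVR component on the scaled training data.

    scikit-learn stores the dual coefficients (alpha_i - alpha_i^*), the
    support vectors and the intercept b, i.e. the contents of svrParams.
    """
    svr = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)
    svr.fit(X4, Y3.ravel())
    return {
        "dual_coef": svr.dual_coef_,              # alpha_i - alpha_i^*
        "support_vectors": svr.support_vectors_,  # retained training points
        "intercept": svr.intercept_,              # the constant b
        "model": svr,
    }
```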
Finally, \(selectedFeatures\), \(scaleParamsX\), \(scaleParamsY\), \(svrParams\) as well as the hyperparameter \(targetVar\) are returned by the fit operation as the set of model parameters required by the predict operation of the QEPSVR model.

4 Research method

At any point during their practical application, the BR ARIMA model and the QEPSVR model are in one of two phases: development or operation. When in operation, a model can be evaluated. Evaluating a model means passing it a continuous historic \(orig\) series, \(orig_{ - ASL} , \ldots ,orig_{ - 1}\), for which it returns a one-step-ahead prediction of \(orig_{0}\), denoted by \(orig_{0}^{*}\). As mentioned previously, both models follow an iterative approach to making multi-step-ahead predictions, during which a model is not refitted. Model parameters therefore cannot be modified during operation.
The development phase consists of any activities that prepare a model for operation. In the case of the BR ARIMA model, the values of \(\phi\) (the autoregressive parameter) and \(\theta\) (the moving average parameter) are firm-specific. This means that they are estimated from the single \(orig\) series for which predictions are to be made (Lorek and Willinger 2011). Once \(\phi\) and \(\theta\) are determined, the BR ARIMA model is immediately evaluated to obtain predictions. Conversely, the hyperparameters and model parameters of the QEPSVR model are estimated from the historic data of a collection of firms. The model can then be evaluated for multiple firms before entering another development phase.
The predictive accuracy of the QEPSVR model is compared to that of the BR ARIMA model under different experiment conditions. A condition is described by two variables: available series length (\(ASL\)) and prediction steps (\(PS\)). \(ASL\) is the length of the series for which predictions are made during operation, while \(PS\) specifies how many quarters into the future earnings should be predicted (i.e. given \(orig_{ - ASL} , \ldots ,orig_{ - 1}\), the element \(orig_{PS - 1}\) is to be predicted). A total of 28 experiment conditions, \(\left( {ASL,PS} \right) \in \left\{ {6, \ldots ,12} \right\} \times \left\{ {1, \ldots ,4} \right\}\), are considered. The chosen measure of predictive accuracy is mean absolute percentage error (MAPE) (Makridakis et al. 1982). It is defined as
$${\text{MAPE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{A_{i} - F_{i} }}{{A_{i} }}} \right|,$$
where \(n\) is the number of predictions, and \(F_{i}\) and \(A_{i}\) denote the ith forecasted and actual quarterly earnings, respectively. As in Lorek and Willinger (2011), prediction errors exceeding 100 percent are truncated to 100 percent. This is done to avoid the effects of explosive prediction errors.
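A minimal sketch of this truncated MAPE (the function name and example values are ours):

```python
import numpy as np

def truncated_mape(actual, forecast, cap=1.0):
    """MAPE with individual percentage errors truncated at 100 percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = np.abs((actual - forecast) / actual)
    return float(np.mean(np.minimum(errors, cap)))

# hypothetical actual and forecasted earnings; the second error is capped at 1.0
print(truncated_mape([10.0, -2.0, 5.0], [8.0, 4.0, 5.5]))
```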
The data used for the experiment consists of the \(orig\) quarterly earnings series of 117 companies across the German DAX, MDAX, SDAX and TecDAX stock market indices.1 Each series contains 24 consecutive quarterly earnings from Q1 2012 to Q4 2017, giving a total of 2808 quarterly earnings. Table 1 summarizes the distribution of quarterly earnings and yearly book value of total assets of the companies in the experiment data. All values are in millions of Euros, to the nearest hundred thousand.
Table 1
Distribution of quarterly earnings and book value of total assets

                                      Mean        1st quartile   Median    3rd quartile
Quarterly earnings                    104.1       5.3            20.3      62.0
Yearly book value of total assets     34,785.8    941.9          2288.5    7792.8
The main industries of companies in the experiment data are highlighted in Table 2. Industry classification is performed according to the International Standard Industrial Classification of All Economic Activities (ISIC) system (United Nations, 2008).
Table 2
Industries of companies in the experiment data

ISIC section                                          Number of companies
Manufacturing                                         51
Information and communication                         16
Financial and insurance activities                    13
Real estate activities                                8
Wholesale and retail trade                            8
Transportation and storage                            6
Professional, scientific and technical activities     5
Other (sections with less than 5 companies)           10
Total                                                 117
The MAPE values of the QEPSVR model are calculated for all considered experiment conditions by executing \(GetQepErrors\). This operation implements tenfold cross-validation. The input \(S\) is therefore a set of 10 disjoint subsets (folds), \(\left\{ {S_{1} , \ldots ,S_{10} } \right\}\), of the 117 series in the experiment data. Each fold is of roughly equal size. The superscript notation \(\left( {a :b} \right)\) slices all series in a set by removing earnings at quarters before \(a\) and after \(b\). All MAPE values are calculated for predictions of quarterly earnings in \(predYear\) only.
Figure 1 illustrates four steps of how \(GetQepErrors\) makes predictions for \(predYear = 2016\) at \(ASL = 8\) and \(fold = 10\), assuming each fold \(S_{i}\) were to consist of a single series. In the first step, one to four-step-ahead predictions are made for \(S_{10}\), while only a one-step-ahead prediction can be made for \(S_{10}\) in the fourth step.
The hyperparameters required by the fit operation of the QEPSVR model are determined once for all experiment conditions. This is done by minimizing the MAPE values for predictions of quarterly earnings in 2016:
$$optimalHyperParams \leftarrow \mathop {\text{argmin}}\limits_{hyperParams} GetQepErrors\left( {S,2016, hyperParams} \right) .$$
The set of optimal (lowest) MAPE values found during hyperparameter optimization are referred to as validation errors:
$$validationErrs \leftarrow GetQepErrors\left( {S,2016, optimalHyperParams} \right)$$
However, a more accurate estimate of the QEPSVR model’s predictive accuracy is obtained by using the optimized hyperparameters to make predictions for quarterly earnings in 2017. This is because the data for 2017 is not observed during hyperparameter optimization. These errors are referred to as testing errors and are obtained as follows:
$$testingErrs \leftarrow GetQepErrors\left( {S,2017, optimalHyperParams} \right)$$
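The hyperparameter optimization itself is not specified in further detail; one plausible realization is an exhaustive grid search over candidate values, sketched below with a placeholder `get_qep_errors` function and illustrative grids that are not the values searched in the paper:

```python
from itertools import product

def optimize_hyperparams(folds, get_qep_errors, pred_year=2016):
    """Grid-search the QEPSVR hyperparameters by minimizing the mean validation MAPE.

    'get_qep_errors' stands in for the paper's GetQepErrors operation; the
    candidate values below are illustrative, not the grids used in the paper.
    """
    grid = {
        "targetVar": ["orig", "diff", "qdiff"],
        "k": [2, 4, 8],
        "scaleTypeX": ["none", "gaussian", "quantile_gaussian"],
        "scaleTypeY": ["none", "gaussian", "quantile_gaussian"],
        "epsilon": [0.01, 0.04, 0.1],
        "C": [0.1, 0.2, 1.0],
        "gamma": [0.1, 0.25, 0.5],
    }
    best_params, best_error = None, float("inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        errors = get_qep_errors(folds, pred_year, params)   # one MAPE per condition
        mean_error = sum(errors) / len(errors)
        if mean_error < best_error:
            best_params, best_error = params, mean_error
    return best_params, best_error
```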
The MAPE values for the BR ARIMA model are also calculated for predictions of quarterly earnings in 2016 and 2017. These are obtained by calling \(GetBrArimaErrors\left( {experimentData,2016} \right)\) and \(GetBrArimaErrors\left( {experimentData,2017} \right)\), where \(experimentData\) denotes the set of all 117 \(orig\) series. The \(BR.fit\) function determines the values of \(\phi\) and \(\theta\) by minimizing the sum of squared disturbance terms.
MAPE values are compared on a per-company basis. This means that the errors returned by \(GetBrArimaErrors\) and \(GetQepErrors\) are aggregated such that there are 117 MAPE values for the BR ARIMA model and the QEPSVR model under each of the 28 experiment conditions. Hypothesis testing is then performed to assess whether the 117 MAPE values calculated for the QEPSVR model are significantly lower than those calculated for the BR ARIMA model under each condition, i.e. whether the QEPSVR model has a significantly higher predictive accuracy than the BR ARIMA model. Since this involves paired samples, the paired t test (Kim 2015) and the Wilcoxon signed-rank test (Wilcoxon 1945) are used. In both cases, the one-tailed test is considered.
For a given condition, let \(\mu_{BR}\) and \(\mu_{QEP}\) denote the mean of the 117 MAPE values calculated for the BR ARIMA and QEPSVR models, respectively. Similarly, let \(M_{BR}\) and \(M_{QEP}\) denote the median of the 117 MAPE values for each model. The null and alternative hypotheses of the paired t-test, \(H_{{T_{0} }}\) and \(H_{{T_{1} }}\), are stated as
$$H_{{T_{0} }} : \mu_{QEP} \ge \mu_{BR} \quad H_{{T_{1} }} : \mu_{QEP} < \mu_{BR} ,$$
while the null and alternative hypotheses of the Wilcoxon signed-rank test, \(H_{{W_{0} }}\) and \(H_{{W_{1} }}\), are
$$H_{{W_{0} }} : M_{QEP} \ge M_{BR} \quad H_{{W_{1} }} : M_{QEP} < M_{BR } .$$
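Given the two vectors of 117 per-company MAPE values for a condition, the one-tailed paired tests can be carried out with SciPy (assuming a recent version that supports the `alternative` argument); the helper name is ours:

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

def compare_models(mape_qep, mape_br):
    """One-tailed paired tests of whether QEPSVR errors are lower than BR ARIMA errors."""
    mape_qep = np.asarray(mape_qep, dtype=float)
    mape_br = np.asarray(mape_br, dtype=float)
    _, p_t = ttest_rel(mape_qep, mape_br, alternative="less")   # H_T1: mu_QEP < mu_BR
    _, p_w = wilcoxon(mape_qep, mape_br, alternative="less")    # H_W1: M_QEP < M_BR
    return p_t, p_w
```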

5 Results

The hyperparameter values that minimize the validation errors are shown in Table 3. Table 4 shows the results of comparing the validation and testing errors of the QEPSVR model to the corresponding prediction errors of the BR ARIMA model under each of the 28 experiment conditions.
Table 3
Optimal hyperparameter values

Hyperparameter    Optimal value
targetVar         qdiff
k                 4
scaleTypeX        Quantile Gaussian
scaleTypeY        Quantile Gaussian
ε                 0.04
C                 0.2
γ                 0.25
Table 4
Results of comparing the predictive accuracy of the QEPSVR model to that of the BR ARIMA model

Conditions     Validation (2016)                             Testing (2017)
               p value              Mean MAPE                p value              Mean MAPE
ASL   PS       t test    Wilcoxon   QEPSVR    BR ARIMA       t test    Wilcoxon   QEPSVR    BR ARIMA
6     1        0.000     0.000      0.447     0.587          0.000     0.000      0.438     0.594
6     2        0.000     0.000      0.440     0.615          0.000     0.000      0.453     0.617
6     3        0.000     0.000      0.460     0.665          0.000     0.000      0.485     0.672
6     4        0.000     0.000      0.477     0.653          0.000     0.000      0.535     0.725
7     1        0.000     0.000      0.443     0.514          0.000     0.000      0.438     0.530
7     2        0.000     0.000      0.439     0.526          0.000     0.000      0.454     0.527
7     3        0.000     0.000      0.458     0.548          0.000     0.000      0.487     0.583
7     4        0.000     0.000      0.485     0.591          0.011     0.005      0.540     0.608
8     1        0.000     0.000      0.434     0.508          0.000     0.000      0.429     0.500
8     2        0.000     0.000      0.423     0.491          0.000     0.000      0.441     0.512
8     3        0.000     0.000      0.445     0.523          0.000     0.000      0.473     0.559
8     4        0.002     0.001      0.486     0.571          0.008     0.004      0.536     0.597
9     1        0.005     0.014      0.442     0.472          0.001     0.000      0.429     0.468
9     2        0.013     0.028      0.429     0.462          0.019     0.023      0.437     0.468
9     3        0.079     0.105      0.451     0.476          0.005     0.002      0.469     0.514
9     4        0.158     0.097      0.494     0.522          0.088     0.018      0.543     0.567
10    1        0.000     0.000      0.433     0.575          0.000     0.000      0.430     0.524
10    2        0.000     0.000      0.444     0.587          0.000     0.000      0.440     0.554
10    3        0.000     0.000      0.483     0.566          0.000     0.000      0.465     0.581
10    4        0.001     0.000      0.439     0.568          0.045     0.045      0.546     0.586
11    1        0.000     0.000      0.434     0.549          0.000     0.000      0.425     0.506
11    2        0.000     0.000      0.444     0.532          0.000     0.000      0.434     0.517
11    3        0.000     0.000      0.483     0.556          0.001     0.000      0.468     0.530
11    4        0.000     0.000      0.438     0.575          0.055     0.056      0.540     0.577
12    1        0.000     0.000      0.438     0.507          0.000     0.000      0.431     0.491
12    2        0.000     0.001      0.439     0.507          0.000     0.000      0.438     0.503
12    3        0.000     0.000      0.455     0.539          0.001     0.000      0.471     0.535
12    4        0.018     0.001      0.481     0.539          0.095     0.106      0.543     0.575
Table 4 shows that 51 out of the 56 p values calculated using the paired t test lie below a significance level of 0.05, while 52 out of the 56 p values calculated using the Wilcoxon signed-rank test are below 0.05. We assume statistical significance under a condition if the p values for a statistical test lie below 0.05 for predictions in both 2016 and 2017. The results in Table 4 therefore provide evidence for the rejection of \(H_{{T_{0} }}\) and \(H_{{W_{0} }}\) in favor of \(H_{{T_{1} }}\) and \(H_{{W_{1} }}\), respectively, under 24 of the 28 experiment conditions. The four conditions, (\(ASL\), \(PS\)), with insufficient evidence for rejection are \(\left( {9, 3} \right)\), \(\left( {9, 4} \right)\), \(\left( {11, 4} \right)\) and \(\left( {12, 4} \right)\).
The 24 significant conditions include all those where \(PS \in \left\{ {1,2} \right\}\) (i.e. having short forecast horizons). This leads to the first result: The predictive accuracy of the QEPSVR model significantly exceeds that of the BR ARIMA model for short forecast horizons. This means the QEPSVR model is particularly suitable for companies considering short-term operational planning.
Furthermore, the 24 significant conditions also include all those where \(ASL \in \left\{ {6,7,8} \right\}\) (i.e. having limited historic data availability). The second result is thus stated as follows: The predictive accuracy of the QEPSVR model significantly exceeds that of the BR ARIMA model when only limited historic quarterly earnings data is available. Situations of limited data availability arise in start-ups, as well as in companies that have recently made structural changes to their business model.
The selection of predictive features by the QEPSVR model is also investigated. As shown in Table 3, the optimal value for the hyperparameter \(k\) is found to be 4. This means the fit operation of the QEPSVR model selects the four predictive features which have the highest mutual information score with the chosen target variable (\({\text{qdiff}}_{ 0}\)). Table 5 shows the probability of features being selected (i.e. a probability of 1 implies that a feature is always selected) across all predictions of quarterly earnings in 2016 and 2017. Features are arranged from left to right in descending order of mean selection probability across all \(ASL\) values.
Table 5
Feature selection probabilities

       Selection probability
ASL    orig_{-4}   diff_{-3}   diff_{-4}   qdiff_{-4}   qdiff_{-1}   orig_{-2}   orig_{-1}
6      1.00        1.00        0.91        0.00         0.58         0.43        0.09
7      1.00        1.00        0.95        0.00         0.44         0.40        0.16
8      1.00        1.00        0.83        1.00         0.05         0.03        0.10
9      0.99        1.00        0.65        1.00         0.20         0.00        0.16
10     0.96        0.95        0.84        1.00         0.00         0.00        0.00
11     0.98        0.95        0.83        0.96         0.00         0.05        0.06
12     0.98        0.96        0.80        0.88         0.00         0.08        0.20
Mean   0.99        0.98        0.83        0.69         0.18         0.14        0.11
The features \(orig_{ - 4}\) and \({\text{diff}}_{ - 3}\) have the highest mean selection probabilities and have a selection probability of at least 0.95 across all conditions. After becoming available for selection at \(ASL = 8\), the feature \({\text{qdiff}}_{ - 4}\) is selected with a probability of at least 0.88 under all conditions with \(ASL \ge 8\). The feature \({\text{qdiff}}_{ - 1}\) is never selected for \(ASL \ge 10\). Only 7 different features have selection probabilities exceeding 0.1, despite the pool of available features increasing from 13 (when \(ASL = 6\)) to 31 (when \(ASL = 12\)). The fact that 6 out of these 7 features are already available for selection at \(ASL = 6\) suggests that increasing \(ASL\) may have little effect on the predictive accuracy of the QEPSVR model.

6 Conclusion

Following the recommendations of Lorek (2014), we propose our QEPSVR model as a new univariate statistical model for the prediction of quarterly earnings. Empirical evidence shows that under 24 out of 28 tested conditions, the predictive accuracy of the QEPSVR model is significantly higher than that of the state-of-the-art BR ARIMA model. Furthermore, the significant conditions include all those considering one and two-step-ahead predictions (\(PS \in \left\{ {1,2} \right\}\)), as well as those for which only limited historic data is available (\(ASL \in \left\{ {6,7,8} \right\}\)). The experimental results therefore advocate using the QEPSVR model instead of the BR ARIMA model for short-term operational planning, for recently founded companies and for companies that have recently made fundamental changes to their business model.
Since the hyperparameters and model parameters of the QEPSVR model are determined from the historic data of multiple companies, further research is needed to understand how the choice of these companies affects predictive accuracy. Factors of interest include the industry, size, and diversity of companies, as well as their relationship to those companies for which predictions are to be made. Other areas of research include studying the effect of condition-specific hyperparameter optimization and exploring methods for combining the QEPSVR model with other forecasting methods. As an example, predictions of the QEPSVR model could be combined with analysts’ forecasts, building upon the research of Elgers et al. (2016).

Acknowledgements

Open Access funding provided by Projekt DEAL.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Footnotes

1. All data is obtained from publicly available sources. The data and code for the experiments is accessible at the following GitHub repository: https://github.com/fischja/quarterly-earnings-predictions.
References

Brown LD, Rozeff MS (1979) Univariate time-series models of quarterly accounting earnings per share: a proposed model. J Account Res 17:179–189
Cover TM, Thomas JA (1991) Elements of information theory. Wiley series in telecommunications. Wiley, New York
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer series in statistics. Springer, New York
Karush W (1939) Minima of functions of several variables with inequalities as side constraints. Master's thesis, University of Chicago
Kuhn HW, Tucker AW (1951) Nonlinear programming. In: The Regents of the University of California
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT, Cambridge
United Nations (2008) International Standard industrial classification of all economic activities (ISIC). Statistical papers, Series M, no. 4, Rev. 4. United Nations, New York
Vapnik V (1995) The nature of statistical learning theory. Springer, New York