2 Methodology
Several approaches to inference about the tail index \(\zeta \) of heavy-tailed distributions are available in the literature (see the reviews in Embrechts et al. 1997; Beirlant et al. 2004; Ch. 7 in McNeil et al. 2005; Ibragimov et al. 2015, and references therein). The two most commonly used are Hill’s estimator and the OLS approach based on the log–log rank-size regression.
Let \(X_{1}, X_{2}, {\ldots }, X_{N}>0\) be a sample from a population satisfying power law (1) (e.g., a sample of household income or wealth levels). Further, for a tail truncation level \(n<N\), let
$$\begin{aligned} X_{(1)}\ge X_{(2)}\ge \ldots \ge X_{(n)}\ge X_{(n+1)} \end{aligned}$$
(4)
be the largest (extreme) observations in the sample, ordered decreasingly (that is, the \(n+1\) upper order statistics of the sample). In practice, the number \(n\) of extreme observations (4) used in estimation of the tail index is typically taken to be some fraction (e.g., \(m\%=10\%\), 5% or 1%) of the total sample size \(N\): \(n=mN/100\) (see the discussion in Gabaix and Ibragimov 2011; Ibragimov et al. 2013, 2015, and references therein).
Hill’s estimator \(\hat{\zeta }_{Hill} \) of the tail index \(\zeta \) is given by (see, among others, Embrechts et al. 1997; Drees et al. 2000; Beirlant et al. 2004; Gabaix 2008; Ibragimov et al. 2013, 2015, and references in those works)
\(\hat{\zeta }_{Hill} =n/\mathop \sum \nolimits _{t=1}^n \left[ {\log (X_{\left( t \right) } )-\log \left( {X_{\left( {n+1} \right) } } \right) } \right] \). The standard error of the estimator is
\(s.e._{Hill} =\hat{\zeta }_{Hill} /\sqrt{n}\). The corresponding 95%-confidence interval for the true (unknown) tail index
\(\zeta \) is thus
\(\Big ( \hat{\zeta }_{Hill} -1.96\hat{\zeta }_{Hill} /\sqrt{n},\hat{\zeta }_{Hill} +1.96\hat{\zeta }_{Hill} /\sqrt{n} \Big ).\)
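As a minimal sketch of the computation just described, the Python function below evaluates Hill’s estimator, its standard error and the 95% confidence interval from a list of positive observations. The function name `hill_estimate`, its argument `m_percent` and the rounding of \(n=mN/100\) down to an integer are illustrative assumptions and are not taken from the paper.

```python
import math


def hill_estimate(sample, m_percent=10.0):
    """Hill's tail index estimate, its standard error and a 95% confidence interval.

    `sample` holds positive observations; `m_percent` is the tail truncation
    level m% (an illustrative default), so that n = m*N/100 rounded down.
    """
    xs = sorted(sample, reverse=True)        # X_(1) >= X_(2) >= ... >= X_(N)
    N = len(xs)
    n = int(m_percent * N / 100)             # number of extreme observations
    if not 1 <= n < N:
        raise ValueError("the truncation level must satisfy 1 <= n < N")
    threshold = math.log(xs[n])              # log X_(n+1); xs is 0-indexed
    # Average of log X_(t) - log X_(n+1) over t = 1, ..., n
    mean_log_excess = sum(math.log(x) - threshold for x in xs[:n]) / n
    zeta = 1.0 / mean_log_excess             # n divided by the sum of log excesses
    se = zeta / math.sqrt(n)                 # s.e._Hill
    return zeta, se, (zeta - 1.96 * se, zeta + 1.96 * se)
```

A call such as `hill_estimate(incomes, m_percent=5.0)`, with `incomes` a hypothetical list of income levels, returns the point estimate, the standard error and the interval end points in that order.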
A number of studies report that inference on the tail index based on Hill’s estimator suffers from several problems, including sensitivity to dependence in the data and to deviations from power laws in the form of additional slowly varying factors in (1), as well as poor finite-sample properties (see, for instance, Embrechts et al. 1997, Ch. 6). Motivated by these problems, several studies have focused on alternative approaches to tail index estimation. For instance, Huisman et al. (2001) propose a weighted analogue of Hill’s estimator that is reported to correct its small-sample bias for sample sizes below 1000. Embrechts et al. (1997), among others, advocate sophisticated nonlinear procedures for tail index estimation.
In addition to Hill’s estimates of tail indices of income distributions, we also provide tail index estimates obtained using modifications of log–log rank-size regressions with shifts in ranks recently proposed in Gabaix and Ibragimov (
2011) (see also Ibragimov et al.
2015). These estimation procedures use the optimal shifts in ranks and the correct standard errors obtained in Gabaix and Ibragimov (
2011).
Despite the availability of more sophisticated methods, a popular way to estimate the tail index
\(\zeta \) is still to run the following OLS log–log rank-size regression with
\(\lambda = 0\):
\(\log \left( {t-\lambda } \right) =a-b\cdot \log \left( {X_{\left( t \right) } } \right) ,\ t=1,\ldots ,n,\) or, in other words, calling \(t\) the rank of an observation and \(X_{(t)}\) its size: \(\log \left( {Rank-\lambda } \right) =a-b\cdot \log \left( {Size} \right) \) (here and throughout the rest of the paper, \(\log (\cdot )\) stands for the natural logarithm). The reason for the popularity of the OLS approach to tail index estimation is arguably the simplicity and robustness of this method, including its robustness to dependence and deviations from power laws (see the review in Gabaix and Ibragimov
2011; Ibragimov et al.
2013,
2015, and the discussion at the end of this section).
Unfortunately, tail index estimation procedures based on OLS log–log rank-size regressions with \(\lambda = 0\) are strongly biased in small samples. Gabaix and Ibragimov (2011) provide a simple practical remedy for this bias and argue that, if one wants to use an OLS regression approach to tail index estimation, one should use \(Rank-1/2\) in place of the rank and run \(\log \left( {Rank-1/2} \right) =a-b\cdot \log \left( {Size} \right) ,\) that is,
$$\begin{aligned} \log \left( {t-1/2} \right) =a-b\cdot \log (X_{( t )} ),\ t=1,\ldots ,n. \end{aligned}$$
(5)
In (
5), one takes the OLS estimate
\(\hat{b}\) as the log–log rank-size regression estimate
\(\hat{\zeta }_{RS} \) of the tail index
\(\zeta \). The shift of 1/2 is optimal and reduces the bias to a leading order. The standard error of the estimator
\(\hat{\zeta }_{RS} \) is
\(s.e._{RS} =\sqrt{2}\hat{\zeta }_{RS} /\sqrt{n}\) (the standard error is thus different from the OLS standard error given by
\(\hat{\zeta }_{RS} /\sqrt{n}\)). The corresponding correct 95% confidence interval for
\({\zeta }\) is thus
\(\left( {\hat{\zeta }_{RS} -1.96\sqrt{2}\hat{\zeta }_{RS} /\sqrt{n},\hat{\zeta }_{RS} +1.96\sqrt{2}\hat{\zeta }_{RS} /\sqrt{n}} \right) .\)
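Regression (5) and the standard error \(\sqrt{2}\hat{\zeta }_{RS} /\sqrt{n}\) can be sketched in a few lines of Python. The function name `rank_size_estimate`, the arguments `m_percent` and `shift`, and the rounding of \(n=mN/100\) down to an integer are assumptions made for this illustration; the OLS slope is computed by hand so that the block needs only the standard library.

```python
import math


def rank_size_estimate(sample, m_percent=10.0, shift=0.5):
    """Tail index estimate from log(rank - shift) = a - b*log(size) on the n largest values.

    Returns the estimate b, the standard error sqrt(2/n)*b and the
    corresponding 95% confidence interval.
    """
    xs = sorted(sample, reverse=True)                    # X_(1) >= X_(2) >= ...
    N = len(xs)
    n = int(m_percent * N / 100)                         # number of extreme observations
    if not 2 <= n <= N:
        raise ValueError("need at least two observations in the tail")
    x = [math.log(v) for v in xs[:n]]                    # log(Size)
    y = [math.log(t - shift) for t in range(1, n + 1)]   # log(Rank - shift)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    zeta = -sxy / sxx                                    # b is minus the OLS slope
    se = math.sqrt(2.0 / n) * zeta                       # not the conventional OLS standard error
    return zeta, se, (zeta - 1.96 * se, zeta + 1.96 * se)
```

Setting `shift=0.0` gives the unshifted regression with \(\lambda =0\), which makes it easy to compare the two variants on the same data.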
Numerical results in Gabaix and Ibragimov (
2011) (see also the discussion in Ibragimov et al.
2015) demonstrate the advantage of the proposed approach over the standard OLS estimation procedures with
\(\lambda = 0\) and indicate that it performs well under autocorrelated processes driven by innovations with deviations from power laws in the form of additional slowly varying factors in (
1) and, importantly, for dependent heavy-tailed processes, including GARCH models that are often used as frameworks for modeling the dynamics of dependent heavy-tailed economic and financial time series such as financial returns and foreign exchange rates. The modifications of the OLS log–log rank-size regressions with the optimal shift
\(\lambda =1/2\) and the correct standard errors provided by Gabaix and Ibragimov (
2011) were subsequently used in a number of works in economics and finance (see the discussion and references in Gabaix and Ibragimov
2011; Ibragimov et al.
2015).
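The small-sample bias of the regression with \(\lambda =0\) can be explored with a short Monte Carlo experiment. The sketch below is not the simulation design of Gabaix and Ibragimov (2011): it assumes i.i.d. draws from an exact Pareto distribution, uses all observations of each sample in the regression, and the values `zeta=1.0`, `n=50`, `replications=2000` and the seed are arbitrary illustrative choices.

```python
import math
import random


def rank_size_slope(sample, shift):
    """OLS estimate b in log(rank - shift) = a - b*log(size), using all observations."""
    xs = sorted(sample, reverse=True)
    n = len(xs)
    x = [math.log(v) for v in xs]
    y = [math.log(t - shift) for t in range(1, n + 1)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return -sxy / sxx


def mean_bias(zeta=1.0, n=50, replications=2000, seed=12345):
    """Average estimation error of the regressions with shift 0 and shift 1/2."""
    rng = random.Random(seed)
    bias_zero = bias_half = 0.0
    for _ in range(replications):
        # Exact Pareto sample: P(X > x) = x^(-zeta) for x >= 1
        sample = [rng.paretovariate(zeta) for _ in range(n)]
        bias_zero += rank_size_slope(sample, 0.0) - zeta
        bias_half += rank_size_slope(sample, 0.5) - zeta
    return bias_zero / replications, bias_half / replications


bias_zero, bias_half = mean_bias()
print(f"mean bias with shift 0:   {bias_zero:+.4f}")
print(f"mean bias with shift 1/2: {bias_half:+.4f}")
```

Both regressions are run on the same simulated samples, so the difference between the two averages reflects the shift alone rather than sampling noise across experiments.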
4 Empirical results
Table
3 provides the tail index estimates
\(\hat{\zeta }_{RS} \) for the income distribution in Russia obtained using log–log rank-size regression (
5) with the optimal shift
\(\lambda = 1/2\) and the correct standard errors
\(s.e._{RS} =\sqrt{2}\hat{\zeta }_{RS} /\sqrt{n}\), as discussed in Sect.
2. The table also provides the correct 95% confidence intervals for the true tail indices
\(\zeta \) in (
1) constructed using these standard errors. The last three columns of Table
3 provide Hill’s tail index estimates
\(\hat{\zeta }_{Hill} \), their standard errors
\(s.e._{Hill} =\hat{\zeta }_{Hill} /\sqrt{n}\) and the corresponding 95% confidence intervals for the tail indices
\(\zeta \).
The inference results for Russia in Table 3 are presented for the number \(n\) of extreme observations (4) used in estimation equal to \(m\%=10\%\) and 5% of the total sample size \(N\): \(n=mN/100\) (see Sect. 2).
The results in Table
3 for the income distribution in Russia are largely in agreement with the empirical results on the tail indices
\(\zeta \in (1.5, 3)\) for income distribution in developed economies (Sect.
1.1). Namely, all the log–log rank-size regression point estimates
\(\hat{\zeta }_{RS} \) and Hill’s estimates
\(\hat{\zeta }_{Hill} \) in the table are very close to the value
\(\zeta =3\). Most of these point estimates are slightly smaller than 3 and thus belong to the benchmark interval
\(\zeta \in (1.5, 3)\).
Similarly, most of the confidence intervals constructed using the estimates \(\hat{\zeta }_{RS} \) and \(\hat{\zeta }_{Hill} \) in the table either lie inside the interval (1.5, 3) or have the larger part of their length inside it. Furthermore, this is the case for the tail index estimates and the corresponding confidence intervals constructed using different tail truncation levels (
\(m\%=10\%\), 5% and 1% of the total number
N of observations). For instance, according to the confidence intervals in the last two rows of Table
3 constructed using the estimates
\(\hat{\zeta }_{RS} \) and
\(\hat{\zeta }_{Hill} \) for different truncation levels
\(m\%\), the tail index
\(\zeta \) of the income distribution in Russia in the 4th quarter of 2007 satisfies
\(\zeta \in (2.7, 3.2)\) at the 95% confidence level. Similar conclusions also hold for the other time periods in the table.
Importantly, the left end points of
all the confidence intervals in Table
3 are greater than 2. That is, the null hypothesis
\(H_{0}:\zeta =2\) is rejected in favor of
\(H_{a}\):
\(\zeta >2\) at the 2.5% significance level for all the time periods dealt with. These conclusions thus imply that the variance of the income distribution in Russia is finite.
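The link between the confidence interval and the one-sided test can be spelled out as follows. This is a sketch that assumes the estimator \(\hat{\zeta }\) is approximately normal with standard error \(s.e.\) (either \(s.e._{Hill}\) or \(s.e._{RS}\) from Sect. 2); the symbol \(Z\) for a standard normal random variable is introduced here only for illustration.
$$\begin{aligned} \hat{\zeta }-1.96\, s.e.>2 \iff \frac{\hat{\zeta }-2}{s.e.}>1.96,\qquad P(Z>1.96)\approx 0.025. \end{aligned}$$
Hence a 95% confidence interval whose left end point exceeds 2 corresponds to rejecting \(H_{0}:\zeta =2\) against \(H_{a}:\zeta >2\) in a one-sided test of approximate size 2.5%.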
Similar to the point estimates
\(\hat{\zeta }\), the right-end points of all the confidence intervals in Table
3 are close to the right boundary (=3) of the interval
\(\zeta \in (1.5, 3)\) in developed markets. Thus, similar to the case of developed markets (see the discussion in Sect.
1.1), the third moment is likely to be infinite for the income distribution in Russia.
In addition, the right-end points of all the confidence intervals are smaller than 4. This implies that the null hypothesis \(H_{0}:\zeta =4\) is rejected in favor of \(H_{a}:\zeta <4\) at the 2.5% significance level in all of the time periods in the table. Consequently, similar to the case of developed economies, the income distribution in Russia has an infinite fourth moment.
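The moment conclusions above rest on a standard property of power laws. The sketch below assumes that (1) has the usual form \(P(X>x)\sim Cx^{-\zeta }\) for large \(x\) with a constant \(C>0\); this form, the moment order \(p>0\) and the large threshold \(x_{0}\) are assumptions of the illustration, since relation (1) is stated in the introduction.
$$\begin{aligned} E\left[ X^{p}\right] =\int _{0}^{\infty }p\,x^{p-1}P(X>x)\,dx,\qquad \int _{x_{0}}^{\infty }p\,C\,x^{p-1-\zeta }\,dx<\infty \iff p<\zeta . \end{aligned}$$
Thus \(\zeta >2\) gives a finite second moment (and hence a finite variance), whereas \(\zeta <3\) and \(\zeta <4\) give infinite third and fourth moments, respectively.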
The qualitative agreement of the results in Table
3 with those for developed economies in the literature indicates that the likelihood of observing very large income values and outliers in income distribution in Russia is similar to that in developed countries. Importantly, the above similarity conclusions are in contrast to the case of foreign exchange rates, where, according to the analysis in Ibragimov et al. (
2013) discussed in Sect.
1.3, heavy-tailedness properties in emerging markets are more pronounced compared to developed economies. Furthermore, the estimates obtained in this section indicate that the tail index of Russian income distribution is greater than 2 and, thus, the distribution has finite variance. As discussed in the introduction, this conclusion is especially important since it justifies applicability and validity of the standard OLS regression methods in the analysis of models involving the income data.
Similar to Figures 3.1–3.3 and the discussion in Section 3 in Ibragimov et al. (
2013) and Section 3.2 in Ibragimov et al. (
2015), in order to illustrate the appropriateness of the tail truncation levels (
\(m\%=5\%\) and 10% of the total number
N of observations) used in this section for income distribution in Russia (see Table
3), we follow the analysis and suggestions in Embrechts et al. (
1997) and Mikosch and Stariča (
2000) and present the analogues of Hill’s plots for the log–log rank-size regression tail index estimates for the Russian income distribution in the 4th quarter of 2007 (Figure
1). These are graphs of the log–log rank-size regression point estimates
\(\hat{\zeta }_{RS} \) of the tail index of income distribution in Russia in that period for different truncation levels (
\(n=mN/100\) largest observations, that is, \(m\%\) of the total sample size \(N\), used in tail index estimation). Figure 1 also presents the corresponding 95% confidence intervals for the true tail index of the income distribution computed using the estimated log–log rank-size regressions. The diagram points to the relative stability of the log–log rank-size regression tail index estimates across different truncation levels. In particular, similar to Figures 3.1–3.3 in Ibragimov et al. (
2013), we note that the 95% confidence intervals for the true tail index in the diagram constructed using log–log regressions with different truncation levels intersect. This means that the tail indices in corresponding power law approximations (
1) for the tails of income distribution in Russia are statistically indistinguishable. In particular, the above conclusions on statistical tests for the tail index of the Russian income distribution (e.g., whether
\(\zeta \in (1.5, 3), \zeta >2\) and
\(\zeta <4\)) remain the same regardless of truncation levels for the log–log rank-size regression tail index estimates and the corresponding confidence intervals the tests are based on.
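The data behind such a plot can be sketched as follows: the code recomputes the estimate from regression (5) and its 95% confidence interval over a grid of truncation levels and checks whether the intervals share a common point. The function names, the grid of levels from 1% to 10% and the integer rounding of \(n=mN/100\) are illustrative assumptions and do not reproduce the exact construction of Figure 1.

```python
import math


def rank_size_ci(xs_desc, n):
    """Estimate and 95% CI from regression (5) on the n largest values of a descending list."""
    x = [math.log(v) for v in xs_desc[:n]]               # log(Size)
    y = [math.log(t - 0.5) for t in range(1, n + 1)]     # log(Rank - 1/2)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    zeta = -sxy / sxx
    se = math.sqrt(2.0 / n) * zeta
    return zeta, zeta - 1.96 * se, zeta + 1.96 * se


def truncation_path(sample, m_levels=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)):
    """Rows (m, n, estimate, lower, upper) for each truncation level m%."""
    xs_desc = sorted(sample, reverse=True)
    N = len(xs_desc)
    path = []
    for m in m_levels:
        n = m * N // 100                                 # n = mN/100, rounded down
        if n >= 2:                                       # a regression needs two points
            path.append((m, n) + rank_size_ci(xs_desc, n))
    return path


def intervals_intersect(path):
    """True if all confidence intervals along the path share a common point."""
    return max(row[3] for row in path) < min(row[4] for row in path)
```

Plotting the third to fifth entries of each row against `m` gives the analogue of a Hill plot. Note that `intervals_intersect` checks for a point common to all intervals, which is a stricter condition than pairwise overlap of neighbouring intervals.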
5 Conclusion and further research
Emerging and developing economies are likely to be more volatile than their developed counterparts and subject to more extreme external and internal shocks. The higher degree of volatility leads to the expectation that heavy-tailedness properties and distributions of key variables in these markets, including income and wealth, may differ from those in developed economies. However, the results obtained in this paper point to important and somewhat surprising similarities between the heavy-tailedness properties of (and corresponding conclusions on upper-tail inequality in) income distribution in Russia and those in developed markets. This is in contrast to the case of foreign exchange rates, where, according to the analysis in Ibragimov et al. (
2013), heavy-tailedness properties in emerging markets are more pronounced compared to developed economies. According to the empirical results in the working paper version Ibragimov and Ibragimov (
2010) of this work, similar conclusions also hold for income distributions in several CIS economies. Among other implications, the analysis in the paper points to the need for inference methods that are robust to heavy tails, infinite moments, heterogeneity and outliers in the analysis of income and wealth data.
Of key importance is the long-standing problem of quantifying and further explaining the relationship between key macroeconomic variables, such as economic development and mobility in labor markets, and inequality and income distribution, including its behavior in the upper tails. Concerning this research direction, as discussed at the beginning of the paper, the dynamics of inequality is strongly correlated with such important macroeconomic factors as mobility in labor markets and changes in labor market conditions (see Kopczuk et al.
2010, who focus on the analysis of earnings inequality and mobility in the United States in the
\(20\mathrm{th}\) century). In addition, income distribution and its heavy-tailedness, and the dynamics of inequality, upper-tail inequality and their structure are affected by political and macroeconomic structural breaks such as wars and economic crises (e.g., see Piketty and Saez
2003, for a discussion of the effects of the Great Depression and World War II on top capital incomes and top wage shares in the United States in the last century), changes in government policies and labor market regulations, and other institutional factors. Further research may focus on the effects of economic and financial crises and other structural shocks on (upper-tail) income and wealth inequality in different countries, including Russia and other emerging economies, and on international and global inequality. As in this paper, the analysis of the impact of crises on upper-tail inequality is related to the study of their effects on the heavy-tailedness properties of income and wealth distributions. Such a study may be conducted using robust confidence intervals for tail indices and inequality measures for the pre-crisis and post-crisis periods, constructed as in this paper (see Ibragimov et al.
2013,
2015, for the analysis of the effects of the on-going global crisis on heavy-tailedness characteristics of emerging and developed country exchange rates). Further research topics of key interest include the analysis of economic explanations for the obtained empirical results on heavy-tailedness of income distribution in Russia and other CIS countries. This also concerns observed similarities in heavy-tailedness properties and upper-tail inequality in emerging economies like Russia and those in developed markets, as measured by tail indices of their income distributions.