
2013 | Book

The Gini Methodology

A Primer on a Statistical Methodology


About this book

Gini's mean difference (GMD) was first introduced by Corrado Gini in 1912 as an alternative measure of variability. GMD and the parameters which are derived from it (such as the Gini coefficient or the concentration ratio) have been in use in the area of income distribution for almost a century. In practice, the use of GMD as a measure of variability is justified whenever the investigator is not ready to impose, without questioning, the convenient world of normality. This makes the GMD of critical importance in the complex research of statisticians, economists, econometricians, and policy makers.

This book focuses on imitating analyses that are based on variance by replacing variance with the GMD and its variants. In this way, the text showcases how almost everything that can be done with the variance as a measure of variability can be replicated using the Gini. Beyond this, there are marked benefits to utilizing Gini as opposed to other methods. One of the advantages of using Gini methodology is that it provides a unified system that enables the user to learn about various aspects of the underlying distribution. It also provides a systematic method and a unified terminology.

Using Gini methodology can reduce the risk of imposing on the model assumptions that are not supported by the data. With these benefits in mind, the text uses the covariance-based approach, though applications to other approaches are mentioned as well.

Table of Contents

Frontmatter

Introduction

Chapter 1. Introduction
Abstract
Gini’s mean difference (hereafter, GMD) was first introduced by Corrado Gini in 1912 as an alternative measure of variability. GMD and the parameters which are derived from it (such as the Gini coefficient, also referred to as the concentration ratio) have been in use in the area of income distribution for almost a century, and there is evidence that the GMD was introduced even earlier (Harter, 1978). In other areas it seems to make sporadic appearances and to be “rediscovered” again and again under different names. It turns out that GMD has at least 14 different alternative representations. Each representation can be given its own interpretation and naturally leads to a different analytical tool such as L1 metric, order statistics theory, extreme value theory, concentration curves, and more. Some of the representations hold only for nonnegative variables while others need adjustments for handling discrete distributions. On top of that, the GMD was developed in different areas and in different languages. Corrado Gini himself mentioned this difficulty (Gini, 1921). Therefore in many cases even an experienced expert in the area may fail to identify a Gini when he or she sees one.
Shlomo Yitzhaki, Edna Schechtman

Theory

Frontmatter
Chapter 2. More Than a Dozen Alternative Ways of Spelling Gini
Abstract
Gini’s mean difference (GMD) as a measure of variability has been known for over a century. It has more than 14 alternative representations. Some of them hold only for continuous distributions while others hold only for nonnegative variables. It seems that the richness of alternative representations and the need to distinguish among definitions that hold for different types of distributions are the main causes for its sporadic reappearances in the statistics and economics literature as well as in other areas of research. An exception is the area of income inequality, where it holds the position of the most popular measure of inequality. GMD was “rediscovered” several times (see, for example, Chambers & Quiggin, 2007; David, 1968; Jaeckel, 1972; Jurečková, 1969; Olkin & Yitzhaki, 1992; Kőszegi & Rabin, 2007; Simpson, 1949) and has been used by investigators who did not know that they were using a statistic which was a version of the GMD. This is unfortunate, because by recognizing that a GMD is being used the researcher could save time and research effort and use the already known properties of GMD.
Shlomo Yitzhaki, Edna Schechtman
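To make the idea of alternative representations concrete, here is a minimal Python sketch that computes the sample GMD in two ways: as the average absolute difference over all pairs of distinct observations (an L1-type form) and as a weighted sum of order statistics. The function names are illustrative and do not come from the book, and the n(n-1) divisor is only one common convention for the sample version.

```python
from itertools import combinations


def gmd_pairwise(values):
    """Mean absolute difference over all pairs of distinct observations."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(abs(a - b) for a, b in combinations(values, 2))
    return 2.0 * total / (n * (n - 1))


def gmd_order_statistics(values):
    """The same quantity written as a weighted sum of the sorted observations."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    # The i-th smallest value (1-based) receives the weight 2i - n - 1.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * (n - 1))


incomes = [1, 2, 3, 4]
print(gmd_pairwise(incomes))
print(gmd_order_statistics(incomes))
```

Both functions return the same number for any sample. The second needs only a sort rather than a pass over all pairs, which illustrates why one representation can be more convenient than another.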
Chapter 3. The Gini Equivalents of the Covariance, the Correlation, and the Regression Coefficient
Abstract
Given two random variables, one may be interested in the correlation or association or concordance between them (Gili & Bettuzzi 1985). This purpose can be generalized by following Daniels who stated the target as “the degree of agreement” (Daniels, 1950, p. 171) between the order and the rank-order of two variables.
Shlomo Yitzhaki, Edna Schechtman
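As an illustration of a Gini-based measure of association, the sketch below uses a commonly cited form of the Gini correlation, cov(X, F(Y)) / cov(X, F(X)). This formula is an assumption here, since the abstract does not state it. Ranks stand in for the cumulative distribution F, because the scaling cancels in the ratio. All function names are hypothetical.

```python
def mid_ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    first = {}
    count = {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2.0 for v in values]


def covariance(a, b):
    """Covariance with divisor n (the divisor cancels in the ratio below)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def gini_correlation(x, y):
    """Assumed form: cov(x, rank(y)) / cov(x, rank(x))."""
    denominator = covariance(x, mid_ranks(x))
    if denominator == 0:
        raise ValueError("x has no variability")
    return covariance(x, mid_ranks(y)) / denominator


x = [1, 2, 3, 10]
y = [2, 1, 3, 4]
print(gini_correlation(x, y))
print(gini_correlation(y, x))
```

Unlike Pearson's correlation, this measure is not symmetric: swapping the roles of x and y generally changes the value, as the two printed numbers show.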
Chapter 4. Decompositions of the GMD
Abstract
Several basic methods of statistical analysis such as regression and analysis of variance are based on the properties of the decomposition of the measure of variability. In the decomposition of a measure of variability we differentiate between two kinds of decompositions:
Shlomo Yitzhaki, Edna Schechtman
Chapter 5. The Lorenz Curve and the Concentration Curve
Abstract
The Lorenz and the concentration curves play important roles in the areas of GMD and the related measures such as Gini covariance, Gini correlation, Gini regression, and more. In this chapter we introduce the curves, discuss their properties, and show their connections to the Gini world. In addition, in order to be able to analyze the parallel concepts that are common in the variance world we investigate the equivalents of the Lorenz and the concentration curves that are relevant to the variance and the covariance, respectively. Those parallel curves share some properties among themselves. Therefore one can deduce from the concentration curve some properties of the covariance and not only of the Gini covariance. In addition we present the relationships between the concepts of second-degree stochastic dominance and welfare dominance on one hand and the concentration and Lorenz curves on the other hand. These relationships enable the Gini methodology to serve as an analytical tool for statistical analyses and to be compatible with economic theory, a property that holds for the variance as well, but only for specific distributions.
Shlomo Yitzhaki, Edna Schechtman
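The two curves can be sketched for a finite sample as follows. The code treats the Lorenz curve as the concentration curve of a variable with respect to itself and recovers the Gini coefficient as one minus twice the area under the Lorenz curve. This is a minimal illustration with hypothetical function names. It assumes nonnegative values with a positive total, and it breaks ties in the ordering variable by input order, which is an arbitrary choice.

```python
def concentration_points(y, x):
    """Cumulative share of y when observations are ordered by x.

    Returns (p, share) pairs starting at (0, 0).
    """
    n = len(y)
    total = sum(y)
    if n == 0 or total == 0:
        raise ValueError("need observations with a nonzero total")
    order = sorted(range(n), key=lambda i: x[i])
    points = [(0.0, 0.0)]
    running = 0.0
    for k, i in enumerate(order, start=1):
        running += y[i]
        points.append((k / n, running / total))
    return points


def lorenz_points(x):
    """Lorenz curve: the concentration curve of x with respect to itself."""
    return concentration_points(x, x)


def area_under(points):
    """Trapezoid rule for a piecewise-linear curve."""
    return sum((p1 - p0) * (s0 + s1) / 2.0
               for (p0, s0), (p1, s1) in zip(points, points[1:]))


def gini_coefficient(x):
    """One minus twice the area under the sample Lorenz curve."""
    return 1.0 - 2.0 * area_under(lorenz_points(x))


print(lorenz_points([0, 0, 0, 4]))
print(gini_coefficient([0, 0, 0, 4]))
```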
Chapter 6. The Extended Gini Family of Measures
Abstract
The GMD has many alternative presentations. Some of these alternative presentations can be extended into families of variability measures and the GMD can be viewed as one member of such a family. The fact that there are several alternative presentations implies that one can present the GMD and the Gini coefficient as belonging to several alternative families. These families differ in the properties they have. We do not intend to survey the properties of all possible families. We choose to concentrate on one family that is useful in several fields of applications. We will refer to it as the extended Gini family. However, the reader should keep in mind that for different fields of applications one may want to have alternative extensions.
Shlomo Yitzhaki, Edna Schechtman
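One way to picture such a family is through a single parameter that changes the weight attached to different parts of the distribution. The sketch below assumes the covariance-type form -ν·cov(X, [1 - F(X)]^(ν-1)) with ν > 1. This form is not given in the abstract and is used purely for illustration; the parameter name `nu`, the midpoint empirical distribution function, and the function names are likewise assumptions. With ν = 2 the expression reduces to 2·cov(X, F(X)).

```python
def empirical_cdf(values):
    """Midpoint empirical CDF, (position - 0.5) / n, returned in input order.

    Tied values receive different positions; this is a simplification
    made for the sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cdf = [0.0] * n
    for position, i in enumerate(order, start=1):
        cdf[i] = (position - 0.5) / n
    return cdf


def covariance(a, b):
    """Covariance with divisor n."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def extended_gini(values, nu):
    """Assumed form: -nu * cov(x, (1 - F(x)) ** (nu - 1)), for nu > 1."""
    if nu <= 1:
        raise ValueError("nu must exceed 1 in this sketch")
    cdf = empirical_cdf(values)
    weights = [(1.0 - f) ** (nu - 1) for f in cdf]
    return -nu * covariance(values, weights)


data = [1, 2, 3, 4]
for nu in (1.5, 2, 3, 5):
    print(nu, extended_gini(data, nu))
```

Larger values of `nu` put more weight on the lower part of the distribution, which is why different fields of application may prefer different members of the family.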
Chapter 7. Gini Simple Regressions
Abstract
The basic building block in regression is the covariance between the dependent variable and the explanatory variable(s). There are two regression methods that can be interpreted as based on Gini’s Mean Difference (GMD). The first method is based on the fact that one can present the Gini-covariance between the dependent variable and the explanatory variable as a weighted sum of slopes of the regression curve (a semi-parametric approach). The second method is based on the minimization of the GMD of the residuals. The semi-parametric approach is similar in its structure to the Ordinary Least Squares (OLS) method. That is, the regression coefficient in the OLS has an equivalent term in the Gini semi-parametric regression. The equivalent term is constructed by substituting the covariance and the variance in the OLS regression by the Gini-covariance (hereafter co-Gini) and the Gini, respectively. However, unlike the OLS, the Gini regression coefficient and its estimator are not derived by solving a minimization problem. Therefore they do not have optimality properties and cannot be described as “the best,” at least not with respect to a simple target function. On the other hand, the second method, the minimization of the GMD of the residuals implies optimality but it has its drawbacks. Like Mean Absolute Deviation (MAD) and quantile regressions, the regression coefficient does not have an explicit presentation and can be calculated only numerically. The combination of the two methods of Gini regression enables the user to investigate the appropriateness of the assumptions that lie behind the OLS and Gini regressions (e.g., the linearity of the relationship) and therefore can improve the quality of the conclusions that are derived from them. Moreover, when dealing with a multiple regression one can combine the semi-parametric regression method with the OLS regression method. 
That is, several explanatory variables can be treated as in the OLS, while others are treated using the Gini method. This flexibility enables one to evaluate the effect of the choice of a regression method on the estimated coefficients in a gradual way by substituting the methodology of the estimation for each explanatory variable in a stepwise way rather than in an “all or nothing” way. This issue will be discussed in Chap. 8.
Shlomo Yitzhaki, Edna Schechtman
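The substitution described in this abstract can be sketched directly: the OLS slope cov(Y, X) / var(X) becomes cov(Y, F(X)) / cov(X, F(X)) in the semi-parametric Gini regression. The code below is a minimal illustration, not the authors' implementation. Ranks of x stand in for F(X), since the scaling cancels in the ratio, and the function names are hypothetical.

```python
def mid_ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    first = {}
    count = {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2.0 for v in values]


def covariance(a, b):
    """Covariance with divisor n (the divisor cancels in both slopes)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def ols_slope(x, y):
    """Ordinary least squares slope: cov(y, x) / var(x)."""
    return covariance(y, x) / covariance(x, x)


def gini_slope(x, y):
    """Semi-parametric Gini slope: cov(y, rank(x)) / cov(x, rank(x))."""
    ranks = mid_ranks(x)
    return covariance(y, ranks) / covariance(x, ranks)


# One extreme value of x pulls the two slopes apart.
x = [1, 2, 3, 4, 100]
y = [1, 2, 3, 4, 5]
print(ols_slope(x, y))
print(gini_slope(x, y))
```

On exactly linear data the two slopes coincide. In the example above they differ because the Gini slope weights observations by the rank of x rather than by its distance from the mean.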
Chapter 8. Multiple Regressions
Abstract
The purpose of the simple regression is to study the relationship between one explanatory variable and one dependent variable. The purpose of a multiple regression (the term was first used by Pearson and Lee (1908)) is to learn about the relationships between several explanatory variables and a dependent variable. The extension of the model from one explanatory variable into several explanatory variables introduces several complications. For example, in a multiple regression setting one has to consider the effects of the relationships among the explanatory variables on the estimates. On the other hand, an advantage is that one can mix the regression methodologies used (i.e., apply different regression methodologies to different explanatory variables). In this chapter we will be mainly interested in methods of multiple regressions that are based on the simple regression coefficients. By “based on” we mean not only that the multiple regression coefficients are derived by the same principle that is used to derive the simple regression coefficients but also that the simple regression coefficients are used as the building blocks of the multiple regression coefficients. As such, one can learn about their properties from the properties of the simple coefficients. In particular, we have shown in Chap. 7 that the Ordinary Least Squares (OLS) and semi-parametric Gini regression estimators can be interpreted as the slopes of the linear approximations to a regression curve, because they are based on weighted averages of slopes defined between adjacent observations. In other words, the linearity assumption on the regression curve is not used in the estimation stage. This property continues to hold in our extension into the multiple regression case. However, we do introduce some kind of a linearity requirement. 
The linearity requirement differs from the linearity assumption on the model because it is imposed on the set of equations that are used to derive the multiple regression coefficients, as will be seen below.
Shlomo Yitzhaki, Edna Schechtman
Chapter 9. Inference on Gini-Based Parameters: Estimation
Abstract
The population parameters based on Gini were introduced in previous chapters. One of the objectives in practice is to estimate them from a given data set. This is the main objective of this chapter. When dealing with estimation, several issues come to mind. Are the data based on individual observations or are they grouped? Is the sampling procedure based on equal probability or is it a stratified one? Are the variables of interest coming from a continuous or a discrete distribution? The estimation procedures depend on the answers to the above questions. In addition, the Gini-based parameters have various presentations which lead to different estimators, each one being the natural estimator of a specific definition.
Shlomo Yitzhaki, Edna Schechtman
Chapter 10. Inference on Gini-Based Parameters: Testing
Abstract
Chapter 9 dealt with the estimation of the parameters based on the Gini. In this chapter we introduce methods of testing for the parameters that are based on the Gini. Most of the estimators that were derived in Chap. 9 are based on U-statistics or functions of (dependent) U-statistics. The advantage is that we can use known facts about the limiting distributions of U-statistics and of functions of them in order to obtain statistical tests. In what follows we concentrate on the asymptotic normality but do not give explicit formulas for the variances. Instead we suggest estimating the variances using the jackknife method (to be explained below). Therefore, the explicit variances which sometimes have complicated expressions are not needed for the applications.
Shlomo Yitzhaki, Edna Schechtman
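The jackknife idea mentioned in this abstract can be sketched with the standard delete-one jackknife: recompute the estimator with each observation left out in turn and scale the spread of those values. The code below is a generic illustration applied to a pairwise sample GMD. It is not the book's procedure, and the function names are hypothetical.

```python
from itertools import combinations


def gmd(values):
    """Sample GMD: mean absolute difference over pairs of distinct observations."""
    n = len(values)
    total = sum(abs(a - b) for a, b in combinations(values, 2))
    return 2.0 * total / (n * (n - 1))


def jackknife_variance(estimator, values):
    """Delete-one jackknife estimate of the variance of estimator(values)."""
    data = list(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)


sample = [1, 2, 4, 8, 16]
estimate = gmd(sample)
standard_error = jackknife_variance(gmd, sample) ** 0.5
print(estimate, standard_error)
```

Under asymptotic normality, the estimate together with this standard error can feed an approximate test or confidence interval without requiring an explicit variance formula. For the GMD the sample needs at least three observations, so that each delete-one subsample still contains a pair.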
Chapter 11. Inference on Lorenz and on Concentration Curves
Abstract
In a pathbreaking paper, Atkinson (1970) proved several results concerning the ranking of income distributions according to expected values of all concave social welfare functions. One of the important results is that for distributions with equal means, all social welfare functions show the same order of average social welfares (i.e., the same ordering of inequality) if and only if the appropriate Lorenz curves do not intersect. If, on the other hand, the Lorenz curves intersect then it is possible to find two alternative social welfare functions which rank average social welfares differently (to be discussed in Chaps. 13 and 14). This finding by Atkinson has opened the way for using the Lorenz curve as a basic tool in the application of the concept of second-degree stochastic dominance (SSD, to be defined below). This tool allows the analyses of the effects of tax reforms and decision under risk to be applied to a wide group of utility functions, freeing the analysis from the need to specify the utility function. Shorrocks (1983) proved that X dominates Y according to SSD if and only if the absolute Lorenz curve (ALC) of X is not lower than the ALC of Y. This result makes it possible to extend the applications to distributions with different expected values. There are three possible outcomes when comparing two absolute (and relative) Lorenz curves: Lorenz dominance, equivalence, and crossing. Bishop, Chakravarty, and Thistle (1989) extend the works by Gail and Gastwirth (1978), Beach and Davidson (1983), and Gastwirth and Gail (1985) who deal with relative Lorenz curves and suggest a pair-wise multiple comparisons method of sample absolute (generalized) Lorenz ordinates to test for differences.
Shlomo Yitzhaki, Edna Schechtman
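The three possible outcomes can be illustrated descriptively for two samples of equal size. The sketch below computes the sample ALC ordinates at p = k/n as cumulative sums of the sorted values divided by n, and then classifies the comparison. It compares sample curves only and is not a statistical test such as the multiple comparisons method cited above. The function names and the equal-size restriction are assumptions of this sketch.

```python
def absolute_lorenz_ordinates(values):
    """Sample ALC at p = k/n for k = 0..n: cumulative sorted sums over n."""
    n = len(values)
    ordinates = [0.0]
    running = 0.0
    for v in sorted(values):
        running += v
        ordinates.append(running / n)
    return ordinates


def compare_alc(x, y):
    """Classify two equal-size samples by their sample ALCs."""
    if len(x) != len(y):
        raise ValueError("this sketch handles equal sample sizes only")
    pairs = list(zip(absolute_lorenz_ordinates(x), absolute_lorenz_ordinates(y)))
    x_never_below = all(a >= b for a, b in pairs)
    y_never_below = all(b >= a for a, b in pairs)
    if x_never_below and y_never_below:
        return "equivalent"
    if x_never_below:
        return "x dominates"
    if y_never_below:
        return "y dominates"
    return "crossing"


print(compare_alc([2, 3, 4], [1, 3, 4]))
print(compare_alc([1, 5, 6], [2, 3, 4]))
```

With equal sample sizes both curves are piecewise linear with kinks at the same points, so checking the ordinates at k/n is enough to compare the whole sample curves.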

Applications

Frontmatter
Chapter 12. Introduction to Applications
Abstract
The main properties of the Gini mean difference (GMD) and the extended Gini (EG) were presented in the first part of the book. We have concentrated on those properties that enable the user to replicate almost everything that can be done when relying on the variance. In some sense we can claim that (almost) every analysis that is performed when using the variance can be done with the GMD, and sometimes with the EG as well. This means more than doubling the number of possible models that can be used because every variance-based model can be replicated by a Gini-based model. This fact raises the question of whether it is worth pursuing this direction of research and what the pros and cons of using the Gini methodology are. We note that generally speaking when the underlying distribution is multivariate normal then there is nothing to be gained from using the GMD method. The reason is simple: when the underlying distribution is multivariate normal then the estimates of the means, the variances, and the correlations (by Pearson) are sufficient statistics for describing the data. Therefore nothing is gained by using an alternative system for describing the data, while efficiency is lost because the parameters of the normal distribution are estimated in a roundabout way. However, as pointed out by Huber (1981) and Gorard (2005), even a small deviation from the ideal world of the normal distribution can lead to an advantage of using other measures of variability.
Shlomo Yitzhaki, Edna Schechtman
Chapter 13. Social Welfare, Relative Deprivation, and the Gini Coefficient
Abstract
The aim of this chapter is to elaborate on the role of the Gini coefficient in two major competing theories that dominate the theoretical considerations in the area of income distribution. The major theory that dominates the economic thinking with respect to the role of the government in the area of income distribution is Bergson’s (1938) social welfare function approach (hereafter SWF). The other theory that is gaining acceptability among economists is the theory of relative deprivation (hereafter RD).
Shlomo Yitzhaki, Edna Schechtman
Chapter 14. Policy Analysis
Abstract
The objective of this chapter is to introduce the use of the concentration curves and the Gini methodology in the areas of taxation and progressivity of public expenditures. Most of the literature in these areas considers the case of a representative individual, which means that issues of income distributions are ignored and the only issue that is considered is efficiency. For the Gini methodology and concentration curves to be useful we extend the model to include issues of income distributions as discussed by Diamond (1975) and Atkinson and Stiglitz (1980). We start with a crude characterization of optimal (mostly indirect) taxation which includes the issue of redistribution in addition to efficiency considerations. In a typical model the investigator assumes a social welfare function (SWF) and optimizes it subject to the behaviors of the individuals and to the instruments that are used by the government. Using those ingredients she gets the first-order conditions for optimization so that the relationships among the instruments in an optimal setting are determined (see, as a background, Atkinson & Stiglitz, 1980, pp. 386–393). Note that by assuming the existence of an SWF, the issues of horizontal equity and of comparisons of utilities of households with different structures are skipped because the mere existence of an SWF implies that one knows how to rank individuals according to economic well-being.
Shlomo Yitzhaki, Edna Schechtman
Chapter 15. Policy Analysis Using the Decomposition of the Gini by Non-marginal Analysis
Abstract
The objective of this chapter is to demonstrate the usefulness of several decompositions of the Gini (and the EG) in order to analyze the strengths and the weaknesses of various policies. We concentrate on distributional issues. The other component of the problem of tax reform, the estimation of the marginal cost of taxation, is identical to the description given in Chap. 14; hence it will not be repeated here.
Shlomo Yitzhaki, Edna Schechtman
Chapter 16. Incorporating Poverty in Policy Analysis: The Marginal Analysis Case
Abstract
The main purpose of this chapter is to expose the reader to additional tools that can be helpful in analyzing the distributional impact of a governmental policy. Assuming that one accepts the Gini coefficient of after-tax income as representing the social attitude toward the income distribution then one can summarize the effects of actions taken by the government by the Gini income elasticity (GIE). Decomposing the GIE by the contributions of the different sections of the income distribution enables one to both use the Gini as representing the social attitude and at the same time target the policy to sections of the distribution. The decomposition of the GIE presented is actually identical to the decomposition of the Gini regression coefficient applied to the Gini coefficient. The main message is that analyzing the effect of public policy by concentrating only on the poor population is not an appropriate approach because it violates the Pareto principle of efficiency and therefore leads governments and researchers to adopt and recommend policies that contradict the verbal declarations of the targets of the policies. On the other hand, by using a decomposition approach of the Gini coefficient or of the EG coefficient, the policy is consistent with the Pareto principle of efficiency and is based on additional useful information that is thrown away when dealing with traditional poverty analysis. An additional type of decomposition is needed whenever one is interested in targeting. We call a policy a targeted one whenever the policy instrument affects only a portion of the population. In this case we will want to decompose the effect of the policy to the contributions of two instruments: the choice of the subpopulation affected (i.e., targeting) and the effect on the subpopulation affected. The issue of targeting is not covered in this book. We refer the interested reader to Wodon and Yitzhaki (2002a, 2002b).
Shlomo Yitzhaki, Edna Schechtman
Chapter 17. Introduction to Applications of the GMD and the Lorenz Curve in Finance
Abstract
The purpose of this part of the book is to expose the reader to applications of the Gini methodology in financial theory. Those applications are relevant whenever one is interested in decision making under risk or in reducing the incompatibility between financial theory and econometric applications. Risky situations are characterized by having to make decisions without knowing what the exact outcome is going to be. This definition covers almost every decision a person makes.
Shlomo Yitzhaki, Edna Schechtman
Chapter 18. The Mean-Gini Portfolio and the Pricing of Capital Assets
Abstract
Since its development by Markowitz (1952, 1970), the mean-variance (MV) model for portfolio selection has become the standard tool by which risky financial assets are allocated. MV has gained a prominent place in finance because of its conceptual simplicity and ease of computation. Many authors, however, have challenged the model’s assumptions, primarily the normality of the probability distributions of the assets’ returns or the quadraticity of the preferences. MV validity has been reasserted by Levy and Markowitz (1979) and by Kroll, Levy, and Markowitz (1984), who showed that MV faithfully approximates expected utility.
Shlomo Yitzhaki, Edna Schechtman
Chapter 19. Applications of Gini Methodology in Regression Analysis
Abstract
Ordinary least squares (OLS) regression is based on the fact that the variance of a linear combination of random variables can be decomposed into the contributions of the individual variables and the contributions of the correlations among them. The fact that one can imitate this decomposition (under certain conditions) when decomposing the GMD of a linear combination of random variables enables one to take any OLS-based econometric textbook and replicate each chapter using the GMD instead of the variance. Practically, this means doubling the number of models because every OLS econometric model can be replicated by the GMD, resulting in different estimates of the parameters. Moreover, we show via examples (Chap. 21) that the estimates can differ in sign. This means that two investigators who use the same variables, the same model, and the same data may come up with contradicting results concerning the effect of one variable on the other. The only difference between the two researchers lies in the measure of variability they use: the GMD or the variance. Needless to say, in many cases of policy decisions the debate is not on the sign of a parameter but on its magnitude, which is much more vulnerable than the sign. And to make life even more complicated, any regression model that is estimated by the GMD can be replicated with the EG. This means moving from doubling the number of possible estimates to an infinite number of estimates.
Shlomo Yitzhaki, Edna Schechtman
Chapter 20. Gini’s Multiple Regressions: Two Approaches and Their Interaction
Abstract
Our target in this chapter is to illustrate one of the major advantages of the GMD regressions: they offer a complete framework for checking and dealing with some of the assumptions imposed on the data in a multiple regression problem. There are two approaches that are related to the Gini—the semi-parametric approach and the minimization approach. The interaction between the two gives tools for assessing the adequacy of the model. In addition, there are two tools that enable the researcher to investigate the curvature of the regression curve: the extended Gini regression and the NLMA curve. The basic idea is the following: there is an unknown regression curve that relates the dependent variable Y and (all or some out of) a set of explanatory variables X1,…,Xn. The shape of the curve is not known. The curve is approximated by a linear model (which is then estimated from the data). However, each approach mentioned above leads to a (possibly different) linear model. The interaction between the two approaches can help to decide whether the original curve is linear (in each individual explanatory variable) or not. The suggested stages are the following: first one estimates the regression coefficients according to the semi-parametric approach without specifying a linear model. This means that at this stage the researcher decides only on the set of explanatory variables to be included in the regression model but not on the functional form. Then one uses the residuals from the fitted curve and tests whether they fulfill the necessary conditions for the minimization approach (which were obtained assuming linearity) for each explanatory variable separately. If for any given explanatory variable the above conditions are fulfilled; that is, if the hypothesis that the two regression coefficients are equal is not rejected, then one concludes that the regression curve is linear in this variable. Otherwise it is not (see Chap. 7 for details or below for a brief review). 
This property is especially important in regressions with several explanatory variables. It enables the investigator to find a set of variables that allows linear predictions without having to commit to the linearity of the model as a whole. Provided that the linearity hypothesis is not rejected for all explanatory variables one can examine the properties of the residuals such as their distribution, whether it is symmetric around the regression line or not, the serial correlation between them, etc., using the methodologies that will keep the analysis under the Gini framework. Although each stage could be performed by alternative methods, we are not aware of any methodology that can offer a complete set of tests that is governed by a unified framework and therefore offers a method to test the assumptions behind the regression with an internal consistency. We note in passing that the suggested test for linearity does not require replications of observations, as is the case in the common tests for linearity.
Shlomo Yitzhaki, Edna Schechtman
Chapter 21. Mixed OLS, Gini, and Extended Gini Regressions
Abstract
The purpose of this chapter is to illustrate the use of the mixed regression technique. The meaning of mixed regression is that some of the explanatory variables are treated according to one regression method, while the others are treated according to another method. We extend this definition and include EG regressions for which different EG parameters may be attached to the different explanatory variables. Like any inbreeding the mixed regression does not have “pure” properties. Therefore the purpose for using it needs to be explained and justified.
Shlomo Yitzhaki, Edna Schechtman
Chapter 22. An Application in Statistics: ANOGI
Abstract
This chapter deals with applications of the GMD and the Gini coefficient in statistics. It presents an application which replicates the ANalysis Of VAriance (ANOVA) and is referred to as ANalysis Of GIni (ANOGI).
Shlomo Yitzhaki, Edna Schechtman
Chapter 23. Suggestions for Further Research
Abstract
Throughout the book we have stressed several properties that distinguish between the GMD and the variance, claiming that those properties give an advantage to using the GMD over the variance, in cases where the assumption of normality is not supported by the data. Among those properties are the following:
Shlomo Yitzhaki, Edna Schechtman
Backmatter
Metadata
Title
The Gini Methodology
Written by
Shlomo Yitzhaki
Edna Schechtman
Copyright year
2013
Publisher
Springer New York
Electronic ISBN
978-1-4614-4720-7
Print ISBN
978-1-4614-4719-1
DOI
https://doi.org/10.1007/978-1-4614-4720-7
