This book discusses the need to apply regression techniques carefully and prudently in order to obtain their full benefits. It also describes some of the techniques developed and used by the authors, presenting their innovative ideas on the formulation and estimation of regression decomposition models, hidden Markov chains, the contribution of regressors in the set-theoretic approach, calorie poverty rates, and aggregate growth rates. Each of these techniques has applications that address a number of unanswered questions; for example, regression decomposition techniques reveal intra-household gender inequalities in consumption, the intra-household allocation of resources and adult equivalent scales, while hidden Markov chain models can forecast the results of future elections. Most of these procedures are presented using real-world data, and the techniques can be applied in other similar situations. By showing how difficult questions can be answered with simple models whose parameters are easy to interpret, the book is a valuable resource for students and researchers in the field of model building.

### Chapter 1. Introduction to Correlation and Linear Regression Analysis

Abstract
This chapter introduces some concepts of correlation and regression analysis. Correlation comes prior to regression analysis, so the chapter starts with the simple correlation coefficient, which gives the degree of linear relationship between two variables. One should draw a scatter diagram in order to judge whether there is any linear relation between the two variables. The correlation coefficient is invariant under changes in the units of measurement and is also unaffected by changes of origin for both variables. Its value always lies between –1 and +1. As the scatter points move closer to a straight line, the coefficient moves toward –1 or +1, depending on whether the relation is negative or positive. The straight-line relation between the two variables can be found by the Least Squares (LS) method. The goodness of fit of the linear regression can be measured by the square of the simple correlation coefficient. The Multiple Linear Regression Model is an extension of the Simple Linear Regression Model in which there are two or more independent variables. The goodness of fit in this case is measured by the coefficient of determination, which is the square of the multiple correlation coefficient.
Manoranjan Pal, Premananda Bharati
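
As an illustration of the properties listed in this abstract, the following minimal sketch computes the simple correlation coefficient and the least squares line, then checks that the coefficient is unchanged by a change of origin and unit and that its square equals the goodness of fit of the fitted line. The data and function names are made up for the example and are not taken from the book.

```python
import math

def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy, mx, my

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Intercept and slope of the least squares line of y on x."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Goodness of fit: 1 - (residual sum of squares / total sum of squares)."""
    intercept, slope = least_squares_line(x, y)
    _, _, syy, _, _ = _centered_sums(x, y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Change origin and (positive) unit of both variables: r should not change.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
fit_r2 = r_squared(x, y)

print("r =", r, "| r after rescaling =", r_rescaled, "| R^2 of fit =", fit_r2)
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative constant flips the sign of the coefficient.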

### Chapter 2. Regression Decomposition Technique Toward Finding Intra-household Gender Bias of Calorie Consumption

Abstract
From data on the total consumption of households, it is not possible to find the intra-household disparity in the consumption pattern among the members of each household. But if we are interested in estimating certain aspects of consumption at the aggregate level, say the mean calorie consumption of each of the different groups of household members, taking all households into consideration, then it is possible to estimate these using the Generalized Linear Regression Model (GLRM) after some modifications. In this chapter we first discuss the model and the method of estimating its parameters, and then apply the technique to the 61st round National Sample Survey Organization (NSSO) data on consumption to see whether the mean consumption of calories differs between male and female members of the households. When these estimates are compared with the Food and Agriculture Organization (FAO) and Indian Council of Medical Research (ICMR) norms, there is no indication of discrimination against the female members of the households.
Manoranjan Pal, Premananda Bharati
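
The idea of recovering group means from household totals can be sketched with a deliberately simplified model: household calories are regressed, without an intercept, on the number of male and female members, so that each coefficient estimates the mean consumption of one group. This is ordinary least squares on two regressors, not the modified GLRM of the chapter, and the household figures below are hypothetical.

```python
def fit_group_means(males, females, totals):
    """Least squares fit of totals = beta_m * males + beta_f * females.

    Solves the 2x2 normal equations by Cramer's rule and returns
    (beta_m, beta_f), the estimated mean consumption per male and per female.
    """
    s_mm = sum(m * m for m in males)
    s_ff = sum(f * f for f in females)
    s_mf = sum(m * f for m, f in zip(males, females))
    s_my = sum(m * y for m, y in zip(males, totals))
    s_fy = sum(f * y for f, y in zip(females, totals))
    det = s_mm * s_ff - s_mf ** 2
    if det == 0:
        raise ValueError("male and female counts are perfectly collinear")
    beta_m = (s_my * s_ff - s_mf * s_fy) / det
    beta_f = (s_mm * s_fy - s_mf * s_my) / det
    return beta_m, beta_f

# Hypothetical households: counts of male and female members.
males = [1, 2, 1, 3, 2]
females = [2, 1, 1, 2, 3]
# Household totals built from assumed group means (2400 and 2000 kcal),
# so the fit should recover those two numbers.
totals = [2400.0 * m + 2000.0 * f for m, f in zip(males, females)]

beta_m, beta_f = fit_group_means(males, females, totals)
print("estimated mean per male:", beta_m, "| per female:", beta_f)
```

Only household totals and household composition enter the fit; the group means are never observed directly, which is the point of the decomposition.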

### Chapter 3. Estimation of Poverty Rates by Calorie Decomposition Method

Abstract
In this chapter we use the member-wise expected calorie consumption of households to arrive at poverty rates. For a given household, we compute the per capita average expenditure of each member of the household, depending on whether the household belongs to the rural or the urban sector. The weighted sums of the expected calories consumed by the members are then found separately for male and female members of the household, where the weight is the number of members in each category. In a similar manner we obtain the sum of the calorie norms of the members of the household. The calorie norm of the household is compared with its estimated calorie consumption to determine whether the household is poor. A poor household is given the dummy value ‘1’; any other household is given the value ‘0’. Weighted means of these dummy values give the poverty ratios. This calculation is carried out separately for rural and urban India. The urban poverty ratios are found to be higher than the corresponding rural poverty ratios. This is because the activity status of people is not considered. The daily calorie requirement of urban people is lower because most of them do less physical work: modern facilities that reduce physical labour, such as transport and machinery, are more available to them. Our calculations, however, do not take this into account. There may be other reasons as well. Urban people eat fast food in the streets more often than rural people, and this consumption is often not reported.
Manoranjan Pal, Premananda Bharati
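
The last step of this procedure, turning the household-level dummy into a poverty ratio, can be sketched as follows. The household figures are hypothetical, and the weights are assumed here to be survey weights, since the abstract does not say which weights are used.

```python
def poverty_ratio(households):
    """Weighted mean of the poverty dummy.

    Each household is a tuple (estimated_calories, calorie_norm, weight).
    A household is flagged poor (dummy = 1) when its estimated calorie
    consumption falls below its calorie norm.
    """
    weighted_poor = 0.0
    total_weight = 0.0
    for estimated_calories, calorie_norm, weight in households:
        is_poor = 1 if estimated_calories < calorie_norm else 0
        weighted_poor += weight * is_poor
        total_weight += weight
    return weighted_poor / total_weight

# Hypothetical households: (estimated kcal, norm kcal, assumed survey weight).
sample = [
    (8200.0, 9000.0, 2.0),   # below norm, so poor
    (10500.0, 9800.0, 1.0),  # above norm, so not poor
    (7000.0, 7600.0, 1.0),   # below norm, so poor
]

ratio = poverty_ratio(sample)
print("poverty ratio:", ratio)
```

Running the same function separately on rural and urban households would give the two sector-level ratios compared in the abstract.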

### Chapter 4. Estimating Calorie Poverty Rates Through Regression

Abstract
In this chapter we assume a trivariate distribution of the nutrient intake (y), say calorie intake, the income (x) and the nutrient norm (z) of households, which leads to linear or log-linear regression equations depending on the type of joint distribution assumed for the purpose of estimation. The nutrient norm takes care of the age-sex composition of a household. The probability that a household consumes less than the prescribed norm can be computed from the regression result. Taken in aggregate, this probability can be regarded as the estimated calorie-poverty rate. In practice, since income data are not available, the per capita total expenditure of the household is taken as a proxy for per capita income, and the regression is run for different expenditure groups. We have applied this technique to the 61st round data on calorie intakes collected by the National Sample Survey Organization (NSSO), India. The estimates of the poverty rates found by this method are implausibly high and call for further investigation. The reasons for such high estimates are discussed and a modification of the estimates is suggested. The modification leads to reasonable estimates of the poverty rates.
Manoranjan Pal, Premananda Bharati
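
To make the probability step concrete, here is a minimal sketch under a strong simplifying assumption: intake is linear in expenditure with normally distributed errors, so the probability of falling below the norm is a normal tail probability. The coefficients, the residual standard deviation and the function names are hypothetical, and this is not the trivariate model estimated in the chapter.

```python
import math

def normal_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def prob_below_norm(norm, intercept, slope, expenditure, sigma):
    """P(intake < norm) when intake = intercept + slope * expenditure + error
    and the error is assumed normal with standard deviation sigma."""
    predicted = intercept + slope * expenditure
    return normal_cdf((norm - predicted) / sigma)

# Hypothetical regression output for one expenditure group.
intercept, slope, sigma = 1500.0, 0.5, 300.0
norm = 2000.0

p_at_norm = prob_below_norm(norm, intercept, slope, 1000.0, sigma)
p_richer = prob_below_norm(norm, intercept, slope, 1600.0, sigma)
print("P(below norm) at expenditure 1000:", p_at_norm)
print("P(below norm) at expenditure 1600:", p_richer)
```

Averaging such probabilities over households, with appropriate weights, gives an aggregate figure that can be read as an estimated calorie-poverty rate.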

### Chapter 5. Prediction of Voting Pattern

Abstract
For a given set of political parties in a region, it is possible to build a Markov chain model that enables us to predict the results of the subsequent election. We assume that the result of the latest election depends only on the previous election. The transition probabilities can be interpreted as the coefficients of a set of regression lines, where the number of votes obtained by each party in the latest election is regressed on the number of votes obtained by the parties in the previous election. This requires the coefficients to be non-negative. The problem is that the transition probabilities are not known and need to be estimated from data. We apply the model to predict the results of the next election in West Bengal using the results of the latest two elections. Least Squares estimates are found subject to non-negativity constraints on the coefficients. We try to assess the prospects of the Bharatiya Janata Party (BJP) in West Bengal and get mixed results.
Manoranjan Pal, Premananda Bharati
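
The forecasting step of such a model can be sketched as a single multiplication of the latest vote counts by the transition matrix. The matrix below is hypothetical and simply assumed to be known; the constrained least squares estimation described in the abstract is not shown, and the party labels are placeholders.

```python
def predict_next_votes(votes, transition):
    """One-step Markov forecast.

    votes[i] is the number of votes for party i in the latest election.
    transition[i][j] is the assumed probability that a voter of party i
    moves to party j in the next election (each row sums to 1).
    """
    n = len(votes)
    for row in transition:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must be non-negative and sum to 1")
    return [sum(votes[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical three-party example (parties A, B, C).
latest_votes = [1000.0, 500.0, 300.0]
assumed_transition = [
    [0.80, 0.15, 0.05],  # where party A's voters go
    [0.10, 0.70, 0.20],  # where party B's voters go
    [0.05, 0.15, 0.80],  # where party C's voters go
]

predicted = predict_next_votes(latest_votes, assumed_transition)
print("predicted votes:", predicted)
```

Because every row sums to one, the forecast preserves the total number of votes; in practice the matrix would first have to be estimated, for example by least squares with non-negativity constraints as the abstract describes.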

### Chapter 6. Finding Aggregate Growth Rate Using Regression Technique

Abstract
In this chapter we attempt to find the overall growth rate either from the original set of observations or from the growth rate of each component. The focus of the chapter is on finding the aggregate growth rate from the individual growth rates. We also discuss how growth rates can be calculated using a regression technique. Moreover, the formula can compute the average growth rate even when some of the growth rates are zero or negative. Cross-section data and time series data are usually treated quite differently. The present chapter unifies the methods in such a way that the formula can be applied to both cross-section and time series data. The modified growth rate is an intermediate growth rate, because it lies between the geometric and arithmetic means when all the individual growth rates are positive.
Manoranjan Pal, Premananda Bharati
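
For orientation, the sketch below computes three standard average growth rates of a time series: the arithmetic mean of period growth rates, the geometric mean, and a regression-based rate obtained from the slope of a log-linear trend. These are textbook formulas, not the modified growth rate proposed in the chapter, and the series are hypothetical.

```python
import math

def period_growth_rates(series):
    """Growth rate of each period relative to the previous one."""
    return [b / a - 1.0 for a, b in zip(series, series[1:])]

def arithmetic_mean_growth(series):
    rates = period_growth_rates(series)
    return sum(rates) / len(rates)

def geometric_mean_growth(series):
    """Constant rate that takes the first value to the last one."""
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1.0 / periods) - 1.0

def regression_growth(series):
    """Fit log(value) = a + b * t by least squares; growth rate = exp(b) - 1."""
    t = list(range(len(series)))
    logs = [math.log(v) for v in series]
    mt, ml = sum(t) / len(t), sum(logs) / len(logs)
    slope = (sum((ti - mt) * (li - ml) for ti, li in zip(t, logs))
             / sum((ti - mt) ** 2 for ti in t))
    return math.exp(slope) - 1.0

# A series growing at exactly 5% per period: all three rates should agree.
steady = [100.0 * 1.05 ** k for k in range(6)]
# An irregular hypothetical series, including one period of decline.
irregular = [100.0, 120.0, 108.0, 135.0, 150.0]

print("steady:", arithmetic_mean_growth(steady),
      geometric_mean_growth(steady), regression_growth(steady))
print("irregular:", arithmetic_mean_growth(irregular),
      geometric_mean_growth(irregular), regression_growth(irregular))
```

For the irregular series the geometric mean rate is smaller than the arithmetic mean rate, which is the interval in which the abstract places the modified growth rate when all individual rates are positive.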

### Chapter 7. Testing Linear Restrictions of Parameters in Regression Analysis

Abstract
In regression we often need to test restrictions on the regression coefficients, e.g., (i) whether a particular coefficient is equal to a specific value, (ii) whether one coefficient is equal to another, or (iii) whether a specific linear combination of the coefficients is equal to a given constant. With a little modification of the regression equations and the introduction of a dummy variable, we can also test cross-equation restrictions. In fact, we combine the two equations into one by using a dummy variable and then test the restrictions as if it were a single-equation model. Dummy variables play a profound role in many regressions, especially in testing cross-equation restrictions.
Manoranjan Pal, Premananda Bharati
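
A restriction of type (ii) can be tested by comparing the residual sums of squares of the restricted and unrestricted regressions. The sketch below does this for the hypothesis that two slope coefficients are equal, using the standard F statistic; the data are hypothetical, the helper names are made up for the example, and the cross-equation case with dummy variables is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares from least squares of y on the given columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [4.1, 5.3, 9.15, 10.45, 14.2, 15.4, 18.85, 20.55]
ones = [1.0] * len(y)

# Unrestricted model: y = a + b1*x1 + b2*x2 (k = 3 parameters).
rss_u = rss([ones, x1, x2], y)
# Restricted model under H0: b1 = b2, i.e. y = a + b*(x1 + x2).
rss_r = rss([ones, [p + q for p, q in zip(x1, x2)]], y)

n, k, q = len(y), 3, 1  # q = number of restrictions
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
print("F statistic for H0: b1 = b2 is", f_stat)
```

Under the usual normality assumptions the statistic is compared with an F distribution with q and n - k degrees of freedom; a large value is evidence against the restriction.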

### Chapter 8. The Regression Models with Dummy Explanatory Variables

Abstract
Dummy variables can be incorporated in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain regressors that are all dummy, or qualitative, in nature. The results of such a model are exactly the same as those of an Analysis of Variance (ANOVA) model: the regression model used to assess the statistical significance of the relationship between a quantitative regressand and qualitative (dummy) regressors is equivalent to the corresponding ANOVA model. For each qualitative regressor, the number of dummy variables introduced must be one less than the number of categories of that variable. If a qualitative variable has m categories, introduce only (m-1) dummy variables. The category for which no dummy variable is assigned is known as the base, benchmark, control, comparison, reference, or omitted category, and all comparisons are made in relation to it. The intercept represents the mean value of the benchmark category. The coefficients attached to the dummy variables are known as differential intercept coefficients, because they tell by how much the intercept of the category that receives the value 1 differs from the intercept of the benchmark category.
Manoranjan Pal, Premananda Bharati
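
The interpretation of the intercept and the differential intercept coefficient can be checked on a small example with a two-category variable (m = 2, so one dummy). The figures and category labels below are hypothetical.

```python
def fit_simple(x, y):
    """Least squares intercept and slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical observations: sector of residence and calorie intake.
sector = ["rural", "rural", "rural", "urban", "urban", "urban"]
calories = [2100.0, 2300.0, 2200.0, 1900.0, 2000.0, 2100.0]

# "rural" is the benchmark category; the dummy is 1 for "urban" only.
dummy = [1.0 if s == "urban" else 0.0 for s in sector]
intercept, differential = fit_simple(dummy, calories)

rural = [c for s, c in zip(sector, calories) if s == "rural"]
urban = [c for s, c in zip(sector, calories) if s == "urban"]
rural_mean = sum(rural) / len(rural)
urban_mean = sum(urban) / len(urban)

print("intercept:", intercept, "| benchmark (rural) mean:", rural_mean)
print("differential intercept:", differential,
      "| urban mean minus rural mean:", urban_mean - rural_mean)
```

The intercept reproduces the mean of the benchmark category and the dummy coefficient reproduces the difference between the two category means, which is the comparison a one-way ANOVA with two groups would make.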

### Chapter 9. Relative Contribution of Regressors

Abstract
The aim of the chapter is to review the approaches suggested in the literature for assessing the contribution of the explanatory variables to the explained variable. This topic is better known as the relative importance of the regressors. In a regression setup, relative importance may be defined as the relative contribution of each regressor to the coefficient of determination, R², considering its effect when combined with the other variables. We discuss various measures of the relative importance of predictors, namely Allocation First, Allocation Last, the Hoffman-Pratt Decomposition of R², the Shapley Value Decomposition and Relative Weights. The Shapley Value Decomposition takes care of all possible cases of regression and seems to be one of the best measures, but it is almost unmanageable for a large number of predictors. The Relative Weights procedure is an alternative measure that gives a ranking of the predictors close to that of the Shapley Value Decomposition and remains quite manageable even for a large number of predictors. The rankings of the predictors turn out to be more or less similar whatever measure is used. To assess the contribution of the explanatory variables to the explained variable, we have also devised a novel approach, the Set Theoretic Approach. This approach can find the individual contribution of each regressor as well as the contribution common to combinations of regressors. A new measure of multicollinearity is also proposed and illustrated with an example.
Manoranjan Pal, Premananda Bharati
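
With only two regressors, the Shapley value decomposition of R² reduces to averaging each regressor's contribution when it enters the model first and when it enters last. The sketch below works this out from the pairwise correlations, using the standard two-regressor formula for R²; the data are hypothetical and the chapter's own notation and Set Theoretic Approach are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shapley_r2_two_regressors(x1, x2, y):
    """Return (R2, share_1, share_2) for the regression of y on x1 and x2."""
    r1, r2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    # R^2 of the two-regressor model expressed through pairwise correlations.
    r2_full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    # Contribution when entering first: R^2 of the regressor on its own.
    # Contribution when entering last: gain in R^2 over the other regressor.
    share_1 = 0.5 * (r1 ** 2 + (r2_full - r2 ** 2))
    share_2 = 0.5 * (r2 ** 2 + (r2_full - r1 ** 2))
    return r2_full, share_1, share_2

# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

r2_full, share_1, share_2 = shapley_r2_two_regressors(x1, x2, y)
print("R^2:", r2_full, "| share of x1:", share_1, "| share of x2:", share_2)
```

The two shares add up to R² by construction. With p regressors the same averaging runs over all orderings, which requires fitting every subset of regressors and explains why the abstract calls the method hard to manage for many predictors.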