
2011 | Book


Econometrics

Author: Badi H. Baltagi

Publisher: Springer Berlin Heidelberg

Book series: Springer Texts in Business and Economics

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Wirtschaft"


About this book

This textbook teaches some of the basic econometric methods and the underlying assumptions behind them. It also includes a simple and concise treatment of more advanced topics in spatial correlation, panel data, limited dependent variables, regression diagnostics, specification testing and time series analysis. Each chapter has a set of theoretical exercises as well as empirical illustrations using real economic applications. These empirical exercises usually replicate a published article using Stata or Eviews.

Table of Contents

Frontmatter

Part I

Frontmatter

Chapter 1. What Is Econometrics?

Abstract

What is econometrics? A few definitions are given below:

The method of econometric research aims, essentially, at a conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier.
Trygve Haavelmo (1944)

Econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.
Samuelson, Koopmans and Stone (1954)

Econometrics is concerned with the systematic study of economic phenomena using observed data.
Aris Spanos (1986)

Broadly speaking, econometrics aims to give empirical content to economic relations for testing economic theories, forecasting, decision making, and for ex post decision/policy evaluation.
J. Geweke, J. Horowitz, and M.H. Pesaran (2008)

For other definitions of econometrics, see Tintner (1953).

Badi H. Baltagi

Chapter 2. Basic Statistical Concepts

Abstract

One chapter cannot possibly review what one learned in one or two prerequisite courses in statistics. This is an econometrics book, and it is imperative that the student have taken at least one solid course in statistics. The concepts of a random variable, whether discrete or continuous, and the associated probability function or probability density function (p.d.f.) are assumed known. Similarly, the reader should know the following statistical terms: cumulative distribution function, and marginal, conditional and joint p.d.f.’s. The reader should be comfortable with computing mathematical expectations, and familiar with the concepts of independence, Bayes’ theorem and several continuous and discrete probability distributions. These distributions include the Bernoulli, Binomial, Poisson, Geometric, Uniform, Normal, Gamma, Chi-squared (χ²), Exponential, Beta, t and F distributions.
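As a quick self-check on the "computing mathematical expectations" prerequisite, the sketch below computes E(X) = Σ x f(x) and var(X) = E(X²) − [E(X)]² directly from the definition for a Binomial(n, p) variable, which should reproduce the textbook closed forms np and np(1 − p). The choice of n = 10 and p = 0.3 is purely illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed from the definition of expectation."""
    support = range(n + 1)
    mean = sum(k * binomial_pmf(k, n, p) for k in support)
    second = sum(k**2 * binomial_pmf(k, n, p) for k in support)  # E(X^2)
    return mean, second - mean**2


mean, var = moments(10, 0.3)
print(round(mean, 10), round(var, 10))  # 3.0 2.1, i.e. np and np(1 - p)
```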

Badi H. Baltagi

Chapter 3. Simple Linear Regression

Abstract

In this chapter, we study extensively the estimation of a linear relationship between two variables, Y_i and X_i, of the form:

$$Y_i = \alpha + \beta X_i + u_i \qquad i = 1,2,\ldots,n \tag{3.1}$$

where Y_i denotes the i-th observation on the dependent variable Y, which could be consumption, investment or output, and X_i denotes the i-th observation on the independent variable X, which could be disposable income, the interest rate or an input. These observations could be collected on firms or households at a given point in time, in which case we call the data a cross-section. Alternatively, these observations may be collected over time for a specific industry or country, in which case we call the data a time-series. n is the number of observations, which could be the number of firms or households in a cross-section, or the number of years if the observations are collected annually. α and β are the intercept and slope of this simple linear relationship between Y and X. They are assumed to be unknown parameters to be estimated from the data. A plot of the data, i.e., Y versus X, would be very illustrative, showing what type of relationship exists empirically between these two variables.
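The abstract does not spell out the estimator. As a minimal sketch, assuming ordinary least squares (the usual starting point for equation 3.1), the slope is the ratio of the sample cross-product of X and Y to the sample variation of X, and the intercept makes the fitted line pass through the sample means. The data below are hypothetical, chosen so that Y = 1 + 2X holds exactly.

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (alpha_hat, beta_hat) for Y_i = alpha + beta X_i + u_i by OLS."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X has no variation, so beta cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx                # sum of cross-products / sum of squares
    alpha_hat = y_bar - beta_hat * x_bar  # line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_simple([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(alpha_hat, beta_hat)  # 1.0 2.0
```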

Badi H. Baltagi

Chapter 4. Multiple Regression Analysis

Abstract

So far we have considered only one regressor X besides the constant in the regression equation. Economic relationships usually include more than one regressor. For example, a demand equation for a product will usually include the real price of that product, real income, the real price of a competing product and the advertising expenditures on this product. In this case

$$Y_i = \alpha + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_K X_{Ki} + u_i \qquad i = 1,2,\ldots,n \tag{4.1}$$

where Y_i denotes the i-th observation on the dependent variable Y, in this case the sales of this product. X_ki denotes the i-th observation on the independent variable X_k for k = 2, …, K, in this case own price, the competitor’s price and advertising expenditures. α is the intercept and β₂, β₃, …, β_K are the (K − 1) slope coefficients. The u_i’s satisfy the classical assumptions 1–4 given in Chapter 3. Assumption 4 is modified to include all the X’s appearing in the regression, i.e., every X_k for k = 2, …, K is uncorrelated with the u_i’s, with the property that $\sum\nolimits^{n}_{i=1} (X_{ki} - \bar{X}_k)^2/n$, where $\bar{X}_k = \sum\nolimits^{n}_{i=1} X_{ki}/n$, has a finite probability limit which is different from zero.
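A probability limit cannot be verified from one finite sample, but its finite-sample analogue, Σ(X_ki − X̄_k)²/n, can be computed for each regressor. The sketch below does this and warns when a regressor shows no variation at all; such a regressor is perfectly collinear with the constant, so its slope cannot be estimated. The regressor names and values are hypothetical.

```python
def sample_variation(column):
    """Finite-sample analogue of sum_i (X_ki - Xbar_k)^2 / n for one regressor."""
    n = len(column)
    x_bar = sum(column) / n
    return sum((x - x_bar) ** 2 for x in column) / n


def check_regressors(columns, tol=1e-12):
    """Compute the sample variation of each regressor; warn when it is (numerically) zero."""
    report = {}
    for name, col in columns.items():
        v = sample_variation(col)
        report[name] = v
        if v <= tol:
            print(f"warning: {name} does not vary; it is collinear with the constant")
    return report


# Hypothetical sample of n = 4 observations.
report = check_regressors({
    "own_price": [2, 3, 4, 5],
    "competitor_price": [3, 3, 3, 3],
    "advertising": [1, 0, 2, 1],
})
print(report)  # {'own_price': 1.25, 'competitor_price': 0.0, 'advertising': 0.5}
```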

Badi H. Baltagi

Chapter 5. Violations of the Classical Assumptions

Abstract

In this chapter, we relax the assumptions made in Chapter 3 one by one and study the effect of each violation on the OLS estimator. If the OLS estimator is no longer viable, we derive an alternative estimator and propose some tests that allow us to check whether the assumption is violated.

Badi H. Baltagi

Chapter 6. Distributed Lags and Dynamic Models

Abstract

Many economic models have lagged values of the regressors in the regression equation. For example, it takes time to build roads and highways. Therefore, the effect of this public investment on growth in GNP will show up with a lag, and this effect will probably linger on for several years. It takes time before investment in research and development pays off in new inventions, which in turn take time to develop into commercial products. In studying consumption behavior, a change in income may affect consumption over several periods. This is true in the permanent income theory of consumption, where it may take the consumer several periods to determine whether the change in real disposable income was temporary or permanent. For example, is the extra consulting money earned this year going to continue next year? Also, lagged values of real disposable income appear in the regression equation because the consumer takes into account his lifetime earnings in trying to smooth out his consumption behavior.
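The abstract does not commit to a particular lag structure. Assuming, for illustration, a finite distributed lag Y_t = α + β₀X_t + β₁X_{t−1} + … + β_sX_{t−s} + u_t with a hypothetical lag length s, the sketch below builds the regressor rows. Note that the first s observations are lost because their lagged values are not available; the rows can then be passed to any OLS routine.

```python
def lagged_design(y, x, s):
    """Build rows [1, X_t, X_{t-1}, ..., X_{t-s}] for a finite distributed lag.

    Returns (y_used, rows); the first s observations are dropped because
    their lags fall before the start of the sample.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if s < 0 or s >= len(x):
        raise ValueError("need 0 <= s < number of observations")
    y_used = y[s:]
    rows = [[1.0] + [x[t - j] for j in range(s + 1)] for t in range(s, len(x))]
    return y_used, rows


# Hypothetical annual series with s = 2 lags.
y_used, rows = lagged_design([1, 2, 3, 4, 5], [10, 11, 12, 13, 14], s=2)
# rows[0] == [1.0, 12, 11, 10]: constant, X_t, X_{t-1}, X_{t-2}
```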

Badi H. Baltagi

Part II

Frontmatter

Chapter 7. The General Linear Model: The Basics

Abstract

Consider the following regression equation

$$y = X\beta + u \tag{7.1}$$

where

$$y = \left[\begin{array}{c} Y_1\\ Y_2\\ \vdots\\ Y_n \end{array}\right]; \quad X = \left[\begin{array}{cccc} X_{11} & X_{12} & \cdots & X_{1k}\\ X_{21} & X_{22} & \cdots & X_{2k}\\ \vdots & \vdots & \ddots & \vdots\\ X_{n1} & X_{n2} & \cdots & X_{nk} \end{array}\right]; \quad \beta = \left[\begin{array}{c} \beta_1\\ \beta_2\\ \vdots\\ \beta_k \end{array}\right]; \quad u = \left[\begin{array}{c} u_1\\ u_2\\ \vdots\\ u_n \end{array}\right]$$

with n denoting the number of observations and k the number of variables in the regression, with n > k. In this case, y is a column vector of dimension (n × 1) and X is a matrix of dimension (n × k). Each column of X denotes a variable and each row of X denotes an observation on these variables. If y is log(wage), as in the empirical example in Chapter 4 (see Table 4.1), then the columns of X contain a column of ones for the constant (usually the first column), weeks worked, years of full-time experience, years of education, sex, race, marital status, etc.
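Assuming the estimator of interest is OLS, β̂ = (X′X)⁻¹X′y, the sketch below computes it for the matrix setup of equation 7.1 using plain Python lists: rows of X are observations and columns are variables, with a column of ones first. Rather than forming the inverse explicitly, it solves the normal equations (X′X)b = X′y by Gaussian elimination. This is for illustration only; in practice a library routine such as NumPy's `numpy.linalg.lstsq` is the safer choice. The data are hypothetical and satisfy y = 1 + 2x₂ − 0.5x₃ exactly.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply matrices a (n x m) and b (m x p) stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("X'X is singular: the columns of X are linearly dependent")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z


def ols(X, y):
    """OLS for y = X beta + u: solve the normal equations (X'X) b = X'y."""
    if len(X) <= len(X[0]):
        raise ValueError("need n > k observations")
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x_ij * y_i for x_ij, y_i in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical data: a column of ones plus two regressors.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 6]]
y = [2.0, 4.5, 5.0, 7.5, 8.0]
beta_hat = ols(X, y)  # approximately [1.0, 2.0, -0.5]
```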

Badi H. Baltagi

Chapter 8. Regression Diagnostics and Specification Tests

Abstract

Sources of influential observations include: (i) improperly recorded data, (ii) observational errors in the data, (iii) misspecification and (iv) outlying data points that are legitimate and contain valuable information which improve the efficiency of the estimation. It is constructive to isolate extreme points and to determine the extent to which the parameter estimates depend upon these desirable data.
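The abstract does not name a specific diagnostic. One standard way to isolate extreme points in the regressors is the leverage h_ii, the i-th diagonal element of the hat matrix X(X′X)⁻¹X′. For a regression on a constant and a single x, it reduces to h_ii = 1/n + (x_i − x̄)²/Σ(x_j − x̄)². The leverages always sum to k, the number of columns of X. The sketch below uses hypothetical data and applies the commonly used rule of thumb that flags h_ii > 2k/n.

```python
def leverages(x):
    """Hat-matrix diagonal for a regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 20.0]  # hypothetical: the last point is far from the rest
h = leverages(x)
k, n = 2, len(x)                # k = 2 columns: the constant and x
flagged = [i for i, h_i in enumerate(h) if h_i > 2 * k / n]
print(flagged)  # [4]
```

A high leverage only says that a point *can* be influential. Whether it actually moves the estimates is a separate question, which measures based on deleting the observation are designed to answer.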

Badi H. Baltagi

Chapter 9. Generalized Least Squares

Abstract

This chapter considers a more general variance-covariance matrix for the disturbances. In other words, the assumption u ~ (0, σ²I_n) is relaxed so that u ~ (0, σ²Ω), where Ω is a positive definite matrix of dimension (n×n). First, Ω is assumed known and the BLUE for β is derived. This estimator turns out to be different from $\hat{\beta}_{OLS}$, and is denoted by $\hat{\beta}_{GLS}$, the Generalized Least Squares estimator of β. Next, we study the properties of $\hat{\beta}_{OLS}$ under this nonspherical form of the disturbances. It turns out that the OLS estimates are still unbiased and consistent, but their standard errors as computed by standard regression packages are biased and inconsistent and lead to misleading inference. Section 9.3 studies some special forms of Ω and derives the corresponding BLUE for β. It turns out that heteroskedasticity and serial correlation, studied in Chapter 5, are special cases of Ω. Section 9.4 introduces normality and derives the maximum likelihood estimator. Sections 9.5 and 9.6 study how hypothesis testing and prediction are affected by this general variance-covariance assumption on the disturbances. Section 9.7 studies the properties of this BLUE for β when Ω is unknown and is replaced by a consistent estimator. Section 9.8 studies what happens to the Wald (W), likelihood ratio (LR) and Lagrange multiplier (LM) statistics when u ~ N(0, σ²Ω). Section 9.9 gives another application of GLS to spatial autocorrelation.
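Heteroskedasticity is the simplest special case of Ω: a diagonal matrix with var(u_i) = σ²ω_i. Assuming the ω_i are known and there is a single regressor, GLS amounts to dividing each observation by √ω_i and then applying OLS. Equivalently, it is weighted least squares with weights w_i = 1/ω_i, as in the sketch below. This covers only the diagonal case; a general Ω (for example, serial correlation) needs a full transformation matrix.

```python
def gls_diagonal(x, y, omega):
    """GLS for Y_i = alpha + beta X_i + u_i with var(u_i) = sigma^2 * omega_i (known).

    Implemented as weighted least squares with weights w_i = 1 / omega_i,
    which minimizes sum_i w_i * (Y_i - alpha - beta X_i)^2.
    """
    if any(o <= 0 for o in omega):
        raise ValueError("Omega must be positive definite: all omega_i > 0")
    w = [1.0 / o for o in omega]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    return y_w - beta * x_w, beta


# With Omega = I_n, GLS collapses to OLS (hypothetical data).
a_ols, b_ols = gls_diagonal([1, 2, 3, 4], [1, 3, 2, 5], omega=[1.0, 1.0, 1.0, 1.0])
# Down-weighting the last two (noisier) observations changes the estimates.
a_gls, b_gls = gls_diagonal([1, 2, 3, 4], [1, 3, 2, 5], omega=[1.0, 1.0, 4.0, 4.0])
```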

Badi H. Baltagi

Chapter 10. Seemingly Unrelated Regressions

Abstract

When asked “How did you get the idea for SUR?” Zellner responded: “On a rainy night in Seattle in about 1956 or 1957, I somehow got the idea of algebraically writing a multivariate regression model in single equation form. When I figured out how to do that, everything fell into place because then many univariate results could be carried over to apply to the multivariate system and the analysis of the multivariate system is much simplified notationally, algebraically and conceptually.” See the interview of Professor Arnold Zellner by Rossi (1989, p. 292).
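Writing a multivariate system "in single equation form" means stacking the equations. For two equations y₁ = X₁β₁ + u₁ and y₂ = X₂β₂ + u₂, each with T observations, set y = [y₁; y₂] and make X block-diagonal, so the system becomes y = Xβ + u with β = [β₁; β₂]. In the standard SUR setup, the stacked disturbances have variance Σ ⊗ I_T, and SUR applies (feasible) GLS to this one equation. The sketch below shows only the stacking step, with hypothetical inputs.

```python
def stack_sur(y1, X1, y2, X2):
    """Stack two regressions into one: y = [y1; y2], X = block-diag(X1, X2)."""
    k1, k2 = len(X1[0]), len(X2[0])
    y = list(y1) + list(y2)
    # Equation 1 rows get zeros in the columns of beta_2, and vice versa.
    X = [list(row) + [0.0] * k2 for row in X1] + [[0.0] * k1 + list(row) for row in X2]
    return y, X


# Hypothetical: equation 1 has 2 regressors, equation 2 has 3, T = 2 each.
y, X = stack_sur([1, 2], [[1, 2], [1, 3]], [3, 4], [[1, 5, 6], [1, 7, 8]])
# X[0] == [1, 2, 0.0, 0.0, 0.0] and X[2] == [0.0, 0.0, 1, 5, 6]
```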

Badi H. Baltagi

Chapter 11. Simultaneous Equations Model

Abstract

Economists formulate models for consumption, production, investment, money demand and money supply, labor demand and labor supply to attempt to explain the workings of the economy. These behavioral equations are estimated equation by equation or jointly as a system of equations. These are known as simultaneous equations models. Much of today’s econometrics has been influenced and shaped by a group of economists and econometricians known as the Cowles Commission, who worked together at the University of Chicago in the late 1940s; see Chapter 1. Simultaneous equations models had their genesis in economics during that period. Haavelmo’s (1944) work emphasized the use of the probability approach to formulating econometric models. Koopmans and Marschak (1950) and Koopmans and Hood (1953), in two influential Cowles Commission monographs, provided the appropriate statistical procedures for handling simultaneous equations models. In this chapter, we first give simple examples of simultaneous equations models and show why the least squares estimator is no longer appropriate. Next, we discuss the important problem of identification and give a simple necessary but not sufficient condition that helps check whether a specific equation is identified. Sections 11.2 and 11.3 discuss the estimation of a single equation and of a system of equations using instrumental variable procedures. Section 11.4 gives a test of over-identifying restrictions, whereas Section 11.5 gives a Hausman specification test. Section 11.6 concludes with an empirical example. The Appendix revisits the identification problem and gives a necessary and sufficient condition for identification.
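The simplest instrumental variable procedure has one right-hand-side endogenous variable x and exactly one instrument z. In this just-identified case, two-stage least squares reduces to the ratio β̂_IV = Σ(z_i − z̄)(y_i − ȳ) / Σ(z_i − z̄)(x_i − x̄). The sketch below implements only that case, using hypothetical data in which y = 1 + 2x holds exactly, as a sanity check. With z = x, the formula collapses to OLS.

```python
def iv_simple(y, x, z):
    """Just-identified IV estimates (alpha, beta) for y_i = alpha + beta x_i + u_i."""
    n = len(y)
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    if abs(szx) < 1e-12:
        raise ValueError("instrument is uncorrelated with x in the sample")
    beta = szy / szx
    return y_bar - beta * x_bar, beta


alpha_iv, beta_iv = iv_simple([3, 5, 7, 9], [1, 2, 3, 4], [2, 1, 4, 3])
print(alpha_iv, beta_iv)  # 1.0 2.0
```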

Badi H. Baltagi

Chapter 12. Pooling Time-Series of Cross-Section Data

Abstract

In this chapter, we will consider pooling time-series of cross-sections. This may be a panel of households or firms or simply countries or states followed over time. Two well-known examples of panel data in the U.S. are the Panel Study of Income Dynamics (PSID) and the National Longitudinal Survey (NLS). The PSID began in 1968 with 4802 families, including an over-sampling of poor households. Annual interviews were conducted and socioeconomic characteristics of each of the families and of roughly 31000 individuals who have been in these or derivative families were recorded. The list of variables collected is over 5000. The NLS followed five distinct segments of the labor force. The original samples include 5020 older men, 5225 young men, 5083 mature women, 5159 young women and 12686 youths. There was an over-sampling of blacks, Hispanics, poor whites and military in the youths survey. The list of variables collected runs into the thousands. An inventory of national studies using panel data is given at http://www.isr.umich.edu/src/psid/panelstudies.html. Pooling these data gives a richer source of variation, which allows for more efficient estimation of the parameters. With additional, more informative data, one can get more reliable estimates and test more sophisticated behavioral models with less restrictive assumptions. Another advantage of panel data sets is their ability to control for individual heterogeneity. Not controlling for these unobserved individual-specific effects leads to bias in the resulting estimates. Panel data sets are also better able to identify and estimate effects that are simply not detectable in pure cross-section or pure time-series data. In particular, panel data sets are better able to study complex issues of dynamic behavior. For example, with a cross-section data set one can estimate the rate of unemployment at a particular point in time. Repeated cross-sections can show how this proportion changes over time. Only panel data sets can estimate what proportion of those who are unemployed in one period remain unemployed in another period.

Some of the benefits and limitations of using panel data sets are listed in Hsiao (2003) and Baltagi (2008). Section 12.2 studies the error components model, focusing on fixed effects, random effects and maximum likelihood estimation. Section 12.3 considers the question of prediction in a random effects model, while Section 12.4 illustrates the estimation methods using an empirical example. Section 12.5 considers testing the poolability assumption, the existence of random individual effects and the consistency of the random effects estimator using a Hausman test. Section 12.6 studies the dynamic panel data model and illustrates the methods used with an empirical example. Section 12.7 concludes with a short presentation of program evaluation and the difference-in-differences estimator.
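To see how panel data control for unobserved individual heterogeneity, consider the fixed effects (within) estimator for y_it = μ_i + βx_it + ν_it. Subtracting each individual's time mean removes μ_i, and β is then estimated by OLS on the demeaned data. The sketch below uses hypothetical data with μ₁ = 10, μ₂ = −5 and β = 2, where x is higher for the individual with the lower μ.

```python
from collections import defaultdict


def within_estimator(ids, x, y):
    """Fixed effects (within) slope for y_it = mu_i + beta * x_it + nu_it."""
    # Accumulate per-individual sums of x and y and the number of periods.
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    count = defaultdict(int)
    for i, xi, yi in zip(ids, x, y):
        sum_x[i] += xi
        sum_y[i] += yi
        count[i] += 1
    # Demean within each individual (this sweeps out mu_i), then run OLS
    # through the origin on the demeaned data.
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        dx = xi - sum_x[i] / count[i]
        dy = yi - sum_y[i] / count[i]
        num += dx * dy
        den += dx * dx
    if den == 0.0:
        raise ValueError("x has no within-individual variation")
    return num / den


ids = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 3, 5, 7]
print(within_estimator(ids, x, y))  # 2.0
```

On these same data, a pooled OLS slope that ignores μ_i is −32.5/17.5 ≈ −1.86, which has the wrong sign. This is the bias from not controlling for individual effects that are correlated with x.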

Badi H. Baltagi

Chapter 13. Limited Dependent Variables

Abstract

In labor economics, one is faced with explaining the decision to participate in the labor force, the decision to join a union, or the decision to migrate from one region to another. In finance, a consumer defaults on a loan or a credit card debt, or purchases a stock or an asset like a house or a car. In these examples, the dependent variable is usually a dummy variable with value 1 if the worker participates (or the consumer defaults on a loan) and 0 if he or she does not participate (or does not default). We dealt with dummy variables as explanatory variables on the right-hand side of the regression, but what additional problems arise when this dummy variable appears on the left-hand side of the equation? As we have done in previous chapters, we first study its effects on the usual least squares estimator, and then consider alternative estimators that are more appropriate for models of this nature.
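One problem with applying least squares to a 0/1 dependent variable (the linear probability model) can be seen directly: the fitted values are meant to be probabilities but are not confined to [0, 1]. The sketch below fits OLS to hypothetical binary data and shows predictions below 0 and above 1. For contrast, it includes the logistic function used by the logit model, which maps any index into (0, 1). Estimating the logit coefficients themselves, usually by maximum likelihood, is not shown.

```python
import math


def ols_simple(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta


def logistic(v):
    """Logistic CDF used by the logit model; maps any real index into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


x = list(range(10))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # hypothetical 0/1 outcomes
a, b = ols_simple(x, y)
lpm_fitted = [a + b * xi for xi in x]
print(min(lpm_fitted) < 0, max(lpm_fitted) > 1)  # True True
```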

Badi H. Baltagi

Chapter 14. Time-Series Analysis

Abstract

There has been an enormous amount of research in time-series econometrics, and many economics departments have required a time-series econometrics course in their graduate sequence. Obviously, one chapter on this topic will not do it justice. Therefore, this chapter will focus on some of the basic concepts needed for such a course. Section 14.2 defines what is meant by a stationary time-series, while Sections 14.3 and 14.4 briefly review the Box-Jenkins and Vector Autoregression (VAR) methods for time-series analysis. Section 14.5 considers a random walk model and various tests for the existence of a unit root. Section 14.6 studies spurious regressions and trend-stationary versus difference-stationary models. Section 14.7 gives a simple explanation of the concept of cointegration and illustrates it with an economic example. Finally, Section 14.8 looks at Autoregressive Conditionally Heteroskedastic (ARCH) time-series.
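A common starting point for unit root testing is the Dickey-Fuller regression Δy_t = δy_{t−1} + e_t. A random walk y_t = y_{t−1} + e_t corresponds to δ = 0, while a stationary AR(1) y_t = ρy_{t−1} + e_t with |ρ| < 1 has δ = ρ − 1 < 0. The sketch below computes only δ̂ in the variant with no constant and no trend. Under the unit-root null, the t-ratio on δ̂ does not follow the usual t distribution, so the test compares it with Dickey-Fuller critical values, which are not reproduced here.

```python
import random


def df_delta(y):
    """OLS estimate of delta in Delta y_t = delta * y_{t-1} + e_t (no constant, no trend)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lagged = y[:-1]
    den = sum(v * v for v in lagged)
    if den == 0.0:
        raise ValueError("lagged series is identically zero")
    return sum(d * v for d, v in zip(dy, lagged)) / den


# A noise-free AR(1), y_t = 0.5 * y_{t-1}, gives delta = 0.5 - 1 = -0.5 exactly.
print(df_delta([8.0, 4.0, 2.0, 1.0, 0.5]))  # -0.5

# A simulated random walk has a unit root (delta = 0).
random.seed(1)
walk = [0.0]
for _ in range(200):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
delta_hat = df_delta(walk)  # an estimate; typically close to 0 for a random walk
```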

Badi H. Baltagi

Backmatter

Title: Econometrics
Author: Badi H. Baltagi
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-20059-5
Print ISBN: 978-3-642-20058-8
DOI: https://doi.org/10.1007/978-3-642-20059-5