Panel Data Econometrics with R (eBook)
John Wiley & Sons (publisher)
978-1-118-94918-4 (ISBN)
Panel Data Econometrics with R provides a tutorial for using R in the field of panel data econometrics. Illustrated throughout with examples in econometrics, political science, agriculture, and epidemiology, this book presents classic methodology and applications as well as more advanced topics and recent developments in this field, including error component models, spatial panels, and dynamic models. The authors have developed the software in R and host replication material on the book's accompanying website.
Yves Croissant, Professor of Economics, CEMOI, Faculté de Droit et d'Economie, Université de La Réunion, France
Giovanni Millo, Senior Economist, Group Insurance Research, Assicurazioni Generali S.p.A., Trieste, Italy
1 Introduction
This book is about doing panel data econometrics with the R software. As such, it is aimed at both panel data analysts who want to use R and R users venturing into panel data analysis. In this introductory chapter, we will motivate panel data methods through a simple example, performing calculations in base R, in order to introduce panel data issues to the R user; then we will give an overview of econometric computing in R for the analyst coming from different software packages or environments.
1.1 Panel Data Econometrics: A Gentle Introduction
In this section we will introduce the broad subject of panel data econometrics through its features and advantages over pure cross-sectional or time-series methods. According to Baltagi (2013), panel data make it possible to control for individual heterogeneity, to exploit greater variability for more efficient estimation, to study adjustment dynamics, to identify effects one could not detect from cross-sectional data, to improve measurement accuracy (micro data instead of aggregates), and to use one dimension to draw inference about the other (as in panel time series).
From a statistical modeling viewpoint, panel data techniques address first and foremost one broad issue: unobserved heterogeneity, i.e., controlling for unobserved variables that may bias estimation.
Consider the regression model
$$ y_i = \alpha + \beta x_i + \gamma z_i + \epsilon_i $$
where $x_i$ is an observable regressor and $z_i$ is unobservable. The feasible model on observables
$$ y_i = \alpha + \beta x_i + u_i, \qquad u_i = \gamma z_i + \epsilon_i $$
suffers from an omitted variables problem: the OLS estimate of $\beta$ is consistent only if $x$ is uncorrelated with both $z$ and $\epsilon$; otherwise it will be biased and inconsistent.
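To make the omitted-variable problem concrete, here is a minimal simulation sketch in base R (not from the book; variable names and parameter values are purely illustrative): the observable regressor x is correlated with the unobservable z, so regressing y on x alone yields a biased estimate of the true coefficient.

## Illustrative simulation: true model y = 1 + 2x + 1.5z + e,
## with x correlated with the unobservable z
set.seed(1)
n <- 1000
z <- rnorm(n)                        # unobservable characteristic
x <- 0.5 * z + rnorm(n)              # observable regressor, correlated with z
y <- 1 + 2 * x + 1.5 * z + rnorm(n)  # true beta = 2
coef(lm(y ~ x))["x"]        # roughly 2.6: biased upward by the omitted z
coef(lm(y ~ x + z))["x"]    # close to 2 when z can be controlled for

The analytical bias here is $\gamma \, \mathrm{cov}(x,z)/\mathrm{var}(x) = 1.5 \cdot 0.5 / 1.25 = 0.6$, matching what the simulation shows.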
One of the best-known examples of unobserved individual heterogeneity is the agricultural production function of Mundlak (1961) (see also Arellano, 2003, p. 9), where output depends on $x$ (labor), $z$ (soil quality), and a stochastic disturbance term $\epsilon$ (rainfall), so that the data-generating process can be represented by the above model; if soil quality is known to the farmer, although unobservable to the econometrician, it will be correlated with labor effort $x$, and hence $\hat{\beta}$ will be an inconsistent estimator for $\beta$.
This is usually modeled with the general form:
$$ y_{it} = \alpha + \beta^\top x_{it} + \eta_i + \epsilon_{it} \qquad (1.1) $$
where $\eta_i$ is a time-invariant, generally unobservable individual characteristic. In the following we will motivate the use of panel data in the light of the need to control for unobserved heterogeneity. We will eliminate the individual effects through some simple techniques. As will be clear from the following chapters, subject to further assumptions on the nature of the heterogeneity there are more sophisticated ways to control for it; but for now we will stay on the safe side, relying only on the assumption of time invariance.
1.1.1 Eliminating Unobserved Components
Panel data turn out to be especially useful if the unobserved heterogeneity is (or can be assumed to be) time-invariant. Leveraging the information on time variation for each unit in the cross section, it is possible to rewrite model 1.1 in terms of observables only, in a form that is equivalent as far as estimating $\beta$ is concerned. The simplest such transformation consists in subtracting one cross section from the other.
1.1.1.1 Differencing Methods
Time-invariant individual components can be removed by first-differencing the data: lagging the model and subtracting, the time-invariant components (the intercept and the individual error component) are eliminated, and the model
$$ \Delta y_{it} = \beta^\top \Delta x_{it} + \Delta \epsilon_{it} $$
(where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\Delta x_{it} = x_{it} - x_{i,t-1}$, and, from 1.1, $\Delta \eta_i = 0$ for all $t$) can be consistently estimated by pooled OLS. This is called the first-difference, or FD, estimator.
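A short simulation, in the same illustrative spirit as the sketch above, shows the FD estimator at work on a two-period panel: pooled OLS on the levels is biased because the regressor is correlated with the individual effect, while OLS on first differences recovers the true coefficient.

## Two-period panel with individual effect eta correlated with x
set.seed(2)
n <- 500
eta <- rnorm(n)                    # time-invariant individual effect
x1 <- eta + rnorm(n)               # regressor, period 1
x2 <- eta + rnorm(n)               # regressor, period 2
y1 <- 2 * x1 + eta + rnorm(n)      # true beta = 2
y2 <- 2 * x2 + eta + rnorm(n)
coef(lm(c(y1, y2) ~ c(x1, x2)))[2]    # pooled OLS: biased (about 2.5)
coef(lm(I(y2 - y1) ~ I(x2 - x1)))[2]  # FD estimator: close to 2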
1.1.1.2 LSDV Methods
Another possibility to account for time-invariant individual components is to introduce them explicitly into the model specification, in the form of individual intercepts. The second dimension of panel data (here: time) in fact allows the $\eta_i$ to be estimated as further parameters, together with the parameters of interest $\beta$. This estimator is referred to as least squares dummy variables, or LSDV. It must be noted that, with $K$ regressors, the degrees of freedom for the estimation are now reduced to $n(T-1)-K$ because of the $n$ extra parameters. Moreover, while the vector $\beta$ is estimated using the variability of the full sample, and therefore the estimator is $\sqrt{nT}$-consistent, the estimates of the individual intercepts are only $\sqrt{T}$-consistent, as they rely only on the time dimension. Nevertheless, it is seldom of interest to estimate the individual intercepts.
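In R, the LSDV estimator amounts to adding a factor coding the individuals to the lm formula. A minimal sketch on simulated data (names and values illustrative, not from the book):

## LSDV: one intercept per individual via a factor
set.seed(3)
n <- 50; bigT <- 10
id  <- rep(1:n, each = bigT)        # individual index
eta <- rnorm(n)[id]                 # time-invariant effect, repeated over t
x <- eta + rnorm(n * bigT)          # regressor correlated with eta
y <- 2 * x + eta + rnorm(n * bigT)  # true beta = 2
lsdv <- lm(y ~ x + factor(id))      # overall intercept plus n - 1 dummies
coef(lsdv)["x"]                     # close to 2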
1.1.1.3 Fixed Effects Methods
The LSDV estimator adds a potentially large number of covariates to the basic specification of interest and can be numerically very inefficient. A more compact and statistically equivalent way of obtaining the same estimator entails transforming the data by subtracting the average over time (individuals) from every variable. This transformation, which has become the standard way of estimating fixed effects models with individual (time) effects, is usually termed time-demeaning and is defined as:
$$ \tilde{y}_{it} = y_{it} - \bar{y}_i, \qquad \tilde{x}_{it} = x_{it} - \bar{x}_i $$
where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$ denote the individual means of $y$ and $x$.
This is equivalent to estimating the model
$$ y_{it} = \alpha_i + \beta^\top x_{it} + \epsilon_{it} $$
i.e., leaving the individual intercepts free to vary, considering them as parameters to be estimated. The estimates of the individual intercepts can subsequently be recovered from the OLS estimation on time-demeaned data, as $\hat{\alpha}_i = \bar{y}_i - \hat{\beta}^\top \bar{x}_i$.
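The equivalence can be checked directly: reusing the illustrative data-generating process from the LSDV sketch above, demeaning with base R's ave and running OLS on the transformed data reproduces the LSDV slope exactly (only the naive degrees of freedom, and hence the reported standard errors, would differ).

## Within transformation: subtract individual means, then OLS
set.seed(3)                          # same illustrative DGP as the LSDV sketch
n <- 50; bigT <- 10
id  <- rep(1:n, each = bigT)
eta <- rnorm(n)[id]
x <- eta + rnorm(n * bigT)
y <- 2 * x + eta + rnorm(n * bigT)
ydm <- y - ave(y, id)                # time-demeaned y
xdm <- x - ave(x, id)                # time-demeaned x
coef(lm(ydm ~ 0 + xdm))              # numerically identical to the LSDV slope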
Example 1-1: individual heterogeneity – the Fatalities data set
The Fatalities dataset from Stock and Watson (2007) is a good example of the importance of individual heterogeneity and time effects in a panel setting.
The research question is whether taxing alcohol can reduce the death toll on the roads. The basic specification relates the traffic fatality rate to the tax rate on beer in a classical regression setting:
$$ frate_{it} = \alpha + \beta \, beertax_{it} + \epsilon_{it} $$
The data are observed yearly from 1982 to 1988 for each of the 48 continental US states.
The basic elements of any estimation command in R are a formula specifying the model design and a dataset, usually in the form of a data.frame. Pre-packaged example datasets are the most hassle-free way of importing data, as they only need to be called by name for retrieval. In the following, the model is specified in its simplest form, a bivariate relation between the death rate and the beer tax.
data("Fatalities", package="AER")Fatalities$frate <- with(Fatalities, fatal / pop * 10000)fm <- frate ˜ beertax The most basic step is a cross‐sectional analysis for one single year (here, 1982). One proceeds first creating a model object through a call to lm, then displaying a summary.lm of it. Printing to screen occurs when interactively calling an object by name. Notice that subsetting can be done inside the call to lm by feeding an expression that solves into a logical vector to the subset argument: data points corresponding to TRUEs will be selected, FALSEs discarded.
mod82 <- lm(fm, Fatalities, subset = year == 1982)
summary(mod82)

Call:
lm(formula = fm, data = Fatalities, subset = year == 1982)

Residuals:
    Min      1Q  Median      3Q     Max
 -0.936  -0.448  -0.107   0.230   2.172

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.010      0.139   14.46   <2e-16 ***
beertax        0.148      0.188    0.79     0.43
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.67 on 46 degrees of freedom
Multiple R-squared: 0.0133,  Adjusted R-squared: -0.00813
F-statistic: 0.621 on 1 and 46 DF,  p-value: 0.435

The beer tax turns out to be statistically insignificant. Turning to the last year in the sample (and employing coeftest for compactness):
mod88 <- update(mod82, subset = year == 1988)
library("lmtest")
coeftest(mod88)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.859      0.106   17.54   <2e-16 ***
beertax        0.439      0.164    2.67    0.011 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

the coefficient is now significant and positive! Similar results appear for any single year in the sample.
Pooling all cross sections together, without considering any form of individual effect, can be done using the regular lm function or, equivalently, plm; in the latter case, for reasons that will become clearer in the following, this is not the default behavior, so the optional model argument has to be specified, setting it to 'pooling'.
Drawing on this much enlarged dataset does not change the qualitative result:
library("plm")poolmod <- plm(fm, Fatalities, model="pooling")coeftest(poolmod)t test of coefficients: Estimate Std. Error t value Pr(>|t|)(Intercept) 1.8533 0.0436 42.54 < 2e-16...| Erscheint lt. Verlag | 10.8.2018 |