Introduction to Quantitative Methods

8. Panel Data, Time-Series Cross-Section Models

8.1 Seminar

Setting a Working Directory

Before you begin, make sure to set your working directory to a folder where your course related files such as R scripts and datasets are kept. We recommend that you create a PUBLG100 folder for all your work. Create this folder on N: drive if you're using a UCL computer or somewhere on your local disk if you're using a personal laptop.

Once the folder is created, use the setwd() function in R to set your working directory.

Recommended Folder Location R Function
UCL Computers N: Drive setwd("N:/PUBLG100")
Personal Laptop (Windows) C: Drive setwd("C:/PUBLG100")
Personal Laptop (Mac) Home Folder setwd("~/PUBLG100")

After you've set the working directory, verify it by calling the getwd() function.

getwd()

Now download the R script for this seminar from the "Download .R Script" button above, and save it to your PUBLG100 folder.

If you're using a UCL computer, please make sure you're running R version 3.3.1 or newer. Some of the seminar tasks and exercises will not work with older versions of R.

Required Packages

Let's start off by installing the plm package that allows us to estimate linear models for panel data.

install.packages("plm")

Now let's load these packages:

library(plm)
library(lmtest)
library(texreg)

Clear the environment

rm(list = ls())

More Guns, Less Crimes

Download the guns dataset used by Stock and Watson.

Download Guns Dataset Codebook

Gun rights advocate John Lott argues in his book More Guns, Less Crimes that crime rates in the United States decrease when gun ownership restrictions are relaxed. The data used in Lott's research compares violent crimes, robberies, and murders across 50 states to determine whether the so called "shall" laws that remove discretion from license granting authorities actually decrease crime rates. So far 41 states have passed these "shall" laws where a person applying for a licence to carry a concealed weapon doesn't have to provide justification or "good cause" for requiring a concealed weapon permit.

Let's load the dataset used by Lott and see if we can test the arguments made by gun rights advocates.

guns <- read.csv("guns.csv")

The variables we're interested in are described below.

Variable Definition
mur Murder rate (incidents per 100,000)
shall = 1 if the state has a shall-carry law in effect in that year
= 0 otherwise
incarc_rate Incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents; value for the previous year)
pm1029 Percent of state population that is male, ages 10 to 29
stateid ID number of states (Alabama = 1, Alaska = 2, etc.)
year Year (1977-1999)

We will focus on murder rates in this example but you could try the same with variables measuring violent crimes or robberies as well.

Let's create a factor variable representing whether a state has passed "shall" law or not. The variable already exists as 0 or 1 but we want to convert it to a factor for our analysis.

guns$shall <- factor(guns$shall, levels = c(0, 1), labels =c("NO", "YES"))

Fixed Effects

Let's estimate a fixed effect model on panel data using the plm() function with shall, incarc_rate, and pm1029 as the independent variables.

state_effects <- plm(mur ~ shall + incarc_rate + pm1029, 
                     data = guns, 
                     index = c("stateid", "year"), 
                     model = "within", 
                     effect = "individual")

summary(state_effects)
Oneway (individual) effect Within Model

Call:
plm(formula = mur ~ shall + incarc_rate + pm1029, data = guns, 
    effect = "individual", model = "within", index = c("stateid", 
        "year"))

Balanced Panel: n=51, T=23, N=1173

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-21.100  -0.959   0.016   1.080  29.000 

Coefficients :
              Estimate Std. Error t-value  Pr(>|t|)    
shallYES    -1.4513886  0.3154300 -4.6013 4.678e-06 ***
incarc_rate  0.0174551  0.0011261 15.4998 < 2.2e-16 ***
pm1029       0.9582993  0.0859610 11.1481 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    12016
Residual Sum of Squares: 9800
R-Squared:      0.18444
Adj. R-Squared: 0.17595
F-statistic: 84.3526 on 3 and 1119 DF, p-value: < 2.22e-16

The state_effects model shows that all three of our independent variables are statistically significant, with shall decreasing murder rates by 1.45 incidents per 100000 members of the population. The effects of incarceration rate and percentage of male population between 10 and 29 years old are also statistically significant.

Before drawing any conclusions let's make sure whether there are any state effects in our model using plmtest().

plmtest(state_effects, effect="individual")
    Lagrange Multiplier Test - (Honda)

data:  mur ~ shall + incarc_rate + pm1029
normal = 47.242, p-value < 2.2e-16
alternative hypothesis: significant effects

The p-value suggests the presence of state effects. In addition to state fixed effects, a number of factors could affect the murder rate that are not specific to an individual state. We can model these time fixed effects using the effect = "time" argument in plm().

time_effects <- plm(mur ~ shall + incarc_rate + pm1029, 
                    data = guns, 
                    index = c("stateid", "year"), 
                    model = "within", 
                    effect = "time")

summary(time_effects)
Oneway (time) effect Within Model

Call:
plm(formula = mur ~ shall + incarc_rate + pm1029, data = guns, 
    effect = "time", model = "within", index = c("stateid", "year"))

Balanced Panel: n=51, T=23, N=1173

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
 -21.70   -2.05   -0.32    1.77   35.20 

Coefficients :
              Estimate Std. Error t-value Pr(>|t|)    
shallYES    -0.2521605  0.3032163 -0.8316   0.4058    
incarc_rate  0.0412157  0.0007704 53.4993   <2e-16 ***
pm1029       0.2148597  0.1407613  1.5264   0.1272    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    65571
Residual Sum of Squares: 18141
R-Squared:      0.72334
Adj. R-Squared: 0.70731
F-statistic: 999.623 on 3 and 1147 DF, p-value: < 2.22e-16

The incarc_rate variable is the only statistically significant variable in the time fixed effects model.

Now let's run plmtest on the time_effects model to verify if time fixed effects are indeed present in the model.

plmtest(time_effects, effect="time")
    Lagrange Multiplier Test - time effects (Honda)

data:  mur ~ shall + incarc_rate + pm1029
normal = 16.104, p-value < 2.2e-16
alternative hypothesis: significant effects

The p-value tells us that we can reject the null hypothesis so we know that there are time fixed effects present in our model.

We already confirmed the presence of state fixed effects in the first model we estimated. Now, in order to control for both state AND time fixed effects, we need to estimate a model using the effect = "twoways" argument.

twoway_effects <- plm(mur ~ shall + incarc_rate + pm1029, 
                      data = guns, 
                      index = c("stateid", "year"), 
                      model = "within", 
                      effect = "twoways")

summary(twoway_effects)
Twoways effects Within Model

Call:
plm(formula = mur ~ shall + incarc_rate + pm1029, data = guns, 
    effect = "twoways", model = "within", index = c("stateid", 
        "year"))

Balanced Panel: n=51, T=23, N=1173

Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-19.20000  -0.97500  -0.00697   1.01000  27.10000 

Coefficients :
              Estimate Std. Error t-value  Pr(>|t|)    
shallYES    -0.5640474  0.3325054 -1.6964 0.0901023 .  
incarc_rate  0.0209756  0.0011252 18.6411 < 2.2e-16 ***
pm1029       0.7326357  0.2189770  3.3457 0.0008485 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    11263
Residual Sum of Squares: 8519.4
R-Squared:      0.24357
Adj. R-Squared: 0.22779
F-statistic: 117.746 on 3 and 1097 DF, p-value: < 2.22e-16

In a twoway fixed effects model shall is no longer significant and the effect of male population between 10 and 29 years old has decreased from 0.95 to 0.73 incidents per 100,000 population.

The results of all three models are shown below.

screenreg(list(state_effects, time_effects, twoway_effects), 
          custom.model.names = c("State Fixed Effects", 
                                 "Time Fixed Effects", 
                                 "Twoway Fixed Effects"))
==========================================================================
             State Fixed Effects  Time Fixed Effects  Twoway Fixed Effects
--------------------------------------------------------------------------
shallYES       -1.45 ***            -0.25               -0.56             
               (0.32)               (0.30)              (0.33)            
incarc_rate     0.02 ***             0.04 ***            0.02 ***         
               (0.00)               (0.00)              (0.00)            
pm1029          0.96 ***             0.21                0.73 ***         
               (0.09)               (0.14)              (0.22)            
--------------------------------------------------------------------------
R^2             0.18                 0.72                0.24             
Adj. R^2        0.18                 0.71                0.23             
Num. obs.    1173                 1173                1173                
==========================================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Serial Correlation

For time series data we need to address the potential for serial correlation in the error term. We will test for serial correlation with Breusch-Godfrey test using pbgtest() and provide solutions for correcting it if necessary.

pbgtest(twoway_effects)
    Breusch-Godfrey/Wooldridge test for serial correlation in panel
    models

data:  mur ~ shall + incarc_rate + pm1029
chisq = 765.16, df = 23, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

The null hypothesis for the Breusch-Godfrey test is that there is no serial correlation. The p-value from the test tells us that we can reject the null hypothesis and confirms the presence of serial correlation in our error term.

We can correct for serial correlation using coeftest() similar to how we corrected for heteroskedastic errors. We'll use the vcovHC() function for obtaining a heteroskedasticity-consistent covariance matrix, but since we're interested in correcting for autocorrelation as well, we will specify method = "arellano" which corrects for both heteroskedasticity and autocorrelation.

twoway_effects_hac <- coeftest(twoway_effects, 
                               vcov = vcovHC(twoway_effects, 
                                             method = "arellano", 
                                             type = "HC3"))

screenreg(list(twoway_effects, twoway_effects_hac),
          custom.model.names = c("Twoway Fixed Effects", 
                                 "Twoway Fixed Effects (HAC)"))
=============================================================
             Twoway Fixed Effects  Twoway Fixed Effects (HAC)
-------------------------------------------------------------
shallYES       -0.56               -0.56                     
               (0.33)              (0.48)                    
incarc_rate     0.02 ***            0.02 *                   
               (0.00)              (0.01)                    
pm1029          0.73 ***            0.73                     
               (0.22)              (0.54)                    
-------------------------------------------------------------
R^2             0.24                                         
Adj. R^2        0.23                                         
Num. obs.    1173                                            
=============================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

We can see that with heteroskedasticity and autocorrelation consistent (HAC) standard errors, the percent of male population (10 - 29 yr old) is no longer a significant predictor in our model.

Cross Sectional Dependence

If a federal law imposed restrictions on gun ownership or licensing requirements then the changes would likely affect all 50 states. This is an example of Cross Sectional Dependence and not accounted for in a fixed effect model. Other scenarios could also trigger cross sectional dependence that we should take into consideration. For example, security policies and law enforcement efforts might change after an extraordinary event (think of mass shootings or terrorist attacks) thus influencing law enforcement practices in all states. We can check for cross sectional dependence using the Pesaran cross sectional dependence test or pcdtest().

pcdtest(twoway_effects)
    Pesaran CD test for cross-sectional dependence in panels

data:  formula
z = 3.9121, p-value = 9.148e-05
alternative hypothesis: cross-sectional dependence

As we've seen with other tests, the null hypothesis is that there is no cross sectional dependence. The p-value, however tells that there is indeed cross-sectional dependence and we need to correct it. There are two general approaches to correcting for cross sectional dependence.

Beck and Katz (1995) method or Panel Corrected Standard Errors (PCSE): We can obtain Panel Corrected Standard Errors (PCSE) by first obtaining a robust variance-covariance matrix for panel models with the Beck and Katz (1995) method using the vcovBK()) and passing it to the familiar coeftest() function.

twoway_effects_pcse <- coeftest(twoway_effects, 
                                vcov = vcovBK(twoway_effects, 
                                              type="HC3", 
                                              cluster = "group")) 

twoway_effects_pcse
t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
shallYES    -0.5640474  0.7662556 -0.7361    0.4618    
incarc_rate  0.0209756  0.0028249  7.4253 2.254e-13 ***
pm1029       0.7326357  0.5118496  1.4313    0.1526    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results from PCSE are sensitive to the ratio between the number of time periods in the dataset (T) and the total number of observations (N). When we're dealing with large datasets (i.e. the T/N ratio is small), we use the Driscoll and Kraay method instead.

Driscoll and Kraay (1998) (SCC): The cross-sectional and serial correlation (SCC) method by Driscoll and Kraay addresses the limitations of Beck and Katz's PCSE method is therefore preferred for obtaining heteroskedasticity and autocorrelation consistent errors that are also robust to cross-sectional dependence. We can get SCC corrected covariance matrix using the vcovSCC() function.

twoway_effects_scc <- coeftest(twoway_effects,
                               vcov = vcovSCC(twoway_effects, 
                                              type="HC3", 
                                              cluster = "group"))

twoway_effects_scc
t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)  
shallYES    -0.564047   0.542698 -1.0393  0.29888  
incarc_rate  0.020976   0.010321  2.0324  0.04236 *
pm1029       0.732636   0.551066  1.3295  0.18396  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
screenreg(list(state_effects, 
               time_effects, 
               twoway_effects, 
               twoway_effects_pcse, 
               twoway_effects_scc), 
          custom.model.names = c("State Effects", 
                                 "Time Effects", 
                                 "Twoway Fixed Effects", 
                                 "PCSE", 
                                 "SCC"))
==================================================================================
             State Effects  Time Effects  Twoway Fixed Effects  PCSE       SCC    
----------------------------------------------------------------------------------
shallYES       -1.45 ***      -0.25         -0.56               -0.56      -0.56  
               (0.32)         (0.30)        (0.33)              (0.77)     (0.54) 
incarc_rate     0.02 ***       0.04 ***      0.02 ***            0.02 ***   0.02 *
               (0.00)         (0.00)        (0.00)              (0.00)     (0.01) 
pm1029          0.96 ***       0.21          0.73 ***            0.73       0.73  
               (0.09)         (0.14)        (0.22)              (0.51)     (0.55) 
----------------------------------------------------------------------------------
R^2             0.18           0.72          0.24                                 
Adj. R^2        0.18           0.71          0.23                                 
Num. obs.    1173           1173          1173                                    
==================================================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Exercises

Download Comparative Political Dataset Codebook

Variable Description
vturn Voter turnout in election
judrev Judicial review (existence of an independent body which decides whether laws conform to the constitution).
Coded 0 = no, 1 = yes.
ud Net union membership as a proportion wage and salary earners in employment (union density)
unemp Unemployment rate, percentage of civilian labour force.
unemp_pmp Cash expenditure for unemployment benefits as a percentage of GDP (public and mandatory private).
realgdpgr Growth of real GDP, percent change from previous year.
debt Gross general government debt (financial liabilities) as a percentage of GDP.
socexp_t_pmp Total public and mandatory private social expenditure as a percentage of GDP.
  1. Estimate a model for the electoral fractionalization of the party system as coded by the rae_ele variable in the dataset using all the variables listed above.

    • Estimate a fixed effect model and test for country and time fixed effects.
    • Run the necessary tests to check whether country and time fixed effects are present.
  2. Estimate a twoway model and compare to the previous country and time fixed effect models.

  3. Test for serial correlation and cross sectional dependence in the twoway model.
  4. If either serial correlation or cross sectional dependence is present, use the methods discussed in the seminar to obtain heteroskedastic and autocorrelation consistent standard errors.
  5. Compare the HAC and spatially robust standard errors with the twoway model estimated earlier.
  6. Display the results in publication-ready tables and discuss the substantively significant findings.