Introduction to Quantitative Methods

4. Bivariate linear regression models

4.2 Solutions

If you're using a UCL computer, please make sure you're running R version 3.2.0. Some of the seminar tasks and exercises will not work with older versions of R. Click here for help on how to start the new version of R on UCL computers.

library(texreg)
Warning: package 'texreg' was built under R version 3.2.3
library(Zelig)
library(dplyr)
# Change your working directory
setwd("N:/PUBLG100")

# Check your working directory
getwd()

# clear the environment
rm(list = ls())

Exercise 1

Load the immigration dataset (http://uclspp.github.io/PUBLG100/data/communities_immig.csv) and merge it with the communities dataset.

Solution

We know that we need data from all three files for comparing different regression models, so let's just merge them all at once.

communities <- read.csv("http://uclspp.github.io/PUBLG100/data/communities.csv")
communities_employment <- read.csv("http://uclspp.github.io/PUBLG100/data/communities_employment.csv")
communities_immigration <- read.csv("http://uclspp.github.io/PUBLG100/data/communities_immig.csv")

communities <- merge(communities, communities_employment, by = c("state", "communityname"))
communities <- merge(communities, communities_immigration, by = c("state", "communityname"))

Exercise 2

Rename the PctImmigRec5 variable to RecentImmigration5.

Solution

Use the dplyr select() function to rename variables.

communities <- select(communities, 
                      state, 
                      Community = communityname, 
                      UnemploymentRate = PctUnemployed, 
                      NoHighSchool = PctNotHSGrad,
                      White = racePctWhite,
                      RecentImmigration5 = PctImmigRec5)

Exercise 3

Estimate a model explaining the relationship between unemployment rate and recent immigration over the past 5 years using the variable RecentImmigration5.

Solution

Use lm() to fit a linear model.

model_immigration <- lm(UnemploymentRate ~ RecentImmigration5, data = communities)

Exercise 4

Plot the regression line of the model.

Solution

Plot the observations and a regression line.

plot(communities$RecentImmigration5, communities$UnemploymentRate,
     xlab = "Recent Immigration",
     ylab = "Unemployment Rate")

abline(model_immigration, col = "red")

Exercise 5

How does this model compare to the models we estimated in the seminar with NoHighSchool and Minority as independent variables? Present your findings by comparing the output of all three models in side-by-side tables using the texreg package.

Solution

Fit the two linear models from the seminar and use screenreg() to display side-by-side tables for comparison.

model_education <- lm(UnemploymentRate ~ NoHighSchool, data = communities)

communities$Minority <- 1 - communities$White
model_minority <- lm(UnemploymentRate ~ Minority, data = communities)

screenreg(list(model_education, model_minority, model_immigration))
=========================================================
                    Model 1      Model 2      Model 3    
---------------------------------------------------------
(Intercept)            0.08 ***     0.26 ***     0.33 ***
                      (0.01)       (0.01)       (0.01)   
NoHighSchool           0.74 ***                          
                      (0.01)                             
Minority                            0.43 ***             
                                   (0.02)                
RecentImmigration5                               0.11 ***
                                                (0.02)   
---------------------------------------------------------
R^2                    0.55         0.27         0.01    
Adj. R^2               0.55         0.27         0.01    
Num. obs.           1994         1994         1994       
RMSE                   0.14         0.17         0.20    
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Exercise 6

Save your model comparison table to a Microsoft Word document (or another format if you don't use Word).

Solution

Save the output to a document using htmlreg()

htmlreg(list(model_education, model_minority, model_immigration), file = "solutions4_model_comparison.doc")
The table was written to the file 'solutions4_model_comparison.doc'.

Exercise 7

Generate predicted values from the fitted model with RecentImmigration5 and plot the confidence interval using Zelig.

Solution

This follows the example exactly as we covered in the seminar. We first estimate the model with zelig(), then generate predicted values with setx() and sim().

z.out <- zelig(UnemploymentRate ~ RecentImmigration5, data = communities, model = "ls")
 How to cite this model in Zelig:
  Kosuke Imai, Gary King, and Olivia Lau. 2015.
  "ls: Least Squares Regression for Continuous Dependent Variables"
  in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
  http://gking.harvard.edu/zelig
x.out <- setx(z.out, RecentImmigration5 = seq(0, 1, 0.1))
s.out <- sim(z.out, x = x.out)

plot(s.out, qi = "pv", xlab = "Recent Immigration (last 5 years)")

Exercise 8

Save all the plots as graphics files.

Solution

Use png() to send the plot to a file. When we're done plotting, we must close the file with dev.off()

png(filename = "solutions4_model_immigration.png")
plot(communities$RecentImmigration5, communities$UnemploymentRate,
     xlab = "Recent Immigration",
     ylab = "Unemployment Rate")
abline(model_immigration, col = "red")
dev.off()

Now let's do the same for the confidence interval plot.

png(filename = "solutions4_conf_interval.png")
plot(s.out, qi = "pv", xlab = "Recent Immigration")
dev.off()