# Introduction to Quantitative Methods

## 3. T-test for Difference in Means and Hypothesis Testing

### 3.2 Solutions

If you're using a UCL computer, please make sure you're running R version 3.2.0. Some of the seminar tasks and exercises will not work with older versions of R.

#### Exercise 1

Rename the variable wbgi_pse to pol.stability.

#### Solution

To do this we need two things. First, we load the Stata data set, which requires the foreign library (it provides read.dta()). Second, we load the dplyr library in order to use rename().

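```r
library(foreign)  # read.dta() for Stata .dta files
library(dplyr)    # rename() and other data-manipulation verbs
```

```
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
```

```r
# Load the Quality of Government data and rename the political stability variable
df <- read.dta("http://uclspp.github.io/PUBLG100/data/QoG2012.dta")
df <- rename(df, pol.stability = wbgi_pse)  # syntax: new name = old name
```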
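If you would rather not load dplyr just to rename one column, base R can do the same by assigning to names(). A minimal sketch on a small made-up data frame; the column `cname` and all values are hypothetical stand-ins for the QoG data:

```r
# Hypothetical toy data standing in for the QoG data set
toy <- data.frame(cname = c("A", "B", "C"),
                  wbgi_pse = c(0.5, -1.2, 0.9))

# Base R equivalent of dplyr::rename(toy, pol.stability = wbgi_pse):
# overwrite the matching element of the column-name vector
names(toy)[names(toy) == "wbgi_pse"] <- "pol.stability"

names(toy)
```

Note that rename() takes `new = old`, while the base R version looks up the old name and assigns the new one.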


#### Exercise 2

Check whether political stability is different in countries that were former colonies (former_col == 1).

#### Solution

The variable former_col is binary, while pol.stability is continuous. Therefore, we use a t-test for the difference in means. Before we do this, we declare former_col to be a factor variable with labelled levels. This makes it easier to see which group has the larger and which has the smaller mean.

```r
# Turn the 0/1 dummy into a labelled factor (0 = not ex colony, 1 = ex colony)
df$former_col <- factor(df$former_col, labels = c("not ex colony", "ex colony"))
# Welch two-sample t-test of H0: difference in means = 0
t.test(df$pol.stability ~ df$former_col, mu = 0, alternative = "two.sided", conf.level = 0.95)
```

```
    Welch Two Sample t-test

data:  df$pol.stability by df$former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2224004 0.8125053
sample estimates:
mean in group not ex colony     mean in group ex colony
                  0.2858409                  -0.2316120
```
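To see what t.test() computes, the Welch statistic can be reproduced by hand. A minimal sketch on simulated data (the group sizes, means, standard deviations and seed are hypothetical, not the QoG data), using the standard Welch formulas: t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), with Welch-Satterthwaite degrees of freedom.

```r
# Simulated (hypothetical) data: a continuous outcome in two groups
set.seed(42)
y <- c(rnorm(60, mean = 0.3), rnorm(90, mean = -0.2, sd = 1.2))
g <- factor(rep(c("not ex colony", "ex colony"), times = c(60, 90)),
            levels = c("not ex colony", "ex colony"))

# Group means, variances and sizes
m <- tapply(y, g, mean)
v <- tapply(y, g, var)
n <- tapply(y, g, length)

# Welch t statistic: difference in means divided by its standard error
se <- sqrt(v[1] / n[1] + v[2] / n[2])
t_stat <- unname((m[1] - m[2]) / se)

# Welch-Satterthwaite degrees of freedom
df_w <- unname((v[1] / n[1] + v[2] / n[2])^2 /
  ((v[1] / n[1])^2 / (n[1] - 1) + (v[2] / n[2])^2 / (n[2] - 1)))

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df_w)

# Compare with t.test(), which runs the Welch test by default
tt <- t.test(y ~ g)
c(by_hand = t_stat, t.test = unname(tt$statistic))
```

Because var.equal = FALSE is the default, t.test() uses the Welch version, which is why the output above reports a non-integer df.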


#### Exercise 3

Choose an alpha level of .01.

#### Solution

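The alpha level is the significance threshold we fix before testing: it is the probability of rejecting the null hypothesis when the null hypothesis is in fact true (a Type I error). We reject the null hypothesis if the p-value, the probability of observing a difference at least as extreme as ours given that the null hypothesis is true, is smaller than alpha. The matching confidence level for the interval is 1 - alpha, so with an alpha level of 0.01 you would set the confidence level to 0.99 (the conf.level argument). Changing the confidence level only changes the interval; the t statistic and p-value stay the same. Here the p-value (0.0007) is below 0.01 and the 99% confidence interval does not include 0, so we reject the null hypothesis of no difference in political stability between former colonies and other countries.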

```r
# Same test, now reporting a 99% confidence interval (alpha = 0.01)
t.test(df$pol.stability ~ df$former_col, mu = 0, alternative = "two.sided", conf.level = 0.99)
```

```
    Welch Two Sample t-test

data:  df$pol.stability by df$former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
0.1277220 0.9071837
sample estimates:
mean in group not ex colony     mean in group ex colony
                  0.2858409                  -0.2316120
```
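The link between alpha and the confidence level can be checked with the numbers printed above. A sketch, assuming the Welch interval takes the form difference ± qt(1 - alpha/2, df) × SE; the standard error is backed out from the reported (rounded) t statistic, so the results match the output only up to rounding:

```r
# Numbers copied from the t.test() output above
diff_means <- 0.2858409 - (-0.2316120)  # not ex colony minus ex colony
t_obs <- 3.4674
df_welch <- 139.35

# Standard error implied by t = difference / SE
se <- diff_means / t_obs

# (1 - alpha) confidence interval: difference +/- critical t value * SE
ci <- function(alpha) {
  crit <- qt(1 - alpha / 2, df = df_welch)
  diff_means + c(-1, 1) * crit * se
}

# Two-sided p-value for H0: difference = 0
p_obs <- 2 * pt(-abs(t_obs), df = df_welch)

ci(0.05)      # compare with 0.2224004 0.8125053 above
ci(0.01)      # compare with 0.1277220 0.9071837 above
p_obs < 0.01  # reject H0 at alpha = 0.01 if TRUE
```

A smaller alpha means a larger critical value and therefore a wider interval, which is exactly the difference between the 95% and 99% outputs.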


#### Exercise 4

We claim the difference in means is 0.3. Check this hypothesis.

#### Solution

Faced with the claim that the difference in means (mean of "not ex colony" minus mean of "ex colony") is 0.3, we make that claim the null hypothesis, and the alternative hypothesis is that the difference is not 0.3. To test it, we set the argument mu = 0.3. In the output below, the 99% confidence interval (0.128 to 0.907) includes 0.3 and the p-value (0.147) is above 0.01. This means we cannot reject the hypothesis that the difference in means is 0.3.

```r
# H0: difference in means = 0.3, tested at alpha = 0.01
t.test(df$pol.stability ~ df$former_col, mu = 0.3, alternative = "two.sided", conf.level = 0.99)
```

```
    Welch Two Sample t-test

data:  df$pol.stability by df$former_col
t = 1.4571, df = 139.35, p-value = 0.1473
alternative hypothesis: true difference in means is not equal to 0.3
99 percent confidence interval:
0.1277220 0.9071837
sample estimates:
mean in group not ex colony     mean in group ex colony
                  0.2858409                  -0.2316120
```
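Setting mu only moves the reference point: the t statistic measures how many standard errors the observed difference lies from the hypothesised value instead of from 0. A sketch reproducing the numbers above from the earlier output, assuming (as before) that the standard error can be backed out as difference / t from the mu = 0 test, so agreement is up to rounding:

```r
# Numbers copied from the t.test() output above
diff_means <- 0.2858409 - (-0.2316120)  # not ex colony minus ex colony
se <- diff_means / 3.4674               # SE implied by the mu = 0 test
df_welch <- 139.35
mu0 <- 0.3                              # hypothesised difference under H0

# Distance of the observed difference from mu0, in standard errors
t_mu <- (diff_means - mu0) / se
p_mu <- 2 * pt(-abs(t_mu), df = df_welch)

t_mu         # compare with t = 1.4571 above
p_mu         # compare with p-value = 0.1473 above
p_mu < 0.01  # FALSE: we cannot reject H0 at alpha = 0.01
```

The confidence interval does not change with mu, which is why the output for mu = 0.3 shows the same interval as for mu = 0.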


#### Exercise 5

Rename the variable lp_lat_abst to latitude.

#### Solution

```r
df <- rename(df, latitude = lp_lat_abst)
```


#### Exercise 6

Check whether latitude and political stability are correlated.

#### Solution

To do this, first find out how the variables are scaled. We already know that political stability is continuous. Our new variable latitude measures the distance to the equator, so a good guess is that it is interval scaled. To be sure, let's check using a frequency table.

```r
table(df$latitude)
```

```
                 0  0.011111100204289 0.0135556003078818
                 1                  5                  1
0.0138889001682401 0.0222222004085779 0.0255555994808674
                 1                  3                  1
0.0350000001490116 0.0366666987538338 0.0444444008171558
                 1                  1                  2
0.0477778017520905 0.0483332984149456 0.0555556006729603
                 1                  1                  2
0.0666666999459267 0.0700000002980232 0.0727778002619743
                 3                  1                  1
0.0777778029441833 0.0888888984918594  0.092222198843956
                 2                  6                  1
 0.100000001490116  0.103333301842213  0.111111097037792
                 2                  1                  5
 0.122222200036049  0.125555604696274  0.133333295583725
                 2                  1                  1
 0.134111106395721  0.134444400668144   0.13666670024395
                 1                  1                  1
 0.144444495439529  0.145555600523949  0.146111100912094
                 4                  1                  1
 0.147555604577065  0.147777795791626  0.148333296179771
                 1                  2                  1
 0.150000005960464  0.150333300232887  0.155555605888367
                 1                  1                  1
 0.166666701436043  0.170000001788139  0.177777796983719
                 7                  1                  4
 0.188888907432556  0.189222201704979  0.190555602312088
                 2                  1                  1
 0.191111102700233  0.200000002980232  0.201666697859764
                 1                  2                  2
 0.211111098527908  0.222222194075584  0.224111095070839
                 2                  5                  1
 0.233333304524422  0.236666694283485  0.244444400072098
                 1                  1                  3
 0.255555599927902  0.258888900279999   0.26666671037674
                 2                  1                  2
 0.268333286046982  0.277777791023254  0.281111091375351
                 1                  2                  1
 0.288888901472092  0.292222201824188  0.300000011920929
                 1                  1                  2
 0.303333312273026  0.311111092567444  0.322222203016281
                 1                  2                  1
 0.325555503368378  0.333333313465118  0.344444513320923
                 2                  2                  1
 0.347777813673019  0.355555593967438  0.366666704416275
                 1                  2                  3
 0.372222185134888  0.377777814865112  0.388888895511627
                 1                  2                  3
 0.394444406032562  0.400000005960464  0.411111086606979
                 1                  1                  1
 0.422222197055817  0.433333307504654  0.436666697263718
                 1                  3                  1
 0.444444388151169  0.447777807712555  0.455555588006973
                 4                  1                  4
 0.461111098527908  0.466666698455811  0.469999998807907
                 1                  1                  1
 0.472222208976746  0.477777808904648  0.482666611671448
                 1                  1                  1
 0.488888889551163  0.501111090183258  0.511111080646515
                 2                  1                  4
 0.522222220897675  0.523333311080933   0.52444452047348
                 3                  1                  1
 0.533333420753479  0.537777781486511  0.544444382190704
                 1                  1                  1
 0.549444377422333  0.561111092567444  0.566666722297668
                 2                  1                  1
 0.577777802944183  0.581111073493958  0.588888883590698
                 1                  1                  2
 0.600000023841858  0.622222185134888  0.633333325386047
                 1                  2                  1
 0.655555486679077  0.666666686534882  0.688888907432556
                 1                  2                  2
 0.711111128330231  0.722222208976746
                 1                  1
```

We see that the values lie between zero and one (the largest observed value is about 0.72) and that the variable is at least interval scaled. In case you wonder why zero to one: the latitudes have been divided by 90. We now know that both variables are at least interval scaled.

We check for a relationship visually first.

```r
plot(df$latitude, df$pol.stability)
```

You may have spotted a positive correlation, positive in the sense that larger distances to the equator go together with more political stability. We now apply the appropriate test, based on Pearson's correlation coefficient r. cor.test() automatically drops observations that are missing on either variable; the use argument belongs to cor(), not cor.test(), so we leave it out.

```r
r <- cor.test(df$latitude, df$pol.stability, conf.level = 0.99)
r
```

```
    Pearson's product-moment correlation

data:  df$latitude and df$pol.stability
t = 5.8247, df = 185, p-value = 2.492e-08
alternative hypothesis: true correlation is not equal to 0
99 percent confidence interval:
0.2224487 0.5413167
sample estimates:
      cor
0.3936597
```

The correlation is r = 0.39 with p < 0.001, and the 99% confidence interval (0.22 to 0.54) excludes 0, so at an alpha level of 0.01 we reject the null hypothesis of no correlation between latitude and political stability.
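The test statistic and interval in this output follow from r and the number of complete pairs alone. A sketch reproducing them, using the standard formulas for Pearson's r: t = r × sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and a confidence interval built on Fisher's z transformation (atanh). The sample size n = 187 is inferred from df = 185 in the output.

```r
# Numbers copied from the cor.test() output above
r <- 0.3936597
n <- 185 + 2        # df = n - 2 for a correlation test

# t statistic and two-sided p-value for H0: rho = 0
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_r <- 2 * pt(-abs(t_r), df = n - 2)

# 99% confidence interval via Fisher's z transformation
z <- atanh(r)
half <- qnorm(1 - 0.01 / 2) / sqrt(n - 3)
ci_r <- tanh(z + c(-1, 1) * half)

t_r   # compare with t = 5.8247 above
p_r   # compare with p-value = 2.492e-08 above
ci_r  # compare with 0.2224487 0.5413167 above
```

The interval is computed on the z scale and transformed back, which is why it is not symmetric around r.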