
# Statistical testing
```{r setup5,echo=F}
library(dplyr)
library(Hmisc)
library(ggplot2)
library(haven)
library(foreign)
datadir<-"/home/piet/Dropbox/work/UKDS/rguide/data/"
bsa<-read.spss(paste0(datadir,"8849spss_V1/bsa2017_open_enviropol.sav"), 
               to.data.frame = TRUE,
               use.value.labels=TRUE,
               max.value.labels = 9)  
```
This section covers how to implement common statistical tests in R with survey data. A working knowledge of these tests and their theoretical assumptions is assumed.

## Differences between means

Two common ways of testing means in samples are to test whether a mean is significantly different from 0 (one-sample t-test), or whether means differ between two groups (two-sample t-test). In the latter case, one can further distinguish between independent samples (where the means come from different groups) and paired samples (where the same measure is taken at several points in time). Given that it is the most widely used in social science, we will only cover the independent samples case here.

Several R packages provide functions for conducting t-tests. We will be using `t.test()`, part of the `stats` package. Suppose we would like to test whether `libauth` differs significantly between men and women. A two-sided test is the default, with the null hypothesis ($H_0$) that there is no difference between the groups and the alternative hypothesis ($H_1$) that the group means do differ. The test is specified with a formula, with the quantity to be tested on the left-hand side and the grouping variable on the right-hand side.

One-sided tests can be conducted by specifying the alternative hypothesis ($H_1$) as either `alternative="greater"` or `alternative="less"`. By default, `t.test()` assumes that the variances are unequal; this can be changed with the `var.equal=T` option.

```{r 8.1,message=F}
# Testing for significant differences in liberal vs authoritarian score
t.test(libauth~Rsex,data=bsa)
```
There are no significant differences: the difference in `libauth` between men and women is not significantly different from zero.
```{r 8.2,message=F}

# Testing whether men have a lower (i.e. more left-wing) score
t.test(leftrigh~Rsex,data=bsa, alternative="less")
```
Men have a significantly lower score on the scale (at the .05 threshold) and therefore lean, on average, more to the left than women.
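
The direction of a one-sided test depends on the order of the levels of the grouping factor: `alternative="less"` states that the mean of the first level is lower than that of the second. The sketch below illustrates this, together with the `var.equal` option, on simulated data. The data frame `sim` and its variables `score` and `sex` are made up for the illustration and are not part of the BSA dataset.

```{r}
# Simulated data (hypothetical, not BSA): two groups of 200 respondents
set.seed(123)
sim <- data.frame(
  sex   = factor(rep(c("Male", "Female"), each = 200),
                 levels = c("Male", "Female")),
  score = c(rnorm(200, mean = 2.4, sd = 0.8),
            rnorm(200, mean = 2.6, sd = 0.8))
)

# The first level ("Male") is the reference for one-sided alternatives
levels(sim$sex)

# Welch test (the default) and pooled-variance test
welch  <- t.test(score ~ sex, data = sim)
pooled <- t.test(score ~ sex, data = sim, var.equal = TRUE)

# H_1: the mean of the first level (Male) is lower than that of the second
lower <- t.test(score ~ sex, data = sim, alternative = "less")

# Test results are stored as lists, so single elements can be extracted
c(welch = welch$p.value, pooled = pooled$p.value, one_sided = lower$p.value)
```

Storing the result in an object rather than printing it makes it straightforward to reuse the p-value, the confidence interval (`$conf.int`) or the group means (`$estimate`) later on.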


## Differences in variance

Another common significance test in social science is the **variance test**, which consists of testing whether the variances of the same variable in two groups are equal. This is usually achieved by testing whether the ratio of the variances of the two groups is significantly different from one. With the BSA data, this amounts to testing whether men and women are equally homogeneous with regard to their political views.

The syntax for the variance test, `var.test()`, also included in `stats`, is almost identical to that of `t.test()`.

```{r 8.3,message=F}
# Testing for gender differences in the variance of the liberal vs authoritarian score
var.test(libauth~Rsex,data=bsa)
```
The variances differ significantly between men and women, but only at the .1 threshold.
```{r 8.4,message=F}

# Testing whether the variance of the left-right score is larger among men
var.test(leftrigh~Rsex,data=bsa,alternative="greater")
```
The variance of left-right political leaning is larger among men than among women; in other words, there is more divergence among men than among women.
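
To make the logic of the test explicit, the F statistic reported by `var.test()` is the ratio of the two sample variances, and equal variances correspond to a ratio of 1. The following is a minimal sketch on simulated data; the vectors `men` and `women` are made up for the illustration and do not come from the BSA dataset.

```{r}
# Simulated scores (hypothetical, not BSA), with more spread in the first group
set.seed(123)
men   <- rnorm(150, mean = 2.5, sd = 1.0)
women <- rnorm(150, mean = 2.5, sd = 0.8)

vt <- var.test(men, women)

# The test statistic is the ratio of the two sample variances
vt$statistic
var(men) / var(women)

# The confidence interval is for that ratio: equal variances
# correspond to a ratio of 1, not 0
vt$conf.int
```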

## Significance of measures of association

**Between continuous variables**
  
Another common statistical test in social science examines whether a correlation coefficient is significantly different from 0. The null hypothesis is that the correlation is zero; the alternative hypothesis is that it is not.

```{r 8.5}
# cor.test() drops incomplete pairs by itself, so no `use` argument is needed
cor.test(bsa$leftrigh, bsa$libauth)
```

As we might have suspected, the correlation coefficient between the two scales is so small that it cannot be said to be significantly different from zero.
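
As with the other tests, the result of `cor.test()` can be stored and its elements extracted. The sketch below uses simulated data (the vectors `x` and `y` are made up for the illustration) and also shows that pairs with missing values are left out of the computation.

```{r}
# Simulated data (hypothetical, not BSA) with a weak correlation and a few NAs
set.seed(123)
x <- rnorm(300)
y <- 0.05 * x + rnorm(300)
x[c(5, 50)]   <- NA
y[c(10, 100)] <- NA

ct <- cor.test(x, y)

# Pearson's r, its 95% confidence interval and the p-value
ct$estimate
ct$conf.int
ct$p.value

# Incomplete pairs were dropped: degrees of freedom = complete pairs - 2
ct$parameter
sum(complete.cases(x, y)) - 2
```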

**Between categorical variables**
  
The chi-square test of independence is a very common test of association between categorical variables. It examines whether the association between two variables is likely to be due to chance or not; in other words, whether the counts observed in a contingency table differ significantly from those that would be expected if the two variables were independent.

We will be using `chisq.test()`, also from the `stats` package. By contrast with the test of correlation, `chisq.test()` is usually applied to a contingency table that has already been computed.
Let us go back to an earlier example and test whether the gender differences in political affiliation are due to chance or not.

```{r 8.6}
chisq.test(xtabs(~PartyId2 +Rsex,bsa))
```

As the R output shows, there are highly significant gender differences in political affiliations (p<.001).
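
The object returned by `chisq.test()` also holds the counts expected under independence and the residuals, which help to see where in the table the association comes from. The following is a minimal sketch on simulated data; the data frame `sim_party` and its variables are made up for the illustration and, being drawn independently, should not be expected to show an association.

```{r}
# Simulated data (hypothetical, not BSA): party preference by sex
set.seed(123)
sim_party <- data.frame(
  sex   = sample(c("Male", "Female"), 500, replace = TRUE),
  party = sample(c("Party A", "Party B", "Other"), 500, replace = TRUE)
)

tab <- xtabs(~ party + sex, data = sim_party)
chi <- chisq.test(tab)

# Counts expected under independence: row total * column total / grand total
chi$expected
outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals show which cells depart most from independence
round(chi$residuals, 2)
```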

\newpage