
The Chi-square (χ²) Test

Pearson’s χ² test of independence (and the closely related χ² goodness-of-fit test) can be used when we have categorical variables, each divided into two or more categories, and we want to determine whether the distribution of counts across those categories differs between groups. For example, we may want to determine whether seniors from two separate universities differ in their opinions (positive, negative, or neutral) of their college experience, or whether preferences among three brands of a medication differ across socioeconomic classes (lower, middle, and upper class).

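The χ² test compares the observed counts (your data) with the counts we would expect if the groups shared the same distribution, that is, if the null hypothesis were true. For a contingency table, the expected count of each cell is (row total × column total) / grand total. The null (H₀) and alternative (H_A) hypotheses to be tested are: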

H₀: The groups have the same distribution (the variables are independent)
H_A: The distribution differs for at least one of the groups (the variables are associated)
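The expected counts can be sketched in a few lines of Python, assuming the table is stored as a list of rows (one row per group), with each expected count computed as row total × column total / grand total. The function name expected_counts and the counts below (two universities by positive/negative/neutral opinion) are hypothetical, invented purely for illustration.

```python
def expected_counts(observed):
    """Expected count per cell under H0: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts, invented for illustration only.
# Rows: university A, university B. Columns: positive, negative, neutral.
observed = [
    [30, 10, 20],
    [20, 15, 25],
]

for row in expected_counts(observed):
    print(row)  # prints [25.0, 12.5, 22.5] for both rows
```

Both rows get the same expected counts here only because the hypothetical row totals are equal (60 each); with unequal group sizes the expected rows scale with their totals.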

Before using the χ² test to test for association (or independence), we should make sure the following conditions are met:

The data are raw counts or frequencies (not percentages or otherwise transformed values)
The categories of each variable are mutually exclusive
Each subject or observation contributes to only one cell
At least 80% of the cells have an expected count of 5 or more, and no cell has an expected count below 1
No cell in the contingency table has an observed count of 0

When either of the last two conditions is not met, Fisher’s exact test should be considered instead.
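The last two conditions can be checked programmatically before running the test. The sketch below is a hypothetical helper, not part of any library; it recomputes the expected counts so that it stands on its own, then applies the "at least 80% of expected counts are 5 or more" rule, the "no expected count below 1" rule, and the "no zero observed cell" rule. The example tables are invented for illustration.

```python
def expected_counts(observed):
    """Expected count per cell under H0: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_conditions_met(observed):
    """Hypothetical helper: check the expected-count and zero-cell conditions.

    Returns False when Fisher's exact test should be considered instead.
    """
    expected = [m for row in expected_counts(observed) for m in row]
    share_at_least_5 = sum(m >= 5 for m in expected) / len(expected)
    no_expected_below_1 = all(m >= 1 for m in expected)
    no_zero_cells = all(x > 0 for row in observed for x in row)
    return share_at_least_5 >= 0.8 and no_expected_below_1 and no_zero_cells


# Hypothetical tables, invented for illustration only.
print(chi_square_conditions_met([[30, 10, 20], [20, 15, 25]]))  # prints True
print(chi_square_conditions_met([[3, 1], [2, 4]]))  # prints False: every expected count is below 5
```

A False result is the cue to consider Fisher’s exact test instead.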

For each cell in the contingency table, the χ² test calculates the difference between the observed and expected counts, squares that difference, and divides it by the expected count. These values are then summed over all cells to give the χ² statistic:

$$\chi^2 = \sum_{i} \frac{(x_i - m_i)^2}{m_i}$$

where x_i and m_i are the observed and expected counts in cell i, respectively, and the sum runs over all cells of the table.
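The formula translates directly into code. This is a minimal Python sketch, assuming the table is a list of rows; chi_square_statistic is an illustrative name rather than a library function, and the counts are hypothetical (two universities by positive/negative/neutral opinion).

```python
def chi_square_statistic(observed):
    """Pearson's chi-square statistic: sum over cells of (x_i - m_i)**2 / m_i."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, obs_row in zip(row_totals, observed):
        for c, x in zip(col_totals, obs_row):
            m = r * c / grand_total  # expected count m_i for this cell
            statistic += (x - m) ** 2 / m  # squared difference scaled by the expected count
    return statistic


# Hypothetical counts, invented for illustration only.
observed = [[30, 10, 20], [20, 15, 25]]
print(round(chi_square_statistic(observed), 4))  # prints 3.5556
```

Dividing by m_i puts each cell’s deviation on a comparable scale, so a difference of 5 counts weighs more in a cell expecting 12.5 than in one expecting 25.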

Once the χ² statistic is calculated, the corresponding p-value can be determined from the χ² distribution with (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of rows and columns in the contingency table. A p-value below the chosen significance threshold (typically α = 0.05) allows the null hypothesis to be rejected, and we conclude that there is a statistically significant association between the two variables, that is, the distribution across categories depends on which group an observation belongs to.
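Converting the statistic to a p-value needs the upper tail of the χ² distribution. The sketch below avoids third-party dependencies by using the standard recurrence for integer degrees of freedom, Q(k + 2) = Q(k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), starting from Q(1) = erfc(√(x/2)) or Q(2) = e^(−x/2). The names chi_square_sf and reject_null are hypothetical, and the example input (a statistic of 3.56 from a 2 × 3 table, so df = 2) is invented for illustration.

```python
import math


def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with an integer number of degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start the recurrence from df = 2 (even df) or df = 1 (odd df).
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    # Q(k + 2) = Q(k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1), computed in log space.
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def reject_null(statistic, n_rows, n_cols, alpha=0.05):
    """Return (p_value, reject) for a chi-square statistic from an n_rows x n_cols table."""
    df = (n_rows - 1) * (n_cols - 1)
    p_value = chi_square_sf(statistic, df)
    return p_value, p_value < alpha


p_value, reject = reject_null(3.56, n_rows=2, n_cols=3)
print(f"p = {p_value:.3f}, reject H0: {reject}")  # prints p = 0.169, reject H0: False
```

In practice, a library routine such as SciPy’s scipy.stats.chi2_contingency performs the whole computation; note that it applies Yates’ continuity correction by default when there is only one degree of freedom (a 2 × 2 table), so its result can differ from the plain Pearson statistic.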
