
Data Learning Center Statistical Guides


The Chi-square (χ2) Test

Pearson’s χ2 test of independence can be used when observations are classified by two categorical variables, each with two or more levels, and we want to determine whether the distribution of one variable differs across the levels of the other. (The closely related χ2 goodness-of-fit test instead compares the counts of a single categorical variable with a hypothesized distribution.) For example, we may want to determine whether samples of seniors from two separate universities have differing opinions (positive, negative, or neutral) on their college experience, or whether preference among three brands of a medication differs by socioeconomic class (lower-, middle-, and upper-class).

The χ2 test compares the observed counts (your data) with the counts that would be expected if the groups being compared shared the same distribution. The null (H0) and alternative (HA) hypotheses to be tested are:

H0: The groups have the same distribution across the categories (the variables are independent)
HA: The distribution differs for at least one of the groups (the variables are associated)


Before using the χ2 test to test for association (or independence), we should make sure the following conditions are met:

  • The data is in non-transformed counts or frequencies (not percentages)
  • Each level is mutually exclusive
  • Each subject or observation can only contribute once
  • The expected values should be >5 for at least 80% of the cells and none should be < 1
  • No cells in the contingency table are 0

When either of the last two conditions is not satisfied, Fisher's exact test should be considered instead.
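The expected-count and zero-cell conditions above can be checked before running the test. The following is a minimal sketch in plain Python, not a definitive implementation: the counts are made-up numbers for the two-university example, and the function names `expected_counts` and `check_conditions` are hypothetical. It assumes the usual expected count for a test of independence, row total × column total / grand total.

```python
# Hypothetical counts (made-up data): rows are two universities,
# columns are positive / negative / neutral opinions.
observed = [
    [45, 20, 35],
    [30, 40, 30],
]

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def check_conditions(table):
    """Summarise the expected-count and zero-cell conditions."""
    expected = [e for row in expected_counts(table) for e in row]
    return {
        # Share of cells whose expected count exceeds 5
        "share_expected_above_5": sum(e > 5 for e in expected) / len(expected),
        # Smallest expected count in the table
        "min_expected": min(expected),
        # True if any observed cell is 0
        "has_zero_cell": any(x == 0 for row in table for x in row),
    }

result = check_conditions(observed)
# The last two conditions in the list hold when this is True.
ok = (
    result["share_expected_above_5"] >= 0.8
    and result["min_expected"] >= 1
    and not result["has_zero_cell"]
)
print(result, ok)
```

If `ok` is False for a given table, that is the situation in which Fisher's exact test would be considered instead.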

For each cell in the contingency table, the χ2 test first calculates the difference between the observed and expected counts, squares that difference, and then divides by the expected count. These values are then summed over all cells to give the χ2-statistic. The formula for the χ2 test is:

$$\chi^2 = \sum_{i} \frac{(x_i - m_i)^2}{m_i}$$

where $x_i$ and $m_i$ are the observed and expected values in cell $i$, respectively.
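The calculation described above can be sketched in a few lines of plain Python. This is an illustration only: the counts are made-up data for the two-university example, the function name `chi_square_statistic` is hypothetical, and the expected counts are assumed to be row total × column total / grand total, as is usual for a test of independence.

```python
# Hypothetical counts (made-up data): rows are two universities,
# columns are positive / negative / neutral opinions.
observed = [
    [45, 20, 35],
    [30, 40, 30],
]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            # Expected count for cell (i, j) under independence
            m = row_totals[i] * col_totals[j] / grand_total
            statistic += (x - m) ** 2 / m
    return statistic

statistic = chi_square_statistic(observed)
print(round(statistic, 3))
```

For these made-up counts the statistic comes out at roughly 10.05; a table whose observed counts equal the expected counts exactly would give 0.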

Once the χ2-statistic is calculated, the corresponding p-value can be determined from the χ2 distribution with the appropriate degrees of freedom (for a contingency table with r rows and c columns, (r − 1)(c − 1)). A p-value less than the chosen significance level (typically α = 0.05) allows the null hypothesis to be rejected, leading to the conclusion that there is a statistically significant association between the two variables.
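As a rough illustration of this last step, the sketch below computes the upper-tail p-value of the χ2 distribution using only Python's standard library. The function name `chi_square_p_value` is hypothetical, the statistic is a made-up value for a 2 × 3 table, and the degrees of freedom are assumed to be (rows − 1) × (columns − 1). The closed-form sums are valid for integer degrees of freedom only; in practice a statistics library would normally be used instead.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square
    distribution with integer df >= 1, via closed-form sums."""
    y = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j=0}^{df/2 - 1} y^j / j!
        terms = (y ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-y) * sum(terms)
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j - 1/2) / Gamma(j + 1/2)
    terms = (
        y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(terms)

statistic = 10.051          # hypothetical chi-square statistic from a 2 x 3 table
df = (2 - 1) * (3 - 1)      # (rows - 1) * (columns - 1)
alpha = 0.05                # chosen significance level

p_value = chi_square_p_value(statistic, df)
reject_null = p_value < alpha
print(p_value, reject_null)
```

With these illustrative numbers the p-value falls below α = 0.05, so the null hypothesis would be rejected.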
