Likert scale analysis
For this tutorial we will be using a modified data set that contains Likert responses collected from faculty at the Open University of Catalonia on their opinions about using Wikipedia as a teaching resource. The data set was originally provided by the UCI Machine Learning Repository.
wiki.dat <- read.csv("../../dat/wiki4HE_rev.csv", stringsAsFactors = TRUE)
With Likert scale analysis we take a group of Likert items and sum (or average) their scores for each respondent. This method is beneficial when we are not interested in how respondents answer some specific question but rather the opinion of respondents on a broader topic.
For example, instead of hypothesizing that responses to the specific statement “Articles in Wikipedia are reliable” will be higher or lower in one group than another, we can combine several similar questions to get a broader sense of how people view the ‘quality’ of Wikipedia.
Comparing two groups
Suppose we are interested if two groups respond differently regarding their attitudes toward a subject. For example, let us question whether males and females view the quality of Wikipedia articles differently.
First we take the sum (or alternatively the average) of the responses for each respondent in the question group of interest. An easy way to perform this in R is to use the rowSums() function to create a new variable (column) that sums the scores of all questions belonging to our topic, which in this case would be all of the quality (Qu) questions.
wiki.dat$Qu <- rowSums(wiki.dat[, c("Qu1", "Qu2", "Qu3", "Qu4", "Qu5")])
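One practical point worth noting: if a respondent skipped any of the items, rowSums() returns NA for that respondent by default. A minimal sketch on a small illustrative data frame (not the actual survey) shows the behaviour and the na.rm option:

```r
# Illustrative data frame, not the actual survey: if a respondent
# skipped an item, rowSums() propagates the NA.
qu.items <- data.frame(Qu1 = c(5, 4, NA), Qu2 = c(4, 4, 3), Qu3 = c(3, 5, 4))
rowSums(qu.items)                 # 12 13 NA
rowSums(qu.items, na.rm = TRUE)   # 12 13  7 -- sums only the answered items
```

Whether dropping missing items (rather than dropping the respondent) is appropriate depends on how the scale is scored, so treat na.rm = TRUE as one option, not a default.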
Comparing the distributions of the summed Likert responses below, it is difficult to determine whether there is a difference between how males and females view the quality of Wikipedia.
Therefore, we should employ a statistical test to help us determine whether the genders differ in how they view Wikipedia article quality. When we combine multiple Likert-type responses into a single score (at least 4 or 5 items), the distribution of the scores changes from a clearly ordinal scale to an approximately normal one. Thus, thanks to the Central Limit Theorem, we can employ parametric tests as long as our sample size is large. Our data set contains 796 respondents, which is plenty large enough to meet this condition.
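The smoothing effect of summing items can be seen in a quick simulation. This sketch uses 796 simulated respondents and five hypothetical 1-5 items, not the survey data itself:

```r
# Simulated sketch: summing five discrete 1-5 items per respondent
# turns a clearly ordinal scale into a roughly bell-shaped score.
set.seed(42)
items <- matrix(sample(1:5, 796 * 5, replace = TRUE), ncol = 5)
scores <- rowSums(items)
hist(scores, breaks = 20)       # approximately normal
qqnorm(scores); qqline(scores)  # points should hug the line
```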
The parametric test we can employ then is the t-test, which we can run using the t.test() function with the formula “scores ~ group”.
t.test(Qu ~ GENDER, data = wiki.dat)
##
## Welch Two Sample t-test
##
## data: Qu by GENDER
## t = -2.1664, df = 768.52, p-value = 0.03059
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -0.85111378 -0.04191159
## sample estimates:
## mean in group Female mean in group Male
## 15.60725 16.05376
From the results of the t-test (p-value less than 0.05) we can conclude that there is a statistically significant difference between the mean responses of how males and females view the quality of Wikipedia articles. While we can see from the sample estimates that males rate higher on average than females, the difference is quite small and the survey was originally on an ordinal scale, so the numerical difference between our two groups has little meaning for us. The big takeaway is that males tend to (at least slightly) rate the quality of Wikipedia articles higher than females do, based on the five questions provided in our survey.
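Because the underlying items are ordinal, a nonparametric check is a reasonable complement (our suggestion, not part of the original analysis): the Wilcoxon rank-sum (Mann-Whitney) test compares the two groups without assuming normality. The sketch below uses simulated scores in place of the wiki.dat survey:

```r
# Robustness check: Wilcoxon rank-sum test on simulated summed scores.
# With the real data you would call wilcox.test(Qu ~ GENDER, data = wiki.dat).
set.seed(1)
dat <- data.frame(
  Qu     = c(sample(12:22, 50, replace = TRUE),   # "Female" scores
             sample(13:23, 50, replace = TRUE)),  # "Male" scores
  GENDER = factor(rep(c("Female", "Male"), each = 50)))
wilcox.test(Qu ~ GENDER, data = dat)
```

If the rank-based test agrees with the t-test, that strengthens the conclusion without leaning on the normal approximation.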
Comparing multiple groups
Now let us test the hypothesis that there are differences in how respondents among our six scholarly domains view the quality of Wikipedia articles.
To test our hypothesis, we can employ the one-way ANOVA test using the aov() command with similar formula syntax as with the t-test. We can also save the result as an object to use with later functions. Importantly, we will first look at residual plots using plot() to observe how well the ANOVA model fits, then use summary() to see a table of the results. We will also use par(mfrow=c(2,2)) so that all four diagnostic plots are drawn in the same frame.
domain.fit <- aov(lm(Qu ~ DOMAIN, data = wiki.dat))
par(mfrow=c(2,2))
plot(domain.fit)
summary(domain.fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## DOMAIN 5 118 23.585 2.74 0.0183 *
## Residuals 790 6799 8.607
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the plots there are no notable patterns, and the points follow along the dashed line in the normal Q-Q plot, suggesting a good fit. In the results table we see that there is a statistically significant difference in the scores between at least some of the domains, but which ones? We will need to follow up our ANOVA with a post-hoc test, in this case Tukey’s honestly significant difference (HSD) test, which we can perform with the TukeyHSD() function.
TukeyHSD(domain.fit)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm(Qu ~ DOMAIN, data = wiki.dat))
##
## $DOMAIN
## diff lwr upr p adj
## Engineering & Architecture-Arts & Humanities 0.7233622 -0.27840901 1.7251334 0.3080720
## Health Sciences-Arts & Humanities -0.1016136 -1.34922324 1.1459961 0.9999069
## Law & Politics-Arts & Humanities -0.5418075 -1.66715036 0.5835353 0.7418628
## Other-Arts & Humanities 0.2112871 -0.61587008 1.0384442 0.9782634
## Sciences-Arts & Humanities 0.8692810 -0.48581117 2.2243733 0.4452908
## Health Sciences-Engineering & Architecture -0.8249758 -2.10635708 0.4564055 0.4410434
## Law & Politics-Engineering & Architecture -1.2651697 -2.42784107 -0.1024984 0.0238135
## Other-Engineering & Architecture -0.5120751 -1.38934197 0.3651917 0.5537977
## Sciences-Engineering & Architecture 0.1459188 -1.24032900 1.5321667 0.9996711
## Law & Politics-Health Sciences -0.4401940 -1.82033488 0.9399469 0.9436892
## Other-Health Sciences 0.3129006 -0.83713476 1.4629360 0.9713217
## Sciences-Health Sciences 0.9708946 -0.60221294 2.5440022 0.4903808
## Other-Law & Politics 0.7530946 -0.26299988 1.7691891 0.2792945
## Sciences-Law & Politics 1.4110886 -0.06692832 2.8891055 0.0710521
## Sciences-Other 0.6579940 -0.60783636 1.9238243 0.6740555
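With many groups the full Tukey table gets long, so it can help to keep only the comparisons whose adjusted p-value falls below 0.05. This is a small convenience step of our own, shown here on simulated groups rather than the survey (with the real data you would index TukeyHSD(domain.fit)$DOMAIN the same way):

```r
# Filter a TukeyHSD result down to the significant pairwise comparisons.
set.seed(2)
dat <- data.frame(score = c(rnorm(30, 15), rnorm(30, 16), rnorm(30, 18)),
                  group = factor(rep(c("A", "B", "C"), each = 30)))
tk <- TukeyHSD(aov(score ~ group, data = dat))$group  # matrix of pairs
tk[tk[, "p adj"] < 0.05, , drop = FALSE]              # significant rows only
```

Applied to the table above, this would single out the Law & Politics-Engineering & Architecture contrast as the only pair below 0.05.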
Relationships between responses
Beyond comparing groups, we can ask whether two summed scales are related: for example, do respondents who rate the quality (Qu) of Wikipedia articles higher also report using (Use) Wikipedia more? We first build a Use score from the use behaviour questions, then fit a simple linear regression of Use on Qu.
wiki.dat$Use <- rowSums(wiki.dat[, c("Use1", "Use2", "Use3", "Use4", "Use5")])
use.qu.fit <- lm(Use ~ Qu, data = wiki.dat)
summary(use.qu.fit)
##
## Call:
## lm(formula = Use ~ Qu, data = wiki.dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5152 -2.7438 0.0594 2.5863 17.1451
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01805 0.76388 -0.024 0.981
## Qu 0.78730 0.04733 16.634 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.936 on 794 degrees of freedom
## Multiple R-squared: 0.2584, Adjusted R-squared: 0.2575
## F-statistic: 276.7 on 1 and 794 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(use.qu.fit)
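Beyond the diagnostics, plotting the fitted line over the raw scores makes the positive relationship itself visible (the summary above estimates a slope of about 0.79). A minimal sketch, with simulated Qu/Use scores standing in for the survey data:

```r
# Scatterplot of simulated summed scores with the fitted regression line.
set.seed(3)
qu  <- rowSums(matrix(sample(1:5, 200 * 5, replace = TRUE), ncol = 5))
use <- 0.8 * qu + rnorm(200, sd = 4)   # built to have a positive slope
fit <- lm(use ~ qu)
plot(use ~ qu, xlab = "Quality score (Qu)", ylab = "Use score")
abline(fit, col = "red")               # fitted line over the raw points
```

With the real data the same two lines would be plot(Use ~ Qu, data = wiki.dat) followed by abline(use.qu.fit).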