Likert scale analysis
For this tutorial we will be using a modified data set that contains Likert responses collected from faculty at the Open University of Catalonia on their opinions about using Wikipedia as a teaching resource. The data set was originally provided by the UCI Machine Learning Repository.
wiki.dat <- read.csv("../../dat/wiki4HE_rev.csv", stringsAsFactors = TRUE)
With Likert scale analysis we take a group of Likert items and sum (or average) their scores for each respondent. This method is beneficial when we are not interested in how respondents answer some specific question but rather the opinion of respondents on a broader topic.
For example, instead of hypothesizing that responses to the specific statement “Articles in Wikipedia are reliable” will be higher or lower in one group than another, we can combine several similar questions to get a broader sense of how people view the ‘quality’ of Wikipedia.
Comparing two groups
Suppose we are interested if two groups respond differently regarding their attitudes toward a subject. For example, let us question whether males and females view the quality of Wikipedia articles differently.
First we take the sum (or alternatively the average) of the responses for each respondent in the question group of interest. An easy way to perform this in R is to use the rowSums() function to create a new variable (column) that sums the scores of all questions belonging to our topic, which in this case would be all of the quality (Qu) questions.
wiki.dat$Qu <- rowSums(wiki.dat[, c("Qu1", "Qu2", "Qu3", "Qu4", "Qu5")])
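One practical point worth noting: if a respondent skipped any of the items, rowSums() returns NA for that respondent by default. A minimal sketch on a small illustrative data frame (not the actual survey) shows the behaviour and the na.rm option:

```r
# Illustrative data frame, not the actual survey: if a respondent
# skipped an item, rowSums() propagates the NA.
qu.items <- data.frame(Qu1 = c(5, 4, NA), Qu2 = c(4, 4, 3), Qu3 = c(3, 5, 4))
rowSums(qu.items)                 # 12 13 NA
rowSums(qu.items, na.rm = TRUE)   # 12 13  7 -- sums only the answered items
```

Whether dropping missing items (rather than dropping the respondent) is appropriate depends on how the scale is scored, so treat na.rm = TRUE as one option, not a default.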
Comparing the distributions of the summed Likert responses below, it is difficult to determine whether there is a difference between how males and females view the quality of Wikipedia.
Therefore, we should employ a statistical test to help us determine whether the genders differ in how they view Wikipedia article quality. When we combine multiple Likert-type responses into a single score (at least 4 or 5 items), the distribution of the scores changes from a clearly ordinal scale to an approximately normal one. Thus, thanks to the Central Limit Theorem, we can employ parametric tests as long as our sample size is large. Our data set contains 796 respondents, which is plenty large enough to meet this condition.
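The smoothing effect of summing items can be seen in a quick simulation. This sketch uses 796 simulated respondents and five hypothetical 1-5 items, not the survey data itself:

```r
# Simulated sketch: summing five discrete 1-5 items per respondent
# turns a clearly ordinal scale into a roughly bell-shaped score.
set.seed(42)
items <- matrix(sample(1:5, 796 * 5, replace = TRUE), ncol = 5)
scores <- rowSums(items)
hist(scores, breaks = 20)       # approximately normal
qqnorm(scores); qqline(scores)  # points should hug the line
```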
The parametric test we can employ then is the t-test, which we can run using the t.test() function with the formula “scores ~ group”.
t.test(Qu ~ GENDER, data = wiki.dat)
##
## Welch Two Sample t-test
##
## data: Qu by GENDER
## t = -2.1664, df = 768.52, p-value = 0.03059
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -0.85111378 -0.04191159
## sample estimates:
## mean in group Female mean in group Male
## 15.60725 16.05376
From the results of the t-test (p-value less than 0.05) we can conclude that there is a statistically significant difference between the mean responses of how males and females view the quality of Wikipedia articles. While we can see from the sample estimates that males rate higher on average than females, the difference is quite small and the survey was originally on an ordinal scale, so the numerical difference between our two groups has little meaning for us. The big takeaway is that males tend to (at least slightly) rate the quality of Wikipedia articles higher than females do, based on the five questions provided in our survey.
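Because the underlying items are ordinal, a nonparametric check is a reasonable complement (our suggestion, not part of the original analysis): the Wilcoxon rank-sum (Mann-Whitney) test compares the two groups without assuming normality. The sketch below uses simulated scores in place of the wiki.dat survey:

```r
# Robustness check: Wilcoxon rank-sum test on simulated summed scores.
# With the real data you would call wilcox.test(Qu ~ GENDER, data = wiki.dat).
set.seed(1)
dat <- data.frame(
  Qu     = c(sample(12:22, 50, replace = TRUE),   # "Female" scores
             sample(13:23, 50, replace = TRUE)),  # "Male" scores
  GENDER = factor(rep(c("Female", "Male"), each = 50)))
wilcox.test(Qu ~ GENDER, data = dat)
```

If the rank-based test agrees with the t-test, that strengthens the conclusion without leaning on the normal approximation.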
Comparing multiple groups
Now let us test the hypothesis that there are differences in how respondents among our six scholarly domains view the quality of Wikipedia articles.
To test our hypothesis, we can employ the one-way ANOVA test using the aov() command with similar formula syntax as with the t-test. We can also save the result as an object to use with later functions. Importantly, we will first look at residual plots using plot() to observe how well the ANOVA model fits, then use summary() to see a table of the results. We will also use par(mfrow=c(2,2)) so that all four diagnostic plots are drawn in the same frame.
domain.fit <- aov(lm(Qu ~ DOMAIN, data = wiki.dat))
par(mfrow=c(2,2))
plot(domain.fit)
summary(domain.fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## DOMAIN 5 118 23.585 2.74 0.0183 *
## Residuals 790 6799 8.607
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the plots there are no notable patterns, and the points follow along the dashed line in the normal Q-Q plot, suggesting a good fit. In the results table we see that there is a statistically significant difference in the scores between at least some of the domains, but which ones? We will need to follow up our ANOVA with a post-hoc test, in this case Tukey’s honestly significant difference (HSD) test, which we can perform with the TukeyHSD() function.
TukeyHSD(domain.fit)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm(Qu ~ DOMAIN, data = wiki.dat))
##
## $DOMAIN
## diff lwr upr p adj
## Engineering & Architecture-Arts & Humanities 0.7233622 -0.27840901 1.7251334 0.3080720
## Health Sciences-Arts & Humanities -0.1016136 -1.34922324 1.1459961 0.9999069
## Law & Politics-Arts & Humanities -0.5418075 -1.66715036 0.5835353 0.7418628
## Other-Arts & Humanities 0.2112871 -0.61587008 1.0384442 0.9782634
## Sciences-Arts & Humanities 0.8692810 -0.48581117 2.2243733 0.4452908
## Health Sciences-Engineering & Architecture -0.8249758 -2.10635708 0.4564055 0.4410434
## Law & Politics-Engineering & Architecture -1.2651697 -2.42784107 -0.1024984 0.0238135
## Other-Engineering & Architecture -0.5120751 -1.38934197 0.3651917 0.5537977
## Sciences-Engineering & Architecture 0.1459188 -1.24032900 1.5321667 0.9996711
## Law & Politics-Health Sciences -0.4401940 -1.82033488 0.9399469 0.9436892
## Other-Health Sciences 0.3129006 -0.83713476 1.4629360 0.9713217
## Sciences-Health Sciences 0.9708946 -0.60221294 2.5440022 0.4903808
## Other-Law & Politics 0.7530946 -0.26299988 1.7691891 0.2792945
## Sciences-Law & Politics 1.4110886 -0.06692832 2.8891055 0.0710521
## Sciences-Other 0.6579940 -0.60783636 1.9238243 0.6740555
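With many groups the full Tukey table gets long, so it can help to keep only the comparisons whose adjusted p-value falls below 0.05. This is a small convenience step of our own, shown here on simulated groups rather than the survey (with the real data you would index TukeyHSD(domain.fit)$DOMAIN the same way):

```r
# Filter a TukeyHSD result down to the significant pairwise comparisons.
set.seed(2)
dat <- data.frame(score = c(rnorm(30, 15), rnorm(30, 16), rnorm(30, 18)),
                  group = factor(rep(c("A", "B", "C"), each = 30)))
tk <- TukeyHSD(aov(score ~ group, data = dat))$group  # matrix of pairs
tk[tk[, "p adj"] < 0.05, , drop = FALSE]              # significant rows only
```

Applied to the table above, this would single out the Law & Politics-Engineering & Architecture contrast as the only pair below 0.05.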
Relationships between responses
Beyond comparing groups, we can ask whether two summed scales are related: for example, do respondents who rate the quality (Qu) of Wikipedia articles higher also report using (Use) Wikipedia more? We first build a Use score from the use behaviour questions, then fit a simple linear regression of Use on Qu.
wiki.dat$Use <- rowSums(wiki.dat[, c("Use1", "Use2", "Use3", "Use4", "Use5")])
use.qu.fit <- lm(Use ~ Qu, data = wiki.dat)
summary(use.qu.fit)
##
## Call:
## lm(formula = Use ~ Qu, data = wiki.dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5152 -2.7438 0.0594 2.5863 17.1451
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01805 0.76388 -0.024 0.981
## Qu 0.78730 0.04733 16.634 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.936 on 794 degrees of freedom
## Multiple R-squared: 0.2584, Adjusted R-squared: 0.2575
## F-statistic: 276.7 on 1 and 794 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(use.qu.fit)
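Beyond the diagnostics, plotting the fitted line over the raw scores makes the positive relationship itself visible (the summary above estimates a slope of about 0.79). A minimal sketch, with simulated Qu/Use scores standing in for the survey data:

```r
# Scatterplot of simulated summed scores with the fitted regression line.
set.seed(3)
qu  <- rowSums(matrix(sample(1:5, 200 * 5, replace = TRUE), ncol = 5))
use <- 0.8 * qu + rnorm(200, sd = 4)   # built to have a positive slope
fit <- lm(use ~ qu)
plot(use ~ qu, xlab = "Quality score (Qu)", ylab = "Use score")
abline(fit, col = "red")               # fitted line over the raw points
```

With the real data the same two lines would be plot(Use ~ Qu, data = wiki.dat) followed by abline(use.qu.fit).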