Negative Binomial Regression in R
For this guide we will revisit the crab data set that we used for Poisson regressions, which observed the relationship between the number of male crabs attaching to a female crab’s nest (Satellites) and the characteristics of the female crab that include the color (Color), spine condition (Spine), weight (Weight), and carapace width (Width).
We will again read the data into R
using the
read.table()
function. We will also go ahead and reassign
the Color and Spine variables to factors.
<- read.csv("../../dat/crab.csv")
crab
$Color <- factor(crab$Color)
crab$Spine <- factor(crab$Spine) crab
To fit a negative binomial model in R
we will need to
load in an external package. We will use the MASS
package
for this guide which includes the glm.nb()
function that
builds upon glm()
to include negative binomial estimation
by adding the θ parameter. After loading the library we can fit a
negative binomial model similarly to how we did for the Poisson model,
except we do not need to define a family
option.
library(MASS)
<- glm.nb(Satellites ~ Color + Spine + Weight + Width,
crab.nb data = crab)
Before diving into analyzing the results of the model, we will first check if the model fits well, both against a null model (one with only an intercept and no predictor variables) and against the Poisson model with the same model formula.
For the null model we can use the update()
function to
take our original model, crab.nb
, and change the model
formula where we keep the response variable with .
to the
left of a ~
and include only 1
on the right
side so that the model includes an intercept but no predictor variables.
Then, we can use anova()
to compare our null model and full
model.
<- update(crab.nb, . ~ 1)
crab.nb0
anova(crab.nb0, crab.nb)
## Likelihood ratio tests of Negative Binomial Models
##
## Response: Satellites
## Model theta Resid. df 2 x log-lik. Test df LR stat.
## 1 1 0.757759 172 -767.4092
## 2 Color + Spine + Weight + Width 0.965038 165 -745.3188 1 vs 2 7 22.09047
## Pr(Chi)
## 1
## 2 0.002450761
From the results we can see that the full model has a significantly higher log likelihood, and therefore is a better fit on the data than the null model.
Next, we should compare the negative binomial model with its Poisson
counter part. To do so, we will fit a Poisson regression model with the
same model formula. Then, we can use the pchisq()
function
to provide the associated p-value for the associated χ^2 statistic which
we calculate by taking the difference of the log likelihoods of the two
models and multiplying by 2. Note that it is important that we subtract
the log likelihood of the Poisson model from the log likelihood of the
negative binomial when calculating the χ^2 statistic and not the other
way around.
<- glm(Satellites ~ Color + Spine + Weight + Width,
crab.pois data = crab,
family = "poisson")
pchisq(2 * (logLik(crab.nb) - logLik(crab.pois)), df = 1, lower.tail = FALSE)
## 'log Lik.' 1.42397e-36 (df=9)
The associated p-value is extremely small, indicating that the negative binomial model is better over the Poisson model.
Now that we have evidence that the negative binomial model is
appropriate for this data, we can use the anova()
and
summary()
functions to get results from the fitted
model.
anova(crab.nb)
## Warning in anova.negbin(crab.nb): tests made without re-estimating 'theta'
## Analysis of Deviance Table
##
## Model: Negative Binomial(0.965), link: log
##
## Response: Satellites
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 172 220.68
## Color 3 6.0196 169 214.66 0.1107
## Spine 2 2.0357 167 212.62 0.3614
## Weight 1 16.1028 166 196.52 5.999e-05 ***
## Width 1 0.0005 165 196.52 0.9818
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(crab.nb)
##
## Call:
## glm.nb(formula = Satellites ~ Color + Spine + Weight + Width,
## data = crab, init.theta = 0.9650380207, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8788 -1.3685 -0.3267 0.4224 2.2288
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.274784 1.950675 -0.141 0.8880
## Color2 -0.320766 0.372716 -0.861 0.3894
## Color3 -0.596232 0.417342 -1.429 0.1531
## Color4 -0.579357 0.466470 -1.242 0.2142
## Spine2 -0.242827 0.398357 -0.610 0.5421
## Spine3 0.042811 0.248427 0.172 0.8632
## Weight 0.700752 0.356375 1.966 0.0493 *
## Width -0.002522 0.099678 -0.025 0.9798
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.965) family taken to be 1)
##
## Null deviance: 220.68 on 172 degrees of freedom
## Residual deviance: 196.52 on 165 degrees of freedom
## AIC: 763.32
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.965
## Std. Err.: 0.176
##
## 2 x log-likelihood: -745.319
We again see that Weight is a statistically significant
predictor of Satellites. However, unlike the Poisson model,
when including the dispersion parameter the Color variable is
no longer statistically significant. Therefore, with a positive
estimated coefficient, we can conclude that female crab Weight
has a positive relationship with the number of male Satellites
attached to the nest. If we are interested in the strength of this
effect we can take the exponential of the coefficient to see the
incident rate ratio. To do so, we can use the exp()
function to return the exponentials of the coefficients
(coef()
) and their confidence intervals
(confint()
) which we will merge into a table using
cbind()
.
exp(cbind(est = coef(crab.nb), confint(crab.nb)))
## est 2.5 % 97.5 %
## (Intercept) 0.7597359 0.01148486 48.600768
## Color2 0.7255934 0.33148037 1.498378
## Color3 0.5508837 0.23555084 1.232776
## Color4 0.5602586 0.21563682 1.417637
## Spine2 0.7844071 0.36933889 1.712163
## Spine3 1.0437402 0.63712967 1.681390
## Weight 2.0152673 0.91857897 4.488111
## Width 0.9974814 0.80262898 1.240856
From the resulting table and knowing that only Weight was statistically significant in the model, we can conclude with 95% certainty that for every 1 unit increase in female crab weight the number of expected male Satellites on its nest increases by 2.015 times.