Logo

Data Learning Center Statistical Guides

negative-binomial-regression.knit

Negative Binomial Regression in R

For this guide we will revisit the crab data set that we used for Poisson regressions, which observed the relationship between the number of male crabs attaching to a female crab’s nest (Satellites) and the characteristics of the female crab that include the color (Color), spine condition (Spine), weight (Weight), and carapace width (Width).

We will again read the data into R using the read.table() function. We will also go ahead and reassign the Color and Spine variables to factors.

crab <- read.csv("../../dat/crab.csv")

crab$Color <- factor(crab$Color)
crab$Spine <- factor(crab$Spine)

To fit a negative binomial model in R we will need to load in an external package. We will use the MASS package for this guide which includes the glm.nb() function that builds upon glm() to include negative binomial estimation by adding the θ parameter. After loading the library we can fit a negative binomial model similarly to how we did for the Poisson model, except we do not need to define a family option.

library(MASS)

crab.nb <- glm.nb(Satellites ~ Color + Spine + Weight + Width,
                 data = crab)

Before diving into analyzing the results of the model, we will first check if the model fits well, both against a null model (one with only an intercept and no predictor variables) and against the Poisson model with the same model formula.

For the null model we can use the update() function to take our original model, crab.nb, and change the model formula where we keep the response variable with . to the left of a ~ and include only 1 on the right side so that the model includes an intercept but no predictor variables. Then, we can use anova() to compare our null model and full model.

crab.nb0 <- update(crab.nb, . ~ 1)

anova(crab.nb0, crab.nb)
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: Satellites
##                            Model    theta Resid. df    2 x log-lik.   Test    df LR stat.
## 1                              1 0.757759       172       -767.4092                      
## 2 Color + Spine + Weight + Width 0.965038       165       -745.3188 1 vs 2     7 22.09047
##       Pr(Chi)
## 1            
## 2 0.002450761

From the results we can see that the full model has a significantly higher log likelihood, and therefore is a better fit on the data than the null model.

Next, we should compare the negative binomial model with its Poisson counter part. To do so, we will fit a Poisson regression model with the same model formula. Then, we can use the pchisq() function to provide the associated p-value for the associated χ^2 statistic which we calculate by taking the difference of the log likelihoods of the two models and multiplying by 2. Note that it is important that we subtract the log likelihood of the Poisson model from the log likelihood of the negative binomial when calculating the χ^2 statistic and not the other way around.

crab.pois <- glm(Satellites ~ Color + Spine + Weight + Width,
                 data = crab,
                 family = "poisson")

pchisq(2 * (logLik(crab.nb) - logLik(crab.pois)), df = 1, lower.tail = FALSE)
## 'log Lik.' 1.42397e-36 (df=9)

The associated p-value is extremely small, indicating that the negative binomial model is better over the Poisson model.

Now that we have evidence that the negative binomial model is appropriate for this data, we can use the anova() and summary() functions to get results from the fitted model.

anova(crab.nb)
## Warning in anova.negbin(crab.nb): tests made without re-estimating 'theta'
## Analysis of Deviance Table
## 
## Model: Negative Binomial(0.965), link: log
## 
## Response: Satellites
## 
## Terms added sequentially (first to last)
## 
## 
##        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                     172     220.68              
## Color   3   6.0196       169     214.66    0.1107    
## Spine   2   2.0357       167     212.62    0.3614    
## Weight  1  16.1028       166     196.52 5.999e-05 ***
## Width   1   0.0005       165     196.52    0.9818    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(crab.nb)
## 
## Call:
## glm.nb(formula = Satellites ~ Color + Spine + Weight + Width, 
##     data = crab, init.theta = 0.9650380207, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8788  -1.3685  -0.3267   0.4224   2.2288  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.274784   1.950675  -0.141   0.8880  
## Color2      -0.320766   0.372716  -0.861   0.3894  
## Color3      -0.596232   0.417342  -1.429   0.1531  
## Color4      -0.579357   0.466470  -1.242   0.2142  
## Spine2      -0.242827   0.398357  -0.610   0.5421  
## Spine3       0.042811   0.248427   0.172   0.8632  
## Weight       0.700752   0.356375   1.966   0.0493 *
## Width       -0.002522   0.099678  -0.025   0.9798  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.965) family taken to be 1)
## 
##     Null deviance: 220.68  on 172  degrees of freedom
## Residual deviance: 196.52  on 165  degrees of freedom
## AIC: 763.32
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.965 
##           Std. Err.:  0.176 
## 
##  2 x log-likelihood:  -745.319

We again see that Weight is a statistically significant predictor of Satellites. However, unlike the Poisson model, when including the dispersion parameter the Color variable is no longer statistically significant. Therefore, with a positive estimated coefficient, we can conclude that female crab Weight has a positive relationship with the number of male Satellites attached to the nest. If we are interested in the strength of this effect we can take the exponential of the coefficient to see the incident rate ratio. To do so, we can use the exp() function to return the exponentials of the coefficients (coef()) and their confidence intervals (confint()) which we will merge into a table using cbind().

exp(cbind(est = coef(crab.nb), confint(crab.nb)))
##                   est      2.5 %    97.5 %
## (Intercept) 0.7597359 0.01148486 48.600768
## Color2      0.7255934 0.33148037  1.498378
## Color3      0.5508837 0.23555084  1.232776
## Color4      0.5602586 0.21563682  1.417637
## Spine2      0.7844071 0.36933889  1.712163
## Spine3      1.0437402 0.63712967  1.681390
## Weight      2.0152673 0.91857897  4.488111
## Width       0.9974814 0.80262898  1.240856

From the resulting table and knowing that only Weight was statistically significant in the model, we can conclude with 95% certainty that for every 1 unit increase in female crab weight the number of expected male Satellites on its nest increases by 2.015 times.

DLC_statistical_guides