ANOVA

Gerrit Gort

One-way ANOVA

Do different diets result in different final weights? First we select the final weights by reverse ordering the data frame and then selecting the first observation per chick. Next, treat Diet as a factor (qualitative regressor):

CWf <- ChickWeight[with(ChickWeight,order(Chick,-Time)),]
CWf <- CWf[!duplicated(CWf$Chick),]
if(!is.factor(CWf$Diet))
    CWf$Diet <- factor(CWf$Diet)
ggplot(CWf, aes(x = Diet, weight)) + geom_boxplot(aes(fill=Diet))

Boxplots of weight per diet

One-way ANOVA model as linear model

One-way ANOVA model: \(weight_{ij} = \mu + Diet_i + \epsilon_{ij}\)

ANOVA = regression on indicator variables for the levels of the factor(s). To see this, write the model as:

\(weight_{ij} = \mu + \alpha_1 x_1 + \alpha_2 x_2+ \alpha_3 x_3+ \alpha_4x_4 + \epsilon_{ij}\)

where \(x_i\) is an indicator variable for i-th level of Diet

Function lm

ANOVA is another example of a linear model (linear in the parameters); we can use lm:

(anW1 <- lm(weight ~ Diet, CWf))
## 
## Call:
## lm(formula = weight ~ Diet, data = CWf)
## 
## Coefficients:
## (Intercept)        Diet2        Diet3        Diet4  
##       156.3         58.4        114.0         73.0

Parameterization

The design matrix for ANOVA is (first 7 rows):

model.matrix(anW1)[1:7,]
##     (Intercept) Diet2 Diet3 Diet4
## 196           1     0     0     0
## 182           1     0     0     0
## 175           1     0     0     0
## 155           1     0     0     0
## 107           1     0     0     0
## 220           1     0     0     0
## 119           1     0     0     0

Hence: no indicator variable for the first level of Diet

How does R build this model.matrix? For models with a factor, first a matrix \(\mathbf{X}\) is formed as \(\mathbf{X} = \left[\mathbf{1\;X_a}\right]\) where \(\mathbf{X_a}\) is an incidence matrix with a separate column for each level of the factor. The model.matrix follows as \(\mathbf{X}^* = \left[\mathbf{1\;X_aC_a}\right]\), where \(\mathbf{C}_a\) is a contrast matrix.

Default parameterization

The default contrast matrix \(\mathbf{C}_a\) for factor Diet is:

contrasts(CWf$Diet)
##   2 3 4
## 1 0 0 0
## 2 1 0 0
## 3 0 1 0
## 4 0 0 1

Which gives contrasts between all levels except the first with the first level. This is called the corner-stone parameterization. The estimated coefficients are:

coef(anW1)
## (Intercept)       Diet2       Diet3       Diet4 
##       156.3        58.4       114.0        73.0

With these coefficients, we test if the weight of diet 1 differs from the other diets:

coefficients(summary(anW1))
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)    156.3   15.35332 10.180208 2.299269e-13
## Diet2           58.4   26.59273  2.196089 3.316560e-02
## Diet3          114.0   26.59273  4.286886 9.175867e-05
## Diet4           73.0   26.59273  2.745111 8.600299e-03

Other parameterization

We can define custom contrast matrices, or use other options for the contrast. E.g. another corner-stone parameterization takes the last level of Diet as the reference. Specify the contrast matrix for it and use this contrast matrix in lm as follows:

(Ca <- contr.treatment(levels(CWf$Diet),base=4))
##   1 2 3
## 1 1 0 0
## 2 0 1 0
## 3 0 0 1
## 4 0 0 0
anW2 <- lm(weight ~Diet, contrasts=list(Diet=Ca), CWf)
coef(anW2)
## (Intercept)       Diet1       Diet2       Diet3 
##       229.3       -73.0       -14.6        41.0

All pairwise differences

Significance of pairwise differences can be calculated with Tukey’s Honestly Significant Differences function TukeyHSD. The function works only for a fitted model object from aov:

summary(anW3 <- aov(weight ~ Diet, data = CWf))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Diet         3  96913   32304   6.852 0.000652 ***
## Residuals   46 216866    4714                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
round(TukeyHSD(anW3)$Diet, 3)
##      diff      lwr     upr p adj
## 2-1  58.4  -12.483 129.283 0.140
## 3-1 114.0   43.117 184.883 0.001
## 4-1  73.0    2.117 143.883 0.041
## 3-2  55.6  -26.248 137.448 0.282
## 4-2  14.6  -67.248  96.448 0.964
## 4-3 -41.0 -122.848  40.848 0.546

Plotting the pairwise differences:

plot(TukeyHSD(anW3, "Diet"))

Tukey confidence intervals for pairwise differences

End