Gerrit Gort
Do different diets result in different final weights? First we select the final weights by reverse ordering the data frame and then selecting the first observation per chick. Next, treat Diet as a factor (qualitative regressor):
CWf <- ChickWeight[with(ChickWeight,order(Chick,-Time)),]
CWf <- CWf[!duplicated(CWf$Chick),]
if(!is.factor(CWf$Diet))
CWf$Diet <- factor(CWf$Diet)
ggplot(CWf, aes(x = Diet, weight)) + geom_boxplot(aes(fill=Diet))One-way ANOVA model: \(weight_{ij} = \mu + Diet_i + \epsilon_{ij}\)
ANOVA = regression on indicator variables for the levels of the factor(s). To see this, write the model as:
\(weight_{ij} = \mu + \alpha_1 x_1 + \alpha_2 x_2+ \alpha_3 x_3+ \alpha_4x_4 + \epsilon_{ij}\)
where \(x_i\) is an indicator variable for i-th level of Diet
lmANOVA is another example of a linear model (linear in the parameters); we can use lm:
(anW1 <- lm(weight ~ Diet, CWf))##
## Call:
## lm(formula = weight ~ Diet, data = CWf)
##
## Coefficients:
## (Intercept) Diet2 Diet3 Diet4
## 156.3 58.4 114.0 73.0
The design matrix for ANOVA is (first 7 rows):
model.matrix(anW1)[1:7,]## (Intercept) Diet2 Diet3 Diet4
## 196 1 0 0 0
## 182 1 0 0 0
## 175 1 0 0 0
## 155 1 0 0 0
## 107 1 0 0 0
## 220 1 0 0 0
## 119 1 0 0 0
Hence: no indicator variable for the first level of Diet
How does R build this model.matrix? For models with a factor, first a matrix \(\mathbf{X}\) is formed as \(\mathbf{X} = \left[\mathbf{1\;X_a}\right]\) where \(\mathbf{X_a}\) is an incidence matrix with a separate column for each level of the factor. The model.matrix follows as \(\mathbf{X}^* = \left[\mathbf{1\;X_aC_a}\right]\), where \(\mathbf{C}_a\) is a contrast matrix.
The default contrast matrix \(\mathbf{C}_a\) for factor Diet is:
contrasts(CWf$Diet)## 2 3 4
## 1 0 0 0
## 2 1 0 0
## 3 0 1 0
## 4 0 0 1
Which gives contrasts between all levels except the first with the first level. This is called the corner-stone parameterization. The estimated coefficients are:
coef(anW1)## (Intercept) Diet2 Diet3 Diet4
## 156.3 58.4 114.0 73.0
With these coefficients, we test if the weight of diet 1 differs from the other diets:
coefficients(summary(anW1))## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 156.3 15.35332 10.180208 2.299269e-13
## Diet2 58.4 26.59273 2.196089 3.316560e-02
## Diet3 114.0 26.59273 4.286886 9.175867e-05
## Diet4 73.0 26.59273 2.745111 8.600299e-03
We can define custom contrast matrices, or use other options for the contrast. E.g. another corner-stone parameterization takes the last level of Diet as the reference. Specify the contrast matrix for it and use this contrast matrix in lm as follows:
(Ca <- contr.treatment(levels(CWf$Diet),base=4))## 1 2 3
## 1 1 0 0
## 2 0 1 0
## 3 0 0 1
## 4 0 0 0
anW2 <- lm(weight ~Diet, contrasts=list(Diet=Ca), CWf)
coef(anW2)## (Intercept) Diet1 Diet2 Diet3
## 229.3 -73.0 -14.6 41.0
Significance of pairwise differences can be calculated with Tukey’s Honestly Significant Differences function TukeyHSD. The function works only for a fitted model object from aov:
summary(anW3 <- aov(weight ~ Diet, data = CWf))## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 3 96913 32304 6.852 0.000652 ***
## Residuals 46 216866 4714
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
round(TukeyHSD(anW3)$Diet, 3)## diff lwr upr p adj
## 2-1 58.4 -12.483 129.283 0.140
## 3-1 114.0 43.117 184.883 0.001
## 4-1 73.0 2.117 143.883 0.041
## 3-2 55.6 -26.248 137.448 0.282
## 4-2 14.6 -67.248 96.448 0.964
## 4-3 -41.0 -122.848 40.848 0.546
Plotting the pairwise differences:
plot(TukeyHSD(anW3, "Diet"))