Albart Coster
plot
Function plot is the basic function for plotting:
plot(weight~Time,data = ChickWeight)
Specify the display with a formula. The left-hand side (LHS) of the formula is the vertical axis and the right-hand side (RHS) is the horizontal axis of the plot (usually):
class(weight~Time)
## [1] "formula"
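As a small illustration of the formula interface described above, the sketch below stores the formula in an object, inspects it, and draws the same scatter plot twice: once through the formula and once by passing the two variables explicitly. The object name f is only illustrative.

```r
# A formula is an unevaluated description of the plot: vertical ~ horizontal
f <- weight ~ Time
class(f)        # the class is "formula"
all.vars(f)     # names of the variables used in the formula

# Formula interface: the variables are looked up in 'data'
plot(f, data = ChickWeight)

# The same display, passing the variables explicitly
with(ChickWeight, plot(x = Time, y = weight))
```

The formula interface keeps the call short and works the same way in boxplot and in model-fitting functions such as lm.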
boxplot
Function boxplot makes box-and-whisker plots:
boxplot(weight~Time,data = ChickWeight)
barplot
For a barplot we need to provide a matrix or a vector of data:
mat1 <- with(ChickWeight,tapply(weight,Diet,mean))
barplot(mat1,xlab = "Diet",ylab = "Mean weight (g)")
Suppose that we want to combine the three graphs into one plot. We use function par to set specific graphical parameter settings, in our case mfrow:
par(mfrow = c(1,3))
plot(weight~Time,data = ChickWeight)
boxplot(weight~Time,data = ChickWeight)
barplot(mat1,xlab = "Diet",ylab = "Mean weight (g)")
Functions jpeg, png, bmp and pdf save graphics in specific formats:
pdf(file = "figure/Cboxplot.pdf",width = 20,height = 10)
boxplot(weight~Time,data = ChickWeight,col = "blue")
graphics.off()
file.exists('figure/Cboxplot.pdf')
## [1] TRUE
Note the call to graphics.off(), which closes all open graphics devices so that the file is written to disk.
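The following sketch shows the same open, draw, close pattern with dev.off(), which closes only the current device rather than all of them. The file name and the use of tempdir() are assumptions made so that the example runs without a figure directory. Note that pdf takes its width and height in inches, whereas png, jpeg and bmp use pixels by default.

```r
# Hypothetical output location; any writable path works
out_file <- file.path(tempdir(), "chick_boxplot.pdf")

pdf(file = out_file, width = 10, height = 5)   # size in inches
boxplot(weight ~ Time, data = ChickWeight, col = "blue")
dev.off()                                      # close the current device only

file.exists(out_file)
```

Until the device is closed, the file may be incomplete, so the closing call should not be skipped.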
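Suppose that we want a single plot of weight against Time (ChickWeight data) in which each Diet has its own colour, plotting symbol and regression line:
## convert Diet to numeric for convenience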
ChickWeight$Diet <- as.numeric(ChickWeight$Diet)
nDiets <- max(ChickWeight$Diet)
colors <- rainbow(nDiets)
lineTypes <- (1:nDiets)
plotChar <- 18+(1:nDiets)
plot(weight~Time,type = "n",data = ChickWeight)
for(d in 1:nDiets)
{
    CW <- ChickWeight[ChickWeight$Diet%in%d,]
    points(weight~Time,data=CW,col = colors[d],pch=plotChar[d])
    abline(lm(weight~Time,data = CW),col=colors[d],lty=lineTypes[d],lwd=1.5)
}
legend(x = 0.05*max(ChickWeight$Time),
       y = 0.85*max(ChickWeight$weight),
       col = colors,pch = plotChar,lty = lineTypes,
       legend = paste("Diet",unique(ChickWeight$Diet)),
       text.col = colors)
par(mfrow=c(1,3)) specification.
There are special packages designed to plot multivariate data, see lattice and ggplot2. In this course, we will only use package ggplot2.
The ggplot2 package is based on the grammar of graphics. It is worth mentioning because it has become very popular among R users and because it can create a wide variety of graphics in R.
Users who wish to create a graphic using the ggplot2 package need to think about the following aspects:
We use function ggplot to create a plot object. Function ggplot has two arguments: data and aesthetic mapping, which set up the defaults of the plot object and can also be specified in each layer. The aesthetic mapping of the data can be specified with function aes.
library(ggplot2)
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight))
This plot cannot be displayed until we add a layer, since there is nothing to see.
To display something, we need to add a layer to the plot. The layers can be added with function layer:
layer(geom, geom_params, stat, stat_params, data, mapping, position)
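As a hedged sketch of what an explicit layer call looks like, the example below assumes a ggplot2 version of 2.0.0 or later, in which layer expects geom, stat and position to be named explicitly and takes further settings through a params list instead of geom_params and stat_params. The object names p_layer and p_short are only illustrative.

```r
library(ggplot2)

p <- ggplot(data = ChickWeight, aes(x = Time, y = weight))

# Explicit layer: points, drawn from the data as they are, without repositioning
p_layer <- p + layer(geom = "point", stat = "identity", position = "identity")

# The shortcut function builds the same kind of layer with these defaults
p_short <- p + geom_point()

p_layer
```

In practice the shortcut functions are used almost exclusively; the explicit form mainly shows which pieces make up a layer.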
Or we can use specific functions to add specific layers:
geom_bar to add bars to the plot
geom_errorbar to add error bars
geom_point to add points to the plot
geom_line to add lines
geom_histogram to add a histogram
We can also perform some analyses first and subsequently add the result as a layer to the plot:
stat_summary to summarize the data at each x-value
stat_smooth to draw a smoothed line through the data
Now, we add points representing the individual weights to the plot:
p + geom_point()
The + operation returns a ggplot object to the console, where it is printed but not stored. We can store the result as a new object, but then we need to call this object explicitly to print it to the screen:
p1 <- p + geom_point()
p1
We can differentiate between the chicks by using a distinct colour for each chick:
p + geom_point(aes(colour = Chick))
Since the number of chicks is large, the legend is useless. We use function theme to remove the legend:
p + geom_point(aes(colour = Chick)) + theme(legend.position = 'none')
We can also use a distinct shape for each chick:
p + geom_point(aes(shape = Chick,colour = Chick)) + theme(legend.position = 'none')
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have
## 50. Consider specifying shapes manually if you must have them.
## Warning: Removed 525 rows containing missing values (geom_point).
Since we have more than 6 chicks, we should specify the shapes manually. First, let’s have a look at the different point shapes available in ggplot2:
df <- data.frame(x= 1:12,y = 1,sh = as.character(1:6),colour = as.character(1:6))
ggplot(data = df,aes(x= x,y = y,colour = colour,shape = colour)) + geom_point()
Next, we assign each chick to one of six shape groups and use that group for the shape:
ChickWeight$shape <- factor(as.numeric(ChickWeight$Chick)%%6)
p$data <- ChickWeight
p + geom_point(aes(shape = shape,colour = Chick)) + theme(legend.position = 'none')
I used the modulus operator '%%' of R (the modulo operation finds the remainder of the division of one number by another, see Wikipedia).
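To make the modulus step concrete, here is a minimal sketch on a small vector of made-up identifiers (chick_id and shape_group are hypothetical names, not part of the ChickWeight data). The remainders cycle through 1, 2, 3, 4, 5, 0, so any number of individuals is folded into exactly six groups.

```r
# Hypothetical identifiers for 14 individuals
chick_id <- 1:14

# Remainder after division by 6: cycles through 1 2 3 4 5 0
chick_id %% 6

# Turning the remainders into a factor gives six shape groups
shape_group <- factor(chick_id %% 6)
nlevels(shape_group)   # 6
table(shape_group)     # how many individuals share each shape
```

Several chicks therefore share a shape, which is why the colour aesthetic is still mapped to the individual chick.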
Since there are many observations on each chick, we can connect the observations by individual lines. For this, we need to add another layer using function geom_line. Simply adding + geom_line() does not work correctly, since ggplot2 does not know which observations should be connected to each other:
p + geom_line()
Hence, we need to tell function ggplot that each chick is a group. Furthermore, we can specify the type of the lines and the colour of the lines as we did previously for the points:
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight,
                                   colour = shape,group = Chick,linetype = shape))
p + geom_line() + theme(legend.position = 'none')
We might want to summarize the observations in a graphic:
ChickWeight$Diet <- factor(ChickWeight$Diet)
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight))
p + stat_summary(fun.y = 'mean',geom = 'line',aes(group = Diet,colour = Diet)) +
    stat_summary(fun.y = 'sd',geom ='point',aes (group = Diet,colour = Diet))
We might also want to show the range of the data at each point in time (the lines span the 5% to 95% quantiles, the points show the median):
q5 <- function(x)
    quantile(x,0.05)
q95 <- function(x)
    quantile(x,0.95)
p + stat_summary(fun.y = median,fun.ymin = q5,fun.ymax = q95,aes (group = Diet,colour = Diet))
This is not very useful since the lines overlap each other. We have several alternatives. The first is to jitter the points to avoid overlapping:
p + stat_summary(fun.y = median,fun.ymin = q5,fun.ymax = q95,
                 position = 'jitter',aes (group = Diet,colour = Diet))
The second option is to make a distinct plot for each diet, using a technique called faceting. Now, the colours are no longer needed to identify the diets and we can use a single colour:
levels(p$data$Diet) <- paste('Diet',levels(p$data$Diet))
p + stat_summary(fun.y = median,fun.ymin = q5,fun.ymax = q95,colour = 'red',aes (group = Diet)) +
    facet_grid(.~Diet)
To make barplots, we have two alternatives. First, we can provide summarized data, for example the average weight per diet and per point in time:
sw <- with(ChickWeight,tapply(weight,list(Time,Diet),mean,na.rm = TRUE))
head(sw)
##           1     2     3     4
## 0  41.40000  40.7  40.8  41.0
## 2  47.25000  49.4  50.4  51.8
## 4  56.47368  59.8  62.2  64.5
## 6  66.78947  75.4  77.9  83.9
## 8  79.68421  91.7  98.4 105.6
## 10 93.05263 108.5 117.1 126.0
Object sw has a format which is not usable for ggplot2. Therefore, we need to convert the data to long format using function melt from package reshape:
install.packages('reshape')
require(reshape)
swl <- melt(sw)
head(swl)
##   X1 X2    value
## 1  0  1 41.40000
## 2  2  1 47.25000
## 3  4  1 56.47368
## 4  6  1 66.78947
## 5  8  1 79.68421
## 6 10  1 93.05263
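As an aside, the same wide-to-long conversion can be sketched with base R only, which may help to see what the reshaping does. This is an alternative under stated assumptions, not the approach used in this course: the table sw is recomputed here so that the block runs on its own, and swl_base is a hypothetical name.

```r
# Wide table: one row per time point, one column per diet
sw <- with(ChickWeight, tapply(weight, list(Time, Diet), mean, na.rm = TRUE))

# as.table() followed by as.data.frame() yields one row per (Time, Diet) cell
swl_base <- as.data.frame(as.table(sw), stringsAsFactors = FALSE)
colnames(swl_base) <- c("Time", "Diet", "weight")

# The row and column labels come back as text; restore the intended types
swl_base$Time <- as.numeric(swl_base$Time)
swl_base$Diet <- factor(swl_base$Diet)

head(swl_base)
```

Each row of the long table now holds a single mean weight together with the time point and diet it belongs to, which is the layout ggplot2 expects.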
colnames(swl) <- c('Time','Diet','weight')
swl$Diet <- factor(swl$Diet)
bp <- ggplot(data = swl,aes(y = weight,x = Time,fill = Diet))
bp + geom_bar(stat = 'identity')
By default, the bars are stacked upon each other. We can also place them next to each other:
bp + geom_bar(stat = 'identity',position = 'dodge')
But we can also use function stat_summary to calculate the average weight per point in time, as shown previously. Argument position = 'dodge' specifies that the bars are located next to each other; with position = 'stack', they are placed on top of each other:
bp1 <- ggplot(data = ChickWeight,aes(y = weight,x = Time,fill = Diet))
bp1 + stat_summary(fun.y = mean,geom = 'bar',position = 'dodge')
bp1 + stat_summary(fun.y = mean,geom = 'bar',position = 'stack')
Since we use the raw data, it is possible to add error bars showing the range of the data. For this, we use the errorbar geom (function geom_errorbar) through stat_summary:
bp1 + stat_summary(fun.y = mean,geom = 'bar',position = 'dodge') +
    stat_summary(fun.ymax = q95,fun.ymin=q5,geom="errorbar", width=0.25,
                 position = position_dodge(width = 0.9),size = 1.2)
Suppose that we want to draw a line through the data. For this, we use function stat_smooth:
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight))
p + geom_point(aes(colour = Diet)) + stat_smooth(se = FALSE)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
We can also add separate lines for each Diet:
p + geom_point(aes(colour = Diet)) +
    stat_smooth(se = FALSE,colour = 'black',size = 1.2) +
    stat_smooth(aes(colour = Diet),size = 1,se = FALSE)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
In the previous examples, the function used a nonparametric method called LOESS. We can also use linear regression to fit the data:
p + geom_point(aes(colour = Diet)) + stat_smooth(aes(colour = Diet),
                                                 method = 'lm',size = 1.2,se = FALSE)
And we can also specify the equation:
p + geom_point(aes(colour = Diet)) + stat_smooth(aes(colour = Diet),
                                                 method = 'lm',formula = y~x + I(x^2),size = 1.2,se = FALSE)
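To see what the quadratic formula means outside the plot, the sketch below fits the corresponding models directly with lm. It is a simplification: the models here are fitted to all chicks pooled together, whereas the plot fits one curve per Diet, and the names fit_lin and fit_quad are illustrative. Inside a formula the ^ operator has a special meaning, so I() is needed to make Time^2 an ordinary arithmetic square.

```r
# Straight line: weight = a + b * Time
fit_lin <- lm(weight ~ Time, data = ChickWeight)

# Quadratic: weight = a + b * Time + c * Time^2
# I() protects the arithmetic from the formula interpretation of '^'
fit_quad <- lm(weight ~ Time + I(Time^2), data = ChickWeight)

coef(fit_quad)             # intercept, linear and quadratic coefficients
anova(fit_lin, fit_quad)   # does the quadratic term improve the fit?
```

In stat_smooth the formula is written in terms of the aesthetics x and y, while in lm it uses the column names of the data.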