Graphics in R

Albart Coster

Function plot

Function plot is the basic function for plotting:

plot(weight~Time,data = ChickWeight)

Plot of weight against time.

Specify what to display with a formula: the left-hand side (LHS) of the formula gives the variable on the vertical axis and the right-hand side (RHS) the variable on the horizontal axis of the plot (usually):

class(weight~Time)
## [1] "formula"
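A formula is an ordinary R object, so we can inspect which variable ends up on which axis. Below is a minimal base-R sketch; the name f is only for illustration:

```r
# A formula stores its left-hand side (LHS) and right-hand side (RHS) unevaluated
f <- weight ~ Time
length(f)      # the `~` operator, the LHS and the RHS
f[[2]]         # LHS: weight, plotted on the vertical axis
f[[3]]         # RHS: Time, plotted on the horizontal axis
all.vars(f)    # names of all variables used in the formula
```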

Function boxplot

Function boxplot makes box-and-whisker plots:

boxplot(weight~Time,data = ChickWeight)

Boxplot of chick weights.

Function barplot

For a barplot, we need to provide a vector or a matrix of data. Here we use the mean weight per diet:

mat1 <- with(ChickWeight,tapply(weight,Diet,mean))
barplot(mat1,xlab = "Diet",ylab = "Mean weight (g)")   # ChickWeight records weight in grams

Barplot of mean chick weight per diet.
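Function barplot also accepts a matrix. In that case each column becomes a group of bars. The sketch below assumes the original, unmodified ChickWeight data. The object name mat2 and the choice of the first and last time points are only for illustration:

```r
# Mean weight per time point (rows) and diet (columns): a 12 x 4 matrix
mat2 <- with(ChickWeight, tapply(weight, list(Time, Diet), mean))
dim(mat2)
# Keep the first and last time points; with a matrix, barplot draws one
# group of bars per column (diet) and, with beside = TRUE, one bar per row
barplot(mat2[c("0", "21"), ], beside = TRUE, legend.text = TRUE,
        xlab = "Diet", ylab = "Mean weight (g)")
```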

Combining graphics

Suppose that we want to combine the three graphs into one figure. Function par sets graphical parameters. Here mfrow = c(1, 3) splits the figure into 1 row and 3 columns, and the following plots fill these panels row by row:

par(mfrow = c(1,3))
plot(weight~Time,data = ChickWeight)
boxplot(weight~Time,data = ChickWeight)
barplot(mat1,xlab = "Diet",ylab = "Mean weight (g)")

Plots combined with function par().
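Settings made with par stay in effect for the current graphics device, so later plots would also be drawn in the multi-panel layout. A common idiom is to keep the old settings that par returns and restore them afterwards. The sketch below uses a 1-by-2 layout, and op is an illustrative name:

```r
# par() returns the previous values of the parameters it changes,
# so they can be restored after the combined figure is drawn
op <- par(mfrow = c(1, 2))
plot(weight ~ Time, data = ChickWeight)
boxplot(weight ~ Time, data = ChickWeight)
par(op)   # back to one plot per page
```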

Saving graphics

Functions jpeg, png, bmp and pdf open graphics devices that save graphics to a file in the corresponding format:

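dir.create("figure", showWarnings = FALSE)   # pdf() cannot create a missing folder
pdf(file = "figure/Cboxplot.pdf",width = 20,height = 10)   # width and height in inches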
boxplot(weight~Time,data = ChickWeight,col = "blue")
graphics.off()
file.exists('figure/Cboxplot.pdf')
## [1] TRUE

Note the call graphics.off(). It closes all open graphics devices, which finishes writing the file. Function dev.off() closes only the current device.
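Forgetting to close the device leaves an incomplete file, so it can help to wrap the open/draw/close steps in a function. The sketch below uses a hypothetical helper save_pdf, which is not part of base R. It uses on.exit so that dev.off() runs even if the drawing code fails, and it writes to a temporary file:

```r
# Hypothetical helper: open a PDF device, draw, and always close that device
save_pdf <- function(file, draw, width = 7, height = 5) {
  pdf(file = file, width = width, height = height)
  on.exit(dev.off())   # closes only this device, even if draw() fails
  draw()
  invisible(file)
}

out <- tempfile(fileext = ".pdf")   # temporary file, so no "figure" folder is needed
save_pdf(out, function() boxplot(weight ~ Time, data = ChickWeight, col = "blue"))
file.exists(out)
```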

Customizing a graphic:

Suppose that we want a distinct colour, plotting symbol and regression line for each diet, plus a legend:

## convert Diet to numeric for convenience
## (this modifies the copy of ChickWeight in the workspace)
ChickWeight$Diet <- as.numeric(ChickWeight$Diet)
nDiets <- max(ChickWeight$Diet)
colors <- rainbow(nDiets)       # one colour per diet
lineTypes <- 1:nDiets           # one line type per diet
plotChar <- 18+(1:nDiets)       # plotting symbols 19 to 22
## empty plot (type = "n") that only sets up the axes for all data
plot(weight~Time,type = "n",data = ChickWeight)

for(d in 1:nDiets)
{
  CW <- ChickWeight[ChickWeight$Diet == d,]   # observations on diet d
  points(weight~Time,data=CW,col = colors[d],pch=plotChar[d])
  ## fitted linear regression line for diet d
  abline(lm(weight~Time,data = CW),col=colors[d],lty=lineTypes[d],lwd=1.5)
}

legend(x = 0.05*max(ChickWeight$Time),
       y = 0.85*max(ChickWeight$weight),
       col = colors,pch = plotChar,lty = lineTypes,
       legend = paste("Diet",1:nDiets),   # labels in the same order as the colours
       text.col = colors)

Plot with extras.
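Each line added with abline is the fitted linear regression of weight on Time for one diet. The sketch below lists the corresponding slopes, which are the average daily weight gain in grams, using split, lm and coef. The name slopes is chosen for illustration:

```r
# Fit weight ~ Time separately for each diet, as in the loop above,
# and collect the slope (average weight gain per day, in grams)
slopes <- sapply(split(ChickWeight, ChickWeight$Diet),
                 function(CW) coef(lm(weight ~ Time, data = CW))[["Time"]])
round(slopes, 2)
```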

Packages such as lattice and ggplot2 are designed for plotting multivariate data. In this course, we will only use package ggplot2.

Time for exercises

The ggplot2 package

The ggplot2 package is based on the grammar of graphics. It is worth learning because it has become very popular among R users and because it can create a wide variety of graphics.

Users who wish to create a graphic using the ggplot2 package need to think about the following aspects:

Creating a plot

We use function ggplot to create a plot object. Its two main arguments are data and the aesthetic mapping. They set the defaults of the plot object, and both can also be specified in each layer. The aesthetic mapping links variables in the data to visual properties such as the x and y position, and it is specified with function aes.

library(ggplot2)
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight))

Printing p at this stage shows no data, since the plot has no layers yet. Depending on the ggplot2 version, you get an empty panel or an error.

Adding layers to the plot

To display something, we need to add a layer to the plot. Layers can be added with function layer, which has the following arguments:

layer(geom, geom_params, stat, stat_params, data, mapping, position)
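In recent ggplot2 versions (2.0.0 and later, which is an assumption about your installation), function layer no longer has separate geom_params and stat_params arguments. Geom and stat parameters go into a single params list instead. The sketch below works under that assumption and compares an explicit layer with the shortcut geom_point. The names p_layer and p_geom are illustrative:

```r
library(ggplot2)
# Assumes ggplot2 >= 2.0.0, where layer() takes a single `params` list
p <- ggplot(data = ChickWeight, aes(x = Time, y = weight))
p_layer <- p + layer(geom = "point", stat = "identity",
                     position = "identity", params = list(na.rm = FALSE))
p_geom  <- p + geom_point()   # shortcut that adds a point layer
p_layer
```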

Or we can use specific functions, such as geom_point and stat_summary, to add specific layers.

We can also compute summaries of the data first and subsequently add them as a layer to the plot.

Adding points

Now, we add points representing the individual weights to the plot:

p + geom_point()

Scatterplot with ggplot.

The + operator returns a new ggplot object. If this object is not assigned, R prints it automatically at the console. We can also store the result as a new object, but then we need to call that object explicitly to print it to the screen:

p1 <- p + geom_point()
p1

Another scatterplot with ggplot.
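Because + returns a new object, the original plot object is not modified. The sketch below counts the layers in both objects. Explicit printing with print is also needed inside a function or a for loop, where objects are not printed automatically:

```r
library(ggplot2)
p  <- ggplot(data = ChickWeight, aes(x = Time, y = weight))
p1 <- p + geom_point()   # `+` returns a new object; p itself is unchanged
length(p$layers)         # no layers yet
length(p1$layers)        # one point layer
print(p1)                # explicit printing, e.g. inside a function or loop
```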

Differentiating between chicks:

We can differentiate between the chicks by using a distinct colour for each chick:

p + geom_point(aes(colour = Chick))

Scatterplot with distinct colour for each chick.

Since the number of chicks is large, the legend is not useful. We use function theme to remove it:

p + geom_point(aes(colour = Chick)) + theme(legend.position = 'none')

Scatterplot with distinct colour for each chick without legend.

We can also use a distinct shape for each chick:

p + geom_point(aes(shape = Chick,colour = Chick)) + theme(legend.position = 'none')
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have
## 50. Consider specifying shapes manually if you must have them.
## Warning: Removed 525 rows containing missing values (geom_point).

Scatterplot with distinct colour and shape for each chick

Since we have more than 6 chicks, we should specify the shapes manually. First, let's have a look at the six shapes and colours that ggplot2 uses by default for a discrete variable:

## six groups with one point each, to show the default shapes and colours
df <- data.frame(x = 1:6,y = 1,group = as.character(1:6))
ggplot(data = df,aes(x = x,y = y,colour = group,shape = group)) + geom_point(size = 4)

The six default point shapes and colours.

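## reuse the six shapes: the level number of each chick modulo 6 gives 0 to 5
ChickWeight$shape <- factor(as.numeric(ChickWeight$Chick)%%6)
## replace the data stored in plot object p by the updated data frame
p$data <- ChickWeight
p + geom_point(aes(shape = shape,colour = Chick)) + theme(legend.position = 'none')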

Scatterplot with 6 distinct colours and shapes for chicks

I used the modulus operator %% of R. The modulo operation finds the remainder of the division of one number by another (see Wikipedia). Here it maps the chicks onto the six values 0 to 5, so each shape is reused for several chicks.
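Two details of the shape trick can be checked directly: what %% returns, and what as.numeric does to a factor. For a factor, as.numeric returns the position of each value's level, not its label. Chick is an ordered factor whose levels are not in numeric order. A small base-R sketch:

```r
# %% returns the remainder after integer division
c(1, 5, 6, 7, 12, 13) %% 6

# as.numeric() on a factor returns level positions, not the labels
is.ordered(ChickWeight$Chick)
head(levels(ChickWeight$Chick))
# the 50 level positions reduce to six remainders, one per shape
sort(unique(as.numeric(ChickWeight$Chick) %% 6))
```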

Connecting the chicks by a line

Since there are many observations on each chick, we can connect them by individual lines. For this, we add another layer with function geom_line. Simply adding + geom_line() does not work correctly, since ggplot2 does not know which observations should be connected to each other:

p + geom_line()

Plot with lines.

Hence, we need to tell function ggplot that each chick is a group. Furthermore, we can specify the type of the lines and the colour of the lines as we did previously for the points:

p <- ggplot(data = ChickWeight,aes(x = Time,y = weight,
     colour = shape,group = Chick,linetype = shape))
p + geom_line() + theme(legend.position = 'none')

Plot with better lines.

Summarizing data

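We might want to summarize the observations in a graphic using function stat_summary. For example, we can show the mean weight per diet as a line and the standard deviation as points: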

ChickWeight$Diet <- factor(ChickWeight$Diet)
p <- ggplot(data = ChickWeight,aes(x = Time,y = weight))
p + stat_summary(fun = mean,geom = 'line',aes(group = Diet,colour = Diet)) + 
    stat_summary(fun = sd,geom = 'point',aes(group = Diet,colour = Diet))

Plot summarizing data with function stat_summary().

We might also want to show the spread of the data at each time point, with vertical lines spanning the 5% to 95% quantiles and points showing the median:

## 5% and 95% quantiles, used as the lower and upper ends of the lines
q5 <- function(x)
    quantile(x,0.05)
q95 <- function(x)
    quantile(x,0.95)

p + stat_summary(fun = median,fun.min = q5,fun.max = q95,aes(group = Diet,colour = Diet))

Custom summary of the data with function stat_summary().
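To see the numbers that stat_summary draws, we can compute the same 5% quantile, median and 95% quantile per diet and time point with aggregate. Below is a base-R sketch. The name summ and the column names low, mid and high are chosen for illustration:

```r
# 5% quantile, median and 95% quantile of weight per diet and time point
q5  <- function(x) quantile(x, 0.05)
q95 <- function(x) quantile(x, 0.95)
summ <- aggregate(weight ~ Diet + Time, data = ChickWeight,
                  FUN = function(x) c(low  = unname(q5(x)),
                                      mid  = median(x),
                                      high = unname(q95(x))))
head(summ)   # summ$weight is a matrix with columns low, mid and high
```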

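This is not very useful, since the summaries for the different diets overlap each other. We have several alternatives. The first is to jitter the summaries horizontally to avoid overlap. Setting height = 0 keeps their values on the vertical axis unchanged:

p + stat_summary(fun = median,fun.min = q5,fun.max = q95,
    position = position_jitter(width = 0.5,height = 0),aes(group = Diet,colour = Diet))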

Summary with jittered points.

The second option is to make a separate panel for each diet, a technique called faceting. Now, the colours are no longer needed to identify the diets and we can use a single colour:

levels(p$data$Diet) <- paste('Diet',levels(p$data$Diet))
p + stat_summary(fun = median,fun.min = q5,fun.max = q95,colour = 'red',aes(group = Diet)) + 
    facet_grid(.~Diet)

Plot with facets.

Barplots

To make barplots, we have two alternatives. First, we can provide summarized data, for example the average weight per diet and per point in time:

sw <- with(ChickWeight,tapply(weight,list(Time,Diet),mean,na.rm = TRUE))
head(sw)
##           1     2     3     4
## 0  41.40000  40.7  40.8  41.0
## 2  47.25000  49.4  50.4  51.8
## 4  56.47368  59.8  62.2  64.5
## 6  66.78947  75.4  77.9  83.9
## 8  79.68421  91.7  98.4 105.6
## 10 93.05263 108.5 117.1 126.0

Object sw is a matrix with time points in rows and diets in columns, a format that ggplot2 cannot use directly. ggplot2 expects data in long format, with one row per observation, so we convert sw with function melt from package reshape:

install.packages('reshape')   # only needed once
library(reshape)
swl <- melt(sw)
head(swl)
##   X1 X2    value
## 1  0  1 41.40000
## 2  2  1 47.25000
## 3  4  1 56.47368
## 4  6  1 66.78947
## 5  8  1 79.68421
## 6 10  1 93.05263
colnames(swl) <- c('Time','Diet','weight')   # meaningful column names
swl$Diet <- factor(swl$Diet)                 # discrete fill colours per diet
bp <- ggplot(data = swl,aes(y = weight,x = Time,fill = Diet))
bp + geom_bar(stat = 'identity')             # bar heights are the values themselves

Barplot
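If package reshape is not available, base R can produce the same long format. Function as.table turns the matrix into a table object, and as.data.frame turns that into one row per cell. A sketch, where swl2 is an illustrative name:

```r
# Mean weight per time point (rows) and diet (columns), as above
sw <- with(ChickWeight, tapply(weight, list(Time, Diet), mean, na.rm = TRUE))
# as.table() + as.data.frame() gives one row per cell: Var1, Var2, Freq
swl2 <- as.data.frame(as.table(sw))
colnames(swl2) <- c("Time", "Diet", "weight")
# Var1 comes back as a factor; convert the time labels back to numbers
swl2$Time <- as.numeric(as.character(swl2$Time))
head(swl2)
```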

By default, the bars are stacked on top of each other. We can also place them next to each other:

bp + geom_bar(stat = 'identity',position = 'dodge')

Dodged barplot.

But we can also use function stat_summary to calculate the average weight per point in time from the raw data, as shown previously. Argument position = 'dodge' places the bars next to each other; with position = 'stack', they are placed on top of each other:

bp1 <- ggplot(data = ChickWeight,aes(y = weight,x = Time,fill = Diet))
bp1 + stat_summary(fun = mean,geom = 'bar',position = 'dodge')

Barplots with stat_summary().

bp1 + stat_summary(fun = mean,geom = 'bar',position = 'stack')

Barplots with stat_summary().

Since we use the raw data, we can also add bars showing the range of the data. For this, we add a second stat_summary layer with geom = 'errorbar', spanning the 5% to 95% quantiles computed by functions q5 and q95 defined above:

bp1 + stat_summary(fun = mean,geom = 'bar',position = 'dodge') + 
   stat_summary(fun.max = q95,fun.min = q5,geom = "errorbar",width = 0.25,
   position = position_dodge(width = 0.9),size = 1.2)

Barplot showing the range of the data.

Trendlines

Suppose that we want to draw a smoothed trend line through the data. For this, we use function stat_smooth:

p <- ggplot(data = ChickWeight,aes(x = Time,y = weight)) 
p + geom_point(aes(colour = Diet)) + stat_smooth(se = FALSE)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Scatterplot with trendlines

We can also add a separate trend line for each diet, on top of the overall trend line (in black):

p + geom_point(aes(colour = Diet)) +
  stat_smooth(se = FALSE,colour = 'black',size = 1.2) + 
  stat_smooth(aes(colour = Diet),size = 1,se = FALSE)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Trendline for each diet.
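The messages above show that stat_smooth chose LOESS, a local regression smoother. As a rough check, base R's loess gives a comparable overall trend. The sketch assumes the default span of 0.75, which both stat_smooth and loess use, and evaluates the curve on a grid of days. The names grid and trend are illustrative:

```r
# Local regression of weight on Time with the default span of 0.75
fit <- loess(weight ~ Time, data = ChickWeight, span = 0.75)
grid <- data.frame(Time = 0:21)            # one value per day
trend <- predict(fit, newdata = grid)      # smoothed mean weight per day
round(head(trend), 1)
```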

In the previous examples, stat_smooth used a nonparametric local regression method called LOESS, as the messages above report. We can also fit a linear regression with method = 'lm':

p + geom_point(aes(colour = Diet)) + stat_smooth(aes(colour = Diet),
    method = 'lm',size = 1.2,se = FALSE)

Linear trendline.

We can also specify the model formula, here a quadratic function of time:

p + geom_point(aes(colour = Diet)) + stat_smooth(aes(colour = Diet),
    method = 'lm',formula = y~x + I(x^2),size = 1.2,se = FALSE)

Quadratic trendline.
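Within stat_smooth, x and y refer to the aesthetics, here Time and weight, so the quadratic trend line for one diet corresponds to the model below. The sketch covers diet 1 only. It also checks that the orthogonal polynomial poly(Time, 2) gives the same fitted values as the raw terms:

```r
# formula = y ~ x + I(x^2) in stat_smooth() corresponds to this per-diet model
CW1 <- subset(ChickWeight, Diet == 1)
m_raw  <- lm(weight ~ Time + I(Time^2), data = CW1)
m_poly <- lm(weight ~ poly(Time, 2), data = CW1)   # orthogonal polynomial, same fit
coef(m_raw)
all.equal(unname(fitted(m_raw)), unname(fitted(m_poly)))
```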

Time for exercises