Introduction

Programming: writing commands in a language understandable for a computer. An example:

text <- "Today is"
today <- date()
print(text)

## [1] "Today is"

print(today)

## [1] "Tue May 03 10:19:46 2016"

(Today <- paste(text,today))

## [1] "Today is Tue May 03 10:19:46 2016"

Make use of effort of others: e.g. using the paste function.

Important components of programming in R

Objects
Conditional statements
Loops
Indexing
Output

Objects

Everything in R is an object. An object is defined by its class:

a <- 1
class(a)

## [1] "numeric"

b <- "R course"
class(b)

## [1] "character"

paste

## function (..., sep = " ", collapse = NULL) 
## .Internal(paste(list(...), sep, collapse))
## <bytecode: 0x000000000f1aad90>
## <environment: namespace:base>

class(paste)

## [1] "function"

Functions

A function:

functionname(argument1 = a1, argument2 = a2, ..., argumentn = an)

Write name of the function without parentheses: see its definition
Run a function: provide name, parentheses and arguments. Function generally returns some result

Declare and remove objects

The assignment operator <- assigns objects
Function rm to remove objects;
Or overwrite them with a new object

a <- 1
a <- 2
print(a)

## [1] 2

rm(a)
exists('a')

## [1] FALSE

a <- "object a"
b <- a
b == a

## [1] TRUE

Note: the ‘==’ operator. Returns TRUE or FALSE

Conditional operators

Suppose that we want to know if a certain condition is true. We use conditional operators:

a <- 1; b <- 2
a < b

## [1] TRUE

a <= b

## [1] TRUE

a > b

## [1] FALSE

a >= b

## [1] FALSE

a == b

## [1] FALSE

a != b

## [1] TRUE

Conditional operations

Suppose that we want to do something depending on a certain outcome:

{
    if(a < b)
        print(" a is smaller than b")
    else if (a > b)
        print(" a is bigger than b")
    else
        print("a and b are equal")
}

## [1] " a is smaller than b"

Logical OR and AND operators:

{
  if(a == 1 & b == 10)
    print("A is equal to 1 AND B is equal to 10")
  else if(a == 1 | b == 10)
    print("A is equal to 1 OR B is equal to 10")
  else
    print("A is not equal to 1 NOR B equal to 10")
}

## [1] "A is equal to 1 OR B is equal to 10"

Vectors

numeric vector is a series of numbers
character vector is a series of characters
factor vector a series of factors

Advantage of factor over character is related to storage.

Basic functions to make vectors

Function c to concatenate, required to join numbers or characters into a vector:

(v1 <- c(1,2))

## [1] 1 2

(v2 <- c("a","b","cc"))

## [1] "a"  "b"  "cc"

(v3 <- c(v1,v1))

## [1] 1 2 1 2

(v4 <- c(v1,v2))

## [1] "1"  "2"  "a"  "b"  "cc"

(v5 <- c(first = 1,second = 2,last = 1000))

##  first second   last 
##      1      2   1000

Function seq to make a sequence:

(seq1 <- seq(from = 0,to = 10,by = 1))

##  [1]  0  1  2  3  4  5  6  7  8  9 10

(seq2 <- seq(from = 1,to = 2,length.out = 3))

## [1] 1.0 1.5 2.0

Indexing a vector

The ‘[’ operator to access specific elements of a vector:

v1 <- seq(0,10)
v1[1]

## [1] 0

v1[10]

## [1] 9

ii <- seq(1,3)
v1[ii]

## [1] 0 1 2

Note: the last option; access various indices 1,2,3 directly. This is called vectorized computing.

Indexing by name

Give name to elements of a vector and use the names to access elements:

v1

##  [1]  0  1  2  3  4  5  6  7  8  9 10

names(v1) <- letters[1:length(v1)]
v1

##  a  b  c  d  e  f  g  h  i  j  k 
##  0  1  2  3  4  5  6  7  8  9 10

v1["b"]

## b 
## 1

v5["first"]

## first 
##     1

Note: the ':' operator is similar to the seq function. For for help: ?':'.

1:3

## [1] 1 2 3

Character vectors

Character vectors are vectors of character strings:

letters

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

class(letters)

## [1] "character"

(action <- c("programming","in","R"))

## [1] "programming" "in"          "R"

length(action)

## [1] 3

action[1]

## [1] "programming"

action[1:3]

## [1] "programming" "in"          "R"

Function `paste`

Function paste to combine characters into one character string:

(p1 <- paste(1,letters[1:10],sep = "_"))

##  [1] "1_a" "1_b" "1_c" "1_d" "1_e" "1_f" "1_g" "1_h" "1_i" "1_j"

(p2 <- paste(1:10,letters[1:10],sep = "."))

##  [1] "1.a"  "2.b"  "3.c"  "4.d"  "5.e"  "6.f"  "7.g"  "8.h"  "9.i"  "10.j"

(p3 <- paste(1:3,letters[1:10],sep = "."))

##  [1] "1.a" "2.b" "3.c" "1.d" "2.e" "3.f" "1.g" "2.h" "3.i" "1.j"

Note p3: first argument is recycled.

To concatenate result into one string:

(p4 <- paste(p3,collapse = "/"))

## [1] "1.a/2.b/3.c/1.d/2.e/3.f/1.g/2.h/3.i/1.j"

Function `strsplit`

Suppose that we have a string and want to split it at certain positions:

(p5 <- strsplit(p4,split = "/"))

## [[1]]
##  [1] "1.a" "2.b" "3.c" "1.d" "2.e" "3.f" "1.g" "2.h" "3.i" "1.j"

p6 <- strsplit(unlist(p5),split = "\\.")
p6[1:2]

## [[1]]
## [1] "1" "a"
## 
## [[2]]
## [1] "2" "b"

(todaysplitted <- unlist(strsplit(today,split = " ")))

## [1] "Tue"      "May"      "03"       "10:19:46" "2016"

Note: strsplit returns a list. Use unlist to transform the list to a character object.

Functions `substr` and `gsub`

Suppose that we want a section of the string:

string <- "this is a string"
(sstring <- substr(x = string,start = 1,stop = 4))

## [1] "this"

Suppose that we want to substitute a part of a string:

(nstring <- gsub(pattern = "is",replacement = "was",
                 x = string))

## [1] "thwas was a string"

(nnstring <- gsub(pattern = "[[:space:]]is[[:space:]]"," was ",string))

## [1] "this was a string"

This also introduces the regular expressions. Have a look at ?regexp or google for this topic.

Operator `'%in%'`

Suppose that we want to know if elements of one vector are present in another vector: operator '%in%':

v1 <- letters[1:10]
v2 <- letters[5:15]
v1%in%v2

##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

It does not tells us where each element of v1 appears in v2.

Function `match`

Suppose that we want to know where each element of v1 appears in v2: function match:

v1

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

v2

##  [1] "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"

match(v1,v2)

##  [1] NA NA NA NA  1  2  3  4  5  6

match(v1,c(v2,v2))

##  [1] NA NA NA NA  1  2  3  4  5  6

Note: match only returns the first match.

Function `which`

Function which to obtain the index where a condition is TRUE:

which(v1 == "a")

## [1] 1

vv <- c(v1,v1,v1)
which(vv == "a")

## [1]  1 11 21

Now, we use which to substitute the elements of vv which are a with A:

vv[which(vv=="a")] <- "A"

Extending the `match` function:

Suppose that we want all the matches. We need to program:

v22 <- c(v2,v2)
indices <- lapply(v1,function(i)
   which(v22%in%i))

A more elegant way: write a new function called matchall, first check if it is not already defined (help.search('matchall'):

matchall <- function(x,y){
  lapply(x,function(i){
    which(y%in%i)})
}
identical(indices,matchall(v1,v22))

## [1] TRUE

This introduces how to program functions.

Time for exercises

Matrices

Function matrix to create a matrix:

(mat1 <- matrix(1:4,nrow = 2,ncol = 2,byrow = T))

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

Access the elements of the matrix with '[' operator:

mat1[1,1]

## [1] 1

mat1[1,]

## [1] 1 2

Matrices have two dimensions: the ',' separates the dimensions.

Replace elements of the matrix with the [ <- operator:

mat1[,2] <- -10
mat1

##      [,1] [,2]
## [1,]    1  -10
## [2,]    3  -10

Combining matrices

Function rbind to combine two matrices under each other (make sure that the number of columns coincides):

mat2 <- matrix(0,nrow = 2,ncol = ncol(mat1))
mat3 <- rbind(mat1,mat2)
dim(mat3)

## [1] 4 2

Function cbind to combine matrices by column (make sure that the number of rows coincides):

mat4 <- cbind(mat1,mat2)
dim(mat4)

## [1] 2 4

Other functions for matrices

Transpose of a matrix: function t
The inverse of a (square, invertable) matrix: function solve
The matrix product: '%*%' operator
Obtain diagonal elements of (square) matrix: function diag
Replace diagonal elements of a (square) matrix: function diag <-

Application of matrix functions

Suppose that we have observations x and dependent observations y and we want to know the linear relationship between x and y:

\(\mathbf{y} = \mathbf{x}b,\)

where b is an unknown regression coefficient. The best, least squares, solution is:

\(\hat{b} = \mathbf{(x'x)^{-1}x'y}.\)

Now: program it in R:

(x <- rep(1:3,length.out = 10))

##  [1] 1 2 3 1 2 3 1 2 3 1

b <- 0.5
(y <- b*x + rnorm(10))

##  [1]  1.33391213  2.53923443  3.19224503  0.02218696  1.14560315
##  [6]  1.21403112 -0.79375053 -0.34268710  0.42364929  1.59210687

(bhat <- solve(t(x)%*%x)%*%t(x)%*%y)

##          [,1]
## [1,] 0.542524

Compare the original b to the estimate of b, bhat.

Note: more efficient way to solve linear models with function lm; see slides on regression.

Objects of class `data.frame`

The columns of a matrix are of one single class. A data.frame is generalized matrix where each column can be of a different class.

Create an object of class data.frame:

animals <- data.frame(id = letters[1:10],
                      weight = rnorm(mean = 100,10),
                      sex = sample(c("F","M"),10,replace = T))
head(animals)

##   id    weight sex
## 1  a 100.11286   M
## 2  b 100.77140   F
## 3  c 100.61593   M
## 4  d  99.80709   M
## 5  e  99.16148   M
## 6  f 100.62874   F

Manipulating data.frames

We want to extract the weights from animals into a vector weigths:

weights <- animals$weight

We want the subset of the the where weight is more than 100 in a new data.frame animals100p:

animals100p <- animals[animals$weight>100,]

We we want to substitute weight when sex == 'M' by NA:

animals$weight[animals$sex == 'M'] <- NA

We want to add a column to animals with the squared weight:

animals$weight2 <- animals$weight^2

Reading and writing data

Function write.csv write a data.frame as a *.csv file:

write.csv(animals,file = 'animals.csv',row.names= FALSE,quote = FALSE)

Function read.csv to read a csv file into a data.frame:

animals1 <- read.csv(file = 'animals.csv',header = TRUE)

Note: we made a file called animals.csv somewhere on the HD and then read this file back in the memory. Test if a file exists:

if(!file.exists('nonexistentfile.csv'))
    print("File does not exists.")

## [1] "File does not exists."

Time for exercises

Repeating

Suppose that we want to repeat an operation many times:

Sum up random numbers until the sum reaches 10:

s <- 0
n <- 0
while(s<10){
  n <- n + 1
  s <- s + runif(1,0,1)
}
print(s)

## [1] 10.36739

print(n)

## [1] 18

We needed 18 steps to get a sum higher than 10.

Loops: `for`

Suppose that we want to follow a certain sequence of values and do something at each value, looping:

vec1 <- rnorm(10)
vec2 <- numeric(10)
for(i in 1:length(vec1))
{
    vec2[i] <- abs(vec1[i])
}

Note: try to avoid loops. Make code slow and ugly. Much better and faster solution:

vec2 <- abs(vec1)

Loops: `while`

Suppose that we want to do something until a certain criterion is met:

i <- 0
while(i<10)
    i <- i + 1

Be careful: do not make endless loops:

i <- 1
while(i>0)
    i <- i + 1

Function `apply`

Suppose that we have a matrix and want to calculate the sums of each column (columns are the \(2^{\text{nd}}\) dimension of a matrix).

Ugly method:

mat <- matrix(rnorm(20),nrow = 2,ncol = 10)
uglyCS <- numeric(ncol(mat))
for(col in 1:ncol(mat)){
    uglyCS[col] <- sum(mat[,col])
}

Elegant method:

elegantCS <- apply(mat,2,sum)
identical(uglyCS,elegantCS)

## [1] TRUE

Function apply applies a function to a matrix. Here the function sum is applied to each column of mat (\(2^{nd}\) dimension), second argument of apply is 2. Try to calculate the sums of the rows of mat.

Intermezzo: the `ChickWeight` dataset

The ChickWeight dataset is from an experiment of the effect of diet on the weight of chicken. The variables are:

weight: the weight of the chicken expressed in grams;
Time: time since hatching when the measurement was done;
Chick: identifies each chick;
Diet: identifies the diet of each chick.

summary(ChickWeight)

##      weight           Time           Chick     Diet    shape  
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220   0: 96  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120   1: 98  
##  Median :103.0   Median :10.00   20     : 12   3:120   2:102  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118   3: 92  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12           4: 96  
##  Max.   :373.0   Max.   :21.00   19     : 12           5: 94  
##                                  (Other):506

Function `tapply`

Suppose that we want to obtain mean weight per Diet and per Time period from ChickWeight. We need:

the number of unique timepoints;
the number of unique diets;
the sum of weight for each DietxTime combination.

Ugly method:

mns <- list()
for(t in unique(ChickWeight$Time)){
    for(d in unique(ChickWeight$Diet)){
        ii <- with(ChickWeight,which(Diet%in%d&Time%in%t))
        mns[[paste(t,d)]] <-
            mean(ChickWeight$weight[ii])
    }}

Elegant method:

tt <- tapply(ChickWeight$weight,list(ChickWeight$Diet,
                                     ChickWeight$Time),mean)
tt[,1:4]

##      0     2        4        6
## 1 41.4 47.25 56.47368 66.78947
## 2 40.7 49.40 59.80000 75.40000
## 3 40.8 50.40 62.20000 77.90000
## 4 41.0 51.80 64.50000 83.90000

Here we applied the function mean to Diet at each unique time. Again, with less writing using function with:

tt <- with(ChickWeight,tapply(weight,list(Diet,Time),mean))
tt[,1:4]

##      0     2        4        6
## 1 41.4 47.25 56.47368 66.78947
## 2 40.7 49.40 59.80000 75.40000
## 3 40.8 50.40 62.20000 77.90000
## 4 41.0 51.80 64.50000 83.90000

Programming functions

Tools in R are functions. Defining a function is not difficult.

Example: a function to summarize data into a mean and a standard deviation, meansd:

meansd <- function (x, nDec = 2,na.rm = TRUE)
{
  mn <- mean(x, na.rm = na.rm)
  if (!is.na(mn))
    mn <- round(mn, nDec)
  sd <- sd(x, na.rm = na.rm)
  if (!is.na(sd))
    sd <- round(sd, nDec)
  return(paste(mn, " (", sd, ")", sep = ""))
}
meansd(1:10)

## [1] "5.5 (3.03)"

Application of `meansd`

Suppose that we want to summarize weight for each combination of diet and time into mean and standard deviation:

ttFancy <- tapply(ChickWeight$weight,list(ChickWeight$Diet,ChickWeight$Time),
                  meansd,1)
                  ttFancy[,10:12]

##   18             20             21            
## 1 "158.9 (49.2)" "170.4 (55.4)" "177.8 (58.7)"
## 2 "187.7 (63.3)" "205.6 (70.3)" "214.7 (78.1)"
## 3 "233.1 (57.6)" "258.9 (65.2)" "270.3 (71.6)"
## 4 "202.9 (33.6)" "233.9 (37.6)" "238.6 (43.3)"

Programming in R

Introduction

Important components of programming in R

Objects

Functions

Declare and remove objects

Conditional operators

Conditional operations

Vectors

Basic functions to make vectors

Indexing a vector

Indexing by name

Character vectors

Function paste

Function strsplit

Functions substr and gsub

Operator '%in%'

Function match

Function which

Extending the match function:

Time for exercises

Matrices

Combining matrices

Other functions for matrices

Application of matrix functions

Objects of class data.frame

Manipulating data.frames

Reading and writing data

Time for exercises

Repeating

Loops: for

Loops: while

Function apply

Intermezzo: the ChickWeight dataset

Function tapply

Programming functions

Application of meansd

Time for exercises

Function `paste`

Function `strsplit`

Functions `substr` and `gsub`

Operator `'%in%'`

Function `match`

Function `which`

Extending the `match` function:

Objects of class `data.frame`

Loops: `for`

Loops: `while`

Function `apply`

Intermezzo: the `ChickWeight` dataset

Function `tapply`

Application of `meansd`