Albart Coster
Programming: writing commands in a language understandable for a computer. An example:
text <- "Today is"
today <- date()
print(text)## [1] "Today is"
print(today)## [1] "Tue May 03 10:19:46 2016"
(Today <- paste(text,today))## [1] "Today is Tue May 03 10:19:46 2016"
Make use of effort of others: e.g. using the paste function.
Everything in R is an object. An object is defined by its class:
a <- 1
class(a)## [1] "numeric"
b <- "R course"
class(b)## [1] "character"
paste## function (..., sep = " ", collapse = NULL)
## .Internal(paste(list(...), sep, collapse))
## <bytecode: 0x000000000f1aad90>
## <environment: namespace:base>
class(paste)## [1] "function"
A function:
functionname(argument1 = a1, argument2 = a2, ..., argumentn = an)
<- assigns objectsrm to remove objects;a <- 1
a <- 2
print(a)## [1] 2
rm(a)
exists('a')## [1] FALSE
a <- "object a"
b <- a
b == a## [1] TRUE
Note: the ‘==’ operator. Returns TRUE or FALSE
Suppose that we want to know if a certain condition is true. We use conditional operators:
a <- 1; b <- 2
a < b## [1] TRUE
a <= b## [1] TRUE
a > b## [1] FALSE
a >= b## [1] FALSE
a == b## [1] FALSE
a != b## [1] TRUE
Suppose that we want to do something depending on a certain outcome:
{
if(a < b)
print(" a is smaller than b")
else if (a > b)
print(" a is bigger than b")
else
print("a and b are equal")
}## [1] " a is smaller than b"
Logical OR and AND operators:
{
if(a == 1 & b == 10)
print("A is equal to 1 AND B is equal to 10")
else if(a == 1 | b == 10)
print("A is equal to 1 OR B is equal to 10")
else
print("A is not equal to 1 NOR B equal to 10")
}## [1] "A is equal to 1 OR B is equal to 10"
numeric vector is a series of numberscharacter vector is a series of charactersfactor vector a series of factorsAdvantage of factor over character is related to storage.
Function c to concatenate, required to join numbers or characters into a vector:
(v1 <- c(1,2))## [1] 1 2
(v2 <- c("a","b","cc"))## [1] "a" "b" "cc"
(v3 <- c(v1,v1))## [1] 1 2 1 2
(v4 <- c(v1,v2))## [1] "1" "2" "a" "b" "cc"
(v5 <- c(first = 1,second = 2,last = 1000))## first second last
## 1 2 1000
Function seq to make a sequence:
(seq1 <- seq(from = 0,to = 10,by = 1))## [1] 0 1 2 3 4 5 6 7 8 9 10
(seq2 <- seq(from = 1,to = 2,length.out = 3))## [1] 1.0 1.5 2.0
The ‘[’ operator to access specific elements of a vector:
v1 <- seq(0,10)
v1[1]## [1] 0
v1[10]## [1] 9
ii <- seq(1,3)
v1[ii]## [1] 0 1 2
Note: the last option; access various indices 1,2,3 directly. This is called vectorized computing.
Give name to elements of a vector and use the names to access elements:
v1## [1] 0 1 2 3 4 5 6 7 8 9 10
names(v1) <- letters[1:length(v1)]
v1## a b c d e f g h i j k
## 0 1 2 3 4 5 6 7 8 9 10
v1["b"]## b
## 1
v5["first"]## first
## 1
Note: the ':' operator is similar to the seq function. For for help: ?':'.
1:3## [1] 1 2 3
Character vectors are vectors of character strings:
letters## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
class(letters)## [1] "character"
(action <- c("programming","in","R"))## [1] "programming" "in" "R"
length(action)## [1] 3
action[1]## [1] "programming"
action[1:3]## [1] "programming" "in" "R"
pasteFunction paste to combine characters into one character string:
(p1 <- paste(1,letters[1:10],sep = "_"))## [1] "1_a" "1_b" "1_c" "1_d" "1_e" "1_f" "1_g" "1_h" "1_i" "1_j"
(p2 <- paste(1:10,letters[1:10],sep = "."))## [1] "1.a" "2.b" "3.c" "4.d" "5.e" "6.f" "7.g" "8.h" "9.i" "10.j"
(p3 <- paste(1:3,letters[1:10],sep = "."))## [1] "1.a" "2.b" "3.c" "1.d" "2.e" "3.f" "1.g" "2.h" "3.i" "1.j"
Note p3: first argument is recycled.
To concatenate result into one string:
(p4 <- paste(p3,collapse = "/"))## [1] "1.a/2.b/3.c/1.d/2.e/3.f/1.g/2.h/3.i/1.j"
strsplitSuppose that we have a string and want to split it at certain positions:
(p5 <- strsplit(p4,split = "/"))## [[1]]
## [1] "1.a" "2.b" "3.c" "1.d" "2.e" "3.f" "1.g" "2.h" "3.i" "1.j"
p6 <- strsplit(unlist(p5),split = "\\.")
p6[1:2]## [[1]]
## [1] "1" "a"
##
## [[2]]
## [1] "2" "b"
(todaysplitted <- unlist(strsplit(today,split = " ")))## [1] "Tue" "May" "03" "10:19:46" "2016"
Note: strsplit returns a list. Use unlist to transform the list to a character object.
substr and gsubSuppose that we want a section of the string:
string <- "this is a string"
(sstring <- substr(x = string,start = 1,stop = 4))## [1] "this"
Suppose that we want to substitute a part of a string:
(nstring <- gsub(pattern = "is",replacement = "was",
x = string))## [1] "thwas was a string"
(nnstring <- gsub(pattern = "[[:space:]]is[[:space:]]"," was ",string))## [1] "this was a string"
This also introduces the regular expressions. Have a look at ?regexp or google for this topic.
'%in%'Suppose that we want to know if elements of one vector are present in another vector: operator '%in%':
v1 <- letters[1:10]
v2 <- letters[5:15]
v1%in%v2## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
It does not tells us where each element of v1 appears in v2.
matchSuppose that we want to know where each element of v1 appears in v2: function match:
v1## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
v2## [1] "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
match(v1,v2)## [1] NA NA NA NA 1 2 3 4 5 6
match(v1,c(v2,v2))## [1] NA NA NA NA 1 2 3 4 5 6
Note: match only returns the first match.
whichFunction which to obtain the index where a condition is TRUE:
which(v1 == "a")## [1] 1
vv <- c(v1,v1,v1)
which(vv == "a")## [1] 1 11 21
Now, we use which to substitute the elements of vv which are a with A:
vv[which(vv=="a")] <- "A"match function:Suppose that we want all the matches. We need to program:
v22 <- c(v2,v2)
indices <- lapply(v1,function(i)
which(v22%in%i))A more elegant way: write a new function called matchall, first check if it is not already defined (help.search('matchall'):
matchall <- function(x,y){
lapply(x,function(i){
which(y%in%i)})
}
identical(indices,matchall(v1,v22))## [1] TRUE
This introduces how to program functions.
Function matrix to create a matrix:
(mat1 <- matrix(1:4,nrow = 2,ncol = 2,byrow = T))## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
Access the elements of the matrix with '[' operator:
mat1[1,1]## [1] 1
mat1[1,]## [1] 1 2
Matrices have two dimensions: the ',' separates the dimensions.
Replace elements of the matrix with the [ <- operator:
mat1[,2] <- -10
mat1## [,1] [,2]
## [1,] 1 -10
## [2,] 3 -10
Function rbind to combine two matrices under each other (make sure that the number of columns coincides):
mat2 <- matrix(0,nrow = 2,ncol = ncol(mat1))
mat3 <- rbind(mat1,mat2)
dim(mat3)## [1] 4 2
Function cbind to combine matrices by column (make sure that the number of rows coincides):
mat4 <- cbind(mat1,mat2)
dim(mat4)## [1] 2 4
t'%*%' operatordiagdiag <-Suppose that we have observations x and dependent observations y and we want to know the linear relationship between x and y:
\(\mathbf{y} = \mathbf{x}b,\)
where b is an unknown regression coefficient. The best, least squares, solution is:
\(\hat{b} = \mathbf{(x'x)^{-1}x'y}.\)
Now: program it in R:
(x <- rep(1:3,length.out = 10))## [1] 1 2 3 1 2 3 1 2 3 1
b <- 0.5
(y <- b*x + rnorm(10))## [1] 1.33391213 2.53923443 3.19224503 0.02218696 1.14560315
## [6] 1.21403112 -0.79375053 -0.34268710 0.42364929 1.59210687
(bhat <- solve(t(x)%*%x)%*%t(x)%*%y)## [,1]
## [1,] 0.542524
Compare the original b to the estimate of b, bhat.
Note: more efficient way to solve linear models with function lm; see slides on regression.
data.frameThe columns of a matrix are of one single class. A data.frame is generalized matrix where each column can be of a different class.
Create an object of class data.frame:
animals <- data.frame(id = letters[1:10],
weight = rnorm(mean = 100,10),
sex = sample(c("F","M"),10,replace = T))
head(animals)## id weight sex
## 1 a 100.11286 M
## 2 b 100.77140 F
## 3 c 100.61593 M
## 4 d 99.80709 M
## 5 e 99.16148 M
## 6 f 100.62874 F
We want to extract the weights from animals into a vector weigths:
weights <- animals$weightWe want the subset of the the where weight is more than 100 in a new data.frame animals100p:
animals100p <- animals[animals$weight>100,]We we want to substitute weight when sex == 'M' by NA:
animals$weight[animals$sex == 'M'] <- NAWe want to add a column to animals with the squared weight:
animals$weight2 <- animals$weight^2Function write.csv write a data.frame as a *.csv file:
write.csv(animals,file = 'animals.csv',row.names= FALSE,quote = FALSE)Function read.csv to read a csv file into a data.frame:
animals1 <- read.csv(file = 'animals.csv',header = TRUE)Note: we made a file called animals.csv somewhere on the HD and then read this file back in the memory. Test if a file exists:
if(!file.exists('nonexistentfile.csv'))
print("File does not exists.")## [1] "File does not exists."
Suppose that we want to repeat an operation many times:
Sum up random numbers until the sum reaches 10:
s <- 0
n <- 0
while(s<10){
n <- n + 1
s <- s + runif(1,0,1)
}
print(s)## [1] 10.36739
print(n)## [1] 18
We needed 18 steps to get a sum higher than 10.
forSuppose that we want to follow a certain sequence of values and do something at each value, looping:
vec1 <- rnorm(10)
vec2 <- numeric(10)
for(i in 1:length(vec1))
{
vec2[i] <- abs(vec1[i])
}Note: try to avoid loops. Make code slow and ugly. Much better and faster solution:
vec2 <- abs(vec1)whileSuppose that we want to do something until a certain criterion is met:
i <- 0
while(i<10)
i <- i + 1Be careful: do not make endless loops:
i <- 1
while(i>0)
i <- i + 1applySuppose that we have a matrix and want to calculate the sums of each column (columns are the \(2^{\text{nd}}\) dimension of a matrix).
Ugly method:
mat <- matrix(rnorm(20),nrow = 2,ncol = 10)
uglyCS <- numeric(ncol(mat))
for(col in 1:ncol(mat)){
uglyCS[col] <- sum(mat[,col])
}Elegant method:
elegantCS <- apply(mat,2,sum)
identical(uglyCS,elegantCS)## [1] TRUE
Function apply applies a function to a matrix. Here the function sum is applied to each column of mat (\(2^{nd}\) dimension), second argument of apply is 2. Try to calculate the sums of the rows of mat.
ChickWeight datasetThe ChickWeight dataset is from an experiment of the effect of diet on the weight of chicken. The variables are:
weight: the weight of the chicken expressed in grams;Time: time since hatching when the measurement was done;Chick: identifies each chick;Diet: identifies the diet of each chick.summary(ChickWeight)## weight Time Chick Diet shape
## Min. : 35.0 Min. : 0.00 13 : 12 1:220 0: 96
## 1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120 1: 98
## Median :103.0 Median :10.00 20 : 12 3:120 2:102
## Mean :121.8 Mean :10.72 10 : 12 4:118 3: 92
## 3rd Qu.:163.8 3rd Qu.:16.00 17 : 12 4: 96
## Max. :373.0 Max. :21.00 19 : 12 5: 94
## (Other):506
tapplySuppose that we want to obtain mean weight per Diet and per Time period from ChickWeight. We need:
weight for each DietxTime combination.Ugly method:
mns <- list()
for(t in unique(ChickWeight$Time)){
for(d in unique(ChickWeight$Diet)){
ii <- with(ChickWeight,which(Diet%in%d&Time%in%t))
mns[[paste(t,d)]] <-
mean(ChickWeight$weight[ii])
}}Elegant method:
tt <- tapply(ChickWeight$weight,list(ChickWeight$Diet,
ChickWeight$Time),mean)
tt[,1:4]## 0 2 4 6
## 1 41.4 47.25 56.47368 66.78947
## 2 40.7 49.40 59.80000 75.40000
## 3 40.8 50.40 62.20000 77.90000
## 4 41.0 51.80 64.50000 83.90000
Here we applied the function mean to Diet at each unique time. Again, with less writing using function with:
tt <- with(ChickWeight,tapply(weight,list(Diet,Time),mean))
tt[,1:4]## 0 2 4 6
## 1 41.4 47.25 56.47368 66.78947
## 2 40.7 49.40 59.80000 75.40000
## 3 40.8 50.40 62.20000 77.90000
## 4 41.0 51.80 64.50000 83.90000
Tools in R are functions. Defining a function is not difficult.
Example: a function to summarize data into a mean and a standard deviation, meansd:
meansd <- function (x, nDec = 2,na.rm = TRUE)
{
mn <- mean(x, na.rm = na.rm)
if (!is.na(mn))
mn <- round(mn, nDec)
sd <- sd(x, na.rm = na.rm)
if (!is.na(sd))
sd <- round(sd, nDec)
return(paste(mn, " (", sd, ")", sep = ""))
}
meansd(1:10)## [1] "5.5 (3.03)"
meansdSuppose that we want to summarize weight for each combination of diet and time into mean and standard deviation:
ttFancy <- tapply(ChickWeight$weight,list(ChickWeight$Diet,ChickWeight$Time),
meansd,1)
ttFancy[,10:12]## 18 20 21
## 1 "158.9 (49.2)" "170.4 (55.4)" "177.8 (58.7)"
## 2 "187.7 (63.3)" "205.6 (70.3)" "214.7 (78.1)"
## 3 "233.1 (57.6)" "258.9 (65.2)" "270.3 (71.6)"
## 4 "202.9 (33.6)" "233.9 (37.6)" "238.6 (43.3)"