Background functions

Some R functions are used frequently throughout analyses, yet they may appear non-intuitive to new users. Here we identify some of these functions and illustrate their use.

apply()

One of R’s shortcomings is that for() loops are relatively slow. Historically this was a greater problem, and recent versions of R have improved the performance of these loops. Because of this performance cost, it is often recommended to avoid for() loops in your R code, particularly when performance is an issue. If you have a task that requires iterating over the rows or columns of a matrix, you should consider apply(). It is more concise than an explicit loop, although it is not guaranteed to be faster than a well-written one, so it is worth timing both when performance matters. We’ll create a matrix to illustrate its use.

tmp <- matrix(rep(1:3, times = 3), ncol = 3)
tmp
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3

The function apply() will ‘apply’ a function over the rows or columns of a matrix. Here we’ll use the function sum() to illustrate this.

apply(tmp, MARGIN = 1, sum)
## [1] 3 6 9
apply(tmp, MARGIN = 2, sum)
## [1] 6 6 6

The MARGIN parameter specifies whether to operate on rows (MARGIN = 1) or columns (MARGIN = 2). I remember how 1 and 2 behave by recalling that when we specify the rows of a matrix with the square brackets ([]) we use the first position, before the comma. To specify the columns we use the second position, after the comma.
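The indexing analogy can be made concrete by writing the same sums as explicit for() loops. This is only an illustrative sketch: the names row_sums, col_sums, i and j are our own, and the matrix is recreated so the block runs by itself.

```r
# Recreate the example matrix so this block is self-contained.
tmp <- matrix(rep(1:3, times = 3), ncol = 3)

# Row sums: the row index goes in the first position, before the comma.
row_sums <- numeric(nrow(tmp))
for (i in seq_len(nrow(tmp))) {
  row_sums[i] <- sum(tmp[i, ])
}

# Column sums: the column index goes in the second position, after the comma.
col_sums <- numeric(ncol(tmp))
for (j in seq_len(ncol(tmp))) {
  col_sums[j] <- sum(tmp[, j])
}

# The loops agree with the corresponding apply() calls.
# as.numeric() is used because apply() returns integers for this matrix.
identical(row_sums, as.numeric(apply(tmp, MARGIN = 1, sum)))
## [1] TRUE
identical(col_sums, as.numeric(apply(tmp, MARGIN = 2, sum)))
## [1] TRUE
```

The loop over i with tmp[i, ] corresponds to MARGIN = 1, and the loop over j with tmp[, j] corresponds to MARGIN = 2.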

More information on apply() can be found by consulting its manual page.

?apply

sweep()

The function sweep() is similar to apply() in that it operates over the rows or columns of a matrix. However, it takes the additional parameters STATS and FUN. The parameter FUN is the function to be used. This may be an R function or a custom function you’ve created. The parameter STATS is a vector of values, one per row or column, that FUN combines with the matrix. Here we use apply() to compute the mean of each column and then use sweep() to divide the values in the matrix by their column mean.

my_means <- apply(tmp, MARGIN = 2, sum) / nrow(tmp)
sweep(tmp, MARGIN = 2, STATS = my_means, FUN = "/")
##      [,1] [,2] [,3]
## [1,]  0.5  0.5  0.5
## [2,]  1.0  1.0  1.0
## [3,]  1.5  1.5  1.5
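Because FUN can be any function of two arguments, the same pattern covers other column-wise operations. The sketch below centers each column by subtracting its mean and then passes a custom function. The helper pct_diff() is hypothetical and written only for this illustration, and colMeans() is used as a shortcut that gives the same column means as the apply() call above.

```r
# Recreate the example matrix so this block is self-contained.
tmp <- matrix(rep(1:3, times = 3), ncol = 3)

# Column means; colMeans() is a base R shortcut for this calculation.
my_means <- colMeans(tmp)

# Center each column by subtracting its mean.
centered <- sweep(tmp, MARGIN = 2, STATS = my_means, FUN = "-")
centered
##      [,1] [,2] [,3]
## [1,]   -1   -1   -1
## [2,]    0    0    0
## [3,]    1    1    1

# Hypothetical helper: percent difference of each value from its column mean.
# sweep() passes the matrix as the first argument and STATS as the second.
pct_diff <- function(x, stat) {
  100 * (x - stat) / stat
}

pct <- sweep(tmp, MARGIN = 2, STATS = my_means, FUN = pct_diff)
pct
##      [,1] [,2] [,3]
## [1,]  -50  -50  -50
## [2,]    0    0    0
## [3,]   50   50   50
```

The custom function must be vectorized, because sweep() calls it once on the whole matrix rather than once per element.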

More information on sweep() can be found by consulting its manual page.

?sweep