This website is an introduction to some of the principles of reproducible science in R. Although we use R and other tools for reproducible research, our focus is not on teaching the tools themselves but on how you can use them to make your research more reproducible. Before we get to the bulk of the content, we will go over how to install the software needed for this tutorial and briefly introduce working with R.

Installing R and RStudio

First, install R and RStudio:

  1. Download and install the R statistical computing and graphing environment. This works cross-platform on Windows, OS X and Linux operating systems.

  2. Download and install the free, open source edition of the RStudio Desktop integrated development environment (IDE), which we recommend. This is basically a point-and-click interface for R that includes a text editor, file browser, and some other conveniences.

Installing the required packages

The following packages are used in this primer:

  1. rmarkdown (creating reports)
  2. agricolae (agricultural disease data analysis)
  3. ggplot2 (graphs)
  4. poppr (genetic data analysis)
  5. dplyr (data manipulation)
  6. tidyr (data manipulation)

To install these packages, open RStudio and copy and paste the following code into the console:

my_packages <- c("rmarkdown", "poppr", "agricolae", "dplyr", "tidyr", "ggplot2")
install.packages(my_packages, repos = "http://cran.rstudio.com", dependencies = TRUE)
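
Once the installation finishes, a quick check can confirm that every package is actually available. The following is a minimal sketch using only base R; the names `is_installed` and `missing_packages` are illustrative and not part of the original instructions.

```r
# Packages used in this primer (same vector as in the install step)
my_packages <- c("rmarkdown", "poppr", "agricolae", "dplyr", "tidyr", "ggplot2")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise;
# quietly = TRUE suppresses startup messages
is_installed <- vapply(my_packages, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need to be installed
missing_packages <- my_packages[!is_installed]

if (length(missing_packages) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(missing_packages, collapse = ", "))
}
```

If anything is reported as missing, rerunning install.packages() on just those names is usually enough.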

Congratulations! You should now be all set for using R. To ensure that everything is set up correctly, you can go through steps 1 and 2 in the markdown chapter.

Optional software

PDFs with LaTeX

If you want to create PDF documents, you will need a LaTeX installation. For OS X and Ubuntu users, this can be a large download, so make sure you have a good connection.

To those that know of and fear LaTeX: don’t worry, you don’t need to write any LaTeX code to produce PDFs from Markdown.

Git

Git is a version control program that we will cover, but because installation requirements vary between operating systems, we do not require it for the workshop. If you would like to install it, here’s a website that covers installation for the major operating systems.
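
If you are unsure whether the optional software is already on your computer, you can ask R to look for it. This is a rough sketch that assumes the executables are named `pdflatex` and `git` and are on your system's search path; a tool installed in an unusual location may be reported as missing even though it works.

```r
# Sys.which() returns the full path of an executable, or "" if it is not found
optional_tools <- c("pdflatex", "git")
tool_paths <- Sys.which(optional_tools)

# TRUE for each tool that R can find on the search path
tool_found <- tool_paths != ""
print(tool_found)
```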

A quick introduction to R using RStudio

Let’s review some of the basic features and functions of R. To start R, open the RStudio application from your programs folder or start menu. This will initialize your R session. To exit R, simply close the RStudio application.

Note that R is a case-sensitive language!
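
A short illustration of case sensitivity, using a made-up object name (`my_value` is only an example):

```r
# Create an object with a lowercase name
my_value <- 42

# R treats my_value and My_Value as two different names
exists("my_value")  # TRUE
exists("My_Value")  # FALSE, unless you have defined it separately
```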

Let’s get comfortable with R by submitting the following command in the console, where R prompts you with a > (the lower left pane of the RStudio window). It reports your version of R and the packages you have loaded, which is useful when reporting reproducible research.

sessionInfo() # This command will tell you information about your current R session
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] printr_0.0.6 knitr_1.14  
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.4    magrittr_1.5       assertthat_0.1    
##  [4] rprojroot_1.1      formatR_1.4        tools_3.3.1       
##  [7] htmltools_0.3.5    yaml_2.1.13        tibble_1.2        
## [10] Rcpp_0.12.7        stringi_1.1.2      rmarkdown_1.1.9012
## [13] stringr_1.1.0      digest_0.6.10      evaluate_0.10

Note that any text after a ‘#’ symbol is a comment and does not affect code execution; you can just type sessionInfo() after the “>”.
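
Because the session information is useful when reporting your work, you may want to keep a copy of it alongside your results. One possible way to do this with base R is sketched below; the file name and the use of a temporary directory are assumptions made for the example.

```r
# Capture the printed session information as a character vector
session_lines <- capture.output(sessionInfo())

# Write it to a text file; the path here (a temporary directory) is only an
# example, in a real project you would save it next to your analysis
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(session_lines, session_file)

# Read the first line back to confirm the file was written
readLines(session_file, n = 1)
```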

Packages and getting help

One way that R shines above other languages is that R packages on CRAN are all documented and easy to install. Help pages are displayed as HTML and give the user a brief overview of:

  • the purpose of a function
  • the parameters it takes
  • the output it yields
  • examples demonstrating its usage

To get help on any R function, type a question mark followed by the function name, without parentheses. Here’s an example of how to get help about the gather() function from the tidyr package:

library('tidyr') # The package with the gather() function.
?gather          # open the R documentation of the function gather()
gather R Documentation

Gather columns into key-value pairs.

Description

Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.

Usage

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE,
  factor_key = FALSE)

Arguments

data

A data frame.

key, value

Names of key and value columns to create in output.

...

Specification of columns to gather. Use bare variable names. Select all variables between x and z with x:z, exclude y with -y. For more options, see the select documentation.

na.rm

If TRUE, will remove rows from output where the value column is NA.

convert

If TRUE will automatically run type.convert on the key column. This is useful if the column names are actually numeric, integer, or logical.

factor_key

If FALSE, the default, the key values will be stored as a character vector. If TRUE, will be stored as a factor, which preserves the original ordering of the columns.

See Also

gather_ for a version that uses regular evaluation and is suitable for programming with.

Examples

library(dplyr)
# From http://stackoverflow.com/questions/1181060
stocks <- data_frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)
stocks %>% gather(stock, price, -time)

# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = flower_att, value = measurement,
       Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# same result but less verbose
gather(mini_iris, key = flower_att, value = measurement, -Species)

# repeat iris example using dplyr and the pipe operator
library(dplyr)
mini_iris <-
  iris %>%
  group_by(Species) %>%
  slice(1)
mini_iris %>% gather(key = flower_att, value = measurement, -Species)

If you want to run the examples, you can either copy and paste the commands to your R console, or you can run them all with:

example("gather", package = "tidyr")
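
To see what gather() does without reading through the whole example output, it can help to try it on a very small data frame. The sketch below uses made-up data (`wide`, `sample`, `height` and `width` are illustrative names). Note that in newer versions of tidyr, gather() is superseded by pivot_longer(), although it still works.

```r
library(tidyr)

# A small "wide" data frame: one row per sample, one column per measurement
wide <- data.frame(
  sample = c("s1", "s2"),
  height = c(10, 20),
  width  = c(30, 40)
)

# Collapse every column except `sample` into key-value pairs
long <- gather(wide, key = "measurement", value = "value", -sample)

dim(long)  # 4 rows (2 samples x 2 measurements) and 3 columns
```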

Other ways of getting help:

help(package = "tidyr")  # Get help for a package.
help("gather")           # Get help for the gather function
?gather                  # same as above
??multilocus             # Search for help that has the keyword 'multilocus' in all packages
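
When you only need a quick reminder rather than a full help page, two other base R functions may be handy. This is a small sketch; `paste` and the "read." pattern are arbitrary examples.

```r
# Show the argument list of a function without opening its help page
args(paste)

# List objects on the search path whose names match a pattern
# (here: everything starting with "read.")
read_functions <- apropos("^read\\.")
head(read_functions)
```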

Some packages include vignettes, which are longer documents such as introductions, tutorials, or reference cards, often in PDF format. You can look at a list of vignettes in all packages by typing:

browseVignettes()                     # see vignettes from all packages
browseVignettes(package = 'poppr')    # see vignettes from a specific package.

and to look at a specific vignette you can type:

vignette('mlg', package = 'poppr')
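
If you prefer to stay in the console instead of opening a browser, the list of vignettes for a single package can also be inspected as a table. The sketch below uses the grid package only because it is installed with R itself; substitute any package you have installed.

```r
# List the vignettes of one package; "grid" is used only because it is
# installed with R itself
grid_vignettes <- vignette(package = "grid")

# The listing stores its entries in a matrix; the "Item" column holds the
# names you can pass to vignette()
grid_vignettes$results[, c("Item", "Title")]
```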