This website is an introduction to some of the principles of reproducible science in R. While we are using R and other tools for reproducible research, our focus here is not to teach you the tools, but how you can use them to make your research more reproducible. Before we get started on the bulk of the content, we will go over how to install the software needed for this tutorial and briefly introduce working with R.
First, install R and RStudio:
Download and install the R statistical computing and graphing environment. This works cross-platform on Windows, OS X and Linux operating systems.
Download and install the free, open source edition of the RStudio Desktop integrated development environment (IDE), which we recommend. This is basically a point-and-click interface for R that includes a text editor, file browser, and some other conveniences.
The following packages are used in this primer: rmarkdown, poppr, agricolae, dplyr, tidyr, and ggplot2.
To install these packages, open RStudio and copy and paste the following code into the console:
my_packages <- c("rmarkdown","poppr", "agricolae", "dplyr", "tidyr", "ggplot2")
install.packages(my_packages, repos = "http://cran.rstudio.com", dependencies = TRUE)
Congratulations! You should now be all set for using R. To ensure that everything is set up correctly, you can go through steps 1 and 2 in the markdown chapter.
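As a quick check that the installation worked, the sketch below compares the package names from the install command against the packages R can find in your library. The names `is_installed` and `missing_packages` are ours, chosen only for illustration.

```r
# Packages requested in the install command above
my_packages <- c("rmarkdown", "poppr", "agricolae", "dplyr", "tidyr", "ggplot2")

# installed.packages() returns a matrix with one row per installed package
is_installed <- my_packages %in% rownames(installed.packages())
names(is_installed) <- my_packages
print(is_installed)

# Names of any packages that still need to be installed
missing_packages <- my_packages[!is_installed]
if (length(missing_packages) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(missing_packages, collapse = ", "))
}
```

If anything is reported as missing, rerunning `install.packages()` on just those names is usually enough.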
If you want to create PDF documents, you will need a \(\LaTeX\) installation. For OS X and Ubuntu users, this can be a large download, so make sure you have a good connection.
To those that know of and fear LaTeX: don’t worry, you don’t need to write any LaTeX code to produce PDFs from Markdown.
Git is a version control program that we will cover, but because installation requirements vary between operating systems, we are not requiring it for the workshop. If you would like to install it, here’s a website that covers installation for the major operating systems.
Let’s review some of the basic features and functions of R. To start R, open the RStudio application from your programs folder or start menu. This will initialize your R session. To exit R, simply close the RStudio application.
Note that R is a case-sensitive language!
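To see what case sensitivity means in practice, here is a minimal sketch. The object names `sample_size` and `Sample_Size` are made up for this example.

```r
# Two names that differ only in capitalization
sample_size <- 10
Sample_Size <- 20

# R stores them as two unrelated objects
print(sample_size)
print(Sample_Size)
identical(sample_size, Sample_Size)

# Function names are case-sensitive too:
# sessionInfo() exists, but sessioninfo() is a different (undefined) name
exists("sessionInfo")
exists("sessioninfo")
```

A misspelled capital letter is one of the most common causes of "object not found" errors, so it is worth checking the spelling first when you see one.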
Let’s get comfortable with R by submitting the following command on the console (where R prompts you with a > in the lower left RStudio window pane). It will tell you something about your version of R and the packages you have, which is useful for reporting reproducible research.
sessionInfo() # This command will tell you information about your current R session
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] printr_0.0.6 knitr_1.14
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.4 magrittr_1.5 assertthat_0.1
## [4] rprojroot_1.1 formatR_1.4 tools_3.3.1
## [7] htmltools_0.3.5 yaml_2.1.13 tibble_1.2
## [10] Rcpp_0.12.7 stringi_1.1.2 rmarkdown_1.1.9012
## [13] stringr_1.1.0 digest_0.6.10 evaluate_0.10
Note that any text after a ‘#’ symbol is a comment and does not affect the code execution; you can just type sessionInfo() after the “>”.
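Because the output of sessionInfo() is what lets others reconstruct your setup, it can be handy to save it next to your results. The following is a minimal sketch of one way to do that; the file name `session_info.txt` and the use of a temporary folder are our assumptions, not part of the primer.

```r
# Write to a temporary folder here; in a real project you might choose
# a path inside the project folder instead (our assumption).
session_file <- file.path(tempdir(), "session_info.txt")

# capture.output() collects the printed text as a character vector
session_text <- capture.output(sessionInfo())
writeLines(session_text, session_file)

# The R version on its own is available as a single string
print(R.version.string)

# Read the first lines back to confirm what was saved
print(head(readLines(session_file), 3))
```

Saving this file each time you produce a report gives you a record of the R and package versions that generated it.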
One way that R shines above other languages is that R packages in CRAN are all documented and easy to install. Help files are displayed as HTML and give the user a brief overview of a function: what it does, how to call it, what its arguments mean, and examples of its use.
To get help on any R function, type a question mark before the function name, without parentheses. Here’s an example of how to get help about the gather() function from the tidyr package:
library('tidyr') # The package with the gather() function.
?gather # open the R documentation of the function gather()
gather | R Documentation
Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
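data | A data frame.
key, value | Names of key and value columns to create in output.
... | Specification of columns to gather. Use bare variable names. Select all variables between x and z with x:z, exclude y with -y.
na.rm | If TRUE, rows where the value column is NA are removed from the output.
convert | If TRUE, type.convert() is run automatically on the key column. This is useful if the column names are actually numeric, integer, or logical.
factor_key | If FALSE, the default, the key values are stored as a character vector. If TRUE, they are stored as a factor, which preserves the original ordering of the columns.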
See also gather_ for a version that uses regular evaluation and is suitable for programming with.
library(dplyr)
# From http://stackoverflow.com/questions/1181060
stocks <- data_frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)
stocks %>% gather(stock, price, -time)

# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = flower_att, value = measurement,
       Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# same result but less verbose
gather(mini_iris, key = flower_att, value = measurement, -Species)

# repeat iris example using dplyr and the pipe operator
library(dplyr)
mini_iris <-
  iris %>%
  group_by(Species) %>%
  slice(1)
mini_iris %>% gather(key = flower_att, value = measurement, -Species)
If you want to run the examples, you can either copy and paste the commands to your R console, or you can run them all with:
example("gather", package = "tidyr")
Other ways of getting help:
help(package = "tidyr") # Get help for a package.
help("gather") # Get help for the gather function
?gather # same as above
??multilocus # Search for help that has the keyword 'multilocus' in all packages
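When you only need a reminder of a function's arguments, or you half remember a name, a few base R helpers can complement the help pages. The sketch below uses `paste()` and the pattern `^read\\.` purely as illustrations; neither comes from the primer.

```r
# Show the argument list of a function without opening its help page
print(args(paste))

# The same information as a character vector of argument names
arg_names <- names(formals(paste))
print(arg_names)

# Find objects on the search path whose names match a regular expression
read_functions <- apropos("^read\\.")
print(read_functions)
```

Unlike `??`, which searches the documentation of all installed packages, `apropos()` only looks at objects in packages that are currently attached, so its results depend on which packages you have loaded with `library()`.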
Some packages include vignettes, which are longer documents such as introductions, tutorials, or reference cards, often in PDF format. You can look at a list of vignettes in all packages by typing:
browseVignettes() # see vignettes from all packages
browseVignettes(package = 'poppr') # see vignettes from a specific package.
and to look at a specific vignette you can type:
vignette('mlg')
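If you would rather see the available vignette names in the console than in a browser, calling vignette() with only a package argument returns a listing you can inspect. This is a small sketch that uses the grid package, which ships with R, so that it runs even if poppr is not installed; the name `grid_vignettes` is ours.

```r
# vignette() with only a package argument lists that package's vignettes
grid_vignettes <- vignette(package = "grid")

# The listing is stored as a character matrix; the Item column holds the
# names you would pass to vignette(), and Title describes each one
print(grid_vignettes$results[, c("Item", "Title"), drop = FALSE])
```

Replacing "grid" with 'poppr' should give the corresponding listing for poppr once that package is installed.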