R provides a unique environment for performing population genetic analyses. You will particularly enjoy not having to switch data formats and operating systems to execute a series of analyses, as was the case until now. Furthermore, R provides graphing capabilities that are ready for use in publications, with only a little bit of extra effort. But first, let’s install R, install an integrated development environment, open R, and load R packages.

Installing R

  1. Download and install the R statistical computing and graphing environment. This works cross-platform on Windows, OS X and Linux operating systems.

  2. Download and install the free, open source edition of the RStudio Desktop integrated development environment (IDE) that we recommend.

Installing the required packages

The following packages are utilized in this primer:

  1. poppr
  2. adegenet
  3. ape
  4. ggplot2
  5. mmod
  6. magrittr
  7. dplyr
  8. treemap

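Use the following script to install these packages. Only four are named explicitly; with dependencies = TRUE, the remaining packages should be installed as dependencies of these: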

install.packages(c("poppr", "mmod", "magrittr", "treemap"), repos = "http://cran.rstudio.com", dependencies = TRUE)

We wrote and actively maintain poppr (Kamvar, Tabima & Grünwald, 2014; Kamvar, Brooks & Grünwald, 2015) and it is heavily relied upon in this primer. Poppr is an R package. You can think of a package as a library of functions written and curated by someone in the R user community, which you can load into R for use.

Once you’ve installed poppr, you can invoke (i.e., load) it by typing or copying and pasting:

library("poppr")

This will load poppr and all dependent packages, such as adegenet and ade4. You will know that loading has worked by the startup messages written to your screen.
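
If you want to confirm that every package in the list above is available before moving on, a check along the following lines may help. This is only a minimal sketch and not part of the original installation script; the object names primer_packages and installed are our own.

```r
# Packages used in this primer (taken from the list above)
primer_packages <- c("poppr", "adegenet", "ape", "ggplot2",
                     "mmod", "magrittr", "dplyr", "treemap")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(primer_packages, requireNamespace,
                    FUN.VALUE = logical(1), quietly = TRUE)

# Print the names of any packages that still need to be installed
print(primer_packages[!installed])
```

If nothing is missing, the last line prints an empty character vector.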

Congratulations. You should now be all set for using R. Loading data and conducting your first analysis will be the topic of the next chapter. But before we go there, let’s provide a few useful resources.

A quick introduction to R using RStudio

Next, let’s review some of the basic features and functions of R. To start R, open the RStudio application from your programs folder or start menu. This will initialize your R session. To exit R, simply close the RStudio application.

Note that R is a case sensitive language!
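
As a small illustration of case sensitivity, the sketch below defines an object and then checks for it under two spellings. The object name my_value is made up for this example.

```r
my_value <- 42        # define an object with a lowercase name

exists("my_value")    # TRUE: this exact name is defined
exists("My_Value")    # FALSE: different capitalization is a different name
```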

Let’s get comfortable with R by submitting the following command on the command line (where R prompts you with a > in the lower left RStudio window pane) that will retrieve the current working directory on your machine:

getwd() # this command will print the current working directory

Note that the symbol ‘#’ starts a comment: R ignores everything after it on that line, so you only need to type getwd() after the “>” prompt.
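
The companion function setwd() changes the working directory. The following sketch, which is an illustration and not a required step, switches to R’s temporary directory and back again; the object name old_dir is our own.

```r
old_dir <- getwd()      # remember the current working directory
setwd(tempdir())        # switch to R's temporary directory for this session
getwd()                 # now prints the temporary directory
setwd(old_dir)          # switch back to where we started
```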

Our primer is heavily based on the poppr and adegenet packages. To get help on any of their functions, type a question mark before the function name (without parentheses), as in:

?mlg # open the R documentation of the function mlg()

To quit R you can either use the RStudio > Quit pull-down menu command or execute ⌘ + Q (OS X) or ctrl + Q (PC).

Using magrittr

Various chapters throughout this primer use the symbol %>% in the code. This is called a “pipe” operator, and it makes code more readable by stringing together commands from left to right. Here’s a short description of these “pipes” with cats. When reading code, you can think of it as saying “and then”. For example, if you have three consecutive steps in a process, you would write this in English as:

Take your data and then do step one, and then do step two, and then do step three.

In R code with magrittr, assuming that each step is a function, it might be written as:

result <- data %>% step_one() %>% step_two() %>% step_three()
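
The line above uses placeholder functions, so it cannot be run as written. As a runnable sketch of the same idea, the following compares a nested call with its piped equivalent using only base R functions; the object names values, nested and piped are chosen for illustration, and the magrittr package is assumed to be installed.

```r
library("magrittr")

values <- c(1, 4, 9, 16)

# Nested call: read from the inside out
nested <- round(mean(sqrt(values)), 1)

# Piped call: read from left to right
piped <- values %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE: both forms give the same result
```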

Below are two examples of how code can be improved with magrittr. More details about magrittr can be found in its package documentation.

Consider a fake example:

Adapted from Hadley Wickham. Based on the children’s song “Little Bunny Foo Foo”. The functions below do not exist, so this code is for reading only.

foo_foo <- little_bunny()

bop_on(scoop_up(hop_through(foo_foo, forest), field_mouse), head)

# VS

foo_foo %>%
  hop_through(forest) %>%
  scoop_up(field_mouse) %>%
  bop_on(head)

Now for a real example:

We will use the Phytophthora infestans microsatellite data from North and South America (Goss et al., 2014). Let’s calculate allelic diversity per population after clone-correction. This information can be found in our chapters on Population strata and clone correction and Locus based statistics.

library("poppr")
library("magrittr")
data(Pinf)

# Compare the traditional R script

allelic_diversity <- lapply(seppop(clonecorrect(Pinf, strata = ~Continent/Country)),
                            FUN = locus_table, info = FALSE)

# versus the magrittr piping:

allelic_diversity <- Pinf %>%
  clonecorrect(strata = ~Continent/Country) %>% # Clone-correct by continent and country
  seppop() %>%                                # Separate populations (by continent)
  lapply(FUN = locus_table, info = FALSE)     # Apply the function locus_table to both populations

To observe the results, type allelic_diversity into the console after each statement.

The %>% operator is thus good if you have to do a lot of small steps in your analysis. It allows your code to be more readable and reproducible.

Packages and getting help

One way that R shines above other languages for analysis is that all R packages on CRAN are documented. Help files are written in HTML and give the user a brief overview of what a function does, the arguments it takes, the value it returns, and examples of its usage.

To see all of the help topics in a package, to get help on a particular function, or to search the documentation by keyword, you can type:

help(package = "poppr") # Get help for a package.
help(amova)             # Get help for the amova function.
?amova                  # same as above.
??multilocus            # Search for functions that have the keyword multilocus.
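
Two further base R helpers can be useful when you only remember part of a name. The sketch below is an optional addition; the search pattern "^get" is just an example.

```r
help.search("multilocus")   # long form of ??multilocus
apropos("^get")             # names of objects on the search path starting with "get"
```

The second call returns a character vector of matching names, which includes getwd.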

Some packages include vignettes, which are longer documents such as introductions, tutorials, or reference cards, often in PDF format. You can look at a list of vignettes in all packages by typing:

browseVignettes()                     # see vignettes from all packages
browseVignettes(package = 'poppr')    # see vignettes from a specific package.

and to look at a specific vignette you can type:

vignette('poppr_manual')

Next, consider browsing Appendix 3 on “Introduction to R” if you are not yet familiar with R and RStudio. Otherwise, you are now ready to think about formatting and loading population genetic data into R.

References

Goss EM., Tabima JF., Cooke DEL., Restrepo S., Fry WE., Forbes GA., Fieland VJ., Cardenas M., Grünwald NJ. 2014. The Irish potato famine pathogen Phytophthora infestans originated in central Mexico rather than the Andes. Proceedings of the National Academy of Sciences 111:8791–8796. Available at: http://www.pnas.org/content/early/2014/05/29/1401884111.abstract

Kamvar ZN., Brooks JC., Grünwald NJ. 2015. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Frontiers in Genetics 6:208. Available at: http://dx.doi.org/10.3389/fgene.2015.00208

Kamvar ZN., Tabima JF., Grünwald NJ. 2014. Poppr: An R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281. Available at: http://dx.doi.org/10.7717/peerj.281