While Markdown is useful for simple note taking and writing, RMarkdown can be used to mix complex analyses and programming in the R language with all the features of Markdown. RMarkdown is an extension of the Markdown syntax that allows R code to be displayed and run when the document is rendered. Graphs, tables, file input/output, and bibliographies can all be incorporated using embedded R code. Typical uses range from adding minor bits of code to time-stamp notes to running complex analyses and making a report of the results. RMarkdown is thus ideal for lab notebooks and reproducible science as the scripts can be run over and over as they are modified. Imagine you find one small mistake in a data point after finishing a complicated analysis. If you used RMarkdown, you simply change the data, rerun the analysis with a single click, and the change will be applied to every graph and result. If you use Excel and/or manually manipulated graphs, you have to do everything again, exactly the way you did it last time.

In this section of the tutorial, we will be summarizing the most commonly used features of RMarkdown, but there are many aspects that we will not cover. Unlike pure Markdown, RMarkdown is very flexible and complex. Therefore, as you go through the exercises, it is probably best to focus on remembering what RMarkdown can do rather than how to do it. At the end of the section there are resources that you can reference when you need to look up how to do a specific task.

Mixing R and Markdown

There are two ways to add R code to Markdown: as inline code or as code chunks. The syntax for adding inline code is the same as specifying a monospace font (surrounding text with `), except that an r is added after the first `. Inline code syntax is used for short, simple pieces of code that display their results mid-sentence.

Multiple lines of code can be added by putting the lines of code between ```{r} and ```, each on their own line. This is referred to as a chunk. Unlike the inline method, chunks of code defined this way have many options controlling how they are processed when a document is rendered; we will cover some of these options soon.

Let's work through a set of demonstrations to learn the basic syntax of RMarkdown:

Step 1) Create a new markdown document and delete all the default template content. Paste in the code below, save the file as “rmarkdown-tutorial.Rmd”, and press “Knit HTML”.

### **R**Markdown

Date: `r Sys.time()` 

```{r}
my_data <- c(1, 3, 2.4, 6, 7.2)
mean(my_data)
median(my_data)
```

RMarkdown

Date: 2016-11-02 10:37:49

my_data <- c(1, 3, 2.4, 6, 7.2)
mean(my_data)

## [1] 3.92

median(my_data)

## [1] 3
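
Inline code is ordinary R, so a value can be rounded or formatted before it is shown mid-sentence. The sketch below is only an illustration: the variable names `rounded_mean` and `today` are made up for this example, and the date format string is one of many possible choices.

```r
my_data <- c(1, 3, 2.4, 6, 7.2)

# Round a result so it reads well in the middle of a sentence
rounded_mean <- round(mean(my_data), 1)

# Format today's date without the time of day
today <- format(Sys.Date(), "%Y-%m-%d")

print(rounded_mean)
print(today)
```

In the RMarkdown file itself, the same expressions can be written inline, for example: The mean is `r round(mean(my_data), 1)`.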

Code evaluation

In the above example, the code in the chunk was both displayed and executed. However, it is possible to show code without executing it. eval is one of many chunk options that modify how the code is handled; these options are defined in the curly braces after the r. Setting the chunk option eval to FALSE causes the code not to be executed.

Step 2) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

### Code evaluation

#### Normal evaluation:

```{r}
var( c(1, 3, 2.4, 6, 7.2) ) # how to calculate variance in R
```


#### Unevaluated code: 

```{r, eval=FALSE}
var( c(1, 3, 2.4, 6, 7.2) )
```

Code evaluation

Normal evaluation:

var( c(1, 3, 2.4, 6, 7.2) ) # how to calculate variance in R

## [1] 6.692

Unevaluated code:

var( c(1, 3, 2.4, 6, 7.2) )

Text results

Chunk options also control which results of the code are included in the rendered document. There are at least 9 options that influence how code affects the output document, but we will only cover the most commonly used here.

Displaying code

The echo option controls whether the code is displayed or not. Setting echo = FALSE will cause the code to not be shown in the output, but will not affect its execution.

Step 3) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

### Text Results

#### Invisible code:

```{r, echo=FALSE}
var( c(1, 3, 2.4, 6, 7.2) )
```

Text Results

Invisible code:

## [1] 6.692

Displaying results

You can hide the typical text output of chunks by setting results = 'hide'. This only affects the printing of variables, not things like warnings, errors, and graphs.

Step 4) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

#### Invisible results:

```{r, results = 'hide'}
x <- 'This code was executed'
print(x)
```

```{r}
print(x)
```

Invisible results:

x <- 'This code was executed'
print(x)

print(x)

## [1] "This code was executed"

Hiding warnings and messages

As R is an interactive language, some functions will occasionally display text that conveys a message to you, but is not necessarily part of your data. These are known as messages, warnings, and errors. By default, the options are set to show you all warnings and messages and to stop the execution of your document if an error comes up. You can control these by using the corresponding chunk options.
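
To see the three kinds of text without installing any extra package, the following minimal sketch produces each of them with base R. The function `safe_sqrt()` is hypothetical and exists only for this illustration.

```r
# A hypothetical helper that produces a message, a warning, or an error
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")                  # error: halts execution
  }
  message("Taking the square root of ", x)     # message: extra information
  if (x < 0) {
    warning("x is negative; using its absolute value")  # warning: suspicious input
    x <- abs(x)
  }
  sqrt(x)
}

safe_sqrt(9)        # prints a message and returns the result
safe_sqrt(-4)       # prints a message and a warning, then returns the result
try(safe_sqrt("a")) # prints the error; try() lets the script continue
```

In a chunk, the message, warning, and error chunk options decide which of these three kinds of text reach the rendered document.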

Step 5) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

#### Messages, warnings, and errors:

The `poppr.amova()` function from *poppr* will normally display
progress messages of how it treats the data before the analysis. The following 
code will only display warnings.

```{r, results = 'hide', message = FALSE}
library('poppr') # This normally prints a message
data(monpop)
splitStrata(monpop) <- ~Tree/Year/Symptom
poppr.amova(monpop, hier = ~Tree) # Here, we get warnings of Zero distance(s)
```

This will display the output and messages, but no warnings.

```{r, warning = FALSE}
poppr.amova(monpop, hier = ~Tree) # No warnings, but we get messages
```

The code below will not display any warnings or messages. This is the cleanest way
to display things for publication.

```{r, warning = FALSE, message = FALSE}
poppr.amova(monpop, hier = ~Tree) # No warnings or messages :)
```

Messages, warnings, and errors:

The poppr.amova() function from poppr will normally display progress messages of how it treats the data before the analysis. The following code will only display warnings.

library('poppr') # This normally prints a message
data(monpop)
splitStrata(monpop) <- ~Tree/Year/Symptom
poppr.amova(monpop, hier = ~Tree) # Here, we get warnings of Zero distance(s)

## Warning in is.euclid(xdist): Zero distance(s)

## Warning in is.euclid(distmat): Zero distance(s)

This will display the output and messages, but no warnings.

poppr.amova(monpop, hier = ~Tree) # No warnings, but we get messages

## 
##  No loci with missing values above 5% found.

## Distance matrix is non-euclidean.

## Utilizing quasieuclid correction method. See ?quasieuclid for details.

## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
## 
## $results
##                  Df    Sum Sq   Mean Sq
## Between samples   3  142.9005 47.633508
## Within samples  690 2501.2569  3.625010
## Total           693 2644.1574  3.815523
## 
## $componentsofcovariance
##                                 Sigma          %
## Variations  Between samples 0.3099169   7.876052
## Variations  Within samples  3.6250100  92.123948
## Total variations            3.9349269 100.000000
## 
## $statphi
##                          Phi
## Phi-samples-total 0.07876052

The code below will not display any warnings or messages. This is the cleanest way to display things for publication.

poppr.amova(monpop, hier = ~Tree) # No warnings or messages :)

## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
## 
## $results
##                  Df    Sum Sq   Mean Sq
## Between samples   3  142.9005 47.633508
## Within samples  690 2501.2569  3.625010
## Total           693 2644.1574  3.815523
## 
## $componentsofcovariance
##                                 Sigma          %
## Variations  Between samples 0.3099169   7.876052
## Variations  Within samples  3.6250100  92.123948
## Total variations            3.9349269 100.000000
## 
## $statphi
##                          Phi
## Phi-samples-total 0.07876052

Unlike message and warning, error is FALSE by default and any errors that occur stop the rendering of the document. If error is set to TRUE, errors will be displayed and not stop the rendering of the document.

Step 6) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

```{r, error = TRUE}
poppr.amova(monpop, hier = Tree) # Forgot the tilde (~)
```

poppr.amova(monpop, hier = Tree) # Forgot the tilde (~)

## Error in poppr.amova(monpop, hier = Tree): object 'Tree' not found
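
Repeating the same options in every chunk quickly gets tedious. knitr also allows defaults to be set once for all later chunks with `opts_chunk$set()`. The sketch below assumes you want warnings and messages hidden throughout the document; individual chunks can still override these defaults in their own curly braces.

```r
library(knitr)

# Set defaults for every chunk that follows in the document
opts_chunk$set(warning = FALSE, message = FALSE)

# Check what the current default is
opts_chunk$get("warning")
```

This code is usually placed in a chunk near the top of the document so that it applies to everything after it.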

Tables

Like much in R, there are multiple ways to create tables. Tables can be created “by hand” using an RMarkdown syntax, or made using a few R functions. Data can be entered manually in a grid of - and |, with : being used to indicate the alignment of cell contents. The same format is made automatically with the kable function from the knitr package using data stored in R variables.

Step 7) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

### Tables

#### Manual tables

|  x| squared| cubed|
|--:|-------:|-----:|
|  1|       1|     1|
|  2|       4|     8|
|  3|       9|    27|

#### Kable tables

```{r}
library(knitr)
data = data.frame(x = 1:3)
data$squared = data$x ^ 2
data$cubed = data$x ^ 3
kable(data)
```

Tables

Manual tables

x	squared	cubed
1	1	1
2	4	8
3	9	27

Kable tables

library(knitr)
data = data.frame(x = 1:3)
data$squared = data$x ^ 2
data$cubed = data$x ^ 3
kable(data)

x	squared	cubed
1	1	1
2	4	8
3	9	27

A simple HTML table can also be made using a package called xtable. However, this package is focused mostly on making tables for PDF output and we will not cover it here.
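
The kable function also accepts arguments that adjust how the table looks. The following sketch uses the `digits`, `align`, `col.names`, and `caption` arguments; the data frame `squares` and the caption text are invented for this illustration.

```r
library(knitr)

# Example data: x, its square, and its square root
squares <- data.frame(x = 1:3)
squares$squared <- squares$x ^ 2
squares$root <- sqrt(squares$x)

# Round to 2 digits, center every column, rename the columns, add a caption
tbl <- kable(squares,
             digits = 2,
             align = "ccc",
             col.names = c("x", "x squared", "square root of x"),
             caption = "Powers and roots of x")
print(tbl)
```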

Figures

One of the major advantages of mixing R with markdown is access to R’s well-known graphical prowess. Graphs can be inserted into RMarkdown documents the same way text results are. There are two commonly used ways to make graphs in R: the R base graphics and the newer ggplot2 package. Both are very flexible and each has its adherents. The R base graphics are known for their flexibility whereas ggplot2 is known for consistent syntax and aesthetics. Both can be used to create complex and attractive graphics. We will demonstrate both graphing systems, but will not go into details here.

Typical usage

By default, any plots made are displayed after the line of code that created them. The example below uses the R base graphics to plot the distribution of 10 numbers drawn from a normal distribution.

Step 8) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

### Adding figures

#### Typical usage

```{r}
x = rnorm(10)
print(x)
hist(x)
```

Adding figures

Typical usage

x = rnorm(10)
print(x)

##  [1]  0.01489948 -0.07494234  0.93927687 -2.44985924 -0.69817407
##  [6]  0.52458460  0.09936905 -0.34945109 -0.92766992 -2.45018020

hist(x)
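
Because rnorm() draws new random numbers each time the document is knit, the values and the histogram above change with every render. If the same figure is needed each time, one option is to fix the random seed first. This is a minimal sketch; the seed value 42 is arbitrary.

```r
set.seed(42)   # any fixed integer works; 42 is an arbitrary choice
x <- rnorm(10)
hist(x)

# Resetting the same seed reproduces exactly the same numbers
set.seed(42)
y <- rnorm(10)
identical(x, y)
```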

Hiding figures

Figures can be excluded from the output document by setting the chunk option fig.show to 'hide'.

Step 9) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

#### Hiding figures

```{r, fig.show='hide'}
x = rnorm(10)
print(x)
hist(x)
```

Hiding figures

x = rnorm(10)
print(x)

##  [1] -0.34878866 -2.09188500 -1.24264475  1.55024179 -0.76951356
##  [6] -1.42929362  0.35388162  0.18875087  0.08739854  0.64678090

hist(x)

Figure sizes

The size of the figures made in a given chunk can be changed using the options fig.width and fig.height, specified in inches. The example below uses ggplot2 to make a graph similar to the previous ones.

Step 10) Copy the text from the box below and paste it on to the end of your Markdown document and press “Knit HTML”.

#### Figure sizes

```{r, fig.width=2, fig.height=4}
library(ggplot2)
x = data.frame(var = rnorm(100))
ggplot(x) +
  geom_histogram(aes(x = var), binwidth = 0.2)
```

```{r, fig.width=4, fig.height=2}
ggplot(x) +
  geom_histogram(aes(x = var), binwidth = 0.2)
```

Figure sizes

library(ggplot2)
x = data.frame(var = rnorm(100))
ggplot(x) +
  geom_histogram(aes(x = var), binwidth = 0.2)

ggplot(x) +
  geom_histogram(aes(x = var), binwidth = 0.2)

YAML headers

As you might have noticed, the top of the template document RStudio provides when creating a new RMarkdown document contains some settings in a distinct format. This header of an RMarkdown file is written in YAML, a language used to store information. In this case, the YAML is used to change settings of the functions that render the Markdown to various formats when you press “Knit”. The YAML header can be used to do things like change the output format to PDF, add a table of contents, and specify themes.
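
One way to see how the settings in a YAML header are nested is to read the header back into R. The sketch below writes a small example file to a temporary location (the file contents are made up for this illustration) and reads its header with `yaml_front_matter()` from the rmarkdown package, which returns the settings as a nested list.

```r
library(rmarkdown)

# Write a tiny example document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Example Rmarkdown document"',
  "output:",
  "  html_document:",
  "    toc: true",
  "    number_sections: true",
  "---",
  "",
  "Some text."
), rmd_file)

# Read the YAML header as a nested R list
header <- yaml_front_matter(rmd_file)
str(header)

# Indentation in the YAML becomes nesting in the list
header$output$html_document$toc
```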

For example, the YAML header below adds section numbers and a table of contents derived from headers (one or more # starting a line).

Step 11) Copy the text from the box below and paste it at the start of your Markdown document and press “Knit HTML”.

---
title: "Example Rmarkdown document"
date: "2016-07-30"
output:
  html_document:
    toc: true
    number_sections: true
---

One of the best uses of the YAML header is to specify the format you want the output document to be when it is rendered. There are multiple output types, but the most used besides HTML is PDF. The code below shows how to set the output type to PDF.

Step 12) Replace the YAML header at the start of your Markdown document with the one below and press “Knit HTML”.

---
title: "Example Rmarkdown document"
date: "2016-07-30"
output:
  pdf_document:
    toc: true
    number_sections: true
---

Adding a bibliography

There are a few ways of adding citations/bibliographies to your RMarkdown document automatically. We will be using an external file containing a list of references in a standard format called BibTeX. Like RMarkdown, this is simply a plain text format viewable in any text editor. Google Scholar provides convenient links for references in BibTeX format that can easily be copied and pasted into a file containing the references to be cited. Text files of any format can be made in RStudio by clicking on the “New file” drop-down menu and choosing “Text file”.
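
R itself can also produce BibTeX entries. As a small sketch, the code below asks R for its own citation, converts it to BibTeX with `toBibtex()`, and writes it to a file. The file name `r_citation.bibtex` and the temporary directory are choices made for this illustration only. The generated entry may not include an ID, in which case one has to be added by hand before the entry can be cited.

```r
# Get the citation for R itself and convert it to BibTeX
bib_entry <- toBibtex(citation())
print(bib_entry)

# Save the entry to a plain text file (a temporary location is used here)
bib_file <- file.path(tempdir(), "r_citation.bibtex")
writeLines(bib_entry, bib_file)
```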

Step 13) Click on the “New file” dropdown menu and choose “Text file”. Paste the text below into the file and save it as “example_bibliography.bibtex”.

@article{baumer2014r,
  title={R Markdown: Integrating a reproducible analysis tool into introductory statistics},
  author={Baumer, Ben and Cetinkaya-Rundel, Mine and Bray, Andrew and Loi, Linda and Horton, Nicholas J},
  journal={arXiv preprint arXiv:1402.1894},
  year={2014}
}
@article{racine2012rstudio,
  title={RStudio: A Platform-Independent IDE for R and Sweave},
  author={Racine, Jeffrey S},
  journal={Journal of Applied Econometrics},
  volume={27},
  number={1},
  pages={167--172},
  year={2012},
  publisher={Wiley Online Library}
}

Now the information for a few papers is stored in a file format both computers and humans can read. Next, we will associate the bibliography file with the RMarkdown file and add a few citations. This is done using a setting in the YAML header.

There is an ID for each article stored in the BibTeX file located in the lines that start with @article{. Inline citations can be inserted by adding @ followed by the ID of the paper. Surrounding the @ID with [ and ] changes the format of the citation slightly. Any papers cited will appear in a bibliography added to the bottom of the rendered document.

Step 14) Create a new markdown document and delete all the default template content. Paste in the code below, save the file as “rmarkdown-bibliography.Rmd”, and press “Knit HTML”.

---
output:
  html_document:
    toc: true
    number_sections: true
bibliography: example_bibliography.bibtex
---


### Adding a bibliography

RMarkdown [@baumer2014r] is referenced in the @racine2012rstudio paper.  

### References

Adding a bibliography

RMarkdown (Baumer et al. 2014) is referenced in the Racine (2012) paper.

References

Baumer, Ben, Mine Cetinkaya-Rundel, Andrew Bray, Linda Loi, and Nicholas J Horton. 2014. “R Markdown: Integrating a Reproducible Analysis Tool into Introductory Statistics.” ArXiv Preprint ArXiv:1402.1894.

Racine, Jeffrey S. 2012. “RStudio: A Platform-Independent IDE for R and Sweave.” Journal of Applied Econometrics 27 (1). Wiley Online Library: 167–72.

Additional resources

RMarkdown is much more complex than pure Markdown due to all the subtle ways code can be handled and the results displayed. Most people will need to constantly reference documentation and “cheat sheets” to remind themselves of ways to do occasionally necessary tasks. It is best to commit to memory only the most commonly used tools and know where to look up the rest.

Note that the rendering of RMarkdown is implemented using an R package called knitr so you might see references to things like “knitr options”. This is also the reason the button you press to render RMarkdown into an output document is labeled “Knit HTML”.
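
Rendering does not have to go through the button. A document can also be rendered from the R console with `render()` from the rmarkdown package. The sketch below writes a one-line example document to a temporary file (its contents are made up for this illustration) and renders it to HTML only if pandoc, the converter rmarkdown relies on, is available.

```r
library(rmarkdown)

# Write a very small RMarkdown document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "### Rendered from the console",
  "",
  "The mean is `r mean(c(1, 3, 2.4, 6, 7.2))`."
), rmd_file)

# Render to HTML if pandoc can be found; render() returns the output path
if (pandoc_available()) {
  out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(out_file)
}
```

Passing a different `output_format`, such as "pdf_document", requests another output type without editing the YAML header.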

Albert Einstein is often credited with saying “Never memorize something that you can look up.” In that spirit, here are some great places to look things up:

The RMarkdown website: This is the official website for RMarkdown. It contains examples of most of the features of RMarkdown.
The Knitr website: Information on the primary tool used by RMarkdown to render documents. This is the official source of information on chunk options.
An RMarkdown cheatsheet: This summarizes the main tools of RMarkdown in just two pages.

Try it!

Step 15) Replicate the rest of the AUDPC analysis.

In a new RMarkdown document called audpc_report.Rmd, try to recreate audpc_report_to_replicate.html using the information in original_audpc_report.docx and original_audpc_code.R. These files are in the repository you downloaded earlier (https://github.com/grunwaldlab/audpc_example). You can start by reusing the markdown from the README.md file you created in the exercise for the previous section (“Markdown”).

Step 16) Replicate the genetic analysis.

Much like the AUDPC example, we have placed an example of a reproducible analysis of the population genetic structure of Phytophthora infestans at https://github.com/grunwaldlab/pinfestans_example and wrote it up in a docx document with the R script used to produce the figures and tables. Download this repository and attempt to recreate the Word document using RMarkdown.

Glossary

BibTeX: A plain text format used to store citation/bibliography information

chunk: One or more consecutive lines of code in an RMarkdown file

errors: text displayed to the user to let them know what went wrong in their code.

ggplot2: An R package used for graphing as an alternative to the base graphics

knitr: An R package used to render RMarkdown, as well as other mixes of programming and markup languages

messages: text displayed to convey extra information to the user such as a function’s progress.

R base graphics: original graphing capabilities that are part of the core R language

RMarkdown: An extension of the Markdown syntax that allows R code to be displayed and run when the document is rendered

warnings: text displayed to convey non-normal behavior that the user might be concerned about.

xtable: An R package to print PDF/HTML tables from data encoded in R variables

YAML: A plain text computer language used to store hierarchical information