A very short introduction to R

R is a large and complex topic. Even those who use it every day only know a small subset of what there is to know. However, you don’t need to know very much to do some very impressive things. This section is a very short introduction to the most basic aspects of R to get you started. Look at the list of resources and the end of this section for more complete tutorials.

What is R?

Unlike spreadsheet programs, like excel, R is a text-based “interactive” program, meaning that you use it by typing commands. If you type a valid command, R will do something, like printing something to the screen or modifying a piece of data. If you type an invalid command, R will print an error message and, ideally, nothing will happen. Where you type commands is referred to as the R console. If you are using RStudio, this is the lower left window. Here is what an R console looks like when R is first started:

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> |

For example, if I type 1 + 1 into the console, R will do the math and “return” the result.

> 1+1
[1] 2

The thing after the > is what I typed and the line below is what was returned.

Since I did not save the result, it was printed instead. The [1] at the start keeps track of how many values have been printed. Its not too useful when there is only a single value, as in this case, but gets more useful when many values are printed. For example, I could print values from 1 to 100 by typing:

> 1:100
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
 [24]  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46
 [47]  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
 [70]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
 [93]  93  94  95  96  97  98  99 100

Now the [24] on the second line means that 24 is the 24th element.

Variables

In programmer-speak, any piece of data is called a variable. There are different types, or classes, of variables such as character (text), numeric (numbers), and data.frames (tables). You can “save” the result of commands in R using <- like so:

> a_sequence <- 1:100

Note how nothing is printed since the result of 1:100 was saved in the a_sequence variable, but it can still be printed by entering the name of the variable:

> a_sequence
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
 [24]  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46
 [47]  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
 [70]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
 [93]  93  94  95  96  97  98  99 100

Now, anytime a_sequence is used it would be as if 1:100 was typed instead (although the calculation would not be done again).

Functions

Any command/tool/action in R is a function Most functions are used by putting the input to the function (if any) in parentheses following the function name. For example, the length function returns how many things are in a variable:

> length(a_sequence)
[1] 100

Even things like + and : are functions, although these are used in a special way. For example, they could also be used this way, just to demonstrate the idea:

> `:`(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10
> `+`(1, 1)
[1] 2

The vast majority of functions are used like length. Perhaps confusingly, functions are also variables and can be created and “saved”. For example, the following code creates a function to add two numbers together and saves in the variable add:

> add <- function(x, y) { x + y }

We can now use this like any other function:

> add(1, 1)
[1] 2

And it can be printed like any other variable:

> add
function(x, y) { x + y }

Functions can range in complexity from simple ones like add to very long and complex functions that call other custom functions.

Comments

Reading code is difficult even if you wrote it. Although it might seem that the purpose of the code you just wrote is obvious, it will probably be much less clear in a month or two. Be kind to your future self or any other unfortunates that must interpret this code and use comments. Comments are parts of text that R ignores and programmers use to leave notes for themselves and other programmers. Anything that appears after a # is a comment and will be ignored by R.

# Add 1 + 1 and writes the result to console
1 + 1 # This is addition
[1] 2

Comments are only needed when you save R commands in a text file, rather than experimenting on the R console interactively.

Typical workflow

In the examples above we have been assuming you are using R “interactively”, which means typing things directly into the console. This is fine for experimenting, but is really not the best way to use R. You should write your commands in a plain text file with a “.R” file extension and copy and paste them into the R console. That way you have a record of what you have done and can rerun your entire analysis easily. If you are using RStudio, you can move the cursor to the line you want to run in the text editor and Ctrl + Enter to copy, paste, and execute the line in the console all at once.

R packages

An R package is a set of user-defined functions organized so that people can easily share and use them. Most of the functions used by most R users are from R packages rather than those supplied by base R. R packages can be installed in a few ways, but the most common is to download them from The Comprehensive R Archive Network (CRAN) using the install.packages function. For example stringr is an R package that supplies functions to work with text.

> install.packages("stringr")

Once installed, a package must be “loaded” using the library function before any functions it supplies can be used:

> library("stringr")

Now we can use functions from the stringr package. For example, the str_count function counts the number of times a piece of text occurs in a larger piece of text:

> str_count(string = "R is awesome!!!", pattern = "!")
[1] 3

Getting help/documentation

One of the most important skills in R is looking up (and interpreting) the built-in documentation. You can get the help for any R function by prefixing it with a question mark like so:

> ?getwd

Help files have an overview of:

  • purpose of a function
  • options it takes
  • output it yields
  • examples demonstrating its usage

This built-in documentation for each function can be a bit terse and hard to interpret and its quality varies greatly between packages. R packages will also usually have vignettes, which combine examples and explanations for how functions in a package are generally used. These can be found online or accessed in the installed package using the browseVignettes function:

> browseVignettes(package = "metacoder")

Finally, some of the best documentation can be found online. For some of the more popular packages, like dplyr and ggplot2, the best documentation is from unofficial sources, like blogs, books, and question/answer sites like Stack Overflow.

What to do when you dont know what to do

When you run into problems or don’t know how to do something, it will save you a lot of time if you do an internet search describing the problem you have before trying to fix it by guessing, reading through documentation, or making a custom solution. If the problem has an error message, then copy and paste the error into a search engine. At least 95% of problems are well documented and discussed. If you think “there must be a better way to do this”, there almost certainly is and some nice person on the internet spent hours writing a blog post about it or answering a question on Stack Overflow. The trick is knowing what to type since the problem/concept might take some jargon to describe. Try describing your problem a few ways before giving up. If you know someone who uses R more, try asking them what to search for rather than asking them to help you themselves (unless they are super nice and/or have nothing better to do). For example, “I want to combine two tables based on the content of a column they both share” is a very common need, but it might be hard to get Google to understand what you want. Someone who uses R a lot will tell you to search for “joining” or “joins”, which will lead you in the right direction, and will take only a few seconds of their time.

More resources

This was only the most basic concepts in R, but there are lots of free tutorials and info online to learn more. Searching “beginner R tutorial” on the internet will show hundreds of free resources. Here are a few resources for beginners to learn R:

Advanced

Books