We now live in the fast-growing era of high-throughput sequencing (HTS), which is revolutionizing our ability to understand genetic variation (Luikart et al., 2003; Grünwald, McDonald & Milgroom, 2016). Two factors are contributing to a need for new methods of analyzing data: (1) the data are now often in a genome-wide context, where location within a genome is part of the analysis, and (2) the number of variants is large.
The R computing language has become a widely used tool for analyzing population genomic data. A recent special issue of Molecular Ecology Resources provides an overview of the tools available in R (Paradis et al., 2017). Tools now available in R for analyzing HTS data include adegenet (Jombart, 2008), ape (Paradis, Claude & Strimmer, 2004), vcfR (Knaus & Grünwald, 2017), and poppr (Kamvar, Tabima & Grünwald, 2014; Kamvar, Brooks & Grünwald, 2015), among others. Section III of this primer is geared towards analyzing whole-genome or reduced-representation genomic data for populations using the variant call format (VCF). The next three chapters introduce the VCF file format and cover reading SNP data from high-throughput sequencing projects into R, performing quality control, and conducting selected analyses of population genomic data.
Grünwald NJ., McDonald BA., Milgroom MG. 2016. Population genomics of fungal and oomycete pathogens. Annual Review of Phytopathology 54:323–346. Available at: http://arjournals.annualreviews.org/doi/full/10.1146/annurev-phyto-080614-115913
Jombart T. 2008. adegenet: A R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403–1405. Available at: https://doi.org/10.1093/bioinformatics/btn129
Kamvar ZN., Brooks JC., Grünwald NJ. 2015. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Frontiers in Genetics 6:208. Available at: http://dx.doi.org/10.3389/fgene.2015.00208
Kamvar ZN., Tabima JF., Grünwald NJ. 2014. Poppr: An R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281. Available at: http://dx.doi.org/10.7717/peerj.281
Knaus BJ., Grünwald NJ. 2017. vcfR: A package to manipulate and visualize variant call format data in R. Molecular Ecology Resources 17:44–53. Available at: http://dx.doi.org/10.1111/1755-0998.12549
Luikart G., England PR., Tallmon D., Jordan S., Taberlet P. 2003. The power and promise of population genomics: From genotyping to genome typing. Nature Reviews Genetics 4:981–994. Available at: http://www.nature.com/nrg/journal/v4/n12/full/nrg1226.html
Paradis E., Gosselin T., Grünwald NJ., Jombart T., Manel S., Lapp H. 2017. Towards an integrated ecosystem of R packages for the analysis of population genetic data. Molecular Ecology Resources 17:1–4. Available at: http://dx.doi.org/10.1111/1755-0998.12636
Paradis E., Claude J., Strimmer K. 2004. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290. Available at: https://doi.org/10.1093/bioinformatics/btg412