Produce a genotype accumulation curve — genotype

Genotype accumulation curves are useful for determining the minimum number of loci necessary to discriminate between individuals in a population. This function will randomly sample loci without replacement and count the number of multilocus genotypes observed.

Usage

genotype_curve(
  gen,
  sample = 100,
  maxloci = 0L,
  quiet = FALSE,
  thresh = 1,
  plot = TRUE,
  drop = TRUE,
  dropna = TRUE
)

Arguments

gen: a genclone, genind, or loci object.
sample: an integer defining the number of times loci will be resampled without replacement.
maxloci: the maximum number of loci to sample. By default, maxloci = 0, which indicates that n - 1 loci are to be used. Note that this will always take min(n - 1, maxloci)
quiet: if FALSE (default), Progress of the iterations will be displayed. If TRUE, nothing is printed to screen as the function runs.
thresh: a number from 0 to 1. This will draw a line at that fraction of multilocus genotypes, rounded. Defaults to 1, which will draw a line at the maximum number of observable genotypes.
plot: if TRUE (default), the genotype curve will be plotted via ggplot2. If FALSE, the resulting matrix will be visibly returned.
drop: if TRUE (default), monomorphic loci will be removed before analysis as these loci affect the shape of the curve.
dropna: if TRUE (default) and drop = TRUE, NAs will be ignored when determining if a locus is monomorphic. When FALSE, presence of NAs will result in the locus being retained. This argument has no effect when drop = FALSE

Value

(invisibly by deafuls) a matrix of integers showing the results of each randomization. Columns represent the number of loci sampled and rows represent an independent sample.

Details

Internally, this function works by converting the data into a loci object, which represents genotypes as a data frame of factors. Random samples are taken of 1 to n-1 columns of the matrix and the number of unique rows are counted to determine the number of multilocus genotypes in that random sample. This function does not take into account any definitions of MLGs via mlg.filter or mll.custom.

Author

Zhian N. Kamvar

Examples

data(nancycats)
nan_geno <- genotype_curve(nancycats)

if (FALSE) { # \dontrun{

# Marker Type Comparison --------------------------------------------------
# With AFLP data, it is often necessary to include more markers for resolution
data(Aeut)
Ageno <- genotype_curve(Aeut)

# Many microsatellite data sets have hypervariable markers
data(microbov)
mgeno <- geotype_curve(microbov)

# Adding a trendline ------------------------------------------------------

# Trendlines: you can add a smoothed trendline with geom_smooth()
library("ggplot2")
p <- last_plot()
p + geom_smooth()

# Producing Figures for Publication ---------------------------------------

# This data set has been pre filtered
data(monpop)
mongeno <- genotype_curve(monpop)

# Here, we add a curve and a title for publication
p <- last_plot()
mytitle <- expression(paste("Genotype Accumulation Curve for ",
                            italic("M. fructicola")))
p + geom_smooth() +
  theme_bw() +
  theme(text = element_text(size = 12, family = "serif")) +
  theme(title = element_text(size = 14)) +
  ggtitle(mytitle)
} # }