Perform a bootstrap analysis on diversity statistics

This function performs bootstrap analyses of diversity statistics on a matrix of multilocus genotype (MLG) counts in different populations. Interpret its results carefully, because the default statistics are known to have a downward bias. See Details for more information.

Usage

diversity_boot(
  tab,
  n,
  n.boot = 1L,
  n.rare = NULL,
  H = TRUE,
  G = TRUE,
  lambda = TRUE,
  E5 = TRUE,
  ...
)

Arguments

tab: a table produced by the poppr function mlg.table(), with MLGs in columns and populations in rows.
n: an integer > 0 specifying the number of bootstrap replicates to perform (corresponds to R in the function boot::boot()).
n.boot: an integer specifying the number of samples to be drawn in each bootstrap replicate. If n.boot < 2 (the default), each bootstrap replicate draws as many samples as were observed in that population.
n.rare: a sample size at which all resamplings should be performed. This should be no larger than the smallest population sample size. Defaults to NULL, indicating that each population will be sampled at its own size.
H: logical; whether or not to calculate Shannon's index.
G: logical; whether or not to calculate Stoddart and Taylor's index (also known as the inverse Simpson's index).
lambda: logical; whether or not to calculate Simpson's index.
E5: logical; whether or not to calculate evenness (E5).
...: other parameters passed on to boot::boot() and diversity_stats().
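The base R sketch below computes the four statistics for a single population, applying the standard plug-in formulas to the MLG proportions p. The helper name diversity_sketch() is hypothetical, and the code is not poppr's internal implementation. It only assumes these textbook definitions. They do agree with the example output further down. For South America, 1 - 1/23.29 ≈ 0.957 (t3*) and (23.29 - 1)/(exp(3.268) - 1) ≈ 0.883 (t4*), which suggests that t1* to t4* are H, G, lambda and E5, in that order.

```r
# Hypothetical helper (not part of poppr): plug-in diversity statistics
# for one population, given a vector of MLG counts (one row of tab).
diversity_sketch <- function(counts) {
  counts <- counts[counts > 0]      # drop MLGs absent from this population
  p <- counts / sum(counts)         # MLG proportions
  H <- -sum(p * log(p))             # Shannon's index (natural log)
  G <- 1 / sum(p^2)                 # Stoddart and Taylor's index (inverse Simpson)
  lambda <- 1 - sum(p^2)            # Simpson's index
  E5 <- (G - 1) / (exp(H) - 1)      # evenness; NaN if only one MLG is present
  c(H = H, G = G, lambda = lambda, E5 = E5)
}

# Illustrative counts, not taken from the Pinf data
diversity_sketch(c(5, 3, 1, 1))
```

With these definitions, lambda is always 1 - 1/G. E5 equals 1 when all MLGs are equally frequent.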

Value

A named list with one object of class "boot" per population (row of tab).
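Each "boot" object has a t0 component, which holds the observed statistics, and a t component, which holds one row per replicate. When the object is printed, the bias column is the mean of the replicate statistics minus the observed value, and the std. error column is the standard deviation of the replicates. The hypothetical helper below reproduces those two columns in base R. The input values are illustrative only and are not output from diversity_boot().

```r
# Hypothetical helper mimicking the "original", "bias" and "std. error"
# columns printed for a "boot" object.
# t0: observed statistics; t: matrix of replicates (one row per replicate).
summarize_boot <- function(t0, t) {
  data.frame(
    original  = t0,
    bias      = colMeans(t) - t0,    # mean of replicates minus observed value
    std.error = apply(t, 2, sd),     # standard deviation of replicates
    row.names = paste0("t", seq_along(t0), "*")
  )
}

# Illustrative values only
t0 <- c(1.0, 2.0)
t <- cbind(c(0.8, 0.9, 1.0, 0.9), c(1.5, 1.7, 1.9, 1.7))
summarize_boot(t0, t)
```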

Details

Bootstrapping is performed in one of three ways:

If n.rare is a number greater than zero, bootstrapping is performed by randomly sampling n.rare samples without replacement from the data.

If n.boot is greater than 1, bootstrapping is performed by sampling n.boot samples from a multinomial distribution weighted by the proportion of each MLG in the data.

If n.boot is less than 2, bootstrapping is performed by sampling N samples from a multinomial distribution weighted by the proportion of each MLG in the data, where N is the observed number of samples in the population.
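The three schemes can be sketched for a single population as follows. This is an illustrative base R sketch that assumes a vector of MLG counts (one row of tab) as input. resample_sketch() is a hypothetical name, not a poppr function, and poppr's internal implementation may differ. Drawing from a multinomial distribution weighted by the observed proportions is equivalent to drawing with replacement from the observed individuals.

```r
# Hypothetical sketch (not poppr's code) of the three resampling schemes.
# counts: vector of MLG counts for one population.
resample_sketch <- function(counts, n.boot = 1L, n.rare = NULL) {
  if (!is.null(n.rare)) {
    stopifnot(n.rare <= sum(counts))  # cannot rarefy above the sample size
    # Expand counts into one MLG index per individual, then subsample
    # n.rare individuals without replacement
    individuals <- rep(seq_along(counts), times = counts)
    drawn <- individuals[sample.int(length(individuals), n.rare)]
    out <- tabulate(drawn, nbins = length(counts))
  } else {
    # Multinomial draws: n.boot samples if n.boot > 1, otherwise N samples
    size <- if (n.boot < 2) sum(counts) else n.boot
    out <- as.vector(rmultinom(1, size = size, prob = counts))  # prob is normalized internally
  }
  setNames(out, names(counts))
}

set.seed(42)
counts <- c(MLG.1 = 5, MLG.2 = 3, MLG.3 = 1, MLG.4 = 1)  # illustrative counts
resample_sketch(counts)               # N = 10 draws, with replacement
resample_sketch(counts, n.boot = 50)  # 50 draws, with replacement
resample_sketch(counts, n.rare = 4)   # 4 draws, without replacement
```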

Downward Bias

When sampling with replacement, the diversity statistics here present a downward bias, partially due to the small number of samples in the data. As a result, the mean of the bootstrapped samples will often be much lower than the observed value. You can rarefy all populations to a common sample size with n.rare, or you can increase the number of samples drawn in each bootstrap replicate by increasing n.boot. Interpret the results of either approach with caution. Several freely available R packages compute bootstrap estimates of Shannon and Simpson diversity (e.g. entropart, entropy, simboot, and EntropyEstimation). These packages also offer bias-corrected estimators of Shannon and Simpson diversity. Please take care when interpreting the results of this function.
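A small simulation shows the downward bias. The sketch below uses illustrative MLG counts that do not come from any data set. It computes the plug-in Shannon index and compares it with the mean over parametric (multinomial) bootstrap replicates. For contrast, it also applies the Miller-Madow correction, a simple, well-known adjustment that adds (S - 1)/(2N) for S observed MLGs and N samples. diversity_boot() does not apply this correction, and the packages listed above provide more thorough estimators. The helper names shannon() and shannon_mm() are hypothetical.

```r
# Plug-in Shannon index from a vector of counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Miller-Madow correction: plug-in estimate + (S_obs - 1) / (2N)
shannon_mm <- function(counts) {
  shannon(counts) + (sum(counts > 0) - 1) / (2 * sum(counts))
}

set.seed(2024)
counts <- c(4, 3, 2, 2, 1, 1, 1, 1)  # illustrative MLG counts, N = 15
H_obs <- shannon(counts)

# Parametric bootstrap: N multinomial draws weighted by observed proportions
boot_H <- replicate(1000, shannon(rmultinom(1, sum(counts), prob = counts)[, 1]))

H_obs                 # observed plug-in estimate
mean(boot_H) - H_obs  # bootstrap bias estimate, expected to be negative
shannon_mm(counts)    # Miller-Madow corrected estimate
```

Resampling can only lose rare MLGs, never gain new ones. For that reason the replicate mean tends to fall below the observed value, which is the pattern seen in the bias column of the example output.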

Author

Zhian N. Kamvar

Examples

library(poppr)
data(Pinf)
tab <- mlg.table(Pinf, plot = FALSE)
diversity_boot(tab, 10L)
#> $`South America`
#> 
#> PARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot::boot(data = xi, statistic = boot_stats, R = n, sim = "parametric", 
#>     ran.gen = rg, mle = mle, H = H, G = G, lambda = lambda, E5 = E5)
#> 
#> 
#> Bootstrap Statistics :
#>       original      bias    std. error
#> t1*  3.2679442 -0.43536934  0.11694575
#> t2* 23.2903226 -9.14189912  2.12542971
#> t3*  0.9570637 -0.02908587  0.01013596
#> t4*  0.8825297 -0.06792110  0.04362898
#> 
#> $`North America`
#> 
#> PARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot::boot(data = xi, statistic = boot_stats, R = n, sim = "parametric", 
#>     ran.gen = rg, mle = mle, H = H, G = G, lambda = lambda, E5 = E5)
#> 
#> 
#> Bootstrap Statistics :
#>       original       bias    std. error
#> t1*  3.6870132  -0.53187041 0.092362346
#> t2* 34.9090909 -14.63104682 2.798661395
#> t3*  0.9713542  -0.02170139 0.008426053
#> t4*  0.8711297  -0.01960619 0.062031970
#> 
if (FALSE) { # \dontrun{
# This can be run in parallel (use "multicore" on macOS or Linux, "snow" on Windows)
system.time(diversity_boot(tab, 10000L, parallel = "multicore", ncpus = 4L))
system.time(diversity_boot(tab, 10000L))
} # }