Find and fix inconsistent repeat lengths

Attempts to fix inconsistent repeat lengths found by test_replen

Usage

fix_replen(gid, replen, e = 1e-05, fix_some = TRUE)

Arguments

gid: a genind or genclone object
replen: a numeric vector of repeat motif lengths.
e: a number to be subtracted or added to inconsistent repeat lengths to allow for proper rounding.
fix_some: if TRUE (default), when there are inconsistent repeat lengths that cannot be fixed by subtracting or adding e, those that can be fixed will be. If FALSE, the original repeat lengths will not be fixed.

Value

a numeric vector of corrected repeat motif lengths.

Details

This function is modified from the version used in doi:10.5281/zenodo.13007.
Before being fed into the algorithm to calculate Bruvo's distance, the amplicon length is divided by the repeat unit length. Because the amplified primer sequence is attached to the sequence repeat, this division does not always result in an integer, so the resulting numbers are rounded. The rounding also protects against slight mis-calls of alleles. Because we know that $$\frac{(A - e) - (B - e)}{r}$$ is equivalent to $$\frac{A - B}{r}$$, where e in this identity is the constant length contributed by the primer sequence (not the argument e), we know that the primer sequence will not alter the relationships between the alleles.
Unfortunately, this rounding can fail for repeat lengths that are powers of 2, because dividing by them can produce a quotient ending in exactly .5. Rounding in R follows the IEC 60559 standard (see round), which means that any number ending in .5 is rounded to the nearest even integer rather than always up. Two alleles that are one repeat apart can therefore round to the same value.
This function will attempt to alleviate this problem by adding a very small amount to the repeat length so that division will not result in a 0.5. If this fails, the same amount will be subtracted. If neither of these works, a warning will be issued and it is up to the user to determine if the fault is in the allele calls or the repeat lengths.
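As a worked instance of the identity above, take the first two amplicon lengths from the Examples, A = 50 and B = 46, with r = 4, and read e as the 42 bases of flanking primer (21 on each side). Treating e as the primer length is an assumption about the notation. The primer contribution cancels, and the two alleles remain exactly one repeat apart:
$$\frac{(50 - 42) - (46 - 42)}{4} = \frac{8 - 4}{4} = 1 = \frac{50 - 46}{4}$$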

References

Zhian N. Kamvar, Meg M. Larsen, Alan M. Kanaskie, Everett M. Hansen, & Niklaus J. Grünwald. Sudden_Oak_Death_in_Oregon_Forests: Spatial and temporal population dynamics of the sudden oak death epidemic in Oregon Forests. ZENODO, doi:10.5281/zenodo.13007, 2014.

Kamvar, Z. N., Larsen, M. M., Kanaskie, A. M., Hansen, E. M., & Grünwald, N. J. (2015). Spatial and temporal analysis of populations of the sudden oak death pathogen in Oregon forests. Phytopathology 105:982-989. doi:10.1094/PHYTO-12-14-0350-FI

Ruzica Bruvo, Nicolaas K. Michiels, Thomas G. D'Souza, and Hinrich Schulenburg. A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7):2101-2106, 2004.

Author

Zhian N. Kamvar

Examples


data(Pram)
(Pram_replen <- setNames(c(3, 2, 4, 4, 4), locNames(Pram)))
#>  PrMS6A1  Pr9C3A1 PrMS39A1 PrMS45A1 PrMS43A1 
#>        3        2        4        4        4 
fix_replen(Pram, Pram_replen)
#>  PrMS6A1  Pr9C3A1 PrMS39A1 PrMS45A1 PrMS43A1 
#>  3.00000  2.00000  3.99999  4.00000  4.00000 
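# A hedged aside: to see which loci are flagged before any correction, the
# same inputs can be passed to test_replen(), the check named in the
# description of this page. This assumes the poppr package is attached, as
# the rest of these examples do. The call should give one logical value per
# locus; `flagged` is a name chosen here for illustration, and no output is
# shown.
flagged <- test_replen(Pram, Pram_replen)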
# Let's start with an example of a tetranucleotide repeat motif and imagine
# that there are twenty alleles all 1 step apart:
(x <- 1:20L * 4L)
#>  [1]  4  8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
# These are the true lengths of the different alleles. Now, let's add the
# primer sequence to them. 
(PxP <- x + 21 + 21)
#>  [1]  46  50  54  58  62  66  70  74  78  82  86  90  94  98 102 106 110 114 118
#> [20] 122
# Now we make sure that x / 4 is equal to 1:20, whose consecutive values
# each differ by 1.
x/4
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# Now, we divide the sequence with the primers by 4 and see what happens.
(PxPc <- PxP/4)
#>  [1] 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 19.5 20.5 21.5 22.5 23.5 24.5 25.5
#> [16] 26.5 27.5 28.5 29.5 30.5
(PxPcr <- round(PxPc))
#>  [1] 12 12 14 14 16 16 18 18 20 20 22 22 24 24 26 26 28 28 30 30
diff(PxPcr) # we expect all 1s
#>  [1] 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0
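# The alternating 0s and 2s come from R rounding exact halves to the nearest
# even integer. A quick base-R check of that rule, independent of the allele
# data (the vector `halves` is made up for illustration):
halves <- c(0.5, 1.5, 2.5, 3.5)
round(halves)
#> [1] 0 2 2 4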

# Let's try that again by subtracting a tiny amount from 4
(PxPc <- PxP/(4 - 1e-5))
#>  [1] 11.50003 12.50003 13.50003 14.50004 15.50004 16.50004 17.50004 18.50005
#>  [9] 19.50005 20.50005 21.50005 22.50006 23.50006 24.50006 25.50006 26.50007
#> [17] 27.50007 28.50007 29.50007 30.50008
(PxPcr <- round(PxPc))
#>  [1] 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
diff(PxPcr)
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
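
# The search described in Details can be sketched in base R. This is a
# minimal illustration, not poppr's implementation: `sketch_fix` and `sizes`
# are hypothetical names, and the consistency check used here (distinct
# allele sizes must stay distinct after rounding) is an assumption.
sizes <- seq(46, 122, by = 4)  # same values as PxP above
sketch_fix <- function(alleles, replen, e = 1e-5) {
  # TRUE when no two alleles collapse onto the same rounded repeat count
  consistent <- function(r) anyDuplicated(round(alleles / r)) == 0
  # Try the original length first, then add e, then subtract e
  for (candidate in c(replen, replen + e, replen - e)) {
    if (consistent(candidate)) return(candidate)
  }
  warning("repeat length could not be made consistent")
  replen
}
fixed <- sketch_fix(sizes, 4)
diff(round(sizes / fixed))
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# In this sketch adding e already succeeds, so the subtraction step is never
# reached; the example above shows that subtracting e works for these data too.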