In this chapter, we will utilize AMOVA to analyze our populations. AMOVA stands for Analysis of MOlecular VAriance and is a method to detect population differentiation utilizing molecular markers (Excoffier, Smouse & Quattro, 1992). This procedure was initially implemented for DNA haplotypes, but applies to any marker system. The implementation of AMOVA in poppr requires two very basic components: (1) A distance matrix derived from the data and (2) a separate table used to partition the data into different stratifications.

The distance matrix can be calculated using any distance as long as it is euclidean. The distance that is used in the program Arlequin is the opposite of the Kronecker Delta function that counts the number of differences summed over $$L$$ loci:

$\delta_{l,m} = \begin{cases} 1 \text{ if } m = l,\\ 0 \text{ if } m \neq l \end{cases}$ $d_{i,j} = \sum_{L = 1}^L 1 - \delta_{i,j}$

## Data set

To calculate AMOVA in poppr, one simply needs to supply a data set with stratifications. We will use the Aphanomyces euteiches data set from (Grünwald & Hoheisel, 2006).

library("poppr")
data("Aeut")
strata(Aeut) <- data.frame(other(Aeut)$population_hierarchy) Aeut <- as.genclone(Aeut) Aeut ## ## This is a genclone object ## ------------------------- ## Genotype information: ## ## 119 original multilocus genotypes ## 187 diploid individuals ## 56 dominant loci ## ## Population information: ## ## 3 strata - Pop_Subpop, Pop, Subpop ## 2 populations defined - Athena, Mt. Vernon We can see that this data set contains clonal data and has three stratifications where the first is really a combination of the other levels. We can take a look at the different stratifications, populations or subpopulations: table(strata(Aeut, ~Pop)) # Populations ## ## Athena Mt. Vernon ## 97 90 table(strata(Aeut, ~Pop/Subpop, combine = FALSE)) # Subpopulations ## Subpop ## Pop 1 2 3 4 5 6 7 8 9 10 ## Athena 9 12 10 13 10 5 11 8 10 9 ## Mt. Vernon 10 6 8 12 17 12 12 13 0 0 In this example, we have a data set of 187 individuals sampled from two fields located in Athena or Mt. Vernon over 8 or 10 different soil samples within each field. We want to see if most of the variance is observed at the sample, field, or regional level. ## Analysis In panmictic populations, we would expect to see most of the variance arise from within samples. If we see that the most of the variance occurs among samples within populations or among populations, then there is evidence that we have some sort of population structure. In the case of clonal organisms, this would help support a hypothesis of clonal reproduction. Since Aphanomyces eutieches is known to be clonal, we would not expect most of the variance to come from within samples. Let’s invoke the AMOVA functions with and without clone correction: Aeutamova <- poppr.amova(Aeut, ~Pop/Subpop) Aeutamovacc <- poppr.amova(Aeut, ~Pop/Subpop, clonecorrect = TRUE) We’ll look at the AMOVA results for both analyses. Aeutamova ##$call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
##
## $results ## Df Sum Sq Mean Sq ## Between Pop 1 1051.2345 1051.234516 ## Between samples Within Pop 16 273.4575 17.091091 ## Within samples 169 576.5059 3.411277 ## Total 186 1901.1979 10.221494 ## ##$componentsofcovariance
##                                            Sigma          %
## Variations  Between Pop                11.063446  70.006786
## Variations  Between samples Within Pop  1.328667   8.407483
## Variations  Within samples              3.411277  21.585732
## Total variations                       15.803391 100.000000
##
## $statphi ## Phi ## Phi-samples-total 0.7841427 ## Phi-samples-Pop 0.2803128 ## Phi-Pop-total 0.7000679 Aeutamovacc ##$call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
##
## $results ## Df Sum Sq Mean Sq ## Between Pop 1 741.9872 741.987234 ## Between samples Within Pop 16 185.6877 11.605483 ## Within samples 123 520.1123 4.228555 ## Total 140 1447.7872 10.341337 ## ##$componentsofcovariance
##                                             Sigma          %
## Variations  Between Pop                10.4131525  66.777680
## Variations  Between samples Within Pop  0.9520545   6.105355
## Variations  Within samples              4.2285550  27.116965
## Total variations                       15.5937620 100.000000
##
## $statphi ## Phi ## Phi-samples-total 0.7288303 ## Phi-samples-Pop 0.1837727 ## Phi-Pop-total 0.6677768 The first thing to look at are the $results element. The degrees of freedom (the Df column) should match what we expect from our (not clone-corrected) data. The number in the Total row should equal 186 or $$N - 1$$, where values are calculated from pooled data. Note that here, “samples” actually refers to subpopulations since we cannot asses within-sample variance of dominant data.

The $componentsofcovariance table shows how much variance is detected at each stratification. We expect variations within samples to give the greatest amount of variation for populations that are not significantly differentiated. Sigma represents the variance, $$\sigma$$, for each hierarchical level and to the right is the percent of the total. Finally, $statphi provides the $$\phi$$ population differentiation statistics. These are used to test hypotheses about population differentiation. We would expect a higher $$\phi$$ statistic to represent a higher amount of differentiation.

Note, if you want to make a table of any of these components, you can isolate them by using the $ operator and then export it to a table with write.table. Here’s an example with the components of covariance: write.table(Aeutamova$componentsofcovariance, sep = ",", file = "~/Documents/AeutiechesAMOVA.csv")

## Significance testing

To test if populations are significantly different, we perform a randomization test using the function randtest() from the ade4 package. This will randomly permute the sample matrices as described in (Excoffier et al., 1992).

set.seed(1999)
Aeutsignif   <- randtest(Aeutamova, nrepet = 999)
Aeutccsignif <- randtest(Aeutamovacc, nrepet = 999)
plot(Aeutsignif)

Aeutsignif
## class: krandtest lightkrandtest
## Monte-Carlo tests
## Call: randtest.amova(xtest = Aeutamova, nrepet = 999)
##
## Number of tests:   3
##
## Adjustment method for multiple comparisons:   none
## Permutation number:   999
##                         Test       Obs    Std.Obs   Alter Pvalue
## 1  Variations within samples  3.411277 -31.902575    less  0.001
## 2 Variations between samples  1.328667  20.986193 greater  0.001
## 3     Variations between Pop 11.063446   9.120263 greater  0.001

From this output, you can see three histograms representing the distribution of the randomized strata. The black line represents the observed data. You can see a table of observed results in the output showing that there is significant population structure considering all levels of the population strata. Of course, this could be due to the presence of clones, so let’s visualize the results from the clone corrected data set below:

plot(Aeutccsignif)

Aeutccsignif
## class: krandtest lightkrandtest
## Monte-Carlo tests
## Call: randtest.amova(xtest = Aeutamovacc, nrepet = 999)
##
## Number of tests:   3
##
## Adjustment method for multiple comparisons:   none
## Permutation number:   999
##                         Test        Obs    Std.Obs   Alter Pvalue
## 1  Variations within samples  4.2285550 -22.250873    less  0.001
## 2 Variations between samples  0.9520545   9.821081 greater  0.001
## 3     Variations between Pop 10.4131525   9.983940 greater  0.001

The above graphs show significant population differentiation at all levels given that the observed $$\phi$$ does not fall within the distribution expected from the permutation. Compare the results of our AMOVA analysis to those published in (Grünwald & Hoheisel, 2006). They should be identical.

## Randomized population structure

Since AMOVA is used to detect whether or not there is significant population structure, we can see what happens when we randomly shuffle the population assignments in our data. Here we will show what the populations look like before and after shuffling:

Aeut.new <- Aeut
head(strata(Aeut)[, -1])
##        Pop Subpop
## 001 Athena      1
## 002 Athena      1
## 003 Athena      1
## 004 Athena      1
## 005 Athena      1
## 006 Athena      1
set.seed(9001)
head(strata(Aeut)[sample(nInd(Aeut)), -1])
##            Pop Subpop
## 044     Athena      4
## 177 Mt. Vernon      8
## 036     Athena      4
## 127 Mt. Vernon      4
## 008     Athena      1
## 039     Athena      4

Here we see that the populations are completely shuffled, so in the next step, we will reassign the strata with these newly shuffled populations and rerun the AMOVA analysis.

set.seed(9001)
strata(Aeut.new) <- strata(Aeut)[sample(nInd(Aeut)), -1]
Aeut.new.amova         <- poppr.amova(Aeut.new, ~Pop/Subpop)
Aeut.new.amova
## $call ## ade4::amova(samples = xtab, distances = xdist, structures = xstruct) ## ##$results
##                             Df     Sum Sq   Mean Sq
## Between Pop                  1   23.20576 23.205765
## Between samples Within Pop  16  136.33757  8.521098
## Within samples             169 1741.65452 10.305648
## Total                      186 1901.19786 10.221494
##
## $componentsofcovariance ## Sigma % ## Variations Between Pop 0.1588974 1.544009 ## Variations Between samples Within Pop -0.1733264 -1.684217 ## Variations Within samples 10.3056481 100.140207 ## Total variations 10.2912190 100.000000 ## ##$statphi
##                            Phi
## Phi-samples-total -0.001402074
## Phi-samples-Pop   -0.017106288
## Phi-Pop-total      0.015440091
Aeut.new.amova.test    <- randtest(Aeut.new.amova, nrepet = 999)
Aeut.new.amova.test
## class: krandtest lightkrandtest
## Monte-Carlo tests
## Call: randtest.amova(xtest = Aeut.new.amova, nrepet = 999)
##
## Number of tests:   3
##
## Adjustment method for multiple comparisons:   none
## Permutation number:   999
##                         Test        Obs    Std.Obs   Alter Pvalue
## 1  Variations within samples 10.3056481  0.3775001    less  0.605
## 2 Variations between samples -0.1733264 -0.7375843 greater  0.746
## 3     Variations between Pop  0.1588974  1.2656780 greater  0.123
plot(Aeut.new.amova.test)

We see that there now is no significant population structure.

## Conclusions

AMOVA is a powerful tool that can help support hypotheses of population structure due to clonal reproduction or isolation without making assumptions about Hardy-Weinberg equilibrium. We have shown that we can reject the $$H_o$$ of random mating between the two populations and have strong evidence that these populations are significantly differentiated at all stratifications (Grünwald & Hoheisel, 2006). From these results, we can investigate hypotheses as to why these populations are significantly differentiated.

This example was performed with a data set of dominant (AFLP) markers, but it can also be performed on codominant markers such as SNPs or SSRs. These provide more information because within sample (individual) variance is also assessed. If one wants to utilize a genetic distance that has biological relevance, a different distance matrix can be chosen. See help('amova', package = 'poppr') for more details.

## References

Excoffier L., Smouse PE., Quattro JM. 1992. Analysis of molecular variance inferred from metric distances among dna haplotypes: Application to human mitochondrial dna restriction data. Genetics 131:479–491. Available at: http://www.genetics.org/content/131/2/479.abstract

Grünwald NJ., Hoheisel G-A. 2006. Hierarchical analysis of diversity, selfing, and genetic differentiation in populations of the oomycete Aphanomyces euteiches. Phytopathology 96:1134–1141. Available at: http://apsjournals.apsnet.org/doi/abs/10.1094/PHYTO-96-1134