Chapter 4 Simulating population structure
A toy example for this chapter can be found on gc5k’s RPubs page.
4.1 Genetic drift
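As the allele count at each locus follows a binomial distribution, the genetic drift per generation can be modelled by the standard deviation of the allele frequency, \(\sqrt{\frac{pq}{2n_e}}\), in which \(p\) and \(q = 1 - p\) are the allele frequencies and \(n_e\) is the effective population size.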
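The binomial argument can be checked numerically. The sketch below is only an illustration: the values of `n_e`, `p`, and `n_loci` are made up, and it assumes one generation of random mating in which \(2n_e\) alleles are sampled per locus, so that the sampling standard deviation of the allele frequency is \(\sqrt{pq/(2n_e)}\).

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n_e <- 500           # hypothetical effective population size
p   <- 0.3           # hypothetical starting allele frequency
q   <- 1 - p

# One generation of drift at many independent loci:
# sample 2 * n_e alleles per locus and convert the count to a frequency
n_loci <- 20000
p_next <- rbinom(n_loci, size = 2 * n_e, prob = p) / (2 * n_e)

# Compare the empirical spread with the binomial expectation
sd_empirical <- sd(p_next)
sd_binomial  <- sqrt(p * q / (2 * n_e))
c(empirical = sd_empirical, binomial = sd_binomial)
```

The two numbers should agree closely; a smaller `n_e` gives a larger spread, which is the sense in which small populations drift faster.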
4.2 Discrete populations
Algorithm
Generate the ancestral allele frequency \(f\) from the uniform distribution \(U(0.05, 0.95)\).
Given \(F_{st1}\), generate \(z_{1|1}\) and \(z_{2|1}\), each from \(Beta(f\frac{1-F_{st1}}{F_{st1}}, (1-f)\frac{1-F_{st1}}{F_{st1}})\). Their mean will be \(f\), and their sampling variance will be \(F_{st1}f(1-f)\). Similarly, generate \(z_{1|2}\) and \(z_{2|2}\).
Set \(D_1\) and \(D_2\), the matrices that map these draws to the realized frequencies of the populations.
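The first two steps of the algorithm can be sketched in R as follows. The value of `fst1` and the number of markers are hypothetical, and the two draws are assumed to be independent; the check at the end uses the fact that a Beta distribution with these shape parameters has mean \(f\) and variance \(F_{st1}f(1-f)\).

```r
set.seed(2)          # arbitrary seed, for reproducibility only
n_marker <- 10000    # number of markers (illustrative)
fst1     <- 0.05     # hypothetical value of F_st1

# Step 1: ancestral frequencies from U(0.05, 0.95)
f <- runif(n_marker, 0.05, 0.95)

# Step 2: Beta shape parameters as given in the algorithm
shape_a <- f * (1 - fst1) / fst1
shape_b <- (1 - f) * (1 - fst1) / fst1

# z_{1|1} and z_{2|1}, drawn independently per marker (an assumption)
z1_1 <- rbeta(n_marker, shape_a, shape_b)
z2_1 <- rbeta(n_marker, shape_a, shape_b)

# Check: deviations from f, standardized by sqrt(f(1-f)),
# should have mean near 0 and variance near fst1
std_dev <- (z1_1 - f) / sqrt(f * (1 - f))
c(mean_dev = mean(z1_1 - f), var_std = var(std_dev))
```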
For the three-population simulation \[\begin{equation} F= \left ( \begin{array}{cc} 1 & 0\\ 0 & 1 \\ 0 & 1\\ \end{array} \right ) \left ( \begin{array}{c} z_{1|1}\\ z_{2|1}\\ \end{array} \right ) + \left ( \begin{array}{cc} 0 & 0\\ 1 & 0 \\ 0 & 1\\ \end{array} \right ) \left ( \begin{array}{c} z_{1|2}\\ z_{2|2}\\ \end{array} \right) \end{equation}\]
Below are three simulations generated from 3, 5, and 9 populations, each of which has 100 individuals; 10000 markers are used for each individual. \(D_1\) and \(D_2\) are printed for the three simulations below.
For the five-population simulation,
\[\begin{equation} F=\left ( \begin{array}{cc} 1 & 0 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0 & 1\\ \end{array} \right ) \left ( \begin{array}{c} z_{1|1}\\ z_{2|1}\\ \end{array} \right ) + \left (\begin{array}{cc} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0\\ \end{array} \right) \left( \begin{array}{c} z_{1|2}\\ z_{2|2}\\ \end{array} \right) \end{equation}\]
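As a reading aid, the five-population equation can be evaluated directly for a single marker. This is only an arithmetic sketch: the names `D1` and `D2` for the two printed matrices are an assumption, and the values in `z_level1` and `z_level2` are made up rather than drawn from the Beta distributions.

```r
# The two matrices exactly as printed in the five-population equation
D1 <- matrix(c(1,   0,
               0.5, 0.5,
               0.5, 0.5,
               0.5, 0.5,
               0,   1), ncol = 2, byrow = TRUE)
D2 <- matrix(c(0, 0,
               1, 0,
               0, 0,
               0, 1,
               0, 0), ncol = 2, byrow = TRUE)

# Hypothetical values for one marker (not simulated)
z_level1 <- c(0.30, 0.50)   # stands in for z_{1|1}, z_{2|1}
z_level2 <- c(0.02, 0.04)   # stands in for z_{1|2}, z_{2|2}

# One value per population (rows 1 to 5); F_pop avoids masking R's F
F_pop <- D1 %*% z_level1 + D2 %*% z_level2
F_pop
```

Rows 1 and 5 pick out the two level-1 draws unchanged, while rows 2 to 4 take their average, with rows 2 and 4 each adding one of the level-2 terms.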
For the nine-population simulation, scheme 1 \[\begin{equation} F=\left ( \begin{array}{cc} 1 & 0 \\ 0.3 & 0.7 \\ 0.3 & 0.7 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.7 & 0.3 \\ 0.7 & 0.3 \\ 0 & 1 \\ \end{array} \right) \left( \begin{array}{c} z_{1|1}\\ z_{2|1}\\ \end{array} \right) + \left ( \begin{array}{cc} 0 & 0 \\ 0.42 & 0.18 \\ 0.18 & 0.42 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0.42 & 0.18 \\ 0.18 & 0.42 \\ 0 & 0 \\ \end{array} \right ) \left ( \begin{array}{c} z_{1|2}\\ z_{2|2}\\ \end{array} \right ) \end{equation}\]
For the nine-population simulation, scheme 2 \[\begin{equation} F=\left ( \begin{array}{cc} 1 & 0 \\ 0.3 & 0.7 \\ 0.3 & 0.7 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.7 & 0.3 \\ 0.7 & 0.3 \\ 0 & 1 \\ \end{array} \right) \left( \begin{array}{c} z_{1|1}\\ z_{2|1}\\ \end{array} \right) + \left ( \begin{array}{cc} 0 & 0 \\ 0.3 & 0.0 \\ 0.0 & 0.3 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0.3 & 0.0 \\ 0.0 & 0.3 \\ 0 & 0 \\ \end{array} \right ) \left ( \begin{array}{c} z_{1|2}\\ z_{2|2}\\ \end{array} \right ) \end{equation}\]
4.3 Admixture populations
4.4 Homogeneous & heterogeneous \(F_{st}\)
4.5 Wishart distribution
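The R function rWishart can easily generate draws from the Wishart distribution. Drawing 1000 matrices of size \(10 \times 10\) returns an array with the following dimensions:
## [1] 10 10 1000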
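A call of the following form would produce an array of that shape. The code that generated the output above is not shown in the source, so the degrees of freedom `df` and the identity scale matrix used here are assumptions chosen for illustration.

```r
set.seed(3)           # arbitrary seed, for reproducibility only
p     <- 10           # dimension of each Wishart matrix
df    <- 20           # hypothetical degrees of freedom (must be >= p)
Sigma <- diag(p)      # hypothetical scale matrix (identity)

# rWishart(n, df, Sigma) from the stats package returns a p x p x n array
W <- rWishart(1000, df, Sigma)
dim(W)                # 10 10 1000

# Sanity check: the average draw should be close to E[W] = df * Sigma
W_mean <- apply(W, c(1, 2), mean)
max_dev <- max(abs(W_mean - df * Sigma))
max_dev
```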
4.6 Tracy-Widom distribution
The R package RMTstat can help study the Tracy-Widom distribution.
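A minimal sketch of how the package might be used is given below. It assumes RMTstat is installed and relies on its `qtw`, `ptw`, and `rtw` functions (quantile, distribution function, and random draws); the choice `beta = 1` and the 0.95 level are illustrative.

```r
library(RMTstat)   # assumes the package is installed: install.packages("RMTstat")

# 95% quantile of the Tracy-Widom distribution with beta = 1
q95 <- qtw(0.95, beta = 1)

# Round trip through the distribution function should give back about 0.95
p_back <- ptw(q95, beta = 1)

# Random draws: roughly 5% of them should exceed the 95% quantile
set.seed(4)        # arbitrary seed, for reproducibility only
x <- rtw(5000, beta = 1)
frac_above <- mean(x > q95)

c(q95 = q95, p_back = p_back, frac_above = frac_above)
```

A quantile such as `q95` can serve as a significance threshold when testing a suitably centred and scaled largest eigenvalue.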