Chapter 2 Protocols
2.1 Protocols for selection
Constructing genetic relationship matrix \(\mathbf{G}\). The difference between a random mating population and inbred population is the way \(\mathbf{G}\) is constructed. So for a pair of individuals \(i\) and \(j\), \[G_{ij}=\frac{1}{\tilde{m}}\sum_l^{\tilde{m}}\frac{(x_{il}-2p_l)(x_{jl}-2p_l)}{2(1+F)p_lq_l}\] and \(F=1\) for inbred lines. Please refer to Chapter 1.
Conducting eigenanalysis for \(\mathbf{G}\), see it definition (1.2).
Linear regression analysis for each SNP.
2.2 Protocal for EigenGWAS
A simple analysis pipeline written in Rscripts may be found at Rpub and its Shiny demo.
Or, try it as below in RStudio
runGitHub("Rshiny", "gc5k")
2.3 Protocol for predicted eigenvectors
The prediction accuracy can be written as \[R^2 \approx \frac{1}{1+\frac{n_e}{m}}\]
2.4 Nucleotide diversity
\(\theta\), the normalized observed number of variants sites
\(\pi\), the observed heterozygosity per base pair
Nonetheless, we can expect that favored alleles will generally sit whthin large shared haplotypes, and that these haplotypes will be in sharp contrast with more variable haplotypes on the unselected background. Voight, et al, PLoS Biology, 2006, 4, e72.
However, the excess variability around the site of F/S amino-acid polymorphism is found within S haplotypes, and F haplotypes are depauperate in variability, suggesting that the F allele is a derived variant that has recently swept to an intermediate frequency. Charsworth & Charsworth, Heredity, 2017, 118:2-9.
2.5 Other test statistics for selection
1 Tajima’s \(D\), Statistical method for testing the neutral mutation hypothesis by DNA polymorphism, 1989, \(Genetics\), 123:585-95
2 Fu & Li’s \(D\), Statistical tests of neutrality of mutations, 1993, \(Genetics\), 133:693-709
3 Fay & Wu’s \(H\), Hitchhiking under positive Darwinian selection, 2000, \(Genetics\), 155:1405-13
4 McDonald and Kreitman test, Adaptive protein evolution at the Adh locus in Drosophila, 1991, \(Nature\), 335:167-70
5 HKA test, Hudson, Kreitman, Aguade, A test of neutral molecular evolution based on nucleotide data, 1987, \(Genetics\), 166:153-9)
6 Extended haplotype homozygosity (EHH) The biological ground please refer to \(Nature\), 419:832-7 and the definition of the test statistic please refer to Voight et al \(PLoS Biology, 2006, 4, e72\)
Unstandardized \(iHS=ln\frac{iHH_A}{iHH_D}\)
and standardized \(iHS=\frac{ln[\frac{iHH_A}{iHH_D}]-E_p[ln(\frac{iHH_A}{iHH_D})]}{SD_P[ln(\frac{iHH_A}{iHH_D})]}\)
It has been realized in
rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure, \(Bioinformatics\), 2012, 8:1176-7
selscan: An Efficient Multithreaded Program to Perform EHH-Based Scans for Positive Selection, 2014, \(MBE\), 31:2824-7, selcan’s github
PopGenome, R package:, Pfeifer, B. et al. (2014) PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R. Mol Biol Evol 31(7): 1929-1936.