Calculating Fst for haploid data in R

How to estimate the fixation index, FST, to test for population differentiation in R

I was looking for a tutorial to estimate FST for my microbial (aka haploid) dataset but sadly there was no specific instruction, although other researchers are definitely looking for it (see here). Sadly, one respondent said that “there is no way to estimate Fst for microbial data”. However, ultimately I found a “way” to estimate Fst, and here is a quick R tutorial.

The fixation index, otherwise known as FST, is population differentiation due to genetic structure and is one of the fundamental concepts in population genetics. Essentially, FST is the proportion of the total genetic variance contained in a subpopulation (the S-subscript) compared to the total genetic variance (the T-subscript). Its value can range from 0 to 1. Higher FST indicates a considerable degree of differentiation among populations. The figure below illustrates what it means for two populations to have very different genetic structures.

Genetic differentiation: an example of the behavior of fixation index... |  Download Scientific Diagram
(Left) Two sub-population have the same genotype composition, hence Fst is 0. On the other hand, (right) if two sub-population have very different genotype compositions, the Fst will be 1.
Continue reading “Calculating Fst for haploid data in R”