So you've read how **'G_{ST} and its relatives do not measure differentiation'** and you're
itching to see if that panmictic set of populations you've got lying around in an old GENEPOP file
contains hidden diversity not identified by the 'usual suspects' (e.g., F_{ST} or G_{ST}).
Well, you've come to the right place! Just follow the directions and admire your results.

Note: If you discover any bugs, please send an email. Likewise, if there is something you'd like to see added, let me know. Thanks.

The manuscript has been accepted in Molecular Ecology Resources (*4/04/10*):

Crawford NG. 2010. SMOGD: software for the measurement of genetic diversity. Molecular Ecology Resources, 10, 556-557.

Update (*8/3/09*), v1.2.5:
Pairwise calculations across loci are here! Sorry for the delay. I've been busy abroad and in the lab. Ugh. I also
added bootstrap calculations of 95% confidence intervals and updated the user manual.
I just updated the harmonic mean calculation for combining D_{est} values to include Anne Chao's approximation.

Hmean = 1/[(1/A)+var(D)(1/A)^3] where A is the arithmetic mean and var(D) is the variance of D_{est} values.

More details here: [link]
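
As a rough illustration of the approximation above, here is a minimal Python sketch. It is not SMOGD's own code: the function name `approx_harmonic_mean` is hypothetical, and using the sample variance (n - 1 denominator) is an assumption, since the text does not say which variance is meant.

```python
from statistics import mean, variance


def approx_harmonic_mean(d_values):
    """Approximate harmonic mean: 1 / [(1/A) + var(D) * (1/A)**3].

    A is the arithmetic mean of the D_est values. The sample variance is
    used here as an assumption. A must be non-zero.
    """
    a = mean(d_values)
    var_d = variance(d_values)  # sample variance (assumption)
    inv_a = 1.0 / a
    return 1.0 / (inv_a + var_d * inv_a ** 3)


# Hypothetical per-locus D_est values, for illustration only.
d_est_per_locus = [0.10, 0.20, 0.30]
print(approx_harmonic_mean(d_est_per_locus))
```

With no variance among loci the result reduces to the arithmetic mean; as the variance grows, the approximation is pulled below it, as a harmonic mean should be.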

Update (*6/24/09*), v1.2.4:
The harmonic mean is calculated across loci for each population. I'll be implementing similar pairwise calculations in the next couple of days (*e.g.*, the harmonic mean of all loci for each possible pair of populations).

Update (* ~ 5/15/09*), v1.2.3:
Fixed a bug that resulted in improper delineation of populations due to truncation of population names.

Update (*5/6/09*), v1.2.2:
Arlequin genotypic format is now accepted.

Update (*5/5/09*):

Added JavaScript to ease clutter in the "Recent News" section.

Genepop format now works with files formatted with a single line of loci.
Loci may be separated either by spaces alone or commas and spaces.

It is no longer necessary to have all individuals within a population named the same.
The first individual in the population now lends its name to the
population.
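
The two parsing behaviours described above (loci on one line split by spaces and/or commas, and populations named after their first individual) can be sketched roughly as follows. This is an illustrative sketch, not SMOGD's parser: the function name `parse_genepop` and the sample data are made up, and it assumes a title on the first line, `Pop` separator lines, and individuals written as `name , genotypes`.

```python
import re


def parse_genepop(text):
    """Return (loci, populations) from Genepop-style text.

    populations maps the first individual's name in each population to a
    list of (individual_name, [genotype strings]) tuples.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[1:]  # the first line is a free-text title
    first_pop = next(i for i, ln in enumerate(body) if ln.lower() == "pop")

    # Locus names may sit on one line or several, split by commas and/or spaces.
    loci = []
    for ln in body[:first_pop]:
        loci.extend(tok for tok in re.split(r"[,\s]+", ln) if tok)

    populations = {}
    current = None
    for ln in body[first_pop:]:
        if ln.lower() == "pop":
            current = None  # the next individual starts a new population
            continue
        name, _, genotypes = ln.partition(",")
        name = name.strip()
        if current is None:
            current = name  # the first individual lends its name
            populations[current] = []
        populations[current].append((name, genotypes.split()))
    return loci, populations


example = """Example title
locA, locB locC
Pop
ind1 , 0101 0202 0303
ind2 , 0102 0000 0303
POP
x9 , 0202 0201 0103
"""
loci, pops = parse_genepop(example)
print(loci)
print(list(pops))
```

Note that in this sketch two populations whose first individuals share a name would collide; a real parser would need to guard against that.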

Update (*4/21/09*): You can now download distance matrices for each
locus for G_{ST_est}, G'_{ST_est}, and D_{est}. You can also download each
table as a text file. Files are time-stamped so you won't download a colleague's data set. I've also
written an Arlequin format parser which I'll try to add tomorrow.

Update (*4/15/09*): Fixed a bug in the bootstrapping algorithm.

Update (*4/14/09*): Fixed the Hedrick (G'_{ST_est}) equation to account for multiple populations. Thanks for
catching this, Mark.

Update (*4/13/09*): I've just included bootstrapping functionality! I'll add a FAQ shortly with more
details.

Update (*4/10/09*): I've added formulae to calculate G_{ST_est}, G'_{ST_est},
and D_{est}. Note that, unlike similar software, this program works with multiple loci.

Update (*4/8/09*): I've made some pretty substantial updates to the original back-end code. The output is now
similar to FSTAT and follows
the examples in Jost 2008 more accurately.

In the text box on the right, 'cut-n-paste' the text of a GenePop or Arlequin file in the two- or three-digit allele format and click submit. Missing genotypes can be coded as '000000', '0000', 'BADDNA', 'NA', '?', or '0'.
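
To make the allele format concrete, here is a small Python sketch of how a single diploid genotype string might be decoded. It is only an illustration under stated assumptions, not SMOGD's code: the names `MISSING_CODES` and `split_genotype` are hypothetical, the missing-data codes are taken from the list above, and half-missing genotypes (e.g. '0100') are not handled.

```python
# Missing-genotype codes as listed in the text above.
MISSING_CODES = {"000000", "0000", "BADDNA", "NA", "?", "0"}


def split_genotype(code):
    """Split a diploid genotype into two allele strings, or return None if missing.

    Four characters means two digits per allele; six means three per allele.
    """
    code = code.strip()
    if code in MISSING_CODES:
        return None
    if len(code) not in (4, 6) or not code.isdigit():
        raise ValueError(f"unrecognized genotype: {code!r}")
    half = len(code) // 2
    return code[:half], code[half:]


print(split_genotype("0102"))    # two-digit alleles
print(split_genotype("120124"))  # three-digit alleles
print(split_genotype("BADDNA"))  # missing genotype
```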

A user manual can be downloaded here: [manual]

Source code may be downloaded from [github]

Crawford NG. 2010. SMOGD: software for the measurement of genetic diversity. Molecular Ecology Resources, 10, 556-557.

Aczél J, Daróczy Z. 1975. On measures of information and their characterizations. Mathematics in Science and Engineering, vol. 115. Academic Press, New York. [link to review]

Hedrick, PW. 2005. A Standardized Genetic Differentiation Measure. Evolution 59(8), 1633-1638. [link]

Jost L. 2008. G_{ST} and its relatives do not measure differentiation. Molecular Ecology 17(18), 4015-4026.
[link]

Nei M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA, 70(12), 3321-3323. [link]

Nei M, Chesser RK. 1983. Estimation of fixation indices and gene diversities. Annals of Human Genetics, 47(3), 253-259. [link]

Tsallis C, Brigatti E. 2004. Nonextensive statistical mechanics: A brief introduction. Continuum Mechanics and Thermodynamics, 16(3), 223-235. [link]

- **n** = number of populations
- **D**_{ST} = absolute differentiation (Nei 1973)
- **G**_{ST} = relative differentiation (Nei 1973)
- **H**_{ST} = between-subpopulation heterozygosity (Aczél & Daróczy 1975; Tsallis & Brigatti 2004)
- **Δ**_{ST} = between-subpopulation component of diversity, or the effective number of distinct subpopulations (Jost 2008)
- **D** = actual differentiation (Jost 2008)
- **H**_{S}/H_{T} = proportion intra-population heterozygosity vs total heterozygosity (Jost 2008)
- **Δ**_{S}/Δ_{T} = proportion of total diversity that is contained in the average subpopulation (Jost 2008)

This is essentially Table 1 from Jost (2008), but don't report these values in a paper! The formulae assume that the allele frequencies are known exactly - that you genotyped every individual in the population. You want to report the Estimated Parameters which account for small sample sizes.
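
For illustration only (these are the exact-frequency quantities, not the ones to report), the definitions can be sketched in Python for a single locus. This is a minimal sketch under assumptions, not SMOGD's implementation: the function names are hypothetical, populations are weighted equally, and the formula forms are written from the standard definitions in Nei (1973) and Jost (2008), so check them against Table 1 before relying on them.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def exact_measures(pop_freqs):
    """pop_freqs: one allele-frequency list per population (same allele order).

    Assumes equal population weights, at least two populations,
    and a polymorphic locus (H_T > 0).
    """
    n = len(pop_freqs)
    h_s = sum(heterozygosity(f) for f in pop_freqs) / n  # mean within-population
    pooled = [sum(col) / n for col in zip(*pop_freqs)]   # mean allele frequencies
    h_t = heterozygosity(pooled)                         # total heterozygosity
    delta_s = 1.0 / (1.0 - h_s)  # effective number of alleles within populations
    delta_t = 1.0 / (1.0 - h_t)  # effective number of alleles overall
    h_st = (h_t - h_s) / (1.0 - h_s)
    return {
        "D_ST": h_t - h_s,
        "G_ST": (h_t - h_s) / h_t,
        "H_ST": h_st,
        "Delta_ST": delta_t / delta_s,
        "D": h_st * n / (n - 1),
        "H_S/H_T": h_s / h_t,
        "Delta_S/Delta_T": delta_s / delta_t,
    }


# Two hypothetical populations that share no alleles at all.
pops = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
result = exact_measures(pops)
print(result)
```

In this made-up example the two populations are completely differentiated, yet G_{ST} comes out at one third while D is 1, which is the behaviour Jost (2008) highlights.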

- **Ñ** = harmonic mean of population sizes
- **H**_{S_est} = nearly unbiased estimator of within-subpopulation heterozygosity (Nei & Chesser 1983)
- **H**_{T_est} = nearly unbiased estimator of total-subpopulation heterozygosity (Nei & Chesser 1983)
- **H**_{ST_est} = nearly unbiased estimator of between-subpopulation heterozygosity (Nei & Chesser 1983)
- **G**_{ST_est} = nearly unbiased estimator of relative differentiation (Nei & Chesser 1983)
- **G'**_{ST_est} = standardized measure of genetic differentiation (Hedrick 2005)
- **D**_{est} = estimator of actual differentiation (Jost 2008)

These parameters incorporate the nearly unbiased estimators H_{S_est} and H_{T_est} from Nei & Chesser (1983), as
well as a few other modifications to the formulae (see Jost 2008, pg. 4022). They should better account for small sample sizes.

Note: it's possible to get **negative values** for G_{ST_est}, G'_{ST_est}, and D_{est} if
you are comparing populations with virtually identical gene frequencies. Try not to panic. You may just want to
report these values as zero.
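
The small-sample estimators, and the way negative values arise, can be sketched as follows. This is a hedged illustration, not SMOGD's source: the function names are hypothetical, Ñ is taken to be the harmonic mean of per-population sample sizes, and the estimator forms are assumed to be the simplified ones given in Jost (2008) and Hedrick (2005), so verify them against those papers.

```python
def harmonic_mean_size(sizes):
    """Harmonic mean of per-population sample sizes (assumed meaning of N-tilde)."""
    return len(sizes) / sum(1.0 / s for s in sizes)


def estimated_measures(h_s, h_t, n, n_harm):
    """Small-sample estimators from raw H_S and H_T for n populations.

    Formula forms are assumptions based on Jost (2008) and Hedrick (2005).
    """
    h_s_est = (2.0 * n_harm / (2.0 * n_harm - 1.0)) * h_s
    h_t_est = h_t + h_s_est / (2.0 * n_harm * n)
    g_st_est = (h_t_est - h_s_est) / h_t_est
    # Hedrick's standardization: divide G_ST by its maximum given H_S.
    g_prime_st_est = g_st_est * (n - 1 + h_s_est) / ((n - 1) * (1.0 - h_s_est))
    d_est = ((h_t_est - h_s_est) / (1.0 - h_s_est)) * (n / (n - 1))
    return {"G_ST_est": g_st_est, "G'_ST_est": g_prime_st_est, "D_est": d_est}


def floor_at_zero(value):
    """Report a negative estimate as zero, as suggested in the note above."""
    return max(0.0, value)


# Two hypothetical populations with identical allele frequencies (H_S == H_T).
n_tilde = harmonic_mean_size([10, 10])
estimates = estimated_measures(h_s=0.5, h_t=0.5, n=2, n_harm=n_tilde)
print(estimates)
print({name: floor_at_zero(v) for name, v in estimates.items()})
```

With identical frequencies the bias correction inflates H_{S_est} more than H_{T_est}, so all three estimates come out slightly below zero in this example.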