Calculates the new measures of genetic differentiation described in Jost 2008 and Hedrick 2005


So you've read how 'GST and its relatives do not measure differentiation' and you're itching to see if that panmictic set of populations you've got lying around in an old Genepop file contains hidden diversity not identified by the 'usual suspects' (e.g., FST or GST). Well, you've come to the right place! Just follow the directions and admire your results.

Note: If you discover any bugs, please send an email. Likewise, if there is something you'd like to see added, let me know. Thanks.

Recent News:

The manuscript has been accepted in Molecular Ecology Resources (4/04/10):

Crawford NG. 2010. SMOGD: software for the measurement of genetic diversity. Molecular Ecology Resources, 10, 556-557.


Update (8/3/09), v1.2.5: Pairwise calculations across loci are here! Sorry for the delay. I've been busy abroad and in the lab. Ugh. I also added bootstrap calculations of 95% confidence intervals and updated the user manual. I just updated the harmonic mean calculation for combining Dest values to include Anne Chao's approximation.

Hmean = 1/[(1/A)+var(D)(1/A)^3] where A is the arithmetic mean and var(D) is the variance of Dest values.

More details here: [link]
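As a rough illustration of the approximation above, here is a minimal Python sketch. The function name and the example values are made up for illustration, and the use of the sample variance (n - 1 denominator) is an assumption, since the formula does not say which variance is meant.

```python
from statistics import mean, variance


def approx_harmonic_mean(d_values):
    """Approximate harmonic mean: 1 / [(1/A) + var(D) * (1/A)^3].

    A is the arithmetic mean of the Dest values and var(D) their variance.
    Assumes at least two values and a nonzero mean.
    """
    a = mean(d_values)
    var_d = variance(d_values)  # sample variance; an assumption, see lead-in
    inv_a = 1.0 / a
    return 1.0 / (inv_a + var_d * inv_a ** 3)


# Hypothetical per-locus Dest values, for illustration only.
dest_per_locus = [0.10, 0.20, 0.30]
print(approx_harmonic_mean(dest_per_locus))
```

For these three values the arithmetic mean is 0.2 and the approximation gives about 0.16, close to the exact harmonic mean of about 0.164.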

Update (6/24/09), v1.2.4: The harmonic mean is calculated across loci for each population. I'll be implementing similar pairwise calculations in the next couple of days (e.g., the harmonic mean of all loci for each possible pair of populations).

Update (~5/15/09), v1.2.3: Fixed a bug that resulted in improper delineation of populations due to truncation of population names.

Update (5/6/09), v1.2.2: Arlequin genotypic format is now accepted.

Update (5/5/09):

Added javascript to ease clutter in the "Recent News" section.

Genepop format now works with files formatted with a single line of loci. Loci may be separated either by spaces alone or by commas and spaces.

It is no longer necessary to have all individuals within a population named the same. The first individual in the population now lends its name to the population.
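The two Genepop changes above can be sketched roughly as follows. This is an illustrative sketch, not the SMOGD parser: the function names are hypothetical, and it assumes the usual Genepop layout in which each individual line has an identifier, a comma, and then the genotypes.

```python
import re


def parse_locus_line(line):
    """Split a single-line locus list on commas and/or whitespace."""
    return [name for name in re.split(r"[,\s]+", line.strip()) if name]


def population_name(individual_lines):
    """Name a population after its first individual (the text before the comma)."""
    first = individual_lines[0]
    return first.split(",", 1)[0].strip()


# Both separator styles give the same locus names.
print(parse_locus_line("Loc1, Loc2, Loc3"))
print(parse_locus_line("Loc1 Loc2 Loc3"))

# Individuals need not share a name; the first one labels the population.
print(population_name(["IslandA_01 , 0101 0203", "IslandA_02 , 0102 0303"]))
```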

Update (4/21/09): You can now download distance matrices for each locus for GST_est, G'ST_est, and Dest. You can also download each table as a text file. Files are time-stamped so you won't download a colleague's data set. I've also written an Arlequin format parser which I'll try to add tomorrow.

Update (4/15/09): Fixed a bug in the bootstrapping algorithm.

Update (4/14/09): Fixed the Hedrick (G'ST_est) equation to account for multiple populations. Thanks for catching this, Mark.

Update (4/13/09): I've just included bootstrapping functionality! I'll add a FAQ shortly with more details.

Update (4/10/09): I've added formulae to calculate GST_est, G'ST_est, and Dest. Note that, unlike similar software, this program works with multiple loci.

Update (4/8/09): I've made some pretty substantial updates to the original back-end code. The output is now similar to FSTAT and follows the examples in Jost 2008 more accurately.


In the text box on the right, 'cut-n-paste' the text of a GenePop or Arlequin file in the two- or three-digit allele format and click submit. Missing genotypes can be coded as '000000', '0000', 'BADDNA', 'NA', '?', or '0'.
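To make the allele format concrete, here is a small Python sketch of how one diploid genotype token might be read. It is an illustration under stated assumptions, not the SMOGD code: the function and constant names are hypothetical, the last missing code is read as '0', and a token is assumed to be either four digits (two-digit alleles) or six digits (three-digit alleles).

```python
# Missing-genotype codes listed above (the last one read as '0'; an assumption).
MISSING_CODES = {"000000", "0000", "BADDNA", "NA", "?", "0"}


def parse_genotype(token):
    """Return a pair of allele strings, or None for a missing genotype."""
    token = token.strip()
    if token in MISSING_CODES:
        return None
    if len(token) not in (4, 6) or not token.isdigit():
        raise ValueError(f"unrecognized genotype: {token!r}")
    half = len(token) // 2  # 2 for two-digit alleles, 3 for three-digit alleles
    return token[:half], token[half:]


print(parse_genotype("0102"))    # two-digit alleles
print(parse_genotype("120124"))  # three-digit alleles
print(parse_genotype("BADDNA"))  # missing genotype
```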


A user manual can be downloaded here: [manual]

Source Code:

Source code may be downloaded from [github]

How to Cite:

Crawford NG. 2010. SMOGD: software for the measurement of genetic diversity. Molecular Ecology Resources, 10, 556-557.


References:

Aczél J, Daróczy Z. 1975. On measures of information and their characterizations. Mathematics in Science and Engineering, vol. 115. Academic Press, New York, San Francisco, London. xii + 234 pp. [link to review]

Hedrick PW. 2005. A standardized genetic differentiation measure. Evolution, 59(8), 1633-1638. [link]

Jost L. 2008. GST and its relatives do not measure differentiation. Molecular Ecology, 17(18), 4015-4026. [link]

Nei M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences USA, 70(12, Pt 1), 3321-3323. [link]

Nei M, Chesser RK. 1983. Estimation of fixation indices and gene diversities. Annals of Human Genetics, 47(3), 253-259. [link]

Tsallis C, Bigatti E. 2004. Nonextensive statistical mechanics: A brief introduction. Continuum Mechanics and Thermodynamics, 16(3), 223-235. [link]

Paste file here
To reduce load on the server, the maximum number of bootstrap replicates allowed is 1000. Setting the number of replicates to zero prevents your data set from being bootstrapped.
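For readers unfamiliar with bootstrapping, here is a generic percentile-bootstrap sketch in Python that mirrors the replicate rules above (capped at 1000, skipped at zero). It is not the SMOGD resampling scheme, which is not described here: the function name is hypothetical, and resampling a flat list of values with replacement is an assumption made to keep the example short.

```python
import random
from statistics import mean

MAX_REPLICATES = 1000  # cap stated above


def bootstrap_ci(samples, statistic, replicates, seed=None):
    """Percentile 95% interval from resampling with replacement.

    Returns None when replicates is zero (no bootstrapping requested).
    """
    if replicates < 0:
        raise ValueError("replicates must be zero or positive")
    if replicates == 0:
        return None
    replicates = min(replicates, MAX_REPLICATES)
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(samples, k=len(samples)))
        for _ in range(replicates)
    )
    lower = estimates[int(0.025 * (replicates - 1))]
    upper = estimates[int(0.975 * (replicates - 1))]
    return lower, upper


# Hypothetical values, for illustration only.
values = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14]
print(bootstrap_ci(values, mean, 500, seed=1))
print(bootstrap_ci(values, mean, 0))  # None: bootstrapping disabled
```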

Basic Parameters: assumes actual allele frequencies are known

This is essentially Table 1 from Jost (2008), but don't report these values in a paper! The formulae assume that the allele frequencies are known exactly, i.e., that you genotyped every individual in each population. You want to report the Estimated Parameters, which account for small sample sizes.
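For orientation, the basic parameters for a single locus can be sketched in Python as below. This is a minimal sketch of the standard definitions (HS, HT, GST, Hedrick's G'ST, and Jost's D) with equally weighted populations, not the SMOGD code; the function name and the toy frequencies are hypothetical.

```python
def basic_parameters(pops):
    """pops: list of dicts mapping allele -> frequency, one dict per population.

    Assumes at least two populations, equal weights, HT > 0 and HS < 1.
    """
    n = len(pops)
    alleles = set().union(*pops)
    # Mean within-population heterozygosity.
    h_s = sum(1.0 - sum(p * p for p in pop.values()) for pop in pops) / n
    # Heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(pop.get(a, 0.0) for pop in pops) / n for a in alleles]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    g_st = (h_t - h_s) / h_t
    g_st_prime = g_st * (n - 1 + h_s) / ((n - 1) * (1.0 - h_s))
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return {"HS": h_s, "HT": h_t, "GST": g_st, "G'ST": g_st_prime, "D": d}


# Two populations that share no alleles at all.
pops = [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]
print(basic_parameters(pops))
```

In this toy case the populations are completely differentiated, yet GST is only about 0.33, while G'ST and D both equal 1.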

Estimated Parameters: diversity measures for small sample sizes

These parameters incorporate the nearly unbiased HS_est and HT_est estimators from Nei and Chesser (1983) as well as a few other modifications to the formulae (see Jost 2008, pg. 4022). They should better account for small sample sizes.

Note: it's possible to get negative values for GST_est, G'ST_est, and Dest if you are comparing populations with virtually identical gene frequencies. Try not to panic. You may just want to report these values as zero.
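To see how a negative estimate can arise, here is a hedged Python sketch. The estimator forms used (HS_est = [2N/(2N-1)] HS and HT_est = HT + HS_est/(2Nn), with N the harmonic mean of the sample sizes and n the number of populations) are the forms commonly attributed to Nei and Chesser (1983) as presented in Jost (2008); treat them as assumptions to check against the papers. This is not taken from the SMOGD source, and all names are hypothetical.

```python
from statistics import harmonic_mean


def estimated_parameters(h_s, h_t, sample_sizes):
    """Small-sample versions of GST, G'ST and D for one locus.

    h_s, h_t: basic HS and HT computed from sample allele frequencies.
    sample_sizes: number of individuals sampled in each population (n >= 2).
    """
    n = len(sample_sizes)
    n_harm = harmonic_mean(sample_sizes)
    h_s_est = (2 * n_harm / (2 * n_harm - 1)) * h_s  # assumed form, see lead-in
    h_t_est = h_t + h_s_est / (2 * n_harm * n)       # assumed form, see lead-in
    g_st_est = (h_t_est - h_s_est) / h_t_est
    g_st_prime_est = g_st_est * (n - 1 + h_s_est) / ((n - 1) * (1 - h_s_est))
    d_est = ((h_t_est - h_s_est) / (1 - h_s_est)) * (n / (n - 1))
    return {"GST_est": g_st_est, "G'ST_est": g_st_prime_est, "Dest": d_est}


def floor_at_zero(value):
    """Report a negative estimate as zero, as suggested in the note above."""
    return max(0.0, value)


# Two samples of 10 individuals with identical allele frequencies (HS = HT = 0.5).
est = estimated_parameters(0.5, 0.5, [10, 10])
print(est["Dest"])                 # slightly negative
print(floor_at_zero(est["Dest"]))  # 0.0
```

With identical frequencies the small-sample correction raises HS_est above HT_est, so Dest comes out at about -0.056 before being floored at zero.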