Skip to content

Schraiber/hardyweinberg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 

Repository files navigation

hardweinberg

This set of R functions will perform maximum likelihood estimation of inbreeding coefficients in small sample sizes by fitting the exact distribution of the number of heterozygotes. Further, it can perform composite likelihood estimation of inbreeding coefficients in a single individual.

Using the scripts

Installation

Calling source("HWtest.r") to load the scripts. The only required package is EMT.

Data input format

Data is required to be in a format similar, but not identical to, a VCF file. The required columns are: 1) chromosome, 2) position, 3) number of heterozygotes, 4) allele frequency, 5-end) a 1 for each individual who is a heterozygote, and a 0 for each individual who is homozygous (either reference or alternative). For example, the following data corresponds to the 9 YRI individuals in the Complete Genomics dataset:

 V1     V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
  1 564654  0  8  0  0  0  0  0   0   0   0   0
  1 564936  0  2  0  0  0  0  0   0   0   0   0
  1 565317  0  2  0  0  0  0  0   0   0   0   0
  1 565371  1  1  0  0  0  0  0   1   0   0   0
  1 713698  2  2  0  1  0  0  0   0   1   0   0
  1 721119  1  1  0  0  0  1  0   0   0   0   0

Batch inbreeding coefficient

To compute the liklelihood of the data assuming a single inbreeding coefficient for all individuals in the dataset, use the function LL_combined. It requirest he data to be preprocessed by the function counts_from_VCF on the input data. Computing the maximum likelihood estimate of F can then be done using the optim function in R. For example,

> head(YRI.VCF)
 V1     V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
  1 564654  0  8  0  0  0  0  0   0   0   0   0
  1 564936  0  2  0  0  0  0  0   0   0   0   0
  1 565317  0  2  0  0  0  0  0   0   0   0   0
  1 565371  1  1  0  0  0  0  0   1   0   0   0
  1 713698  2  2  0  1  0  0  0   0   1   0   0
  1 721119  1  1  0  0  0  1  0   0   0   0   0
> myCounts = counts_from_VCF(YRI.VCF)
> LL_combined(.01,MyCounts)
> res = optim(.5, function(F) { -LL_combined(F,myCounts) },method="L-BFGS-B",
lower = .00001, upper = .9999)

It is also possible to use the function LL_combined_beta to compute the likelihood under an assumption of Beta-distributed F among individuals. The input is a vector of two parameters, corresponding to the alpha and beta parameters of the Beta distribution.

Inbreeding coefficients per individual and per chromosome

To use composite likelihood to estimate an F per each individual and each chromosome, use the function F_per_chrom. The output is a matrix giving an estimate of F for each chromosome (columns) of each individual (rows).

It may be desirable to obtain block jackknife confidence intervals for these estimates. This can be done using the functions jackkife_F (which expects a VCF file with only a SINGLE chromosome) or jackknife_F_genome (which will perform the operation genome-wide). Note that these require as input the size of the block to remove, in base pairs. These functions can take quite a long time to run.

Citation

If you use this code, please cite this paper: Schraiber, Joshua G., Stephannie Shih, and Montgomery Slatkin. "Genomic tests of variation in inbreeding among individuals and among chromosomes." Genetics 192.4 (2012): 1477-1482.

About

Code associated with Schraiber et al (2012), Genetics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages