Skip to content
/ LodSeq Public

LodSeq performs the genetic linkage analysis across families, by computing lod-scores given a gvcf file and a related tfam pedigree file

License

Notifications You must be signed in to change notification settings

CNRGH/LodSeq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

image

image

image

LODSEQ

LodSeq performs the genetic linkage analysis across families, by computing lod-scores given a gvcf file and a related tfam pedigree file.

Both singlepoint and multipoint parametric linkage analyses are computed using merlin (Abecasis et al. Nature Genetics 2002).

CITATION

Please cite LodSeq using the following DOI: 10.5281/zenodo.1321178.

PREREQUISITES

INSTALLATION

Download genetic maps from the HapMap project:

$ cd data/inputs/
$ wget -nc ftp://ftp.ncbi.nlm.nih.gov/hapmap/recombination/2011-01_phaseII_B37/genetic_map_HapMapII_GRCh37.tar.gz
$ mkdir genetic_map_HapMapII_GRCh37 \
  && tar -zxf genetic_map_HapMapII_GRCh37.tar.gz -C genetic_map_HapMapII_GRCh37
$ cd ../..

Once the prerequisites have been installed, configure the environment. vcftools, plink and merlin will be installed by conda:

$ conda env create -n lodseq --file environment.yaml

Then, check that the external programs have been properly installed:

$ source activate lodseq
$ which vcftools && vcftools
$ which plink && plink
$ which merlin && merlin
$ source deactivate lodseq

QUICK START

The quick start dataset was arbitrarily modified from this available dataset:

https://github.com/jmchilton/SnpEffect/tree/master/papers/Protocol/data/chr7.vcf.gz

  • The pedigree was reduced to 6 individus.
  • Some genotypes were modified replacing phased genotypes by unphased genotypes.
  • Some records were removed to reduce the final file size.

Activate the environment

$ source activate lodseq

Run the workflow (using 2 threads)

$ mkdir test/
$ snakemake -s Snakefile --configfile config.yaml -j 2

Exit

$ source deactivate lodseq

RESULTS

The lod-score results are output into directory test/.

You can choose a different output directory by changing the value of the field out_dir into config.yaml.

Please compare your results to the expected output files:

$ cd data/outputs/mergeResults/
$ ls -1
    results_multipoint_merged_dominant.txt
    results_multipoint_merged_recessive.txt
    results_singlepoint_merged_dominant.txt
    results_singlepoint_merged_recessive.txt

RESULT EXAMPLE

Parametric singlepoint analysis using a recessive model:

$ less test/runSinglePointMerlin/7/results_singlepoint_chr7_recessive.txt
Parametric Analysis, Model Recessive_Model
=======================================================
  POSITION        LOD      ALPHA       HLOD
   7:16487      0.000      0.000      0.000
   7:16671      0.000      0.000      0.000
   7:16692      0.000      0.000      0.000
   7:16712      0.000      0.000      0.000
   7:16717      0.000      0.000      0.000
   7:16719      0.000      0.000      0.000
   7:16787      0.000      0.000      0.000
   7:16798      0.000      0.000      0.000
   7:16878      0.000      0.000      0.000
   7:17881      0.000      0.000      0.000
   7:18510      0.000      0.000      0.000
   7:19181     -2.575      0.000      0.000

See documentation of merlin for a complete description.

WARNINGS

Loci with bad inheritance or loci with multichar allele variation will have a null lod-score (-0.000 or 0.000).

RUN LODSEQ ON A NEW DATASET

Edit the file config.yaml to change parameters of the workflow.

Description of the fields into config.yaml:

vcf            # input gvcf file (.vcf or .vcf.gz)
tfam           # input pedigree tfam file
dom_model      # dominant model file
rec_model      # recessive model file
genetic_maps   # path of the HapMap genetic maps directory
out_dir        # path of the directory containing LodSeq output files
out_log        # path of a log file
out_prefix     # prefix of output files
lod_threshold  # minimal significant lod-score value, must be greater than 0
threads        # number of threads used by multithread steps
chromosomes    # list of chromosomes to analyze

You can also change the parameters of the recessive and dominant models:

$ less data/inputs/parametric_dominant.model
$ less data/inputs/parametric_recessive.model

The parameters of these models are respectively:

  • affection name
  • disease allele frequency
  • penetrances
  • model name

See a complete description and examples on merlin website and here.

LODSEQ PIPELINE

image

AUTHORS

Edith Le Floch, Centre National de Recherche en Génomique Humaine - CEA, Evry, France, edith.le-floch@cea.fr

Elise Larsonneur, Centre National de Recherche en Génomique Humaine - CEA, Evry, France, elise.larsonneur@cea.fr

CONTACT

For help please contact the authors.

LICENSING

LodSeq is released under the terms of the CeCILL license, a free software license agreement adapted to both international and French legal matters that is fully compatible with the GNU GPL, GNU Affero GPL and/or EUPL license.

For further details see LICENSE file or check out http://www.cecill.info/.

About

LodSeq performs the genetic linkage analysis across families, by computing lod-scores given a gvcf file and a related tfam pedigree file

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published