mikegloudemans/insulin-resistance-colocalization

Insulin Resistance Colocalization Analysis

Analysis performed by Mike Gloudemans and Brunilda Balliu

With contributions from Daniel Nachun, Matthew Durrant, Martin Wabitsch, Erik Ingelsson, Thomas Quertermous, Joshua Knowles, Stephen Montgomery, and Ivan Cárcamo-Oribe

You can view PDFs of the full heatmaps for all tested loci at https://zenodo.org/record/4659095.

Summary

This project contains the scripts required to perform the colocalization analyses described in the paper.

A generalized tool for generating similar heatmaps for any set of GWAS and QTL studies is at https://github.com/mikegloudemans/post-coloc-toolkit/.

The top-level project directory should contain the folders bin, data, output, tmp, and scripts.

The complete analysis can be run sequentially from pipe.sh in the top-level project directory. NOTE: All scripts can ONLY be run from this directory, because their file paths are specified relative to it; run from anywhere else, they will be unable to locate the required files.
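Because every path is relative to the top-level directory, a quick sanity check before launching pipe.sh can save a confusing failure partway through. The sketch below is only an illustration and is not part of this repository; the helper name missing_project_dirs is hypothetical, and the folder list is the one given above.

```python
from pathlib import Path

# Folders the top-level project directory is expected to contain.
REQUIRED_DIRS = ("bin", "data", "output", "tmp", "scripts")


def missing_project_dirs(root="."):
    """Return the required folders that are absent from `root`.

    An empty list means `root` looks like the top-level project directory.
    (Hypothetical helper, not part of the original pipeline.)
    """
    root_path = Path(root)
    return [name for name in REQUIRED_DIRS if not (root_path / name).is_dir()]
```

For example, calling missing_project_dirs() from the directory you plan to run pipe.sh in should return an empty list; anything else suggests you are in the wrong directory or a folder still needs to be created.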

Required Tools / Dependencies

The following analyses were performed using some other publicly available tools. To fully complete the analyses, you will have to install or link these tools within the bin subfolder, or modify the pipe.sh script to include the paths to the directories where these tools are installed.
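One way to honor "install or link these tools within the bin subfolder, or point to where they are installed" is to look in bin first and fall back to the system PATH. This is a minimal sketch under that assumption; find_tool is a hypothetical helper, and the tool names you pass to it depend on which external tools you are wiring up.

```python
import shutil
from pathlib import Path


def find_tool(name, bin_dir="bin"):
    """Locate an external tool by name (hypothetical helper).

    Prefers a file, directory, or symlink called `name` inside the project's
    bin/ folder; otherwise searches the directories on $PATH.
    Returns the path as a string, or None if the tool cannot be found.
    """
    local = Path(bin_dir) / name
    if local.exists():
        return str(local)
    # shutil.which returns None when no executable with this name is on PATH.
    return shutil.which(name)
```

Checking bin first mirrors the layout described above, so a symlink placed there takes priority over any system-wide install with the same name.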

Getting started

  • An hg38-formatted version of the 1000 Genomes VCF is required for computing allele frequencies in a reference population.
  • GWAS summary statistics are publicly available; consistently formatted versions of these and other GWAS can be downloaded directly.
  • GTEx v8 QTL association statistics can be downloaded from the GTEx Portal. Some minor pre-processing is required before these scripts can use them; this process is described here.
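To illustrate what "computing allele frequencies in a reference population" from the 1000 Genomes VCF involves, the sketch below counts ALT alleles in the GT field of a single VCF data line. It assumes a biallelic site (any non-zero allele is counted as ALT) and skips missing calls; it is an illustration, not the code this project uses, and the function name is hypothetical.

```python
def alt_allele_frequency(vcf_line):
    """Estimate the ALT allele frequency for one biallelic VCF data line.

    Reads each sample's GT subfield (located via the FORMAT column),
    treats phased '|' and unphased '/' genotypes alike, and skips missing
    alleles ('.'). Returns None if no alleles were called.
    (Hypothetical helper for illustration only.)
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    alt_count = 0
    total = 0
    for sample in fields[9:]:  # sample columns start at column 10
        genotype = sample.split(":")[gt_index]
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            total += 1
            if allele != "0":
                alt_count += 1
    return alt_count / total if total else None
```

For a genome-wide VCF, a dedicated tool such as bcftools will be far faster; the sketch only shows the counting logic, and restricting the sample columns to one super-population would give a population-specific frequency.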

All required data

If you're having trouble accessing any of these data, please contact me (see Contact section below) and I can quickly point you to the right location to obtain them, or set up a direct transfer.

To run all the scripts listed here, the data folder must contain all of the following files:

  • data/1KG: 1KG VCF for hg38, publicly available for download as described above.
  • data/eqtls: GTEx eQTLs for v8, and any other QTLs of interest, downloadable from GTEx Portal as described above.
  • data/gwas: All publicly available GWAS summary statistics, downloadable as described above. If already re-formatted for colocalization, place them in a formatted subdirectory; if not, place them in a raw subdirectory.
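The formatted/raw convention for data/gwas means a script can tell which summary statistics still need re-formatting just by looking at the subdirectories. A minimal sketch, assuming one file per GWAS and the two subdirectory names given above; gwas_files_by_status is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def gwas_files_by_status(gwas_dir="data/gwas"):
    """Group GWAS summary-statistic files by formatting status.

    Returns {"formatted": [...], "raw": [...]} with sorted file names from
    data/gwas/formatted and data/gwas/raw. A missing subdirectory yields an
    empty list. (Hypothetical helper for illustration only.)
    """
    result = {}
    for status in ("formatted", "raw"):
        subdir = Path(gwas_dir) / status
        if subdir.is_dir():
            result[status] = sorted(p.name for p in subdir.iterdir() if p.is_file())
        else:
            result[status] = []
    return result
```

Anything listed under "raw" would need to go through the re-formatting step before colocalization.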

The remaining files are not strictly necessary, but they were required to obtain the full results of the post-processing steps.

  • data/cadd: Should contain a CADD file with VEP consequence predictions for every possible variant. May be skipped if necessary.
  • data/hgnc: The file mart_export.txt should contain a basic mapping of Ensembl gene IDs to HGNC names, obtained through Biomart. This step can be omitted if such a file is not present.
  • data/ld: Pairwise LD scores from 1000 Genomes Phase 1, used to select LD buddies (variants in high LD with a given variant) for the Variant Effect Predictor annotation step. This step can also be skipped if necessary.
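Since data/hgnc/mart_export.txt is optional, code that labels genes should still work when the file is absent. The sketch below assumes a tab-delimited Biomart export with a header row, the Ensembl gene ID in the first column and the HGNC symbol in the second; those column positions and both function names are assumptions, not taken from this repository. It also strips version suffixes, since GTEx v8 reports versioned Ensembl IDs (e.g. ENSG00000141510.16).

```python
import csv
from pathlib import Path


def load_gene_names(path="data/hgnc/mart_export.txt"):
    """Read an Ensembl-ID -> HGNC-symbol mapping from a Biomart export.

    Assumes tab-delimited columns (Ensembl gene ID, HGNC symbol) after one
    header row; rows without a symbol are skipped. Returns an empty dict if
    the file is absent, since this annotation step is optional.
    (Hypothetical helper for illustration only.)
    """
    mapping = {}
    file_path = Path(path)
    if not file_path.is_file():
        return mapping
    with file_path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2 and row[1]:
                mapping[row[0]] = row[1]
    return mapping


def gene_label(ensembl_id, mapping):
    """Return the HGNC symbol for an Ensembl ID, ignoring any version suffix.

    Falls back to the unversioned Ensembl ID when no symbol is known.
    """
    base_id = ensembl_id.split(".")[0]
    return mapping.get(base_id, base_id)
```

Falling back to the Ensembl ID keeps heatmap labels and output tables complete even when the mapping file has not been downloaded.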

About the scripts

I've broadly organized the scripts into "pre-coloc", "colocalization", and "post-coloc" sections. Most of the code in this project is focused on the "post-coloc" analysis, since the other two parts of the analysis lean heavily on code from other projects (described and linked above).

Scripts to generate figures for the paper are in a dedicated folder.

Many of these scripts have wide-ranging applications but are currently tailored to our IR-specific analysis. Feel free to contact me if you're trying to adapt a particular script to your own analysis; some customization may be necessary, but it's certainly doable.

The general Snakemake pipeline "post-coloc-toolkit" (https://github.com/mikegloudemans/post-coloc-toolkit/) works not only on our IR data but on any set of GWAS and QTL summary statistics!

Contact

For any questions about this pipeline or about the colocalization-related analyses for this project, please contact Mike Gloudemans (mgloud@stanford.edu). I'll be glad to help you get these analyses up and running!

Quick note on research ethics

We're making this code and the associated data available in hopes that it will help to further biomedical research. Please be mindful of the ethical implications of your intended application, and use it for good :)

About

Colocalization analyses of insulin resistance-related GWAS
