dgmaghini/GENIE

GENIE

Description

These scripts parse version 4 of the GENIE database. Given a set of OncoTree codes (LUAD, LUSC, etc.) representing the cancer types of interest, parse_genie.py downloads the relevant files and builds a sample-by-mutation matrix indicating whether each sample is wild-type (0), mutated (1), or not screened (-1) for each mutation.

mutation_counting.py takes the complete mutation table output by parse_genie.py, a list of mutations of interest (a plain text file with one gene/mutation per line), and background mutations of interest (KRAS, EGFR, etc.). It outputs a tab-delimited file giving the number of samples that fall into each combination of wild-type and mutated status for the gene of interest and the background mutations.
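
The 0 / 1 / -1 encoding can be pictured with a small toy example. This is only a sketch, not the code in parse_genie.py: the sample IDs, the gene-level columns, and the `encode` helper are all hypothetical, and the real script works from the downloaded GENIE files.

```python
# Toy illustration of the wild-type (0) / mutated (1) / not screened (-1) encoding.
# All sample IDs, gene panels, and helper names here are hypothetical.
samples = {
    "S1": {"screened": {"KRAS", "EGFR", "TP53"}, "mutated": {"KRAS"}},
    "S2": {"screened": {"KRAS", "EGFR"}, "mutated": {"EGFR"}},
    "S3": {"screened": {"KRAS", "TP53"}, "mutated": set()},
}
genes = ["KRAS", "EGFR", "TP53"]


def encode(sample, gene):
    """Return -1 if the gene was not screened, 1 if mutated, 0 if wild-type."""
    if gene not in sample["screened"]:
        return -1
    return 1 if gene in sample["mutated"] else 0


# One row per sample, one column per gene.
matrix = {sid: [encode(s, g) for g in genes] for sid, s in samples.items()}

# Print as a tab-delimited table with a header row.
print("sample", *genes, sep="\t")
for sid, row in matrix.items():
    print(sid, *row, sep="\t")
```

Keeping -1 distinct from 0 matters because a sample whose panel never covered a gene cannot be counted as wild-type for it.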

Prerequisites

  • Python 3.6
  • synapseclient
  • pysftp
  • an account at Synapse.org

Installing

  • synapseclient is most easily installed with conda; it can also be installed with pip install synapseclient
  • parse_genie.py and mutation_counting.py can be downloaded and run with Python 3

Running

parse_genie.py requires a Synapse login. Run parse_genie.py with:

python3 parse_genie.py -u <synapse_username> -p <synapse_password> -c <oncotree code 1> <oncotree code 2> ...
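
For readers unfamiliar with this command shape, the sketch below shows how a flag such as -c can accept several values using Python's argparse. It is an assumption about how such an interface might be built, not the parser used in parse_genie.py; the `build_parser` function, the `dest` names, and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the -u / -p / -c flags shown above."""
    parser = argparse.ArgumentParser(prog="parse_genie.py")
    parser.add_argument("-u", dest="username", required=True,
                        help="Synapse username")
    parser.add_argument("-p", dest="password", required=True,
                        help="Synapse password")
    # nargs="+" collects one or more space-separated codes into a list.
    parser.add_argument("-c", dest="codes", nargs="+", required=True,
                        help="one or more OncoTree codes, e.g. LUAD LUSC")
    return parser


# Parse an example argument list instead of sys.argv so the sketch runs standalone.
args = build_parser().parse_args(["-u", "user", "-p", "secret", "-c", "LUAD", "LUSC"])
print(args.codes)
```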

Run mutation_counting.py with:

python3 mutation_counting.py path/to/complete_mutations*.txt path/to/listofgenes.txt output_name.txt -g1 background_gene1 -g2 background_gene2
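
The counting step can be sketched as follows: for one gene of interest and two background genes, tally how many samples fall into each wild-type/mutated combination. This is a minimal illustration under stated assumptions, not the logic of mutation_counting.py. The toy table, the `count_combinations` helper, and the choice to skip samples that were not screened (-1) for any of the genes are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical per-sample status: 1 mutated, 0 wild-type, -1 not screened.
table = {
    "S1": {"TP53": 1, "KRAS": 1, "EGFR": 0},
    "S2": {"TP53": 0, "KRAS": 1, "EGFR": 0},
    "S3": {"TP53": 1, "KRAS": 0, "EGFR": 1},
    "S4": {"TP53": 1, "KRAS": 1, "EGFR": 0},
    "S5": {"TP53": -1, "KRAS": 0, "EGFR": 0},
}


def count_combinations(table, gene, backgrounds):
    """Count samples in every 0/1 combination of gene and background genes."""
    columns = [gene, *backgrounds]
    counts = Counter()
    for statuses in table.values():
        combo = tuple(statuses[c] for c in columns)
        if -1 in combo:
            continue  # assumption: drop samples not screened for any column
        counts[combo] += 1
    # Report every combination, including those with zero samples.
    return {combo: counts[combo] for combo in product((0, 1), repeat=len(columns))}


counts = count_combinations(table, "TP53", ["KRAS", "EGFR"])

# Tab-delimited output, one row per combination.
print("TP53", "KRAS", "EGFR", "n_samples", sep="\t")
for combo, n in counts.items():
    print(*combo, n, sep="\t")
```

With one gene of interest and two background genes there are 2³ = 8 combinations, so the sketch prints eight rows per gene.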

Authors

Dylan Maghini

Acknowledgments

Thank you to Monte Winslow and the members of the Winslow lab for critical feedback and ideas on how to extend the use of these scripts.

About

For parsing out mutation data from the GENIE database
