This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Post processing of Somatic Variation Models

zskidmor edited this page Apr 30, 2014 · 3 revisions


Overview

Somatic-variation models can be post-processed with the following Genome Model Tool command: 'gmt somatic process-somatic-variation'

This tool deduplicates variant calls, filters out off-target sites, tiers the variants, and adds dbSNP and global minor allele frequency (GMAF) information, among other steps. It can also generate the XML and BED files needed for manual review.

When processing a large number of samples, you can give every model the same output directory. Each model's output is stored in a subdirectory named after the sample, and the manual-review files are all grouped together in a review/ subdirectory.

This tool has many options, but a typical command might look like this:

$ gmt somatic process-somatic-variation --somatic-variation-model-id 12345678 --output-dir somatic-validation --add-dbsnp-and-gmaf --add-tiers --restrict-to-target-regions --create-review-files --tiers-to-review 1 --igv-reference-name=b37
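
For a batch of models that share one output directory, the typical command can be generated in a loop. This is a minimal sketch: the model IDs are placeholders, the build_cmd helper is hypothetical, and the flags are copied unchanged from the command above. The script only prints the commands so they can be inspected before anything runs.

```shell
#!/bin/sh
# Print the post-processing command for one model.
# $1 = somatic-variation model ID, $2 = shared output directory.
# Flags are copied from the typical command; adjust them to your needs.
build_cmd() {
  printf '%s\n' "gmt somatic process-somatic-variation --somatic-variation-model-id $1 --output-dir $2 --add-dbsnp-and-gmaf --add-tiers --restrict-to-target-regions --create-review-files --tiers-to-review 1 --igv-reference-name=b37"
}

# Placeholder model IDs; replace with real somatic-variation model IDs.
for model_id in 12345678 23456789; do
  build_cmd "$model_id" somatic-validation
done
```

Once the printed commands look right, pipe the script's output to sh to run them one after another.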

Options that may be useful to you:

  • --add-dbsnp-and-gmaf appends columns with dbSNP IDs and global minor allele frequency (GMAF)
  • --add-tiers appends a column containing the tier of each variant
  • --create-review-files generates the BED and XML files needed for manual review
  • --tiers-to-review chooses which tiers of variants are placed into the BED files for review (default: 1)
  • --igv-reference-name provides the reference name for the IGV session (most commonly: b37)
  • --get-readcounts appends read counts from the normal and tumor BAMs for each variant
  • --restrict-to-target-regions keeps only calls in target regions (as specified by the target_regions on the build)

Less commonly used options:

  • --filter-sites Pass in a list of variants to remove.
  • --filter-regions Pass in a list of regions from which all variant calls will be removed.
  • --sites-to-pass Pass in a list of sites that should be output, even if they would otherwise be filtered.
  • --required-snv-callers If set to a value N greater than 1, requires that N variant callers independently call a variant before it is reported. Occasionally useful for filtering noisy data down to a manageable list.
  • --sample-name This name will be used for the output directory, instead of the subject_name from the model.
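
As an illustration of two of these options, the sketch below builds a command that requires two callers to agree on each variant and overrides the output directory name. The model ID, the sample name, and the choice of 2 are assumptions made for the example, not recommended values. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder.
model_id=12345678
sample_name=TUMOR_SAMPLE_01   # hypothetical name, used instead of subject_name
min_callers=2                 # report a variant only if two callers agree

# Backslash-newlines inside the quotes join this into a single line.
cmd="gmt somatic process-somatic-variation \
--somatic-variation-model-id $model_id \
--output-dir somatic-validation \
--required-snv-callers $min_callers \
--sample-name $sample_name"

# Print the command for inspection before running it.
echo "$cmd"
```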
    

Output:

  • samplename/
      • snvs.indels.annotation: final output, containing filtered and annotated variants. Other columns are appended based on the options above (tier, dbSNP, read counts, etc.)
      • snvs.indels.annotation.xls: the same data as above, converted to xls format
      • snvs/: intermediate files from filtering, annotation, etc.
      • indels/: intermediate files from filtering, annotation, etc.
  • review/: files generated for manual review
      • samplename.xml
      • samplename.bed
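
After a run, a quick check that the expected files exist can catch failed or partial jobs. The sketch below assumes the layout listed above sits directly under the directory given to --output-dir, and that --create-review-files was used (otherwise the review/ files will be reported as missing). The check_outputs function is a hypothetical helper, not part of gmt.

```shell
#!/bin/sh
# Report any expected output file that is missing for one sample.
# $1 = directory passed to --output-dir, $2 = sample name.
# Returns 0 if all files exist, 1 otherwise.
check_outputs() {
  out_dir=$1
  sample=$2
  missing=0
  for f in \
    "$out_dir/$sample/snvs.indels.annotation" \
    "$out_dir/$sample/snvs.indels.annotation.xls" \
    "$out_dir/review/$sample.xml" \
    "$out_dir/review/$sample.bed"
  do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_outputs somatic-validation SAMPLE_NAME prints one line per missing file and returns a non-zero status if anything is absent.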
    