
This code takes output from the VarScan pipeline and calculates Cochran-Mantel-Haenszel chi-squared tests on stratified contingency tables of read counts standardized by population ploidy.

CoAdapTree/cmh_test

 
 



Usage

If you use or are inspired by code from this repo, please cite related manuscripts and data:

  • Zenodo - contains an archived release of this repository - DOI

    Lind B (2021) GitHub.com/brandonlind/cmh_test: preprint release (Version 1.0.0). Zenodo. http://doi.org/10.5281/zenodo.5083798
    

cmh_test

cmh_test.py calculates Cochran-Mantel-Haenszel chi-squared tests on stratified contingency tables built from varscan_pipeline outfiles, using ipcluster engines to parallelize the calculations.

Each stratum is a population. Each population has a "case" pool and a "control" pool. Together, these case and control pools make the contingency table. Each contingency table is 2x2 - case and control x REF and ALT allele counts.

ALT and REF allele counts are calculated by multiplying the ploidy of the pool by the ALT frequency (ALT_freq) or by (1 - ALT_freq), respectively.
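The table construction and test described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in cmh_test.py: the function names `allele_counts` and `cmh_chi2`, the example ploidies and frequencies, and the optional continuity correction are all hypothetical choices made for the sketch, and the p-value uses the chi-squared distribution with 1 degree of freedom.

```python
import math


def allele_counts(ploidy, alt_freq):
    """Return (REF, ALT) counts for one pool: ploidy scaled by allele frequency."""
    alt = ploidy * alt_freq
    ref = ploidy * (1 - alt_freq)
    return ref, alt


def cmh_chi2(tables, correction=False):
    """Cochran-Mantel-Haenszel chi-squared (1 df) over 2x2 tables.

    Each table is ((a, b), (c, d)): rows are case and control pools,
    columns are REF and ALT counts. Returns (statistic, p_value).
    """
    delta = 0.0     # sum over strata of observed minus expected for cell a
    variance = 0.0  # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n <= 1:
            continue  # a stratum this small carries no information
        delta += a - (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return float("nan"), float("nan")
    # Optional continuity correction, capped so it never exceeds |delta|.
    yates = min(0.5, abs(delta)) if correction else 0.0
    stat = (abs(delta) - yates) ** 2 / variance
    # Survival function of a chi-squared variable with 1 degree of freedom.
    pval = math.erfc(math.sqrt(stat / 2))
    return stat, pval


# Hypothetical data: two populations (strata), each with a case and a control
# pool given as (ploidy, ALT frequency).
pools = [((40, 0.50), (40, 0.25)), ((40, 0.50), (40, 0.25))]
tables = [(allele_counts(*case), allele_counts(*control)) for case, control in pools]
stat, pval = cmh_chi2(tables)
print(f"CMH chi-squared = {stat:.4f}, p = {pval:.4g}")
```

Because the counts are ploidy-scaled frequencies rather than raw read depths, each stratum's weight in the test reflects the number of chromosomes pooled, not the sequencing coverage.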

Assumed environment

This code was written and tested with Python 3.7.6. Python 3.8 appeared to have issues with the parallelization implementation; these issues have not been addressed in the current version.

Module versions used can be mirrored with pip install -r requirements.txt

Usage

usage: cmh_test.py [-h] -i INPUT -o OUTDIR --case CASE --control CONTROL -p
                   PLOIDYFILE -e ENGINES [--ipcluster-profile PROFILE]

optional arguments:
  -h, --help            show this help message and exit
  --ipcluster-profile PROFILE
                        The ipcluster profile name with which to start engines. Default: 'default'

required arguments:
  -i INPUT, --input INPUT
                        /path/to/VariantsToTable_output.txt
                        It is assumed that there is either a 'locus' or 'unstitched_locus' column.
                        The 'locus' column elements are the hyphen-separated
                        CHROM-POS. If the 'unstitched_chrom' column is present, the code will use the
                        'unstitched_locus' column for SNP names, otherwise 'locus'. The
                        'unstitched_locus' elements are therefore the hyphen-separated
                        unstitched_chrom-unstitched_pos. FREQ columns from VarScan are also
                        assumed.
  -o OUTDIR, --outdir OUTDIR
                        /path/to/cmh_test_output_dir/
                        File output from cmh_test.py will be saved in the outdir, with the original
                        name of the input file, but with the suffix "_CMH-test-results.txt"
  --case CASE           The string present in every column for pools in "case" treatments.
  --control CONTROL     The string present in every column for pools in "control" treatments.
  -p PLOIDYFILE, --ploidy PLOIDYFILE
                        /path/to/the/ploidy.pkl file output by the VarScan pipeline. This is a python
                        dictionary with key=pool_name, value=dict with key=pop, value=ploidy. The code
                        will prompt for pool_name if necessary.
  -e ENGINES, --engines ENGINES
                        The number of ipcluster engines that will be launched.
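
The help text above describes two inputs that are easy to get wrong: the nested layout of the ploidy.pkl dictionary and the --case/--control strings used to pick pool columns. The sketch below illustrates both under stated assumptions. The pool name, population names, ploidy values, and the helper `freq_columns` are hypothetical, and the `.FREQ` column suffix is an assumption about the input table rather than something confirmed by this README.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical ploidy dictionary: key=pool_name, value=dict of key=pop, value=ploidy.
ploidy = {
    "example_pool": {
        "pop1_case": 40,
        "pop1_control": 40,
        "pop2_case": 36,
        "pop2_control": 38,
    }
}

# Round-trip the dictionary through a pickle file, as a ploidy.pkl would be read.
with tempfile.TemporaryDirectory() as tmpdir:
    pkl_path = Path(tmpdir) / "ploidy.pkl"
    with open(pkl_path, "wb") as handle:
        pickle.dump(ploidy, handle)
    with open(pkl_path, "rb") as handle:
        loaded = pickle.load(handle)


def freq_columns(columns, label):
    """Return the FREQ columns whose name contains `label` (e.g. the --case string)."""
    return [col for col in columns if col.endswith(".FREQ") and label in col]


# Hypothetical header of the input table (assumes '<pop>.FREQ' column names).
columns = [
    "locus",
    "pop1_case.FREQ",
    "pop1_control.FREQ",
    "pop2_case.FREQ",
    "pop2_control.FREQ",
]
case_cols = freq_columns(columns, "case")
control_cols = freq_columns(columns, "control")

# Look up the ploidy for each selected column by stripping the assumed suffix.
for col in case_cols + control_cols:
    pop = col[: -len(".FREQ")]
    print(col, loaded["example_pool"][pop])
```

With this kind of substring matching, the --case and --control strings must each appear in every column for their own treatment and in none for the other, or pools would end up in the wrong row of the contingency table.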
