

Usage

If you use or are inspired by code from this repo, please cite the related manuscripts and data:

  • Zenodo - contains an archived release of this repository

    Lind B (2021) GitHub.com/brandonlind/cmh_test: preprint release (Version 1.0.0). Zenodo. http://doi.org/10.5281/zenodo.5083798
    

cmh_test

Calculate Cochran-Mantel-Haenszel chi-squared tests on stratified contingency tables built from varscan_pipeline outfiles, using ipcluster engines to parallelize the calculations.

Each stratum is a population. Each population has a "case" pool and a "control" pool. Together, these case and control pools make the contingency table. Each contingency table is 2x2 - case and control x REF and ALT allele counts.

ALT and REF allele counts are calculated by multiplying the ploidy of the pool by either the ALT freq or (1-ALT_freq), respectively.
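The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code in cmh_test.py: the function names `allele_counts` and `cmh_test` are hypothetical, ALT frequencies are assumed to already be proportions between 0 and 1 (not VarScan percent strings), counts are left unrounded, and the statistic is the textbook CMH chi-squared with 1 degree of freedom and no continuity correction. Whether the script rounds counts or applies a correction is not stated here.

```python
from math import erfc, sqrt


def allele_counts(ploidy, alt_freq):
    """Return (REF, ALT) counts for one pool: ploidy * (1 - ALT_freq), ploidy * ALT_freq."""
    return ploidy * (1 - alt_freq), ploidy * alt_freq


def cmh_test(tables):
    """CMH chi-squared (1 df, no continuity correction) over a list of 2x2 tables.

    Each table is ((case_ref, case_alt), (control_ref, control_alt)),
    one table per stratum (population).
    """
    delta = 0.0     # sum over strata of observed minus expected case REF count
    variance = 0.0  # sum over strata of the hypergeometric variance
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        delta += a - (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    # Note: variance is zero (division error) if a SNP is fixed in every stratum.
    statistic = delta ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 df.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical input: two populations, each with a case and a control pool.
# (case ploidy, case ALT freq, control ploidy, control ALT freq)
strata = [
    (40, 0.50, 40, 0.25),
    (40, 0.50, 40, 0.25),
]
tables = [
    (allele_counts(case_ploidy, case_freq), allele_counts(ctrl_ploidy, ctrl_freq))
    for case_ploidy, case_freq, ctrl_ploidy, ctrl_freq in strata
]
statistic, p_value = cmh_test(tables)
print(f"CMH chi-squared = {statistic:.3f}, p = {p_value:.4g}")
```

Each stratum contributes one 2x2 table, so a SNP scored in more populations adds terms to both sums rather than adding degrees of freedom.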

Assumed environment

This code was written and tested with Python 3.7.6. Python 3.8 appeared to have issues with the parallelization implementation; this has not been addressed in the current version.

Module versions used can be mirrored with pip install -r requirements.txt

Usage

usage: cmh_test.py [-h] -i INPUT -o OUTDIR --case CASE --control CONTROL -p
                   PLOIDYFILE -e ENGINES [--ipcluster-profile PROFILE]

optional arguments:
  -h, --help            show this help message and exit
  --ipcluster-profile PROFILE
                        The ipcluster profile name with which to start engines. Default: 'default'

required arguments:
  -i INPUT, --input INPUT
                        /path/to/VariantsToTable_output.txt
                        It is assumed that there is either a 'locus' or 'unstitched_locus' column.
                        The 'locus' column elements are the hyphen-separated
                        CHROM-POS. If the 'unstitched_chrom' column is present, the code will use the
                        'unstitched_locus' column for SNP names, otherwise 'locus'. The
                        'unstitched_locus' elements are therefore the hyphen-separated
                        unstitched_chrom-unstitched_pos. FREQ columns from VarScan are also
                        assumed.
  -o OUTDIR, --outdir OUTDIR
                        /path/to/cmh_test_output_dir/
                        File output from cmh_test.py will be saved in the outdir, with the original
                        name of the input file, but with the suffix "_CMH-test-results.txt"
  --case CASE           The string present in every column for pools in "case" treatments.
  --control CONTROL     The string present in every column for pools in "control" treatments.
  -p PLOIDYFILE, --ploidy PLOIDYFILE
                        /path/to/the/ploidy.pkl file output by the VarScan pipeline. This is a python
                        dictionary with key=pool_name, value=dict with key=pop, value=ploidy. The code
                        will prompt for pool_name if necessary.
  -e ENGINES, --engines ENGINES
                        The number of ipcluster engines that will be launched.
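To make the expected layout of the ploidy file concrete, here is a small sketch of the nested dictionary described for -p/--ploidy. The pool name, population names, and ploidy values are made up for illustration, and `load_ploidy` is a hypothetical helper: it raises an error when the pool name is ambiguous, whereas cmh_test.py is described as prompting for it.

```python
import pickle
import tempfile
from pathlib import Path


def load_ploidy(path, pool_name=None):
    """Return the {pop: ploidy} dict for one pool_name from a pickled ploidy file."""
    with open(path, "rb") as handle:
        ploidy = pickle.load(handle)
    if pool_name is None:
        # Only unambiguous when the file holds a single pool_name.
        if len(ploidy) != 1:
            raise ValueError(f"pool_name required; choose one of {sorted(ploidy)}")
        pool_name = next(iter(ploidy))
    return ploidy[pool_name]


# Hypothetical contents: key=pool_name, value=dict with key=pop, value=ploidy.
example = {
    "example_pool": {
        "popA_case": 40,
        "popA_control": 40,
        "popB_case": 38,
        "popB_control": 42,
    }
}

with tempfile.TemporaryDirectory() as tmpdir:
    pkl = Path(tmpdir) / "ploidy.pkl"
    with open(pkl, "wb") as handle:
        pickle.dump(example, handle)
    pop_ploidy = load_ploidy(pkl)

print(pop_ploidy)
```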