💪 Pipeline for performing power calculations for genetic studies

DiseaseTranscriptomicsLab/power-calculations-genetics

Genetic power calculations

An R pipeline for performing power calculations (based on the chi-squared test) for genetic studies, using the pwr R package (see the pwr package vignette). Its main application is to assist users with experimental design.

Introduction

The rationale for performing power calculations is twofold. A study with low statistical power has a reduced chance of detecting a true effect. It is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect (see "Power failure: why small sample size undermines the reliability of neuroscience", published in Nature Reviews Neuroscience (2013)). The consequences include overestimated effect sizes and low reproducibility of results.


Installation

Use the environment.yaml file to create a conda environment and install the required packages. The -p flag should point to the path where the environment will be created, inside the miniconda installation. For instance, to create the power_calc_genetics environment using miniconda installed in the /miniconda directory, run the following command:

conda env create -p /miniconda/envs/power_calc_genetics --file envm/environment.yaml

Activate the newly created power_calc_genetics conda environment before running the pipeline:

conda activate power_calc_genetics

Usage

To run the pipeline, execute the power_calc_genetics.R script. This script reads the arguments from the command line and passes them to power_calc_genetics.Rmd, which performs the power calculations and produces the interactive HTML report.
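The hand-off described above could look roughly like the sketch below. It is not the repository's actual code: the helper names parse_cli_args and render_report are hypothetical, the flag parsing uses only base R, the fallback report name is a placeholder, and passing values through the params argument of rmarkdown::render assumes that power_calc_genetics.Rmd declares matching params in its YAML header.

```r
# Hypothetical sketch of a command-line wrapper around an R Markdown report.

# Parse "--name value" pairs into a named list (values stay character strings;
# a flag with no value following it is stored as TRUE).
parse_cli_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  is_flag <- grepl("^--", args)
  out <- list()
  for (i in which(is_flag)) {
    name <- sub("^--", "", args[i])
    has_value <- i < length(args) && !is_flag[i + 1]
    out[[name]] <- if (has_value) args[i + 1] else TRUE
  }
  out
}

# Render the report, forwarding the parsed arguments as R Markdown params.
# ASSUMPTION: the .Rmd file declares every forwarded name under `params:`.
render_report <- function(params, rmd = "power_calc_genetics.Rmd") {
  if (is.null(params[["report_dir"]])) stop("--report_dir is required")
  report_name <- params[["report_name"]]
  if (is.null(report_name)) report_name <- "report"  # placeholder default
  rmarkdown::render(
    input = rmd,
    output_file = paste0(report_name, ".html"),
    output_dir = params[["report_dir"]],
    params = params
  )
}

# Demonstrate the parsing step only; rendering needs rmarkdown and the .Rmd file.
cli <- parse_cli_args(c("--samples_n", "10000", "--report_dir", "output"))
str(cli)
# In a real script one would call: render_report(parse_cli_args())
```

Keeping the parsing separate from the rendering makes the argument handling easy to check without having the report template or pandoc available.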

Arguments

| Argument | Description | Required |
|---|---|---|
| --samples_n | Total number of samples | No |
| --features_n | Total number of features (used for multiple hypothesis testing adjustment) | No |
| --power | Power of the test (1 - type II error probability) | No |
| --sig_level | Significance level (type I error probability) | No |
| --deg_freedom | Degrees of freedom | No |
| --report_name | Desired name for the report | No |
| --report_dir | Desired location for the report | Yes |
| --seed | Seed for random number generation | No |
| --hide_code_btn | Hide the "Code" button that shows/hides code chunks in the final HTML report | No |

Packages: the required packages are listed in the environment.yaml file.
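To make the statistical arguments more concrete, here is a minimal base R sketch of how they relate to one another in a chi-squared power calculation. It assumes a Bonferroni adjustment (the significance level divided by the number of features), which this README does not confirm, and it takes the effect size as Cohen's w, supplied by hand purely for illustration. The function name chisq_power is hypothetical and the code is not taken from the pipeline.

```r
# Power of a chi-squared test, from the noncentral chi-squared distribution.
# w: effect size (Cohen's w); n: total number of samples; df: degrees of freedom.
chisq_power <- function(w, n, df = 1, sig_level = 0.05, features_n = 1) {
  alpha_adj <- sig_level / features_n                     # ASSUMPTION: Bonferroni
  crit <- qchisq(alpha_adj, df = df, lower.tail = FALSE)  # rejection threshold
  # Probability of exceeding the threshold when the true effect size is w.
  pchisq(crit, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# Illustrative effect sizes (not values produced by the pipeline).
w_grid <- c(0.02, 0.05, 0.08)
power_grid <- chisq_power(w_grid, n = 10000, df = 1,
                          sig_level = 0.05, features_n = 100000)
print(data.frame(w = w_grid, power = power_grid))
```

With the sample size and thresholds held fixed, power increases with the effect size, which is why a design that is adequate for large effects can still be underpowered for small ones.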

Examples

Below is a command-line example that generates a genetic power calculations report for a hypothetical dataset of 10000 samples and 100000 genetic features (variants):

conda activate power_calc_genetics

The power_calc_genetics.R script (see the beginning of the Usage section) should be executed from the scripts folder:

cd Power_calc_genetics/scripts

Rscript power_calc_genetics.R --samples_n 10000 --power 0.9 --sig_level 0.05 --deg_freedom 1 --features_n 100000 --report_name power_calc_genetics --report_dir output

The interactive HTML report, named power_calc_genetics.html, will be created in the output folder.
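As a rough cross-check of what such a run is asking, the sketch below solves for the smallest effect size that would be detectable with the example's settings (10000 samples, power 0.9, significance level 0.05, 1 degree of freedom, 100000 features). It uses base R rather than the pipeline's own code, assumes the significance level is Bonferroni-adjusted by the number of features, and expresses the effect size as Cohen's w. The report itself may present different quantities.

```r
# Settings copied from the example command above.
samples_n <- 10000
target_power <- 0.9
sig_level <- 0.05
deg_freedom <- 1
features_n <- 100000

# ASSUMPTION: Bonferroni adjustment for multiple testing.
alpha_adj <- sig_level / features_n
crit <- qchisq(alpha_adj, df = deg_freedom, lower.tail = FALSE)

# Difference between the achieved and the target power for effect size w.
power_gap <- function(w) {
  pchisq(crit, df = deg_freedom, ncp = samples_n * w^2,
         lower.tail = FALSE) - target_power
}

# Find the effect size at which the achieved power equals the target.
min_w <- uniroot(power_gap, interval = c(1e-4, 0.5), tol = 1e-8)$root

cat(sprintf("Adjusted significance level: %g\n", alpha_adj))
cat(sprintf("Minimum detectable effect size (w): %.4f\n", min_w))
```

Under these assumptions, effects smaller than the returned w would be detected with less than 90% power at this sample size.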

Note: make sure that the conda environment created during installation (see the Installation section) is activated.


Output

The pipeline generates an HTML-based genetic power calculations report in the user-defined output folder:

|
|____[output]
  |____[power_calc_genetics].html
  |____[power_calc_genetics].md

Note: the [power_calc_genetics].md file is a markdown (md) file containing a plain text representation of the content before it's formatted into the .html report.

