runs/gtex at master · nellore/runs

README.md

If anything below doesn't make sense, ask us questions: .

Redoing GTEx Rail-RNA runs on Amazon Elastic MapReduce

1. Install Rail-RNA v0.2.1, which is available for download here.
2. Follow the instructions here to set up an Amazon Web Services (AWS) account with an Identity and Access Management (IAM) user configured to analyze dbGaP data securely. Name the CloudFormation stack dbgap-1 rather than dbgap, the name those instructions recommend. The secure bucket created with the CloudFormation template is referred to as s3://gtex-bucket here.
3. To facilitate submitting job flows in multiple availability zones of the US Standard region (i.e., us-east-1), create three more CloudFormation stacks as described here, except with this CloudFormation template, which requires specifying an availability zone for the public subnet into which the Rail-RNA Elastic MapReduce cluster will be launched. Choose from among us-east-1a, us-east-1b, us-east-1c, us-east-1d, and us-east-1e, and name the stacks dbgap-2, dbgap-3, and dbgap-4.
4. Download the dbGaP repository key granting access to GTEx data. It should have the extension .ngc and is referred to as /path/to/dbgap/key.ngc here.
5. Run

       python gen.py \
         --s3-bucket s3://gtex-bucket \
         --prep-stack-names <one or more of the dbgap-* stack names above separated by spaces> \
         --align-stack-names <one or more of the dbgap-* stack names above separated by spaces> \
         --dbgap-key /path/to/dbgap/key.ngc

   to generate scripts for preprocessing and aligning GTEx data. This produces 60 scripts representing a partitioning of GTEx RNA-seq data into 30 batches: 30 for preprocessing and 30 for aligning.
6. Run the scripts generated in the previous step to submit job flows to Elastic MapReduce. For each k from 0 to 29 inclusive, run prep_gtex_batch_k.sh and wait for its job flow to complete before running the corresponding align_gtex_batch_k.sh, which aligns the data that was preprocessed and uploaded to S3. We recommend submitting no more than three preprocessing job flows at a time. If Elastic MapReduce complains that there aren't enough IPs in the subnet of a VPC in a given availability zone to launch more job flows, edit the shell scripts to change the argument of --stack-name in the rail-rna command as necessary.
7. Use this script to download all results from S3 to local storage. Command-line parameters are described in its comments.
8. Compute the total number of reads across samples with the script total.sh. Its only command-line parameter is the local GTEx output directory specified in the previous step, which is where all analysis results should have been dumped. The figure we obtained was 896,466,227,499. Here, "read" refers to a mate for paired-end samples.
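The ordering constraint on the generated scripts can be sketched in Python. This is a minimal illustration, not part of the repository: the function name plan_waves and the idea of grouping batches into "waves" are assumptions made here. It only prints a submission order. It does not submit anything, and job flow completion still has to be confirmed in Elastic MapReduce before moving on. Finishing a whole wave before starting the next is a simplification; the instructions only require that each prep job flow finish before its own align script runs and that at most three preprocessing job flows be submitted at a time.

```python
def plan_waves(n_batches=30, max_concurrent_prep=3):
    """Group batch indices into waves of at most max_concurrent_prep batches.

    Hypothetical helper: within a wave, all prep scripts are listed before
    any align script, so align_gtex_batch_k.sh never precedes
    prep_gtex_batch_k.sh.
    """
    waves = []
    for start in range(0, n_batches, max_concurrent_prep):
        ks = range(start, min(start + max_concurrent_prep, n_batches))
        waves.append({
            "prep": [f"prep_gtex_batch_{k}.sh" for k in ks],
            "align": [f"align_gtex_batch_{k}.sh" for k in ks],
        })
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(plan_waves()):
        print(f"wave {i}: submit {', '.join(wave['prep'])}")
        print(f"  once those job flows complete, submit {', '.join(wave['align'])}")
```

With the defaults this yields ten waves of three batches each, covering batches 0 through 29.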

Reproducing figures from the Rail-dbGaP paper

Figure 1: security architecture

This figure was generated using Keynote; see security_figure.key.

Figures 2 and 3: core activity and costs

Run the Mathematica 10 notebook rail_dbgap_plots.nb. It uses costs.csv, which contains costs downloaded from the AWS Cost Explorer, and activity.tsv, which contains the start and end times of all GTEx preprocess and align job flows. The file activity.tsv was generated with reconstruct_activity.py from the saved Elastic MapReduce web interface HTML files in logs/. If you don't have Mathematica, check rail_dbgap_plots.pdf for the notebook's output.
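For readers without Mathematica, a rough look at job flow activity is possible with a few lines of Python. The sketch below is an assumption-laden illustration, not what the notebook does: the column layout (name, start time, end time), the timestamp format, and the function name summarize_activity are all hypothetical, so check them against the real activity.tsv before use. It reports total job flow hours and the peak number of job flows running at once, which is a simpler quantity than the core counts plotted in the paper.

```python
import csv
import io
from datetime import datetime


def summarize_activity(tsv_text, time_format="%Y-%m-%d %H:%M:%S"):
    """Return (total_hours, peak_concurrent_job_flows).

    Assumes (hypothetically) one job flow per line with three tab-separated
    fields: name, start time, end time.
    """
    events = []
    total_seconds = 0.0
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        _name, start, end = row
        t0 = datetime.strptime(start, time_format)
        t1 = datetime.strptime(end, time_format)
        total_seconds += (t1 - t0).total_seconds()
        events.append((t0, 1))   # a job flow starts
        events.append((t1, -1))  # a job flow ends
    # Tuples sort by time, then by delta, so an end (-1) at a given instant
    # is counted before a start (+1) at the same instant.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return total_seconds / 3600.0, peak


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from activity.tsv.
    sample = (
        "prep_0\t2000-01-01 10:00:00\t2000-01-01 12:00:00\n"
        "align_0\t2000-01-01 12:00:00\t2000-01-01 15:00:00\n"
        "prep_1\t2000-01-01 11:00:00\t2000-01-01 13:00:00\n"
    )
    hours, peak = summarize_activity(sample)
    print(f"total job flow hours: {hours}, peak concurrent job flows: {peak}")
```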
