
Reproducing TCGA runs

We securely reanalyzed TCGA in the cloud with Amazon Elastic MapReduce. Reproducing our TCGA runs requires dbGaP authorized access to TCGA, a Cancer Genomics Cloud (CGC) account, and an AWS account set up for running Rail-RNA on dbGaP-protected data. If you already have an AWS account, see our documentation at http://docs.rail.bio/dbgap/ for instructions on preparing it for analysis of dbGaP-protected data.

1. Clone the repo and change to this directory at the command line. Then run `python tcga_file_list.py >tcga_file_list.tsv` to obtain a list of file paths from a SPARQL query of the CGC. See the docstring of `tcga_file_list.py` for its requirements. Your list may differ from the one we obtained when we performed the query on 9/29/2016. Our list is included as `tcga_file_list.tsv` (running the command above as written overwrites it); you may skip this step and use our file directly, provided the file paths on the CGC have not changed.
2. Download and install Rail-RNA v0.2.4a, then set it up for analyzing dbGaP-protected data by following the instructions at http://docs.rail.bio/dbgap/.
3. Use the Python script `gen.py` to regenerate all the Rail-RNA manifest files (`*.manifest`) in this directory, along with the scripts that run Rail-RNA on Amazon Elastic MapReduce to preprocess (`prep_tcga_batch_*.sh`) and align (`align_tcga_batch_*.sh`) TCGA data. Refer to the docstring of `gen.py` for the precise command to execute, and be sure to change the output bucket on S3 to one you control. The script divides TCGA into 30 batches of about 380 randomly selected samples each; every batch has its own Rail-RNA manifest file, preprocess script, and alignment script. Note that `gen.py` requires an authorization token provided by the CGC. To obtain one, sign up for a CGC account, confirm dbGaP authorized access to TCGA, generate a token using the CGC web interface, and store it in a local text file.
4. For each batch `b` (a number between 0 and 29 inclusive; substitute it for `b` in the script names below), run

   ```
   sh prep_tcga_batch_b.sh
   ```

   wait for the Rail-RNA preprocess job on Elastic MapReduce to finish successfully, and then run

   ```
   sh align_tcga_batch_b.sh
   ```

5. Download all results from the S3 output bucket you chose in step 3 to a dbGaP-compliant local cluster, using either the AWS CLI or the AWS Management Console.
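`gen.py` and its docstring are the authoritative description of how batches are formed. Purely as an illustration of dividing samples into 30 randomly composed batches of about 380 each, here is a minimal Python sketch. It assumes the sample IDs are already available as a list; `split_into_batches` and the placeholder IDs are hypothetical and are not taken from `gen.py`.

```python
import random


def split_into_batches(sample_ids, n_batches=30, seed=None):
    """Shuffle sample IDs and deal them into n_batches near-equal batches.

    Hypothetical helper for illustration only; gen.py's actual sampling
    procedure may differ.
    """
    if n_batches <= 0:
        raise ValueError("n_batches must be positive")
    ids = list(sample_ids)
    rng = random.Random(seed)  # a seeded RNG makes the split reproducible
    rng.shuffle(ids)
    # Round-robin dealing: batch i gets every n_batches-th ID starting at i,
    # so batch sizes differ by at most one.
    return [ids[i::n_batches] for i in range(n_batches)]


if __name__ == "__main__":
    # Placeholder IDs; 11,400 is chosen only so that 30 batches hold 380 each.
    samples = [f"sample_{i}" for i in range(11400)]
    batches = split_into_batches(samples, seed=0)
    print(len(batches), min(map(len, batches)), max(map(len, batches)))  # 30 380 380
```

Dealing the shuffled list round-robin keeps every batch within one sample of the others, and fixing `seed` lets the same split be regenerated later.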
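The per-batch procedure (preprocess, wait for success, then align) can be scripted. The sketch below is hypothetical: `batch_commands` and `run_batch` are not part of this repository, and it assumes that `prep_tcga_batch_b.sh` returns once its Elastic MapReduce job has been submitted. It does not poll EMR; instead it asks you to confirm that the preprocess job succeeded before launching alignment.

```python
import subprocess

N_BATCHES = 30  # batches are numbered 0 through 29


def batch_commands(b):
    """Return the preprocess and align commands for batch b, in run order."""
    if not 0 <= b < N_BATCHES:
        raise ValueError(f"batch must be between 0 and {N_BATCHES - 1}, got {b}")
    return [
        ["sh", f"prep_tcga_batch_{b}.sh"],
        ["sh", f"align_tcga_batch_{b}.sh"],
    ]


def run_batch(b, confirm=input):
    """Launch preprocessing for batch b, wait for the user to confirm that the
    EMR preprocess job succeeded, then launch alignment.

    Returns True if alignment was launched, False otherwise.
    """
    prep, align = batch_commands(b)
    # check=True raises CalledProcessError if the script exits nonzero.
    subprocess.run(prep, check=True)
    # This sketch does not query EMR; check the job status yourself
    # (for example, in the AWS console) before answering.
    answer = confirm(f"Preprocess job for batch {b} finished successfully? [y/N] ")
    if answer.strip().lower() != "y":
        print(f"Not launching alignment for batch {b}.")
        return False
    subprocess.run(align, check=True)
    return True
```

For example, `run_batch(0)` handles batch 0; call it once for each batch from 0 to 29. Keeping the confirmation manual avoids guessing how the generated scripts report EMR job IDs.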
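For the final download, one option with the AWS CLI is `aws s3 sync`, which recursively copies new and updated objects from an S3 prefix to a local directory. Below is a minimal Python wrapper, assuming the CLI is installed on the dbGaP-compliant cluster and configured with credentials that can read your output bucket; `s3_sync_command` and `download_results` are hypothetical names.

```python
import subprocess


def s3_sync_command(bucket_uri, local_dir):
    """Build an `aws s3 sync` command that copies new and updated objects
    under bucket_uri into local_dir."""
    if not bucket_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {bucket_uri!r}")
    return ["aws", "s3", "sync", bucket_uri, local_dir]


def download_results(bucket_uri, local_dir):
    """Run the sync; raises CalledProcessError if the AWS CLI exits nonzero."""
    subprocess.run(s3_sync_command(bucket_uri, local_dir), check=True)
```

For example, `download_results("s3://your-output-bucket/tcga", "/secure/tcga_results")`, where both the bucket URI and the local path are placeholders. Because `sync` skips objects that are already up to date locally, an interrupted download can simply be rerun.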

Reproducing metadata tables

Run `sh tcga_query.sh`; it ultimately produces `all_cgc_metadata.tsv.gz`.
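Once `all_cgc_metadata.tsv.gz` exists, it can be loaded with the Python standard library alone. This sketch assumes the file is UTF-8, tab-separated, and has a header row, as the `.tsv.gz` extension suggests; `read_metadata` is a hypothetical helper, and no particular column names are assumed.

```python
import csv
import gzip


def read_metadata(path="all_cgc_metadata.tsv.gz"):
    """Read a gzip-compressed, tab-separated table with a header row and
    return its rows as a list of dicts keyed by column name."""
    # "rt" opens the gzip stream in text mode (UTF-8 assumed);
    # newline="" is what the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

`rows = read_metadata()` returns one dict per data row, and `list(rows[0])` gives the column names from the header.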
