Comparing Treatments with Multiple Paired End Replicates #125

Open
raboul101 opened this issue Jun 25, 2018 · 11 comments
raboul101 commented Jun 25, 2018

I have an ATACseq data set that includes three different treatments, each with three biological replicates. I have paired-end fastq files for each replicate. The question is: Can the fastqs for each treatment be run through the pipeline simultaneously, or must they be run separately and then compared through post-processing?

If treatments can be run simultaneously, could you provide an example of how to properly phrase the BDS command? For more clarification, see below --

The Usage section of "https://github.com/kundajelab/atac_dnase_pipelines" states the following:

"For multiple replicates (PE), specify fastqs with -fastq[REP_ID]_[PAIR_ID]. Add -fastq[]_[] for each replicate and pair to the command line:

-fastq1_1 [READ_REP1_PAIR1] -fastq1_2 [READ_REP1_PAIR2] -fastq2_1 [READ_REP2_PAIR1] -fastq2_2 [READ_REP2_PAIR2] .."

This seems to suggest that one can only enter bioreps for one treatment, e.g. -fastq1_1 trt1_rep1_R1.fastq.gz -fastq1_2 trt1_rep1_R2.fastq.gz -fastq2_1 trt1_rep2_R1.fastq.gz and so on. I don't see any clear way to denote treatment. An example of this would be very helpful, if possible.

@akundaje (Contributor) commented

You should run each of the treatments separately (bioreps for each treatment together). Then use a differential analysis package to identify differential peaks: take the union of the naive overlap peaks across all conditions as your complete set of peaks, quantify read counts in each peak for every replicate and treatment, and run those counts through DESeq2, edgeR, or some other differential count analysis method.
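A rough sketch of those steps might look like the script below. All file names, output layouts, and flag values are hypothetical (adapt them to your own data), and the `run` helper only prints each command rather than executing it:

```shell
#!/usr/bin/env bash
# Dry-run sketch: `run` prints each command; remove the echo to execute.
run() { echo "$@"; }

# 1) One pipeline invocation per treatment, with all bioreps of that
#    treatment passed together via -fastq[REP_ID]_[PAIR_ID].
for trt in trt1 trt2 trt3; do
  run bds atac.bds -species hg38 -out_dir "out_${trt}" \
    -fastq1_1 "${trt}_rep1_R1.fastq.gz" -fastq1_2 "${trt}_rep1_R2.fastq.gz" \
    -fastq2_1 "${trt}_rep2_R1.fastq.gz" -fastq2_2 "${trt}_rep2_R2.fastq.gz" \
    -fastq3_1 "${trt}_rep3_R1.fastq.gz" -fastq3_2 "${trt}_rep3_R2.fastq.gz"
done

# 2) Union of the naive-overlap peaks across all conditions.
run "zcat out_trt*/peak/naive_overlap.narrowPeak.gz | sort -k1,1 -k2,2n | bedtools merge > union_peaks.bed"

# 3) Read counts per union peak for every replicate/treatment BAM.
run "bedtools multicov -bams out_trt*/align/rep*/*.nodup.bam -bed union_peaks.bed > counts.txt"

# 4) Feed the resulting counts matrix to DESeq2 or edgeR in R.
```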

raboul101 commented Jun 26, 2018

Thank you. But now I have another question: I see that the pipeline version I used (downloaded sometime in April) is now deprecated. Should I abandon my previous results and go with the new pipeline?
The new version is implemented through Docker (which I have installed). However, the instructions for use are somewhat bewildering. Where is the best place to look for a clear set of usage instructions? ... I am not familiar with DNAnexus, and it appears to be a fee-based service.

@akundaje (Contributor) commented

No, you don't have to re-run the pipeline. It's the same pipeline, just dockerized, so it installs more easily on several platforms. We will improve the installation and usage instructions for the new version (@leepc12 Note: we need to improve documentation for the new version of the pipelines). I would suggest switching over when you can, because we will only be developing the Docker version going forward.

@akundaje (Contributor) commented

@raboul101 Could you give us specifics on which parts of the installation process for the new pipeline you found confusing? We are starting to improve the documentation, so it is best to get specific feedback from users. Thanks!

leepc12 commented Jun 26, 2018

https://encode-dcc.github.io/wdl-pipelines/install.html#local-computer-with-docker

@raboul101: We are sorry about that; we wanted unified documentation for all the pipelines, but that ended up confusing users. We will update the documentation. Until then, please let me know which step confused you. Also, please feel free to post issues on the new pipeline's GitHub repo (or here).

[MINICONDA3_INSTALL_DIR]: where you installed miniconda3
[WDL_PIPELINE_DIR] : where you installed the pipeline (git directory)

java -jar -Dconfig.file=backends/backend.conf cromwell-30.2.jar run atac.wdl -i input.json -o workflow_opts/docker.json

The new pipeline takes in a JSON file instead of parameters defined as command-line arguments.
input.json description: https://encode-dcc.github.io/wdl-pipelines/input_json_atac.html

You can find examples in /examples/klab. You may need to change the genome TSV file path and the paths to your FASTQs.
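For orientation, a minimal input.json might look roughly like the sketch below. The key names here (atac.genome_tsv, atac.fastqs_rep1_R1, and so on) follow the pipeline's documentation but may differ between pipeline versions, and every path is a placeholder -- copy a real template from /examples/ and keep its exact keys:

```shell
#!/usr/bin/env bash
# Writes a hypothetical minimal input.json; all paths are placeholders.
cat > input.json <<'EOF'
{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/path/to/genome/hg38.tsv",
    "atac.paired_end" : true,
    "atac.fastqs_rep1_R1" : ["trt1_rep1_R1.fastq.gz"],
    "atac.fastqs_rep1_R2" : ["trt1_rep1_R2.fastq.gz"],
    "atac.fastqs_rep2_R1" : ["trt1_rep2_R1.fastq.gz"],
    "atac.fastqs_rep2_R2" : ["trt1_rep2_R2.fastq.gz"],
    "atac.title" : "trt1"
}
EOF
```

The cromwell command from the earlier comment then picks this file up via `-i input.json`.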

raboul101 commented Jun 28, 2018

Sorry for the late reply. Since I already have results with the old pipeline, I haven't proceeded with installing the docker-based pipeline. However, what was confusing was the input.json file. As I understand the new process, you
1: download the genome data (along with the associated genome.tsv -- this downloads along with the genome, correct?),
2: add this genome.tsv, along with input files and desired options, to the input.json file,
3: run the pipeline with the command listed in the previous comment (above).

So,

Where does one obtain a template input.json, or if it needs to be created de novo, what is the proper format?

What is the backend.conf file, and/or where is it?

My main hang-up is where to get, or how to create, the .json; I think clearing that up will help greatly. And thank you for putting these pipelines together, they are a great resource.

leepc12 commented Jun 28, 2018

There are many template input JSON files in /examples/ (one per platform; pick any JSON in /examples/klab/ for running the pipeline locally).

backend.conf is in /backends/.

We strongly recommend that users run the pipeline with Docker so that annoying dependency issues do not occur.

Sorry, I am still working on the documentation and will update it soon.

@raboul101 (Author) commented

What is the full path for those JSON examples? I don't see them in github: kundajelab/atac_dnase_pipelines/examples

leepc12 commented Jun 28, 2018

The new pipeline repo is https://github.com/ENCODE-DCC/atac-seq-pipeline/

@raboul101 (Author) commented

Aha! That clears it up. Thank you again.

@vervacity (Collaborator) commented

Hi llz-hiv, please repost this as a separate issue, as it is not related to the above thread. Also, please consider subscribing to our pipelines Google group, which may have additional useful information as you consider downstream analyses :) https://groups.google.com/forum/#!forum/klab_genomic_pipelines_discuss
