Adding the feature of shifting and splitting the reads #91
Comments
Suggest also reading Thomas Caroll ATACseq workflow:
Thanks, @houghtos. This pipeline also performs partitioning. Perhaps I will give it a try.
Thanks for the detailed info and references @husaynahmed @houghtos. Having thought about this a little, I think this would be ideal to implement as a sub-workflow using DSL2. That way we can pass a BAM file (filtered in whatever way) to the same sub-workflow to run the downstream processes. I will probably start making a serious effort to port this pipeline to DSL2 after the next release, so your input and contributions will be more than welcome 😃
Hi @husaynahmed Thank you for these references. I have an additional question related to read shifting and peak-calling parameters ... which is ultimately also related to #108. If I'm correct, when you are using MACS2 Is that correct?
Hi,
I was wondering if adding the feature of shifting and splitting the reads after the alignment would be helpful.
A recent review of approaches to analyzing ATAC-seq data (Yan et al., 2020) suggests that reads should be shifted +4 bp on the positive strand and −5 bp on the negative strand, to account for the 9-bp duplication created by DNA repair of the nick made by Tn5 transposase and to achieve base-pair resolution in TF footprinting and motif-related analyses.
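To make the shift concrete, here is a minimal Python sketch of the +4/−5 offset. The function name `shift_tn5` and the 0-based, half-open `[start, end)` coordinate convention are assumptions made for illustration; this is not code from the pipeline or from deepTools, and a real implementation would operate on BAM records rather than bare coordinates.

```python
def shift_tn5(start, end, strand, plus_offset=4, minus_offset=-5):
    """Shift a read interval to account for the 9-bp Tn5 duplication.

    Assumes 0-based, half-open coordinates [start, end). The whole
    interval is moved by +4 bp on the '+' strand and -5 bp on the '-'
    strand (illustrative helper, not part of any existing tool).
    """
    if strand == "+":
        offset = plus_offset
    elif strand == "-":
        offset = minus_offset
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    # Clamp at 0 so a read near the start of a contig never gets a
    # negative coordinate.
    new_start = max(0, start + offset)
    new_end = max(new_start, end + offset)
    return new_start, new_end


if __name__ == "__main__":
    print(shift_tn5(100, 150, "+"))
    print(shift_tn5(100, 150, "-"))
```

Shifting the whole interval, rather than only the 5' end, mirrors the way the offset is usually described for BED-style records; tools that work on paired-end fragments may adjust only the fragment ends instead.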
Also, many studies have analyzed nucleosome-free regions (NFR) and nucleosome-associated regions separately. For example, Yoshida et al., 2019.
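A split of this kind is typically based on fragment length. The sketch below shows one way it could look in Python; the function name `classify_fragment` and both length cutoffs (100 bp and 180 bp) are illustrative placeholders chosen for this example, not values taken from this issue or from the cited studies, and would need to be tuned per dataset.

```python
def classify_fragment(length, nfr_max=100, nucleosome_min=180):
    """Assign a paired-end fragment to a class based on its length.

    nfr_max and nucleosome_min are hypothetical cutoffs (in bp):
      length <  nfr_max         -> "NFR" (nucleosome-free)
      length >= nucleosome_min  -> "nucleosome" (nucleosome-associated)
      otherwise                 -> "ambiguous" (left out of both sets)
    """
    if length <= 0:
        raise ValueError("fragment length must be positive")
    if length < nfr_max:
        return "NFR"
    if length >= nucleosome_min:
        return "nucleosome"
    return "ambiguous"


def split_fragments(lengths, **cutoffs):
    """Group fragment lengths by class; returns a dict of lists."""
    groups = {"NFR": [], "nucleosome": [], "ambiguous": []}
    for length in lengths:
        groups[classify_fragment(length, **cutoffs)].append(length)
    return groups


if __name__ == "__main__":
    print(split_fragments([45, 99, 100, 150, 180, 200, 400]))
```

Keeping an explicit "ambiguous" class avoids forcing intermediate-length fragments into either set; a single cutoff would be a simpler alternative.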
Here's what I am proposing.
Let's have three modes of post-alignment analysis for the ATAC-seq pipeline.
The first approach (which is the current atacseq pipeline) is useful in almost all cases where the objective is to identify open chromatin regions. The latter would be helpful when the exact cut sites of the transposase are important, as in motif analyses, footprinting, etc., or when the analysis requires looking at the NFR and NBR regions separately.
The downstream analysis and QC could then depend on which of these options the user chooses. For example, if the user chooses option 2, we could generate mergedLibrary and mergedReplicate BAMs, bigWigs, etc. separately for NFR and NBR, and perform peak calling and all the QC for each of them separately.
I have used deepTools in the past to perform shifting and splitting. Here's example code for the same.
It would be great to hear comments and suggestions from the nf-core members, and I would be happy to discuss. I am a beginner in the nf-core community but would be glad to contribute to this pipeline enhancement.