Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the feature of shifting and splitting the reads #91

Open
husaynahmed opened this issue Mar 4, 2020 · 4 comments
Open

Adding the feature of shifting and splitting the reads #91

husaynahmed opened this issue Mar 4, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@husaynahmed
Copy link
Member

Hi,

I was wondering if adding the feature of shifting and splitting the reads after the alignment would be helpful.

A recent review on the approaches to analyze ATAC-seq data (Yan et al., 2020) suggests reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif related analyses.

Also, in many studies researchers have looked into the nucleosomal-free regions and nucleosome associated regions separately. For example Yoshida et al., 2019 .

Here's what I am proposing.

Let's have three modes of post-alignment analysis for the ATAC-seq pipeline.

  1. No shifting and splitting the aligned reads
  2. Shifting the reads and splitting them into two - NFR (Nucleosomal Free Regions) and NBR (Nucleosomal Bound regions)
  3. Shifting the reads and splitting them into four - NFR (Nucleosomal Free Regions), Mono-, di- and tri-nucleosomal bound regions.

The first approach (which is the current atacseq pipeline) is useful in almost all cases where identifying the open chromatin regions is the objective. The latter would be helpful if the exact cut sites of transposase is important like motif analyses, footprinting etc., or when the analysis requires looking into the NFR/NBR regions separately.

The downstream analysis and QC could be done depending on which of these options the user chooses. For example, if the user chooses option 2, we could generate mergedLibrary and mergedReplicate BAM, bigWigs etc separately for NFR and NBR and perform peak calling and all the QC for them separately.

I have used deeptools in the past to perform shifting and splitting. Here's an example code for the same.


bam_loc="/path/to/aligned/bamfiles/directory"
out_loc="/path/to/output/directory"

## For NFR (Frag length of 120 bp and less)

alignmentSieve --bam $bam_loc/${SAMPLE}.mLb.clN.sorted.bam \
               --outFile $out_loc/${SAMPLE}.mLb.ss.NFR.bam \
               --ATACshift \
               -p 8 \
               --smartLabels \
               --minFragmentLength 0 \
               --maxFragmentLength 120 \
               --filterMetrics $out_loc/${SAMPLE}.mLb.ss.NFR.log

## For NBR (Frag length of 180 bp and above)

alignmentSieve --bam $bam_loc/${SAMPLE}.mLb.clN.sorted.bam \
               --outFile $out_loc/${SAMPLE}.mLb.ss.NBR.bam \
               --ATACshift \
               -p 8 \
               --smartLabels \
               --minFragmentLength 180 \
               --filterMetrics $out_loc/${SAMPLE}.mLb.ss.NBR.log


It would be great to know comments and suggestions from the nf-core members. I would be happy to discuss. I am a beginner in the nf-core community but would be happy to contribute to this pipeline enhancement.

@husaynahmed husaynahmed added the enhancement New feature or request label Mar 4, 2020
@houghtos
Copy link

houghtos commented Mar 6, 2020

Suggest also reading Thomas Caroll ATACseq workflow:
https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html
Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..

@husaynahmed
Copy link
Member Author

Suggest also reading Thomas Caroll ATACseq workflow:
https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html
Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..

Thanks, @houghtos. This pipeline also performs partitioning. Perhaps I will give it a try.
However, it would be great if we could integrate this feature in the nf-core/atacseq pipeline.

@drpatelh
Copy link
Member

Thanks for the detailed info and references @husaynahmed @houghtos. Having thought about this a little I think this is something that would be ideal to implement as a sub-workflow using DSL2. This means that we can just pass a BAM file (filtered in whatever way) to the same sub-workflow to run the downstream processes. I think I will probably start making a serious effort to port this pipeline to DSL2 after the next release so your input and contributions will be more than welcome 😃

@nservant
Copy link

Hi @husaynahmed

Thank you for these references. I have an additional question related to reads shifting and peak calling parameters ... which is finally also related to #108.

If I'm correct, when you are using Macs2 --shift 100 --extend 200 or genrich -j -y (see https://informatics.fas.harvard.edu/atac-seq-guidelines.html), you do not need to do reads shifting, as the shift is part of the peak calling analysis ?
However, applying the shift with alignmentSieve can indeed be useful if you are using for instance Macs2 in BAMPE mode using the full fragment as a proxy.

Is that correct ?
Thanks
Nicolas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants