Adding the feature of shifting and splitting the reads #91

husaynahmed · 2020-03-04T15:48:56Z

Hi,

I was wondering if adding the feature of shifting and splitting the reads after the alignment would be helpful.

A recent review on the approaches to analyze ATAC-seq data (Yan et al., 2020) suggests reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif related analyses.

Also, in many studies researchers have looked into the nucleosomal-free regions and nucleosome associated regions separately. For example Yoshida et al., 2019 .

Here's what I am proposing.

Let's have three modes of post-alignment analysis for the ATAC-seq pipeline.

No shifting and splitting the aligned reads
Shifting the reads and splitting them into two - NFR (Nucleosomal Free Regions) and NBR (Nucleosomal Bound regions)
Shifting the reads and splitting them into four - NFR (Nucleosomal Free Regions), Mono-, di- and tri-nucleosomal bound regions.

The first approach (which is the current atacseq pipeline) is useful in almost all cases where identifying the open chromatin regions is the objective. The latter would be helpful if the exact cut sites of transposase is important like motif analyses, footprinting etc., or when the analysis requires looking into the NFR/NBR regions separately.

The downstream analysis and QC could be done depending on which of these options the user chooses. For example, if the user chooses option 2, we could generate mergedLibrary and mergedReplicate BAM, bigWigs etc separately for NFR and NBR and perform peak calling and all the QC for them separately.

I have used deeptools in the past to perform shifting and splitting. Here's an example code for the same.


bam_loc="/path/to/aligned/bamfiles/directory"
out_loc="/path/to/output/directory"

## For NFR (Frag length of 120 bp and less)

alignmentSieve --bam $bam_loc/${SAMPLE}.mLb.clN.sorted.bam \
               --outFile $out_loc/${SAMPLE}.mLb.ss.NFR.bam \
               --ATACshift \
               -p 8 \
               --smartLabels \
               --minFragmentLength 0 \
               --maxFragmentLength 120 \
               --filterMetrics $out_loc/${SAMPLE}.mLb.ss.NFR.log

## For NBR (Frag length of 180 bp and above)

alignmentSieve --bam $bam_loc/${SAMPLE}.mLb.clN.sorted.bam \
               --outFile $out_loc/${SAMPLE}.mLb.ss.NBR.bam \
               --ATACshift \
               -p 8 \
               --smartLabels \
               --minFragmentLength 180 \
               --filterMetrics $out_loc/${SAMPLE}.mLb.ss.NBR.log

It would be great to know comments and suggestions from the nf-core members. I would be happy to discuss. I am a beginner in the nf-core community but would be happy to contribute to this pipeline enhancement.

The text was updated successfully, but these errors were encountered:

houghtos · 2020-03-06T21:29:32Z

Suggest also reading Thomas Caroll ATACseq workflow:
https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html
Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..

husaynahmed · 2020-03-10T10:54:24Z

Suggest also reading Thomas Caroll ATACseq workflow:
https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html
Goes into using Subread to partitioning alignment BAM file into nucleosome free region, di-nucleosome, and mono-nucleosome alignment files. Based on the ATACseq Greenleaf paper..

Thanks, @houghtos. This pipeline also performs partitioning. Perhaps I will give it a try.
However, it would be great if we could integrate this feature in the nf-core/atacseq pipeline.

drpatelh · 2020-03-11T12:39:30Z

Thanks for the detailed info and references @husaynahmed @houghtos. Having thought about this a little I think this is something that would be ideal to implement as a sub-workflow using DSL2. This means that we can just pass a BAM file (filtered in whatever way) to the same sub-workflow to run the downstream processes. I think I will probably start making a serious effort to port this pipeline to DSL2 after the next release so your input and contributions will be more than welcome 😃

nservant · 2020-12-11T08:35:53Z

Hi @husaynahmed

Thank you for these references. I have an additional question related to reads shifting and peak calling parameters ... which is finally also related to #108.

If I'm correct, when you are using Macs2 --shift 100 --extend 200 or genrich -j -y (see https://informatics.fas.harvard.edu/atac-seq-guidelines.html), you do not need to do reads shifting, as the shift is part of the peak calling analysis ?
However, applying the shift with alignmentSieve can indeed be useful if you are using for instance Macs2 in BAMPE mode using the full fragment as a proxy.

Is that correct ?
Thanks
Nicolas

husaynahmed added the enhancement New feature or request label Mar 4, 2020

JoseEspinosa mentioned this issue Sep 23, 2021

new module: genrich/genrich nf-core/modules#726

Closed

4 tasks

ktrns mentioned this issue Jul 5, 2023

DO NOT MERGE yet: Discuss fixes of #164 #168 #169 #301

Merged

samuelruizperez mentioned this issue Aug 22, 2023

Adding peak-calling with Genrich (fix #108) and multi-mapping read analysis #331

Draft

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding the feature of shifting and splitting the reads #91

Adding the feature of shifting and splitting the reads #91

husaynahmed commented Mar 4, 2020

houghtos commented Mar 6, 2020

husaynahmed commented Mar 10, 2020

drpatelh commented Mar 11, 2020

nservant commented Dec 11, 2020

Adding the feature of shifting and splitting the reads #91

Adding the feature of shifting and splitting the reads #91

Comments

husaynahmed commented Mar 4, 2020

houghtos commented Mar 6, 2020

husaynahmed commented Mar 10, 2020

drpatelh commented Mar 11, 2020

nservant commented Dec 11, 2020