Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CG CHG and CHH with --comprehensive flag using bismark workflow #388

Open
MMJBerger opened this issue Mar 13, 2024 · 8 comments
Open

CG CHG and CHH with --comprehensive flag using bismark workflow #388

MMJBerger opened this issue Mar 13, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@MMJBerger
Copy link

Description of feature

Hello,

Would it be possible not to instruct Bismark methylation extractor to use both the --comprehensive and --merge_non_CpG flags when specifying --comprehensive to the pipeline ?

This is not big issue since it is possible to act on it (thank you for your answer on the issue #372 ). However, in plants, the three methylation contexts are (almost) always considered independently in methylome analysis, and I found a little bit confusing to 'only' get "CpG" and "non_CpG" results with the Bismark workflow by using the same flag than the one currently used on Bismark to access CHG and CHH context independently.

I think having an easier way to chose directly in the pipeline if we want 'non_CpG' merged or not when using bismark workflow might be a plus for plant biologists !

Best regards,

@MMJBerger MMJBerger added the enhancement New feature or request label Mar 13, 2024
@deep-buddingcoder
Copy link

Hi MMJ,

I am not from nf-core/methylseq team.
I am primarily an experimental biologist and I do perform some bioinformatics data processing and analysis of ATACseq and ChIPseq data. Now I am learning to perform data processing of RRBS/EMseq DNA methylation NGS data. I have some experience in python and snakemake scripting.

Because of my own curiosity, I dig into your issue and thought of sharing my findings (while you are still awaiting response from nf-core/methylseq team).

Assuming you are on main github webpage (https://github.com/nf-core/methylseq), click on code --> click on conf --> click on modules.config --> go to line 209 which looks like this:

params.comprehensive ? ' --comprehensive --merge_non_CpG' : '',

if this line is modified into as described under:

params.comprehensive ? ' --comprehensive' : '',
params.mergenonCpG ? ' --merge_non_CpG' : '',

then using the flag <--comprehensive> in your command, I believe, you may get separate result for CHG and CHH.

if you want combined result, then use the flag <--comprehensive --mergenonCpG> instead.

In order to implement this suggestion, you will have to find the file <modules.config> in your local set up.

I am not sure, if this is the only file that needs to be changed to implement your objective or there are more files which may need to be modified.
So, I don't know if it will work or not,
but if I were you, I would give it a try. If it fails, then I will always revert back to original modules.config and wait for response from nf-core/methylseq team.

Hope this helps
Best
Deep

@ewels
Copy link
Member

ewels commented Mar 23, 2024

Hi @MMJBerger,

I agree that this is a bit confusing to have a flag with the same name as Bismark that does more stuff than advertised. However, I'm a bit wary of changing it as it would be a breaking change that could affect existing users who upgrade.

Any suggestions on new flags that we could add progressively to get what you want?

Phil

@ewels
Copy link
Member

ewels commented Mar 23, 2024

@deep-buddingcoder you're on the right tracks for how to customise pipeline behaviour. But note that you should never need to edit the pipeline source code to adjust configuration - you can leave that alone and override it with your own local configs. See the docs here: https://nf-co.re/docs/usage/configuration#custom-configuration-files

@MMJBerger
Copy link
Author

MMJBerger commented Mar 25, 2024

Hi @deep-buddingcoder, hi @ewels !

Thank you for your answer, actually we managed to get the CG, CHG and CHH methylation_extractor files independently by simply removing the flag "--mergenonCpG" in the BISMARK_METHYLATIONEXTRACTOR section (module.config), and it worked verry well !

However, @ewels , I have another question, we could not obtain separated files for the rest of the pipeline. Indeed, we tried to use the --CX flag to get independent cytosine reports and the coverage files for each context, without succeeding.

We tried modifying the step BISMARK_COVERAGE2CYTOSINE by adding the '--CX' flag in the "ext.args" (module config) but it didn"t change anything, we get the files, but not separately for each sequence context.

results/bismark/methylation_calls/methylation_calls
--> Here, we have independent files for each samples and each sequences context

results/bismark/methylation_calls/methylation_coverage
--> Here, we have one .cov file for each samples, but without destinction between the sequence contexts

results/bismark/coverage2cytosine/reports
--> Here, we only have reports for each samples, but not for each contexts

results/bismark/coverage2cytosine/coverage
--> Here, we don't have any files (sorry for the potential silly question but is that normal ?)

Is there any way to act on it and get CG, CHG and CHH files for each samples in ../methylation_calls/methylation_coverage and ../coverage2cytosine/reports ?

@ewels I would suggest to add a flag, or a setting maybe, that could allow us to get all of these informations separately if wanted (and if it's possible). For plants analysis we need to consider methylation independently as each context and their related changes are informative and associated with specific types of chromatin regulation. It would really help a lot to get them directly with the pipeline instead of running another script to extract these informations afterwards.

Thank you again for your help,
Best regards,
Margot

@deep-buddingcoder
Copy link

Hi Phil,

with respect to your message:

@deep-buddingcoder you're on the right tracks for how to customise pipeline behaviour. But .....

Thanks a lot for the suggestion about "your own local configs". It helps me develop greater understanding about how source codes work and how custom changes should be implemented.

Best
Deep

@deep-buddingcoder
Copy link

Hi Margot,

I am glad to know that the suggestion worked for you.

As Phil has mentioned above, it may not be the best strategy to implement custom requirement. You may want to look into the usage of custom configuration file as suggested by Phil (in order to avoid fiddling with source code).

Best
Deep

@FelixKrueger
Copy link
Contributor

Hi Margot,

it is true that Bismark is written primarily with mammalian samples in mind, and the typical procedure only runs a single coverage/bedGraph conversion (limited to the CpG context by default). The nf-core/methylseq workflow even goes one step further by merging CHG and CHH contexts into a single non-CG context (somewhat counter-intuitive to what you would expect from the option --comprehensive on its own).

As a plant person, you would have to essentially run bismark2bedGraph three times separately from each other on the context specific output from the methylation extraction (which could be run in parallel though):

bismark2bedGraph      -o CpG_context.bedGraph --gzip CpG*
bismark2bedGraph --CX -o CHG_context.bedGraph --gzip CHG*
bismark2bedGraph --CX -o CHH_context.bedGraph --gzip CHH*

If you would also like the genome-wide reports, this would then have to be followed by running coverage2cytosine three times, with the context-specific coverage file generated by the step above:

coverage2cytosine      --gzip --genome /path/to/genome/-o CpG_report.txt CpG_context.cov.gz
coverage2cytosine --CX --gzip --genome /path/to/genome/-o CHG_report.txt CHG_context.cov.gz
coverage2cytosine --CX --gzip --genome /path/to/genome/-o CHH_report.txt CHH_context.cov.gz

While coverage2cytosine already is a separate module in nf-core (which we implemented at the end of '22 as it is needed to run it in NOMe-seq mode), bismark2bedGraph is not currently a stand-alone module but is called instead as part of the methylation extraction (by using the flag --bedGraph):

bismark_methylation_extractor $bam --bedGraph --counts --gzip --report $seqtype $args

So in short:
Plant-based folks cannot currently get coverage files/cytosine reports split by the three contexts but would need to run the above steps separately. Implementing such an option (--vegan 🌱 ?) would require adding a new module BISMARK2BEDGRAPH, and require splitting out the parts that are hard-coded into the methylation extractor process into 1 (or more) separate steps.

@MMJBerger
Copy link
Author

Hi @FelixKrueger !

Thank you for your answer, indeed we used to run things like you explained above for our analyzes to get coverage files and reports separately. However, the possibility to get everything done by itself on all samples analysed using nf-core/methylseq workflow was too appealing not to ask whether we could implement it 'easily' (by that I mean without any deep code modifications) in some way in the pipeline so ...

Thank you all for your help, and yes, if that's possible, we, plant-folks (and other, I am sure) would be very gratefull if such an option was implemented (--Ent 🌱?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants