Adding peak-calling with Genrich (fix #108) and multi-mapping read analysis #331

samuelruizperez · 2023-08-22T15:55:17Z

This PR:

Addresses issue #108, "Addition of peak-calling with Genrich":

This version keeps the existing pipeline structure unchanged but adds a separate "branch" for peak calling with Genrich, together with the upstream and downstream processes it needs.
- After resequenced libraries are merged, BAMs are sorted by read name (as Genrich requires) but not filtered. Removal of duplicates and blacklisted regions is left to Genrich so that multimapping reads can be analysed.
- Currently, normalised bigWig coverage tracks are generated from these unfiltered BAMs and added to the IGV session. It might be more appropriate to replace them with normalised versions of the pileup.bedGraph files generated by Genrich (see Genrich/issues/71).
- By default, Genrich analyses biological replicates collectively, so replicate merging and consensus peak analysis are skipped. Alternatively, peaks can be called separately for each biological replicate by setting the parameter --skip_genrich_sep to false.
- Currently, this module generates narrow peaks. A broad-peak option could perhaps be added by adjusting the peak-calling parameters as discussed in a comment on jsh58/Genrich#58 ("Appropriate area under the curve cut-off for ChIP-seq data").
- featureCounts quantification of Genrich's peaks is a work in progress.
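Genrich expects its input BAMs to be sorted by read name (queryname), which is why this branch name-sorts BAMs after merging. A minimal Python sketch of a sanity check, assuming the sort order is declared in the `SO` field of the `@HD` header line (as `samtools sort -n` writes it); the helper `is_queryname_sorted` is hypothetical and not part of the pipeline.

```python
def is_queryname_sorted(header_text: str) -> bool:
    """Return True if the SAM header's @HD line declares SO:queryname.

    Hypothetical helper: it only inspects the declared sort order in the
    header text (e.g. the output of `samtools view -H`), not the records.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # @HD fields are tab-separated TAG:VALUE pairs, e.g. SO:queryname
            fields = line.split("\t")[1:]
            tags = dict(f.split(":", 1) for f in fields if ":" in f)
            return tags.get("SO") == "queryname"
    # No @HD line: the sort order is undeclared, so treat it as unsorted
    return False
```

A header starting with `@HD\tVN:1.6\tSO:coordinate` returns `False`, which would flag a BAM that skipped the name-sorting step.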
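When replicates are analysed collectively, Genrich receives all replicate BAMs of a condition in one run, since its `-t` option accepts a comma-separated list of files. The sketch below groups hypothetical sample records by condition and builds either one comma-joined `-t` value per condition (the default) or one per replicate (the `--skip_genrich_sep false` case). The record layout, the `_REP<n>` run IDs and the function name are assumptions for illustration, not the pipeline's actual channel structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (condition, replicate number, path to name-sorted BAM)
Sample = Tuple[str, int, str]


def genrich_treatment_args(samples: List[Sample], separate: bool) -> Dict[str, str]:
    """Map a run ID to the value that would be passed to Genrich's -t option.

    separate=False: one Genrich run per condition, replicate BAMs comma-joined.
    separate=True:  one Genrich run per biological replicate.
    """
    runs: Dict[str, List[str]] = defaultdict(list)
    for condition, replicate, bam in sorted(samples):
        run_id = f"{condition}_REP{replicate}" if separate else condition
        runs[run_id].append(bam)
    return {run_id: ",".join(bams) for run_id, bams in runs.items()}
```

For two WT replicates and one KO replicate, the default mode yields `{"KO": "KO_REP1.bam", "WT": "WT_REP1.bam,WT_REP2.bam"}`, which is why no separate replicate-merging or consensus step is needed.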
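For the featureCounts step, one option is to convert Genrich's peaks (ENCODE narrowPeak format) into featureCounts' SAF annotation format. narrowPeak coordinates are BED-style (0-based, half-open) while SAF uses 1-based inclusive coordinates, so only the start needs shifting. This is a sketch of that conversion, not code from this PR; using the narrowPeak name column as the SAF `GeneID` and a fixed `+` strand are assumptions.

```python
from typing import Iterable, List


def narrowpeak_to_saf(lines: Iterable[str]) -> List[str]:
    """Convert narrowPeak records to SAF lines, header included."""
    saf = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/track lines
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        # 0-based half-open start -> 1-based inclusive start; end is unchanged.
        # Strand is assumed irrelevant because peaks are counted unstranded.
        saf.append(f"{name}\t{chrom}\t{int(start) + 1}\t{end}\t+")
    return saf
```

A record `chr1 100 250 peak_0 ...` becomes `peak_0 chr1 101 250 +`.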
NOTES:
- This PR uses the version of the nf-core Genrich module from nf-core/modules#3720 ("Update Genrich module input tuple and argument").
- Based on the discussion in "DO NOT MERGE yet: Discuss fixes of #164 #168 #169 #301", it seems that using Genrich's ATAC-seq mode for peak calling would solve:

Adds a parameter --analyze_multimappers <Int> to include multimapping reads (secondary alignments) in Genrich's peak-calling process (see Genrich#multimap). Currently, it only works with --aligner bowtie2 or --aligner star; support for Chromap is a work in progress.
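As described in the Genrich README, Genrich's `-s` option keeps secondary alignments whose alignment score (AS) is at least the best score minus `s`, and each kept alignment of a multimapping read then contributes a fractional count of 1/n. The Python sketch below models that rule for a single read; it is an illustration of the documented behaviour, not Genrich's code, and the function name is hypothetical.

```python
from typing import List, Tuple


def genrich_multimap_weights(scores: List[float], s: float) -> List[Tuple[int, float]]:
    """Return (alignment index, weight) for the alignments that would be kept.

    Keeps alignments with AS >= best AS - s, then gives each kept
    alignment an equal weight of 1/n, where n is the number kept.
    """
    if not scores:
        return []
    best = max(scores)
    kept = [i for i, score in enumerate(scores) if score >= best - s]
    weight = 1.0 / len(kept)
    return [(i, weight) for i in kept]
```

With Bowtie2-style scores `[-5, -12, -60]` and `s = 50`, the first two alignments are kept at 0.5 each; with `s = 0`, only the best alignment is kept, at full weight.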

For example, setting --analyze_multimappers 50 runs Bowtie2 with -k 50 (or STAR with --outFilterMultimapNmax 50 --winAnchorMultimapNmax 100) and Genrich with -s 50. The aligner then reports up to 50 distinct valid alignments per read, and Genrich keeps, for peak calling, secondary alignments whose scores are within 50 of the best one.
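The parameter translation above can be sketched as a small lookup that also rejects unsupported aligners. This is a hypothetical Python illustration, not the pipeline's Groovy/Nextflow code; in particular, it assumes the STAR anchor limit is always twice N, generalising from the 50/100 example.

```python
from typing import Dict, List


def multimapper_args(aligner: str, n: int) -> Dict[str, List[str]]:
    """Translate --analyze_multimappers N into per-tool arguments (hypothetical)."""
    genrich = ["-s", str(n)]
    if aligner == "bowtie2":
        return {"aligner": ["-k", str(n)], "genrich": genrich}
    if aligner == "star":
        return {
            "aligner": [
                "--outFilterMultimapNmax", str(n),
                # Assumption: anchor limit = 2 * N, as in the 50 -> 100 example
                "--winAnchorMultimapNmax", str(2 * n),
            ],
            "genrich": genrich,
        }
    # Chromap (and any other aligner) is not supported yet
    raise ValueError(f"--analyze_multimappers is not supported with --aligner {aligner}")
```

Failing early on an unsupported aligner mirrors the exit message the PR adds for this parameter, rather than silently running without multimapping reads.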

For MACS2 peak calling, secondary alignments are still filtered out with samtools view -q 1 -F 4 -F 256 unless --keep_multi_map is set.
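In that filter, `-q 1` keeps alignments with MAPQ of at least 1, and `-F 4` / `-F 256` are meant to drop alignments with the unmapped (0x4) or secondary (0x100) FLAG bit set. The predicate below is a plain-Python sketch of that intended filter on a FLAG integer and MAPQ value, with no BAM library involved; it is not how the pipeline implements it.

```python
UNMAPPED = 0x4     # SAM FLAG bit 4: segment unmapped
SECONDARY = 0x100  # SAM FLAG bit 256: secondary alignment


def passes_macs2_filter(flag: int, mapq: int) -> bool:
    """Keep a record only if MAPQ >= 1 and neither excluded FLAG bit is set."""
    return mapq >= 1 and not (flag & (UNMAPPED | SECONDARY))
```

A properly paired primary alignment (FLAG 99, MAPQ 42) passes; the same read flagged secondary (FLAG 355) does not, which is why these alignments only reach Genrich.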

NOTES:
- I have tested --analyze_multimappers with all the test datasets without issues, but with other experimental data the resources allocated for Bowtie2 alignment (withLabel:process_high) are sometimes insufficient; the job usually runs out of time or memory. Perhaps we could set up a conditional label or resource allocation when this parameter is used with Bowtie2?
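nf-core base configs typically scale a label's resources by `task.attempt` on retry and cap them at the `params.max_*` limits. One hypothetical way to make the Bowtie2 allocation conditional is to apply an extra multiplier when `--analyze_multimappers` is set. The Python sketch below only models that arithmetic; the factor of 2, the parameter names and the function name are illustrative assumptions, not values tested in this PR.

```python
from typing import Optional, Tuple


def bowtie2_resources(
    attempt: int,
    analyze_multimappers: Optional[int],
    base_mem_gb: float,
    base_hours: float,
    max_mem_gb: float,
    max_hours: float,
    multimap_factor: float = 2.0,  # hypothetical boost when -k > 1 is used
) -> Tuple[float, float]:
    """Return (memory in GB, time in hours) for a Bowtie2 task attempt."""
    factor = multimap_factor if analyze_multimappers else 1.0
    # Scale by retry attempt, then cap at the configured maximum
    mem = min(base_mem_gb * factor * attempt, max_mem_gb)
    hours = min(base_hours * factor * attempt, max_hours)
    return mem, hours
```

Keeping the boost conditional leaves the default runs on the existing `process_high` allocation, while runs with multimappers start higher and still benefit from retry escalation.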

PR checklist

This comment contains a description of changes (with reason).
If you've added a new tool - have you followed the pipeline conventions in the contribution docs
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

I would really appreciate it if you could take a look at these changes. This is my first attempt at a contribution to nf-core, so please let me know if I have made any basic mistakes or if I could do something better to follow the guidelines.

Thanks!

@samuelruizperez

github-actions · 2023-08-22T15:57:28Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 3f103dd

✅ 156 tests passed
❗ 3 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
system_exit - System.exit in WorkflowAtacseq.groovy: System.exit(1) [line 22]
system_exit - System.exit in WorkflowAtacseq.groovy: System.exit(1) [line 35]

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-atacseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-atacseq_logo_light.png
files_exist - File found: docs/images/nf-core-atacseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowAtacseq.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-atacseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.2.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-atacseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-atacseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-atacseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (239 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: branch.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.9
Run at 2023-09-08 13:01:36

samuelruizperez added 15 commits August 14, 2023 16:00

- Adding Genrich peak calling (c29d5dd)
- fixing genrich_sep output dirs (7ba93fe)
- Adding if closure in modules.config (3932ea2)
- fixing se bam name sorting for genrich (238f781)
- fixing se bam name sorting for genrich (7207209)
- fixing se bam name sorting for genrich (1f9e99e)
- fixing se bam name sorting for genrich (02f89ee)
- added distance to TSS 1kb subset to plot (002351d)
- adding default skip_genrich_sep = true (6997f7d)
- edited analyze_multimappers exit message (47c6bd1)
- simplified QC plotting for macs2 and genrich (fc85733)
- added skip_genrich_sep to schema (199c8c5)
- Removing unused peak annotation code (a18aa15)
- Removing unused peak annotation code (c1830f9)
- fixed skip_genrich_sep config selector (3f4fe7e)

samuelruizperez added the "enhancement" (New feature or request) and "WIP" (Work in progress) labels Aug 22, 2023

samuelruizperez requested review from JoseEspinosa and maxulysse August 22, 2023 15:55

samuelruizperez removed request for maxulysse and JoseEspinosa August 23, 2023 07:37

samuelruizperez added this to the 2.2 milestone Aug 23, 2023

samuelruizperez added 4 commits August 29, 2023 09:51

- Adding STAR analyze_multimappers support 1 (64ca3b1)
- fixed unrecognized parameter for STAR multimappers (3b014d6)
- fixing peak annotation dirs (8eb8271)
- Merge branch 'nf-core:dev' into dev (3f103dd)

JoseEspinosa mentioned this pull request Oct 16, 2023 in "Update Genrich module input tuple and argument" nf-core/modules#3720 (Merged, 10 tasks)
