add dorado #256

yuukiiwa · 2023-10-17T09:12:01Z

Currently, dorado v0.3.2 is incorporated into the pipeline through its Docker container. Basecalling works without demultiplexing, but does not work when demultiplexing with qcat downstream.

I raised an issue on dorado's GitHub repo requesting that dorado v0.4.0 be dockerized (here is the issue). I will not incorporate basecalling and demultiplexing until dorado v0.4.0 is available on Docker Hub.

I made some changes to basecalling without demultiplexing: the user can now specify the input fast5 directory for each sample in the samplesheet. A user with multiple samples has to give the respective input fast5 directory for each of them.

Here is the run on my machine:

github-actions · 2023-10-17T16:07:01Z

`nf-core lint` overall result: Failed ❌

Posted for pipeline commit 5738424

+| ✅ 158 tests passed       |+
!| ❗   5 tests had warnings |!
-| ❌  20 tests failed       |-

❌ Test failures:

files_exist - File must be removed: lib/Utils.groovy
files_exist - File must be removed: lib/WorkflowMain.groovy
files_exist - File must be removed: lib/NfcoreTemplate.groovy
files_exist - File must be removed: lib/WorkflowNanoseq.groovy
nextflow_config - Config default value incorrect: params.kit is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
nextflow_config - Config default value incorrect: params.flowcell is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
nextflow_config - Config default value incorrect: params.dorado_model is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
nextflow_config - Config default value incorrect: params.jaffal_ref_dir is set as for_jaffal in nextflow_schema.json but is null in nextflow.config.
nextflow_config - Config default value incorrect: params.tracedir is set as ${params.outdir}/pipeline_info in nextflow_schema.json but is ./results/pipeline_info in nextflow.config.
files_unchanged - .github/CONTRIBUTING.md does not match the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md does not match the template
files_unchanged - .github/workflows/branch.yml does not match the template
files_unchanged - .github/workflows/linting_comment.yml does not match the template
files_unchanged - .github/workflows/linting.yml does not match the template
files_unchanged - assets/email_template.html does not match the template
files_unchanged - assets/email_template.txt does not match the template
files_unchanged - assets/nf-core-nanoseq_logo_light.png does not match the template
files_unchanged - docs/images/nf-core-nanoseq_logo_light.png does not match the template
files_unchanged - docs/images/nf-core-nanoseq_logo_dark.png does not match the template
files_unchanged - pyproject.toml does not match the template

❗ Test warnings:

nextflow_config - Config manifest.version should end in dev: 3.1.0
pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-nanoseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-nanoseq_logo_light.png
files_exist - File found: docs/images/nf-core-nanoseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-nanoseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.input= ./samplesheet.csv
nextflow_config - Config default value correct: params.dorado_device= cuda:all
nextflow_config - Config default value correct: params.qcat_min_score= 60
nextflow_config - Config default value correct: params.aligner= minimap2
nextflow_config - Config default value correct: params.variant_caller= medaka
nextflow_config - Config default value correct: params.structural_variant_caller= sniffles
nextflow_config - Config default value correct: params.quantification_method= bambu
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 22.10.1, Config: 22.10.1
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (185 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' contains report_section_order
multiqc_config - 'assets/multiqc_config.yml' contains export_plots
multiqc_config - 'assets/multiqc_config.yml' contains report_comment
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.13.1
Run at 2024-04-26 06:45:54

jfy133

Lots of things here are 'unusual' compared to what I normally see in nf-core pipelines, but I don't see anything necessarily breaking, except for the licensing issue.

The main thing, though: quite a few modules appear to only work with docker (not conda), but there is no documentation on this. I think it would be very important to add plenty of warnings/checks in the code, and usage documentation warning users that conda won't be possible in many cases.

jfy133 · 2024-04-26T06:12:48Z

conf/test_withpull.config

+    trim_barcodes=true
+    output_demultiplex_fast5 = true


Harshil Align™️!

jfy133 · 2024-04-26T06:13:50Z

modules/local/dorado.nf

+    tag "$meta.id"
+    label 'process_medium'
+
+    container "docker.io/ontresearch/dorado"


Just to double-check: is it OK to use this, license-wise?

And would this still work with Singularity?

jfy133 · 2024-04-26T06:14:22Z

modules/local/dorado.nf

+    dorado download --model $dorado_model
+    dorado basecaller $dorado_model $pod5_path --device $dorado_device --emit-fastq > basecall.fastq


Are there any options a user could theoretically add? Missing ext.args, for example.
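
To make the `ext.args` suggestion concrete, here is a small shell sketch of its effect on the final command line. It is only an illustration: `build_basecall_cmd` and its fourth argument are hypothetical stand-ins for whatever the module would receive from `ext.args`, and `--min-qscore 10` is just an example option that should be checked against `dorado basecaller --help`. The function builds the command string and does not run dorado.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assemble (but do not run) the basecalling command.
# The optional 4th argument stands in for user-supplied options (ext.args).
build_basecall_cmd() {
    local model="$1" pod5_path="$2" device="$3" extra_args="${4:-}"
    local cmd="dorado basecaller ${model} ${pod5_path} --device ${device} --emit-fastq"
    # Append user options only when some were given, to avoid a trailing space.
    if [ -n "${extra_args}" ]; then
        cmd="${cmd} ${extra_args}"
    fi
    printf '%s\n' "${cmd}"
}

# Without extra options the command matches the one in the diff above.
build_basecall_cmd "my_model" "pod5_dir" "cuda:all"
# With extra options they are appended after the fixed flags.
build_basecall_cmd "my_model" "pod5_dir" "cuda:all" "--min-qscore 10"
```

Keeping the extra options empty by default means existing runs are unaffected unless a user opts in.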

jfy133 · 2024-04-26T06:15:27Z

modules/local/dorado.nf

+        dorado: \$(echo \$(dorado --version 2>&1) | sed -r 's/.{81}//')
+    END_VERSIONS
+
+    gzip basecall.fastq


This should probably go before the emissions. Also, does the file have to be named basecall.fastq for downstream purposes? If not, I would recommend using the `${prefix}.fastq` system.
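
A rough shell sketch of the per-sample prefix idea follows, assuming the prefix would normally come from the sample ID. `write_prefixed_fastq` is a hypothetical helper, and a dummy FASTQ record stands in for real basecaller output so the naming and compression order can be seen without running dorado.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: name the output after a per-sample prefix and gzip it
# right after it is written, before any later reporting step.
write_prefixed_fastq() {
    local prefix="$1" outdir="$2"
    # Dummy record standing in for real basecaller output.
    printf '@read1\nACGT\n+\nIIII\n' > "${outdir}/${prefix}.fastq"
    gzip -f "${outdir}/${prefix}.fastq"
    # Report the final file name so callers can pick it up.
    printf '%s\n' "${outdir}/${prefix}.fastq.gz"
}

demo_dir="$(mktemp -d)"
write_prefixed_fastq "sample1" "${demo_dir}"
```

With a prefix in the file name, outputs from different samples no longer share the name basecall.fastq.gz.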

jfy133 · 2024-04-26T06:15:55Z

modules/local/fast5_to_pod5.nf

+    label 'process_medium'
+
+    conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
+    container "docker.io/yuukiiwa/pod5:0.2.4"


this could be a biocontainer

jfy133 · 2024-04-26T06:23:02Z

workflows/nanoseq.nf

+            if (workflow.profile.contains('test')){
+                ch_input_path = params.input_path
+            } else {
+                ch_input_path = Channel.fromPath(params.input_path, checkIfExists: true)
+            }


I don't really understand this, there is no difference in the way the channel gets taken right?

These are different.

For the test, I need to stage the fast5 directory (with many fast5 files) from nf-core/test-datasets, so input_path is not local and `checkIfExists` doesn't work.

For user input, the fast5 directory is not staged, so we need to check that it exists.

Ah sorry, I misread input_path as just input 😅. I find the test data setup unusual, which is also what tripped me up, but I don't think this is relevant for this PR (especially given how long you've been waiting).
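
The distinction in this thread can be sketched in shell as follows. This is an illustrative analogue of the Groovy branch, not the pipeline's code: `resolve_input_path` is a hypothetical name, and the URL and paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the branch discussed above: skip the local
# existence check when the profile string contains "test" (the test data is
# fetched remotely); otherwise require the path to exist locally.
resolve_input_path() {
    local profile="$1" input_path="$2"
    case "${profile}" in
        *test*)
            printf '%s\n' "${input_path}"
            return 0
            ;;
    esac
    if [ ! -e "${input_path}" ]; then
        printf 'Input path does not exist: %s\n' "${input_path}" >&2
        return 1
    fi
    printf '%s\n' "${input_path}"
}

# Test profile: a remote location is passed through unchecked.
resolve_input_path "test,docker" "https://example.org/fast5/"
# Regular profile: a missing local path is rejected.
resolve_input_path "docker" "/no/such/fast5_dir" || echo "rejected missing path"
```

Note that, like `contains('test')` in the diff, this is a substring match, so any profile name that happens to include "test" also skips the check.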

workflows/nanoseq.nf

jfy133 · 2024-04-26T06:26:25Z

nextflow_schema.json

+                "dorado_device": {
+                    "type": "string",
+                    "default": "cuda:all",
+                    "description": "Device specified using '--device'.",


Is dorado a particular model of nanopore or something? What is a dorado device?

This specifies which compute device dorado should use: `cuda:all` for all GPUs, `cuda:0` for a specific GPU, or `cpu` for the CPU.
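
As a sketch of how the values above could be checked before launching a run, the hypothetical `is_supported_device` function below accepts only the three forms named in this thread. Dorado may accept other device strings as well, so treat the list as an assumption rather than a complete specification.

```shell
#!/usr/bin/env bash
# Hypothetical validator for the device strings mentioned in this thread:
# "cpu", "cuda:all", or "cuda:<index>" with a single non-negative integer.
is_supported_device() {
    local device="$1" idx
    case "${device}" in
        cpu|cuda:all)
            return 0
            ;;
        cuda:*)
            idx="${device#cuda:}"
            case "${idx}" in
                # Reject an empty index or anything containing a non-digit.
                ''|*[!0-9]*) return 1 ;;
                *) return 0 ;;
            esac
            ;;
        *)
            return 1
            ;;
    esac
}

for dev in cuda:all cuda:0 cpu gpu; do
    if is_supported_device "${dev}"; then
        echo "${dev}: accepted"
    else
        echo "${dev}: rejected"
    fi
done
```

A check like this would let the pipeline fail early with a clear message instead of failing inside the basecalling step.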

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>

yuukiiwa added 2 commits October 17, 2023 16:53

fast5_to_pod5 and dorado (not demultiplexed)

1867595

update

b664520

nf-core deleted a comment from github-actions bot Oct 17, 2023

yuukiiwa added 5 commits October 18, 2023 08:33

update ci test pointers

b6309f2

fixes

9ab2f0f

fix

7a8cce9

editor config linting fix

510adfb

linting fix

db1eaaf

yuukiiwa requested a review from christopher-hakkaart October 18, 2023 03:34

jfy133 reviewed Apr 26, 2024

yuukiiwa and others added 3 commits April 26, 2024 14:37

remove checker

352d2f9

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>

clean up

d3d089d

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>

clean up

5738424

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
