Long read pipeline Canu issue with Mouse Genome reference #416

Open
nikhil777shingte opened this issue Aug 12, 2023 · 2 comments

Comments

@nikhil777shingte

I am trying to run a long-read pipeline on ONT long-read data from mouse models using the Terra platform.

I was able to run https://github.com/broadinstitute/long-read-pipelines/blob/kvg_guppy_cpu/wdl/pipelines/ONT/Preprocessing/ONTBasecall.wdl successfully using my fast5 files.

When I try to run https://github.com/broadinstitute/long-read-pipelines/blob/3.0.1/wdl/ONTAssembleWithCanu.wdl, I run into the issue below. Can you please advise?

2023/08/12 03:56:10 Starting container setup.
2023/08/12 03:56:12 Done container setup.
2023/08/12 03:56:13 Starting localization.
2023/08/12 03:56:19 Localization script execution started...
2023/08/12 03:56:19 Localizing input gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz -> /cromwell_root/fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz
2023/08/12 03:56:25 Localizing input gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/script -> /cromwell_root/script
2023/08/12 03:56:27 Localization script execution complete.
2023/08/12 03:56:31 Done localization.
2023/08/12 03:56:32 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash us.gcr.io/broad-dsp-lrma/lr-canu@sha256:b116e4c74fa74e384491457fb09b6729e40138d00d7611fea912ab130386d9eb /cromwell_root/script
+ canu -correct -p 65209 -d canu_correct_output genomeSize=2731m corMaxEvidenceErate=0.15 correctedErrorRate=0.15 -nanopore /cromwell_root/fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz
-- Canu 2.0
--
-- Detected Java(TM) Runtime Environment '1.8.0_252' (from '/usr/local/openjdk-8/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING: Failed to run gnuplot using command 'gnuplot'.
-- WARNING: Plots will be disabled.
-- WARNING:
--
-- Detected 32 CPUs and 31 gigabytes of memory.
-- No grid engine detected, grid and staging disabled.
--
-- ERROR
-- ERROR
-- ERROR Found 1 machine configuration:
-- ERROR class0 - 1 machines with 32 cores with 31 GB memory each.
-- ERROR
-- ERROR Task red can't run on any available machines.
-- ERROR It is requesting:
-- ERROR redMemory=32-48 memory (gigabytes)
-- ERROR redThreads=4-8 threads
-- ERROR
-- ERROR No available machine configuration can run this task.
-- ERROR
-- ERROR Possible solutions:
-- ERROR Change redMemory and/or redThreads
-- ERROR

ABORT:
ABORT: Canu 2.0
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: task red failed to find a configuration to run on.
ABORT:
2023/08/12 03:56:34 Starting delocalization.
2023/08/12 03:56:35 Delocalization script execution started...
2023/08/12 03:56:35 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/memory_retry_rc
2023/08/12 03:56:37 Delocalizing output /cromwell_root/rc -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/rc
2023/08/12 03:56:39 Delocalizing output /cromwell_root/stdout -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/stdout
2023/08/12 03:56:40 Delocalizing output /cromwell_root/stderr -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/stderr
2023/08/12 03:56:42 Delocalizing output /cromwell_root/canu_correct_output/65209.correctedReads.fasta.gz -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/canu_correct_output/65209.correctedReads.fasta.gz
Required file output '/cromwell_root/canu_correct_output/65209.correctedReads.fasta.gz' does not exist.
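The failure comes down to a simple comparison. The only machine Canu detected has 31 GB of memory, but the `red` task asks for at least 32 GB (redMemory=32-48). The sketch below shows that check in Python. It is a simplification written for illustration, not Canu's actual configuration logic, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    cores: int
    memory_gb: int


@dataclass
class TaskRequest:
    # Resource ranges as printed in the Canu error, e.g. redMemory=32-48, redThreads=4-8.
    name: str
    min_memory_gb: int
    max_memory_gb: int
    min_threads: int
    max_threads: int


def fits(task: TaskRequest, machine: Machine) -> bool:
    """Simplified check: the task can run only if the machine can supply
    at least the minimum memory and minimum thread count it requests."""
    return machine.memory_gb >= task.min_memory_gb and machine.cores >= task.min_threads


# Values taken from the log: one machine with 32 cores and 31 GB.
machine = Machine(cores=32, memory_gb=31)
red = TaskRequest("red", min_memory_gb=32, max_memory_gb=48, min_threads=4, max_threads=8)

print(fits(red, machine))  # False: 31 GB < 32 GB minimum
```

Either the VM gets at least 32 GB of memory, or the `red` request is lowered. The Canu error suggests the second option ("Change redMemory and/or redThreads").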
@SHuang-Broad
Collaborator

Hi,

Based on the error message, the resources allocated to Canu aren't enough for it to run.
You can adjust the WDL accordingly when running the pipeline.
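One way to follow Canu's own suggestion ("Change redMemory and/or redThreads") is to set those options explicitly on the command line. The sketch below is written in Python. It rebuilds the `canu -correct` call shown in the log and adds optional overrides. The function name is hypothetical, and the values in the usage line are illustrative assumptions, not tested settings. Lowering `redMemory` may not be appropriate for a full-size mammalian genome. The alternative is to give the VM more memory in the WDL.

```python
import shlex


def canu_correct_command(prefix, out_dir, genome_size, reads,
                         red_memory_gb=None, red_threads=None):
    """Build a `canu -correct` argument list modelled on the command in the log,
    optionally pinning the memory (GB) and thread count of the `red` task."""
    cmd = [
        "canu", "-correct",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size}",
        "corMaxEvidenceErate=0.15",
        "correctedErrorRate=0.15",
    ]
    # Hypothetical overrides; only added when explicitly requested.
    if red_memory_gb is not None:
        cmd.append(f"redMemory={red_memory_gb}")
    if red_threads is not None:
        cmd.append(f"redThreads={red_threads}")
    # As in the original command: key=value options first, then the read type and reads.
    cmd += ["-nanopore", reads]
    return cmd


cmd = canu_correct_command("65209", "canu_correct_output", "2731m", "merged.fq.gz",
                           red_memory_gb=24, red_threads=8)
print(shlex.join(cmd))
```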

That being said, Canu is resource-hungry and the mouse genome is large.
So the assembly of your data could run for weeks and be very expensive.
The workflow was really written for assembling small genomes.
I'd advise planning your analysis strategy accordingly before running this pipeline.

Regards,
Steve

@nikhil777shingte
Author

nikhil777shingte commented Aug 18, 2023

Hi Steve, thanks for your response. I was actually able to run this successfully at a relatively low cost (less than $10).

I should provide more context.

The sequencing data I have come from an ONT sequencer run with adaptive sampling. Because of this, I have to run a few more steps in addition to this pipeline to select the regions of interest for which I have reads. I forked the repository and made changes so that I can pass Canu an estimated genome size that matches my adaptive sampling reads.

Earlier, the mouse genome size used by Canu was wrong for my case, since my data come from adaptive sampling.
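The log above shows Canu being given genomeSize=2731m, which is the whole mouse genome. For adaptive sampling, a smaller value based on the targeted regions is closer to what was actually sequenced. The comment does not say how the estimate was made. The sketch below is written in Python and makes one assumption: the targets are available as BED-style (chrom, start, end) intervals, which are merged and summed. The function names are hypothetical, and this is not the approach used in the linked fork.

```python
def merged_target_length(intervals):
    """Total bases covered by BED-style (0-based, half-open) intervals,
    counting overlapping regions on the same chromosome only once."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:          # overlapping or adjacent: extend current run
                cur_end = max(cur_end, end)
            else:                         # gap: close the current run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def canu_genome_size(bases):
    """Format a base count using the k/m/g suffixes Canu accepts for genomeSize."""
    for suffix, scale in (("g", 10**9), ("m", 10**6), ("k", 10**3)):
        if bases >= scale:
            return f"{round(bases / scale, 2):g}{suffix}"
    return str(bases)


# Hypothetical target regions, for illustration only.
targets = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
print(merged_target_length(targets))   # 250
print(canu_genome_size(12_500_000))    # 12.5m
```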

You can find more details here:

https://github.com/nikhil777shingte/long-read-pipelines/tree/test-long-read-canu-assembly

My branch still contains the changes I made to the Canu resources (from when it was using the mouse genome size). But with the workflow changes I have made, I don't think Canu will be resource-intensive, and it should be able to finish the pipeline within a couple of hours.

Link to the Dockstore-published workflow: https://dockstore.org/workflows/github.com/nikhil777shingte/long-read-pipelines/ONTAssembleWithCanuAdaptiveSampling:test-long-read-canu-assembly

Terra details:

ONTAssembleWithCanuAdaptiveSampling
ID: 9aadab80-9ca6-4b89-b29b-459295d9097a

workspace-id: 88614ae6-5245-4a6e-ab14-5c3fc9d007a2
submission-id: 4c630f67-bdbe-4521-b415-4205c5828429

I am not sure whether the current pipeline already supports adaptive sampling or whether it is on your backlog, but it would be good to hear your thoughts.

Thanks
Nikhil
