adding masurca version 4.1.1 #908

Draft
wants to merge 3 commits into
base: master

erinyoung
Contributor

There's a new version of MASURCA! (More info here: https://github.com/alekseyzimin/masurca/releases/tag/v4.1.1)

I copied the files from 4.1.0 and made the following changes:

  • updated to ubuntu:jammy
  • updated the software version ARG
  • added a hybrid assembly example to the README
  • bwa is now installed via apt-get

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The dockerfile successfully builds to a test target for the user creating the PR. (i.e. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15 )
  • Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • (optional) All test files are located in same directory as the Dockerfile (i.e. shigatyper/2.0.1/test.sh)
  • Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
    • If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • Dockerfile includes the recommended LABELS
  • Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
  • Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing

@erinyoung erinyoung marked this pull request as ready for review March 14, 2024 19:21
@kapsakcj kapsakcj self-requested a review March 21, 2024 17:40
Collaborator

thanks for adding all these relative links 👍

@kapsakcj
Collaborator

kapsakcj commented May 3, 2024

I see some errors in the test command at the end; I'm surprised it exited 0 and the image built successfully. It looks like MaSuRCA requires the file command to be installed:

#13 [test 2/2] RUN wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_1.fastq.gz &&   wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_2.fastq.gz &&   wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/long_reads_low_depth.fastq.gz &&   masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
#13 2.313 Verifying PATHS...
#13 2.316 jellyfish OK
#13 2.361 runCA OK
#13 2.373 createSuperReadsForDirectory.perl OK
#13 2.373 creating script file for the actions...done.
#13 2.373 execute assemble.sh to run assembly
#13 2.382 [Thu Mar 14 19:06:59 UTC 2024] Processing pe library reads
#13 2.385 /MaSuRCA-4.1.1/bin/expand_fastq: 12: file: not found
#13 2.385 WARNING!!! Unknown file type for input file 'short_reads_1.fastq.gz', assuming type text/
#13 2.386 /MaSuRCA-4.1.1/bin/expand_fastq: 12: file: not found
#13 2.386 WARNING!!! Unknown file type for input file 'short_reads_2.fastq.gz', assuming type text/
#13 2.387 File 'short_reads_1.fastq.gz' is not a fastq file
#13 2.387 File 'short_reads_2.fastq.gz' is not a fastq file
#13 2.392 [Thu Mar 14 19:06:59 UTC 2024] Average PE read length -nan
#13 2.395 Illegal division by zero at -e line 1.
#13 2.397 [Thu Mar 14 19:06:59 UTC 2024] Using kmer size of for the graph
#13 2.401 [Thu Mar 14 19:06:59 UTC 2024] MIN_Q_CHAR: 64
#13 2.406 [Thu Mar 14 19:06:59 UTC 2024] Creating mer database for Quorum
#13 3.652 [Thu Mar 14 19:07:00 UTC 2024] Error correct PE
#13 4.437 [Thu Mar 14 19:07:01 UTC 2024] Error correction of PE reads failed. Check pe.cor.log.

Maybe try adding file to the list of packages installed via apt-get to see if that resolves it?
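
Since the log above shows expand_fastq failing only when it reaches the file call, a pre-flight check in the test stage could surface a missing dependency before the assembly starts. The following is a minimal sketch; require_cmds is a hypothetical helper name, not part of MaSuRCA or of this repository.

```shell
# Hypothetical pre-flight helper: report every required command that is
# missing from PATH and return non-zero if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the test stage this could run as `require_cmds file wget || exit 1` ahead of the masurca call, so the build stops with a clear message instead of the division-by-zero noise seen above.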

@kapsakcj
Collaborator

kapsakcj commented May 3, 2024

OK, the tests look happier now that file is installed, but now there's an error in the mega-reads step. Again, it is not caught as an error: the image builds successfully despite the failure (the exit code is 0 when it should not be).

#13 [test 2/2] RUN wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_1.fastq.gz &&   wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/short_reads_2.fastq.gz &&   wget -q https://github.com/rrwick/Unicycler/raw/69e712eb95c4b9f8a46aade467260260a9ce7a91/sample_data/long_reads_low_depth.fastq.gz &&   masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
#13 1.693 Verifying PATHS...
#13 1.696 jellyfish OK
#13 1.742 runCA OK
#13 1.755 createSuperReadsForDirectory.perl OK
#13 1.755 creating script file for the actions...done.
#13 1.755 execute assemble.sh to run assembly
#13 1.764 [Fri May  3 17:55:35 UTC 2024] Processing pe library reads
#13 1.959 [Fri May  3 17:55:35 UTC 2024] Average PE read length 125
#13 2.066 [Fri May  3 17:55:35 UTC 2024] Using kmer size of 83 for the graph
#13 2.225 [Fri May  3 17:55:35 UTC 2024] MIN_Q_CHAR: 33
#13 2.230 [Fri May  3 17:55:35 UTC 2024] Creating mer database for Quorum
#13 4.109 [Fri May  3 17:55:37 UTC 2024] Error correct PE
#13 10.04 [Fri May  3 17:55:43 UTC 2024] Estimating genome size
#13 11.63 [Fri May  3 17:55:45 UTC 2024] Estimated genome size: 187640
#13 11.63 [Fri May  3 17:55:45 UTC 2024] Creating k-unitigs with k=83
#13 13.49 [Fri May  3 17:55:47 UTC 2024] Computing super reads from PE 
#13 14.02 [Fri May  3 17:55:47 UTC 2024] Using CABOG from /MaSuRCA-4.1.1/bin/../CA8/Linux-amd64/bin
#13 14.02 [Fri May  3 17:55:47 UTC 2024] Running mega-reads correction/assembly
#13 14.02 [Fri May  3 17:55:47 UTC 2024] Using mer size 17 for mapping, B=15, d=0.02
#13 14.02 [Fri May  3 17:55:47 UTC 2024] Estimated Genome Size 187640
#13 14.02 [Fri May  3 17:55:47 UTC 2024] Estimated Ploidy 1
#13 14.03 [Fri May  3 17:55:47 UTC 2024] Using 2 threads
#13 14.03 [Fri May  3 17:55:47 UTC 2024] Output prefix mr.83.17.15.0.02
#13 14.04 [Fri May  3 17:55:47 UTC 2024] Creating k-unitigs for k=19
#13 14.85 [Fri May  3 17:55:48 UTC 2024] Pre-correcting long reads
#13 15.09 [Fri May  3 17:55:48 UTC 2024] Pre-corrected reads are in longest_reads.25x.fa
#13 15.10 [Fri May  3 17:55:48 UTC 2024] Computing mega-reads
#13 15.10 [Fri May  3 17:55:48 UTC 2024] Running locally in 1 batch
#13 15.10 [Fri May  3 17:55:48 UTC 2024] mega-reads pass 1 failed
#13 15.10 [Fri May  3 17:55:48 UTC 2024] mega-reads exited before assembly
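
Because the command exits 0 even when a step fails, one way to make the build stop is to capture the output and search it for failure messages. This is a rough sketch under that assumption; log_has_failure is a hypothetical helper, and its patterns come only from the log lines quoted in this thread, not from an exhaustive list of MaSuRCA error messages.

```shell
# Hypothetical helper: return 0 (true) if the captured log contains one of
# the failure markers seen in this thread, non-zero otherwise.
log_has_failure() {
    grep -E -q 'failed|exited before assembly|Illegal division by zero' "$1"
}

# Sketch of how the test step could use it (left commented out here):
# masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz \
#     -r long_reads_low_depth.fastq.gz 2>&1 | tee masurca.log
# if log_has_failure masurca.log; then exit 1; fi
```

Matching on log text is brittle, so checking that the expected assembly output exists would be a sturdier test once the output path is confirmed.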

@kapsakcj
Collaborator

kapsakcj commented May 3, 2024

The error log says it was not able to set the mempolicy or the interleave mask:

$ cat create_mega-reads.err 
set_mempolicy: Operation not permitted
setting interleave mask: Operation not permitted

I'm not sure what to do here....

@erinyoung
Contributor Author

From what I gather, it looks like Docker prevented numactl from setting mempolicy.

https://forums.docker.com/t/cannot-run-numactl-interleave-all-in-docker/40631/5
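
As far as I know, Docker's default seccomp profile only permits set_mempolicy when the container has the SYS_NICE capability, so one way to test this theory is to run numactl in a container started with that capability. The sketch below only prints the probe command so it can be reviewed first. numa_probe_cmd is a hypothetical helper, masurca:4.1.1test is an assumed local image tag, and the probe assumes numactl is installed in the image.

```shell
# Hypothetical helper: print a docker command that checks whether numactl
# can set an interleave policy inside a container from the given image.
# Assumption: adding SYS_NICE is enough for set_mempolicy to be allowed.
numa_probe_cmd() {
    printf 'docker run --rm --cap-add SYS_NICE %s numactl --interleave=all true\n' "$1"
}

# masurca:4.1.1test is an assumed tag for a locally built test image.
numa_probe_cmd masurca:4.1.1test
```

Note that --cap-add is a docker run option, so even if the probe succeeds, the test stage that runs during docker build would presumably still hit the same restriction.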

@erinyoung
Contributor Author

I am encountering problems with the biocontainer image as well (quay.io/biocontainers/masurca:4.1.1--pl5321hb5bd705_0):

# masurca -t 2 -i short_reads_1.fastq.gz,short_reads_2.fastq.gz -r long_reads_low_depth.fastq.gz
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Fri May  3 19:16:20 UTC 2024] Processing pe library reads
/usr/local/bin/expand_fastq: line 12: file: command not found
/usr/local/bin/expand_fastq: line 12: file: command not found
WARNING!!! Unknown file type for input file 'short_reads_2.fastq.gz', assuming type text/
WARNING!!! Unknown file type for input file 'short_reads_1.fastq.gz', assuming type text/
File 'short_reads_2.fastq.gz' is not a fastq file
File 'short_reads_1.fastq.gz' is not a fastq file
awk: cmd. line:1: Division by zero
[Fri May  3 19:16:20 UTC 2024] Average PE read length
Illegal division by zero at -e line 1.
[Fri May  3 19:16:20 UTC 2024] Using kmer size of for the graph
[Fri May  3 19:16:20 UTC 2024] MIN_Q_CHAR: 64
[Fri May  3 19:16:20 UTC 2024] Creating mer database for Quorum
[Fri May  3 19:16:24 UTC 2024] Error correct PE
[Fri May  3 19:16:26 UTC 2024] Error correction of PE reads failed. Check pe.cor.log.

So... hmm...

@erinyoung
Contributor Author

I'm a little torn about what to do with this one.

  1. I only use this image for POLCA
  2. But I've been moving to pypolca
  3. But what about the people that actually want to use this for hybrid assembly?
  4. POLCA doesn't even have any changes in this version

I need to read up on the new GRID_ENGINE=MANUAL to see if that can fix things. I'll move this to a draft for now.

@erinyoung erinyoung marked this pull request as draft May 3, 2024 19:22
@kapsakcj
Collaborator

kapsakcj commented May 3, 2024

I wouldn't burn too much time/effort on this if you are utilizing POLCA in other ways.

If someone really wants to use masurca for hybrid assembly via a docker image, then we can ask them to help with resolving these issues. I never work with this tool, so it's difficult for me to troubleshoot.
