[question] use of max_genome_size? #441

Open
Svnipni opened this issue Sep 11, 2023 · 4 comments
Labels
question Further information is requested

Comments

@Svnipni

Svnipni commented Sep 11, 2023

I've had good results running this pipeline on my isolates. However, one isolate consistently assembles to an 11 Mbp genome, where a typical P. protegens genome size of ~6.5 Mbp is expected. So I tried running:

bactopia --R1 B_1.fastq.gz --R2 B_2.fastq.gz --sample sample --species "Pseudomonas protegens" --genome_size median --max_genome_size 7000000 --min_coverage 100 --datasets datasets --outdir out/

But it still returns a genome of the same size.
CheckM indicated some (5.43%) contamination, and both k-mer and Kaiju queries on the fastq.gz reads show a near-full hit (70%) matching P. protegens, with some Citrobacter spp. sprinkled in.
It seems max_genome_size doesn't affect the assembly, or perhaps I'm not using it properly?

Any advice would be greatly appreciated.

Svnipni added the question label Sep 11, 2023
@rpetit3
Member

rpetit3 commented Sep 15, 2023

Yo @Svnipni !

This looks like a bug I need to look into, because there should be a check after the assembly to prevent this from happening.

Also, this looks like it's on v2. Do you mind trying on v3 as well?

Thank you!
Robert
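
For reference, a post-assembly size check of the kind mentioned above could look roughly like the sketch below. This is not Bactopia's implementation: the function names `assembly_size` and `exceeds_max_genome_size` are hypothetical, and the sketch only assumes a plain or gzipped FASTA file of contigs.

```python
import gzip


def assembly_size(fasta_path):
    """Return the total number of bases across all contigs in a FASTA file."""
    # Hypothetical helper: transparently handle .gz assemblies.
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    total = 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            # Header lines start with '>'; every other line is sequence.
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def exceeds_max_genome_size(fasta_path, max_genome_size):
    """True if the assembly is larger than the allowed maximum (in bp)."""
    return assembly_size(fasta_path) > max_genome_size
```

With the numbers from this thread, an 11 Mbp assembly checked against a limit of 7000000 would be flagged, which is the behaviour the question expected from the pipeline.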

@Svnipni
Author

Svnipni commented Sep 15, 2023

Yes, this was still v2. I'll try it with v3 on Monday!

@rpetit3
Member

rpetit3 commented Sep 15, 2023

sounds good, thank you!

@Svnipni
Author

Svnipni commented Sep 18, 2023

I'm still getting an 11 Mbp genome where I expected one no larger than 6-7 Mbp.
I ran:
bactopia --r1 B_1.fastq.gz --r2 B_2.fastq.gz --sample sample_Pprotegens_v3 --species "Pseudomonas protegens" --genome_size 6000000 --max_genome_size 8000000 --outdir bactopia_out/

I'm pretty certain it's my sample that's causing the problem, since SPAdes gives a similar genome.
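
If the sample is a mix of two organisms, the contigs from each often sit at different coverage levels. A minimal sketch for checking that on a SPAdes assembly, assuming the default SPAdes contig names (`NODE_<n>_length_<len>_cov_<cov>`, where the coverage is k-mer coverage). The function names and the threshold are hypothetical and would need to be picked by looking at the actual coverage distribution:

```python
import re

# Default SPAdes contig header, e.g. ">NODE_1_length_256832_cov_35.43".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_spades_header(header):
    """Return (length, coverage) parsed from a SPAdes contig header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(2)), float(match.group(3))


def length_by_coverage(headers, cov_threshold):
    """Sum contig lengths below and at/above a coverage threshold.

    Returns (low_total, high_total) in bp. The threshold is an assumption
    supplied by the caller, not something SPAdes reports.
    """
    low = high = 0
    for header in headers:
        length, cov = parse_spades_header(header)
        if cov >= cov_threshold:
            high += length
        else:
            low += length
    return low, high
```

If one of the two totals comes out near the expected ~6.5 Mbp, that would support the idea that the extra sequence comes from a second organism rather than from the assembler.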
