An error when looking for 200 proteins in a genome #46

Open
gilloe opened this issue Jun 6, 2023 · 6 comments


gilloe commented Jun 6, 2023

Hi,
I get the following error when running miniprot to look for 200 proteins in a 73 MB genome:
[M::mp_ntseq_read@0.375*1.01] read 68163509 bases in 69053 contigs
[M::mp_idx_build@0.379*1.01] 608218 blocks
[M::mp_idx_build@0.987*2.83] collected syncmers
[M::mp_idx_build@13.918*1.13] 23643989 kmer-block pairs
[M::mp_idx_print_stat] 1694743 distinct k-mers; mean occ of infrequent k-mers: 13.95; 0 frequent k-mers accounting for 0 occurrences
##gff-version 3
BUG! 1929 == 1929? 621 == 622? 40M1I57M3D83M2D98M3I8M1I59M1D26M15U117M1D46M2D30M5I27M1I5M63U12M
miniprot_: align.c:195: mp_extra_cal: Assertion `al == r->qe - r->qs' failed.
Aborted

Running it with 20 proteins against the same genome worked fine, and so did searching for the 200 proteins in a smaller genome, so I don't think it is a file format issue.
Any suggestions?

Thanks,
Gil
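
For readers decoding the BUG line above: the two comparisons look like the nucleotide and amino-acid lengths implied by the CIGAR, checked against the spans recorded for the alignment. Below is a minimal Python sketch of that bookkeeping. It assumes M consumes 3 nt and 1 aa, I consumes 1 aa, D consumes 3 nt, and U consumes its length in nt plus a single aa (a codon split by an intron). These costs are inferred from the numbers in the log, not taken from miniprot's source, and the function name is hypothetical.

```python
import re

def cigar_lengths(cigar):
    """Return (nucleotides, amino_acids) consumed by a protein-to-genome CIGAR.

    The per-operation costs are assumptions inferred from the log in this
    issue; only the operations that appear in that CIGAR are handled.
    """
    nt = aa = 0
    for length, op in re.findall(r"(\d+)([A-Z])", cigar):
        n = int(length)
        if op == "M":      # aligned residues: 3 nt and 1 aa each
            nt += 3 * n
            aa += n
        elif op == "I":    # residues present only in the protein
            aa += n
        elif op == "D":    # codons present only in the genome
            nt += 3 * n
        elif op == "U":    # intron of n nt, assumed to split one codon
            nt += n
            aa += 1
        else:
            raise ValueError(f"operation {op!r} is not covered by this sketch")
    return nt, aa

cigar = ("40M1I57M3D83M2D98M3I8M1I59M1D26M15U117M1D46M2D30M5I27M1I5M63U12M")
print(cigar_lengths(cigar))  # (1929, 621)
```

Under these assumptions the nucleotide total agrees with the log (1929), while the amino-acid total is 621 against an expected 622, which is the mismatch the assertion reports.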

Owner

lh3 commented Jun 6, 2023

Could you share the proteins and the reference genome with me?

Author

gilloe commented Jun 7, 2023

Yes, thanks.
The genome can be downloaded from: https://www.ncbi.nlm.nih.gov/nuccore/LDNA00000000.1
And the query file is attached.
Hydra_uniprot (2).zip

Owner

lh3 commented Jun 8, 2023

I downloaded the genome and could get the results:

[M::mp_ntseq_read@0.385*1.00] read 68163509 bases in 69053 contigs
[M::mp_idx_build@0.386*1.00] 608218 blocks
[M::mp_idx_build@0.794*2.52] collected syncmers
[M::mp_idx_build@1.099*2.09] 23643989 kmer-block pairs
[M::mp_idx_print_stat] 1694743 distinct k-mers; mean occ of infrequent k-mers: 13.95; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::1.299*2.27] mapped 200 sequences
[M::main] Version: 0.11-r234
[M::main] CMD: ./miniprot --gff GCA_001455295.2_ASM145529v2_genomic.fna.gz Hydra_uniprot.fasta

Based on the total number of bases, we are using the same reference.

What version are you using?

Author

gilloe commented Jun 8, 2023

How do I know which version I have?

Owner

lh3 commented Jun 8, 2023

Run miniprot --version. You could also just try the latest version.

Author

gilloe commented Jun 8, 2023

0.11-r235-dirty
I downloaded it very recently.
