URLs for MAXCONTIGIDLEN limit #131

Open: spock wants to merge 1 commit into master

spock commented Jul 15, 2015

I've hit a 20-character limit on auto-generated contig IDs in the released prokka 1.11 archive, and started looking for the source of this limit.
I have only managed to find two SeqID length limits: 25 in the Sequin documentation, and 41 in the "annotation pipeline readme"; NCBI's sample GenBank record does not (or no longer?) mention any specific limit on LOCUS name length.

It might be useful to have some limit-related URLs next to the limit definition, so that it is easier to set a new reasonable limit when the default is not suitable for some reason.

aleimba commented Jul 15, 2015

Hi @spock,

this might be useful for others: There's an issue (#76) discussing Prokka's MAXCONTIGIDLEN.

I would actually prefer a more informative error message from Prokka. I don't think many users will look in the source code, but it's still nice to have those URLs.

The problem is that GBK, GFF, and SQN files all have different restrictions on the SeqID length ...

spock (Author) commented Jul 15, 2015

Hi @aleimba,

thank you for the issue link - I forgot to search the issues before requesting :(
After reading the #76 thread, I believe these URLs in the code will help decrease confusion.

The suggested URLs do cover GBK and SQN cases, and according to @tseemann GFF has no limit on SeqID.

Not sure how to improve the error message, though.
Something like "You can edit the MAX...LEN to bypass this limit"?
That should probably be a separate commit/request, and might actually be better as a command-line option, to accommodate all unimaginable corner cases :)

aleimba commented Jul 15, 2015

"Previously", I added also the max. characters for the locus_tag:

err("Genbank contig IDs are $contig_name_len chars, must be <= $MAXCONTIGIDLEN. Prefix is '$contigprefix', locus_tag has to be <= 6.");

But that was before @tseemann mentioned the different GBK, GFF, SQN specs. So it probably doesn't make sense, and I'm guessing @tseemann will replace tbl2asn pretty soon anyway.

Another option flag might be an idea (although Prokka already has quite a few), along with mentioning it in the error message. And then a mention of potentially broken GBKs ... Definitely another commit ;-).

Nevertheless, I'm guessing at least all the SPAdes users are running into the MAXCONTIGIDLEN problem.
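
For illustration, here is a minimal Python sketch (not Prokka's actual Perl code) of a more informative length check along the lines discussed in this thread. The function name `check_contig_id`, the `SEQID_LIMITS` table, and the message wording are all hypothetical; the numbers are only the limits quoted above (20 as the limit hit in Prokka 1.11, 25 from the Sequin documentation, 41 from the "annotation pipeline readme").

```python
# Hypothetical sketch: names and message text are assumptions, not Prokka's API.
# The limits are the ones quoted in this thread, not independently verified.
SEQID_LIMITS = {
    "prokka_default": 20,              # MAXCONTIGIDLEN as hit in Prokka 1.11
    "sequin_documentation": 25,        # limit found in the Sequin documentation
    "annotation_pipeline_readme": 41,  # limit found in the pipeline readme
}


def check_contig_id(contig_id, max_len=SEQID_LIMITS["prokka_default"]):
    """Return None if the ID fits within max_len, else an error message."""
    id_len = len(contig_id)
    if id_len <= max_len:
        return None
    # List the other quoted limits so the user can pick a sensible override.
    known = ", ".join(f"{name}={limit}" for name, limit in SEQID_LIMITS.items())
    return (
        f"Contig ID '{contig_id}' is {id_len} chars, must be <= {max_len}. "
        f"Limits quoted elsewhere: {known}. "
        "Shorten the ID or raise the limit."
    )


if __name__ == "__main__":
    # A SPAdes-style contig name, which is typically longer than 20 characters.
    print(check_contig_id("NODE_1_length_12345_cov_6.7891"))
```

Passing a larger `max_len` mimics what a command-line override could do; whether downstream tools accept the longer IDs is a separate question.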
