Allow input of pre-annotated ORFs #255

Open
BenPonBiobrain opened this issue Apr 4, 2023 · 7 comments
Assignees
Labels
enhancement Improvement for existing functionality

Comments

@BenPonBiobrain

Description of feature

Allow input of pre-annotated ORFs originating from, e.g., the metatdenovo and mag pipelines. This would be very helpful for me, as I run those pipelines and would like to keep the ORF ids for all analyses.

@BenPonBiobrain BenPonBiobrain added the enhancement Improvement for existing functionality label Apr 4, 2023
@jasmezz jasmezz added this to the 1.1 - British Beans on Toast milestone Apr 4, 2023
@jfy133
Member

jfy133 commented Apr 26, 2023

Some development observations:

  • AMP workflow takes FAA (for amplify, hmmsearch, ampir, ampcombi)
  • ARG workflow takes FAA (for deeparg)
  • BGC workflow takes GFF/FAA/GBK (faa: hmmsearch; gff: antismash (prodigal); gbk: antismash (prokka, bakta))

I would propose that we have two additional columns, e.g. amino_acid_fasta and feature_file. The latter will accept GFF or GBK files, but if the user supplies the wrong one to antismash, that's their problem.
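A minimal sketch of what a samplesheet check for this proposal could look like, written in Python purely for illustration (the pipeline itself is not implemented this way). The column names `amino_acid_fasta` and `feature_file` come from the comment above; the `sample` and `fasta` column names, the accepted file extensions, and the helper names are assumptions.

```python
import csv
import io

# Assumed extensions for the proposed feature_file column (GFF or GBK).
FEATURE_EXTENSIONS = (".gff", ".gbk")


def _base(path):
    """Lower-case a path and drop a trailing .gz so the extension can be checked."""
    path = path.strip().lower()
    return path[:-3] if path.endswith(".gz") else path


def validate_row(row):
    """Return a list of problems found in one samplesheet row (empty if none)."""
    errors = []
    if not (row.get("sample") or "").strip():
        errors.append("sample is required")
    # Contig FASTA is assumed to stay mandatory, with the new columns optional.
    if not (row.get("fasta") or "").strip():
        errors.append("fasta is required")
    feature = (row.get("feature_file") or "").strip()
    if feature and not _base(feature).endswith(FEATURE_EXTENSIONS):
        errors.append("feature_file must be a GFF or GBK file")
    return errors


def check_samplesheet(text):
    """Parse CSV text and return (sample, errors) for every row."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row.get("sample", ""), validate_row(row)) for row in reader]
```

Note that this only checks that feature_file looks like a GFF or GBK file; in line with the comment above, it does not try to decide whether the supplied format is the right one for antismash.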

Thoughts @nf-core/funcscan ?

@erikrikarddaniel
Member

These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?

Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)

@jfy133
Member

jfy133 commented Apr 28, 2023

> These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?

No, fasta would still be required, as some tools still require both.

> Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)

We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK option is there purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool 😬

@erikrikarddaniel
Member

> > These were exactly the questions I had when I looked at the code, and more or less the way I would choose to go ahead. It would mean that no columns, except sample name, would be mandatory, right?
>
> No, fasta would still be required, as some tools still require both.

The alternative would perhaps be to skip those tools when no fasta is available. The more I think about this with metatdenovo and magmap output as the input to this pipeline, the less of a problem I see with providing both contig fasta and ORF amino acid fasta (or ORF nucleotide fasta, for that matter), so I'm fine with this!
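For illustration only, the "skip those tools" alternative could be sketched as below, in Python rather than in the pipeline's own language. The tool-to-format mapping follows the development observations earlier in the thread; the input labels, the assumption that antismash needs contig fasta alongside a GFF but can run from a GBK alone, and the function name are all hypothetical.

```python
# Each tool maps to a list of alternative input sets; any one set suffices.
# AMP and ARG tools take FAA according to the thread. The antismash entries
# are an assumption made for this sketch, not confirmed tool behaviour.
TOOL_INPUTS = {
    "amplify": [{"faa"}],
    "hmmsearch": [{"faa"}],
    "ampir": [{"faa"}],
    "ampcombi": [{"faa"}],
    "deeparg": [{"faa"}],
    "antismash": [{"fasta", "gff"}, {"gbk"}],
}


def runnable_tools(available):
    """Split tools into (run, skip) given the input types a sample provides."""
    available = set(available)
    run, skip = [], []
    for tool, alternatives in TOOL_INPUTS.items():
        # A tool can run if at least one of its input sets is fully available.
        if any(required <= available for required in alternatives):
            run.append(tool)
        else:
            skip.append(tool)
    return run, skip
```

With only an amino acid fasta supplied, `runnable_tools({"faa"})` would put antismash in the skip list and everything else in the run list.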

> > Can't one have separate columns for GFF and GBK? (Rather naive on my side, not knowing the code.)
>
> We could, but that would make the logic more complicated if someone supplies a mixture of both. Also, the GFF/GBK option is there purely to account for antismash weirdness (all other tools take FAA), so I don't want to overengineer the pipeline to account for a single strange/picky tool 😬

👍

Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.

@jfy133
Member

jfy133 commented May 8, 2023

> Since I have a colleague waiting for this I'd be happy to contribute, but I'm afraid I won't have time for a couple of weeks.

That would be wonderful, @erikrikarddaniel! 🤩 When/if you get time, feel free to continue on my PR here if I don't make progress either!

@erikrikarddaniel
Member

Just to let you know, @jfy133: It's been a month, and I still don't see when I'll have time for this. If you don't either, no problem, I'll get to this in due time.

@jfy133
Member

jfy133 commented Jun 13, 2023

No worries! I'm slowly picking away at it on the PR above, so just jump in when you can :)

@jfy133 jfy133 self-assigned this Jul 12, 2023
4 participants