Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Blastn as an alternative classifier #608

Open
a4000 opened this issue Aug 2, 2023 · 6 comments
Open

Proposal: Blastn as an alternative classifier #608

a4000 opened this issue Aug 2, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@a4000
Copy link
Contributor

a4000 commented Aug 2, 2023

Description of feature

There is already an nf-core module for blastn. It just just needs to be modified to be more compatible with Ampliseq (mostly by removing the part of the module that expects a meta map). If we do modify an nf-core module, I'm not sure if we should put the module in the modules/nf-core or modules/local sub-directories. There are three main obstacles I can see with adding this module to Ampliseq.

  1. There would also need to be a makeblastdb step for cases when the user doesn't already have a local blast database.
  2. I'm not sure how compatible the blastn output files are with the other downstream steps of Ampliseq. If that is an issue, we may need a post blastn step to modify the output files.
  3. When using a local blast database in Nextflow, depending on the local directory structure, the docker run command may need to bind a directory to a volume. My way of getting around this issue is to have a bind_dir parameter with null as the default, then in the config file I use the -v option in docker if the bind_dir parameter is not null.
@a4000 a4000 added the enhancement New feature or request label Aug 2, 2023
@d4straub
Copy link
Collaborator

d4straub commented Aug 2, 2023

I would add it into modules/nf-core, then either use nf-core modules patch to remove the meta map or, probably even easier, just add a fake meta map to the channel before entering the unmodified blastn module.
About the issues:

  1. That could be done in an additional step, initially a local copy would be fine I guess
  2. Optimally the output file should be modified in an additional process/module (local) that makes the output compatible with downstream analysis
  3. This problem I am not aware of. I would hope that with nextflow channels and file staging that shouldn't be necessary, but there were some oddities of containerized programs that I encountered as well.

@erikrikarddaniel
Copy link
Member

erikrikarddaniel commented Aug 2, 2023

I can't see why one would have to patch the module to remove the meta map. One can just create a meta map on the fly -- usually, just the id entry is needed -- before calling the module. (Edit: Sorry, hadn't seen that you already proposed this, @d4straub.)

Otherwise, I agree with @d4straub, including his comments for point 3: I think it's a matter of properly staging the directory in which the blast database is located or will be created in. I see there is already a module for makeblastdb, so I suppose it's just a matter of calling that the correct way.

@erikrikarddaniel
Copy link
Member

Come to think of it: We're already using VSEARCH, which is basically doing the same thing as BLASTN but at least an order of magnitude faster -- can't that be used instead?

@a4000
Copy link
Contributor Author

a4000 commented Aug 2, 2023

I haven't tried using VSEARCH for classification, but I'll give it a try using it tomorrow. If it does the same thing, but faster, then the only argument I still have for adding blastn is for cases when someone wants to use a blast database. For example, if someone is working on a supercomputer that stores and manages a blast nt database.

@sebrauschert
Copy link

Hi, I think a strong case can be made for incorporating blast as an option rather than only keeping VSEARCH, as @a4000 has suggested.
Increasingly there will be a need for re-analysis of data, especially when longitudinal bio-monitoring projects will become more standard practice in e.g. marine environments. A lot of data sets already created had taxonomic annotation performed via blast. One would at least want to be able to recreate those results in a backward compatible manner. As there is already an nf-core module available, incorporation should not require a huge amount of work?

@d4straub
Copy link
Collaborator

d4straub commented Aug 30, 2023

Imho, re-analysis from scratch of relevant data might be more helpful than attempting to use old methods to make data comparable ("old" doesnt mean here particularly blast, but all steps that come with analysing raw data). And ampliseq supports large data batches.

As there is already an nf-core module available, incorporation should not require a huge amount of work?

The implementation effort just to run blast on ASVs should be indeed relatively little. But the module covers only a small snippet of whats needed (and the easiest, imho). But I would be happy to be proven wrong!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants