Add bakta #16

Closed
fmalmeida opened this issue May 26, 2021 · 8 comments · Fixed by #58 or #61
Labels
enhancement New feature or request stalled Implementation is slow or difficult

Comments

@fmalmeida
Owner

Study the best way to implement Bakta in the pipeline.

It would be nice to give users the option to choose the base annotation tool, Prokka or Bakta, depending on their needs.

Check whether it is possible to add it.

@fmalmeida fmalmeida self-assigned this May 26, 2021
@fmalmeida fmalmeida added the enhancement New feature or request label May 26, 2021
@fmalmeida
Owner Author

fmalmeida commented Nov 7, 2021

Bakta outputs are very similar to Prokka's; however, its annotation is more reliable. Therefore, adding it seems straightforward:

  • Create a module for bakta so users can use either prokka or bakta
  • If bakta is used, select the outputs that match the ones produced by prokka and used throughout the pipeline. The rest of the pipeline would then stay exactly the same, using the GFF and TSV files from either bakta or prokka
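A minimal Python sketch of that output selection, assuming Prokka names its files <prefix>.gff / <prefix>.tsv and Bakta names them <prefix>.gff3 / <prefix>.tsv (check each tool's documentation for the versions you use). The function harmonize_outputs and the "gff"/"tsv" keys are hypothetical; the pipeline itself is written in Nextflow, so this only illustrates exposing both annotators' files under the same names so downstream steps don't care which tool ran.

```python
from __future__ import annotations

from pathlib import Path

# Assumed default extensions of the two files the rest of the pipeline consumes.
# Verify these against the Prokka and Bakta docs for your versions.
OUTPUT_EXTENSIONS = {
    "prokka": {"gff": ".gff", "tsv": ".tsv"},
    "bakta": {"gff": ".gff3", "tsv": ".tsv"},
}


def harmonize_outputs(tool: str, outdir: Path, prefix: str) -> dict[str, Path]:
    """Return the GFF and TSV paths for `tool` under tool-agnostic keys.

    Downstream steps only read the "gff" and "tsv" keys, so they do not
    need to know which annotator produced the files.
    """
    if tool not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported annotator: {tool!r}")
    return {
        kind: outdir / f"{prefix}{ext}"
        for kind, ext in OUTPUT_EXTENSIONS[tool].items()
    }


if __name__ == "__main__":
    files = harmonize_outputs("bakta", Path("results/annotation"), "sample1")
    print(files["gff"], files["tsv"])
```

Keeping the extension table in one place means that supporting another annotator later would only add an entry there, not touch the downstream steps.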

One thing to consider:

  • Bakta depends on a heavy database, so it would not be adequate to put it into the docker image
  • Therefore, to add bakta, the pipeline itself must be reconfigured to have a module that creates all the databases used throughout the pipeline
  • Then, the pipeline would receive a parameter setting the path to this database directory, which would make it easier for users to keep the databases up to date
  • This would also make the docker images contain only the tools, not the database files, making them smaller and also making it possible to run the pipeline with different profiles such as conda, docker or singularity

To recap, adding bakta would require us to:

  • make the pipeline use tools from conda, docker or singularity, with the databases stored in a custom user path
  • create a module to automatically download and format the databases for the pipeline
  • reconfigure the pipeline to use the database files from the database directory provided by the user
  • add bakta
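A rough Python sketch of the "use databases from a user-provided directory" step: resolve each required database under that directory and fail before any tool runs if something is missing. The function resolve_databases and the folder names db_a/db_b are hypothetical placeholders; the real directory layout would be whatever the database-building module produces.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_databases(db_dir: str | Path, required: list[str]) -> dict[str, Path]:
    """Map each required database name to its path under the user's directory.

    Raises FileNotFoundError early if the directory or any database is missing.
    """
    root = Path(db_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"database directory not found: {root}")
    missing = [name for name in required if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing databases in {root}: {', '.join(missing)}")
    return {name: root / name for name in required}


if __name__ == "__main__":
    # Build a throwaway directory with two hypothetical database folders.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("db_a", "db_b"):
            (Path(tmp) / name).mkdir()
        dbs = resolve_databases(tmp, ["db_a", "db_b"])
        print(sorted(dbs))  # prints ['db_a', 'db_b']
```

Checking up front keeps a missing or outdated database from surfacing only as a confusing failure deep inside a tool's container.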

@fmalmeida fmalmeida added the stalled Implementation is slow or difficult label Dec 5, 2021
@fmalmeida
Owner Author

fmalmeida commented Mar 30, 2022

Now that the pipeline has been restructured, this issue can finally be addressed.

Since the bakta database is huge, instead of having the pipeline download and format it, users will have to download it themselves, as each system or institute has its own way of handling such a massive download.

Thus, users who want to annotate with bakta will simply have to:

  1. Download the database
  2. Set the path to the bakta database with --bakta_db

When using this parameter, the pipeline should automatically trigger bakta instead of prokka.
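The switch could look roughly like this, as a Python sketch rather than the pipeline's actual Nextflow code. Params and choose_annotator are hypothetical names; the only behaviour taken from this thread is that setting --bakta_db selects bakta and leaving it unset keeps prokka. The existence check on the path is an extra assumption, added so a typo in the path fails immediately.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Params:
    # Stands in for the --bakta_db parameter; None (or empty) means "not set".
    bakta_db: str | None = None


def choose_annotator(params: Params) -> str:
    """Pick the base annotator: bakta when --bakta_db is given, prokka otherwise."""
    if params.bakta_db:
        if not Path(params.bakta_db).exists():
            raise FileNotFoundError(f"--bakta_db path does not exist: {params.bakta_db}")
        return "bakta"
    return "prokka"


if __name__ == "__main__":
    print(choose_annotator(Params()))                # prints prokka
    print(choose_annotator(Params(bakta_db=".")))   # prints bakta (current dir exists)
```

Making the database path itself the trigger means there is no separate "use bakta" flag that could disagree with whether a database was actually provided.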

@fmalmeida
Owner Author

fmalmeida commented Jul 6, 2022

Finally, after a long time, the workflow now runs properly from start to finish when using bakta. Before release, it is still required to:

  • Update the docs to explain the bakta option: how to use it and what to expect.
  • Update the version in the manifest
  • Update the automatic reports so they detect whether the user ran prokka or bakta. Check that everything renders correctly.
  • When prokka is used, the automatic report must detect whether the pipeline ran with additional HMM libraries for prokka, and which ones were used (out of those available when building the databases).
  • To consider: when using bakta, is there additional parsing of its outputs that we could do to give users more information?

@fmalmeida fmalmeida linked a pull request Jul 6, 2022 that will close this issue
@fmalmeida
Owner Author

Almost ready.

@fmalmeida
Owner Author

fmalmeida commented Sep 2, 2022

  • Requires running at least two annotations to evaluate what the final results look like, so the changes can be merged
  • Make sure the docs are up to date

Try to roll it out in the next 3 days.

@fmalmeida
Owner Author

Something is wrong with the bakta docker image: when running it, it complains about diamond, with a -9 exit code.

@fmalmeida
Owner Author

Execution tests are finished. Now building new docker images to check whether the scripts and reports are properly updated, so the release can be made.

@fmalmeida fmalmeida linked a pull request Sep 9, 2022 that will close this issue
@fmalmeida
Owner Author

Finally done 🥳

@fmalmeida fmalmeida pinned this issue Sep 9, 2022
1 participant