PROPOSAL: diffs #103

Open
lskatz opened this issue Aug 24, 2021 · 6 comments
Assignees
Labels
Status: On Hold (assigned but not being worked on at the moment), Type: Enhancement

Comments

@lskatz

lskatz commented Aug 24, 2021

Hi, I am finding one aspect of ChewBBACA problematic: it adds alleles to the database in the same command that performs the analysis. This leads to several problems, including:

  • Automatic errors if the database is on a read-only drive: the run fails as soon as it tries to write. This has happened when I mount the database read-only with Singularity, for example, or when there is a central read-only MLST database on our high-performance computing (HPC) cluster that everyone uses.
  • Pollution of the database. I queried with some bad assemblies and now the database is ruined. The only way to backtrack is to delete and recreate the database. With a central MLST database on our HPC, one user's mistake pollutes the database for all users.

I would like to propose that the AlleleCall step produces something like diff or patch files. I would also like to propose an additional step that can accept a patch file to update the database. The most efficient way to accept a patch might be through git commands but that is just a suggestion.

Having patch files might also be helpful for compatibility with any current or future MLST callers like STing, if they decide to accept patches. It would also help in communicating between labs using ChewBBACA. For example, if I discover a new allele, it would be a standardized approach to communicating it to chewbbaca.online.
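To make the idea concrete, here is a minimal sketch of separating "analyze" from "update". Everything in it is hypothetical: the `AllelePatch` record, the `propose_patches` and `apply_patches` functions, and the in-memory schema layout are illustrations of the proposal, not anything chewBBACA currently provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllelePatch:
    """One proposed addition: a novel allele for a locus (hypothetical format)."""
    locus: str
    allele_id: int
    sequence: str


def propose_patches(schema, observed):
    """Compare observed sequences against the schema without modifying it.

    schema:   {locus: {allele_id: sequence}}
    observed: {locus: [sequence, ...]} taken from the analyzed assemblies
    """
    patches = []
    for locus, sequences in observed.items():
        known = set(schema.get(locus, {}).values())
        next_id = max(schema.get(locus, {}), default=0) + 1
        for seq in sequences:
            if seq not in known:
                patches.append(AllelePatch(locus, next_id, seq))
                known.add(seq)
                next_id += 1
    return patches


def apply_patches(schema, patches):
    """Return an updated copy of the schema; the input is left untouched."""
    updated = {locus: dict(alleles) for locus, alleles in schema.items()}
    for patch in patches:
        updated.setdefault(patch.locus, {})[patch.allele_id] = patch.sequence
    return updated


# Toy data: one locus with two known alleles, one novel sequence observed.
schema = {"locusA": {1: "ATGAAATAA", 2: "ATGAAGTAA"}}
observed = {"locusA": ["ATGAAATAA", "ATGCCCTAA"]}

patches = propose_patches(schema, observed)   # analysis step, read-only
updated = apply_patches(schema, patches)      # separate, explicit update step
```

With this split, the analysis step could run against a read-only database, and a bad query would only produce a patch that can be reviewed or discarded before anything is written.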

Thank you for your consideration on this topic.

@lskatz
Author

lskatz commented Aug 24, 2021

rfm-targa self-assigned this Aug 24, 2021
rfm-targa added the Status: In Progress and Type: Enhancement labels Aug 24, 2021
@ramirma
Member

ramirma commented Aug 25, 2021

Thanks for the suggestions, @lskatz. Some of the points you raised have been under discussion in the group for some time, so your comments are an excellent starting point to think more seriously about this. I see @rfm-targa has already self-assigned this. I would just like to highlight that communication with the chewie name server at chewbbaca.online is already automated in chewBBACA, including the submission of new alleles identified for the first time locally. You can see more on this at https://chewie-ns.readthedocs.io/en/latest/user/synchronize_api.html.

@lskatz
Author

lskatz commented Aug 25, 2021

Thank you @ramirma and @rfm-targa for having already thought about this! Thank you for considering this topic!

rfm-targa added the Status: On Hold label and removed the Status: In Progress label Jan 31, 2022
@lskatz
Author

lskatz commented Feb 7, 2023

Hi, has all this been fixed in version 3?

@rfm-targa
Contributor

Hello @lskatz! We've added the --no-inferred parameter to allow users to decide if they want to add novel alleles to the schemas. If you use that parameter, chewBBACA will still classify novel alleles but will not add them to the schema (intermediate files are created in a separate directory). This should help prevent database pollution.
Since chewBBACA does not add novel alleles to the schema if you pass the --no-inferred parameter, it should also be possible to perform allele calling if the schema is read-only. The exception is the first time you use a schema to perform allele calling (whether the schema was created with chewBBACA v3 or comes from chewBBACA <= 2.8.5): on that first run, chewBBACA v3 creates files with pre-computed values that are used to speed up execution. After the first AlleleCall execution, you can use or copy the schema and use it in read-only mode with the --no-inferred parameter. The pre-computed files are only updated when novel alleles are added to the schema.
Let us know if you run the latest version and if any of these issues are not fixed. We'll gladly add changes to make it work under both scenarios you've described.
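As an illustration of the read-only workflow described above, the sketch below builds an AlleleCall command line and appends --no-inferred when the schema directory is not writable. The helper `build_allele_call` and the example paths are hypothetical, and the `-i`, `-g`, `-o` and `--cpu` options are assumed from the chewBBACA documentation, so it is worth confirming them with `chewBBACA.py AlleleCall --help` before relying on this.

```python
import os
from pathlib import Path


def build_allele_call(inputs, schema, output, cpu=1, add_novel=None):
    """Build (but do not run) a chewBBACA AlleleCall command.

    add_novel=None means: add novel alleles only if the schema is writable.
    Option names other than --no-inferred are assumptions; check --help.
    """
    schema = Path(schema)
    if add_novel is None:
        # A read-only (or missing) schema directory cannot be updated.
        add_novel = os.access(schema, os.W_OK)
    cmd = [
        "chewBBACA.py", "AlleleCall",
        "-i", str(inputs),
        "-g", str(schema),
        "-o", str(output),
        "--cpu", str(cpu),
    ]
    if not add_novel:
        # Classify novel alleles but leave the schema untouched.
        cmd.append("--no-inferred")
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_allele_call("assemblies", "shared_schema", "results",
                        cpu=4, add_novel=False)
print(" ".join(cmd))
```

Note that, per the comment above, the very first AlleleCall run on a schema still needs write access so the pre-computed files can be created; the read-only mode applies to later runs.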

@ramirma
Member

ramirma commented Feb 10, 2023

@lskatz, I hope @rfm-targa's answer clarifies the points you raised. Please also note that chewBBACA can now run in 3 different modes, which may be of use to you. For more information on this, please have a look at the documentation. Do let us know whether the solutions implemented fully address the issues you raised.


3 participants