Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance the polish module #58

Open
12 of 15 tasks
GwennyGit opened this issue Jan 31, 2023 · 10 comments · Fixed by #51, #54, #62, #86 or #88
Open
12 of 15 tasks

Enhance the polish module #58

GwennyGit opened this issue Jan 31, 2023 · 10 comments · Fixed by #51, #54, #62, #86 or #88
Labels
enhancement New feature or request refactoring changes in the code functionality

Comments

@GwennyGit
Copy link
Collaborator

GwennyGit commented Jan 31, 2023

This issue was opened to collect all enhancements for the polish module.

@GwennyGit GwennyGit added enhancement New feature or request refactoring changes in the code functionality labels Jan 31, 2023
GwennyGit added a commit that referenced this issue Jan 31, 2023
Additionally:
    - Cleaned up some code
    - Added more comments
    - Resolved issues regarding empty `protein_fasta` parameter
@GwennyGit
Copy link
Collaborator Author

I just detected that currently the function add_reac in the polish module does not handle the Growth or Biomass reaction properly. This is due to the fact that current_id is shortened - which is necessary for all other reactions - like that: current_id = current_id[2:]. However, if the ID of the reaction is for example 'Growth' the result for current_id will be 'owth'. Thus, the function cannot handle these IDs with the regex 'growth|_*biomass\d*_*' properly. The code needs to be adjusted to handle these exceptions correctly.

@famosab
Copy link
Member

famosab commented Feb 9, 2023

At least in my models the growth function has the id id="R_Growth" which would mean that removing the trailing R is still valid and would not break the regex function. We might need to insert a check whether the trailing R is really there before we try to remove it. Or am I missing something?

        # Get ID and remove 'R_'
        current_id = entity.getId()
        if current_id[:2] == 'R_':
            current_id = current_id[2:]

@GwennyGit
Copy link
Collaborator Author

GwennyGit commented Feb 9, 2023

Regarding the ID for growth, this seems to vary between models a lot. We could also just change the regex to include R_R_?growth|R_?_*biomass\d*_*. However, the proposed change is also good and might be better as other reactions could also exist without R_.
Looking at your changes I realised one problem with the function change_all_qualifiers. If one has a lab strain model the GeneProtein qualifier should be set to isHomologTo as we discussed before.

@GwennyGit
Copy link
Collaborator Author

GwennyGit commented Feb 10, 2023

As mentioned in these comments #52 (comment) and #52 (comment). It would indeed make sense to add compartment specifications for all reactions which happen in the same compartment within polish.

@draeger
Copy link
Member

draeger commented Feb 10, 2023

COBRApy manipulates IDs back and forth. This prefix is only there in the SBML export. Upon import R_ is stripped away.

@famosab
Copy link
Member

famosab commented Feb 10, 2023

In libsbml the trailing R is not stripped though. That is sometimes confusing. (The same is true for genes and metabolites).

@draeger
Copy link
Member

draeger commented Feb 10, 2023

Of course not. This entire ID manipulation is nonsense and causes chaos. There should be separate fields for every piece of information that the BiGG ID ships.

@GwennyGit
Copy link
Collaborator Author

I think the R_ for reactions is used to clarify that this is a reaction ID within the model. R_ is not part of the BiGG IDs same as M_ for metabolites/species and G_ for groups and GeneProducts.

@famosab
Copy link
Member

famosab commented Feb 10, 2023

But for the polishing module we mostly use libsbml, I think we only even use cobrapy for the growth simulations (and a few more small things). So we always need to take care of those trailing IDs (but through string slicing that is not an issue I think). One could argue that they are obsolete since the entity itself holds the information already - but that is something we cannot change here (or shouldn't on our own).

@GwennyGit
Copy link
Collaborator Author

GwennyGit commented Jun 21, 2023

While investigating models from Heinken et al.[1] I realised that improving the function get_curie_set within the polish module could be interesting. As these models contain NaN identifiers and the function get_curie_set assembles a dictionary mapping the database of each CURIE to the correspondingly found identifier it would be helpful to also check for NaNs and remove the according entries.


[1]
Heinken, A., Hertel, J., Acharya, G. et al. Genome-scale metabolic re-
construction of 7,302 human microorganisms for personalized medicine.
Nature Biotechnology. issn: 1546-1696. doi:10.1038/s41587-022-01628-0 (Jan. 2023).

GwennyGit added a commit that referenced this issue Jun 29, 2023
GwennyGit added a commit that referenced this issue Aug 9, 2023
This module now can:
- Remove NaN containing URIs or at least return them
- Check if the vmhreaction identifiers found are also BiGG Identifiers if  'id_db:' is set to 'VMH'
This was linked to pull requests Aug 9, 2023
GwennyGit added a commit that referenced this issue Aug 16, 2023
* Changed PyPI version badge

For the next release the PyPI version badge stems now from 'shields.io' and not from 'badge.fury.io'.

* Changed colour for refineGEMs version badge

* Adjusted handling of BioCyc identifiers in polish_annotations #95 #58

* Added requirement for importlib_resources=5.13.0 to Pipfile

* Added code to cope with missing sub-database prefixes for BioCyc identifiers #95

* Changed NaN identifier handling #95

* Fixed issue III: None prefix identifier pairs in invalid_curies.tsv #95

* Adjusted files with version for release 1.2.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment