Enhance the `polish` module #58

GwennyGit · 2023-01-31T11:52:32Z

This issue was opened to collect all enhancements for the `polish` module.

GwennyGit · 2023-02-02T13:29:51Z

I just noticed that the function `add_reac` in the `polish` module does not handle the Growth or Biomass reaction properly. The reason is that `current_id` is shortened unconditionally with `current_id = current_id[2:]`, which is necessary for all other reactions. However, if the reaction ID is, for example, 'Growth', `current_id` becomes 'owth', so the regex 'growth|_*biomass\d*_*' can no longer match it. The code needs to be adjusted to handle these exceptions correctly.

famosab · 2023-02-09T12:31:53Z

At least in my models the growth function has the ID `id="R_Growth"`, which would mean that removing the leading `R_` is still valid and would not break the regex. We might need to insert a check whether the leading `R_` is really there before we try to remove it. Or am I missing something?

        # Get ID and remove 'R_'
        current_id = entity.getId()
        if current_id[:2] == 'R_':
            current_id = current_id[2:]
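
To make the difference concrete, here is a minimal sketch comparing unconditional slicing with the prefix check above. `strip_reaction_prefix` is a hypothetical helper name, not a function from the `polish` module. The regex is the one quoted earlier in this thread; applying it with `re.search` and case-insensitively is an assumption.

```python
import re

# Regex quoted earlier in the thread; IGNORECASE and the use of search() are assumptions.
GROWTH_PATTERN = re.compile(r'growth|_*biomass\d*_*', re.IGNORECASE)


def strip_reaction_prefix(reaction_id: str) -> str:
    """Remove a leading 'R_' only if it is actually present (hypothetical helper)."""
    if reaction_id.startswith('R_'):
        return reaction_id[2:]
    return reaction_id


for raw_id in ('R_Growth', 'Growth', 'R_biomass1'):
    sliced = raw_id[2:]                      # unconditional slicing
    checked = strip_reaction_prefix(raw_id)  # slicing guarded by the prefix check
    print(raw_id, repr(sliced), repr(checked),
          bool(GROWTH_PATTERN.search(checked)))
```

With the guard in place, an ID such as 'Growth' is passed on unchanged, while the unconditional slice would turn it into 'owth'.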

GwennyGit · 2023-02-09T14:17:26Z

Regarding the ID for growth, this seems to vary a lot between models. We could also just extend the regex to allow for `R_`: `R_?growth|R_?_*biomass\d*_*`. However, the proposed change is also good and might be better, as other reactions could also exist without `R_`.
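
As a rough illustration of this regex variant, the sketch below checks a few IDs against the pattern. The sample IDs, the anchoring with `fullmatch`, and the case-insensitive flag are assumptions for the example, not taken from the `polish` code. `OPTIONAL_PREFIX` is a hypothetical alternative added only for comparison.

```python
import re

# Pattern as proposed in the comment; IGNORECASE is an assumption.
PREFIXED_GROWTH = re.compile(r'R_?growth|R_?_*biomass\d*_*', re.IGNORECASE)

# Hypothetical alternative that makes the whole 'R_' prefix optional.
OPTIONAL_PREFIX = re.compile(r'(?:R_)?growth|(?:R_)?_*biomass\d*_*', re.IGNORECASE)

sample_ids = ['R_Growth', 'RGrowth', 'R_biomass1', 'Growth', 'R_PFK']
matches = {rid: bool(PREFIXED_GROWTH.fullmatch(rid)) for rid in sample_ids}
print(matches)
print(bool(OPTIONAL_PREFIX.fullmatch('Growth')))
```

Note that `R_?` makes only the underscore optional, so a bare 'Growth' does not match the pattern as written. A group such as `(?:R_)?` would make the whole prefix optional.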
Looking at your changes, I realised there is one problem with the function `change_all_qualifiers`: for a lab strain model, the GeneProduct qualifier should be set to `isHomologTo`, as we discussed before.


GwennyGit · 2023-02-10T19:12:48Z

As mentioned in these comments, #52 (comment) and #52 (comment), it would indeed make sense to add compartment specifications within `polish` for all reactions that take place in a single compartment.

draeger · 2023-02-10T19:15:52Z

COBRApy manipulates IDs back and forth. This prefix is only there in the SBML export. Upon import R_ is stripped away.

famosab · 2023-02-10T19:19:10Z

In libsbml, the leading `R_` is not stripped, though. That is sometimes confusing. (The same is true for genes and metabolites.)

draeger · 2023-02-10T19:30:06Z

Of course not. This entire ID manipulation is nonsense and causes chaos. There should be separate fields for every piece of information that the BiGG ID ships.

GwennyGit · 2023-02-10T19:32:14Z

I think the `R_` for reactions is used to clarify that this is a reaction ID within the model. `R_` is not part of the BiGG IDs, just like `M_` for metabolites/species and `G_` for groups and GeneProducts.

famosab · 2023-02-10T19:37:29Z

But for the polishing module we mostly use libsbml; I think we only use cobrapy for the growth simulations (and a few more small things). So we always need to take care of those ID prefixes (but with string slicing that is not an issue, I think). One could argue that they are obsolete since the entity itself already holds the information, but that is something we cannot change here (or shouldn't on our own).
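
A small sketch of the string slicing mentioned here, covering the prefixes named in this thread (`R_`, `M_`, `G_`). The mapping keys and the function name `strip_sbml_prefix` are hypothetical and chosen for illustration only; they are not part of refineGEMs or libsbml.

```python
# Prefixes as named in this thread; the dictionary keys are illustrative labels.
SBML_PREFIXES = {'reaction': 'R_', 'species': 'M_', 'gene_product': 'G_'}


def strip_sbml_prefix(entity_id: str, entity_type: str) -> str:
    """Return the ID without its type prefix; unprefixed IDs are left untouched."""
    prefix = SBML_PREFIXES[entity_type]
    if entity_id.startswith(prefix):
        return entity_id[len(prefix):]
    return entity_id


print(strip_sbml_prefix('R_PFK', 'reaction'))
print(strip_sbml_prefix('M_glc__D_e', 'species'))
print(strip_sbml_prefix('Growth', 'reaction'))
```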

GwennyGit · 2023-06-21T13:36:47Z

While investigating models from Heinken et al. [1], I realised that the function `get_curie_set` within the `polish` module could be improved. These models contain NaN identifiers, and since `get_curie_set` assembles a dictionary mapping the database of each CURIE to the corresponding identifier, it would be helpful to also check for NaNs and remove the affected entries.

[1] Heinken, A., Hertel, J., Acharya, G. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nature Biotechnology. ISSN: 1546-1696. doi:10.1038/s41587-022-01628-0 (Jan. 2023).
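
One way the NaN check could look is sketched below. This is not the actual `get_curie_set` implementation: the function names `is_nan_identifier` and `drop_nan_identifiers`, the dictionary layout (database prefix mapped to a set of identifiers), and the sample values are assumptions made for illustration.

```python
import math


def is_nan_identifier(identifier) -> bool:
    """Treat None, float NaN and the string 'nan' (any case) as missing (assumed convention)."""
    if identifier is None:
        return True
    if isinstance(identifier, float):
        return math.isnan(identifier)
    return str(identifier).strip().lower() == 'nan'


def drop_nan_identifiers(curie_dict: dict) -> tuple:
    """Split a {database: identifiers} mapping into a valid part and a NaN part."""
    valid, invalid = {}, {}
    for database, identifiers in curie_dict.items():
        kept = {i for i in identifiers if not is_nan_identifier(i)}
        dropped = {i for i in identifiers if is_nan_identifier(i)}
        if kept:
            valid[database] = kept
        if dropped:
            invalid[database] = dropped
    return valid, invalid


# Hypothetical sample data in the assumed layout.
curies = {
    'bigg.reaction': {'PFK'},
    'vmhreaction': {'NaN'},
    'kegg.reaction': {'R00756', 'nan'},
}
valid, invalid = drop_nan_identifiers(curies)
print(valid)
print(invalid)
```

Returning the dropped entries alongside the cleaned dictionary keeps them available for reporting instead of discarding them silently.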


* Changed PyPI version badge
  For the next release the PyPI version badge stems now from 'shields.io' and not from 'badge.fury.io'.
* Changed colour for refineGEMs version badge
* Adjusted handling of BioCyc identifiers in polish_annotations #95 #58
* Added requirement for importlib_resources=5.13.0 to Pipfile
* Added code to cope with missing sub-database prefixes for BioCyc identifiers #95
* Changed NaN identifier handling #95
* Fixed issue III: None prefix identifier pairs in invalid_curies.tsv #95
* Adjusted files with version for release 1.2.2

GwennyGit added the enhancement (New feature or request) and refactoring (changes in the code functionality) labels Jan 31, 2023

GwennyGit added a commit that referenced this issue Jan 31, 2023

Changed handling of lab strains #58

9902a61

Additionally:
- Cleaned up some code
- Added more comments
- Resolved issues regarding empty `protein_fasta` parameter

famosab added a commit that referenced this issue Feb 9, 2023

added wrapper func, extended polish and included check for R_ #50 #58

1acb08c

famosab added a commit that referenced this issue Feb 9, 2023

update authorship #58

b0f88c5

GwennyGit mentioned this issue Feb 9, 2023

Feature request: Module handling URI patterns #50

Closed

famosab mentioned this issue Feb 9, 2023

Add more functionality to SBOannotator draeger-lab/SBOannotator#1

Open

5 tasks

GwennyGit added a commit that referenced this issue Feb 9, 2023

Restructured code for changing qualifiers & polishing annotations #50 #58

327ae4c

(1) Added more comments (2) Restructured functions (3) Removed BUG

GwennyGit added a commit that referenced this issue Feb 10, 2023

Changed prints & fixed some bugs in change_qualifier_per_entity #50 #58

4e86e1a

GwennyGit added a commit that referenced this issue Feb 10, 2023

Readjusted script to run all functions in polish again #50 #58

1416f5a

famosab mentioned this issue Feb 10, 2023

Improvement of gap-filling in refineGEMs #52

Open

20 tasks

famosab added this to the New functions towards a version 1.1 milestone Feb 14, 2023

famosab added a commit that referenced this issue Feb 21, 2023

modify table extraction in ncbiprotein #58

f798287

GwennyGit added a commit that referenced this issue Jun 25, 2023

Some bug fixes in changed polish function #53 #58

b0b4cab

GwennyGit added a commit that referenced this issue Jun 29, 2023

Updated polish.py #53 #58

b505e82

GwennyGit added a commit that referenced this issue Jul 5, 2023

Updated io due to changes in polish #53 #58

fccd17a

GwennyGit added a commit that referenced this issue Aug 9, 2023

Enhanced polish module #58

b003626

This module now can:
- Remove NaN containing URIs or at least return them
- Check if the vmhreaction identifiers found are also BiGG Identifiers if 'id_db:' is set to 'VMH'

This was linked to pull requests Aug 9, 2023

Feature/polish unit update #51

Merged

Merge Feature/polish update into main #54

Merged

Polish update #62

Merged

This was linked to pull requests Aug 9, 2023

Add new biomass module to dev #86

Merged

Polish update - Enable more GeneProduct annotations #88

Merged

Feature/polish with bioregistry #92

Merged

GwennyGit added a commit that referenced this issue Aug 14, 2023

Adjusted handling of BioCyc identifiers in polish_annotations #95 #58

a25ffdf

GwennyGit referenced this issue Aug 15, 2023

Added code to cope with missing sub-database prefixes for BioCyc identifiers #95

4652b45

GwennyGit added a commit that referenced this issue Sep 14, 2023

Fixed bug with inchikey in polish module #58

d58c5c5
