Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: disable adding prefix 'M_' and 'R_' in exportModel function #360

Draft
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

edkerk
Copy link
Member

@edkerk edkerk commented Aug 13, 2021

Main improvements in this PR:

Duplicate of #356, reverting the revert by #359 :) (see explanation there).

The changes made in PR are based on the discussion and consensus in #353

  • stops adding prefixes M_, R_, and C_ to metabolite, reaction, and compartment ids in exportModel function (RAVEN doesn't correct but report errors)
  • introduces regular expressions to checkModelStruct for checking whether the IDs are in compliance with libSBML specification
  • fully runs checkModelStruct by default in exportModel and reports instances of invalid ids as errors (prevent their spreading to the community)

I hereby confirm that I have:

@edkerk edkerk marked this pull request as ready for review August 13, 2021 21:35
@mihai-sysbio mihai-sysbio changed the title fix: disable adding prefix 'M_' and 'R_' in exprotModel function fix: disable adding prefix 'M_' and 'R_' in exportModel function Aug 20, 2021
@edkerk edkerk mentioned this pull request Sep 3, 2021
3 tasks
@edkerk edkerk added this to the 2.6.0 milestone Sep 5, 2021
As discussed in #353, this commit introduce regular expressions for checking whether the IDs are in compliance with libSBML specification.
- Enable throwing errors when running `checkModelStruct` in `exportModel`
- Provide informative error message when reporting invalid ids
@edkerk
Copy link
Member Author

edkerk commented Sep 25, 2021

Line 118-132 should be removed, no longer required as invalid IDs would already have thrown an error.

RAVEN/io/exportModel.m

Lines 118 to 132 in a459d6c

%Convert ids to SBML-convenient format. This is to avoid the data loss when
%unsupported characters are included in ids. Here we are using part from
%convertSBMLID, originating from the COBRA Toolbox
model.rxns=regexprep(model.rxns,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.mets=regexprep(model.mets,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.comps=regexprep(model.comps,'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
if isfield(model,'genes')
problemGenes=find(~cellfun('isempty',regexp(model.genes,'([^0-9_a-zA-Z])')));
originalGenes=model.genes(problemGenes);
replacedGenes=regexprep(model.genes(problemGenes),'([^0-9_a-zA-Z])','__${num2str($1+0)}__');
model.genes(problemGenes)=replacedGenes;
for i=1:numel(problemGenes)
model.grRules = regexprep(model.grRules, ['(^|\s|\()' originalGenes{i} '($|\s|\))'], ['$1' replacedGenes{i} '$2']);
end
end

@haowang-bioinfo
Copy link
Member

agree

@mihai-sysbio
Copy link
Member

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

core/checkModelStruct.m Show resolved Hide resolved
@edkerk
Copy link
Member Author

edkerk commented Sep 27, 2021

no longer required as invalid IDs would already have thrown an error.

@edkerk this got me thinking - are invalid reported throwing errors for all import formats, including importExcelModel?

@mihai-sysbio What I meant is that exportModel will throw the error via checkModelStruct. importModel would not need to call checkModelStruct, as libSBML would have failed to load a model with illegal characters. For importExcelModel though, and a future readYaml function (which at this time does not exist in RAVEN 2, but is planned to be included) it indeed would make sense to call checkModelStruct.

@edkerk
Copy link
Member Author

edkerk commented Sep 28, 2021

Should not be merged before #364 is merged and pushed from devel to main. Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

@mihai-sysbio
Copy link
Member

Resolving the conflict and the merging of #364 brings this PR in a merge-able state.

Release 2.5.4 has then minor changes, while release 2.6.0 (this PR) will have the significant change that I/O is no longer resulting in the same model in MATLAB as in previous versions, as it no longer uses the COBRA-style prefixes.

I guess we then expect one more PR that releases 2.5.4. before merging this one though @edkerk ?

@edkerk
Copy link
Member Author

edkerk commented Dec 6, 2021

I'm still hesitant with this, as it can break backwards compatibility with people's scripts. It changes the identifier representation (e.g. M_glc6p instead of glc6p) when changing RAVEN version. Might it not be so drastic that it should (part of a) major release?

edit: I moved my further discussion to issue #353.

@edkerk edkerk modified the milestones: 2.6.0, 2.7.0 Jan 13, 2022
@simas232
Copy link
Collaborator

I tried to test this PR with the iAL1006 model from tutorials. Due to the suggested changes in this PR we have a new problem - there are many rxns and mets that begin with a numeric char so the model is no longer compliant with SBML specifications and the exportModel function therefore fails. I am not sure how common it is for metabolite and reaction IDs to start with a number, but this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

@edkerk
Copy link
Member Author

edkerk commented Apr 24, 2022

I agree @simas232, this should be properly resolved, also discussed in #353. To avoid confusion, I'll change this PR to draft for now.

@edkerk edkerk marked this pull request as draft April 24, 2022 20:03
@mihai-sysbio
Copy link
Member

mihai-sysbio commented Apr 25, 2022

Good catch @simas232.

this is definitely problematic for GEMs that use BiGG IDs for mets and/or rxns.

To me, this sounds very much like a problem with the GEMs themselves not being SBML-compatible. The main question who's responsibility it is to address it.

I could imagine a version of the tutorial that starts with a Colab notebook with cobrapy that imports and exports the model, just to obtain a valid SBML (assuming cobrapy also implements this prefixing functionality). Alternatively, such a "patched" GEM could be delivered with the tutorial, as long as this is documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants