Add speciesReference ID reducing converter #249

Open
luciansmith opened this issue Jul 14, 2022 · 14 comments
Labels
enhancement New feature or request

Comments

@luciansmith
Member

As per sbmlteam/sbml-test-suite#87, it would be nice to have a converter that could reduce the use of speciesReference IDs in the model. This can be done completely when the speciesReference is constant: any initial assignment could be calculated and assigned to the 'stoichiometry' attribute, and any use of the speciesReference in MathML could be converted to a 'cn' with that value.
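For the constant case, the inlining step could look roughly like the sketch below. It is a minimal Python illustration, not libsbml code: the tuple-based AST (`("ci", name)`, `("cn", value)`, `("apply", op, *args)`) and the function name are hypothetical, and it assumes any initial assignment to a speciesReference ID has already been evaluated into its stoichiometry.

```python
# Hypothetical toy AST: ("ci", name) | ("cn", number) | ("apply", op, *args)
Node = tuple


def inline_constant_stoichiometries(math: Node, species_refs: dict) -> Node:
    """Replace every <ci> naming a constant speciesReference with a <cn>.

    species_refs maps a speciesReference ID to (stoichiometry, constant).
    The stoichiometry is assumed to be final, i.e. initial assignments to
    the ID have already been evaluated and written into it.
    """
    kind = math[0]
    if kind == "ci":
        ref = species_refs.get(math[1])
        if ref is not None and ref[1]:  # constant: safe to inline the value
            return ("cn", ref[0])
        return math  # not a speciesReference, or non-constant: keep the name
    if kind == "apply":
        op, *args = math[1:]
        return ("apply", op,
                *(inline_constant_stoichiometries(a, species_refs) for a in args))
    return math  # "cn" nodes are left unchanged


refs = {"Xref": (2.0, True), "Yref": (1.0, False)}
rate = ("apply", "times", ("ci", "k"), ("ci", "Xref"), ("ci", "Yref"))
print(inline_constant_stoichiometries(rate, refs))
# ('apply', 'times', ('ci', 'k'), ('cn', 2.0), ('ci', 'Yref'))
```

Since a constant speciesReference cannot be the target of a rule, once every math expression has been rewritten this way nothing refers to the ID any more, and it could be unset.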

For non-constant speciesReferences less can be done, though in theory we could channel everything through a new generated variable that participates in the rest of the model; the speciesReference would then have an assignment rule setting it to that generated variable, and that rule would be the only interaction of its ID with the rest of the model. I'm not sure that's super helpful, though.
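A minimal sketch of that rerouting for a single ID, again on a hypothetical tuple-based AST, with rules (and event assignments) represented as `(target, math)` pairs. The `_generated` suffix is an invented naming scheme, and declaring the new variable as a non-constant parameter is left out.

```python
def channel_through_generated_variable(sr_id, rules, expressions, new_id=None):
    """Route all uses of a non-constant speciesReference ID through a new variable.

    rules: list of (target_id, math) pairs; expressions: dict of other math
    (kinetic laws, ...). Every reference to sr_id, including as a rule target,
    is redirected to new_id, and one assignment rule sr_id := new_id is added,
    so that rule is the only remaining use of sr_id.
    """
    new_id = new_id or sr_id + "_generated"  # hypothetical naming scheme

    def rename(node):
        if node[0] == "ci" and node[1] == sr_id:
            return ("ci", new_id)
        if node[0] == "apply":
            return ("apply", node[1], *(rename(a) for a in node[2:]))
        return node

    new_rules = [(new_id if target == sr_id else target, rename(m))
                 for target, m in rules]
    new_rules.append((sr_id, ("ci", new_id)))  # the speciesReference follows it
    new_exprs = {key: rename(m) for key, m in expressions.items()}
    return new_rules, new_exprs
```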

@fbergmann
Member

I don't find this converter really useful. For constant speciesReferences, the final value can already be retrieved readily after expanding initial assignments. For non-constant stoichiometries I don't see the proposed approach working, as you'd have to regenerate your stoichiometry matrix every time the stoichiometry changes.
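To illustrate the matrix point: a tool that works from a stoichiometry matrix has to rebuild it from current values whenever a non-constant stoichiometry changes. A minimal sketch with a hypothetical dict-based reaction layout, where a string stoichiometry is a speciesReference ID looked up in the current state:

```python
def stoichiometry_matrix(species, reactions, values):
    """Build the stoichiometry matrix N (species x reactions) for the current state.

    Each reaction is a dict with "reactants" and "products", mapping species IDs
    to either a number (constant stoichiometry) or a speciesReference ID whose
    current value is looked up in `values`. Because those values can change
    over time, N has to be rebuilt whenever they do.
    """
    index = {s: i for i, s in enumerate(species)}
    n = [[0.0] * len(reactions) for _ in species]
    for j, rxn in enumerate(reactions):
        for sign, side in ((-1.0, "reactants"), (1.0, "products")):
            for sp, stoich in rxn[side].items():
                value = values[stoich] if isinstance(stoich, str) else stoich
                n[index[sp]][j] += sign * value
    return n


reactions = [{"reactants": {"A": "Xref"}, "products": {"B": 1.0}}]
print(stoichiometry_matrix(["A", "B"], reactions, {"Xref": 2.0}))  # [[-2.0], [1.0]]
print(stoichiometry_matrix(["A", "B"], reactions, {"Xref": 3.0}))  # [[-3.0], [1.0]]
```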

@luciansmith luciansmith added the enhancement New feature or request label Jul 14, 2022
@luciansmith
Member Author

The main use case I can think of is for simulators that don't know about speciesReference IDs. When they're constant, you could expunge them from the model, much like initial assignments. It wouldn't help when they're non-constant, though.

@paulflang

@luciansmith thanks for creating this issue. @fbergmann you say that

For constant speciesReferences, after expanding initial assignments, the final value can already be retrieved readily

This means that if the ID of a speciesReference has an initial assignment, the result of the initial assignment calculation is substituted as its stoichiometry, right? But if the ID of a speciesReference is used in another expression in the model, "expandInitialAssignments" would not substitute the stoichiometry into that expression, right? We use the "expandInitialAssignments" converter in SBML.jl, but cannot solve test suite cases like 974.

@exaexa already created this issue for this problem. Not sure if solving this issue would rely on the speciesReference reducing converter proposed here.

@fbergmann
Member

How are you currently resolving elements with an SBML ID to their value, say a global parameter? In Python you could just call getElementBySId on the model and, if the element is a speciesReference, access its stoichiometry:

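   import libsbml

   doc = libsbml.readSBMLFromFile('./00974-sbml-l3v2.xml')
   model = doc.getModel()
   # getElementBySId finds any element by its SId; here 'Xref' is a speciesReference
   model.getElementBySId('Xref').getStoichiometry()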

But if that is not possible, loop through the species references of all reactions; for each one with an ID defined, unset that ID and create a global parameter with that ID and value. The issue I have with making this converter in libsbml is that it will not work for the cases where the stoichiometry is changing (which are the cases for which IDs were introduced on species references in the first place).
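The workaround described above can be sketched on a toy model. The dataclasses below are hypothetical stand-ins, not libsbml classes; as the docstring notes, the result only matches the original model when the stoichiometries are constant.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeciesReference:  # hypothetical stand-in, not libsbml.SpeciesReference
    species: str
    stoichiometry: float
    id: Optional[str] = None
    constant: bool = True


@dataclass
class Model:  # hypothetical stand-in, not libsbml.Model
    reactions: list  # each reaction: a list of SpeciesReference (reactants + products)
    parameters: dict = field(default_factory=dict)  # id -> (value, constant)


def promote_species_reference_ids(model: Model) -> list:
    """Replace speciesReference IDs with global parameters of the same ID.

    For every speciesReference with an ID, unset the ID and create a global
    parameter with that ID and the reference's current stoichiometry, so math
    that used the ID now resolves to the parameter. This only preserves the
    model's meaning when the stoichiometry never changes: a rule or event that
    targets the new parameter no longer affects the reaction.
    """
    promoted = []
    for refs in model.reactions:
        for sr in refs:
            if sr.id is None:
                continue
            model.parameters[sr.id] = (sr.stoichiometry, sr.constant)
            promoted.append(sr.id)
            sr.id = None  # the ID now belongs to the parameter
    return promoted
```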

@paulflang

IIUC, the problem is not extracting this information, the problem is that SBML.jl represents Reactions in a struct with fields for reactants and products. Each are described by a dictionary with species ID as keys and stoichiometry as values. We would have to introduce a speciesReference struct and then replace the dictionary with a list of speciesReference structs, I believe. That would be a breaking change. Or alternatively, as you suggest, use the workaround with a global parameter.

@luciansmith
Member Author

I am obviously not an expert in your system, but would it be possible to introduce an 'Xref' parameter into your parameter list, and relax your dictionary to have a species key and a stoichiometry value-or-parameterID?

No name mangling would be required; the IDs of species references are already guaranteed to be different from any other SBML ID with mathematical meaning.

When translating back to SBML, you'd just need to check the stoichiometry values for IDs, and adjust accordingly.
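A sketch of the value-or-parameterID representation and the round trip back to SBML, written in Python rather than Julia and using hypothetical dict layouts rather than SBML.jl's actual structures:

```python
from typing import Dict, List, Union

# Stoichiometry is either a plain number or the ID of a parameter holding it.
Stoich = Union[float, str]


def resolve(stoichs: Dict[str, Stoich], parameters: Dict[str, float]) -> Dict[str, float]:
    """Turn {species: value-or-parameterID} into {species: number}."""
    return {sp: parameters[s] if isinstance(s, str) else s for sp, s in stoichs.items()}


def to_sbml_species_references(stoichs: Dict[str, Stoich],
                               parameters: Dict[str, float]) -> List[dict]:
    """Translate back to SBML-style speciesReferences.

    A string stoichiometry becomes the speciesReference's own ID; the
    placeholder parameter is removed from `parameters` and its value moves
    onto the speciesReference. Rules or assignments targeting it keep working
    because the ID is unchanged.
    """
    refs = []
    for sp, s in stoichs.items():
        if isinstance(s, str):
            refs.append({"species": sp, "id": s, "stoichiometry": parameters.pop(s)})
        else:
            refs.append({"species": sp, "stoichiometry": s})
    return refs


params = {"Xref": 2.0, "k": 0.1}
reactants = {"S1": "Xref", "S2": 1.0}
print(resolve(reactants, params))  # {'S1': 2.0, 'S2': 1.0}
```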

@luciansmith
Member Author

(I should add that without something like this, you won't be able to translate models with variable stoichiometries to SBML.jl, which might be fine if that's out of your scope. Do note that SBML L2 models also allowed variable stoichiometries, but stored that information differently, via the StoichiometryMath element.)

@paulflang

Seems like a reasonable suggestion, thanks! But I am only a contributor to SBML.jl. @exaexa is one of the main developers, so I leave this up to him.

@fbergmann
Member

IIUC, the problem is not extracting this information, the problem is that SBML.jl represents Reactions in a struct with fields for reactants and products. Each are described by a dictionary with species ID as keys and stoichiometry as values. We would have to introduce a speciesReference struct and then replace the dictionary with a list of speciesReference structs, I believe. That would be a breaking change. Or alternatively, as you suggest, use the workaround with a global parameter.

I don't see the breaking change. The question is: when evaluating an AST and you encounter a node of type NAME, what do you do? You have to resolve the name to its value at that point in time. You are already doing this for compartments, species, and global parameters. What would it take to add a fallback for speciesReferences at that point?
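The fallback described above amounts to one extra lookup in the name-resolution branch of an evaluator. A minimal sketch on a hypothetical tuple-based AST (the `ci` branch plays the role of AST_NAME), supporting only a few operators:

```python
import math


def evaluate(node, state, species_refs):
    """Evaluate a toy AST, resolving names against the current state.

    state: current values of compartments, species and global parameters.
    species_refs: current stoichiometry of each speciesReference, by ID.
    """
    kind = node[0]
    if kind == "cn":
        return node[1]
    if kind == "ci":  # the AST_NAME case
        name = node[1]
        if name in state:
            return state[name]
        if name in species_refs:  # fallback: the name is a speciesReference ID
            return species_refs[name]
        raise KeyError(f"unresolved name {name!r}")
    if kind == "apply":
        op = node[1]
        args = [evaluate(a, state, species_refs) for a in node[2:]]
        if op == "plus":
            return sum(args)
        if op == "times":
            return math.prod(args)
        if op == "exp":
            return math.exp(args[0])
        raise ValueError(f"unsupported operator {op!r}")
    raise ValueError(f"unknown node type {kind!r}")


rate = ("apply", "times", ("ci", "k"), ("ci", "Xref"))
print(evaluate(rate, {"k": 0.5}, {"Xref": 4.0}))  # 2.0
```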

@paulflang

I am not sure I understand correctly. Perhaps it's worth mentioning that we are not building a simulator. We are building SBML.jl, which parses an SBML file into an SBML.jl model. This can be used by COBREXA.jl for constraint-based modelling or by SBMLToolkit.jl. The latter converts the SBML.jl model into a Catalyst.jl ReactionSystem. This is (together with rules and events) translated into a ModelingToolkit.jl ODESystem, which is then simulated. So SBMLToolkit.jl never sees an AST, and I believe SBML.jl never resolves names to values (this is what ModelingToolkit-compatible solvers do, I think).

@luciansmith
Member Author

In that case, the issue becomes how COBREXA.jl or SBMLToolkit.jl deal with named and/or changing stoichiometries (or whether they simply don't). I would be surprised if constraint-based models have encountered changing stoichiometries yet, but maybe they have? ModelingToolkit.jl would be more likely to do so, but it too might not.

Either way, if you can store it as something simple on your end, then you can either translate that to the target platform, or throw an error saying that the model cannot be translated as written.

That said, I do see the use case for a translator that simplifies what it can (the constant case, like the initial assignment translator). I don't know how many of those models exist outside of the test suite, but there's probably at least a few.

@exaexa
Contributor

exaexa commented Jul 17, 2022

Hi all,

just confirming -- while there are use cases for changing stoichiometries in constraint-based models (e.g. tuning the biomass production to whatever measured values), the changes are typically completely "static" -- having this directly in a simulated constraint-based model would make it non-linear, thus generally non-desirable. :D
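To spell out the non-linearity: flux balance analysis imposes the steady-state constraint N v = 0 (N the stoichiometry matrix, v the flux vector), which is linear in v as long as N is fixed. If one coefficient N_ik = s is itself a decision variable, row i gains the product s v_k of two variables, so the problem becomes bilinear rather than a linear program:

```latex
\sum_{j} N_{ij}\, v_j = 0 \quad \forall i
\qquad\longrightarrow\qquad
\sum_{j \neq k} N_{ij}\, v_j + s\, v_k = 0 \quad \text{if } N_{ik} = s \text{ is a variable}
```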

For SBML.jl, this is AFAIK not an issue. While we indeed change the representation to Julia structures, I'm trying hard to just represent the SBML contents and avoid interpreting any information from the model by SBML.jl, because that would almost certainly break someone's use case. I guess that just having the fields extracted correctly is the best we can do.

As for the new converter (adding my 0.5 USD): We've seen a lot of ambiguity in interpreting the math formulations stored in SBML models, and the converter code is the only implicitly valid go-to solution that we currently have for determining the meaning of many expressions; unfortunately, it is mostly a black box (unless one decides to start reading recursive evaluators implemented in C++03). This issue adds yet another source of variables and another source of constrainable stuff into the mix that we have to handle. So, in the long term, could we please have some interpretation reference (preferably a full denotational semantics with precisely resolved scopes) for the math in SBML? I think an explicit representation of the math could then be used to quickly implement straightforward (and correct!) evaluators for the SBML math in any language, and in turn effectively avoid this whole category of problems for the foreseeable future (possibly also solving my fun set of ambiguities with AssignmentRules :] ). At the same time, I guess it's not a lot of work for SBML authors who already know how the stuff is evaluated; I don't expect the whole semantics would take more than 2 or 3 sheets of LaTeX math. I guess I can invest some time into that too; let me know in case this would be viable.

@luciansmith
Member Author

Coming back to this: it occurs to me that this must already be possible with the initial assignment converter. If all initial assignments in the model are converted to doubles, surely this would be true for any stoichiometries as well?

Also, this caught my eye this time through, and I think I missed it the first time from @exaexa:

We've seen a lot of ambiguity in interpreting the math formulations stored in SBML models

This surprises me! I would have thought that all the math formulations were either explicitly defined in the MathML spec, or defined in the SBML spec for the 'csymbol' objects. What specifically have you found that's ambiguous? We'd be happy to clarify things, both here and in a new version of the SBML specification if necessary.

@exaexa
Contributor

exaexa commented Apr 18, 2023

Hi!
the ambiguity is mostly for "how should we interpret the reference as a computable value".

The math is not very explicit esp. in:

  • units (either all scalars should carry units (so that forward inference is possible) or all references to things should explicitly carry a unit for unitless interpretation (so that stuff may be safely processed with software that doesn't support unitful computation))
  • meaning of things (does M_accoa_c mean the concentration or the amount of accoa? does R_somereaction mean "reaction is active", "reaction is enabled", or the current rate of the reaction? ...in which units? does this depend on interpreting the SBML as qual vs fbc?)

I believe these are treated by the specification correctly (although understanding the stuff is sometimes painful, you might find my gripe around hasOnlySubstanceUnits somewhere, which breaks like 4 engineering principles at once, incl. constructivistic interpretation and substitution principle).

I'd really welcome it if this could be done using e.g. accessor functions or special accessor references that CLEARLY specify how the value for the formula should be computed. For example, instead of saying:

rate is e^(SpeciesA - SpeciesB)

we can much more clearly say

rate in mmol/h is e^(concentrationOf(SpeciesA, mmol/l) - concentrationOf(SpeciesB, mmol/l))

[edit:] or even without caring too much about the units (possibly leaving the user to infer the units correctly, or just ignore them safely if the model docs say so), as:

rate is e^(concentrationOf(SpeciesA) - concentrationOf(SpeciesB))

The same is kinda implicitly present already with rateOf, except without units, but that might be extended to rateOf of reactions, presenceOf (for gene products), amountOf (species), lower/upperBoundOf (reaction rate), sizeOf and dimensionOf (compartments), parameterVal (again with units!), amountChangeRateOf vs concentrationChangeRateOf (disambiguation for species), etc.
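The difference between a bare species symbol and an explicit accessor can be made concrete. In SBML, a bare species ID in math denotes the species' amount if `hasOnlySubstanceUnits` is true and its concentration otherwise. In the sketch below, `concentration_of` and `amount_of` are hypothetical Python stand-ins for the proposed `concentrationOf`/`amountOf` accessors (which do not exist in SBML), and units are assumed consistent rather than checked.

```python
import math
from dataclasses import dataclass


@dataclass
class Species:  # hypothetical stand-in for an SBML species and its compartment
    amount: float            # substance amount, e.g. in mmol
    compartment_size: float  # e.g. in l
    has_only_substance_units: bool = False


def amount_of(s: Species) -> float:
    return s.amount


def concentration_of(s: Species) -> float:
    return s.amount / s.compartment_size


def bare_symbol_value(s: Species) -> float:
    """What a plain <ci> naming the species means in SBML math: its amount
    if hasOnlySubstanceUnits is true, otherwise its concentration."""
    return amount_of(s) if s.has_only_substance_units else concentration_of(s)


a = Species(amount=2.0, compartment_size=0.5)
b = Species(amount=1.0, compartment_size=0.5, has_only_substance_units=True)

# "rate is e^(SpeciesA - SpeciesB)": the bare symbols mean different quantities.
implicit = math.exp(bare_symbol_value(a) - bare_symbol_value(b))  # e^(4 - 1)
# "rate is e^(concentrationOf(SpeciesA) - concentrationOf(SpeciesB))": explicit.
explicit = math.exp(concentration_of(a) - concentration_of(b))    # e^(4 - 2)
```

Both rates are written with the same symbols, but only the accessor form states which quantity each symbol stands for.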

Encoding that into MathML might be a challenge, but I guess the "call" semantics as with rateOf could work (for units you may add direct name references to the unit list in the file). Alternative ways might be adding parameters to <ci>, but that again breaks the extension logic.


4 participants