support for export to cobrapy-compatible yaml format #77

edkerk · 2018-04-06T12:16:01Z

RAVEN should support writing a cobrapy-compatible yaml format, as this format is very concise.

Make a writeYaml function that parses the model structure and writes it to a text file.
Remove the old YAML export functionality.
Modify exportForGit to use the new writeYaml function.
Confirm that the output of this function is identical to cobrapy's YAML output (except for RAVEN-unique fields).
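To illustrate the first task, here is a minimal Python sketch of a writer that walks a model structure and emits YAML text. It is not RAVEN's writeYaml, which is MATLAB. It assumes the model is a dict of parallel lists keyed by RAVEN field names, with metComps holding 1-based indices into comps. The helper names write_yaml_text and yaml_quote are hypothetical. The layout only loosely follows the ordered-map (!!omap) style of cobrapy YAML files and should be checked against cobrapy's own writer. IDs are quoted rather than rewritten, so characters that SBML would reject are kept as they are.

```python
from typing import Any, Dict, List


def yaml_quote(value: str) -> str:
    """Double-quote a YAML scalar without altering its characters (hypothetical helper)."""
    # Escape backslashes and double quotes only; control characters are not handled here.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def write_yaml_text(model: Dict[str, Any]) -> str:
    """Serialize the metabolite section of a RAVEN-like model (hypothetical sketch)."""
    lines: List[str] = ["!!omap", "- metabolites:"]
    for i, met_id in enumerate(model["mets"]):
        # One ordered map per metabolite; the ID is kept exactly as stored.
        lines.append("  - !!omap")
        lines.append(f"    - id: {yaml_quote(met_id)}")
        if "metNames" in model:
            lines.append(f"    - name: {yaml_quote(model['metNames'][i])}")
        if "metComps" in model and "comps" in model:
            # Assumed: metComps holds 1-based (MATLAB-style) indices into comps.
            comp = model["comps"][model["metComps"][i] - 1]
            lines.append(f"    - compartment: {yaml_quote(comp)}")
    return "\n".join(lines) + "\n"


demo = {
    "mets": ["glc-D", "h2o"],
    "metNames": ["D-glucose", "H2O"],
    "comps": ["c", "e"],
    "metComps": [1, 2],
}
print(write_yaml_text(demo), end="")
```

Reactions, genes and compartments would be further top-level sections written the same way.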

A few considerations:

Use the correct format. This example is outdated; use this example from the yeast consensus network instead.
Do not reformat any IDs to replace non-standard characters. That is done when writing SBML, because SBML doesn't support certain characters, but the YAML file should represent the model as it is in MATLAB.
Include as many annotations as possible:

For each metabolite, include:

mets
metNames
metComps
inchis
metFormulas
metMiriams (any)
metCharges
unconstrained
rxnFrom (?)

For each reaction, include:

rxns
rxnNames
rxnComps
metabolites and their stoichiometry
grRules
subSystems
eccodes
rxnMiriams (any)
rxnNotes
rxnConfidenceScores

For each compartment, include:

comps
compNames
compOutside
compMiriams

For each gene, include:

genes
geneComps
geneMiriams
geneShortNames

Of course, only write those fields if they are present in the model.
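The per-reaction entry, covering "metabolites and their stoichiometry" and the rule of writing only the fields that exist, could be assembled roughly as in this Python sketch. It assumes a dict model with a dense S (metabolites as rows, reactions as columns). In MATLAB, S is normally sparse. It also keeps RAVEN field names as keys rather than guessing cobrapy's key names. The helper reaction_entry and the example data are hypothetical.

```python
from typing import Any, Dict

# Optional per-reaction fields from the list above; only those present are written.
OPTIONAL_RXN_FIELDS = ["rxnNames", "grRules", "subSystems", "eccodes",
                       "rxnNotes", "rxnConfidenceScores"]


def reaction_entry(model: Dict[str, Any], j: int) -> Dict[str, Any]:
    """Collect one reaction: its ID, stoichiometry from column j of S, and present fields."""
    entry: Dict[str, Any] = {"id": model["rxns"][j]}
    # Non-zero entries of column j give the metabolites and their coefficients.
    entry["metabolites"] = {
        met: row[j] for met, row in zip(model["mets"], model["S"]) if row[j] != 0
    }
    for field in OPTIONAL_RXN_FIELDS:
        if field in model:  # skip fields the model does not have
            entry[field] = model[field][j]
    return entry


model = {
    "rxns": ["r_0001"],
    "mets": ["glc", "atp", "g6p", "adp", "h2o"],
    "S": [[-1.0], [-1.0], [1.0], [1.0], [0.0]],
    "grRules": ["HXK1 or HXK2"],
}
print(reaction_entry(model, 0))
```

Here h2o is left out because its coefficient is zero, and only grRules is copied because no other optional field exists in the model.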


edkerk · 2018-04-06T12:17:52Z

Another consideration: in the yeast consensus network example, metabolite names in the cobrapy version are appended with [compartment]. This should not be done in this function, as the output should represent the model as it is presented in MATLAB.

BenjaSanchez · 2018-04-10T15:07:09Z

@edkerk sounds good! I will write the writeYaml function. I assume we should continue the work on branch fix/export_functions?

edkerk · 2018-04-10T16:25:38Z

Yup, I'll push devel to it to make sure it has the latest changes. (done: #80)

edkerk · 2018-04-12T12:20:54Z

A writeYaml function has been added in branch fix/export_functions: 23bb1f7

@BenjaSanchez a few points:

Note where publications are stored: PubMed IDs can be stored in rxnMiriams (as pmid), while non-PubMed IDs can be stored in rxnReferences. Neither should be stored in rxnNotes; that field is for generic notes.
Note that not all fields are compulsory: if a model has an rxnMiriams field, the function currently requires that it also has eccodes, rxnKEGG and rxnNotes. Line 110, if ~isempty(model.eccodes{pos}) || ~isempty(model.rxnKEGG{pos}) || ~isempty(model.rxnNotes{pos}), fails if any one of them doesn't exist.
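The usual fix is to check that a field exists before indexing into it. In MATLAB that means combining isfield with the isempty test. The same guard is sketched here in Python with a dict standing in for the model struct. The helper has_value and the example values are hypothetical.

```python
from typing import Any, Dict


def has_value(model: Dict[str, Any], field: str, pos: int) -> bool:
    """True only if the field exists and its entry at pos is non-empty."""
    # Checking existence first avoids the failure when a model lacks an optional field.
    values = model.get(field)
    return values is not None and bool(values[pos])


model = {"eccodes": ["2.7.1.1"], "rxnNotes": [""]}  # note: no rxnKEGG field
pos = 0
needs_annotation = any(has_value(model, f, pos) for f in ("eccodes", "rxnKEGG", "rxnNotes"))
print(needs_annotation)  # True, via eccodes
```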

BenjaSanchez · 2018-04-12T12:40:18Z

@edkerk Re the first point: this is because I was using a COBRA structure, which does not support rxnMiriams; it can only store PMIDs in either rxnReferences or rxnNotes (see here). At the moment they are stored in rxnNotes, which ravenCobraWrapper transfers to the same field. Should I then move those fields in the yeast model to rxnReferences? For a generic RAVEN model there will not be any issue once we implement the generic extractMiriam we discussed on Gitter.

Re the second point: I can also address that once we have the improved extractMiriam.

edkerk · 2018-04-12T13:18:34Z

Looking at the COBRA model specification you linked, they should indeed be in rxnReferences, for both COBRA and RAVEN. The specification also states that rxnReferences is a "Column Cell Array of Strings" "of references for each corresponding reaction", so its entries are not necessarily PubMed IDs.
You're right, an improved extractMiriam will probably change the code a bit anyway.

edkerk · 2018-04-12T13:31:45Z

Regardless, PubMed IDs don't need to be prefixed with pmid:. The prefix is used for ChEBI because it really is part of the identifier (here, CHEBI:17234), not because the identifier would otherwise be number-only. Compare, for instance, ChEBI and KEGG Compound on identifiers.org, paying attention to the identifier pattern.
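The point about identifier patterns can be checked mechanically. The regular expressions below are the patterns for the identifiers.org prefixes chebi, kegg.compound and pubmed as we understand them, and they should be verified against the registry. The is_valid helper is hypothetical.

```python
import re

# Identifier patterns as we understand them from identifiers.org (verify against the registry).
PATTERNS = {
    "chebi": re.compile(r"^CHEBI:\d+$"),   # the CHEBI: prefix is part of the ID
    "kegg.compound": re.compile(r"^C\d+$"),
    "pubmed": re.compile(r"^\d+$"),        # number only, no pmid: prefix
}


def is_valid(namespace: str, identifier: str) -> bool:
    """Check an identifier against the pattern for its namespace."""
    return PATTERNS[namespace].fullmatch(identifier) is not None


print(is_valid("chebi", "CHEBI:17234"))   # True
print(is_valid("chebi", "17234"))         # False
print(is_valid("pubmed", "1234"))         # True
print(is_valid("pubmed", "pmid:1234"))    # False
```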

BenjaSanchez · 2018-04-13T08:43:51Z

Actually, testing with COBRA now, not even rxnReferences can be used to store these, as after an I/O cycle everything there gets transferred to rxnNotes. So PMIDs will continue to be stored in the yeast model in rxnNotes in the format pmid:XXX; pmid:YYY, and I will add a section to ravenCobraWrapper that detects these cases and sends them to rxnMiriams.
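The detection step described here would live in ravenCobraWrapper, which is MATLAB. As an illustration only, this hypothetical Python helper, split_pmids, assumes that entries in a note are separated by semicolons as in the pmid:XXX; pmid:YYY format. It returns the bare numeric IDs without the prefix, plus whatever text remains in the note.

```python
import re
from typing import List, Tuple


def split_pmids(note: str) -> Tuple[List[str], str]:
    """Split 'pmid:XXX; pmid:YYY'-style entries out of a note string.

    Returns the bare PubMed IDs and the remaining note text.
    """
    pmids: List[str] = []
    remaining: List[str] = []
    for part in note.split(";"):
        part = part.strip()
        match = re.fullmatch(r"pmid:\s*(\d+)", part, flags=re.IGNORECASE)
        if match:
            pmids.append(match.group(1))  # drop the prefix, keep the number
        elif part:
            remaining.append(part)       # anything else stays a generic note
    return pmids, "; ".join(remaining)


print(split_pmids("pmid:1234; pmid:5678; manually curated"))
# (['1234', '5678'], 'manually curated')
```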

edkerk · 2018-04-13T08:55:38Z

Perhaps also open an issue at COBRA then, since they do state that rxnReferences and rxnNotes are separate fields?

tpfau · 2018-04-19T06:06:43Z

@BenjaSanchez

actually now testing with COBRA not even rxnReferences can be used to store things, as after an I/O cycle everything there gets transferred to rxnNotes. So pmid's will continue to be stored in the yeast model in rxnNotes with the format pmid:XXX; pmid:YYY, and I will include in ravenCobraWrapper a section for detecting these cases and sending them to rxnMiriams

Could you give me an example of this in the COBRA Toolbox?
In general, PMIDs should (IMO) be added via MIRIAM annotations. PubMed is listed on registry.org and is parsed by the COBRA Toolbox SBML parser into the rxnReferences field if it is correctly annotated (i.e. using the bqbiol isDescribedBy qualifier).
We try to put into the notes field things that are invalid IDs. As mentioned above, PMID1234 is not a valid PubMed ID while 1234 is, so if an invalid PMID is provided, it is likely to go into the notes field rather than into rxnReferences during I/O cycles.
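For reference, here is a minimal sketch of what a PubMed annotation using the bqbiol:isDescribedBy qualifier can look like, built with Python's standard xml.etree.ElementTree. The meta id R_1 and the https://identifiers.org/pubmed/ URI form are assumptions, pubmed_annotation is a hypothetical helper, and the surrounding SBML annotation element and the COBRA Toolbox parsing step are not shown.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("bqbiol", BQBIOL)


def pubmed_annotation(meta_id: str, pmids: List[str]) -> str:
    """Build an RDF block linking an element (by meta id) to PubMed IDs via bqbiol:isDescribedBy."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": f"#{meta_id}"})
    described_by = ET.SubElement(desc, f"{{{BQBIOL}}}isDescribedBy")
    bag = ET.SubElement(described_by, f"{{{RDF}}}Bag")
    for pmid in pmids:
        # Bare numeric ID in the resource URI, without a "PMID" prefix.
        ET.SubElement(bag, f"{{{RDF}}}li",
                      {f"{{{RDF}}}resource": f"https://identifiers.org/pubmed/{pmid}"})
    return ET.tostring(rdf, encoding="unicode")


print(pubmed_annotation("R_1", ["1234"]))
```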

BenjaSanchez · 2018-04-20T12:39:42Z

@tpfau Thanks for that insight! Now, with only the ID, the field is properly stored in rxnReferences through a full I/O cycle. I've adapted the corresponding ravenCobraWrapper in RAVEN (df918f5) and the yeast model .xml (68d5e8a).

edkerk · 2021-04-07T12:59:40Z

As far as I know, the YAML formats from RAVEN and cobrapy are not identical:

RAVEN does not yet output metadata (feat: addition of metadata section to the yaml file specification in RAVEN #311), while cobrapy does include metadata in its YAML file, although not necessarily everything mentioned in #311.
RAVEN writes list-type fields as arrays of strings (Output list type field as array of string in Yaml #107), but cobrapy does not.

edkerk added the feature label (A new function or new functionality for an existing function) Apr 6, 2018

edkerk assigned edkerk, BenjaSanchez and simas232 Apr 6, 2018

BenjaSanchez mentioned this issue Apr 12, 2018 in Feat/new formats SysBioChalmers/yeast-GEM#88 (Merged)

BenjaSanchez mentioned this issue Apr 16, 2018 in Fix/export functions #81 (Merged)

BenjaSanchez mentioned this issue Apr 20, 2018 in fix: pmids now in rxnReferences #85 (Merged)

BenjaSanchez mentioned this issue May 15, 2018 in Output list type field as array of string in Yaml #107 (Closed)

mihai-sysbio mentioned this issue Apr 7, 2021 in feat: yaml worflow SysBioChalmers/Human-GEM#173 (Merged)


haowang-bioinfo closed this as completed Apr 7, 2021

edkerk reopened this Apr 7, 2021
