SBO term for confidence score #5

draeger · 2016-01-30T19:38:22Z

Request a new SBO term to be used for confidence scores.

aebrahim · 2016-01-30T19:39:18Z

How exactly will this work if SBO terms are to be unique per reaction?

draeger · 2016-01-30T19:42:33Z

At the moment there is no field where we can store the confidence scores. We either need parameters or something new. Parameters aren't appropriate, because we cannot refer to a reaction from them. Local parameters aren't suitable either because we would then need to create a kinetic law whose math element must not be empty. I am currently thinking about what to do with confidence scores.

aebrahim · 2016-01-30T19:46:38Z

Ah I see. That's still a TBD.

I think one approach would be to create an evidence type. So you could link to a paper, and classify the type of evidence it is. I'm not yet sure if that will work with every case though.

draeger · 2016-01-30T19:52:13Z

There is an SBML package for distributions; I need to check whether it can be helpful: http://sourceforge.net/p/sbml/code/HEAD/tree/trunk/specifications/sbml-level-3/version-1/distrib/sbml-level-3-distrib-package-proposal.pdf?format=raw

aebrahim · 2016-01-31T01:45:16Z

What parts in particular would be relevant? This seems to be about sampling from distributions, and I can't see how that's related.

draeger · 2016-01-31T05:07:56Z

Yes. I wanted to check whether it also covers confidence scores, but I haven't seen that either. In conclusion, we probably need some additional field where we can put this.

aebrahim · 2016-01-31T06:18:04Z

I think this calls for a new "citations" or "evidence" package

draeger · 2016-02-01T11:01:51Z

Good idea! I'll collect all other missing fields and see what else is needed. I'll raise this point in the next SBML team meeting (tomorrow).

draeger · 2018-01-05T13:45:07Z

This was further discussed in thread opencobra/schema/issues/4, where @matthiaskoenig had the idea to use more specific terms from the evidenceontology.org. We should check if we can make use of this here.

matthiaskoenig · 2018-01-19T10:55:33Z

In my opinion an SBO term is the wrong way to do this. The evidence ontology ECO is absolutely sufficient to encode all the evidence today.

It is part of the MIRIAM registry collections
https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000055

It is used in multiple projects and allows encoding the evidence for projects like UniProt:
"Standardized description of scientific evidence using the Evidence Ontology (ECO)"
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4105709/

In addition it is very easy to use and supported today by SBML and other standard formats like CellML. One just has to write the annotation and that is it. No need for any additional package.

To annotate the evidence for an SBML element just write the annotation for the evidence. For instance to say that a certain reaction/protein is based on "high throughput evidence used in automatic assertion (ECO:0006057)" just do:

 <rdf:Description rdf:about="./BIOMD0000000176.xml#_525635">
    <bqbiol:isDescribedBy rdf:resource="http://identifiers.org/eco/ECO:0006057" />
 </rdf:Description>
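As a rough illustration of how such an annotation could be generated programmatically, here is a minimal Python sketch using only the standard library. The function name `eco_annotation` and the `#metaid` form of `rdf:about` are assumptions for this example, not part of any existing tool. It mirrors the compact form shown above; SBML's recommended layout additionally nests the resources in `rdf:Bag`/`rdf:li`, which is omitted here for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Make the serializer emit the familiar rdf: and bqbiol: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def eco_annotation(meta_id, eco_ids):
    """Return an rdf:RDF fragment linking one element to ECO terms.

    meta_id: metaid of the annotated SBML element (hypothetical input).
    eco_ids: iterable of ECO identifiers such as "ECO:0006057".
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": f"#{meta_id}"}
    )
    # One bqbiol:isDescribedBy element per evidence term.
    for eco_id in eco_ids:
        ET.SubElement(
            description,
            f"{{{BQBIOL_NS}}}isDescribedBy",
            {f"{{{RDF_NS}}}resource": f"http://identifiers.org/eco/{eco_id}"},
        )
    return ET.tostring(rdf, encoding="unicode")


print(eco_annotation("_525635", ["ECO:0006057"]))
```

Passing several ECO identifiers yields several `bqbiol:isDescribedBy` entries for the same element.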

Please no new mechanisms if there are established working mechanisms to encode all the information, and in a much better way than evidence codes. By using composite annotations even the original datasets and publications for the evidence can be easily stored in the annotation for the SBML element.

Best Matthias

draeger · 2018-01-19T14:02:29Z

Here is an overview of the scores as these are usually defined in COBRA, where 0 is best and 4 is lowest confidence.

0 = Biochemical: Enzyme has been tested biochemically.
1 = Genetic: Gene overexpression and purification, gene deletions.
2 = Sequence: There is significant sequence similarity to another gene with known function.
3 = Physiological: There is physiological data to support inclusion in the model.
4 = Modeling: Reaction is included to improve simulation results

For the export from COBRA/BiGG models to SBML, we only need to find the closest ECO terms for these five levels. For the other direction, we also need to define a rule for how to match terms between the two.

matthiaskoenig · 2018-01-19T14:47:13Z

Here are some suggestions; please feel free to correct them. If this is not exact enough, additional terms should be added to ECO.

0 = Biochemical: Enzyme has been tested biochemically.

ECO:0000002: direct assay evidence
A type of experimental evidence resulting from the direct measurement of 
some aspect of a biological feature.

Or, to be more specific, a subclass of it, e.g.,
ECO:0000005: enzyme assay evidence
http://evidenceontology.org/browse/#ECO_0000002

1 = Genetic: Gene overexpression and purification, gene deletions.

ECO:0000073: experimental genomic evidence
A type of experimental evidence that is based on the 
characterization of an attribute of the genome underlying a gene product.

http://evidenceontology.org/browse/#ECO_0000073

2 = Sequence: There is significant sequence similarity to another gene with known function.

ECO:0000044: sequence similarity evidence 
A type of similarity based on biomolecular sequence.

http://evidenceontology.org/browse/#ECO_0000044

3 = Physiological: There is physiological data to support inclusion in the model.

ECO:0005551: biological system reconstruction evidence by experimental evidence
A type of biological system reconstruction evidence that uses 
experimental evidence as support.

4 = Modeling: Reaction is included to improve simulation results
Personally, I think this is problematic, because it effectively states "there is no evidence". I think such a reaction should simply not have an evidence code, which clearly indicates that it was added without any evidence. That is, if there is no evidence (score 4, modeling), then there is no evidence code. It just states "we added this so we get the results we want".
Alternatively something like:

ECO:0000001: inference from background scientific knowledge
A type of curator inference where conclusions are drawn 
based on the background scientific knowledge of the curator.
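The suggested export mapping can be written down as a small lookup table. This is a minimal Python sketch of the proposal in this comment, not an agreed standard; the names `COBRA_SCORE_TO_ECO` and `eco_uri_for_score` are made up for illustration, and score 4 is left without a term, following the first option above.

```python
# COBRA confidence score -> (ECO id, ECO label), as suggested in this thread.
# Score 4 (modeling) is intentionally absent: no evidence code is written.
COBRA_SCORE_TO_ECO = {
    0: ("ECO:0000002", "direct assay evidence"),
    1: ("ECO:0000073", "experimental genomic evidence"),
    2: ("ECO:0000044", "sequence similarity evidence"),
    3: ("ECO:0005551",
        "biological system reconstruction evidence by experimental evidence"),
}


def eco_uri_for_score(score):
    """Return the identifiers.org URI for a COBRA score, or None for score 4."""
    if score not in range(5):
        raise ValueError(f"COBRA confidence scores are 0-4, got {score!r}")
    entry = COBRA_SCORE_TO_ECO.get(score)
    if entry is None:
        return None  # score 4: export without an evidence annotation
    return "http://identifiers.org/eco/" + entry[0]


for score in range(5):
    print(score, eco_uri_for_score(score))
```

If the alternative (ECO:0000001 for score 4) were preferred, it would only need one more entry in the table.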

matthiaskoenig · 2018-01-19T14:50:28Z

And I forgot:
About the rules: you just use the ontology tree to match the terms, i.e., everything below one of the suggested terms is matched to that term. If an evidence code in SBML is not a descendant of any of the suggested ECO terms, then no match can be made.
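A minimal Python sketch of this matching rule, under stated assumptions: `PARENTS` is a toy excerpt of the ECO is_a hierarchy containing only the single subclass relation mentioned in this thread (enzyme assay evidence under direct assay evidence), and the function name `score_for_term` is hypothetical. A real implementation would load the full is_a relations from the ECO release file.

```python
from collections import deque

# Anchor terms suggested in this thread, mapped back to COBRA scores.
ECO_TO_COBRA_SCORE = {
    "ECO:0000002": 0,  # direct assay evidence
    "ECO:0000073": 1,  # experimental genomic evidence
    "ECO:0000044": 2,  # sequence similarity evidence
    "ECO:0005551": 3,  # biological system reconstruction evidence by experimental evidence
}

# Toy child -> parents map; illustrative only, NOT the complete ontology.
PARENTS = {
    "ECO:0000005": ["ECO:0000002"],  # enzyme assay evidence is_a direct assay evidence
}


def score_for_term(term, parents=PARENTS, anchors=ECO_TO_COBRA_SCORE):
    """Walk up the is_a tree breadth-first; return the score of the
    nearest anchor term, or None if no anchor is an ancestor."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in anchors:
            return anchors[current]
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return None


print(score_for_term("ECO:0000005"))
print(score_for_term("ECO:0000204"))
```

With this toy data, a term below an anchor inherits the anchor's score, while a term with no anchor among its ancestors yields `None`, i.e. no match.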

draeger · 2018-01-19T15:38:19Z

This looks like a very good start! Thanks @matthiaskoenig. We should also direct @tpfau to this suggestion.

matthiaskoenig · 2018-01-21T09:49:38Z

Just to add to this:
The big advantage of annotating with ECO is that it lets you store not only the confidence but, more importantly, the basis of that confidence, because one can add multiple evidence annotations. This is crucial for metabolic network reconstructions and one of the big shortcomings of the current confidence scores.

One wants to store all the available evidence for a reaction, not only the lowest common denominator.
For example, take a reaction R1:

there is some evidence based on homology to mouse (-> add an annotation to homology evidence)
there is some evidence based on protein data (-> add an annotation to experimental evidence based on protein)
there is some bioinformatics inference for R1 (-> add the inference evidence)
there is some indirect evidence based on mRNA (-> add the inferred-from-experimental-data evidence)
if there is some evidence only in a certain tissue, based on omics data (-> add a complex annotation of this evidence to the reaction)

Suddenly you have the whole collection of evidence and confidence for the reaction, and not only a "0". Confidence scores are not something anybody should use in a reconstruction in 2018.
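To make the contrast with a single score concrete, here is a small Python sketch of a per-reaction evidence collection. The classes `Evidence` and `ReactionEvidence` are hypothetical and not part of cobrapy or any other tool; the ECO identifiers are taken from elsewhere in this thread, and the `reference` field is left empty rather than filled with made-up citations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Evidence:
    eco_id: str                      # ECO term describing the kind of evidence
    note: str                        # free-text description
    reference: Optional[str] = None  # e.g. a DOI or PubMed ID, if available


@dataclass
class ReactionEvidence:
    reaction_id: str
    items: List[Evidence] = field(default_factory=list)

    def add(self, eco_id, note, reference=None):
        self.items.append(Evidence(eco_id, note, reference))

    def eco_ids(self):
        """Distinct ECO ids in insertion order."""
        return list(dict.fromkeys(item.eco_id for item in self.items))


r1 = ReactionEvidence("R1")
r1.add("ECO:0000044", "sequence similarity to a mouse gene")
r1.add("ECO:0000006", "experimental evidence from protein data")
r1.add("ECO:0000361", "bioinformatics inference")
print(r1.eco_ids())
```

Each entry can later be written out as its own annotation, so nothing is collapsed into one number.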

ChristianLieven · 2018-01-21T20:23:39Z

Supporting this is something @Midnighter and @cdiener may also want to consider when improving the cobrapy parsers. Once this finds its way into Cobrapy.Model objects I'm very happy to start writing tests for this in memote.

Important to me is that one can directly link the ECO terms with links to the literature (DOI, PubmedID, etc). But if I understand @matthiaskoenig correctly, composite annotations would allow us to do this!

tpfau · 2018-01-22T05:32:41Z

In general I think using ECO here is a very good idea.
However, there are methods that rely on the 0-4 scheme used by BiGG, so we should offer some way to translate at least the following ECO top-level terms to the 0-4 levels:

        ECO:0000006 experimental evidence
        ECO:0000041 similarity evidence
        ECO:0000088 biological system reconstruction evidence
        ECO:0000177 genomic context evidence
        ECO:0000204 author statement
        ECO:0000212 combinatorial evidence
        ECO:0000311 imported information
        ECO:0000352 evidence used in manual assertion
        ECO:0000361 inferential evidence
        ECO:0000501 evidence used in automatic assertion
        ECO:0006055 high throughput evidence

Linelili · 2019-06-27T09:30:19Z

Please note that COBRA's definition of the confidence scores (0 = best, 4 = lowest confidence) is the inverse of the definition in Ines Thiele's and Bernhard Ø. Palsson's "A protocol for generating a high-quality genome-scale metabolic reconstruction", where 4 is the best and 0 the lowest confidence score (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125167/table/T2/?report=objectonly).

Hence, using ECO numbers instead of scores from 0 to 4 might help to avoid confusion.

draeger added the enhancement label Jan 30, 2016

draeger self-assigned this Jan 30, 2016

draeger assigned mephenor and unassigned draeger Jan 19, 2018

draeger mentioned this issue Jan 20, 2018

SBML -> Confidence Scores opencobra/schema#9

Open

This was referenced Jan 21, 2018

In cobra.io neither sbml.py nor sbml3.py seem to import or export notes. opencobra/schema#4

Open

[Feature Request] Store collections of evidence annotation opencobra/cobrapy#653

Closed

ChristianLieven mentioned this issue Oct 14, 2018

Missing GPRs opencobra/memote#214

Open

sulheim mentioned this issue Oct 26, 2018

feat: additional annotations SysBioChalmers/Sco-GEM#44

Closed

7 tasks

mephenor added this to Close open issues in Release 2.1 Nov 7, 2019

mephenor moved this from Close open issues to Backlog in Release 2.1 Jan 31, 2020

Schmoho removed the enhancement label May 10, 2022

Schmoho added the feature Issues that aim to introduce new feature in ModelPolisher. label May 10, 2022

Schmoho unassigned mephenor May 10, 2022
