SBO term for confidence score #5

Open
draeger opened this issue Jan 30, 2016 · 18 comments
Labels
feature Issues that aim to introduce new feature in ModelPolisher.
Projects

Comments

@draeger
Member

draeger commented Jan 30, 2016

Request a new SBO term to be used for confidence scores.

@draeger draeger self-assigned this Jan 30, 2016
@aebrahim

How exactly will this work if SBO terms are to be unique per reaction?

@draeger
Member Author

draeger commented Jan 30, 2016

At the moment there is no field where we can store the confidence scores. We either need parameters or something new. Parameters aren't appropriate, because we cannot refer to a reaction from them. Local parameters aren't suitable either because we would then need to create a kinetic law whose math element must not be empty. I am currently thinking about what to do with confidence scores.

@aebrahim

Ah I see. That's still a TBD.

I think one approach would be to create an evidence type. So you could link to a paper, and classify the type of evidence it is. I'm not yet sure if that will work with every case though.

@draeger
Member Author

draeger commented Jan 30, 2016

@aebrahim

What parts in particular would be relevant? This seems to be about sampling from distributions, and I can't see how that's related.

@draeger
Member Author

draeger commented Jan 31, 2016

Yes. I wanted to check whether it also covers confidence scores, but I haven't found anything either. In conclusion, we probably need some additional field where we can store them.

@aebrahim

I think this calls for a new "citations" or "evidence" package

@draeger
Member Author

draeger commented Feb 1, 2016

Good idea! I'll collect all other missing fields and see what else is needed. I'll raise this point in the next SBML team meeting (tomorrow).

@draeger
Member Author

draeger commented Jan 5, 2018

This was further discussed in thread opencobra/schema/issues/4, where @matthiaskoenig suggested using more specific terms from the Evidence Ontology (evidenceontology.org). We should check whether we can make use of this here.

@matthiaskoenig
Collaborator

In my opinion, an SBO term is the wrong way to do this. The Evidence Ontology (ECO) is already sufficient to encode all of this evidence.

It is part of the MIRIAM registry collections:
https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000055

It is used in multiple projects, for example to encode evidence in UniProt; see
"Standardized description of scientific evidence using the Evidence Ontology (ECO)"
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4105709/

In addition, it is very easy to use and is already supported by SBML and by other standard formats such as CellML. One just has to write the annotation, and that is it. No additional package is needed.

To annotate the evidence for an SBML element, add an annotation that points to the ECO term. For instance, to say that a certain reaction/protein is based on "high throughput evidence used in automatic assertion (ECO:0006057)", just write:

 <rdf:Description rdf:about="./BIOMD0000000176.xml#_525635">
    <bqbiol:isDescribedBy rdf:resource="http://identifiers.org/eco/ECO:0006057" />
 </rdf:Description>
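
An annotation of this shape could be generated programmatically. The following is a minimal sketch using only the Python standard library; the function name `eco_annotation` is hypothetical, and the output mirrors the short form shown above (SBML's recommended layout normally wraps the resources in `rdf:Bag`/`rdf:li`, which this sketch deliberately omits).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Make the serializer emit the usual rdf:/bqbiol: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def eco_annotation(about: str, eco_ids: list[str]) -> str:
    """Return an rdf:RDF block attaching ECO terms to one element.

    `about` is the rdf:about reference of the annotated element;
    `eco_ids` are ECO identifiers such as "ECO:0006057".
    (Hypothetical helper, not part of any SBML library.)
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for eco_id in eco_ids:
        # One bqbiol:isDescribedBy element per ECO term.
        ET.SubElement(
            description,
            f"{{{BQBIOL_NS}}}isDescribedBy",
            {f"{{{RDF_NS}}}resource": f"http://identifiers.org/eco/{eco_id}"},
        )
    return ET.tostring(rdf, encoding="unicode")


xml_text = eco_annotation("#_525635", ["ECO:0006057"])
print(xml_text)
```

Passing several identifiers in `eco_ids` yields one `bqbiol:isDescribedBy` element per term, so more than one piece of evidence can be attached to the same element.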

Please do not introduce new mechanisms when established, working mechanisms can already encode all the information, and in a much better way than evidence codes. With composite annotations, even the original datasets and publications behind the evidence can easily be stored in the annotation of the SBML element.

Best Matthias

@draeger draeger assigned mephenor and unassigned draeger Jan 19, 2018
@draeger
Member Author

draeger commented Jan 19, 2018

Here is an overview of the scores as they are usually defined in COBRA, where 0 is the highest and 4 the lowest confidence.

  • 0 = Biochemical: Enzyme has been tested biochemically.
  • 1 = Genetic: Gene overexpression and purification, gene deletions.
  • 2 = Sequence: There is significant sequence similarity to another gene with known function.
  • 3 = Physiological: There is physiological data to support inclusion in the model.
  • 4 = Modeling: Reaction is included to improve simulation results

For the export from COBRA/BiGG models to SBML, we will only need to find the closest ECO terms for these five levels (0 to 4). For the other direction, we will also need to define a rule for mapping ECO terms back to these levels.

@matthiaskoenig
Collaborator

Here are some suggestions; please feel free to correct them. If these are not exact enough, additional terms should be added to ECO.

0 = Biochemical: Enzyme has been tested biochemically.

ECO:0000002: direct assay evidence
A type of experimental evidence resulting from the direct measurement of 
some aspect of a biological feature.

Or a subclass of it to be more specific like e.g.,
ECO:0000005: enzyme assay evidence
http://evidenceontology.org/browse/#ECO_0000002

1 = Genetic: Gene overexpression and purification, gene deletions.

ECO:0000073: experimental genomic evidence
A type of experimental evidence that is based on the 
characterization of an attribute of the genome underlying a gene product.

http://evidenceontology.org/browse/#ECO_0000073

2 = Sequence: There is significant sequence similarity to another gene with known function.

ECO:0000044: sequence similarity evidence 
A type of similarity based on biomolecular sequence.

http://evidenceontology.org/browse/#ECO_0000044

3 = Physiological: There is physiological data to support inclusion in the model.

ECO:0005551: biological system reconstruction evidence by experimental evidence
A type of biological system reconstruction evidence that uses 
experimental evidence as support.

4 = Modeling: Reaction is included to improve simulation results
Personally, I think this is problematic, because it effectively states "there is no evidence". Such a reaction should simply not have an evidence code, which clearly indicates that it was added without any evidence. In other words, if the score is 4 (modeling), then the reaction has no evidence code. The score just states "we added this so we get the results we want".
Alternatively, something like:

ECO:0000001: inference from background scientific knowledge
A type of curator inference where conclusions are drawn 
based on the background scientific knowledge of the curator.
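
For the export direction, the suggestions above could be written down as a simple lookup table. This is only a sketch of the mapping proposed in this comment, not an agreed standard; the names `SCORE_TO_ECO` and `eco_uri_for_score` are hypothetical, and score 4 is mapped to "no evidence code" following the first option discussed above (ECO:0000001 would be the alternative).

```python
from typing import Optional

# COBRA-style confidence score -> (ECO identifier, ECO label),
# taken from the suggestions in this thread. Score 4 ("Modeling")
# intentionally carries no evidence code in this sketch.
SCORE_TO_ECO = {
    0: ("ECO:0000002", "direct assay evidence"),
    1: ("ECO:0000073", "experimental genomic evidence"),
    2: ("ECO:0000044", "sequence similarity evidence"),
    3: ("ECO:0005551",
        "biological system reconstruction evidence by experimental evidence"),
    4: None,
}


def eco_uri_for_score(score: int) -> Optional[str]:
    """Return the identifiers.org URI for a score, or None for score 4."""
    if score not in SCORE_TO_ECO:
        raise ValueError(f"unknown confidence score: {score!r}")
    entry = SCORE_TO_ECO[score]
    if entry is None:
        return None  # no annotation is written for "Modeling"
    eco_id, _label = entry
    return f"http://identifiers.org/eco/{eco_id}"


for score in sorted(SCORE_TO_ECO):
    print(score, eco_uri_for_score(score))
```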

@matthiaskoenig
Collaborator

I forgot to add:
About the rules: you just use the ontology tree to match the terms, i.e., everything that lies below one of the suggested terms is matched to that term. If an evidence code in the SBML is not a descendant of any of the suggested ECO terms, then no match can be made.
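
The matching rule described here can be sketched as a walk up the ontology's is_a hierarchy. The `PARENTS` table below is a tiny hand-written excerpt containing only the two relations mentioned in this thread; a real implementation would load the full ECO hierarchy from the ontology file. All names in the sketch are hypothetical.

```python
from collections import deque
from typing import Optional

# Toy excerpt of the ECO is_a hierarchy (child -> parents). Only the
# relations stated in this thread are included; this is NOT the full ontology.
PARENTS = {
    "ECO:0000005": ["ECO:0000002"],  # enzyme assay evidence -> direct assay evidence
    "ECO:0005551": ["ECO:0000088"],  # ... -> biological system reconstruction evidence
}

# Suggested ECO terms and the COBRA-style score each one stands for.
ECO_TO_SCORE = {
    "ECO:0000002": 0,
    "ECO:0000073": 1,
    "ECO:0000044": 2,
    "ECO:0005551": 3,
}


def score_for_eco(term: str) -> Optional[int]:
    """Return the score of the nearest suggested term at or above `term`.

    Returns None when `term` is not a descendant of any suggested term.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in ECO_TO_SCORE:
            return ECO_TO_SCORE[current]
        if current in seen:
            continue
        seen.add(current)
        queue.extend(PARENTS.get(current, []))
    return None


print(score_for_eco("ECO:0000005"))  # below ECO:0000002
print(score_for_eco("ECO:0000088"))  # above ECO:0005551, so no match
```

Because ECO terms can have several parents, a term may lie below more than one suggested term; the breadth-first walk in this sketch simply returns the nearest one.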

@draeger
Member Author

draeger commented Jan 19, 2018

This looks like a very good start! Thanks @matthiaskoenig. We should also direct @tpfau to this suggestion.

@matthiaskoenig
Collaborator

matthiaskoenig commented Jan 21, 2018

Just to add to this:
The big advantage of annotating with ECO is that it allows storing not only the confidence but, above all, the basis of that confidence, because one can add multiple evidence annotations. This is crucial for metabolic network reconstructions, and its absence is one of the big shortcomings of the current confidence scores.

One wants to store all the evidence available for a reaction, not only the lowest common denominator.
For example, take a reaction R1:

  • there is some evidence based on homology to mouse (-> add an annotation to homology evidence)
  • there is some evidence based on protein data (-> add an annotation to experimental evidence based on protein)
  • there is some bioinformatics inference for R1 (-> add the inference evidence)
  • there is some indirect evidence based on mRNA (-> add the inferred-from-experimental-data evidence)
  • there is some evidence only in a certain tissue, based on omics data (-> add a complex annotation of this evidence to the reaction)

Suddenly you have the whole collection of evidence and confidence for the reaction, and not only a "0". Confidence scores are not something anybody should use in a reconstruction in 2018.
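
As a rough illustration of keeping every piece of evidence instead of a single score, a reaction's evidence could be collected in a small data structure like the one below. The class names are hypothetical, and the ECO terms assigned to each kind of evidence are only illustrative picks from terms mentioned elsewhere in this thread, not a vetted curation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence: an ECO term plus a free-text note."""
    eco_id: str
    note: str = ""


@dataclass
class ReactionEvidence:
    """All evidence collected for one reaction (hypothetical container)."""
    reaction_id: str
    items: list[Evidence] = field(default_factory=list)

    def add(self, eco_id: str, note: str = "") -> None:
        self.items.append(Evidence(eco_id, note))

    def resources(self) -> list[str]:
        """identifiers.org URIs, one per evidence annotation."""
        return [f"http://identifiers.org/eco/{e.eco_id}" for e in self.items]


r1 = ReactionEvidence("R1")
# Illustrative term choices; a curator would pick more specific ECO terms.
r1.add("ECO:0000044", "homology to mouse")
r1.add("ECO:0000006", "protein data")
r1.add("ECO:0000361", "bioinformatics inference")

for uri in r1.resources():
    print(uri)
```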

@ChristianLieven

ChristianLieven commented Jan 21, 2018

Supporting this is something @Midnighter and @cdiener may also want to consider when improving the cobrapy parsers. Once this finds its way into Cobrapy.Model objects, I will be very happy to start writing tests for it in memote.

What matters to me is that one can directly link the ECO terms to the literature (DOI, PubMed ID, etc.). But if I understand @matthiaskoenig correctly, composite annotations would allow us to do this!

@tpfau

tpfau commented Jan 22, 2018

In general I think using ECO here is a very good idea.
However, some methods rely on the 0-4 scheme used by BiGG, so we should offer some way to translate at least the following top-level ECO terms to the 0-4 levels:

        ECO:0000006 experimental evidence
        ECO:0000041 similarity evidence
        ECO:0000088 biological system reconstruction evidence
        ECO:0000177 genomic context evidence
        ECO:0000204 author statement
        ECO:0000212 combinatorial evidence
        ECO:0000311 imported information
        ECO:0000352 evidence used in manual assertion
        ECO:0000361 inferential evidence
        ECO:0000501 evidence used in automatic assertion
        ECO:0006055 high throughput evidence

@Linelili

Please note that COBRA's definition of the confidence scores (0 = best, 4 = lowest confidence) is the inverse of the definition in Ines Thiele's and Bernhard Ø. Palsson's "A protocol for generating a high-quality genome-scale metabolic reconstruction", where 4 is the best and 0 the lowest confidence score (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125167/table/T2/?report=objectonly).

Hence, using ECO numbers instead of scores from 0 to 4 might help to avoid confusion.

@mephenor mephenor added this to Close open issues in Release 2.1 Nov 7, 2019
@mephenor mephenor moved this from Close open issues to Backlog in Release 2.1 Jan 31, 2020
@Schmoho Schmoho added the feature Issues that aim to introduce new feature in ModelPolisher. label May 10, 2022
8 participants