How to describe predicted vs measured/observed phenotypes? #477

Open
ddooley opened this issue Feb 19, 2022 · 2 comments

@ddooley

ddooley commented Feb 19, 2022

  1. Phenotypes can be observed/measured, or inferred from data (e.g. genomic sequence data). If inferred, there is a degree of uncertainty as to whether the expected phenotype will actually inhere in the thing. For example, in organism X a specific mutation may correlate with resistance to a given drug, but that correlation has not been tested and confirmed for organism Y. Resistance is expected in organism Y if the mutation is observed, but it has not been confirmed. In this case the phenotype is predicted, and the degree of uncertainty needs to be communicated to users of the data, e.g. clinicians.

    Could we introduce “predicted phenotype” into PATO?

  2. Confidence in predicted phenotypes: When phenotypes are predicted from data and correlations, strong correlations can justify a high degree of confidence in the predictions. With less data or weaker correlations, there may be less confidence. The confidence level may be descriptive (e.g. high, moderate, low) or numerical (given a statistical method).

    Would PATO be the place for “predicted phenotype confidence level”?
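
To make point 2 concrete, here is a minimal sketch of how a numerical confidence could be mapped onto descriptive levels. The class name, field names, and the 0.5 / 0.8 cut-offs are illustrative assumptions only; nothing here is defined in PATO or proposed as a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredictedPhenotype:
    """Hypothetical record for a phenotype inferred from data rather than observed."""
    subject: str                   # e.g. an isolate identifier
    phenotype: str                 # e.g. "resistance to drug D"
    score: Optional[float] = None  # numerical confidence in [0, 1], if a method provides one
    method: Optional[str] = None   # statistical method that produced the score


def confidence_level(score: float, moderate: float = 0.5, high: float = 0.8) -> str:
    """Map a numerical confidence to a descriptive level (cut-offs are assumed)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "high"
    if score >= moderate:
        return "moderate"
    return "low"
```

Keeping the raw score and the method alongside the descriptive label would let data users such as clinicians see both the summary and how it was derived.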

Emma would be happy to discuss this in a PATO curation call if desired.

c/o @griffie

@shawntanzk
Contributor

assigning @dosumis and @cmungall - figure you'd be interested in this.

> Would PATO be the place for “predicted phenotype confidence level”?

Regarding this, I think it is similar to something we are trying to work out in the Brain Data Standards ontology: recording the confidence of markers that identify a cell type. We are trying to do this with a class that contains information about the method and the statistics (in our case NS-forest and F-beta score). We haven't fully figured this out yet, but we are trying to use STATO to record the confidence scores. If you're interested, you can follow the thread here: ISA-tools/stato#85
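
For readers unfamiliar with the statistic mentioned above, the following is a small sketch of the standard F-beta formula paired with a record that holds the method and the score together. The record type, its field names, and the example values are hypothetical; this is not the Brain Data Standards or STATO model, and the beta value shown is just an example.

```python
from dataclasses import dataclass


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


@dataclass(frozen=True)
class ConfidenceRecord:
    """Hypothetical container tying a score to the method that produced it."""
    method: str     # e.g. "NS-forest"
    statistic: str  # e.g. "F-beta"
    value: float


# Example with made-up precision and recall values.
record = ConfidenceRecord("NS-forest", "F-beta", f_beta(1.0, 0.5, beta=0.5))
```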

@dosumis
Contributor

dosumis commented Feb 21, 2022

I think PATO is not the right place. A predicted phenotype is not a quality of something, like its colour, length or shape. It also sounds like you need properties, which PATO has not, up to now, been in the business of minting.

In general there is resistance in OBO to recording uncertainty/evidence in assertions rather than on them. Many of the assertions we make have some degree of uncertainty - with evidence improving over time.

I think the conventional OBO way to do this would be to annotate X has_phenotype some Y (or a simple triple "X has_phenotype Y") with an annotation property (AP) axiom recording some confidence score. I think a dedicated object property (OP), e.g. has predicted phenotype, would cause fewer problems at the individual level than at the class level. You could then accumulate individual pieces of evidence as separate annotated triples. Query-wise, RDF/OWL is not a great fit for this type of thing. Converting to a graph representation with edge annotations (e.g. in Neo4j) is much better.
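
As a rough illustration of the approach described above, the sketch below models triples that carry annotations on the assertion rather than in it, and groups separate pieces of evidence for the same assertion. It is plain Python rather than RDF/OWL or Neo4j, and the names `has_predicted_phenotype`, `confidence_score`, and all data values are hypothetical placeholders, not existing RO or STATO terms.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AnnotatedTriple:
    """One assertion plus the annotations recorded on it."""
    subject: str
    predicate: str
    obj: str
    annotations: Dict[str, object] = field(default_factory=dict)


def accumulate_evidence(
    triples: List[AnnotatedTriple],
) -> Dict[Tuple[str, str, str], List[Dict[str, object]]]:
    """Group annotation sets by the (subject, predicate, object) they annotate."""
    grouped = defaultdict(list)
    for t in triples:
        grouped[(t.subject, t.predicate, t.obj)].append(t.annotations)
    return dict(grouped)


# Hypothetical data: two independent pieces of evidence for one prediction.
evidence = [
    AnnotatedTriple("isolate_Y", "has_predicted_phenotype", "drug_D_resistance",
                    {"confidence_score": 0.72, "source": "correlation seen in organism X"}),
    AnnotatedTriple("isolate_Y", "has_predicted_phenotype", "drug_D_resistance",
                    {"confidence_score": 0.90, "source": "hypothetical second analysis"}),
]
grouped = accumulate_evidence(evidence)
```

Each annotated triple here corresponds to what would be an edge with properties in a property graph, which is why edge-annotated representations tend to be easier to query for this kind of data.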

Perhaps RO or STATO would be suitable places for the confidence score AP?
