Defining a basic, non-normative model for confidence in SSSOM ontologies #338

matentzn opened this issue Nov 21, 2023 · 0 comments
Confidence in mappings is a tricky issue. While SSSOM has a nice confidence field, the specification alone does not make clear what it pertains to. There are at least two possible interpretations:

  1. confidence of the mapping: the likelihood that the mapping is correct. This is most likely the prevalent interpretation, but not the one we intended.
  2. confidence of the mapping justification: the degree of trust in the truthfulness of the mapping that the justification provides. This is what we originally intended, but never communicated very well.

In practice, both are quite similar (especially in the frequent case of only having a single justification), but a mapping can have multiple justifications, each of which provides a different level of confidence in the truthfulness of the mapping. We can have a low confidence value provided by a lexical match justification and a high confidence value provided by a human-curated match, and neither, by itself, says anything about the "likelihood that the mapping is correct".

The fact of the matter is that they mean different things. And to make things worse, we have the following to consider:

  • The phrase "likelihood that the mapping is correct" is basically meaningless, as mappings cannot really be true in the philosophical sense of truth. Mappings can only serve a purpose.
  • There are at least two more stakeholders that the SSSOM standard considers, but has not yet documented well:
    • registry confidence: the confidence of a mapping registry in the quality of a specific mapping set, which is basically a measure of the registry's trust in the mapping provider
    • user ratings (semapv:MappingReview): basically thumbs up/down votes or confirmations that a particular mapping is correct (this is similar to semapv:ManualMappingCuration but not quite the same, as it does not include the search for alternative, possibly better, mappings)

Given all this complexity, it makes sense to think about a recommended way for tools to determine the overall confidence in a mapping. For example, consider an instance of OxO loading a mapping set with

  • a low registry_confidence (not too trustworthy, i.e. ad-hoc lexical matching)
  • multiple justifications per mapping (all with different confidence levels)

We also want to support a user-rating feature in the app (thumbs up/down).

The two concrete things we need to determine are these:

  1. How should the tool compute overall mapping confidence? ("Give me all the high confidence, >90%, mappings")
  2. How should the tool capture that confidence value? By creating an additional semapv:CompositeMatching justification with mapping_tool=OxO and a confidence value compounded from all the others? Or by adding a non-standard mapping_confidence value to the internal data model and using that to drive search?

I don't think anything should be done here in a normative way, but I think it is valuable to discuss this or at least have a ticket to capture some of our thoughts on the matter.

For me personally, right now, I tend to think something like this is a good start for computing the mapping confidence:

mapping confidence = (m*AVG(confidence)) * (n*RegistryConfidence) * (o * (thumbs-up/ratings))

with m, n, o initially set to 1, but independently adjustable by the mapping browser developer.

and recommending that tools add a new semapv:CompositeMatching justification to the mapping database to capture this.
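To make the proposal concrete, here is a minimal Python sketch of the formula above and of the composite justification it would produce. It is only an illustration, not a normative recipe: the function names `mapping_confidence` and `composite_justification` are hypothetical, the plain-dict record layout is an assumption (the keys are meant to mirror SSSOM slot names), and treating a mapping with no ratings as neutral (factor 1.0) is my own assumption, since the formula does not say what to do when `ratings` is 0.

```python
from statistics import mean


def mapping_confidence(justification_confidences, registry_confidence,
                       thumbs_up, ratings, m=1.0, n=1.0, o=1.0):
    """(m*AVG(confidence)) * (n*RegistryConfidence) * (o*(thumbs-up/ratings))."""
    if not justification_confidences:
        raise ValueError("at least one justification confidence is required")
    # Assumption: with no user ratings yet, the rating factor is neutral (1.0)
    # rather than 0, so unrated mappings are not wiped out.
    approval = thumbs_up / ratings if ratings else 1.0
    return ((m * mean(justification_confidences))
            * (n * registry_confidence)
            * (o * approval))


def composite_justification(subject_id, predicate_id, object_id,
                            confidence, tool="OxO"):
    """Build a hypothetical record capturing the compounded confidence."""
    return {
        "subject_id": subject_id,
        "predicate_id": predicate_id,
        "object_id": object_id,
        "mapping_justification": "semapv:CompositeMatching",
        "mapping_tool": tool,
        "confidence": confidence,
    }


# Usage: a lexical match (0.5) and a curated match (0.9) from a registry
# trusted at 0.8, with 9 thumbs-up out of 10 ratings.
score = mapping_confidence([0.5, 0.9], registry_confidence=0.8,
                           thumbs_up=9, ratings=10)
record = composite_justification("A:1", "skos:exactMatch", "B:2", score)
is_high_confidence = record["confidence"] > 0.9  # the ">90%" query
```

Note that because the three factors are multiplied, the result is only guaranteed to stay in [0, 1] while m, n and o are at most 1; a browser developer who raises the weights would probably want to clamp or renormalise the value.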
