
Similarity Score - usage question #348

Open
dr-shorthair opened this issue Feb 1, 2024 · 7 comments

Comments

@dr-shorthair

I'm assisting the development of some mappings within a set of ecosystem and land-use classifications.

The actual mappings are all done manually by subject-matter experts, so the mapping justification is semapv:ManualMappingCuration.

However, many of the mappings are partial, in this sense - a class from the source scheme maps to n classes in the target scheme, in known proportions, e.g.

  • 30% of source:Class345 will correspond to target:ClassDFG
  • 10% of source:Class345 will correspond to target:ClassHJK
  • 60% of source:Class345 will correspond to target:ClassZXC

Is this where semantic_similarity_score comes in? Would it be correct to set

  1. predicate_id to skos:relatedMatch (or should it be skos:narrowMatch?)
  2. semantic_similarity_score to 0.3 0.1 0.6 respectively

Or is this all application-dependent, i.e. up to us, since we are the ones who will be using the mappings?
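To make the question concrete, here is a hypothetical sketch of what such rows might look like in SSSOM TSV, assuming the semantic_similarity_score reading is acceptable. The choice of skos:narrowMatch (vs. skos:relatedMatch) is exactly the open question above, and only a subset of columns is shown; prefixes are assumed to be declared in the set's metadata:

```tsv
subject_id	predicate_id	object_id	mapping_justification	semantic_similarity_score
source:Class345	skos:narrowMatch	target:ClassDFG	semapv:ManualMappingCuration	0.3
source:Class345	skos:narrowMatch	target:ClassHJK	semapv:ManualMappingCuration	0.1
source:Class345	skos:narrowMatch	target:ClassZXC	semapv:ManualMappingCuration	0.6
```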

@matentzn
Collaborator

matentzn commented Feb 1, 2024

I have never wondered about this :D I am not entirely sure. What are the use cases for these kinds of alignments? What can you do with a 10% alignment in practice, I mean?

@DavidKeith

Implementation of this solution requires an intermediate assumption that links the membership estimates (0.3, 0.6, 0.1 in your example) to spatial expression. For example, a membership estimate of 0.3 means that, based on the information in the class descriptions, there is a subjective probability of 0.3 that source:Class345 belongs to target:ClassDFG (ideally, subjective probabilities should be estimated by averaging the estimates of multiple subject experts).

To give this spatial expression in the way proposed, we need to assume that subjective probabilities of membership are directly related to spatial extent, i.e. membership = 0.3 means that 30% of the mapped extent of source:Class345 occurs [somewhere] within the mapped area of target:ClassDFG, but we do not know which 30% of Class345. In many cases, users may decide the assumption is reasonable for their application, though ideally it should be empirically evaluated with test data.

@dr-shorthair
Author

dr-shorthair commented Feb 5, 2024

To provide a bit more context: the goal is to determine some property (function) of a spatial region, where we have a classification of the region using System 1, but the assessment requires its classification using System 2.

i.e.

  • we have a spatial region classified according to a term from System 1
  • to get an assessment of the region, we have some procedure that we can apply using terms from System 2
  • we know what proportions of the class from System 1 correspond to classes in System 2
  • so we compute an assessment based on the area (?) of the region multiplied by the proportion inferred to be in each class from System 2.

Using a (notional) example

Region q23w

  • has an area 230 sq.km
  • is classified as source:Class345

Using the proportions in the example above, this would mean that
--> 69 sq.km is inferred to be target:ClassDFG
--> 23 sq.km is inferred to be target:ClassHJK
--> 138 sq.km is inferred to be target:ClassZXC

so you do the assessment based on the latter three ...

(of course we don't know which 138 sq.km is ClassZXC, etc., but it is assumed to fall within Region q23w).
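The apportionment described above is just area × proportion per target class. A minimal sketch in Python, using the values from the example (the variable names and dict layout are invented for illustration):

```python
# Hypothetical sketch of the apportionment step described above.
# Values are taken from the worked example; names are invented.

region_area_km2 = 230.0  # area of Region q23w, classified as source:Class345

# known proportions of source:Class345 corresponding to System 2 classes
proportions = {
    "target:ClassDFG": 0.3,
    "target:ClassHJK": 0.1,
    "target:ClassZXC": 0.6,
}

# inferred area per System 2 class: region area multiplied by proportion
inferred_areas = {cls: region_area_km2 * p for cls, p in proportions.items()}

for cls, area in inferred_areas.items():
    print(f"{cls}: {area:.0f} sq.km")
```

The proportions sum to 1.0, so the inferred areas partition the region's total area; the assessment would then be run over the three inferred areas.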

@matentzn
Collaborator

matentzn commented Feb 5, 2024

This discussion is a tad out of my depth, I am sorry; I hope someone else from the @mapping-commons/sssom-core team can chip in and give feedback. Without understanding this exactly, I would say that such fuzzy matches are out of scope for SSSOM, but this does not have to keep you from using the semantic_similarity fields to record the information. In my view:

  1. confidence captures the level of certainty an agent has in the absolute truthfulness of the mapping (subject, predicate, object).
  2. semantic_similarity_score captures the result of a semantic similarity matching process that informed the mapping agent (the curator, or the tool) in asserting the mapping, whose truthfulness is still absolute (i.e. not fuzzy/partial)

Maybe however my internal and your internal model of your question only have a very low "semantic overlap" and what I am saying here is completely off topic 😛

@gouttegd
Contributor

gouttegd commented Feb 5, 2024

My 2 cents:

  • It seems to me that using semantic_similarity_score for that purpose would be overloading its intended meaning. If you need to do it, I strongly recommend also filling the semantic_similarity_measure field to point to a resource that makes clear what kind of “score” is actually stored there (in fact, I personally think that semantic_similarity_score should never be used without an accompanying semantic_similarity_measure, no matter what).
  • This might be a case where the use of a non-standard field (“extension slot”) is warranted. We don’t recommend that, for the sake of interoperability, but since those mappings are apparently intended for internal use only (“we are the ones who will use the mappings”), this may be acceptable. The problem is that extension slots cannot really be used for now, because support for them is still experimental in SSSOM-Java and nonexistent in SSSOM-Py. (So if you prepare a mapping set with extension slots and then at some point process the set with SSSOM-Py, the extension slots will be lost.)

@dr-shorthair
Author

> I would say that such fuzzy matches are out of scope for SSSOM

Hmm. That would be disappointing. I doubt it is really just a niche concern; it certainly isn't in linguistics. Partial matches are supported by narrowMatch/broadMatch already. We just have an assessment of what proportions of the extension of the source class match the target classes.

I understand that semantic_similarity_score was devised to capture the result of an automated similarity assessment. But its semantics appear to match our application as well. The semantic_similarity_measure would be something like semapv:ManualMappingCuration again.

Perhaps we just adopt a local convention in the context of our project to use the slots in this way. But I thought it was worth canvassing this list to see if a similar use case had already been encountered.

@cthoyt
Member

cthoyt commented Feb 5, 2024

I am not sure that using SSSOM to describe the extent of overlap of regions is the right use of SSSOM. This seems more like a general kind of relationship than a mapping. From what I understand, the semantic similarity measurement should be something like "ontological similarity", like what https://github.com/related-sciences/nxontology implements, but it's understandable that this is up to interpretation, since the docs are completely empty for https://mapping-commons.github.io/sssom/semantic_similarity_score/
