Describe the bug
The "Similar" tab for individual entities and collection-wide xref use different scoring methods.
To Reproduce
Steps to reproduce the behavior:
Xref a collection and pick any pair of matching entities from the xref results.
Click on the first entity of that pair to view the entity details.
Open the "Similar" tab.
Try to find the other entity.
Most likely, the score in the "Similar" tab will be different from the score in the collection-wide xref results.
Expected behavior
Both the "Similar" tab and collection-wide xref should use the same scoring method.
Aleph version
Latest
Additional context
This is probably the case for historical reasons. Aleph used to use the scoring methods from the followthemoney.compare module, which computes similarity scores by property type and then reduces these scores into a single score by weighting property types (e.g. identifier properties might be more important because they are more specific compared to other property types).
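To illustrate the "score per property type, then reduce by weight" idea described above, here is a minimal sketch. The function name, the property type names, and the weights are hypothetical and chosen for illustration only; they are not the values or the API of followthemoney.compare.

```python
# Hypothetical weights per property type (illustration only, not the
# values used by followthemoney.compare).
TYPE_WEIGHTS = {"identifier": 3.0, "name": 2.0, "country": 1.0}


def weighted_score(type_scores, weights=TYPE_WEIGHTS):
    """Reduce per-property-type scores (0..1) into one weighted average."""
    total = 0.0
    weight_sum = 0.0
    for prop_type, score in type_scores.items():
        # Property types without a configured weight are ignored.
        weight = weights.get(prop_type, 0.0)
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        return 0.0
    return total / weight_sum


# A perfect identifier match outweighs a partial name match.
print(weighted_score({"identifier": 1.0, "name": 0.5}))
```

A model-based score, as used for collection-wide xref, does not have to follow this fixed weighting, which is one plausible reason the two numbers differ for the same pair.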
Nowadays, Aleph still uses this method to compute the scores in the "Similar" tab. However, for collection-wide xref, Aleph now uses a machine-learning model to infer a similarity score.
I’m not aware of anything blocking us from using the "new" model to compute the similarity scores in the "Similar" tab as well. The main difference from collection-wide xref is that the scores are computed on demand as part of the request-response cycle, so we’d need to check whether inferring the score is fast enough not to significantly increase response time.
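One rough way to check the response-time concern would be to time both scoring functions over the same candidate pairs. The sketch below assumes a scoring callable that takes two entities; `median_latency_ms`, `score_fn`, and `pairs` are hypothetical names, not part of Aleph.

```python
import statistics
import time


def median_latency_ms(score_fn, pairs, repeats=5):
    """Median wall-clock time (ms) to score all pairs, over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for left, right in pairs:
            score_fn(left, right)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in scorer; in practice this would be the old and the new method in turn.
def dummy_score(left, right):
    return 1.0 if left == right else 0.0


print(median_latency_ms(dummy_score, [("a", "a"), ("a", "b")]))
```

Running this once per scoring method with a realistic number of candidates per request would give a first impression of the added latency.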
Categorized as a moderate bug because it’s unexpected behavior and confusing for users, but the "Similar" tab doesn’t seem to be used a lot.
tillprochaska added the bug, triage, backend, and Moderate labels and removed the triage label on Sep 26, 2023.