BUG: Inconsistent xref scoring methods #3333

Open
tillprochaska opened this issue Sep 26, 2023 · 0 comments
Labels
  • backend: Issues related to Aleph’s backend, API, CLI etc.
  • bug: Things that should work, but don’t
  • Moderate: Issue that may require attention


tillprochaska commented Sep 26, 2023

Describe the bug
The "Similar" tab for individual entities and collection-wide xref use different scoring methods.

To Reproduce
Steps to reproduce the behavior:

  1. Xref a collection and pick any pair of matching entities from the xref results.
  2. Click on the first entity of that pair to view the entity details.
  3. Open the "Similar" tab.
  4. Try to find the other entity.
  5. Most likely, the score in the "Similar" tab will be different from the score in the collection-wide xref results.

Expected behavior
Both the "Similar" tab and collection-wide xref should use the same scoring method.

Aleph version
Latest

Additional context

  • This is probably the case for historical reasons. Aleph used to use the scoring methods from the followthemoney.compare module, which computes similarity scores by property type and then reduces these scores into a single score by weighting property types (e.g. identifier properties might be weighted more heavily because they are more specific compared to other property types).

  • Nowadays, Aleph still uses this method to compute the scores in the "Similar" tab. However, for collection-wide xref, Aleph now uses a machine-learning model to infer a similarity score.

  • I’m not aware of anything blocking us from using the "new" model to compute the similarity scores in the "Similar" tab as well. The main difference from collection-wide xref is that the "Similar" tab scores are computed on demand as part of the request-response cycle, so we’d need to check whether inferring the score is fast enough to not significantly increase response time.

  • Categorized as a moderate bug because it’s unexpected behavior and confusing for users, but the "Similar" tab doesn’t seem to be used a lot.
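
To illustrate the older approach described above (score each property type, then reduce to one weighted score), here is a minimal Python sketch. It is not the followthemoney.compare implementation: the weights, the `jaccard` helper, the `weighted_score` function and the dict-based entity shape are all hypothetical and chosen only to show the idea.

```python
from typing import Dict, List

# Hypothetical weights per property type (NOT the values used by followthemoney).
WEIGHTS: Dict[str, float] = {"identifier": 3.0, "name": 2.0, "country": 0.5}

# Hypothetical entity shape: property type -> list of values.
Entity = Dict[str, List[str]]


def jaccard(left: List[str], right: List[str]) -> float:
    """Toy per-type similarity: set overlap of normalised values."""
    a = {value.strip().lower() for value in left}
    b = {value.strip().lower() for value in right}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_score(left: Entity, right: Entity,
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Reduce per-type similarities into one score via a weighted average."""
    total = 0.0
    weight_sum = 0.0
    for prop_type, weight in weights.items():
        left_values = left.get(prop_type, [])
        right_values = right.get(prop_type, [])
        if not left_values or not right_values:
            continue  # skip property types that either entity lacks
        total += weight * jaccard(left_values, right_values)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


left = {"identifier": ["DE123"], "name": ["Acme GmbH"], "country": ["de"]}
right = {"identifier": ["de123"], "name": ["ACME Holding"], "country": ["de"]}
score = weighted_score(left, right)
```

In this sketch a matching identifier dominates the result even though the names differ. A model-based score for the same pair could come out quite differently, which is the kind of discrepancy users see between the "Similar" tab and the collection-wide xref results.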

tillprochaska added the bug, triage, backend, and Moderate labels and removed the triage label on Sep 26, 2023