Full stop behind some VIAF person headings prevent automatic matching #23

Open
ChristianeKlaes opened this issue Nov 18, 2020 · 2 comments

Comments

@ChristianeKlaes

Hi,

I'm using your VIAF recon service to reconcile scholars' names from the field of Lexicography and Dictionary Research, in order to construct a domain bibliography and person registry in the Linked Open Data environment.

After reconciling and manually validating 200 person names with VIAF (and getting very good results in general!), I came across a peculiar feature in VIAF that seems to prevent automatic matching in many cases and increases tedious manual validation. Apparently, one of the VIAF contributors, NUKAT, appends a full stop to person name headings. This introduces an edit distance that would otherwise be zero and causes the score to drop below 1. Even with OpenRefine's option to auto-match high-confidence candidates enabled during reconciliation, the score is often below the threshold.

Typical example from my data:

Name literal: Quasthoff, Uwe
VIAF candidate: Quasthoff, Uwe. (score: 0.933)
VIAF URI: https://viaf.org/viaf/22741331/

As far as I can see, NUKAT is the only VIAF contributor that puts a full stop after a person's name, and yet this particular heading is always ranked highest in the VIAF cluster. Since we have no way to anticipate whether a matching VIAF cluster includes NUKAT headings or not, is there a way to modify the matching algorithm so that it chops off the full stop (if one exists) from the candidates returned by VIAF?
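To illustrate the request, here is a minimal Python sketch. It assumes the score is a normalized Levenshtein similarity, `1 - distance / max(len)`. That is only a guess, but it is consistent with the example above (one extra character in a 15-character heading gives 1 - 1/15 ≈ 0.933). The function names `levenshtein`, `score` and `strip_trailing_full_stop` are hypothetical and not taken from the recon service's code.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score(query: str, candidate: str) -> float:
    """Assumed scoring: 1 minus the edit distance over the longer length."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(query, candidate) / longest


# Match a final full stop only when it follows at least two word characters,
# so a trailing initial such as "Smith, J." keeps its full stop.
_TRAILING_STOP = re.compile(r"(?<=\w\w)\.\s*$")


def strip_trailing_full_stop(heading: str) -> str:
    """Drop a trailing full stop from a candidate heading, if present."""
    return _TRAILING_STOP.sub("", heading.strip())


query = "Quasthoff, Uwe"
candidate = "Quasthoff, Uwe."

print(round(score(query, candidate), 3))                   # score as returned today
print(score(query, strip_trailing_full_stop(candidate)))   # score after normalization
```

The lookbehind is a deliberately cautious choice: it leaves headings that end in a single-letter initial alone, but it would still strip abbreviations such as "Jr.", so a real fix might need a more careful rule.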

This would really help to improve your VIAF recon service even further. Thanks for all the work you've already done!

Regards,
Christiane

@codeforkjeff
Owner

Hi! I hope to look into this and the other issue this weekend.

@ChristianeKlaes
Author

Hi,

I've done some more reconciling to VIAF and have come across an additional, related issue:

Some person name headings include disambiguating information, such as the birth year or an occupation, as a qualifier. These qualifiers seem to be treated as part of the name literal, resulting in low scores:

[image]

In my database, I've got "Josselin-Leray, Amélie". The VIAF recon service returns the correct match as a candidate, but with a score of only 0.786 when it should be 1.00.

Is there any way to eliminate those qualifiers before computing a matching score? At least in MARC21 format, qualifiers of a name are distinguished by their own subfield (in this instance, subfield code "d" - see all MARC21 specifications for personal name headings here: https://www.loc.gov/marc/authority/ad100.html)

[image]
(taken from this person's SUDOC record within VIAF, http://viaf.org/processed/SUDOC%7C096134925)
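As a rough sketch of what I mean, assuming the service could get at the heading as a list of MARC21 subfields rather than a flat string (I don't know whether it can): keep only subfield "a" and drop qualifiers such as "d" (dates) or "c" (titles) before scoring. The function name `name_from_subfields` and the subfield values below are hypothetical illustrations, not copied from the SUDOC record.

```python
from typing import Iterable, Tuple


def name_from_subfields(subfields: Iterable[Tuple[str, str]]) -> str:
    """Return only the $a (personal name) part of a MARC21 100 heading.

    `subfields` is a sequence of (code, value) pairs. Qualifier subfields
    such as $d (dates) or $c (titles) are ignored. MARC punctuation often
    leaves a trailing comma on $a, so that is stripped as well.
    """
    parts = [value.strip() for code, value in subfields if code == "a"]
    return " ".join(parts).rstrip(",").strip()


# Hypothetical subfield values, for illustration only
heading = [("a", "Josselin-Leray, Amélie,"), ("d", "19..")]
print(name_from_subfields(heading))
```

Scoring the name literal against this reduced string, instead of the full heading, would keep the qualifier from counting as edit distance.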

Thanks a lot!

Christiane
