Treat identical citations always as duplicates? #160
Comments
I'd like to take a look at that final.ris twice example. I'm interested in what "enough" means exactly and which metadata those records are missing. We should provide users with instructions to ensure specific fields are complete, but I could also see this as an argument in the dedup.
I'm going to add a discussion thread about building a test .ris file. This file should include known duplicates and false positives. We can easily label these in CiteSource to test various deduplication changes.
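As a rough illustration of how such a labelled test file could be used, here is a minimal Python sketch that compares a deduplication result against known labels. Everything in it is hypothetical: the record IDs, the `true_group` and `predicted_group` mappings, and the `pair_errors` helper are made up for the example and are not part of CiteSource or ASySD.

```python
from itertools import combinations

def pair_errors(true_group, predicted_group):
    """Compare labelled duplicate groups with a deduplication result.

    Both arguments map a record ID to a group ID (hypothetical structure).
    Returns (missed, wrongly_merged): pairs that are true duplicates but
    were kept apart, and pairs that were merged but are not duplicates.
    """
    missed, wrongly_merged = [], []
    for a, b in combinations(sorted(true_group), 2):
        same_true = true_group[a] == true_group[b]
        same_pred = predicted_group[a] == predicted_group[b]
        if same_true and not same_pred:
            missed.append((a, b))
        elif same_pred and not same_true:
            wrongly_merged.append((a, b))
    return missed, wrongly_merged

# Made-up labels: r1 and r2 are known duplicates, r3 and r4 are a known false positive.
true_group = {"r1": "A", "r2": "A", "r3": "B", "r4": "C"}
# Made-up deduplication output that gets both cases wrong.
predicted_group = {"r1": "x", "r2": "y", "r3": "z", "r4": "z"}

missed, wrongly_merged = pair_errors(true_group, predicted_group)
print("missed duplicates:", missed)
print("wrongly merged:", wrongly_merged)
```

Counting errors per pair keeps the comparison independent of how any particular tool names its groups, so the same labelled file could be reused across deduplication changes.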
Currently, CiteSource does not always treat identical citations as duplicates: if the records are not complete enough, ASySD does not reach sufficient confidence to merge them. For instance, if we import the working example final.ris (242 results) twice, ASySD finds 272 unique citations before manual deduplication.

I would be minded to add a default in CiteSource that treats identical entries as duplicates, if this appears too risky for ASySD. As it stands, summaries across stages are predictably misleading until one completes the manual deduplication, which makes CiteSource less useful for quick exploration than it could be.
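To make the proposal concrete, here is a minimal Python sketch of an exact-match pass that could run alongside the confidence-based matching. It is only an illustration under assumptions: records are represented as plain dictionaries, "identical" is taken to mean that every non-empty field matches after normalising case and whitespace, and the `record_key` and `drop_identical` names are hypothetical rather than CiteSource or ASySD functions.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

def record_key(record: Record) -> Tuple[Tuple[str, str], ...]:
    """Build a hashable key from every non-empty field.

    Values are lower-cased and runs of whitespace are collapsed, so two
    records count as identical only if all of their fields match.
    """
    return tuple(sorted(
        (field, " ".join(value.split()).lower())
        for field, value in record.items()
        if value and value.strip()
    ))

def drop_identical(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Made-up sparse records of the kind a fuzzy matcher may not merge confidently.
sparse = [
    {"title": "A study of things", "year": "2020"},
    {"title": "Another study", "author": "Doe, J."},
]
imported_twice = sparse + sparse
unique_records = drop_identical(imported_twice)
print(len(imported_twice), "records in,", len(unique_records), "unique")
```

Because the key covers all fields, this pass can only merge records that are indistinguishable in the imported data, which is the low-risk case described above; anything that differs in any field would still go through the usual matching.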
@kaitlynhair @TNRiley what are your thoughts?