Provide function to include provided annotations for extractions #332

Open
caufieldjh opened this issue Feb 12, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

caufieldjh commented Feb 12, 2024

There are cases in which we already have input text with entity annotations (e.g., from PubTator etc.).
These may be inline or in other formats, like JSON.
It would be useful to be able to pass these annotations directly to the extracted output, independent of any SPIRES extraction.
(This relates to the MAXO annotation extraction specifically.)

For example:
We have input text containing "I take aspirin for a headache" and we're trying to extract relations between drugs and symptoms.
I already have the annotation of CHEBI:15365 for aspirin but still need to extract and ground headache.
The schema will still need to define a Drug class, a Symptom class, and the relation between the two, and that's fine as long as any provided annotations match that schema. There's a tricky point here, though: the LLM may not extract an entity matching the provided annotation.
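The naive reconciliation can be sketched as a simple merge in which pre-provided groundings win over LLM groundings. This is an illustrative sketch, not OntoGPT API; `merge_annotations` and the dict-of-CURIEs representation are hypothetical, and `AUTO:` follows the convention of prefixing ungrounded entities:

```python
def merge_annotations(provided: dict, extracted: dict) -> dict:
    """Merge pre-provided entity annotations with LLM-extracted ones.

    Both arguments map surface text to CURIEs. Provided annotations
    take precedence, so a known grounding (e.g., CHEBI:15365 for
    aspirin) is never overwritten by the LLM's guess.
    """
    merged = dict(extracted)
    merged.update(provided)
    return merged


# Known ahead of time (e.g., from PubTator):
provided = {"aspirin": "CHEBI:15365"}
# What the LLM produced; it failed to ground "aspirin":
extracted = {"aspirin": "AUTO:aspirin", "headache": "HP:0002315"}

result = merge_annotations(provided, extracted)
# result keeps CHEBI:15365 for aspirin and HP:0002315 for headache
```

This only handles the case where the LLM extracts the same surface text; mentions that appear in the provided annotations but not in the extraction would still need to be appended separately.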

So the naive way to do this is just to pass all provided annotations directly to the output (making a subissue for this).
A more useful way may be a sort of mini-RAG: as extraction is done recursively, the pre-provided entity annotations (just the text, no IDs) are injected into results, so if we expect to see a relation between A and B, we inject the entity annotation for A before the LLM tries to find any relations between A and B.
Or we decouple NER and RE entirely and don't bother with using the LLM for the former in some cases.
(Some of this may be more in the curate-gpt space, but there are certainly some options to try.)
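The mini-RAG injection idea above could look something like the following: known entity mentions (text only, no IDs) are prepended to the relation-extraction prompt so the LLM is steered toward mentions we already trust. The function name and prompt wording are hypothetical, purely to illustrate the shape of the approach:

```python
def build_re_prompt(text: str, known_entities: list) -> str:
    """Build a relation-extraction prompt with known entity
    mentions injected ahead of the input text, so the LLM
    reuses the exact surface forms we already have annotations for.
    """
    hints = "\n".join(f"- {surface}" for surface in known_entities)
    return (
        "Known entities (use these exact mentions when extracting relations):\n"
        f"{hints}\n\n"
        f"Text: {text}\n"
        "List drug-symptom relations as 'drug treats symptom'."
    )


prompt = build_re_prompt("I take aspirin for a headache", ["aspirin"])
```

Grounding would then be skipped for any extracted mention that matches a provided annotation, which sidesteps the mismatch problem described above for those entities.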
