Support triple extraction use case #33

Open
caufieldjh opened this issue Mar 6, 2024 · 3 comments

@caufieldjh
Member

Following discussion with the RNA-KG group (Marco Mesiti, Elena Casiraghi, Emanuele Cavalleri) and @justaddcoffee:
we would like to be able to extract triples (s, p, o) from a provided text, using graph embeddings to guide the process.
The goal is to find additional content for RNA-KG. OntoGPT has worked well for this so far, but it does not take advantage of the relations that already exist within the KG.

This would involve:

  • Including an interface (CLI and/or GUI) that accepts a text document as input
  • Providing a way to index KGX files and/or derive a schema from them
  • Building a wrapper for graph embeddings.
    • Using GRAPE directly through this project would be a heavy lift, so retrieving embeddings from an external source like Hugging Face would likely work better, save time, and avoid introducing many new dependencies
  • Writing documentation for the above
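To make the "derive a schema from KGX" item a bit more concrete, here is a minimal sketch, not a proposed implementation. It assumes KGX TSV files with the usual `id`/`category` node columns and `subject`/`predicate`/`object` edge columns, and it simply counts which (subject category, predicate, object category) patterns occur. The function name `derive_schema` and the toy `EX:` identifiers are made up for illustration.

```python
import csv
import io
from collections import Counter


def derive_schema(nodes_tsv, edges_tsv):
    """Count (subject category, predicate, object category) patterns.

    Both arguments are file-like objects containing KGX-style TSV.
    """
    categories = {}
    for row in csv.DictReader(nodes_tsv, delimiter="\t"):
        # KGX allows several pipe-separated categories; keep the first one
        # here purely to keep the sketch short.
        categories[row["id"]] = row["category"].split("|")[0]

    patterns = Counter()
    for row in csv.DictReader(edges_tsv, delimiter="\t"):
        s_cat = categories.get(row["subject"], "unknown")
        o_cat = categories.get(row["object"], "unknown")
        patterns[(s_cat, row["predicate"], o_cat)] += 1
    return patterns


# Toy data (hypothetical identifiers, not taken from RNA-KG).
NODES = (
    "id\tcategory\tname\n"
    "EX:rna1\tbiolink:RNAProduct\ttoy RNA\n"
    "EX:gene1\tbiolink:Gene\ttoy gene\n"
    "EX:dis1\tbiolink:Disease\ttoy disease\n"
)
EDGES = (
    "subject\tpredicate\tobject\n"
    "EX:rna1\tbiolink:interacts_with\tEX:gene1\n"
    "EX:gene1\tbiolink:related_to\tEX:dis1\n"
)

schema = derive_schema(io.StringIO(NODES), io.StringIO(EDGES))
for pattern, count in schema.items():
    print(pattern, count)
```

The resulting pattern counts could then be used to constrain which predicates are offered to the extractor for a given pair of entity types, though whether that is the right way to use them is still open.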

Ideally we would also integrate some process for comparing the extracted triples (e.g., A vs B appears in 20 documents, 15 of them from different sources).
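As a rough sketch of what that comparison could look like, the snippet below tallies how many distinct documents and distinct sources support each extracted triple. The record layout `(subject, predicate, object, doc_id, source)` and the function name `summarize_support` are assumptions for illustration only; nothing here reflects an existing API in this project.

```python
from collections import defaultdict


def summarize_support(extractions):
    """Return, per triple, the number of distinct documents and sources."""
    docs = defaultdict(set)
    sources = defaultdict(set)
    for subject, predicate, obj, doc_id, source in extractions:
        triple = (subject, predicate, obj)
        docs[triple].add(doc_id)
        sources[triple].add(source)
    return {
        triple: {"documents": len(docs[triple]), "sources": len(sources[triple])}
        for triple in docs
    }


# Toy extractions (hypothetical identifiers and source names).
extractions = [
    ("EX:A", "interacts_with", "EX:B", "doc1", "journal-x"),
    ("EX:A", "interacts_with", "EX:B", "doc2", "journal-x"),
    ("EX:A", "interacts_with", "EX:B", "doc3", "journal-y"),
    ("EX:A", "regulates", "EX:C", "doc1", "journal-x"),
]

for triple, support in summarize_support(extractions).items():
    print(triple, support)
```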

The RNA-KG group has also suggested trying LlamaIndex (https://www.llamaindex.ai/) as an alternative to the current vector DB setup, to see if it works better for RAG with KG data.

@cmungall
Member

cmungall commented Mar 6, 2024

I'm not following the part about KG embeddings. I don't think we'd want a dependency on GRAPE here, but we do want to support people providing their own embeddings, e.g., via venomx. However, I don't get how GRAPE/node2vec-style embeddings would work with RAG.

Good suggestion to explore llamaindex. But I think this is orthogonal. See #34

@justaddcoffee
Member

Not sure exactly what Marco had in mind for using KG embeddings with RAG, but possibly something like this: read in abstracts that may contain relations of interest, do NER and grounding to get the IDs/CURIEs of interest from the text, then use KG embeddings to pull these and any related nodes and send them along as context? Not sure.
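To make that guess concrete, here is a minimal sketch of the "pull related nodes using KG embeddings" step only. It assumes the CURIEs have already been grounded and that node embeddings are available as a plain dict of vectors; `related_nodes` and the toy `EX:` vectors are hypothetical, and cosine similarity is just one plausible choice of neighbourhood measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_nodes(grounded_ids, embeddings, k=5):
    """Rank nodes by their best similarity to any grounded node; return the top k IDs."""
    queries = [embeddings[i] for i in grounded_ids if i in embeddings]
    scores = {}
    for node_id, vector in embeddings.items():
        if node_id in grounded_ids or not queries:
            continue  # skip the query nodes themselves
        scores[node_id] = max(cosine(vector, q) for q in queries)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 2-D embeddings (hypothetical; real KG embeddings have many more dimensions).
embeddings = {
    "EX:a": [1.0, 0.0],
    "EX:b": [0.9, 0.1],
    "EX:c": [0.0, 1.0],
    "EX:d": [-1.0, 0.0],
}

context_ids = related_nodes(["EX:a"], embeddings, k=2)
print(context_ids)
```

The returned IDs (plus their labels and known edges) would be what gets sent along as context with the abstract. Whether embedding neighbours are actually useful context for extraction is the open question raised above.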

@justaddcoffee
Member

Also, I agree that a GRAPE dependency might not be what we want here. I've made a (draft) PR #36 to support pulling embeddings from Hugging Face or any other URL.
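For readers unfamiliar with the idea, a dependency-free sketch of pulling embeddings from a URL might look like the following. This is not the code in the PR. It assumes a header-less TSV file where each line is a node ID followed by its vector components, which is only one of many possible formats, and the function names are made up for illustration.

```python
import io
import urllib.request


def parse_embeddings(lines):
    """Parse lines of 'node_id<TAB>v1<TAB>v2...' into a dict of float vectors."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def load_embeddings_from_url(url):
    """Download and parse an embeddings file (assumed format: header-less TSV)."""
    with urllib.request.urlopen(url) as response:
        return parse_embeddings(io.TextIOWrapper(response, encoding="utf-8"))


# Parsing can be exercised without network access:
sample = io.StringIO("EX:a\t0.1\t0.2\n\nEX:b\t1\t2\n")
print(parse_embeddings(sample))
```

Keeping the parser separate from the download step means the same code path works for local files, which fits the "bring your own embeddings" approach discussed above.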
