A workflow that generates a knowledge graph from a list of DOIs using the tool 'End-to-end Relation Extraction on the natural-product literature' from IDIAP.

p2m2/workflow-kg-plants-taxon-compound

Workflow KG Plants Taxon Compound

This workflow chains several processing steps to build a knowledge graph from a list of scientific article DOIs. It links articles indexed in PubMed to taxon/metabolite pairs through the "produces" relationship. This work is based on the repository Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach.

RDF Model

1 - Building DOI list file

  • Search in PubMed for articles related to a taxon of the Brassicaceae family and glucosinolate compounds.
curl -s 'https://pubmed.ncbi.nlm.nih.gov/?term=brassica+glucosinolate&format=pubmed&size=200' | grep "\[doi\]" | cut -d" " -f3 > data/brassicale_glucosinolate.txt
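
The `grep`/`cut` pipeline above keeps the third space-separated field of every line tagged `[doi]`. A minimal Python sketch of the same extraction is given here, assuming the PubMed text format writes DOIs on `LID` or `AID` lines of the form `LID - <doi> [doi]`. The function name `extract_dois` and the sample record are illustrative only and are not part of the repository. Unlike the shell pipeline, this sketch also drops duplicates, since a record may carry the same DOI on both an `LID` and an `AID` line.

```python
import re

# Matches PubMed-format identifier lines such as "LID - 10.1021/jf401802n [doi]".
DOI_LINE = re.compile(r"^(?:LID|AID)\s*-\s*(\S+)\s+\[doi\]\s*$")


def extract_dois(pubmed_text):
    """Return the DOIs found in PubMed-format text, in order, without duplicates."""
    seen = set()
    dois = []
    for line in pubmed_text.splitlines():
        match = DOI_LINE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            dois.append(match.group(1))
    return dois


# Synthetic record used only to exercise the parser (not real search output).
sample = """PMID- 00000000
TI  - An example title
LID - 10.1021/jf401802n [doi]
AID - 10.1021/jf401802n [doi]
AID - S0000-0000(00)00000-0 [pii]
LID - 10.1021/jf405538d [doi]
"""

print(extract_dois(sample))
```

Writing the returned list one DOI per line gives a file in the same shape as the one produced by the `curl` command.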

2 - a) Building the article base from a list of DOIs

python src/api_doi.py --list_doi "10.1021/jf401802n,10.1021/jf405538d" --output test.json

2 - b) Building the article base from a list of DOIs in a file

python src/api_doi.py --list_doi_file data/list_doi_example.txt --output test.json
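
Steps 2 a) and 2 b) take the same DOIs in two shapes: a comma-separated string for `--list_doi` and a file for `--list_doi_file`. The sketch below converts one into the other, assuming the file holds one DOI per line (as the file written in step 1 does). The helper name `dois_to_argument` is hypothetical, and how `src/api_doi.py` parses its file internally is not shown in this README.

```python
def dois_to_argument(lines):
    """Turn an iterable of text lines into a comma-separated DOI string."""
    seen = set()
    dois = []
    for raw in lines:
        doi = raw.strip()
        if not doi:
            continue  # skip blank lines
        key = doi.lower()  # DOIs are case-insensitive, so compare in lower case
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return ",".join(dois)


# Lines as they would come from iterating over an open text file.
example = ["10.1021/jf401802n\n", "\n", "10.1021/jf405538d\n", "10.1021/JF401802N\n"]

print(dois_to_argument(example))
```

With a real file, the call would be `dois_to_argument(open(path))`, where `path` points to the DOI list.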

2 - c) Building the article base from PDF articles

TODO

3 - IDIAP Workflow to generate Taxon / Metabolite "produces" associations

  • Working with a GPU environment

Genouest Org

ssh $USER@genossh
srun --gpus 1 -p gpu --pty bash
. /local/env/envpython-3.9.5.sh
virtualenv ~/env-idiap ## only the first time !!
source ~/env-idiap/bin/activate 
export PATH=/home/genouest/inra_umr1349/$USER/.local/bin:$PATH
python src/workflow_idap.py --dump igepp.json

References

4 - Build RDF Graph

pip install pygbif rdflib
python src/build_rdf_graph.py --dump_doi test.json --dump_taxon_compound test_taxon_metabolite_associations_idiap.json
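
To illustrate what linking an article to a taxon/metabolite pair through "produces" can look like as triples, here is a small standard-library sketch that emits N-Triples lines. Everything specific in it is an assumption made for illustration: the `http://example.org/kg/` namespace, the `produces` and `mentions` predicates, the record keys `doi`, `taxon` and `compound`, and the placeholder labels. The actual `src/build_rdf_graph.py` relies on `rdflib` and `pygbif`, and its RDF model may differ.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace, not the repository's


def iri(kind, label):
    """Build an N-Triples IRI term under the hypothetical namespace."""
    return "<{}{}/{}>".format(BASE, kind, quote(label, safe=""))


def produces_triples(associations):
    """Return N-Triples lines for records holding a DOI, a taxon and a compound."""
    lines = []
    for rec in associations:
        taxon = iri("taxon", rec["taxon"])
        compound = iri("compound", rec["compound"])
        article = "<https://doi.org/{}>".format(quote(rec["doi"], safe="/"))
        # The taxon/compound pair, connected by the "produces" relationship.
        lines.append("{} <{}produces> {} .".format(taxon, BASE, compound))
        # The article the pair was extracted from (hypothetical predicate).
        lines.append("{} <{}mentions> {} .".format(article, BASE, taxon))
        lines.append("{} <{}mentions> {} .".format(article, BASE, compound))
    return lines


# Placeholder record: the labels are not real extraction results.
records = [
    {"doi": "10.1021/jf401802n", "taxon": "Example taxon", "compound": "example compound"}
]

for line in produces_triples(records):
    print(line)
```

Plain string formatting is used so the sketch runs without extra packages; in the workflow itself, `rdflib` takes care of building and serialising the graph.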

Note about relations to build/infer

gist
