COVID-19 Research Knowledge Graph

Builds a knowledge graph from the COVID-19 Open Research Dataset (CORD-19). As of 2020-03-18 it has been run against the Commercial use subset (includes PMC content): 9000 papers, 186 MB.

This project is written in Scala, so you need sbt to continue.

Prerequisites

Install sbt
Download the Commercial use subset and extract it to some local directory
Clone dair-iitd/OpenIE-standalone and follow the build instructions
- git clone https://github.com/dair-iitd/OpenIE-standalone.git && cd OpenIE-standalone
- sbt -J-Xmx10000M clean compile assembly
- java -Xmx10g -XX:+UseConcMarkSweepGC -jar target/scala-2.10/openie-assembly-5.0-SNAPSHOT.jar --httpPort 8000
- To get an extraction from the server, send a POST request to the /getExtraction endpoint with the sentence in the body of the HTTP request. An example curl request:
- curl -X POST http://localhost:8000/getExtraction -d "The Jet Propulsion Laboratory is a federally funded research and development center and NASA field center in the city of La Canada Flintridge with a Pasadena mailing address, within the state of California, United States."
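To send more than one sentence, the curl request can be wrapped in a small loop. This is only a sketch: it assumes the OpenIE server is listening on port 8000 as started above, and the function name extract_all and the one-sentence-per-line input file are hypothetical, not part of this project.

```shell
#!/bin/sh
# Assumption: the OpenIE server was started with --httpPort 8000 as shown above.
OPENIE_URL="${OPENIE_URL:-http://localhost:8000/getExtraction}"

# extract_all FILE
# Hypothetical helper: POST each non-empty line of FILE to /getExtraction
# and print one response per line.
extract_all() {
  while IFS= read -r sentence; do
    # Skip blank lines so no empty request bodies are sent.
    [ -n "$sentence" ] || continue
    # Note: curl -d treats a value starting with "@" as a file name.
    curl -s -X POST "$OPENIE_URL" -d "$sentence"
    echo
  done < "$1"
}

# Usage (sentences.txt is a hypothetical file, one sentence per line):
#   extract_all sentences.txt
```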

Installation

Back in this directory...

Compile the project with sbt:

$ sbt compile

Running

From sbt

Launch sbt:

$ sbt

Run the program with two arguments: the input data directory containing the dataset, and the directory containing the annie_extraction files:

> run /path/to/directory/containing/individual/CORD-19_files /path/to/directory/containing/individual/annie_extraction_files

As a standalone JAR

First assemble the JAR

$ sbt assembly

... then run the JAR via java

$ java -jar ./target/scala-2.13/covid19_knowledge_graph-assembly-0.1.0-SNAPSHOT.jar

Output

Once the program finishes (this may take some time, depending on how much memory your machine has), you will find a newly written file called covid19_knowledge_graph.ttl. This file can be loaded into Apache Jena's Fuseki server (or any other SPARQL server that can ingest RDF graphs in Turtle format).
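One way to load the file is over HTTP with the SPARQL Graph Store Protocol, which Fuseki supports. The following is a sketch under assumptions: Fuseki is running locally on its default port 3030, and a dataset with the hypothetical name covid19 already exists. Adjust both to match your setup.

```shell
#!/bin/sh
# Assumptions: local Fuseki on its default port, and a dataset named
# "covid19" (hypothetical) that has already been created.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"
DATASET="${DATASET:-covid19}"

# load_ttl FILE
# POST a Turtle file into the dataset's default graph via the
# Graph Store Protocol endpoint.
load_ttl() {
  curl -s -X POST "$FUSEKI_URL/$DATASET/data?default" \
    -H 'Content-Type: text/turtle' \
    --data-binary "@$1"
}

# Usage:
#   load_ttl covid19_knowledge_graph.ttl
```

POST adds the triples to whatever is already in the default graph. Use PUT instead if you want to replace the existing contents.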

Querying Data

Once the data is loaded into Fuseki, you can use Jena's powerful full text search, which combines SPARQL with full text search via Lucene or Elasticsearch (built on Lucene). It gives applications the ability to perform indexed full text searches within SPARQL queries.
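As an illustration, a full text query uses the text:query property function from Jena's text module. The sketch below rests on several assumptions: the Fuseki dataset (hypothetically named covid19) has been configured with a text index, and that index covers rdfs:label. The predicates actually present in covid19_knowledge_graph.ttl are not documented here, so substitute the indexed property that applies to your data.

```shell
#!/bin/sh
# Assumptions: local Fuseki on its default port, a dataset named "covid19"
# (hypothetical), and a text index configured over rdfs:label.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"
DATASET="${DATASET:-covid19}"

# text_query TERM [LIMIT]
# Print a SPARQL query that runs a full text search for TERM,
# returning at most LIMIT index hits (default 10).
text_query() {
  cat <<EOF
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE {
  ?s text:query (rdfs:label '$1' ${2:-10}) ;
     rdfs:label ?label .
}
EOF
}

# run_text_query TERM [LIMIT]
# Send the query to the dataset's query endpoint and print JSON results.
run_text_query() {
  curl -s "$FUSEKI_URL/$DATASET/query" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$(text_query "$1" "$2")"
}

# Usage:
#   run_text_query coronavirus 5
```

The search term is inserted into the query as-is, so this helper is only suitable for simple terms without quote characters.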

Contact

Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)

Enterprise Search Technologist

Web and Mobile Application Development Group (172B)

Application, Consulting, Development and Engineering Section (1722)

Info & Engineering Technology Planning and Development Division (1720)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 600-172A

Tel: (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax: (+1) (818)-393-1190

Email: lewis.j.mcgibbney@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X

nasa-jpl-cord-19/covid19-knowledge-graph