sbl-sdsc/coronavirus-knowledge-graph

Coronavirus-Knowledge-Graph

This project is obsolete.

The current version of this project is here: COVID-19-Net Knowledge Graph

WORK IN PROGRESS

Fig. Strain subgraph

Documentation below is obsolete

Prototype to create a Neo4j Knowledge Graph for Coronavirus outbreaks.

The goal of this Knowledge Graph project is to link heterogeneous data from publicly available resources relevant to the COVID-19 outbreak. By linking disparate datasets, new insights may be gained.

Currently, this project integrates data from:

The initial focus of this repo is on the Novel Coronavirus COVID-19 (2019-nCoV). In the future we will include data for SARS and MERS.

How to use this project?

  1. Launch this repo on MyBinder.org. Binder lets you run Jupyter Notebooks in your web browser without software installation.

  2. Once JupyterLab launches (this may take a couple of minutes), navigate to the notebooks directory.

  3. Run the following Jupyter Notebooks:

  • 1-PrepareDatasets.ipynb (downloads public data about COVID-19)

  • 2-CreateKnowledgeGraph (creates a Neo4j Knowledge Graph)

  • 3-AddGeneProteinInfo (adds genome, gene, and protein information)

  • 4-ExampleQueries (runs Cypher queries on the Knowledge Graph)
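
As a rough illustration of the kind of Cypher query the last notebook might run, here is a minimal Python sketch. It is not taken from the notebooks. The node labels (`Strain`, `Province`), the relationship type (`FOUND_IN`), and the `name` property are hypothetical, and the use of the official `neo4j` Python driver is an assumption; check the actual graph schema and the notebooks before relying on any of these names.

```python
from typing import Any, Dict, List, Tuple


def strains_in_province_query(province: str, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher query and its parameter dictionary.

    ASSUMPTION: the labels Strain and Province, the relationship type
    FOUND_IN, and the property `name` are hypothetical placeholders,
    not the project's confirmed schema.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    query = (
        "MATCH (s:Strain)-[:FOUND_IN]->(p:Province {name: $province}) "
        "RETURN s.name AS strain "
        "LIMIT $limit"
    )
    # Values travel as parameters instead of being pasted into the query text.
    return query, {"province": province, "limit": limit}


def run_query(uri: str, user: str, password: str,
              query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run a query with the official neo4j Python driver (pip install neo4j).

    ASSUMPTION: the project may use a different client library.
    """
    # Imported lazily so the query builder above works without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]
    finally:
        driver.close()
```

A call such as `strains_in_province_query("Hubei")` returns the query text and its parameters, which can then be passed to `run_query` together with the connection details of a running Neo4j instance. Keeping the province name as a query parameter rather than formatting it into the string avoids quoting problems and lets Neo4j reuse the query plan.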

Coronavirus KG Views displayed in Neo4j Browser

Fig. 1: The whole Coronavirus KG

Fig. 2: Outbreaks by Country, State/Province, and City

Fig. 3: Pathogen, Genome, Genes, and Proteins

Fig. 4: Strains found in Hubei province

How can you help?

  • Suggest complementary publicly accessible datasets to include in this Knowledge Graph
  • Suggest queries and analyses
  • Report bugs or issues
  • Submit a pull request

Please send feedback or feature requests.