BRAIN initiative Workspace to Organize the Knowledge Space (BRAINWORKS)

Author: Dr. Mohammad Ghassemi, National Scholar, Data and Technology Advancement, National Institutes of Health

Overview

Fig.1 - An illustrative depiction of the BRAINWORKS objective: to structure the scientific literature as an integrated knowledge graph.


The Need: The scientific knowledge landscape is vast, complex, and rapidly expanding. In 2020, 2 million new peer-reviewed papers were added to the scientific literature, which is now estimated to contain over 60 million works. At this volume, it would take a single individual almost 20 years (without breaks) to perform a 5-minute review of each paper written in 2020. Even narrow subdomains of scientific investigation now produce more output than a single scholar can master: over 100,000 papers about the coronavirus pandemic were published in 2020 alone.
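
The reading-time estimate above can be checked with a few lines of arithmetic. The sketch below assumes a reviewer who reads around the clock, 365 days a year; the function and variable names are illustrative and are not part of the repository.

```python
# Back-of-the-envelope check of the reading-time estimate.
# Assumptions (for illustration only): non-stop reading, 365-day years.
PAPERS_2020 = 2_000_000    # papers added in 2020, as stated in the text
MINUTES_PER_PAPER = 5      # one short review per paper


def years_to_review(papers: int, minutes_each: int) -> float:
    """Return the years of non-stop reading needed to review every paper."""
    total_minutes = papers * minutes_each
    minutes_per_year = 60 * 24 * 365
    return total_minutes / minutes_per_year


years = years_to_review(PAPERS_2020, MINUTES_PER_PAPER)
print(f"{years:.1f} years")  # prints "19.0 years"
```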

The Solution: As knowledge generation continues to outpace the ability of individual scientists to consume and integrate it, there is a critical need for technology tools that can organize, integrate, and represent the nuanced knowledge contained within the growing body of the scientific literature. BRAINWORKS is a web platform that addresses these needs by structuring the scientific literature as a dynamic and interactive knowledge graph. While development of the platform is ongoing, an alpha version of the tool is freely available online at http://brainworks.scigami.org.

The Innovation: BRAINWORKS is innovative because of its ability to represent scientific knowledge as well as the context governing its creation (funding, grants, authors, etc.). Furthermore, it provides a novel way to visualize the temporal evolution of scientific knowledge.

Technology Stack

The technology stack for BRAINWORKS consists of three layers. Each layer was designed to function independently so that the stack can be extended to other use cases. In brief, the layers are:

  1. An Information layer that collects and stores publicly available publication data, grant data, and metadata in a centralized database. This layer enables several potential downstream applications, including:

    • Normalization of citation volumes by publication time and domain

    • Organization of scientific publications in a structured, searchable knowledge base

    • Association of non-scientific factors (e.g., grants and authors) with publication content

  2. An Algorithms layer that parses the unstructured publication text into structured semantic triples and identifies the Unified Medical Language System (UMLS) entities that occur within those triples. This layer enables several potential downstream applications, including:

    • Prediction of the impact of interventions (e.g., grants or new papers) on knowledge graph structure

    • Prediction of prospective knowledge graph structure via historical graph dynamics

    • Identification and mapping of scientific entities in free text to established ontologies (e.g., UMLS)

  3. A Visualization layer that represents a set of semantic triples and metadata as a dynamic and interactive knowledge graph. This layer enables several potential downstream applications, including:

    • Visualization of the evolution of scientific topics and topic relations over time

    • Representation and exploration of hierarchical or other complex knowledge relationships
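
To make the idea of semantic triples and a time-evolving knowledge graph concrete, here is a minimal sketch in plain Python. It is not the BRAINWORKS implementation: the `Triple` fields, the `build_graph` and `edge_count` helpers, and the example data are all hypothetical and chosen only for illustration. Each triple carries a publication year, so filtering by year gives a snapshot of the graph at that point in time.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) statement plus illustrative metadata."""
    subject: str
    predicate: str
    obj: str
    year: int       # publication year of the source paper (hypothetical field)
    paper_id: str   # identifier of the source paper (hypothetical field)


def build_graph(triples, up_to_year=None):
    """Group triples into subject -> {(predicate, object): [paper ids]}.

    If up_to_year is given, only triples published in or before that year
    are used, which yields a snapshot of the graph at that point in time.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for t in triples:
        if up_to_year is not None and t.year > up_to_year:
            continue
        graph[t.subject][(t.predicate, t.obj)].append(t.paper_id)
    return graph


def edge_count(graph):
    """Count distinct (subject, predicate, object) edges in the graph."""
    return sum(len(edges) for edges in graph.values())


# Made-up example triples; not drawn from the BRAINWORKS database.
triples = [
    Triple("dopamine", "modulates", "reward learning", 2018, "paper-A"),
    Triple("dopamine", "modulates", "reward learning", 2020, "paper-B"),
    Triple("optogenetics", "activates", "neurons", 2019, "paper-C"),
    Triple("dopamine", "binds", "D2 receptor", 2020, "paper-D"),
]

# Print the number of distinct edges in each yearly snapshot.
for year in (2018, 2019, 2020):
    snapshot = build_graph(triples, up_to_year=year)
    print(year, edge_count(snapshot))
```

In this sketch, repeated statements of the same triple add supporting paper identifiers to an existing edge rather than creating a new edge, so the edge count grows only when a new relation appears.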

Getting started

For ease of extension, we have included several IPython tutorial notebooks that illustrate how to use each component of the technology stack for data collection, natural language analysis, and data visualization. The tutorials also include a few illustrative examples of the analyses that can be performed using the data.

Here are the steps you need to get started:

  1. On an Ubuntu 20.04 machine (or equivalent), run ./setup.sh
  2. Update the configuration file with your API keys and database location.
  3. Work through the tutorial notebooks.

Other Resources:

Slides: For more details on the BRAINWORKS development approach, see these slides.

Complete Data: An AWS RDS database snapshot containing the entirety of the collected data is available upon request. Please contact us with a brief description of your interest and your AWS account ID.

Acknowledgements

The development of BRAINWORKS was led by Dr. Mohammad Ghassemi, 2021 NIH Data and Technology Advancement (DATA) National Service Scholar for the NIH BRAIN Initiative, with support from the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) and the National Institute of Neurological Disorders and Stroke (NINDS). Dr. Ghassemi worked in close collaboration with several members of BRAIN Initiative Team E, including Dr. Grace Peng, Dr. Jim Gnadt, Dr. Michele Ferrante, Dr. Susan Wright, Dr. Karen David, Dr. Christina Fang, and the Director of the BRAIN Initiative, Dr. John Ngai.

We thank the many scientists and NIH partners who generously agreed to participate in the interviews that guided the initial development of the platform, and we thank the NIH ODSS and the NIH BRAIN Initiative for supporting this work.

Fig.2 - Names, images and affiliations of those who assisted with the vision of the BRAINWORKS platform.

How to Cite

If you use BRAINWORKS, or any part of this repository, in your own work, please show your support by starring and citing this repository:

@misc{Ghassemi2021BRAINWORKS,
  author = {Ghassemi, Mohammad Mahdi and Peng, Grace and Gnadt, Jim and Ferrante, Michele and Wright, Susan and David, Karen and Fang, Christina and Ngai, John},
  title = {BRAIN initiative Workspace to ORganize the Knowledge Space},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/deskool/brainworks-public}}
}
