NCI-GDC/gdcdatamodel

GDC Data Model

Repo to keep information about the GDC data model design.

Installation

To install the gdcdatamodel library run the setup script:

❯ python setup.py install

Jupyter + Graphviz

It's helpful to examine the relationships between nodes visually. One way to do this is to run a Jupyter notebook with a Python 2 kernel. Combined with Graphviz's SVG support, this lets you view a graphical representation of a subgraph directly in a REPL. To do so, install the dependencies listed in dev-requirements.txt. There is an example Jupyter notebook at examples/jupyter_example.ipynb (replicated in examples/jupyter_example.py for clarity).

pip install -r dev-requirements.txt
PG_USER=* PG_HOST=* PG_DATABASE=* PG_PASSWORD=*   jupyter notebook examples/jupyter_example.ipynb
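
The command above passes the database settings through the PG_USER, PG_HOST, PG_DATABASE and PG_PASSWORD environment variables, so it can help to confirm that all four are set before launching Jupyter. The following is a minimal sketch of such a check. It is not part of gdcdatamodel: the helper name missing_pg_settings is hypothetical, and it assumes the notebook reads exactly these four variables.

```python
import os

# Environment variables used in the notebook command above.
PG_VARIABLES = ("PG_USER", "PG_HOST", "PG_DATABASE", "PG_PASSWORD")


def missing_pg_settings(environ=None):
    """Return the names of PG_* variables that are unset or empty.

    Hypothetical helper for illustration; not part of gdcdatamodel.
    """
    if environ is None:
        environ = os.environ
    return [name for name in PG_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_pg_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All PG_* settings are present.")
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching the real environment.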

Documentation

Visual representation

For instructions on how to build the Graphviz representation of the datamodel, see the docs readme.

Dependencies

Before continuing you must have the following programs installed:

The gdcdatamodel library requires the following pip dependencies

Project Dependencies

Project dependencies are managed using pip.

Example validation usage

from gdcdatamodel import node_avsc_object
from gdcdatamodel.mappings import get_participant_es_mapping, get_file_es_mapping
from avro.io import validate
import json


# Load an example node and validate it against the Avro schema
with open('examples/nodes/aliquot_valid.json', 'r') as f:
    node = json.load(f)
print(validate(node_avsc_object, node))  # if valid, prints True


print(get_participant_es_mapping())  # Prints participant elasticsearch mapping
print(get_file_es_mapping())         # Prints file elasticsearch mapping
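
The same check can be applied to several node files at once. The following is a minimal sketch under stated assumptions, not part of gdcdatamodel: the helper name find_invalid_nodes is hypothetical, and the validation step is passed in as a callable so the sketch does not depend on any particular schema object.

```python
import json


def find_invalid_nodes(paths, is_valid):
    """Return the paths whose JSON content fails the is_valid check.

    Hypothetical helper for illustration. is_valid is any callable that
    takes the parsed node and returns a bool, for example:
        lambda node: validate(node_avsc_object, node)
    """
    invalid = []
    for path in paths:
        # Each file is expected to hold a single JSON document.
        with open(path, "r") as f:
            node = json.load(f)
        if not is_valid(node):
            invalid.append(path)
    return invalid
```

With gdcdatamodel and avro installed, the example file above could be checked with find_invalid_nodes(['examples/nodes/aliquot_valid.json'], lambda node: validate(node_avsc_object, node)), which returns an empty list when every file is valid.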

Tests

❯  nosetests -v
test_invalid_aliquot_node (test_avro_schemas.TestAvroSchemaValidation) ... ok
test_valid_aliquot_node (test_avro_schemas.TestAvroSchemaValidation) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.033s

OK

Set up pre-commit hook to check for secrets

We use pre-commit to set up pre-commit hooks for this repo. We use detect-secrets to search for secrets being committed into the repo.

To install the pre-commit hook, run

pre-commit install

To update the .secrets.baseline file run

detect-secrets scan --update .secrets.baseline

.secrets.baseline lists all the strings that were caught by detect-secrets, but does not store them in plain text. Audit the baseline to view the secrets:

detect-secrets audit .secrets.baseline

Contributing

Read how to contribute here