RNAcentral/rnacentral-data-schema

RNAcentral data schema

This repository contains the JSON schema files for importing data into RNAcentral. The goal is to create a generic, simple method of importing data into RNAcentral. The schemas here are currently under review and are not ready for real-world use. If you are interested in providing feedback, please open an issue or pull request.

The work here is based on agr_schemas version 0.6.2. The top-level schema is rnacentral-schema.json, with subsections stored in sections. Examples are in examples, and a validation script is provided in validate.py.

Coordinate systems

We support two ways of referencing coordinates: 0-start, half-open or 1-start, fully-closed. For details on what these mean and why there are two coordinate systems, please refer to:

By default we assume that the coordinates for genomic locations are 0-start, half-open, but this can be changed by setting the genomicCoordinateSystem property of the metaData. This does not change the behavior of annotations about sequence features, such as modifications, or of the coordinates in related sequences.
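
As a rough illustration of how the two coordinate systems relate, the sketch below converts an interval between them. The function names are hypothetical and are not part of this repository; the arithmetic is the standard relationship between 0-start, half-open and 1-start, fully-closed intervals.

```python
def zero_half_open_to_one_closed(start, end):
    """Convert a 0-start, half-open interval [start, end) to 1-start, fully-closed."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Only the start shifts: the excluded end of the half-open interval
    # has the same number as the included end of the closed interval.
    return start + 1, end


def one_closed_to_zero_half_open(start, end):
    """Convert a 1-start, fully-closed interval [start, end] to 0-start, half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


# The first ten bases of a sequence in each system.
print(zero_half_open_to_one_closed(0, 10))  # (1, 10)
print(one_closed_to_zero_half_open(1, 10))  # (0, 10)
```

In both systems the end value is the same number; the interval length is end - start for 0-start, half-open and end - start + 1 for 1-start, fully-closed.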

Requirements

The validation script requires the Python packages listed in requirements.txt. Install them with pip install -r requirements.txt. Validation is not yet provided as a library, but may be in the future.

Validating files

The validator script in ./validate.py can be used to validate files. For example:

./validate.py examples/flybase.json

If you have moved the schema file or the sections directory, you can point the validator at the new locations with the --schema or --sections options. For example:

./validate.py --schema different-schema.json --sections new-sections.json examples/flybase.json
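
Since validation is not yet available as a library, one way to check several files at once is to call the script from Python. The sketch below is only an illustration: build_validate_command and validate_all are hypothetical helpers, not part of this repository, and validate_all assumes that validate.py exits with a non-zero status when a file fails validation, which should be confirmed before relying on it.

```python
import subprocess


def build_validate_command(data_file, schema=None, sections=None,
                           script="./validate.py"):
    """Build the argument list for one validate.py invocation."""
    command = [script]
    if schema is not None:
        command += ["--schema", schema]
    if sections is not None:
        command += ["--sections", sections]
    command.append(data_file)
    return command


def validate_all(data_files, schema=None, sections=None):
    """Run the validator on each file and return the files that failed.

    Assumption: validate.py signals failure with a non-zero exit status.
    """
    failed = []
    for data_file in data_files:
        command = build_validate_command(data_file, schema, sections)
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(data_file)
    return failed
```

Calling validate_all(["examples/flybase.json"]) from the repository root would run the same command as the first example above.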