Installing the Bystro Python libraries and cli tools

Bystro consists of 2 main components:

The Bystro annotator (Perl), a command line tool for building new Bystro annotation databases and for annotating VCF files with those databases.
The bystro Python package, which contains:
1. The bystro library, which provides general-purpose machine learning and statistical methods, along with applications of these methods in biology such as global ancestry inference, polygenic risk score calculation, and proteomic analysis (data cleaning, pQTL, joining/filtering on genetic data).
2. The bystro-api command line tool, a command line interface to the Bystro API server. It is used to log in to a Bystro cluster, submit jobs, and check job status. It has most of the functionality of the web application, but is more convenient for batch processing.
3. Worker launchers (bystro-save-worker, bystro-index-worker) that let enterprise users with their own Bystro cluster handle Bystro API server requests.

To install the Bystro Python package, run:

pip install --pre bystro

The Bystro ancestry CLI score tool (bystro-api ancestry score) parses VCF files to generate dosage matrices. This requires bystro-vcf, a Go program which can be installed with:

# Requires Go: install from https://golang.org/doc/install
go install github.com/bystrogenomics/bystro-vcf@2.2.2
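
After `go install`, the binary may not be on your `PATH`. Go places installed commands in `$GOBIN` if it is set, otherwise in the `bin` directory of the first `GOPATH` entry (by default `$HOME/go/bin`). The following is a minimal sketch for checking this; `go_bin_dir` is a hypothetical helper, not part of Bystro, and it only looks at environment variables, so settings stored with `go env -w` are not reflected.

```shell
# Print the directory where `go install` places binaries.
# go_bin_dir is a hypothetical helper, not part of Bystro or Go.
go_bin_dir() {
    if [ -n "${GOBIN:-}" ]; then
        printf '%s\n' "$GOBIN"
    elif [ -n "${GOPATH:-}" ]; then
        # GOPATH may be a colon-separated list; the first entry is used
        printf '%s/bin\n' "${GOPATH%%:*}"
    else
        printf '%s/go/bin\n' "${HOME}"
    fi
}

# Report whether bystro-vcf is reachable from the current shell
if command -v bystro-vcf >/dev/null 2>&1; then
    echo "bystro-vcf found at $(command -v bystro-vcf)"
else
    echo "bystro-vcf not on PATH; try: export PATH=\"\$PATH:$(go_bin_dir)\""
fi
```

If Go is installed, `go env GOBIN GOPATH` gives the authoritative values.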

Bystro is compatible with Linux and macOS. Windows support is experimental. If you are installing on macOS as a native binary (Apple ARM architecture), you will need to install the following additional dependency:

brew install cmake
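
To decide whether the cmake step applies to a given machine, a setup script can check the OS and architecture reported by `uname`. This is a sketch, not part of the Bystro installer; `is_apple_silicon` is a hypothetical helper. Note that a shell running under Rosetta reports `x86_64`, which matches the "native binary" condition above.

```shell
# Return success when the OS/architecture pair is macOS on Apple silicon.
# Arguments default to the values reported by uname for this machine.
is_apple_silicon() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if is_apple_silicon; then
    echo "Apple silicon detected: run 'brew install cmake' before installing bystro"
else
    echo "Not macOS on Apple silicon: the cmake step does not apply"
fi
```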

Setting up the Bystro project for development

If you wish to stand up a local development environment, we recommend using Miniconda to manage Bystro Python dependencies: https://docs.conda.io/projects/miniconda/en/latest/

Once the Bystro annotator installation is complete, and assuming Conda/Miniconda has been installed, run:

# Install Rust
curl https://sh.rustup.rs -sSf | sh -s -- -y
echo -e "\n### Bystro: Done installing Rust! Now sourcing .cargo/env for use in the current shell ###\n"
source "$HOME/.cargo/env"
# Create or activate Bystro conda environment and install all dependencies
# This assumes you are in the bystro folder
source .initialize_conda_env.sh;

If you edit Cython or Rust code, you will need to recompile the code. To do this, run:

make build-python-dev

Starting Bystro API server listeners

If you have a private deployment of the Bystro cluster, you will need to start the API server listeners.

To start local workers that receive and process jobs from the Bystro API server, use either make serve-local or make serve-dev, depending on whether you are in a production or a development environment (the difference is entirely in build time and optimization flags).

These workers rely on a few Go programs, which are compiled and installed in the go/bin directory. If you are running the workers for the first time, you will need to install Go, and then install the Go programs by running:

make install-go
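
Before running the make target, it can help to confirm that the tools it relies on are available. The sketch below assumes only that `go` and `make` are needed, as the text above implies; `missing_cmds` is a hypothetical helper and not part of the Bystro Makefile.

```shell
# Print each named command that cannot be found on PATH, one per line.
# missing_cmds is a hypothetical helper, not part of the Bystro Makefile.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

missing="$(missing_cmds go make)"
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line
    echo "Install these before running 'make install-go':" $missing
else
    echo "go and make are available"
fi
```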

To start the workers, update config/beanstalk.yml to point to the correct beanstalk servers (the ones that the Bystro API server is pointing to), and config/opensearch.yml to point to an OpenSearch server, and then run:

make serve-dev

Installing Bystro using Docker

The recommended way to use Bystro on the command line

Make sure you have Docker installed.

Building the latest version of Bystro in Docker

git clone https://github.com/bystrogenomics/bystro.git && cd bystro
docker build -t bystro .
docker run bystro bystro-annotate.pl # Annotate
docker run bystro bystro-build.pl # Build

Direct (non-Docker) installation

There are 2 components to Bystro:

The Bystro annotator: a Perl program accessed through the command line (via bin/bystro-*)
The Bystro Python package: where the rest of Bystro's functionality lives (statistics, proteomics, etc).

Installing the Bystro annotator (Perl/cli)

RPM-based Linux (Fedora, Red Hat, CentOS, openSUSE, Mandriva)

git clone https://github.com/bystrogenomics/bystro.git && cd bystro && source ./install-rpm.sh

macOS (tested on High Sierra, interactive)

git clone https://github.com/bystrogenomics/bystro.git && cd bystro && source ./install-mac.sh

Ubuntu

Ensure that packages are up to date (sudo apt update), or that you are satisfied with the state of package versions.
git clone https://github.com/bystrogenomics/bystro.git && cd bystro && source ./install-apt.sh
- Please note that this installation script will ask you for the root password in order to install system dependencies.

Configuring the Bystro annotator

Once Bystro is installed, it needs to be configured. The easiest step is choosing the species/assemblies to annotate.

Download the Bystro database for your species/assembly

Example: hg38 (human reference GRCh38): wget https://s3.amazonaws.com/bystro-db/hg38_v8.tar.gz
- You need ~700GB of free space for hg38 and ~400GB of free space for hg19, including the space for the tar.gz archives
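
A quick way to check the space requirement before downloading is to read the available space from `df`. This is a sketch under the approximate sizes quoted above; `has_free_gb` is a hypothetical helper, and it checks the current directory, so run it from the directory where you plan to store the database.

```shell
# Succeed when the filesystem holding directory $1 has at least $2 GB available.
# has_free_gb is a hypothetical helper; df -Pk reports available space in 1024-byte blocks.
has_free_gb() {
    df -Pk "$1" | awk -v need="$2" '
        NR == 2 { ok = ($4 >= need * 1024 * 1024) }
        END { exit !ok }'
}

# Sizes below are the approximate figures quoted above
if has_free_gb . 700; then
    echo "Enough space here for hg38"
elif has_free_gb . 400; then
    echo "Enough space here for hg19 only"
else
    echo "Not enough free space in $(pwd) for either database"
fi
```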

To install the database:

Example:

cd /mnt/annotator/
wget https://s3.amazonaws.com/bystro-db/hg38_v8.tar.gz
bgzip -d -c --threads 32 hg38_v8.tar.gz | tar xvf -

In this example, the hg38 database would be located in /mnt/annotator/hg38

Update the YAML configuration for the species/assembly to point to the database.

For human genome assemblies, we provide pre-configured hg19.yml and hg38.yml, which assume /mnt/annotator/hg19_v10 and /mnt/annotator/hg38_v8 database directories respectively.

If using a different mount point, different database folder name, or a different (or custom-built) database altogether, you will need to update the database_dir property of the yaml config.
- Note that for a custom database, you also need to ensure that the track outputOrder lists all tracks, and that each track lists all desired features.
For instance, you can use yq to set database_dir, and to set temp_dir so that in-progress annotations are written to local disk:
```
yq write -i config/hg38.yml database_dir /mnt/my_fast_local_storage/hg38_v8
yq write -i config/hg38.yml temp_dir /mnt/my_fast_local_storage/tmp
```
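
After editing the config, it is worth confirming that database_dir points to a directory that exists. The sketch below reads the value with sed rather than yq; it assumes database_dir is a top-level key written on a single unindented line as a plain or quoted scalar, with no trailing comment. `config_database_dir` is a hypothetical helper, not part of Bystro.

```shell
# Print the value of the top-level database_dir key from a Bystro YAML config.
# Assumes a single unindented "database_dir: value" line; use a YAML parser for anything more complex.
config_database_dir() {
    sed -n 's/^database_dir:[[:space:]]*//p' "$1" \
        | sed "s/^[\"']//; s/[\"'][[:space:]]*\$//" \
        | head -n 1
}

config="config/hg38.yml"
if [ -f "$config" ]; then
    db_dir="$(config_database_dir "$config")"
    if [ -d "$db_dir" ]; then
        echo "database_dir OK: $db_dir"
    else
        echo "database_dir does not exist: $db_dir"
    fi
fi
```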

Databases:

Human (hg38): https://s3.amazonaws.com/bystro-db/hg38_v8.tar.gz
Human (hg19): https://s3.amazonaws.com/bystro-db/hg19_v10.tar.gz
There are no restrictions on species support, but we currently only build human genomes. Please create a GitHub issue if you would like us to support others.

Running your first annotation

Example: running an hg38 annotation

bin/bystro-annotate.pl --config config/hg38.yml --in /path/in.vcf.gz --out /path/outPrefix --run_statistics [0,1] --compress

The outputs will be:

Annotation (compressed, due to --compress flag): outPrefix.annotation.tsv.gz
Annotation log: outPrefix.log.txt
Statistics JSON file: outPrefix.statistics.json
Statistics tab-separated file: outPrefix.statistics.tsv
- Removing the --run_statistics flag will skip the generation of outPrefix.statistics.* files
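
To confirm that a run produced everything listed above, a script can derive the expected paths from the --out prefix and report any that are missing. This is a sketch that assumes the --compress flag was used and that the files are written directly at the prefix path; `expected_outputs` is a hypothetical helper, not part of Bystro.

```shell
# Print the output paths listed above for a given --out prefix.
# Pass "0" as the second argument to omit the statistics files.
expected_outputs() {
    prefix="$1"
    with_stats="${2:-1}"
    printf '%s\n' "$prefix.annotation.tsv.gz" "$prefix.log.txt"
    if [ "$with_stats" = "1" ]; then
        printf '%s\n' "$prefix.statistics.json" "$prefix.statistics.tsv"
    fi
}

# Report which expected files are missing after a run
expected_outputs /path/outPrefix | while IFS= read -r f; do
    [ -e "$f" ] || echo "missing: $f"
done
```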
