nextstrain/flu_frequencies

Flu frequencies

Estimating flu clade and mutation frequencies

branch   URL
release  https://flu-frequencies.vercel.app/
master   https://master-flu-frequencies.vercel.app/

Development

This README is for the data analysis pipeline. For the web interface, see web/README.md.

Setup

Using Nextstrain CLI

Note: this route currently does not work because polars is missing from the Nextstrain managed environments (conda and Docker).

Install Nextstrain CLI

On Linux:

curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux | bash

On macOS:

curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/mac | bash

Set up Nextstrain CLI

You can set it up to use Docker or a Nextstrain managed conda environment (completely independent of any other conda environments you may have).

Using Docker:

nextstrain setup --set-default docker

Using the managed conda environment:

nextstrain setup --set-default conda

Run analysis

Run the flu analysis from the repository root:

nextstrain build . --profile profiles/flu

Using custom conda environment

Create the conda environment (with mamba):

mamba env create -f environment.yml

Activate the environment:

conda activate flu_frequencies
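
The pipeline needs polars (its absence is what currently breaks the Nextstrain managed environments), so it can be worth confirming that the activated environment provides it before starting a run. A minimal sketch using only the Python standard library; the helper name has_module is illustrative and not part of this repository:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the Python interpreter of the environment you plan to use.
    print("polars available:", has_module("polars"))
```

If this reports that polars is unavailable, the workflow will most likely fail in the same way as the Nextstrain CLI route.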

Run the workflow for flu:

snakemake --profile profiles/flu

Run the workflow for SARS-CoV-2:

snakemake --profile profiles/SC2

Viewing results in web app

Copy the Snakemake workflow results to data_web/inputs, making sure each file gets the correct filename, e.g.:

cp results/h3n2/continent-country-frequencies.csv data_web/inputs/flu-h3n2.csv
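
When several result sets need copying, the renaming can be scripted. The sketch below assumes that every subtype follows the same pattern as the h3n2 example (results/<subtype>/continent-country-frequencies.csv becomes data_web/inputs/flu-<subtype>.csv). Only the h3n2 pair is confirmed here; the function name copy_frequencies and the generalisation to other subtypes are assumptions.

```python
import shutil
from pathlib import Path


def copy_frequencies(subtypes, results_dir="results", inputs_dir="data_web/inputs"):
    """Copy each subtype's frequency table under the flu-<subtype>.csv name.

    Assumption: all subtypes use the layout shown for h3n2 in the README.
    Returns the list of destination paths that were written.
    """
    inputs = Path(inputs_dir)
    copied = []
    for subtype in subtypes:
        src = Path(results_dir) / subtype / "continent-country-frequencies.csv"
        if not src.exists():
            # Skip missing results instead of failing the whole batch.
            print(f"skipping {subtype}: {src} not found")
            continue
        inputs.mkdir(parents=True, exist_ok=True)
        dst = inputs / f"flu-{subtype}.csv"
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied


if __name__ == "__main__":
    # "h3n2" is the only subtype name taken from the README.
    copy_frequencies(["h3n2"])
```

Check the copied filenames against data_web/inputs/pathogens.json before running the conversion step, since the expected names are not verified by this sketch.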

Then convert the CSV files to JSON:

python scripts/web_convert.py --input-pathogens-json data_web/inputs/pathogens.json --output-dir web/public/data
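
After the conversion, a quick sanity check is to confirm that every JSON file in the output directory parses. This is a sketch only: it assumes the converter writes .json files somewhere under web/public/data, it checks syntax rather than content, and find_invalid_json is a hypothetical helper that is not part of this repository.

```python
import json
from pathlib import Path


def find_invalid_json(output_dir="web/public/data"):
    """Return (path, error message) pairs for JSON files that fail to parse."""
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    bad = []
    # Search recursively, since the layout of the output directory is not assumed.
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            bad.append((path, str(err)))
    return bad
```

Calling find_invalid_json() from the repository root returns an empty list when all files parse.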

TODO

  • Provide mamba environment file for simpler setup
  • Agree on formatters to use (snakefmt and black?)