Oxford Nanopore R10.4 Q20 variant calling workflow [ON GPU]

PEPPER-Margin-DeepVariant is a haplotype-aware variant calling pipeline for long reads.

PEPPER-Margin-DeepVariant Variant Calling Workflow


HG002 chr20 case-study

We evaluated this pipeline on ~75x HG002 data. The data is publicly available, so feel free to download it and to run and evaluate the pipeline yourself.

Sample:     HG002
Chemistry:  R10.4 Q20
Coverage:   ~25-90x
Basecaller: Guppy 5.0.15 Sup
Region:     chr20
Reference:  GRCh38_no_alt

Command-line instructions

Step 1: Install CUDA and NVIDIA-docker
Preprocessing: Install CUDA [must be installed by root]

Install the CUDA toolkit from the CUDA archive.

The commands below install CUDA 11.2.0 on Ubuntu 20.04 LTS:

# Verify you have CUDA capable GPUs:
lspci | grep -i nvidia

# Verify Linux version
uname -m && cat /etc/*release
# Expected output: x86_64

sudo apt-get -qq -y update
sudo apt-get -qq -y install gcc wget make

# Install the kernel headers matching the running kernel (Ubuntu)
# Details: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
sudo apt-get install linux-headers-$(uname -r)

wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run
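After the installer finishes, it can help to confirm that the driver and toolkit binaries are reachable before moving on. The following is a minimal sketch, not part of the original workflow: `require_cmd` is a hypothetical helper name, and the sketch assumes `nvidia-smi` and `nvcc` should be on your `PATH` (the runfile installer typically places `nvcc` under `/usr/local/cuda/bin`, which may need to be added manually).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the driver utility and the compiler; keep going if one is missing.
require_cmd nvidia-smi || echo "Driver not visible; a reboot may be needed after installation." >&2
require_cmd nvcc || echo "nvcc not on PATH; /usr/local/cuda/bin may need to be added." >&2
```

A "missing" message here does not necessarily mean the installation failed; it only means the binary is not visible in the current shell.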
Step 1.1: Install docker

Please install Docker and wget if you don't have them installed already. Installation instructions for other distributions are available in the Docker documentation.

The installation instructions for Ubuntu are shown here:

# Install wget to download data files.
sudo apt-get -qq -y update
sudo apt-get -qq -y install wget

# Install docker using instructions on:
# https://docs.docker.com/install/linux/docker-ce/ubuntu/
sudo apt-get -qq -y install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

sudo apt-get -qq -y update
sudo apt-get -qq -y install docker-ce
docker --version

# Add the current user to the docker group to avoid running docker with sudo:
# Details: https://docs.docker.com/engine/install/linux-postinstall/

sudo groupadd docker
sudo usermod -aG docker $USER

# Log out and log back in so that your group membership is re-evaluated.

# After logging back in.
docker run hello-world

# If you can run docker without sudo then change the following commands accordingly.
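Whether the group change has taken effect in the current session can be checked directly. The sketch below is an illustration only: `in_group` is a hypothetical helper, and it assumes group names contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the current user belongs to the named group.
in_group() {
    local g
    for g in $(id -nG); do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

if in_group docker; then
    echo "docker group active: sudo can be dropped from the docker commands"
else
    echo "docker group not active in this session: keep using sudo or log in again"
fi
```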
Step 1.2: Install nvidia-docker

Install nvidia-docker using the commands below.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

sudo apt-get install -y nvidia-docker2

sudo systemctl restart docker

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
# The output should show all your GPUs. If you enabled Docker for non-root users, you should be able to run nvidia-docker without sudo.
Step 2: Download and prepare input data
BASE="${HOME}/ont-case-study"

# Set up input data
INPUT_DIR="${BASE}/input/data"
REF="GRCh38_no_alt.chr20.fa"
BAM="HG002_pass_2_GRCh38.R10.4_q20.chr20.bam"

# Set the number of CPUs to use
THREADS="64"

# Set up output directory
OUTPUT_DIR="${BASE}/output"
OUTPUT_PREFIX="HG002_ONT_R10_Q20_2_GRCh38_PEPPER_Margin_DeepVariant.chr20"
OUTPUT_VCF="HG002_ONT_R10_Q20_2_GRCh38_PEPPER_Margin_DeepVariant.chr20.vcf.gz"

## Create local directory structure
mkdir -p "${OUTPUT_DIR}"
mkdir -p "${INPUT_DIR}"

# Download the data to input directory
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/HG002_pass_2_GRCh38.R10.4_q20.chr20.bam
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/HG002_pass_2_GRCh38.R10.4_q20.chr20.bam.bai
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/GRCh38_no_alt.chr20.fa
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/GRCh38_no_alt.chr20.fa.fai
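Interrupted downloads can leave missing or empty files, so a quick check before launching the pipeline may save a failed run. This is a minimal sketch under the assumption that the four files listed above are the only required inputs; `check_inputs` is a hypothetical helper, and the defaults simply repeat the values defined earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each expected file exists and is non-empty.
check_inputs() {
    local dir="$1"; shift
    local f status=0
    for f in "$@"; do
        if [ -s "${dir}/${f}" ]; then
            echo "ok: ${f}"
        else
            echo "missing or empty: ${f}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Defaults mirror the variables defined earlier in this step.
INPUT_DIR="${INPUT_DIR:-${HOME}/ont-case-study/input/data}"
REF="${REF:-GRCh38_no_alt.chr20.fa}"
BAM="${BAM:-HG002_pass_2_GRCh38.R10.4_q20.chr20.bam}"

check_inputs "${INPUT_DIR}" "${BAM}" "${BAM}.bai" "${REF}" "${REF}.fai" \
    || echo "Re-run the wget commands for the files reported above." >&2
```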
Step 3: Run PEPPER-Margin-DeepVariant
## Pull the docker image.
sudo docker pull kishwars/pepper_deepvariant:r0.8-gpu

# Run PEPPER-Margin-DeepVariant
sudo docker run --ipc=host \
--gpus all \
-v "${INPUT_DIR}":"${INPUT_DIR}" \
-v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
kishwars/pepper_deepvariant:r0.8-gpu \
run_pepper_margin_deepvariant call_variant \
-b "${INPUT_DIR}/${BAM}" \
-f "${INPUT_DIR}/${REF}" \
-o "${OUTPUT_DIR}" \
-p "${OUTPUT_PREFIX}" \
-t "${THREADS}" \
--ont_r10_q20
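When the run completes, the final calls are expected at `${OUTPUT_DIR}/${OUTPUT_VCF}`, the same path the evaluation step reads. As a rough sanity check, the sketch below counts records whose FILTER column is `PASS`. `count_pass` is a hypothetical helper; it assumes a standard tab-separated VCF compressed with gzip or bgzip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count variant records whose FILTER column (7th) is PASS.
count_pass() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Defaults mirror the variables defined in Step 2.
OUTPUT_DIR="${OUTPUT_DIR:-${HOME}/ont-case-study/output}"
OUTPUT_VCF="${OUTPUT_VCF:-HG002_ONT_R10_Q20_2_GRCh38_PEPPER_Margin_DeepVariant.chr20.vcf.gz}"

if [ -s "${OUTPUT_DIR}/${OUTPUT_VCF}" ]; then
    echo "PASS records: $(count_pass "${OUTPUT_DIR}/${OUTPUT_VCF}")"
else
    echo "Output VCF not found: ${OUTPUT_DIR}/${OUTPUT_VCF}" >&2
fi
```

A count of zero, or a missing file, suggests checking the pipeline logs before moving on to evaluation.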
Evaluation using hap.py (Optional)

You can evaluate the variants using hap.py.

Download benchmarking data:

# Set up input data
TRUTH_VCF="HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz"
TRUTH_BED="HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed"

# Download truth VCFs
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
wget -P ${INPUT_DIR} https://storage.googleapis.com/pepper-deepvariant-public/usecase_data/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed

Run hap.py:

# Pull the docker image
sudo docker pull jmcdani20/hap.py:v0.3.12

# Run hap.py
sudo docker run -it \
-v "${INPUT_DIR}":"${INPUT_DIR}" \
-v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/hap.py \
${INPUT_DIR}/${TRUTH_VCF} \
${OUTPUT_DIR}/${OUTPUT_VCF} \
-f "${INPUT_DIR}/${TRUTH_BED}" \
-r "${INPUT_DIR}/${REF}" \
-o "${OUTPUT_DIR}/happy.output" \
--pass-only \
-l chr20 \
--engine=vcfeval \
--threads="${THREADS}"

Expected output:

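| Type  | Truth total | True positives | False negatives | False positives | Recall   | Precision | F1-Score |
|-------|-------------|----------------|-----------------|-----------------|----------|-----------|----------|
| INDEL | 11256       | 9532           | 1724            | 774             | 0.846837 | 0.926517  | 0.884887 |
| SNP   | 71333       | 71273          | 60              | 51              | 0.999159 | 0.999285  | 0.999222 |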
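The F1-Score column is the harmonic mean of recall and precision, F1 = 2PR / (P + R), so the reported values can be reproduced from the other two columns. The sketch below does this with `awk`; `f1_score` is a hypothetical helper name, not part of hap.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper: F1 = 2 * P * R / (P + R), printed to six decimals.
# Arguments: recall, precision.
f1_score() {
    LC_ALL=C awk -v r="$1" -v p="$2" 'BEGIN { printf "%.6f\n", 2 * p * r / (p + r) }'
}

f1_score 0.846837 0.926517   # INDEL: prints 0.884887
f1_score 0.999159 0.999285   # SNP:   prints 0.999222
```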

Authors:

This pipeline is developed in a collaboration between the UCSC Genomics Institute and the Genomics team at Google Health.