pacbio_hifi_assembly

This pipeline uses Snakemake to automate genome assembly of PacBio HiFi reads with Hifiasm. It takes the BAM files produced by the sequencer as input and outputs a final primary assembly, plus (by default) two haplotype assemblies, along with basic assembly QC metrics from QUAST.

Getting the pipeline

Make sure git is installed on your machine, then clone this repository from GitHub. (Note: if you are using the Harvard Cannon computing cluster, git is already installed.)

git clone https://github.com/harvardinformatics/pacbio_hifi_assembly.git

Then enter the directory: cd pacbio_hifi_assembly/

You will also need Snakemake installed to run the pipeline. The easiest way to install it is via Conda or Mamba (Mamba is preferred); see the installation instructions in the Snakemake documentation.

Configuring the pipeline

The config/ subdirectory of the repository contains a file called config.yaml, which you will need to edit so that it points to your data. For a basic assembly, you only need to change the following lines:

sample: "sample_name"  # Name of the sample; used as the base name for output files
reads: " "  # BAM file(s) output by the sequencer, with full paths; separate multiple files with SPACES

You do not need to change any other options in config/config.yaml unless you are incorporating Hi-C data into the assembly (note that this is NOT the same as scaffolding with Hi-C!) or want only a primary assembly.
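Because reads is a single space-separated string, a mistyped path (or a path that itself contains a space) only surfaces once the workflow starts. As a quick pre-flight check, the hypothetical shell function below (check_reads is not part of the repository) pulls the paths out of the reads: line of a config file and reports any that do not exist. It assumes the value is one double-quoted string on a single line, as in the template config.yaml.

```shell
# Hypothetical pre-flight check, not part of the pipeline.
# Prints "ok: PATH" for each listed BAM that exists, "missing: PATH" (to stderr)
# for each one that does not, and returns non-zero if any are missing or none are listed.
check_reads() {
  local cfg="$1" reads f missing=0 count=0
  # Extract the text between the double quotes on the `reads:` line.
  reads=$(sed -n 's/^reads:[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg")
  # Unquoted expansion splits on spaces, mirroring the "separate by SPACES" rule.
  for f in $reads; do
    count=$((count + 1))
    if [ -f "$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f" >&2
      missing=1
    fi
  done
  if [ "$count" -eq 0 ]; then
    echo "no reads listed in $cfg" >&2
    return 1
  fi
  return "$missing"
}
```

For example, from the repository root: check_reads config/config.yaml. An unedited template (reads: " ") is reported as having no reads listed.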

Running the pipeline

From the main directory, change into the workflow/ subdirectory, which contains the Snakefile that defines the pipeline's steps and the order in which they run. To run the assembly on the cluster, submit a SLURM job script like the following from within the workflow directory:

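#!/bin/bash

# Resource request: 1 node, 7 days, 120 GB RAM, 20 tasks on the "shared" partition
#SBATCH -N 1
#SBATCH -t 7-00:00:00
#SBATCH --mem 120G
#SBATCH -n 20
#SBATCH --partition shared
#SBATCH -J snakemake

# Activate the Conda environment that contains snakemake
source ~/anaconda3/bin/activate snakemake

# --use-conda: build and use the Conda environments defined by the workflow's rules
# --rerun-incomplete: rerun jobs whose output was left incomplete by an interrupted run
snakemake -r --cores 20 --use-conda --rerun-incomplete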

The resources needed, and the time to complete, will depend on the size and complexity of the genome. If the run succeeds, you should see a workflow/results/ subdirectory containing the finished assembly and QC stats!
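The example job script hard-codes 20 both in #SBATCH -n and in --cores, so resizing the job means editing two lines. One way to keep them in sync is to read the task count SLURM exports to the job (SLURM_NTASKS, set when -n is requested). The pick_cores helper below is a hypothetical sketch, not part of the repository, and assumes each task is given one CPU (the SLURM default).

```shell
# Hypothetical helper: echo the number of tasks SLURM allocated to this job,
# falling back to the first argument (or 1) when run outside a SLURM job.
pick_cores() {
  echo "${SLURM_NTASKS:-${1:-1}}"
}

# In the job script, the snakemake line would then become:
#   snakemake -r --cores "$(pick_cores 20)" --use-conda --rerun-incomplete
```

With this change, only the #SBATCH -n line needs editing when you request more or fewer cores.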
