(Nucleomics-VIB) - PacBio-Tools

All tools presented below have only been tested by me and may contain bugs; please let me know if you find any. Each tool relies on dependencies, normally listed at the top of the code (CPAN for Perl and CRAN for R will help you install them).

Please refer to the accompanying wiki for examples and workflows.

smrtlink-tools

bam_subset_smrt.sh

The bash file bam_subset_smrt.sh creates a random subset of a BAM dataset and uploads the resulting file to the SMRT server as a new dataset.

# Usage: bam_subset_smrt.sh -b <input.bam>
# script version 1.0, 2017_01_18
# [optional: -o <output_prefix|sample_SS_XXpc>]
# [optional: -s <seed|1>]
# [optional: -f <fraction in %|10>]
# [optional: -t <threads|32>]
# [optional: -S <SMRT-server|"${smrthostname}">]
# [optional: -p <SMRT-port|9091>]
# [-h for this help]
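The random subsetting step can be illustrated with a short Python sketch. This is not the script's implementation: it assumes PacBio read names of the form `movie/holeNumber/qStart_qEnd` and keeps or drops whole ZMWs (so all subreads of one molecule stay together), using a seeded hash so that the same seed and fraction always select the same molecules. The helper name `keep_read` is hypothetical.

```python
import hashlib


def keep_read(read_name: str, seed: int = 1, fraction_pct: float = 10.0) -> bool:
    """Decide whether a PacBio read belongs to the random subset.

    Hypothetical helper: the decision is taken per ZMW (movie/holeNumber),
    so every subread of one molecule is kept or dropped together.
    """
    movie, hole = read_name.split("/")[:2]
    # Hash the seed together with the ZMW identifier: same inputs, same decision.
    digest = hashlib.sha256(f"{seed}:{movie}/{hole}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    threshold = int(fraction_pct / 100.0 * 2**64)
    return value < threshold


if __name__ == "__main__":
    names = [f"m54000_170101_000000/{zmw}/0_1000" for zmw in range(10000)]
    kept = sum(keep_read(n, seed=1, fraction_pct=10) for n in names)
    print(f"kept {kept} of {len(names)} ZMWs (about 10% expected)")
```

With this design, changing only the seed yields a different subset of about the same size, while re-running with the same seed reproduces the previous subset exactly.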

explain-LocalContextFlags.html

The HTML file explain-LocalContextFlags.html explains, in plain English, the LocalContext flags that PacBio BAM data store as a binary value. The content of this page is taken and adapted from a similar page explaining SAM flags, hosted here. Please cite the Picard source, not our version, when using this code.

Open a local copy of the file in your favorite web browser to use it.
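The decoding that the HTML page performs can be sketched in a few lines of Python. The bit values below follow the PacBio BAM format specification for the `cx` tag (ADAPTER_BEFORE=1 through REVERSE_PASS=32); newer versions of the specification may define additional bits, which this sketch simply reports as unknown, so check the spec that matches your data.

```python
# LocalContextFlags bits stored in the PacBio BAM `cx` tag. Values as listed in
# the PacBio BAM specification; newer spec versions may add further bits.
LOCAL_CONTEXT_FLAGS = {
    1: "ADAPTER_BEFORE",
    2: "ADAPTER_AFTER",
    4: "BARCODE_BEFORE",
    8: "BARCODE_AFTER",
    16: "FORWARD_PASS",
    32: "REVERSE_PASS",
}


def explain_cx(value: int) -> list:
    """Return the names of all flags set in a LocalContextFlags value."""
    names = [name for bit, name in LOCAL_CONTEXT_FLAGS.items() if value & bit]
    # Any bit above the known ones is reported instead of silently ignored.
    unknown = value & ~sum(LOCAL_CONTEXT_FLAGS)
    if unknown:
        names.append(f"UNKNOWN_BITS({unknown})")
    return names


if __name__ == "__main__":
    # A subread with adapters on both sides, read in the forward direction.
    print(19, explain_cx(19))
```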

rundata2tgz.sh

The bash file rundata2tgz.sh creates a tar-gz archive from a local folder generated by a Sequel run on the storage share. The script creates an md5sum file and verifies that the checksum is valid. This script should be run for each subfolder present in a run folder (e.g. 1_A01, 2_B01, ...).

# Usage: rundata2tgz.sh
# script version 1.1.1, 2017_09_20
## input files
# [required: -i <run-folder> (name of the run folder containing the flow-cell folder)]
# [-f <flowcell name (default <1_A01> for a single-cell run)>]
# [-o <output folder (default: $GCDATA)>]
# [-l <show the list of runs currently present on the server>]
# [-h for this help]
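The archive-and-verify step can be sketched with the Python standard library. This is an illustration under assumptions, not the script itself: the archive is named `<run>_<flowcell>.tgz` and the checksum file uses the `md5sum` text format (`<hash>  <file name>`), so it could also be checked with `md5sum -c`. The function names are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_flowcell(run_folder: Path, flowcell: str, out_dir: Path) -> Path:
    """Create <out_dir>/<run>_<flowcell>.tgz plus an md5sum-style checksum file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tgz = out_dir / f"{run_folder.name}_{flowcell}.tgz"
    with tarfile.open(tgz, "w:gz") as tar:
        # Store paths as <run>/<flowcell>/... inside the archive.
        tar.add(run_folder / flowcell, arcname=f"{run_folder.name}/{flowcell}")
    md5_file = tgz.with_name(tgz.name + ".md5")
    md5_file.write_text(f"{md5_of(tgz)}  {tgz.name}\n")
    return tgz


def verify(tgz: Path) -> bool:
    """Re-read the archive and compare it with the recorded checksum."""
    recorded = tgz.with_name(tgz.name + ".md5").read_text().split()[0]
    return recorded == md5_of(tgz)
```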

jobdata2tgz.sh

The bash file jobdata2tgz.sh creates a tar-gz archive from a job folder generated by a SMRT Link run on the storage share. The script creates an md5sum file and verifies that the checksum is valid. Note: .las files are excluded from the archive.

# Usage: jobdata2tgz.sh
# script version 1.0, 2018_04_13
## input files
# [required: -i <job-folder> (name of the run folder containing the SMRTLink job)]
# [-o <output folder ($NCDATA|$GCDATA; default: $GCDATA)>]
# [-S <JOB data root (default: $SMRT_DATA/000)>]
# [-l <show the list of jobs currently present on the server>]
# [-h for this help]
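Excluding `.las` files while archiving can be sketched with the `filter` callback of Python's `tarfile.TarFile.add`. This is one possible way to do the exclusion, not the script's code; the helper names are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import Optional


def skip_las(member: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """tarfile filter: returning None drops the member from the archive."""
    return None if member.name.endswith(".las") else member


def archive_job(job_dir: Path, out_dir: Path) -> Path:
    """Archive a SMRT Link job folder as <job>.tgz without its .las files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tgz = out_dir / f"{job_dir.name}.tgz"
    with tarfile.open(tgz, "w:gz") as tar:
        tar.add(job_dir, arcname=job_dir.name, filter=skip_las)
    return tgz
```

The filter is applied to every member as the directory is walked, so the exclusion also covers `.las` files nested in task subfolders.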

smrtlink_init.sh

The bash file smrtlink_init.sh creates a launcher for the SMRT Link service (not tested).

# please use at your own risk
# info on how to set this can be found on the web

pbvcf2vcf4.pl

The Perl script pbvcf2vcf4.pl creates a VCF version 4.x copy of the SMRT VCF 3.3 file. The original format does not comply with the VCF standard, and the original GFF output does not help. The code requires the reference assembly file and its faidx index to add contig lines to the output and to extract the sequence at INS positions. The current code only supports haploid calls. This code is experimental and unfinished.

# !!! this code is currently only valid for haploid calls
usage: pbvcf2vcf4.pl <pacbio_vcf3.3.vcf> <indexed-fasta-reference>
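The two uses of the faidx index mentioned above, writing `##contig` header lines and fetching reference bases (for example at insertion positions), can be sketched in Python. The `.fai` column layout (name, length, byte offset, bases per line, bytes per line) is the standard samtools faidx format; the function names are hypothetical and this is not the Perl script's code.

```python
from pathlib import Path


def read_fai(fai_path: Path) -> dict:
    """Parse a .fai index into {name: (length, offset, line_bases, line_width)}."""
    index = {}
    for line in fai_path.read_text().splitlines():
        name, length, offset, line_bases, line_width = line.split("\t")[:5]
        index[name] = (int(length), int(offset), int(line_bases), int(line_width))
    return index


def contig_header_lines(index: dict) -> list:
    """Build VCF 4.x ##contig header lines from the index."""
    return [f"##contig=<ID={name},length={v[0]}>" for name, v in index.items()]


def fetch(fasta_path: Path, index: dict, contig: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) using the .fai byte offsets."""
    length, offset, line_bases, line_width = index[contig]
    end = min(end, length)

    def byte_pos(pos0: int) -> int:
        # Skip whole lines (including their newline) then move within the line.
        return offset + (pos0 // line_bases) * line_width + pos0 % line_bases

    with fasta_path.open("rb") as fh:
        fh.seek(byte_pos(start - 1))
        raw = fh.read(byte_pos(end - 1) - byte_pos(start - 1) + 1)
    return raw.decode().replace("\n", "").replace("\r", "")
```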

bam-utils

pb2polymerase.sh

The shell wrapper pb2polymerase.sh recreates polymerase reads from scraps and subreads using PacBio bam2bam. It also reports polymerase read lengths for plotting in R.

Usage: pb2polymerase.sh <name.scraps.bam> <threads|8>
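Because subreads and scraps together tile the full polymerase read of a ZMW, a rough per-ZMW polymerase length is the span from the smallest start to the largest end coordinate among all its reads. The sketch below assumes read names of the form `movie/holeNumber/qStart_qEnd`; it is an approximation for illustration, not what bam2bam does internally, and the function name is hypothetical.

```python
from collections import defaultdict


def polymerase_lengths(read_names):
    """Return {movie/holeNumber: polymerase length} from subread and scrap names."""
    spans = defaultdict(lambda: [float("inf"), 0])  # ZMW -> [min start, max end]
    for name in read_names:
        movie, hole, coords = name.split("/")
        start, end = (int(x) for x in coords.split("_"))
        span = spans[f"{movie}/{hole}"]
        span[0] = min(span[0], start)
        span[1] = max(span[1], end)
    return {zmw: end - start for zmw, (start, end) in spans.items()}
```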

SEQUELstats4one.sh

The shell wrapper SEQUELstats4one.sh applies code from the Wellcome Sanger SEQUELstats repository to a single SMRT Cell dataset (thereby avoiding issues when bsub is not installed; see VertebrateResequencing/SEQUELstats#1).

Usage: SEQUELstats4one.sh <path to the Sequel BAM data>

sequel_read_lengths.R

The R script sequel_read_lengths.R reports the subread and scrap read-length distributions from a Sequel SMRT Cell folder. It also plots polymerase read lengths when these have been pre-computed using pb2polymerase.sh.

Usage: sequel_read_lengths.R <path to the Sequel run data>
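The script itself is written in R; the snippet below only illustrates, in Python, the kind of summary a read-length distribution is usually reduced to. N50 is included as a common example and is an assumption here, not necessarily one of the statistics the script reports. The function names are hypothetical.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths):
    """Basic read-length summary (expects a non-empty list)."""
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }
```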

bam_size-filter.pl

The Perl script bam_size-filter.pl filters BAM records by length and saves the length information (and optionally the BAM data) to file(s).

Aim: Filter a BAM file by read length
#  print filtered read lengths to file
#  (also output kept reads to BAM if -b is set)
## Usage: bam_size-filter.pl <-i bam-file>
# optional <-m minsize>
# optional <-x maxsize>
# optional <-b to also create a BAM output (default only text file of lengths)>
# <-h to display this help>
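The filtering rule can be sketched as follows, assuming inclusive bounds and that an unset minimum or maximum means no limit (the script's exact boundary handling is not documented here). Reads are given as hypothetical `(name, sequence)` pairs instead of BAM records to keep the example self-contained.

```python
from typing import Iterable, Iterator, Optional, Tuple


def size_filter(
    reads: Iterable[Tuple[str, str]],
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield reads whose length lies within [min_size, max_size]."""
    for name, seq in reads:
        n = len(seq)
        if min_size is not None and n < min_size:
            continue
        if max_size is not None and n > max_size:
            continue
        yield name, seq


def write_lengths(reads, path):
    """Write one 'name<TAB>length' line per kept read."""
    with open(path, "w") as out:
        for name, seq in reads:
            out.write(f"{name}\t{len(seq)}\n")
```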

bam2sizedist.sh

The bash file bam2sizedist.sh extracts the molecule ID, read length, barcode information, and polymerase coordinates from a BAM file and saves the results to a tab-separated text table (TSV) for statistics in R.

# provide a bam file to be parsed!
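One way to obtain these fields is to parse the text output of `samtools view`. The sketch below assumes subread names of the form `movie/holeNumber/qStart_qEnd` and the PacBio `bc` tag (two barcode indices stored as `bc:B:S,<fwd>,<rev>`); the column names of the output table are hypothetical and may differ from those written by bam2sizedist.sh.

```python
import csv
import sys

# Hypothetical column names for the TSV output.
COLUMNS = ["zmw", "length", "bc_fwd", "bc_rev", "qstart", "qend"]


def parse_sam_line(line: str) -> dict:
    """Extract ZMW, read length, barcodes and polymerase coordinates from one SAM line."""
    fields = line.rstrip("\n").split("\t")
    movie, hole, coords = fields[0].split("/")  # assumes subread-style names
    qstart, qend = coords.split("_")
    row = {"zmw": f"{movie}/{hole}", "length": len(fields[9]),
           "bc_fwd": "NA", "bc_rev": "NA", "qstart": int(qstart), "qend": int(qend)}
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("bc:B:S,"):
            fwd, rev = tag[len("bc:B:S,"):].split(",")
            row["bc_fwd"], row["bc_rev"] = int(fwd), int(rev)
    return row


def write_tsv(lines, out=sys.stdout):
    """Write a header plus one TSV row per SAM record, skipping @ header lines."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if not line.startswith("@"):
            writer.writerow(parse_sam_line(line))
```

For example, `write_tsv(sys.stdin)` could be fed by `samtools view <file.bam>` through a pipe.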

bam_size-filter.sh

The Perl file bam_size-filter.pl filters BAM records by minimum and maximum length. It outputs all filtered lengths to a file for statistics and can optionally create a BAM output.

Aim: Filter a BAM file by read length
#  print filtered read lengths to file
#  (also output kept reads to BAM if -b is set)
## Usage: bam_size-filter.pl <-i bam-file>
# optional <-m minsize>
# optional <-x maxsize>
# optional <-b to also create a BAM output (default only text file of lengths)>
# <-h to display this help>

general-tools

arrow_polish_asm.sh

The helper bash script arrow_polish_asm.sh maps Sequel reads to a draft FASTA assembly and uses the mapped reads to correct base-call errors, producing a polished version of the assembly.

# Usage: arrow_polish_asm.sh -a <fasta assembly> -b <sequel reads (bam)> 
# [optional: -p <smrt_bin path> (suggested: /opt/pacbio/smrtlink/smrtcmds/bin)]
# [optional: -o <result folder>]
# [optional: -t <available threads|1>]
# [optional: -h <this help text>]
# script version 1.0, 2017_12_13
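The map-then-polish workflow can be sketched as a dry-run command builder in Python. The tools and options shown (`pbalign --nproc`, `variantCaller --algorithm=arrow -j -r -o`) are an assumption based on the pbalign and GenomicConsensus documentation of the SMRT Link 5 era, not a copy of what the script runs; check them against `--help` for your installed SMRT Link version. Output file names and the function name are hypothetical.

```python
import shlex
from pathlib import Path


def arrow_commands(assembly: str, reads_bam: str, smrt_bin: str = "",
                   out_dir: str = "arrow_polish", threads: int = 1) -> list:
    """Return the mapping and polishing commands as argument lists (dry run)."""
    def tool(name: str) -> str:
        # Use the SMRT Link binary folder when given, else rely on $PATH.
        return str(Path(smrt_bin) / name) if smrt_bin else name

    aligned = str(Path(out_dir) / "aligned_reads.bam")
    polished = str(Path(out_dir) / "polished_assembly.fasta")
    return [
        # 1. map the Sequel reads to the draft assembly
        [tool("pbalign"), "--nproc", str(threads), reads_bam, assembly, aligned],
        # 2. compute an Arrow consensus from the mapped reads
        [tool("variantCaller"), "--algorithm=arrow", "-j", str(threads),
         "-r", assembly, "-o", polished, aligned],
    ]


if __name__ == "__main__":
    for cmd in arrow_commands("draft.fasta", "movie.subreads.bam",
                              "/opt/pacbio/smrtlink/smrtcmds/bin", "polish", 8):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with spaces safe when the commands are later passed to `subprocess.run`.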

pb_STARlong.sh

The helper bash script pb_STARlong.sh runs a preconfigured STARlong command on PacBio reads (FASTA). The arguments used in this script were reproduced from the dedicated GitHub page https://github.com/PacificBiosciences/cDNA_primer/wiki/Bioinfx-study:-Optimizing-STAR-aligner-for-Iso-Seq-data and can be amended when necessary.

# Usage: pb_STARlong.sh 
# -q <query sequences (reads)> 
# -d <STAR_database-folder>
# optional -t <threads> (default 8)
# script version 1.0, 2017_03_03
# [-h for this help]
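The wrapper's argument handling can be sketched with Python's `argparse`, mirroring the usage above (-q, -d, and -t with a default of 8). Only the core STAR options `--runThreadN`, `--genomeDir` and `--readFilesIn` are shown; the long-read tuning options from the PacBio wiki page are left as a placeholder list rather than reproduced from memory. The names `build_command` and `TUNED_OPTIONS` are hypothetical.

```python
import argparse
import shlex

# Placeholder: paste the tuned STARlong options from the PacBio wiki page here.
TUNED_OPTIONS: list = []


def build_command(argv=None) -> list:
    """Parse wrapper arguments and return the STARlong command as a list."""
    parser = argparse.ArgumentParser(prog="pb_STARlong")
    parser.add_argument("-q", required=True, help="query sequences (reads, FASTA)")
    parser.add_argument("-d", required=True, help="STAR database folder")
    parser.add_argument("-t", type=int, default=8, help="threads (default 8)")
    args = parser.parse_args(argv)
    return ["STARlong", "--runThreadN", str(args.t), "--genomeDir", args.d,
            "--readFilesIn", args.q, *TUNED_OPTIONS]


if __name__ == "__main__":
    # Example invocation with hypothetical file names.
    print(shlex.join(build_command(["-q", "reads.fasta", "-d", "star_index"])))
```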

Please send comments and feedback to nucleomics.bioinformatics@vib.be

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
