pdimens/bio-bin

Bio-bin, the genomic toolbox

A place to store custom and forked scripts used for genomic analysis; the list grows slowly as things come up.

allmaps_split_chimera.sh (Bash)

A reusable script that wraps the steps provided by ALLMAPS to identify and split chimeric contigs.

bampurge.sh (Bash)

Sorts and indexes a BAM file and removes unmapped reads. Provide the number of threads as the second argument to run multithreaded.
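
The steps described above might look roughly like the following. This is a minimal sketch and not the contents of bampurge.sh: the function name `bampurge_sketch` and the output file name are hypothetical, and it assumes `samtools` is on your `$PATH`.

```shell
# Hypothetical sketch: drop unmapped reads, sort, then index.
# $1 = input BAM, $2 = number of threads (optional, defaults to 1).
bampurge_sketch() {
    local bam="$1" threads="${2:-1}"
    # Assumed output name; the real script may name its output differently.
    local out="${bam%.bam}.mapped.sorted.bam"

    # -F 4 excludes records flagged as unmapped; -b writes BAM.
    samtools view -b -F 4 -@ "$threads" "$bam" |
        samtools sort -@ "$threads" -o "$out" - &&
        samtools index "$out"
}
```

Usage under these assumptions: `bampurge_sketch sample.bam 4`.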

configure_blasr_install (Bash)

It took me forever to get blasr/sparc installed and running correctly for hybrid genome assemblies, and after finally getting it to work, I vowed never to deal with it again. This script makes the necessary tweaks to get sparc_split_and_run.sh working correctly and runnable from your $PATH. Deprecated since adding PRs to the DBG2OLC repo.

CoverageCutoff.jl (Julia)

Simple isolation of contigs below a specified sequence coverage threshold. Typically used on the genome.file output from dDocent's FreeBayes step when FreeBayes crashes from memory load because the de novo assembly has too many contigs. The output is usually fed into faSomeRecords to "prune" low-coverage contigs from the de novo assembly.
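
The thresholding idea can be sketched in a few lines of shell. This is an illustration in Bash/awk rather than the Julia script itself, and it assumes the input is a whitespace-delimited file with the contig name in column 1 and its coverage in column 2; the function name `below_cutoff` is hypothetical.

```shell
# Hypothetical sketch: print the names of contigs whose coverage is
# below a cutoff.
# $1 = two-column file (contig name, coverage), $2 = coverage cutoff.
below_cutoff() {
    local file="$1" cutoff="$2"
    # "+ 0" forces a numeric comparison in awk.
    awk -v c="$cutoff" '$2 + 0 < c + 0 { print $1 }' "$file"
}
```

Usage under these assumptions: `below_cutoff genome.file 10 > low_coverage_contigs.txt`. The resulting list of names is the kind of file a tool such as faSomeRecords takes as input.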

countbam (Bash)

Simple wrapper for SAMtools that counts the total number of reads and the number of mapped reads in BAM files.
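
A minimal sketch of such a wrapper, assuming `samtools` is on your `$PATH`. The function name `countbam_sketch` and the tab-separated output layout are hypothetical and may differ from what countbam prints.

```shell
# Hypothetical sketch: for each BAM file, print its name, the total
# record count, and the count of records not flagged as unmapped.
countbam_sketch() {
    local bam total mapped
    for bam in "$@"; do
        # -c counts records instead of printing them.
        total=$(samtools view -c "$bam")
        # -F 4 excludes records with the "unmapped" flag set.
        mapped=$(samtools view -c -F 4 "$bam")
        printf '%s\t%s\t%s\n' "$bam" "$total" "$mapped"
    done
}
```

Note that `samtools view -c` counts alignment records, so secondary and supplementary alignments are included unless they are filtered out as well.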

CountMatch.jl (Julia)

Takes an input file of strings (like 6 bp indices) and does an all-vs-all match to count the number of mismatches between the indices. Outputs an HTML heatmap and a text file of the pairwise comparisons.
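
The all-vs-all comparison can be illustrated in Bash (4 or newer). This is a sketch of the counting step only, not the Julia script and not the heatmap; it assumes one index per line and that all indices have the same length. The function names `mismatches` and `all_vs_all` are hypothetical.

```shell
# Hypothetical sketch: count positions at which two equal-length
# strings differ.
mismatches() {
    local a="$1" b="$2" n=0 i
    for (( i = 0; i < ${#a}; i++ )); do
        if [[ "${a:i:1}" != "${b:i:1}" ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# Print every unordered pair of indices from a file (one index per
# line) with its mismatch count, tab-separated.
all_vs_all() {
    local -a idx
    local i j
    mapfile -t idx < "$1"
    for (( i = 0; i < ${#idx[@]}; i++ )); do
        for (( j = i + 1; j < ${#idx[@]}; j++ )); do
            printf '%s\t%s\t%s\n' "${idx[i]}" "${idx[j]}" \
                "$(mismatches "${idx[i]}" "${idx[j]}")"
        done
    done
}
```

Usage under these assumptions: `all_vs_all indices.txt > pairwise_mismatches.tsv`.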

estimateGenomeSize (Bash)

Iteratively performs the first steps of the Jellyfish k-mer counting method.

exportenv | condadeps (Bash)

For those times you forget the command to export (and strip the prefix from) your current conda environment to a YAML file. Use condadeps to list only the manually (explicitly) installed programs.
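
For reference, the underlying commands are probably close to the following. This is a sketch under assumptions, not the scripts themselves: the function names and the default output file name are hypothetical, and `conda` is assumed to be on your `$PATH` with the target environment active.

```shell
# Hypothetical sketch: export the active environment to a YAML file,
# dropping the machine-specific "prefix:" line.
# $1 = output file (optional, defaults to environment.yml).
exportenv_sketch() {
    local out="${1:-environment.yml}"
    conda env export | grep -v '^prefix:' > "$out"
}

# Hypothetical sketch: show only the packages that were explicitly
# requested, using conda's --from-history option.
condadeps_sketch() {
    conda env export --from-history
}
```

Usage under these assumptions: `exportenv_sketch myenv.yml`.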

FastStructureK.sh (Bash)

A convenience wrapper to perform fastStructure analyses for K values from 1 to k, then summarize all the marginal likelihoods into a single file.

punzip (Bash)

Parallelized unzipping of .gz files from one directory into another. Can do an entire directory, or only files containing something specific in their name, such as lobster, _R1_, britneyspears, etc.
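
One way to get this behaviour is `find` piped into `xargs -P`. The sketch below is an illustration, not the punzip script: the function name, argument order, and default job count are hypothetical, and it assumes GNU-style `find` and `xargs`.

```shell
# Hypothetical sketch: decompress .gz files from one directory into
# another, several at a time.
# $1 = source dir, $2 = destination dir,
# $3 = optional substring the file name must contain,
# $4 = optional number of parallel jobs (defaults to 4).
punzip_sketch() {
    local src="$1" dest="$2" pattern="${3:-}" jobs="${4:-4}"
    mkdir -p "$dest"
    # -print0 / -0 keep file names with spaces intact;
    # gzip -dc leaves the original .gz files untouched.
    find "$src" -maxdepth 1 -type f -name "*${pattern}*.gz" -print0 |
        xargs -0 -r -P "$jobs" -I{} \
            sh -c 'gzip -dc "$1" > "$2/$(basename "$1" .gz)"' _ {} "$dest"
}
```

Usage under these assumptions: `punzip_sketch raw_reads unzipped _R1_ 8` decompresses only files with `_R1_` in their name, using 8 parallel jobs.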

revcomp (Bash)

Returns the reverse, complement, or reverse-complement of DNA bases in a text file.
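
The three operations map onto `rev` and `tr`. A minimal sketch, assuming plain A/C/G/T sequences (upper or lower case, no IUPAC ambiguity codes); the function name and mode keywords are hypothetical and are not necessarily the interface of revcomp.

```shell
# Hypothetical sketch: reverse, complement, or reverse-complement a
# DNA string.
# $1 = mode (rev | comp | revcomp), $2 = sequence.
revcomp_sketch() {
    local mode="$1" seq="$2"
    case "$mode" in
        rev)     printf '%s\n' "$seq" | rev ;;
        comp)    printf '%s\n' "$seq" | tr 'ACGTacgt' 'TGCAtgca' ;;
        revcomp) printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca' ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Usage under these assumptions: `revcomp_sketch revcomp AACG` prints `CGTT`.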

unpac (Bash)

Converts PacBio sequences from BAM to FASTA/FASTQ. A wrapper for bam2fastx.