Scripts and data regarding Vampyrellida genomics.
Updated May 10, 2024 - Shell
Genomics workflows for CPG using Hail Batch
RawHash is the first mechanism that can accurately and efficiently map raw nanopore signals to large reference genomes (e.g., a human reference genome) in real time without using powerful computational resources (e.g., GPUs). Described by Firtina et al. (published at https://academic.oup.com/bioinformatics/article/39/Supplement_1/i297/7210440).
Earl Grey: A fully automated TE curation and annotation pipeline
Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
Randomly subsample sequencing reads
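One common way to subsample reads without loading the whole file into memory is reservoir sampling. The sketch below is an illustration only and is not taken from the listed repository: the function names `read_fastq` and `subsample`, the 4-line FASTQ layout, and the `seed` parameter are all assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record type: (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group lines into 4-line FASTQ records.

    Assumes strictly 4 lines per record; a trailing incomplete
    record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times yields consecutive groups of 4.
    for record in zip(stripped, stripped, stripped, stripped):
        yield record


def subsample(records: Iterable[FastqRecord], k: int,
              seed: Optional[int] = None) -> List[FastqRecord]:
    """Return up to k records chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    demo = []
    for n in range(10):
        demo += [f"@read{n}\n", "ACGT\n", "+\n", "IIII\n"]
    for rec in subsample(read_fastq(demo), k=3, seed=1):
        print(rec[0])
```

Passing a fixed `seed` makes the selection reproducible, which matters when paired-end files need to be subsampled in step.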
GRAMEP - Genome vaRiation Analysis from the Maximum Entropy Principle
NanoRepeat: fast and accurate analysis of Short Tandem Repeats (STRs) from Oxford Nanopore sequencing data
awk simulators for PacBio HiFi assembly that read from the assembly graphs. Easy-to-use awk for the coverage and length files. To be updated later with the complete awk functionality for compilation into the direct kernels.
Genome sorting and plotting of alignment lengths from lastz alignments for the node calculations, to be run right before you import them for the calculations.
A combination of gawk and basic awk that provides a function to estimate the aligned genome fraction from PAF alignments of long reads to the genome. It includes multiple functions that also allow filtering alignments by quality before estimating the genome length coverage.
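The idea of an aligned genome fraction with quality filtering can be sketched as follows. This is an assumption-laden illustration in Python, not the repository's awk code: the function name `aligned_fraction`, the `min_mapq` threshold, and the optional `genome_length` argument are hypothetical. It relies only on the standard PAF column layout (target name, length, start, and end in columns 6 to 9, mapping quality in column 12, with 0-based half-open coordinates).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def aligned_fraction(paf_lines: Iterable[str], min_mapq: int = 0,
                     genome_length: Optional[int] = None) -> float:
    """Fraction of the target genome covered by alignments with mapq >= min_mapq.

    If genome_length is not given, the total is the summed length of the
    targets that appear in the PAF, so sequences with no alignments at all
    are not counted (an assumption of this sketch).
    """
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    target_len: Dict[str, int] = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        target = fields[5]
        target_len[target] = int(fields[6])
        if int(fields[11]) >= min_mapq:
            intervals[target].append((int(fields[7]), int(fields[8])))

    covered = 0
    for spans in intervals.values():
        # Merge overlapping intervals so shared bases are counted once.
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    total = genome_length if genome_length is not None else sum(target_len.values())
    return covered / total if total else 0.0


if __name__ == "__main__":
    demo = [
        "r1\t100\t0\t100\t+\tchr1\t1000\t0\t100\t100\t100\t60",
        "r2\t100\t0\t100\t+\tchr1\t1000\t50\t150\t100\t100\t60",
        "r3\t100\t0\t100\t-\tchr1\t1000\t500\t600\t90\t100\t5",
    ]
    print(aligned_fraction(demo))
    print(aligned_fraction(demo, min_mapq=20))
```

Merging intervals before summing avoids double-counting regions covered by several reads, which would otherwise push the fraction above 1 at high coverage.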
A PacBio HiFi read check for a quick view of the read types, making FASTA manipulation and read extraction easy. You can also spawn this in Rust.
A reference genome estimation based on peak size calibration: given the peak size calibration values, it estimates the genome and sub-genome fractions.
A function to calculate the genome annotation for microbiomes and other genomes. It takes a genome annotation or a text file and prepares the counts, also for gene ontology analysis.
Genome size estimation from Illumina reads, for pre-screening purposes only; also includes an R function.
An awk library package for numerical read analysis. Parses long reads, PacBio HiFi reads, and the corresponding alignments.
A Python package for working with TAIR and Phytozome data and conversions, plus an annotation and coordinates checker.
Plotting tools for mRNAs, from the proteome to the genome annotation. Produces tab-delimited files with the start and stop of the mRNAs.
A coding plotter for protein annotations that come from annotating the genome with protein hints; it extracts and plots the specific length estimates.
Extracts all intergenic regions from the genome annotation using the protein alignments.
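Intergenic extraction amounts to taking the complement of the annotated gene intervals on each sequence. The following minimal sketch assumes the gene coordinates (for example, derived from protein alignments) are already available as 0-based half-open `(start, end)` pairs; the function name `intergenic_regions` and its arguments are hypothetical and not taken from the listed repository.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); an assumption


def intergenic_regions(genes: Iterable[Interval], seq_length: int) -> List[Interval]:
    """Return the gaps between gene intervals on one sequence.

    Overlapping or nested genes are handled by tracking the furthest
    gene end seen so far. The flanks before the first gene and after
    the last gene are reported too; drop them if only regions strictly
    between genes are wanted.
    """
    regions: List[Interval] = []
    cursor = 0  # end of the rightmost gene processed so far
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < seq_length:
        regions.append((cursor, seq_length))
    return regions


if __name__ == "__main__":
    genes = [(50, 60), (10, 20), (15, 30)]
    for start, end in intergenic_regions(genes, seq_length=100):
        print(start, end, sep="\t")
```

Sorting first and sweeping with a single cursor keeps the work at O(n log n) and avoids a separate interval-merging pass.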