
Hormiphora californensis genome annotation and supplemental materials related to the genome assembly.


conchoecia/hormiphora


hormiphora


This repo contains the code necessary to generate the annotation for the Hormiphora californensis genome, the annotation as releases, and the code and results from the H. californensis genome assembly paper.

This repo is maintained by Darrin T. Schultz and Warren R. Francis.

Directory

The assembly - Hcv1

Download the gzipped genome assembly fasta file here.

To download the latest annotation, navigate to the releases page and download the Hc[version]_release.tar.gz file. Release names are dot-delimited and combine the assembly version with the annotation version, like so:

Hcv1.av93
 Hcv1 = Hormiphora californensis genome assembly version 1
 av93 = annotation version 93
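The naming scheme above can be split programmatically. The following is a minimal sketch, assuming every release name follows the same pattern as the single example `Hcv1.av93`; the function name `parse_release_name` and the `ReleaseName` type are hypothetical and are not part of this repo.

```python
import re
from typing import NamedTuple


class ReleaseName(NamedTuple):
    assembly_version: int    # the number after "Hcv"
    annotation_version: int  # the number after "av"


# Assumption: the pattern is inferred from the one documented example, "Hcv1.av93".
_RELEASE_RE = re.compile(r"^Hcv(\d+)\.av(\d+)$")


def parse_release_name(name: str) -> ReleaseName:
    """Split a dot-delimited release name into its two version numbers."""
    match = _RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized release name: {name!r}")
    return ReleaseName(int(match.group(1)), int(match.group(2)))


print(parse_release_name("Hcv1.av93"))
# ReleaseName(assembly_version=1, annotation_version=93)
```

Sorting parsed names as tuples orders releases by assembly version first and annotation version second.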
  • The releases contain specific documentation, but briefly, each release contains three key files:
    • Model proteins. Use these for protein analyses; do not translate proteins from CDS sequences generated from the GFF file and assembly file.
    • Transcripts. These may contain prematurely truncated CDS sequences, so use the model proteins file above for the final protein sequences.
    • GFF of the transcripts.

The annotation directory in the repo contains the files necessary to generate the current annotation version. The annotation can be reconstructed by running snakemake in that directory.

  • The repo also contains directories with supplementary files for the following analyses:
    • heterozygosity contains files generated while calculating the heterozygosity of the H. californensis genome.
    • intergenic_antisense contains a Snakefile and config file used to investigate nested intronic genes, along with the output data for Hormiphora.
    • centromere_plots contains an annotation of the repeats present in the genome, a Python script that plots repeat frequency against genomic coordinate to look for repeat-rich regions, and the resulting plots.
    • pictures
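To illustrate the idea behind a repeat frequency vs coordinate plot, here is a minimal sketch that bins repeat intervals into fixed-size windows. It is not the script in centromere_plots: the function name, the `(scaffold, start, end)` input format, and the window size are all assumptions made for illustration.

```python
from collections import defaultdict


def repeat_fraction_per_window(repeats, window_size):
    """Return {scaffold: {window_index: fraction of the window covered by repeats}}.

    repeats: iterable of (scaffold, start, end), 0-based half-open coordinates.
    Assumptions: intervals do not overlap (overlaps would be double-counted),
    and the last window of a scaffold is treated as full length.
    """
    covered = defaultdict(lambda: defaultdict(int))
    for scaffold, start, end in repeats:
        pos = start
        while pos < end:
            window = pos // window_size
            # Only count the part of the repeat that falls inside this window.
            chunk_end = min(end, (window + 1) * window_size)
            covered[scaffold][window] += chunk_end - pos
            pos = chunk_end
    return {
        scaffold: {w: bases / window_size for w, bases in windows.items()}
        for scaffold, windows in covered.items()
    }


# Toy input: two repeats on one hypothetical scaffold, 100 bp windows.
toy_repeats = [("scaf1", 0, 50), ("scaf1", 90, 130)]
print(repeat_fraction_per_window(toy_repeats, window_size=100))
```

Plotting the window index (times the window size) on the x-axis against the fraction on the y-axis gives a per-scaffold profile in which repeat-rich regions appear as peaks.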

This directory contains the topologically associating domains (TADs) for H. californensis.

This directory contains the whole-genome assembly converted into haplotype-specific FASTA files using the phased VCF files.
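As a rough illustration of how phased variants turn one reference sequence into two haplotype sequences, here is a minimal sketch. It is not the method used to build the files in this repo: it handles only biallelic SNPs, takes variants as plain tuples rather than reading a VCF file, and the function name is hypothetical.

```python
def apply_phased_snps(reference, variants):
    """Return (haplotype_1, haplotype_2) for one reference sequence.

    variants: iterable of (pos, ref, alt, genotype), where pos is 1-based as in
    VCF and genotype is a phased string such as "0|1".
    Assumptions: only single-base substitutions are applied; unphased genotypes
    (written with "/") and indels are skipped.
    """
    haplotypes = [list(reference), list(reference)]
    for pos, ref, alt, genotype in variants:
        if "|" not in genotype:
            continue  # unphased call: leave both haplotypes as reference
        if len(ref) != 1 or len(alt) != 1:
            continue  # not a SNP: out of scope for this sketch
        index = pos - 1
        if reference[index] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        # The allele before "|" belongs to haplotype 1, the one after to haplotype 2.
        for haplotype, allele in zip(haplotypes, genotype.split("|")):
            if allele == "1":
                haplotype[index] = alt
    return "".join(haplotypes[0]), "".join(haplotypes[1])


toy_variants = [
    (2, "C", "T", "0|1"),  # alternate allele on haplotype 2 only
    (5, "A", "G", "1|0"),  # alternate allele on haplotype 1 only
    (7, "G", "C", "1|1"),  # alternate allele on both haplotypes
    (8, "T", "A", "0/1"),  # unphased, skipped
]
print(apply_phased_snps("ACGTACGT", toy_variants))
# ('ACGTGCCT', 'ATGTACCT')
```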