Manage File Assets for Batch Processing of a Genome #35

Open
josiahseaman opened this issue Nov 3, 2016 · 0 comments
josiahseaman commented Nov 3, 2016

In order to complete #10 Galleries of whole genomes, with Annotations #21, we need the ability to build up file assets once and reuse them instead of recomputing them for every job. In particular, this turned out to be onerous when Annotation FASTA files were created multiple times for each chromosome. Translocations caused one chromosome to need another chromosome's Annotation FASTA, which then created a lot of unnecessary work.

Faster Turn-Around batch

  • Whole Batch goes in a folder with subfolders for each viz
  • Read all contigs from memory, don’t use ungapped seq files
  • Place each Annotation FASTA in the main folder: These are Assets!
    • Calculate ungapped annotation seq and hold in memory
  • Gapped seq is computed, output, and dumped
  • Do all file generation first, then only compute viz at the end
  • Jobs are driven by asset generation
    • For each (asset_name, job) pair, check whether asset_name exists; if not, run the job
    • If asset_name exists, read it into memory and move on to the next job
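
The asset-driven loop described in the last two bullets could be sketched roughly as follows. This is only an illustration, not code from the project: `run_batch`, the `Job` type, and the idea that each job writes a text file to a given path are all assumptions, and it only handles text assets such as FASTA files.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a job receives the target path and writes the asset there.
Job = Callable[[Path], None]


def run_batch(jobs: List[Tuple[str, Job]], batch_dir: Path) -> Dict[str, str]:
    """Run each job only if its asset file is missing; return asset text by name."""
    batch_dir.mkdir(parents=True, exist_ok=True)
    assets: Dict[str, str] = {}
    for asset_name, job in jobs:
        path = batch_dir / asset_name
        if not path.exists():
            job(path)  # asset missing: compute it and write it to disk
        # Whether freshly built or already present, hold the asset in memory
        # so later jobs can use it without recomputing.
        assets[asset_name] = path.read_text()
    return assets
```

Running the same batch twice would then skip every job whose asset is already on disk, which is the case that matters when a translocation makes one chromosome ask for another chromosome's Annotation FASTA.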

Asset Order:

  • Folder Structure
  • Gapped FASTA (subfolder)
  • Annotation FASTA (main folder)
  • Gapped Annotation with differences (subfolder)
  • PNG Composite
  • DeepZoom and HTML
  • Gallery HTML
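
As a rough sketch of how this order might map onto the batch folder, the function below lists the assets for one visualization in the order given above. The function name `plan_assets` and every file name are made up for illustration. The main-folder versus subfolder placement follows the list where it is stated; where the list does not say (PNG Composite, DeepZoom and HTML, Gallery HTML), the placement is a guess.

```python
from pathlib import Path
from typing import List, Tuple


def plan_assets(batch_dir: Path, viz_name: str) -> List[Tuple[str, Path]]:
    """Return (asset, path) pairs for one viz, in the order they should be built."""
    viz_dir = batch_dir / viz_name  # one subfolder per viz inside the batch folder
    return [
        ("folder structure", viz_dir),
        ("gapped FASTA", viz_dir / f"{viz_name}_gapped.fa"),               # subfolder
        ("annotation FASTA", batch_dir / f"{viz_name}_annotation.fa"),     # main folder
        ("gapped annotation", viz_dir / f"{viz_name}_annotation_gapped.fa"),  # subfolder
        ("PNG composite", viz_dir / f"{viz_name}.png"),       # placement assumed
        ("DeepZoom and HTML", viz_dir / "index.html"),        # placement assumed
        ("gallery HTML", batch_dir / "index.html"),           # placement assumed
    ]
```

Because the Annotation FASTA lives in the main folder, any other chromosome's job can find it by name without knowing which viz produced it.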