Aaron McKenna edited this page Jun 22, 2020 · 8 revisions

FlashFry requires the Java virtual machine (JVM) to run. This is available on almost every system imaginable these days, so it's probably already on your machine. We've tested it with both Oracle's Java and the open-source JVM (OpenJDK). Other requirements include:

  • Your reference genome as a fasta file
  • The region you'd like to score, as a fasta file
  • A computer, with the command line terminal open
  • Java 1.8

Once you have the requirements set up, there are three main steps to running FlashFry. See the command-line documentation page for specific runtime options.

  1. First, you build a database of the specified CRISPR motif against the target genome using the --analysis index option. This is done only once, as the database is reusable. You have to choose the enzyme type to use while indexing. As of this writing, the options include the Cas9s with 23 bp targets: SpCas9 (NAG or NGG), SpCas9NGG (NGG only), SpCas9NAG (NAG only), and Cpf1 (TTTN) with 24 bp targets. These are adjustable in the code, or you can create your own. While writing the database, temporary files are placed in the --tempLocation directory. These will take up a bit more space than the final database (maybe 10-20% more, depending on how duplicated the genome's targets are). Runtimes on a pretty slow drive look like (formatted hours:minutes:seconds):
| Genome / version | Cas9 (NGG) | Cas9 (NGG/NAG) | Cpf1 (TTTN) |
| --- | --- | --- | --- |
| Caenorhabditis elegans - 235 | 0:3:21 | 0:6:03 | 0:5:35 |
| Human - hg38 | 3:19:29 | 5:24:55 | 2:50:59 |
| Mouse - mm10 | 2:36:53 | 4:36:03 | 2:11:35 |
| Drosophila melanogaster - BDGP6 | 0:6:33 | 0:10:48 | 0:5:44 |
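A typical indexing run might look like the following sketch. The jar file name, reference path, and database name are placeholders; the flags are those described on this page, and the enzyme name is assumed to follow the spelling used by the --enzyme option.

```shell
# Build a reusable off-target database for SpCas9 (NGG) against a reference
# genome. FlashFry.jar, hg38.fa, and the database/temp paths below are
# illustrative placeholders -- substitute your own.
java -Xmx4g -jar FlashFry.jar \
  --analysis index \
  --reference hg38.fa \
  --database hg38_spcas9ngg_db \
  --tempLocation ./tmp \
  --enzyme spcas9ngg
```

The -Xmx4g flag raises the JVM heap limit; indexing a mammalian genome benefits from a few gigabytes of memory.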
  2. The next step is to find candidate targets within the fasta sequence of interest. The --analysis discover option handles this. The candidates found in the fasta are then run against the off-target database, and an annotated output file is produced. This output file is a tab-delimited text file.
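A discovery run over a region of interest might look like this sketch; the file names are placeholders, and the flags follow the pattern used elsewhere on this page.

```shell
# Scan a fasta file for candidate targets and annotate each one with its
# off-target hits from the database built in step 1. File names below are
# illustrative placeholders.
java -Xmx4g -jar FlashFry.jar \
  --analysis discover \
  --database hg38_spcas9ngg_db \
  --fasta my_region.fasta \
  --output my_region.output
```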

  3. Lastly, you can score this annotated output file. This is handled by the --analysis score module. We've implemented a fair number of scoring metrics. For guidance on which are appropriate in which situation, please see the wonderful paper by Maximilian Haeussler, which analyzed all of these methods in aggregate:

  • hsu2013 - The Hsu et al. method, also known as the crispr.mit.edu score: Pubmed
  • doench2014ontarget - Doench 2014 on-target efficiency score Pubmed
  • doench2016cfd - The Doench 2016 cutting frequency determination score Pubmed
  • moreno2015 - Moreno-Mateos and Vejnar's CRISPRscan on-target method Pubmed

We've also implemented some additional metrics that are useful in CRISPR library creation:

  • bedannotator - annotate your output targets with information from an associated BED file. It can find the 0-mismatch targets in the genome database and use those to infer the genomic location
  • dangerous - annotate targets that have dangerous sequence features, such as high or extremely low GC, polIII transcriptional terminators, or low entropy.
  • minot - add a column that indicates the minimum mismatches to any off-target hit.
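Several metrics can be applied in a single scoring pass. A hypothetical invocation might look like the following; the input/output names are placeholders, and the comma-separated metric list is an assumption about how multiple metrics are passed to --scoringMetrics.

```shell
# Score the annotated discover output with a mix of off-target, on-target,
# and annotation metrics. File names are illustrative placeholders.
java -Xmx4g -jar FlashFry.jar \
  --analysis score \
  --input my_region.output \
  --output my_region.scored \
  --database hg38_spcas9ngg_db \
  --scoringMetrics doench2016cfd,hsu2013,dangerous,minot
```

Each requested metric appends its own column(s) to the tab-delimited output, so the scored file can be filtered or sorted in a spreadsheet or with standard command-line tools.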