Testing

To test readfish on your configuration we recommend first running a playback experiment to test unblock speed and then selection.

Configuring bulk FAST5 file Playback

Download an open access bulk FAST5 file from here. This file is 21 GB, so make sure you have plenty of space.
To configure a run for playback, you need to find and edit a sequencing TOML file. These are typically located in /opt/ont/minknow/conf/package/sequencing. Edit a file such as sequencing_MIN106_DNA.toml and under the entry [custom_settings] add a field:
```
simulation = "/full/path/to/your_bulk.FAST5"
```
If running Guppy in GPU mode, change the parameter break_reads_after_seconds = 1.0 to break_reads_after_seconds = 0.4.
If using MinKNOW 4.0 or later, you need to reload scripts. To do so, click "start" to set up a new run, then choose "start sequencing". In the next window you will see an icon of three vertically stacked dots in the top right-hand corner. Click it and choose Reload Scripts. MinKNOW will now play back the bulk file rather than perform live sequencing.
Insert a configuration test flowcell into the sequencing device.
Start a sequencing run as you would normally, selecting the flow cell type that corresponds to the edited script (here FLO-MIN106).
The run should start and immediately begin a mux scan. Let it run for around fifteen minutes after which your read length histogram should look as below:
Now stop the run.
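
If playback does not start, one thing worth ruling out is the value given to simulation, which needs to be a full path to an existing file. The following is a minimal Python sketch of such a check; `simulation_path_problems` is a hypothetical helper written for illustration and is not part of readfish or MinKNOW.

```python
from pathlib import Path


def simulation_path_problems(value):
    """Return a list of problems found with a simulation path (empty if none)."""
    problems = []
    path = Path(value)
    # The configuration entry expects a full path, not a relative one.
    if not path.is_absolute():
        problems.append("not a full (absolute) path")
    # The bulk file must exist at that location.
    if not path.is_file():
        problems.append("file does not exist or is not a regular file")
    return problems
```

Calling `simulation_path_problems("/full/path/to/your_bulk.FAST5")` returns an empty list only when that file is present. The check runs as your own user, so it does not confirm that MinKNOW itself can read the file.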

Testing unblock response

Now we shall test unblocking by running readfish unblock-all which will simply eject every single read on the flow cell.

Start a new sequencing run as above.

Now start a readfish unblock-all run. To do this run:

readfish unblock-all --device <YOUR_DEVICE_ID> --experiment-name "Testing ReadFish Unblock All"

Leave the run for around 15 minutes and observe the read length histogram. If unblocks are happening correctly you will see something like the below:

A closeup of the unblock peak shows reads being unblocked quickly:

This compares with the control run:

If you are happy with the unblock response, move onto testing basecalling.

If you are not happy with the unblock response you can try adjusting the throttle. The throttle limits the rate at which unblock messages are sent to MinKNOW. By default we have set this to 0.4 seconds. In our experience, setting the throttle to the same value as the break_reads_after_seconds parameter can be helpful.

To change this run:

readfish unblock-all --device <YOUR_DEVICE_ID> --experiment-name "Testing ReadFish Unblock All" --throttle <YOUR_VALUE_HERE>
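
The effect of a throttle can be illustrated with a minimal Python sketch. This is not readfish's implementation: `send_throttled`, `send` and the batch structure are hypothetical names used only for illustration. It shows one way to make sure successive messages are sent at least `throttle` seconds apart.

```python
import time


def send_throttled(batches, send, throttle=0.4,
                   clock=time.monotonic, sleep=time.sleep):
    """Call send(batch) for each batch, spacing calls >= throttle seconds apart."""
    last_sent = None
    for batch in batches:
        if last_sent is not None:
            # Time still owed before the next message may go out.
            remaining = throttle - (clock() - last_sent)
            if remaining > 0:
                sleep(remaining)
        send(batch)
        last_sent = clock()
```

With a throttle of 0.4, a batch that is ready 0.1 seconds after the previous one waits a further 0.3 seconds, while a batch that takes longer than 0.4 seconds to prepare is sent immediately. The `clock` and `sleep` arguments exist only so the function can be exercised without real delays.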

Testing basecalling and mapping

To test selective sequencing you must have access to a guppy basecall server (>=4.0.4) and configure a TOML file. Here we provide an example TOML file.

First make a local copy of the example TOML file:

curl -O https://raw.githubusercontent.com/LooseLab/readfish/master/examples/human_chr_selection.toml

Modify the reference field in the file to be the full path to a minimap2 index of the human genome.
Modify the targets fields for each condition to reflect the naming convention used in your index. This is the sequence name only, up to but not including any whitespace. e.g. >chr1 human chromosome 1 would become chr1. If these names do not match, then target matching will fail.
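
The naming rule above can be sketched in Python. This assumes you have the header lines of the FASTA file from which the index was built and that your targets are plain sequence names; `sequence_name` and `missing_targets` are hypothetical helpers, not part of readfish.

```python
def sequence_name(header):
    """Return the name from a FASTA header: text after '>' up to the first whitespace."""
    text = header[1:] if header.startswith(">") else header
    fields = text.split()
    return fields[0] if fields else ""


def missing_targets(targets, fasta_headers):
    """Return the targets that do not match any sequence name in the headers."""
    names = {sequence_name(header) for header in fasta_headers}
    return [target for target in targets if target not in names]
```

For the header `>chr1 human chromosome 1`, `sequence_name` gives `chr1`. A target written as `1` or `Chr1` would be reported by `missing_targets`, which is the kind of mismatch that causes target matching to fail.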

We provide a JSON schema and a script for validating configuration files, which let you check whether the TOML file will drive an experiment as you expect:

readfish validate human_chr_selection.toml

Errors with the configuration will be written to the terminal along with a text description of the conditions for the experiment as below.

readfish validate examples/human_chr_selection.toml
😻 Looking good!
Generating experiment description - please be patient!
This experiment has 1 region on the flowcell

Using reference: /path/to/reference.mmi

Region 'select_chr_21_22' (control=False) has 2 targets of which 2 are
in the reference. Reads will be unblocked when classed as single_off
or multi_off; sequenced when classed as single_on or multi_on; and
polled for more data when classed as no_map or no_seq.

If your toml file validates then run the following command:

readfish targets --device <YOUR_DEVICE_ID> \
              --experiment-name "RU Test basecall and map" \
              --toml <PATH_TO_TOML> \
              --log-file ru_test.log

In the terminal window you should see messages reporting the speed of mapping of the form:
```
2020-02-24 16:45:35,677 ru.ru_gen 7R/0.03526s
2020-02-24 16:45:35,865 ru.ru_gen 3R/0.02302s
2020-02-24 16:45:35,965 ru.ru_gen 4R/0.02249s
```
Note: if these times are longer than 0.4 seconds (or your break_reads_after_seconds value) you may have performance issues.
In the MinKNOW messages interface you should see the experiment description as generated by the readfish validate command above.
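
To check a longer run against this limit, the mapping-speed messages can be scanned automatically. The Python sketch below assumes each message ends with a field of the form `<reads>R/<seconds>s`, as in the example output above; `slow_batches` is a hypothetical helper and not part of readfish.

```python
import re

# Matches the trailing "<reads>R/<seconds>s" field, e.g. "7R/0.03526s".
BATCH_RE = re.compile(r"(\d+)R/(\d+(?:\.\d+)?)s\s*$")


def slow_batches(lines, limit=0.4):
    """Return (reads, seconds) for every batch that took longer than limit seconds."""
    slow = []
    for line in lines:
        match = BATCH_RE.search(line)
        if match is None:
            continue  # not a mapping-speed message
        reads, seconds = int(match.group(1)), float(match.group(2))
        if seconds > limit:
            slow.append((reads, seconds))
    return slow
```

Set `limit` to your break_reads_after_seconds value. For the three example messages above the result is an empty list, since every batch finished in well under 0.4 seconds.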

Testing expected results from a selection experiment

The only way to test readfish on a playback run is to look at changes in read length for rejected vs accepted reads. To do this:

Start a fresh simulation run using the bulkfile provided above.

Restart the readfish command (as above):

readfish targets --device <YOUR_DEVICE_ID> \
              --experiment-name "RU Test basecall and map" \
              --toml <PATH_TO_TOML> \
              --log-file ru_test.log

Allow the run to proceed for at least 30 minutes (making sure you are writing out read data!).
After 30 minutes it should look something like this:

Zoomed in on the unblocks:

Run readfish summary to check whether your run has performed as expected. This command requires the path to your TOML file followed by the path to your FASTQ reads. Typical results are provided below and show longer mean read lengths for the two selected chromosomes (here chr21 and chr22). Note that the mean read lengths observed will depend on system performance. Optimal Guppy configuration for your system is left to the user.

contig  number      sum   min     max    std   mean  median    N50
  chr1    2045  8031506   220  318254  15566   3927    1476  26513
 chr10    1109  4723969   263  261207  14559   4260    1592  27313
 chr11    1232  4754809   213  304465  16228   3859    1314  38943
 chr12    1050  4526674   261  166256  12536   4311    1508  23582
 chr13     684  3126069   184  299397  18358   4570    1573  35034
 chr14     796  4263462   242  249680  18806   5356    1502  37446
 chr15     994  5240288   245  187955  17111   5272    1489  48036
 chr16     429  2702573   233  180260  16343   6300    1841  33347
 chr17     574  3453521   271  388105  23709   6017    1482  69464
 chr18     538  3873005   349  274407  24659   7199    1424  70263
 chr19     483  2625211   248  163416  16557   5435    1564  43457
  chr2    1402  8174215   220  303798  19553   5830    1526  42543
 chr20     342  2214472   225  209686  20661   6475    1456  55394
 chr21      57  1758058   347  254718  46708  30843    9409  83729
 chr22      69   851125   447   77401  15509  12335    5952  25811
  chr3    2119  7585521   197  325017  14512   3580    1412  25708
  chr4    1367  8772764   211  307864  23260   6418    1605  64709
  chr5    1527  6629025   221  223762  15298   4341    1385  40421
  chr6    1450  6101223   236  260918  15773   4208    1514  28634
  chr7    1291  5812463   155  350180  16907   4502    1540  34863
  chr8    1001  4849272   214  317181  19480   4844    1420  50186
  chr9    1113  5505104   219  485498  21692   4946    1504  40112
  chrM      84   298104   276   16409   4132   3549    1694   9137
  chrX     941  5713488   213  320496  22251   6072    1409  65532
  chrY       5   138365  2043   87799  35701  27673   14315  87799
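
The comparison between selected and rejected chromosomes can also be made numerically. The Python sketch below assumes the summary is available as a whitespace-separated table like the one above and uses four of its rows as sample input; `parse_summary` and `pooled_mean_length` are hypothetical helpers, not part of readfish.

```python
# Header and four rows copied from the table above.
SUMMARY = """\
contig  number      sum   min     max    std   mean  median    N50
  chr1    2045  8031506   220  318254  15566   3927    1476  26513
  chr2    1402  8174215   220  303798  19553   5830    1526  42543
 chr21      57  1758058   347  254718  46708  30843    9409  83729
 chr22      69   851125   447   77401  15509  12335    5952  25811
"""


def parse_summary(text):
    """Parse the table into {contig: {column: value}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    return {
        fields[0]: dict(zip(header[1:], (float(v) for v in fields[1:])))
        for fields in body
    }


def pooled_mean_length(rows, contigs):
    """Mean read length over several contigs: total bases / total reads."""
    reads = sum(rows[c]["number"] for c in contigs)
    bases = sum(rows[c]["sum"] for c in contigs)
    return bases / reads


rows = parse_summary(SUMMARY)
on_target = pooled_mean_length(rows, ["chr21", "chr22"])
off_target = pooled_mean_length(rows, ["chr1", "chr2"])
print(f"on-target mean: {on_target:.0f}, off-target mean: {off_target:.0f}")
```

Pooling by total bases over total reads avoids giving a contig with very few reads the same weight as one with thousands. In a working selection experiment the on-target value should be clearly larger than the off-target value.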

After completing your tests you should remove the simulation line from the sequencing_MIN106_DNA.toml file. You MUST then reload the scripts. If using Guppy GPU basecalling leave the break_reads_after_seconds parameter as 0.4.

Observant users may note that the median read lengths shown here are longer than we have previously described. These effects are all down to basecalling speed. We note that changes made to ONT basecalling configurations have changed speed performance. Users are advised to look at our basecalling page on the wiki.
