Testing
To test readfish on your configuration, we recommend first running a playback experiment to test unblock speed, and then testing selective sequencing.
- Download an open access bulk FAST5 file from here. This file is 21 GB, so make sure you have plenty of disk space.
- To configure a run for playback, you need to find and edit a sequencing TOML file. These are typically located in /opt/ont/minknow/conf/package/sequencing. Edit a file such as sequencing_MIN106_DNA.toml and, under the [custom_settings] entry, add the field:
simulation = "/full/path/to/your_bulk.FAST5"
- If running Guppy in GPU mode, change the parameter break_reads_after_seconds = 1.0 to break_reads_after_seconds = 0.4.
- If using MinKNOW 4.0 or later, you need to reload scripts. To do so, click "start" to set up a new run, then choose "start sequencing". In the next window, click the icon of three vertically stacked dots in the top right-hand corner and choose "Reload Scripts". MinKNOW will now play back the bulk file rather than sequence live.
- Insert a configuration test flow cell into the sequencing device.
- Start a sequencing run as you normally would, selecting the flow cell type that corresponds to the edited script (here FLO-MIN106).
- The run should start and immediately begin a mux scan. Let it run for around fifteen minutes, after which your read length histogram should look as below:
- Now stop the run.
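The TOML edits in the playback setup steps can also be scripted. The Python below is a minimal, text-based sketch, not part of readfish or MinKNOW: it assumes the file has a single [custom_settings] header line with no existing simulation entry, and it rewrites every break_reads_after_seconds assignment it finds rather than targeting a particular table. The names add_simulation, set_break_reads and prepare_playback are hypothetical.

```python
import re
from pathlib import Path


def add_simulation(toml_text: str, bulk_fast5: str) -> str:
    """Insert simulation = "<path>" directly under the [custom_settings] header.

    Hypothetical helper: plain text editing, assumes one [custom_settings]
    header line and no existing simulation entry.
    """
    out = []
    inserted = False
    for line in toml_text.splitlines():
        out.append(line)
        if not inserted and line.strip() == "[custom_settings]":
            out.append(f'simulation = "{bulk_fast5}"')
            inserted = True
    if not inserted:
        raise ValueError("no [custom_settings] header found")
    return "\n".join(out) + "\n"


# Captures the key and "=" so only the numeric value is replaced.
BREAK_READS = re.compile(
    r"^([ \t]*break_reads_after_seconds[ \t]*=[ \t]*)[0-9.]+", re.MULTILINE
)


def set_break_reads(toml_text: str, seconds: float) -> str:
    """Set every break_reads_after_seconds assignment to `seconds` (hypothetical helper)."""
    new_text, count = BREAK_READS.subn(lambda m: f"{m.group(1)}{seconds}", toml_text)
    if count == 0:
        raise ValueError("break_reads_after_seconds not found")
    return new_text


def prepare_playback(path: Path, bulk_fast5: str, gpu: bool) -> None:
    """Back up the TOML file, add the simulation line and, for GPU Guppy, set 0.4 s."""
    text = path.read_text()
    path.with_name(path.name + ".bak").write_text(text)  # keep the original for later
    text = add_simulation(text, bulk_fast5)
    if gpu:
        text = set_break_reads(text, 0.4)
    path.write_text(text)
```

Files under /opt/ont are likely to be owned by root, so you will probably need elevated permissions to run prepare_playback against the real file. Either way, MinKNOW still needs its scripts reloaded afterwards.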
Now we shall test unblocking by running readfish unblock-all, which will simply eject every single read on the flow cell.
- Start a new sequencing run as above.
- Now start a readfish unblock-all run. To do this, run:
readfish unblock-all --device <YOUR_DEVICE_ID> --experiment-name "Testing ReadFish Unblock All"
- Leave the run for around 15 minutes and observe the read length histogram.
If unblocks are happening correctly, you will see something like the following:
A close-up of the unblock peak shows reads being unblocked quickly:
This compares with the control run:
If you are happy with the unblock response, move on to testing basecalling.
If you are not happy with the unblock response, you can try adjusting the throttle. The throttle limits the rate at which messages are sent to MinKNOW to perform unblocks; by default it is set to 0.4 seconds. In our experience, setting the throttle to the same value as the break_reads_after_seconds parameter can be helpful.
To change it, run:
readfish unblock-all --device <YOUR_DEVICE_ID> --experiment-name "Testing ReadFish Unblock All" --throttle <YOUR_VALUE_HERE>
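To illustrate what a throttle does, here is a minimal Python sketch of a time-based throttle that queues unblock requests and sends them as a batch at most once every throttle seconds. This is not readfish's implementation: the ThrottledBatcher class, its send callback and the injectable clock are hypothetical, chosen so the behaviour can be checked without MinKNOW.

```python
import time
from typing import Callable, List


class ThrottledBatcher:
    """Hypothetical sketch: queue unblock requests and send them in batches,
    at most once every `throttle` seconds. Not readfish's own code."""

    def __init__(
        self,
        send: Callable[[List[int]], None],
        throttle: float = 0.4,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send = send          # hypothetical callback that would ask MinKNOW to unblock channels
        self.throttle = throttle  # minimum gap between two sends, in seconds
        self.clock = clock        # injectable so the behaviour can be tested
        self.pending: List[int] = []
        self.last_sent = float("-inf")

    def request_unblock(self, channel: int) -> None:
        """Queue a channel for unblocking and send the batch if the throttle allows."""
        self.pending.append(channel)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Send all queued channels if `throttle` seconds have passed since the last send.

        Call this periodically as well, so queued requests are not left waiting.
        """
        now = self.clock()
        if self.pending and now - self.last_sent >= self.throttle:
            self.send(self.pending)
            self.pending = []
            self.last_sent = now
            return True
        return False
```

In this sketch, a larger throttle means fewer, larger messages but a longer worst-case wait before a queued request is actually sent.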
To test selective sequencing, you must have access to a Guppy basecall server (version 4.0.4 or later) and configure a TOML file. Here we provide an example TOML file.
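The targets in the TOML file must match the sequence names in your reference exactly. As a minimal sketch, assuming you still have the FASTA file the minimap2 index was built from, the Python below lists those names (the header text up to the first whitespace) and reports any targets that do not match. The helpers fasta_names and missing_targets are hypothetical and not part of readfish; readfish validate remains the authoritative check.

```python
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return each FASTA sequence name: the header text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed headers that carry no name
                names.append(fields[0])
    return names


def missing_targets(targets: Iterable[str], names: Iterable[str]) -> List[str]:
    """Return the targets that do not exactly match any reference sequence name."""
    known = set(names)
    return [t for t in targets if t not in known]


# Example (hypothetical path):
# with open("/path/to/reference.fa") as fh:
#     print(missing_targets(["chr21", "chr22"], fasta_names(fh)))
```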
- First, make a local copy of the example TOML file:
curl -O https://raw.githubusercontent.com/LooseLab/readfish/master/examples/human_chr_selection.toml
- Modify the reference field in the file to be the full path to a minimap2 index of the human genome.
- Modify the targets fields for each condition to reflect the naming convention used in your index. This is the sequence name only, up to but not including any whitespace; e.g. >chr1 human chromosome 1 would become chr1. If these names do not match, target matching will fail.
- We provide a JSON schema and a script for validating configuration files, which let you check whether the TOML file will drive an experiment as you expect:
readfish validate human_chr_selection.toml
Errors in the configuration will be written to the terminal, along with a text description of the conditions for the experiment, as below:
readfish validate examples/human_chr_selection.toml
😻 Looking good!
Generating experiment description - please be patient!
This experiment has 1 region on the flowcell
Using reference: /path/to/reference.mmi
Region 'select_chr_21_22' (control=False) has 2 targets of which 2 are in the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.
- If your TOML file validates, run the following command:
readfish targets --device <YOUR_DEVICE_ID> \
    --experiment-name "RU Test basecall and map" \
    --toml <PATH_TO_TOML> \
    --log-file ru_test.log
- In the terminal window you should see messages reporting the mapping speed, of the form:
2020-02-24 16:45:35,677 ru.ru_gen 7R/0.03526s
2020-02-24 16:45:35,865 ru.ru_gen 3R/0.02302s
2020-02-24 16:45:35,965 ru.ru_gen 4R/0.02249s
Note: if these times are longer than 0.4 seconds (or your break_reads_after_seconds value), you may have performance issues.
- In the MinKNOW messages interface you should see the experiment description generated by the readfish validate command above.
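To check the mapping times across a whole run rather than by eye, a short Python sketch can scan the log for slow batches. It assumes the lines follow the format shown in the terminal messages above (a read count, then R/, then the batch time in seconds) and that the file written with --log-file contains the same lines; the slow_batches helper is hypothetical, and the 0.4 s default should be replaced by your break_reads_after_seconds value if it differs.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "<reads>R/<seconds>s" field, e.g. "7R/0.03526s" in
# "2020-02-24 16:45:35,677 ru.ru_gen 7R/0.03526s" (format assumed from the example output).
BATCH = re.compile(r"(\d+)R/(\d+(?:\.\d+)?)s\b")


def slow_batches(lines: Iterable[str], limit: float = 0.4) -> List[Tuple[int, float]]:
    """Return (reads, seconds) for each logged batch that took longer than `limit` seconds."""
    slow = []
    for line in lines:
        match = BATCH.search(line)
        if match:
            reads, seconds = int(match.group(1)), float(match.group(2))
            if seconds > limit:
                slow.append((reads, seconds))
    return slow


# Example, assuming the log file contains the same lines as the terminal:
# with open("ru_test.log") as fh:
#     print(slow_batches(fh))
```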
The only way to test readfish on a playback run is to look at changes in read length for rejected vs accepted reads. To do this:
- Start a fresh simulation run using the bulk file downloaded above.
- Restart the readfish command (as above):
readfish targets --device <YOUR_DEVICE_ID> \
    --experiment-name "RU Test basecall and map" \
    --toml <PATH_TO_TOML> \
    --log-file ru_test.log
- Allow the run to proceed for at least 30 minutes (making sure you are writing out read data!).
- After 30 minutes, the read length histogram should look something like this:
Zoomed in on the unblocks:
- Run readfish summary to check whether your run has performed as expected. This command requires the path to your TOML file followed by the path to your FASTQ reads. Typical results are provided below and show longer mean read lengths for the two selected chromosomes (here chr21 and chr22). Note that the mean read lengths observed will depend on system performance; optimal Guppy configuration for your system is left to the user.
contig  number      sum   min     max    std   mean  median    N50
chr1      2045  8031506   220  318254  15566   3927    1476  26513
chr10     1109  4723969   263  261207  14559   4260    1592  27313
chr11     1232  4754809   213  304465  16228   3859    1314  38943
chr12     1050  4526674   261  166256  12536   4311    1508  23582
chr13      684  3126069   184  299397  18358   4570    1573  35034
chr14      796  4263462   242  249680  18806   5356    1502  37446
chr15      994  5240288   245  187955  17111   5272    1489  48036
chr16      429  2702573   233  180260  16343   6300    1841  33347
chr17      574  3453521   271  388105  23709   6017    1482  69464
chr18      538  3873005   349  274407  24659   7199    1424  70263
chr19      483  2625211   248  163416  16557   5435    1564  43457
chr2      1402  8174215   220  303798  19553   5830    1526  42543
chr20      342  2214472   225  209686  20661   6475    1456  55394
chr21       57  1758058   347  254718  46708  30843    9409  83729
chr22       69   851125   447   77401  15509  12335    5952  25811
chr3      2119  7585521   197  325017  14512   3580    1412  25708
chr4      1367  8772764   211  307864  23260   6418    1605  64709
chr5      1527  6629025   221  223762  15298   4341    1385  40421
chr6      1450  6101223   236  260918  15773   4208    1514  28634
chr7      1291  5812463   155  350180  16907   4502    1540  34863
chr8      1001  4849272   214  317181  19480   4844    1420  50186
chr9      1113  5505104   219  485498  21692   4946    1504  40112
chrM        84   298104   276   16409   4132   3549    1694   9137
chrX       941  5713488   213  320496  22251   6072    1409  65532
chrY         5   138365  2043   87799  35701  27673   14315  87799
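To make the summary columns concrete, here is a minimal Python sketch that computes the same kinds of per-contig statistics from (contig, read length) pairs, for example taken from your own alignments. It is not the readfish summary implementation: the helper names are hypothetical, std is omitted, the mean is rounded to the nearest integer, and N50 is taken as the largest length L such that reads of length L or longer contain at least half of all bases.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple


def n50(lengths: List[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one length")


def summarise(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Group read lengths by contig and compute per-contig summary statistics."""
    by_contig: Dict[str, List[int]] = defaultdict(list)
    for contig, length in pairs:
        by_contig[contig].append(length)
    table = {}
    for contig in sorted(by_contig):  # plain string order: chr1, chr10, ..., chr2, ...
        ls = by_contig[contig]
        table[contig] = {
            "number": len(ls), "sum": sum(ls), "min": min(ls), "max": max(ls),
            "mean": round(mean(ls)), "median": median(ls), "N50": n50(ls),
        }
    return table
```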
After completing your tests, you should remove the simulation line from the sequencing_MIN106_DNA.toml file. You MUST then reload the scripts. If using Guppy GPU basecalling, leave the break_reads_after_seconds parameter at 0.4.
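Removing the simulation line can also be done with a short script. This is a minimal Python sketch assuming the simulation entry sits on a single line of its own; the remove_simulation helper is hypothetical, it leaves break_reads_after_seconds untouched, and MinKNOW still needs its scripts reloaded afterwards.

```python
import re

# A whole "simulation = ..." line plus its newline; [ \t] rather than \s so
# neighbouring blank lines are left alone.
SIMULATION_LINE = re.compile(r"^[ \t]*simulation[ \t]*=.*\n?", re.MULTILINE)


def remove_simulation(toml_text: str) -> str:
    """Delete every simulation entry (hypothetical helper; assumes one entry per line)."""
    return SIMULATION_LINE.sub("", toml_text)


# Example (needs permission to write the file; reload scripts in MinKNOW afterwards):
# from pathlib import Path
# p = Path("/opt/ont/minknow/conf/package/sequencing/sequencing_MIN106_DNA.toml")
# p.write_text(remove_simulation(p.read_text()))
```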
Observant users may note that the median read lengths shown here are longer than those we have previously described. These differences are due to basecalling speed: changes made to ONT basecalling configurations have altered speed performance. Users are advised to consult our basecalling page on the wiki.