
readfish align

mattloose edited this page Jan 13, 2023 · 5 revisions

Deprecation notice.

Newer versions of ReadFish will remove this functionality. It was intended to demonstrate how this experimental approach might be applied, and it makes use of Run Until features. Newer code (such as BOSS_RUNS) provides more sophisticated approaches to this type of analysis.

Purpose

The purpose of readfish align is to achieve uniform coverage across a range of sequences present in a sample on a flowcell. For example, we have used it to reach 40x coverage of a mixed bacterial/eukaryotic sample available from Zymobiomics.

The underlying assumption of readfish align is that the user knows the composition of the sample being sequenced. To experiment with samples of unknown composition, see readfish centrifuge.

readfish align -h
usage: readfish align [-h] [--host HOST] [--port PORT] --device DEVICE
                      --experiment-name EXPERIMENT-NAME [--workers WORKERS]
                      [--channels CHANNELS CHANNELS] [--run-time RUN-TIME]
                      [--unblock-duration UNBLOCK-DURATION]
                      [--cache-size CACHE-SIZE] [--batch-size BATCH-SIZE]
                      [--throttle THROTTLE] [--dry-run]
                      [--log-level LOG-LEVEL] [--log-format LOG-FORMAT]
                      [--log-file LOG-FILE] --toml TOML [--paf-log PAF_LOG]
                      [--chunk-log CHUNK_LOG] [--watch FOLDER] [--depth DEPTH]
                      [--threads THREADS]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           MinKNOW server host (default: 127.0.0.1)
  --port PORT           MinKNOW server port (default: 9501)
  --device DEVICE       Name of the sequencing position e.g. MS29042 or X1
                        etc.
  --experiment-name EXPERIMENT-NAME
                        Describe the experiment being run, enclose in quotes
  --workers WORKERS     Number of worker threads (default: 1)
  --channels CHANNELS CHANNELS
                        Channel range to use as a sequence, expects two
                        integers separated by a space (default: [1, 512])
  --run-time RUN-TIME   Period (seconds) to run the analysis (default:
                        172,800)
  --unblock-duration UNBLOCK-DURATION
                        Time, in seconds, to apply unblock voltage (default:
                        0.1)
  --cache-size CACHE-SIZE
                        The size of the read cache in the ReadUntilClient
                        (default: 512)
  --batch-size BATCH-SIZE
                        The maximum number of reads to pull from the read
                        cache (default: 512)
  --throttle THROTTLE   Time interval, in seconds, between requests to the
                        ReadUntilClient (default: 0.4)
  --dry-run             Run the ReadFish Until experiment without sending
                        unblock commands
  --log-level LOG-LEVEL
                        One of: debug, info, warning, error or critical
  --log-format LOG-FORMAT
                        A standard Python logging format string (default:
                        '%(asctime)s %(name)s %(message)s')
  --log-file LOG-FILE   A filename to write logs to, or None to write to the
                        standard stream (default: None)
  --toml TOML           TOML file specifying experimental parameters
  --paf-log PAF_LOG     PAF log
  --chunk-log CHUNK_LOG
                        Chunk log
  --watch FOLDER        Top Level Folder containing fastq reads.
  --depth DEPTH         Desired coverage depth (default 30)
  --threads THREADS     Set the number of default threads to use for threaded
                        tasks (default 2) 

The minimal command for running readfish align is:

readfish align --device <DEVICE_ID> --toml <your_toml_file.toml> --depth <target_depth_e.g 30> --exp <"Free text describing the experiment.">
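The minimal command abbreviates `--experiment-name` to `--exp`. Assuming the command-line interface is built on Python's `argparse` with its default settings (where any unambiguous prefix of a long option is accepted), this works because no other option in the help output starts with `--exp`. The toy parser below mirrors only a few of the listed options; it is an illustration, not readfish's own parser, and the device, file name and description are placeholders:

```python
import argparse

# Toy parser mirroring a subset of the `readfish align -h` options above.
# This is NOT readfish's parser; it only illustrates argparse prefix matching.
parser = argparse.ArgumentParser(prog="readfish align")
parser.add_argument("--device", required=True)
parser.add_argument("--experiment-name", required=True, metavar="EXPERIMENT-NAME")
parser.add_argument("--toml", required=True)
parser.add_argument("--depth", type=int, default=30)
parser.add_argument("--threads", type=int, default=2)

# "--exp" is an unambiguous prefix of "--experiment-name", so argparse expands it.
args = parser.parse_args([
    "--device", "X1",
    "--toml", "my_experiment.toml",
    "--depth", "40",
    "--exp", "Zymo mock community, target 40x",
])
print(args.experiment_name)  # Zymo mock community, target 40x
print(args.depth)            # 40
```

Spelling out `--experiment-name` in full is the safer choice in scripts, since an abbreviation can become ambiguous if options are added later.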

The TOML file should be configured as follows:

[caller_settings]
host = "127.0.0.1"
port = 5555
config_name = "dna_r9.4.1_450bps_fast"

[conditions]
reference = "<path to your reference>"

[conditions.0]
name = "Gradual Rejection"
targets = [ ]
control = false
max_chunks = inf
min_chunks = 0
multi_on = "unblock"
single_on = "unblock"
no_map = "proceed"
no_seq = "proceed"
multi_off = "stop_receiving"
single_off = "stop_receiving"
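Each action key in `[conditions.0]` names a class of read. A minimal sketch of how a read might be classified and its action looked up, assuming the usual readfish meanings: `single_*` for a read with one mapping, `multi_*` for several, `_on`/`_off` for whether any mapping falls in a target, `no_map` for a read with sequence but no alignment, and `no_seq` when no sequence was produced. `classify_read` and `decide` are hypothetical helpers, far simpler than readfish's own decision code, and `max_chunks`/`min_chunks` are not modelled:

```python
from typing import Optional

# Actions copied from the [conditions.0] table above.
CONDITION = {
    "multi_on": "unblock",
    "single_on": "unblock",
    "no_map": "proceed",
    "no_seq": "proceed",
    "multi_off": "stop_receiving",
    "single_off": "stop_receiving",
}


def classify_read(sequence: Optional[str], mapped_contigs: list[str], targets: set[str]) -> str:
    """Hypothetical classifier: return the condition key that applies to a read."""
    if not sequence:
        return "no_seq"  # basecaller produced no sequence
    if not mapped_contigs:
        return "no_map"  # sequence produced, but no alignment to the reference
    prefix = "multi" if len(mapped_contigs) > 1 else "single"
    on_target = any(contig in targets for contig in mapped_contigs)
    return f"{prefix}_{'on' if on_target else 'off'}"


def decide(sequence: Optional[str], mapped_contigs: list[str], targets: set[str]) -> str:
    """Look up the action for a read in CONDITION."""
    return CONDITION[classify_read(sequence, mapped_contigs, targets)]


targets = {"Listeria_monocytogenes_complete_genome"}
# "Bacillus_subtilis_complete_genome" is a hypothetical off-target contig name.
print(decide("ACGT", ["Listeria_monocytogenes_complete_genome"], targets))  # unblock
print(decide("ACGT", ["Bacillus_subtilis_complete_genome"], targets))       # stop_receiving
print(decide("ACGT", [], targets))                                          # proceed
```

With the starting file's empty `targets`, every mapped read falls into an `_off` class and is allowed to finish sequencing, which is why all reads are sequenced at first.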

This TOML file rejects (unblocks) any read that maps to a sequence listed in targets. The starting TOML file does not need to contain any targets, in which case all reads are sequenced. The sequenced reads are mapped back to the reference provided. Once the coverage of a sequence in the reference exceeds the threshold set with --depth, readfish align automatically adds that sequence's name to targets, so later reads from it are rejected. To do this, it writes a new TOML file with a .toml_live extension. For example, after two sequences have reached the target depth, the .toml_live file might look like this:

[caller_settings]
host = "127.0.0.1"
port = 5555
config_name = "dna_r9.4.1_450bps_fast"

[conditions]
reference = "<path to your reference>"

[conditions.0]
name = "Gradual Rejection"
targets = [ "Saccharomyces_cerevisiae_V", "Listeria_monocytogenes_complete_genome",]
control = false
max_chunks = inf
min_chunks = 0
multi_on = "unblock"
single_on = "unblock"
no_map = "proceed"
no_seq = "proceed"
multi_off = "stop_receiving"
single_off = "stop_receiving"

Once every sequence in the reference has been added to the TOML file, the mean coverage depth of each sequence has reached at least the level given with --depth on the readfish align command line, and the run is stopped.
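The coverage check and stopping rule can be sketched as follows: estimate mean depth per reference sequence as aligned reference bases divided by sequence length, add sequences that have reached the requested depth to `targets`, and treat the run as complete once every sequence is a target. This is a minimal illustration with hypothetical function names, contig names and PAF records; readfish align's own mapping and coverage calculation may differ, and stopping the run through MinKNOW is not shown:

```python
from collections import defaultdict


def coverage_from_paf(paf_lines):
    """Estimate mean depth per reference sequence from PAF alignment lines.

    PAF columns 6-9 (1-based) are target name, target length, target start
    and target end. Every record is counted here; a real pipeline would
    probably filter secondary or low-quality alignments first.
    """
    aligned = defaultdict(int)  # reference name -> aligned reference bases
    lengths = {}                # reference name -> reference length
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        name, length = fields[5], int(fields[6])
        aligned[name] += int(fields[8]) - int(fields[7])
        lengths[name] = length
    return {name: aligned[name] / lengths[name] for name in aligned}


def update_targets(targets, coverage, depth):
    """Append every sequence whose estimated depth has reached `depth`."""
    for name in sorted(coverage):
        if coverage[name] >= depth and name not in targets:
            targets.append(name)
    return targets


def run_complete(reference_names, targets):
    """True once every reference sequence has been added to `targets`."""
    return set(reference_names) <= set(targets)


# Hypothetical reference of two sequences and three alignments.
reference = ["contig_A", "contig_B"]
paf = [
    "read1\t1000\t0\t1000\t+\tcontig_A\t1000\t0\t1000\t990\t1000\t60",
    "read2\t1000\t0\t1000\t-\tcontig_A\t1000\t0\t1000\t985\t1000\t60",
    "read3\t5000\t0\t5000\t+\tcontig_B\t10000\t2000\t7000\t4900\t5000\t60",
]
coverage = coverage_from_paf(paf)
print(coverage)  # {'contig_A': 2.0, 'contig_B': 0.5}
targets = update_targets([], coverage, depth=1)
print(targets)   # ['contig_A']
print(run_complete(reference, targets))  # False: contig_B is still below depth
```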

readfish align logs a series of messages to the MinKNOW interface during the run to inform the user about events.

These look as follows:

[Screenshot: Connection]

readfish will also tell you what it is doing:

[Screenshot: Init message]

and confirm that it is live and about to act on reads:

[Screenshot: ReadFish Live]

Over time, readfish will report each sequence as it is added to the targets to be rejected:

[Screenshot: AddTarget]

readfish will also log the proportion of reads being accepted at any given time:

[Screenshot: AcceptProportions]

Finally, readfish will report when the job is done and stop the sequencing run:

[Screenshot: Completed]