Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dorado plugin #344

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Dorado plugin #344

wants to merge 15 commits into from

Conversation

mattloose
Copy link
Contributor

@mattloose mattloose commented Apr 3, 2024

Create a new dorado plugin to handle the latest dorado_basecall_server version (7.3.9), which only accepts connections from ont-pybasecall-client-lib, deprecating ont-pyguppy-client-lib.

This pull request is to trigger discussion about the best way to switch to the new ONT pybasecaller client.

Rather than introduce this in the guppy plugin I have created a distinct dorado plugin.

This is due to changes in how the reads need to be packaged for the client, which expects some surprising data (e.g sample rate). Given the switch from guppy to dorado at ONT it may make sense in the future for people to be loading a dorado plugin instead of a guppy plugin. Within a few months many users may have forgotten about (or never known about) guppy in the first place.

Related to the above, we need to find a way of providing the sample rate to the basecall client. Currently the base caller has no way of accessing this?

Any suggestions?

Until this is resolved the PR is not fit for merging.

…t_pyguppy_client_lib to ont_pybasecaller_client_lib
@mattloose
Copy link
Contributor Author

In addition this request needs to know if it should send read_id or read name - it has not yet been updated to use the read name.

@Adoni5
Copy link
Contributor

Adoni5 commented Apr 17, 2024

This should close #347 - which will start to appear more as more people update.

I think we should merge #324 before we merge this.

I also think that making this a new plugin is probably the right way to go! It's going to be heavy duplication of the guppy.py plugin - but I suggest we add a deprecation warning when running the guppy plugin - and state in that we will not support that plugin moving forwards, and it is provided as is, with support for dorado only moving forwards.

@mattloose could we take the sample_rate from the caller config?

@Adoni5
Copy link
Contributor

Adoni5 commented Apr 17, 2024

Uhoh spaggetios

==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.11.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/adoni5/Projects/readfish
configfile: pyproject.toml
testpaths: tests/*test.py, src/readfish
collected 65 items / 1 error                                                                                                                                                                

========================================================================================== ERRORS ===========================================================================================
______________________________________________________________________ ERROR collecting src/readfish/plugins/guppy.py _______________________________________________________________________
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/doctest.py:567: in collect
    module = import_path(
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
../../miniforge3/envs/readfish/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:940: in exec_module
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
src/readfish/plugins/guppy.py:16: in <module>
    from pyguppy_client_lib.helper_functions import package_read
../../miniforge3/envs/readfish/lib/python3.11/site-packages/pyguppy_client_lib/helper_functions.py:17: in <module>
    from pyguppy_client_lib.client_lib import GuppyClient
E   ImportError: NumPy: dtype is already registered
================================================================================== short test summary info ==================================================================================
ERROR src/readfish/plugins/guppy.py - ImportError: NumPy: dtype is already registered
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================================================== 1 error in 0.35s ======================================================================================

Seems installing ont-pybasecall-client-lib 7.3.10 alongside ont-pyguppy-client-lib might not work

@mattloose
Copy link
Contributor Author

Yes - we need some kind of conditional install? IT's a bit of a pain this for sure.

This is so pytest collections (The only time both are imported into the same interperter runtime) doesn't crash.
This requires all tests to only test one, so I've moved all tests to test dorado
This is due to the fact that pytest imports all files, the only time that pyguppy-client-lib and pybasecall-client-lib are imported at the same time
In a real experiment, the plugin choice for base-caller imports the correct module and leaves the other one out of the interpreter runtime
…s we can't really cover them

They require things link a read_until_api or live base caller to properly test
@Adoni5
Copy link
Contributor

Adoni5 commented Apr 17, 2024

Needs the documentation updating - this is confirmed working with latest Minknow /Dorado 7.3 plus, and should work with <= 7.2.
I've bumped the version to 2024.0.0, should maybe be 2024.1.0
I've moved all tests to test dorado plugin, and added the new ont-pybasecall-client-lib to pyproject.toml
Excluded some plugins/files we can't really test from coverage reporting

Final steps:

  • Get sample rate from either MinKNOW API/pybasecall-client-lib connection, provide to package_reads accordingly
  • Update documentation
  • Add deprecation warning to guppy.py

@mattloose
Copy link
Contributor Author

Updated to pull sample rate from MinKNOW. But also have found an issue with dorado returning two reads with the same id in the same batch.

@Adoni5 Adoni5 changed the title Introducing a dorado specific plugin which handles the change from on… Dorado plugin Apr 22, 2024
alexomics
alexomics previously approved these changes Apr 23, 2024
mattloose and others added 2 commits May 7, 2024 17:58
* Initial validation of minknow version and guppy dorado versions

* MinKNOW compatibility check function
Base-caller compatibiity check

* Compatibility checks

* Remove rogue comment

* edit insanely long string multiline string example of sub tags

* Local updates

* Error checking and minor corrections to address compatibility. We no longer raise RuntimeException when a version issue is encountered. This means that readfish will continue to function with future versions of minKNOW if compatible.

* Add default when popping `sample_rate` from guppy params
Prevents `KeyError` from being raised if `sample_rate`isn't listed as a `kwarg`

* Remove unused basecaller compatibilty function
We now just check in `dorado.py` and warn if there is a mismatch

* Correct docstring, add doctests

* Add more complete docstring to `DIRECTION` enum
Only return Enum variant from `check_compatibility`, not tuple of (bool, variant)

* Suggest opening an issue if a suitable version of readfish doesn't exist

---------

Co-authored-by: Adoni5 <roryjmunro1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants