Nanopolish does not include the required HDF5 plugin and/or VBZ decompression #231

Open
Eric-CH-Chen opened this issue Mar 17, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@Eric-CH-Chen

Description of the bug

Bug 1: The bundled nanopolish doesn't include the required HDF5 plugins or the VBZ file.

To reproduce:
With a fresh base environment and a minimal working example, the nanopolish step fails: it can't read the fast5 files (error message attached).

When nanopolish is installed in a local environment (mamba create -n nanopolish_env nanopolish, followed by 'export HDF5_PLUGIN_PATH=/data/eric/test_output/github_issues/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin'), the command works when run by hand in the work folder, but resuming via nextflow still fails.
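Before exporting HDF5_PLUGIN_PATH, it can help to confirm that the directory really contains the plugin. The following is only a sketch: the function name has_vbz_plugin is made up for illustration, and the library name prefix "libvbz_hdf_plugin" is an assumption about how the Oxford Nanopore plugin file is named.

```shell
# Sketch: check that a directory looks like a usable HDF5 plugin directory.
# Assumption: the plugin library file name starts with "libvbz_hdf_plugin".
has_vbz_plugin() {
    plugin_dir="$1"
    # The directory itself must exist.
    [ -d "$plugin_dir" ] || return 1
    # Look for at least one file matching the assumed plugin name.
    # If nothing matches, the pattern stays literal and the -e test fails.
    for candidate in "$plugin_dir"/libvbz_hdf_plugin*; do
        if [ -e "$candidate" ]; then
            return 0
        fi
    done
    return 1
}

# Example usage with the path from this report:
# dir=/data/eric/test_output/github_issues/ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin
# has_vbz_plugin "$dir" && export HDF5_PLUGIN_PATH="$dir"
```

If the check fails, the export would point HDF5 at a directory it cannot use, which would likely give the same "can't open directory" style of error shown in the log below.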

Bug 2: When nanopolish fails (as noted in the work folder's .command.log), nextflow still reports success because nanopolish still creates the eventalign output file. This "eventalign.txt" file contains only the header.
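A possible guard for Bug 2, as a rough sketch rather than what the pipeline actually does: treat an eventalign output with no rows beyond the header as a failure. The function name eventalign_has_data is hypothetical.

```shell
# Sketch: succeed only if the file exists and has more than one line,
# i.e. at least one data row after the header.
eventalign_has_data() {
    [ -f "$1" ] || return 1
    # awk counts records in NR; exit status 0 means "has data rows".
    awk 'END { exit (NR > 1 ? 0 : 1) }' "$1"
}

# Example usage after the nanopolish step:
# eventalign_has_data eventalign.txt || { echo "eventalign.txt has no data rows" >&2; exit 1; }
```

With a non-zero exit status, the task would be reported as failed instead of passing with a header-only file.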

Command used and terminal output

nextflow run nf-core/nanoseq --input example_input.csv --protocol directRNA --skip_demultiplexing --skip_fusion_analysis --skip_differential_analysis -profile singularity

INFO:    Converting SIF file to temporary sandbox...
WARNING: Skipping mount /home/eric/.linuxbrew/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
[readdb] indexing fast5
[readdb] num reads: 32000, num reads with path to fast5: 32000
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140642332776192:
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 554 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1347 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 358 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 475 in H5PL__find(): can't open directory: /usr/local/lib/hdf5/plugin
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140642315990784:
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 554 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1347 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 358 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 475 in H5PL__find(): can't open directory: /usr/local/lib/hdf5/plugin
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140642324383488:
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 554 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1347 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 358 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 475 in H5PL__find(): can't open directory: /usr/local/lib/hdf5/plugin
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140642349561600:
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 554 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1347 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 358 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 475 in H5PL__find(): can't open directory: /usr/local/lib/hdf5/plugin
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
...

Relevant files

No response

System information

Container engine: Singularity
Executor: local
Nextflow: v22.10.7
nf-core/nanoseq: v3.1.0-g6e563e5
OS: Ubuntu 20.04.3 LTS (GNU/Linux 5.15.0-48-generic x86_64)

@Eric-CH-Chen Eric-CH-Chen added the bug Something isn't working label Mar 17, 2023
@yinshiyi

This assumes the input fast5 uses the newer compression that requires VBZ.
I think it worked fine for fast5 files without VBZ compression.

@matthewstuartedwards

I've been looking into this issue because it has affected my runs on newer nanopore data created with VBZ compression. On the nanopolish GitHub page, the notes for version 0.13.3 mention better handling of VBZ-compressed files. I believe that upgrading the nanopolish container to 0.13.3 or later would fix this issue, as the VBZ plugin would be included in the container.

Biocontainers doesn't have 0.13.3, but it has 0.14.

I'm currently testing this to make sure that it does indeed solve the issue.

@vetmohit89

Hello Matt,

Is there any update on a fix for this error? I am getting a similar error when trying to run an RNA modification study with nanoseq.

@christopher-hakkaart
Member

I agree with @matthewstuartedwards - a quick update to a newer container should resolve this. I haven't tested it but I would hope 0.14 would resolve the issue.

@matthewstuartedwards

matthewstuartedwards commented Aug 16, 2023

Hey, I did test this out with the nanopolish-0.14.0--h773013f_3.img container of nanopolish. This does have the VBZ plugin library installed, but they did not add the required environment variables in the container and you will see an error that says:

The fast5 file is compressed with VBZ but the required plugin is not loaded. Please read the instructions here: nanoporetech/vbz_compression#5

Since this only comes down to setting an environment variable, you can add it at runtime in various ways. I added a file VBZ.conf to my locally downloaded copy of the nanoseq pipeline and included it using the -profile flag when running the pipeline.

The contents of the VBZ.conf file are:

env {
        HDF5_PLUGIN_PATH = '/usr/local/hdf5/lib/plugin'
}

This is one method of adding the environment variable; I'm sure there are other ways too. The best fix would be for the nanopolish container to be built with that environment variable already set.

After adding the environment variable, the pipeline was able to execute on the newer VBZ compressed fast5 files.
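As one example of those other ways, here is a minimal sketch that writes the same env block to a file and passes it with Nextflow's -c option instead of a profile. It assumes the plugin path inside the container is the one quoted above; the file name and location are arbitrary.

```shell
# Sketch: write the env block from this comment to a config file.
conf_file="VBZ.conf"
cat > "$conf_file" <<'EOF'
env {
    HDF5_PLUGIN_PATH = '/usr/local/hdf5/lib/plugin'
}
EOF

# The file can then be passed on the command line with -c, for example:
# nextflow run nf-core/nanoseq --input example_input.csv --protocol directRNA \
#     -profile singularity -c "$conf_file"
```

This keeps the workaround outside the pipeline source, so it should survive a fresh download of nf-core/nanoseq.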

@vetmohit89

Dear Matt,

Thank you so much for the prompt response. Just to confirm: did you add VBZ.conf using the -profile flag or -c?

Also, please let me know whether these are the steps I need to follow to fix this issue:

In the following directory:

/data/user//.nextflow/assets/nf-core/nanoseq
[mbansal@c0158 nanoseq]$ ls
CHANGELOG.md  CODE_OF_CONDUCT.md  README.md  bin   docs  main.nf  modules.json     nextflow_schema.json  subworkflows
CITATIONS.md  LICENSE             assets     conf  lib   modules  nextflow.config  pyproject.toml        workflows

Should I download VBZ from conda into this directory, create a config file with its path, and pass this config file in the command that runs the pipeline?

@matthewstuartedwards

I added the VBZ.conf file using the -profile flag. So it's just -profile VBZ

The fix isn't in the VBZ from conda. The main issue is that you need the VBZ plugin provided by Oxford Nanopore. If you look at the package recipe for nanopolish, it now contains the ont_vbz_hdf_plugin, which is needed to decompress the fast5 files.

So to fix the issue you need to:

  1. Update the container used by nf-core/nanoseq/modules/local/nanopolish_index_eventalign.nf. I've commented out the old code below. I don't know if you can just update the conda line in the same way. I've tested this using Singularity.
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    //    'https://depot.galaxyproject.org/singularity/nanopolish:0.13.2--he3b7ca5_2' :
        'https://depot.galaxyproject.org/singularity/nanopolish:0.14.0--h773013f_3' :
    //    'quay.io/biocontainers/nanopolish:0.13.2--he3b7ca5_2' }"
        'quay.io/biocontainers/nanopolish:0.14.0--h773013f_3' }"
  2. Set HDF5_PLUGIN_PATH in the container by some method, such as the VBZ.conf I suggested above, as a short-term solution.

You shouldn't need anything more to fix the issue provided you're using containers to execute nanopolish.
