bioconda::prokka environment error #172

Open
shigdon opened this issue Feb 8, 2022 · 1 comment

shigdon commented Feb 8, 2022

Hello,

I was running BACTpipe on ctmr-gandalf with a test set of two isolates (paired-end FASTQ) when the pipeline unexpectedly failed in the Prokka module. The Git repo was pulled fresh yesterday, Feb 7, 2022. I initially submitted the run via the following sbatch script:

#!/bin/bash -login
#SBATCH -D /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe
#SBATCH -p ctmr
#SBATCH -J Bpipe_0
#SBATCH -t 168:00:00
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --output /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/slurm-log/bactpipe-%j.out
#SBATCH --error /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/slurm-log/bactpipe-%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=shawn.higdon@ki.se

# activate the bactpipe conda environment
conda activate bactpipe

# make things fail on errors
set -o nounset
set -o errexit
set -x


nextflow run ctmrbio/BACTpipe \
    -profile ctmr_gandalf \
    --kraken2_db /ceph/db/kraken2/gtdb_r89_54k \
    --kraken2_confidence 0.5 \
    --keep_shovill_output TRUE \
    --shovill_depth 100 \
    --shovill_minlen 500 \
    --reads '/ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/test_fq/*_{1,2}.fq.gz' \
    -resume

This produced the following error in the Prokka module:

[22:33:53] There are still 1247 unannotated CDS left (started with 4883)
  [22:33:53] Will use hmmer3 to search against /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/db/hmm/HAMAP.hmm with 4 CPUs
  [22:33:53] Running: cat 117\-89\-c2_prokka\/117\-89\-c2\.HAMAP\.hmm\.tmp\.2161016\.faa | parallel --gnu --plain -j 4 --block 41360 --recstart '>' --pipe hmmscan --noali --notextw --acc -E 1e-09 --cpu 1 /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/db/hmm/HAMAP.hmm /dev/stdin > 117\-89\-c2_prokka\/117\-89\-c2\.HAMAP\.hmm\.tmp\.2161016\.hmmer3 2> /dev/null
  Bio::SearchIO: hmmer3 cannot be found
  Exception
  ------------- EXCEPTION -------------
  MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate Bio/SearchIO/hmmer3.pm in @INC (you may need to install the Bio::SearchIO::hmmer3 module) (@INC contains: /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/site_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/vendor_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/vendor_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/core_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/core_perl .) at /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/Root/Root.pm line 520.

  STACK Bio::Root::Root::_load_module /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/Root/Root.pm:522
  STACK (eval) /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:620
  STACK Bio::SearchIO::_load_format_module /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:619
  STACK Bio::SearchIO::new /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:217
  STACK toplevel /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/bin/prokka:1113
  -------------------------------------


  For more information about the SearchIO system please see the SearchIO docs.
  This includes ways of checking for formats at compile time, not run time
  Can't call method "next_result" on an undefined value at /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/bin/prokka line 1114.
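
The exception above says that Perl cannot find Bio/SearchIO/hmmer3.pm in any of the @INC directories of the conda environment. One way to confirm that is to look for the file directly under the environment prefix. The following is a minimal sketch; `perl_module_path` and `find_perl_module` are hypothetical helper names, not part of BACTpipe or Prokka, and the sketch assumes Perl modules live under `lib/perl5` of the environment, as the @INC listing suggests.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Perl module file exists inside a conda environment.
# Both function names are hypothetical helpers made up for this illustration.

# Convert a Perl module name to its relative file path,
# e.g. Bio::SearchIO::hmmer3 -> Bio/SearchIO/hmmer3.pm
perl_module_path() {
    printf '%s.pm\n' "$1" | sed 's|::|/|g'
}

# Search <env_prefix>/lib/perl5 for the module file.
# Prints every match and returns 0 if found; returns 1 otherwise.
find_perl_module() {
    local env_prefix=$1 module=$2
    local rel_path matches
    rel_path=$(perl_module_path "$module")
    matches=$(find "$env_prefix/lib/perl5" -type f -path "*/$rel_path" 2>/dev/null)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 0
    fi
    return 1
}

# Example call, using the environment prefix from the log above:
# find_perl_module \
#     /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99 \
#     Bio::SearchIO::hmmer3 || echo "Bio::SearchIO::hmmer3 is not installed in this environment"
```

If the file is indeed absent, one possibility (an assumption, not verified against this environment) is that the hmmer3 parser is packaged separately from the core BioPerl distribution. On Bioconda that package appears to be `perl-bio-searchio-hmmer`, so adding it to the PROKKA conda directive may be worth trying.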

I then reran after modifying the nextflow command, replacing the -profile ctmr_gandalf flag with:

-c ctmr_gandalf-custom.config

The only change I made to the config was to pin a specific version of Prokka for Conda:

// vim: syntax=groovy expandtab
// BACTpipe Nextflow configuration file for use on CTMR Gandalf

params {
    project = 'bio'
    partition = 'ctmr'
}

process {
    errorStrategy = 'terminate'
    executor = 'slurm'
    clusterOptions = {
        " --partition ${params.partition} -A ${params.project}" + (params.clusterOptions ?: '')
    }
    scratch = false
    stageInMode = 'copy'
    stageOutMode = 'copy'

    withName: FASTP {
        cpus = 4
        time = 20.m
        conda = 'bioconda::fastp'
    }

    withName: SHOVILL {
        cpus = 10
        time = 2.h
        conda = 'bioconda::shovill bioconda::bwa=0.7.16 python=3'
    }

    withName: CLASSIFY_TAXONOMY {
        cpus = 10
        time = 30.m
        conda = 'bioconda::kraken2'
    }

    withName: ASSEMBLY_STATS {
        cpus = 1
        time = 20.m
        conda = 'bioconda::bbmap'
    }

    withName: PROKKA {
        cpus = 8
        time = 2.h
        conda = 'bioconda::prokka=1.14.6'
    }

    withName: MULTIQC {
        cpus = 1
        time = 10.m
        conda = 'bioconda::multiqc'
    }
}

Rerunning with this modified configuration did not solve the issue; it produced the same error.

I am not sure how to proceed in fixing this, but my next logical step would be to downgrade the Prokka version. Any thoughts or suggestions are much appreciated.

Thanks!

boulund (Member) commented Feb 10, 2022

Sorry to hear you're having issues @shigdon!

There should be no need to use an sbatch script to run the nextflow pipeline; nextflow will automatically do all the required job submissions if you run with the appropriate profile (in this case ctmr_gandalf). You can just run nextflow in a tmux session on the login node, that's perfectly ok!
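
As a rough sketch of that suggestion, the run could be started in a detached tmux session on the login node. The helper name `bactpipe_cmd`, the session name `bactpipe`, and the `--launch` switch are all hypothetical. The sketch also assumes `nextflow` is already on the PATH (for example because the bactpipe conda environment was activated before starting tmux), and it leaves out the kraken2 and shovill parameters from the original command for brevity.

```shell
#!/usr/bin/env bash
# Sketch: run BACTpipe from the login node inside tmux instead of via sbatch.
# bactpipe_cmd, the session name "bactpipe", and --launch are hypothetical.

# Print the nextflow command line for a given reads glob. The glob is wrapped
# in single quotes so that nextflow, not the shell, expands it.
bactpipe_cmd() {
    local reads_glob=$1
    printf "nextflow run ctmrbio/BACTpipe -profile ctmr_gandalf --reads '%s' -resume\n" "$reads_glob"
}

# Start the run in a detached tmux session only when explicitly requested,
# so that sourcing this file has no side effects.
if [ "${1:-}" = "--launch" ] && [ -n "${2:-}" ]; then
    tmux new-session -d -s bactpipe "$(bactpipe_cmd "$2")"
fi
```

Saved under a name of your choice, the script would be called with `--launch` followed by the quoted reads glob, and `tmux attach -t bactpipe` would then show the progress. Nextflow itself submits the individual jobs to Slurm through the ctmr_gandalf profile.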

I agree that it sounds as if the prokka environment isn't working as intended. Perhaps a version change could work; have you had time to try that yet?

Another thing I've been thinking of is to add container directives to the config of all modules so we can use the already available biocontainers for all these packages. That should make the pipeline more robust overall and easier to execute in different compute environments.
