CAT_DB process has hardcoded CAT subdirectory names #611

Open
3 tasks
maxibor opened this issue Apr 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@maxibor
Member

maxibor commented Apr 18, 2024

Description of the bug

In the CAT_DB process, the subdirectory names are hardcoded (to database and taxonomy). This is a problem because in newer versions of the CAT database these directories have been renamed to db and tax.
Furthermore, symlinking these subdirectories in the process might cause problems when running with Singularity.

ERROR ~ Error executing process > 'NFCORE_MAG:MAG:CAT_DB (20231120_CAT_nr)'

Caused by:
  Missing output file(s) `database/*` expected by process `NFCORE_MAG:MAG:CAT_DB (20231120_CAT_nr)`

Command executed:

  if [[ 20231120_CAT_nr != *.tar.gz ]]; then
      ln -sr `find 20231120_CAT_nr/ -type d -name "*taxonomy*"` taxonomy
      ln -sr `find 20231120_CAT_nr/ -type d -name "*database*"` database
  else
      mkdir catDB
      tar -xf 20231120_CAT_nr -C catDB
      mv `find catDB/ -type d -name "*taxonomy*"` taxonomy/
      mv `find catDB/ -type d -name "*database*"` database/
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:CAT_DB":
      tar: $(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/lucia_winkler/nf-temp/26/59a97e2b1eb9ced31d84c2abe2d7d9
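
One possible way to avoid the hardcoded names, as a sketch under stated assumptions: the helper `resolve_cat_subdir` below is hypothetical (it is not part of nf-core/mag), and the only newer names it is meant to cover are the `db` and `tax` reported above. It tries each name pattern in turn and returns a non-zero status when nothing matches, so the task would fail at the lookup rather than exit with status 0 and a missing `database/*`. It does not address the possible Singularity symlink problem.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_cat_subdir is a hypothetical helper, not nf-core/mag code.
# Usage: resolve_cat_subdir ROOT PATTERN [PATTERN...]
# Prints the first directory under ROOT whose name matches one of the
# patterns (tried in the order given) and returns 1 if none matches.
resolve_cat_subdir() {
    local root="$1"; shift
    local pattern match
    for pattern in "$@"; do
        # -mindepth 1 keeps ROOT itself from matching a pattern;
        # sort | head makes the choice deterministic if several dirs match.
        match=$(find "$root" -mindepth 1 -type d -name "$pattern" | sort | head -n 1)
        if [[ -n "$match" ]]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    echo "No subdirectory matching [$*] found in $root" >&2
    return 1
}

# Possible use inside the process script (assumed, untested in the pipeline):
#   taxonomy_dir=$(resolve_cat_subdir 20231120_CAT_nr "*taxonomy*" "tax") || exit 1
#   database_dir=$(resolve_cat_subdir 20231120_CAT_nr "*database*" "db") || exit 1
#   ln -sr "$taxonomy_dir" taxonomy
#   ln -sr "$database_dir" database
```

Listing the old pattern first keeps the behaviour unchanged for older database layouts, and the exact names `db` and `tax` are used instead of wildcards to avoid accidental matches on unrelated directories.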

Command used and terminal output

nextflow run nf-core/mag -r 2.5.4 \
    -profile eva,archgen \
    --input /home/lucia_winkler/speleothem/pilot_sequences/2024-04-16_samplesheet.csv \
    --outdir results \
    --reads_minlength 30 \
    --bbnorm \
    --igenomes_base "/home/maxime_borry/SDAG_old/04_genomes/" \
    --host_genome GRCh38 \
    --skip_spades \
    --refine_bins_dastool \
    --ancient_dna \
    --skip_prokka \
    --binning_map_mode own \
    --busco_db "/r1/people/maxime_borry/02_db/busco_downloads" \
    --run_gunc \
    --gunc_db /r1/people/maxime_borry/02_db/gunc/gunc_db_progenomes2.1.dmnd \
    --postbinning_input both \
    --gtdb_db /home/maxime_borry/02_db/gtdb/r207/gtdbtk_r207_v2_data.tar.gz \
    --cat_db "/home/maxime_borry/02_db/cat/20231120_CAT_nr" \
    -resume \
    -with-tower

Relevant files

No response

System information

No response

Tasks

  1. new module
  2. new module
  3. new module
@maxibor maxibor added the bug Something isn't working label Apr 18, 2024
@jfy133
Member

jfy133 commented Apr 19, 2024

Agreed, that module is very old and rather fragile.

We should entirely replace the CAT modules with official ones, I think based on: https://github.com/MGXlab/CAT_pack

That looks MUCH better (although it is not yet on bioconda), as it also describes how to make custom databases etc.
