Summary can't find the samples #509

Open
andreaniml opened this issue Apr 20, 2024 · 31 comments
Labels
bug Something isn't working fixed

Comments

@andreaniml

Description
I was able to run the main pipeline, using Bakta as the annotator instead of Prokka.

I am not able to run either the summary command or the pangenome workflow using Panaroo. I think it is a bug similar to the issue opened for PIRATE (#507): it has trouble finding the samples.

Steps to Reproduce
Steps to reproduce the behavior for bactopia-summary:

bactopia-summary --bactopia-path Results_Processed

2024-04-20 12:05:47 INFO     2024-04-20 12:05:47:root:INFO - Found 323 samples in Results_Processed to    summary.py:351
                             process
2024-04-20 12:05:48 WARNING  2024-04-20 12:05:48:root:WARNING - No samples found to process!              summary.py:514

Steps to reproduce the Panaroo bug:

bactopia --wf pangenome --bactopia /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ --use_panaroo --panaroo_opts --remove-invalid-genes --skip_recombination --max_cpus 12 -qs 1 -bg --output-dir Results_Processed

Found 323 samples to process

If this looks wrong, now's your chance to back out (CTRL+C 3 times).
Sleeping for 5 seconds...
--------------------------------------------------------------------
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition
[d8/52a217] Submitted process > BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)
[d8/52a217] NOTE: Process `BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)` terminated with an error exit status (123) -- Execution is retried (1)
[87/ea33eb] Re-submitted process > BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)
[87/ea33eb] NOTE: Process `BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)` terminated with an error exit status (123) -- Execution is retried (2)
[47/073ce0] Re-submitted process > BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)
[47/073ce0] NOTE: Process `BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)` terminated with an error exit status (123) -- Execution is retried (3)
[da/6dd4d7] Re-submitted process > BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)
ERROR ~ Error executing process > 'BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)'

Caused by:
  Process `BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)` terminated with an error exit status (123)

Command executed:

  mkdir gff
  cp -L gff-tmp/* gff/
  find gff/ -name "*.gff.gz" | xargs gunzip

  panaroo \
      --clean-mode strict --threshold 0.98 --family_threshold 0.7 --len_dif_percent 0.98 --alignment core --aligner mafft --core_threshold 0.95 true \
      -t 12 \
      -o results \
      -i gff/*.gff

  # Cleanup
  find . -name "*.fas" | xargs -I {} -P 12 -n 1 gzip {}

  if [[ -f "results/core_gene_alignment.aln" ]]; then
      gzip results/core_gene_alignment.aln
      cp results/core_gene_alignment.aln.gz ./core-genome.aln.gz
  fi

  cat <<-END_VERSIONS > versions.yml
  "BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN":
      panaroo: $(echo $(panaroo --version 2>&1) | sed 's/^.*panaroo //' ))
  END_VERSIONS

Command exit status:
  123

Command output:
  (empty)

Command error:

  gzip: stdin: unexpected end of file

Expected Behavior
The bactopia summary and report files are created
The Panaroo output is created

Execution Environment

  • Bactopia Version: bactopia 3.0.1
  • OS: Ubuntu
  • Environment: conda
@andreaniml andreaniml added the bug Something isn't working label Apr 20, 2024
@rpetit3
Member

rpetit3 commented Apr 20, 2024

Hi @andreaniml

Good news: I know the cause of both of these.

The issue with bactopia summary is related to changes I made for #487.

You are correct! The Panaroo issue is related to #507.

For bactopia summary, I also ran into this and will get the fix into bactopia-py. For Panaroo, it should just require a rebuild of the dev environment.

I will update soon on the bactopia summary fix.

Thank you for letting me know about these!
Robert

@andreaniml
Author

Hi! I tried the dev build, but I hit the same problem when running --wf pangenome:

(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ bactopia --wf pangenome --bactopia /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ --use_panaroo --panaroo_opts --remove-invalid-genes --skip_recombination --max_cpus 12 -qs 1 -bg --output-dir Results_Processed
2024-04-20 15:21:12 INFO     2024-04-20 15:21:12:root:INFO - Checking if environment pre-builds are      download.py:544
                             needed (this may take a while if building for the first time)
                    INFO     2024-04-20 15:21:12:root:INFO - Begin prokka create to                      download.py:261
                             /home/malu/.bactopia/conda/bioconda--prokka-1.14.6
^C
Aborted.
^C^C
(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ bactopia-summary --bactopia-path Results_Processed
2024-04-20 15:21:24 INFO     2024-04-20 15:21:24:root:INFO - Found 323 samples in Results_Processed to    summary.py:351
                             process
2024-04-20 15:21:26 WARNING  2024-04-20 15:21:26:root:WARNING - No samples found to process!              summary.py:514
(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ bactopia --wf pangenome --bactopia /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ --use_panaroo --panaroo_opts --remove-invalid-genes --skip_recombination --max_cpus 12 -qs 1 -bg --output-dir Results_Processed
2024-04-20 15:21:36 INFO     2024-04-20 15:21:36:root:INFO - Checking if environment pre-builds are      download.py:544
                             needed (this may take a while if building for the first time)
                    INFO     2024-04-20 15:21:36:root:INFO - Begin prokka create to                      download.py:261
                             /home/malu/.bactopia/conda/bioconda--prokka-1.14.6
(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ N E X T F L O W  ~  version 23.10.1
Launching `/home/malu/anaconda3/envs/bactopia-dev/share/bactopia-3.0.2/main.nf` [disturbed_swartz] DSL2 - revision: 0cd9f79ba7

WARN: Found unexpected parameters:
* --output-dir: Results_Processed
* --remove-invalid-genes: true
- Ignore this warning: params.schema_ignore_params = "output-dir,remove-invalid-genes"



--------------------------------------------------------------------
   _                _              _         _              _
  | |__   __ _  ___| |_ ___  _ __ (_) __ _  | |_ ___   ___ | |___
  | '_ \ / _` |/ __| __/ _ \| '_ \| |/ _` | | __/ _ \ / _ \| / __|
  | |_) | (_| | (__| || (_) | |_) | | (_| | | || (_) | (_) | \__ \
  |_.__/ \__,_|\___|\__\___/| .__/|_|\__,_|  \__\___/ \___/|_|___/
                            |_|
  bactopia tools pangenome v3.0.2
  Pangenome analysis with optional core-genome phylogeny
--------------------------------------------------------------------
Core Nextflow options
  runName              : disturbed_swartz
  container            : quay.io/bactopia/bactopia:3.0.2
  launchDir            : /home/malu/Projetos/Brazilian_Flagelin
  workDir              : /home/malu/Projetos/Brazilian_Flagelin/work
  projectDir           : /home/malu/anaconda3/envs/bactopia-dev/share/bactopia-3.0.2
  userName             : malu
  profile              : standard
  configFiles          : /home/malu/anaconda3/envs/bactopia-dev/share/bactopia-3.0.2/nextflow.config

Required Parameters
  bactopia             : /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/

Panaroo Parameters
  use_panaroo          : true
  panaroo_opts         : true

Prokka Parameters
  proteins             : /home/malu/anaconda3/envs/bactopia-dev/share/bactopia-3.0.2/data/proteins.faa

ClonalFrameML Parameters
  skip_recombination   : true

Optional Parameters
  outdir               : /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/

Max Job Request Parameters
  max_cpus             : 12
  max_memory           : 128 GB
  max_time             : 10d

Nextflow Profile Parameters
  condadir             : /home/malu/.bactopia/conda
  datasets_cache       : /home/malu/.bactopia/datasets
  singularity_cache_dir: /home/malu/.bactopia/singularity

Helpful Parameters
  wf                   : pangenome

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------
If you use bactopia for your analysis please cite:

* Bactopia
  https://doi.org/10.1128/mSystems.00190-20

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://bactopia.github.io/acknowledgements/
--------------------------------------------------------------------
WARN:
WARN: Conda Disclaimer
WARN:
WARN: If you have access to Docker or Singularity, please consider
WARN: running Bactopia using containers. The containers are less
WARN: susceptible to Conda environment related issues (e.g. version
WARN: conflicts) and errors caused by creation of conda environments
WARN: in parallel (use '--max_cpus 1' to over come this error).
WARN:
WARN: To use containers, you can use the profile parameter
WARN:     Docker: -profile docker
WARN:     Singularity: -profile singularity
WARN:
--------------------------------------------------------------------
Found 323 samples to process

If this looks wrong, now's your chance to back out (CTRL+C 3 times).
Sleeping for 5 seconds...
--------------------------------------------------------------------
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition
[90/781262] Submitted process > BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)
[90/781262] NOTE: Process `BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)` terminated with an error exit status (2) -- Execution is retried (1)
ERROR ~ Error executing process > 'BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)'

Caused by:
  Process requirement exceeds available memory -- req: 128 GB; avail: 118.1 GB

Command executed:

  mkdir gff
  cp -L gff-tmp/* gff/
  find gff/ -name "*.gz" | xargs gunzip

  # Make FOFN of gff (Prokka) and gff3 (Bakta) files
  find gff/ -name "*.gff" -or -name "*.gff3" > gff-fofn.txt

  panaroo \
      --clean-mode strict --threshold 0.98 --family_threshold 0.7 --len_dif_percent 0.98 --alignment core --aligner mafft --core_threshold 0.95 true \
      -t 12 \
      -o results \
      -i gff-fofn.txt

  # Cleanup
  find . -name "*.fas" | xargs -I {} -P 12 -n 1 gzip {}

  if [[ -f "results/core_gene_alignment.aln" ]]; then
      gzip results/core_gene_alignment.aln
      cp results/core_gene_alignment.aln.gz ./core-genome.aln.gz
  fi

  cat <<-END_VERSIONS > versions.yml
  "BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN":
      panaroo: $(echo $(panaroo --version 2>&1) | sed 's/^.*panaroo //' ))
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Command error:
  /home/malu/.bactopia/conda/bioconda--panaroo-1.4.2/lib/python3.10/site-packages/Bio/Application/__init__.py:40: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.

  Due to the on going maintenance burden of keeping command line application
  wrappers up to date, we have decided to deprecate and eventually remove these
  modules.

  We instead now recommend building your command line and invoking it directly
  with the subprocess module.
    warnings.warn(
  usage: panaroo [-h] -i INPUT_FILES [INPUT_FILES ...] -o OUTPUT_DIR --clean-mode {strict,moderate,sensitive}
                 [--remove-invalid-genes] [-c ID] [-f FAMILY_THRESHOLD] [--len_dif_percent LEN_DIF_PERCENT]
                 [--merge_paralogs] [--search_radius SEARCH_RADIUS] [--refind_prop_match REFIND_PROP_MATCH]
                 [--refind_strict] [--min_trailing_support MIN_TRAILING_SUPPORT]
                 [--trailing_recursive TRAILING_RECURSIVE] [--edge_support_threshold EDGE_SUPPORT_THRESHOLD]
                 [--length_outlier_support_proportion LENGTH_OUTLIER_SUPPORT_PROPORTION]
                 [--remove_by_consensus {True,False}] [--high_var_flag CYCLE_THRESHOLD_MIN]
                 [--min_edge_support_sv MIN_EDGE_SUPPORT_SV] [--all_seq_in_graph] [--no_clean_edges] [-a {core,pan}]
                 [--aligner {prank,clustal,mafft}] [--codons] [--core_threshold CORE] [--core_subset SUBSET]
                 [--core_entropy_filter HC_THRESHOLD] [-t N_CPU] [--codon-table TABLE] [--quiet] [--version]
  panaroo: error: unrecognized arguments: true

Work dir:
  /home/malu/Projetos/Brazilian_Flagelin/work/76/ee5e9718741f0cc877a8c865e8f8d8

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

    Bactopia Tools: `pangenome Execution Summary
    ---------------------------
    Bactopia Version : 3.0.2
    Nextflow Version : 23.10.1
    Command Line     : nextflow run /home/malu/anaconda3/envs/bactopia-dev//share/bactopia-3.0.2/main.nf -w /home/malu/Projetos/Brazilian_Flagelin/work/ --wf pangenome --bactopia /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ --use_panaroo --panaroo_opts --remove-invalid-genes --skip_recombination --max_cpus 12 -qs 1 -bg --output-dir Results_Processed
    Resumed          : false
    Completed At     : 2024-04-20T15:29:24.051383436-03:00
    Duration         : 29.9s
    Success          : false
    Exit Code        : null
    Error Report     : Error executing process > 'BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN (panaroo)'

Caused by:
  Process requirement exceeds available memory -- req: 128 GB; avail: 118.1 GB

Command executed:

  mkdir gff
  cp -L gff-tmp/* gff/
  find gff/ -name "*.gz" | xargs gunzip

  # Make FOFN of gff (Prokka) and gff3 (Bakta) files
  find gff/ -name "*.gff" -or -name "*.gff3" > gff-fofn.txt

  panaroo \
      --clean-mode strict --threshold 0.98 --family_threshold 0.7 --len_dif_percent 0.98 --alignment core --aligner mafft --core_threshold 0.95 true \
      -t 12 \
      -o results \
      -i gff-fofn.txt

  # Cleanup
  find . -name "*.fas" | xargs -I {} -P 12 -n 1 gzip {}

  if [[ -f "results/core_gene_alignment.aln" ]]; then
      gzip results/core_gene_alignment.aln
      cp results/core_gene_alignment.aln.gz ./core-genome.aln.gz
  fi

  cat <<-END_VERSIONS > versions.yml
  "BACTOPIATOOLS:PANGENOME:PG_TOOL:PANAROO_RUN":
      panaroo: $(echo $(panaroo --version 2>&1) | sed 's/^.*panaroo //' ))
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Command error:
  /home/malu/.bactopia/conda/bioconda--panaroo-1.4.2/lib/python3.10/site-packages/Bio/Application/__init__.py:40: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.

  Due to the on going maintenance burden of keeping command line application
  wrappers up to date, we have decided to deprecate and eventually remove these
  modules.

  We instead now recommend building your command line and invoking it directly
  with the subprocess module.
    warnings.warn(
  usage: panaroo [-h] -i INPUT_FILES [INPUT_FILES ...] -o OUTPUT_DIR --clean-mode {strict,moderate,sensitive}
                 [--remove-invalid-genes] [-c ID] [-f FAMILY_THRESHOLD] [--len_dif_percent LEN_DIF_PERCENT]
                 [--merge_paralogs] [--search_radius SEARCH_RADIUS] [--refind_prop_match REFIND_PROP_MATCH]
                 [--refind_strict] [--min_trailing_support MIN_TRAILING_SUPPORT]
                 [--trailing_recursive TRAILING_RECURSIVE] [--edge_support_threshold EDGE_SUPPORT_THRESHOLD]
                 [--length_outlier_support_proportion LENGTH_OUTLIER_SUPPORT_PROPORTION]
                 [--remove_by_consensus {True,False}] [--high_var_flag CYCLE_THRESHOLD_MIN]
                 [--min_edge_support_sv MIN_EDGE_SUPPORT_SV] [--all_seq_in_graph] [--no_clean_edges] [-a {core,pan}]
                 [--aligner {prank,clustal,mafft}] [--codons] [--core_threshold CORE] [--core_subset SUBSET]
                 [--core_entropy_filter HC_THRESHOLD] [-t N_CPU] [--codon-table TABLE] [--quiet] [--version]
  panaroo: error: unrecognized arguments: true

Work dir:
  /home/malu/Projetos/Brazilian_Flagelin/work/76/ee5e9718741f0cc877a8c865e8f8d8

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
    Launch Dir       : /home/malu/Projetos/Brazilian_Flagelin
    

@rpetit3
Member

rpetit3 commented Apr 21, 2024

Awesome, I can see the new syntax in the error message. There seems to be a lingering "true" value in the panaroo command. Let me dig into this and I will update soon.

@andreaniml
Author

If it helps, I was trying to pass Panaroo's "--remove-invalid-genes" option. I may have gotten the command-line syntax wrong, but I think it assigns a "true" value to "--panaroo_opts" instead of passing the option itself.

@rpetit3
Member

rpetit3 commented Apr 22, 2024

Could you try --panaroo_opts="--remove-invalid-genes"? I think that without the "=" and the quotes, Nextflow treats the extra "--" in "--remove-invalid-genes" as the start of another parameter.
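
The behavior described above can be illustrated with a toy model. This is a simplified sketch, not Nextflow's actual parser: `parse_params` is a hypothetical helper that assumes a `--name` followed by another dashed token becomes a boolean flag, which is consistent with the `panaroo_opts : true` and `--remove-invalid-genes: true` lines in the log above. Note that the shell strips the quotes before the program sees the arguments, so the quoted form arrives as a single `--panaroo_opts=--remove-invalid-genes` token.

```python
def parse_params(argv):
    """Toy model of '--name [value]' parsing. NOT Nextflow's real parser."""
    params = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("--"):
            i += 1  # ignore anything that is not a double-dash parameter
            continue
        name = token[2:]
        if "=" in name:
            # --name=value: everything after the first '=' is the value
            name, value = name.split("=", 1)
            params[name] = value
            i += 1
        elif i + 1 < len(argv) and not argv[i + 1].startswith("-"):
            # --name value: the next token is consumed as the value
            params[name] = argv[i + 1]
            i += 2
        else:
            # --name followed by another option (or nothing): boolean flag
            params[name] = True
            i += 1
    return params


# Unquoted form from the original command: two separate boolean parameters.
unquoted = parse_params(
    ["--use_panaroo", "--panaroo_opts", "--remove-invalid-genes", "--skip_recombination"]
)
# Form suggested above, as it looks after the shell removes the quotes.
quoted = parse_params(
    ["--use_panaroo", "--panaroo_opts=--remove-invalid-genes", "--skip_recombination"]
)
print(unquoted)
print(quoted)
```

Under this model the unquoted form sets `panaroo_opts` to `True`, and that value would then be interpolated into the panaroo command as the stray `true` seen in the error message.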

@andreaniml
Author

Looks like it worked! (At least it is running. I will report back when it finishes)

For the bactopia summary, should I clone the bactopia-py repo?

@rpetit3
Member

rpetit3 commented Apr 22, 2024

Cool, keep me posted on the Panaroo bit. If that's the fix, I'll be sure to update the docs to highlight the need for the --opts="--xyz" syntax.

For bactopia-py, I'll get that fixed this morning and update soon.

@andreaniml
Author

Hi! Seems everything worked out fine on the panaroo end

image

@rpetit3
Member

rpetit3 commented Apr 25, 2024

OK! I think this is fixed:

bactopia summary --version
bactopia-summary, version 1.0.9

bactopia summary --bactopia-path bactopia/ --force --verbose
2024-04-25 18:20:22 DEBUG    2024-04-25 18:20:22:root:DEBUG - Creating output directory: ./                                                                                                                                                   summary.py:344
                    INFO     2024-04-25 18:20:22:root:INFO - Found 40 samples in bactopia/ to process                                                                                                                                         summary.py:351
                    DEBUG    2024-04-25 18:20:22:root:DEBUG - Processing kledsiella-rapid3 (/home/robert_petit/aphl-minion-rapid2/bactopia/kledsiella-rapid3)                                                                                 summary.py:356
                    DEBUG    2024-04-25 18:20:22:root:DEBUG - Skipping kledsiella-rapid3 (/home/robert_petit/aphl-minion-rapid2/bactopia/kledsiella-rapid3) due to missing files. Missing:                                                    summary.py:407
                    DEBUG    2024-04-25 18:20:22:root:DEBUG -        /home/robert_petit/aphl-minion-rapid2/bactopia/kledsiella-rapid3/tools/amrfinderplus/kledsiella-rapid3-genes.tsv                                                         summary.py:411
                    DEBUG    2024-04-25 18:20:22:root:DEBUG -        /home/robert_petit/aphl-minion-rapid2/bactopia/kledsiella-rapid3/tools/amrfinderplus/kledsiella-rapid3-proteins.tsv
...

With fixes

bactopia summary --version
bactopia-summary, version 1.1.0

bactopia summary --bactopia-path bactopia/ --force
2024-04-25 18:23:18 INFO     2024-04-25 18:23:18:root:INFO - Found 40 samples in bactopia/ to process                                                                                                                                         summary.py:351
2024-04-25 18:23:20 INFO     2024-04-25 18:23:20:root:INFO - Writing report: ./bactopia-report.tsv                                                                                                                                            summary.py:442
                    INFO     2024-04-25 18:23:20:root:INFO - Writing exclusion report: ./bactopia-exclude.tsv                                                                                                                                 summary.py:446
                    INFO     2024-04-25 18:23:20:root:INFO - Writing summary report: ./bactopia-summary.txt

I will get a release out on bactopia-py, then update the bactopia recipe

@rpetit3 rpetit3 added the fixed label Apr 25, 2024
@rpetit3
Member

rpetit3 commented Apr 25, 2024

This should now be fixed in the dev build.

@andreaniml
Author

Hi Robert! Unfortunately, I am still getting the same bug:

bactopia summary --version
bactopia-summary, version 1.1.0
bactopia summary --bactopia-path Results_Processed/ --outdir Results_Processed/
2024-04-25 18:34:34 INFO     2024-04-25 18:34:34:root:INFO - Found 323 samples in Results_Processed/ to process                     summary.py:351
2024-04-25 18:34:57 WARNING  2024-04-25 18:34:57:root:WARNING - No samples found to process!                                        summary.py:514

Running with the --force flag still gives me the same error.

@rpetit3
Member

rpetit3 commented Apr 25, 2024

Can you run with --verbose so we can see what it's missing?

@andreaniml
Author

andreaniml commented Apr 25, 2024

Here
(not the complete output because it always flags the same files)

(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ bactopia summary --bactopia-path Results_Processed/ --outdir Results_Processed/ --verbose
2024-04-25 19:07:40 DEBUG    2024-04-25 19:07:40:root:DEBUG - Creating output directory: Results_Processed/                                               summary.py:344
                    INFO     2024-04-25 19:07:40:root:INFO - Found 323 samples in Results_Processed/ to process                                           summary.py:351
                    DEBUG    2024-04-25 19:07:40:root:DEBUG - Processing ERX1814181 (/home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181) summary.py:356
                    DEBUG    2024-04-25 19:07:40:root:DEBUG - Skipping ERX1814181 (/home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181)   summary.py:407
                             due to missing files. Missing:
                    DEBUG    2024-04-25 19:07:40:root:DEBUG -                                                                                             summary.py:411
                             /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/main/annotator/prokka/ERX1814181.txt
                    DEBUG    2024-04-25 19:07:40:root:DEBUG -                                                                                             summary.py:411
                             /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/tools/amrfinderplus/ERX1814181.tsv
                    DEBUG    2024-04-25 19:07:40:root:DEBUG - Processing SRX23944083                                                                      summary.py:356
                             (/home/malu/Projetos/Brazilian_Flagelin/Results_Processed/SRX23944083)
                    DEBUG    2024-04-25 19:07:40:root:DEBUG - Skipping SRX23944083 (/home/malu/Projetos/Brazilian_Flagelin/Results_Processed/SRX23944083) summary.py:407
                             due to missing files. Missing:
                    DEBUG    2024-04-25 19:07:40:root:DEBUG -                                                                                             summary.py:411
                             /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/SRX23944083/main/annotator/prokka/SRX23944083.txt
                    DEBUG    2024-04-25 19:07:40:root:DEBUG -                                                                                             summary.py:411
                             /home/malu/Projetos/Brazilian_Flagelin/Results_Processed/SRX23944083/tools/amrfinderplus/SRX23944083.tsv

It thinks I used Prokka, and amrfinder now outputs two TSV files instead of one:

(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/main/annotator/bakta$ ls
ERX1814181-blastdb.tar.gz  ERX1814181.faa.gz  ERX1814181.fna.gz   ERX1814181.gff3.gz               ERX1814181.hypotheticals.tsv  ERX1814181.txt
ERX1814181.embl.gz         ERX1814181.ffn.gz  ERX1814181.gbff.gz  ERX1814181.hypotheticals.faa.gz  ERX1814181.tsv                logs

and

(bactopia-dev) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/tools/amrfinderplus$ ls
ERX1814181-genes.tsv  ERX1814181-proteins.tsv  logs
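
To see how many samples are affected, the layout shown in the two listings above can be checked across the whole results directory. The following is a minimal sketch, not bactopia-py's implementation: `classify_sample` and `scan` are hypothetical helpers, and the paths are taken only from the `--verbose` output and `ls` listings in this thread.

```python
from pathlib import Path


def classify_sample(sample_dir):
    """Report which annotator output and which AMRFinder+ layout a sample has."""
    sample_dir = Path(sample_dir)
    name = sample_dir.name

    # Annotator summary file, e.g. <sample>/main/annotator/bakta/<sample>.txt
    annotator_root = sample_dir / "main" / "annotator"
    annotators = [
        tool
        for tool in ("prokka", "bakta")
        if (annotator_root / tool / f"{name}.txt").is_file()
    ]

    # AMRFinder+ results: one <sample>.tsv, or the -genes/-proteins pair
    amr_dir = sample_dir / "tools" / "amrfinderplus"
    if (amr_dir / f"{name}.tsv").is_file():
        amr_layout = "single"
    elif (amr_dir / f"{name}-genes.tsv").is_file() and (
        amr_dir / f"{name}-proteins.tsv"
    ).is_file():
        amr_layout = "two-file"
    else:
        amr_layout = "missing"

    return {"sample": name, "annotators": annotators, "amrfinderplus": amr_layout}


def scan(bactopia_path):
    """Classify every sub-directory of a results directory, sorted by name."""
    root = Path(bactopia_path)
    return [classify_sample(d) for d in sorted(root.iterdir()) if d.is_dir()]
```

Calling `scan("Results_Processed")` and printing each entry would list the samples with the two-file layout. Non-sample sub-directories, if any exist, would simply show up with an empty annotator list and a "missing" layout.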

@rpetit3
Member

rpetit3 commented Apr 25, 2024

Oh! 🤦 Let me check to see if it's because I don't check for bakta outputs!

@rpetit3
Member

rpetit3 commented May 1, 2024

Ok! I think this is now fixed in the latest dev version and bactopia-py 1.1.1

Try a rebuild and make sure bactopia summary is v1.1.1

bactopia summary --version
bactopia-summary, version 1.1.1

Thank you for bringing this to my attention!

@andreaniml
Author

Hi! Almost there. It can now read the Bakta output, but it still complains about the amrfinderplus results for samples that have two files instead of one.

It tries to find:
/home/malu/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/tools/amrfinderplus/ERX1814181.tsv
But no such file exists. There are two instead:

~/Projetos/Brazilian_Flagelin/Results_Processed/ERX1814181/tools/amrfinderplus$ ls
ERX1814181-genes.tsv  ERX1814181-proteins.tsv  logs
Thanks!

@rpetit3
Member

rpetit3 commented May 1, 2024

Hmmm, we have an issue here: bactopia-py no longer supports two-file AMRFinder results, and the old version that does support them does not support Bakta.

One simple solution would be to rerun AMRFinder+, which would give you the single-file output:

bactopia --wf amrfinderplus 

@andreaniml
Author

Oh, I will try this.
I don't know why I have the two-file AMRFinder results, but I will report back if I find any issues.

Thank you very much for all the help!

@andreaniml
Author

OK, it seems that there may be a problem with my amrfinder.
With the command bactopia --wf amrfinderplus --bactopia Results_Processed/ --outdir Results_Processed/ I get:

[skipped  ] process > BACTOPIATOOLS:DATASETS                                      [100%] 1 of 1, stored: 1 ✔
[86/b98736] process > BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (ERX3863158)  [  0%] 3 of 326, failed: 3, retries:
executor >  local (6)
[skipped  ] process > BACTOPIATOOLS:DATASETS                                      [100%] 1 of 1, stored: 1 ✔
[86/b98736] process > BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (ERX3863158)  [  0%] 3 of 326, failed: 3, retries:
executor >  local (8)
[skipped  ] process > BACTOPIATOOLS:DATASETS                                      [100%] 1 of 1, stored: 1 ✔
[36/181033] process > BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX4997132)  [  1%] 5 of 328, failed: 5, retries:
[-        ] process > BACTOPIATOOLS:AMRFINDERPLUS:CSVTK_CONCAT                    -
[-        ] process > BACTOPIATOOLS:CUSTOM_DUMPSOFTWAREVERSIONS                   -
executor >  local (9)
[skipped  ] process > BACTOPIATOOLS:DATASETS                                      [100%] 1 of 1, stored: 1 ✔
[8e/970cd8] process > BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (ERX1814186)  [  1%] 6 of 329, failed: 6, retries:
[-        ] process > BACTOPIATOOLS:AMRFINDERPLUS:CSVTK_CONCAT                    -
[-        ] process > BACTOPIATOOLS:CUSTOM_DUMPSOFTWAREVERSIONS                   -
[skipping] Stored process > BACTOPIATOOLS:DATASETS
[7d/dc8cff] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX10511031)` terminated with an error exit status (1) -- Execution is retried (1)
[2f/afcff4] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (MB5_S4)` terminated with an error exit status (1) -- Execution is retried (1)
[ca/45ae35] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX12119349)` terminated with an error exit status (1) -- Execution is retried (1)
[7c/e1cc72] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX8358195)` terminated with an error exit status (1) -- Execution is retried (1)
[91/33b966] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX5325940)` terminated with an error exit status (1) -- Execution is retried (1)
[86/b98736] NOTE: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (ERX3863158)` terminated with an error exit status (1) -- Execution is retried (1)

I tried building a new bactopia-dev environment and got the same error. I also tried specifying another directory, with the same result.

Checking .nextflow.log, it seems that something does run (there are files in the work directory). I am not sure which part of the log is useful, but I suspect it is this one (I can upload the whole file if necessary):

May-01 20:06:01.128 [Task submitter] INFO  nextflow.Session - [6b/265234] Submitted process > BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX6785048)
May-01 20:06:03.710 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 14; name: BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX2249379); status: COMPLETED; exit: 1; error: -; workDir: /home/malu/Projetos/Brazilian_Flagelin/work/75/711c03cc75611ad95cb4f8492b0b7b]
May-01 20:06:03.716 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX2249379); work-dir=/home/malu/Projetos/Brazilian_Flagelin/work/75/711c03cc75611ad95cb4f8492b0b7b
  error [nextflow.exception.ProcessFailedException]: Process `BACTOPIATOOLS:AMRFINDERPLUS:AMRFINDERPLUS_RUN (SRX2249379)` terminated with an error exit status (1)

@rpetit3
Member

rpetit3 commented May 1, 2024

Look for .command.log in /home/malu/Projetos/Brazilian_Flagelin/work/75/711c03cc75611ad95cb4f8492b0b7

I suspect this is going to be related to the GFF and mismatched contig names.

@andreaniml
Author

You are right!

Running: amrfinder --nucleotide SRX2249379.fna --protein SRX2249379.faa --gff SRX2249379.gff3 --annotation_format bakta --plus --ident_min -1 --coverage_min 0.5 --translation_table 11 --database amrfinderplus/ --threads 4 --name SRX2249379
Software directory: '/home/malu/.bactopia/conda/bioconda--ncbi-amrfinderplus-3.12.8/bin/'
Software version: 3.12.8
Database directory: '/home/malu/Projetos/Brazilian_Flagelin/work/75/711c03cc75611ad95cb4f8492b0b7b/amrfinderplus'
Database version: 2024-01-31.1
AMRFinder combined translated and protein search
  - include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins

GFF file mismatch.
*** ERROR ***
gff_check.cpp: GFF contig id "contig_1" is not in the DNA FASTA file

HOSTNAME: ?
SHELL: /bin/bash
PWD: /home/malu/Projetos/Brazilian_Flagelin/work/75/711c03cc75611ad95cb4f8492b0b7b
PATH: /home/malu/.local/bin:/home/malu/.bactopia/conda/bioconda--ncbi-amrfinderplus-3.12.8/bin:/home/malu/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/renato/bin:/home/renato/programas/miniconda3/bin:/snap/bin:/home/malu/anaconda3/envs/BactopiadevNew_try/share/bactopia-3.0.2/bin
Progam name:  gff_check
Command line: /home/malu/.bactopia/conda/bioconda--ncbi-amrfinderplus-3.12.8/bin/gff_check SRX2249379.gff3 -gfftype bakta -prot SRX2249379.faa -dna SRX2249379.fna -log /tmp/amrfinder.YVGoCe/log

I think that Bakta renames the contigs automatically (I should have used the option to keep the contig names but forgot). Is there anything I can do?
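
The mismatch reported by gff_check can be reproduced outside the pipeline by comparing the sequence IDs in a GFF3 file with the headers of a FASTA file. This is a minimal sketch, not AMRFinderPlus's own check: the helper names are hypothetical, the example records are made up for illustration, and it assumes standard GFF3 (tab-separated, sequence ID in the first column, optional `##FASTA` section at the end).

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fasta_ids(lines):
    """Collect sequence IDs: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(lines):
    """Collect the first column of every feature line in a GFF3 file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_contigs(gff_lines, fasta_lines):
    """Return GFF sequence IDs that have no matching FASTA header."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))


# Made-up records for illustration only.
gff = [
    "##gff-version 3\n",
    "contig_1\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=gene_1\n",
    "contig_2\tProdigal\tCDS\t5\t200\t.\t-\t0\tID=gene_2\n",
]
renamed_fasta = [">contig_1\n", "ACGT\n", ">contig_2\n", "ACGT\n"]
original_fasta = [">original_header_1\n", "ACGT\n", ">original_header_2\n", "ACGT\n"]

print(missing_contigs(gff, renamed_fasta))
print(missing_contigs(gff, original_fasta))
```

For real files, the same comparison could be run with `missing_contigs(open_text(gff_path), open_text(fasta_path))`, where both paths are whichever GFF3 and assembly files were handed to amrfinder.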

@rpetit3
Member

rpetit3 commented May 1, 2024

Let me look into this, I saw this earlier on some of my samples and was hoping it was just the way I ran them.

Did you originally run the samples with Bakta? Or run it as a bactopia tool?

@andreaniml
Author

I originally ran the main pipeline, using the "--use_bakta" flag

@rpetit3
Member

rpetit3 commented May 1, 2024

Great, I ran mine the same way. I will take a look now and see if I can get it figured out.

@andreaniml
Author

andreaniml commented May 1, 2024

I am not sure if it works like that, but I think that Bakta renames the contigs to "contig_1", "contig_2", and so on. Maybe if amrfinder could look at the FASTA headers of the .fna file, the names could be normalized? (I would like to think that the contigs stay in the same order in the GFF and the .fna, but I don't know.)

I think the easiest way is probably to rerun everything with the "--keep-contig-headers" option (bactopia can take additional options for Bakta, right?), but I'll leave the thought here.

@rpetit3
Member

rpetit3 commented May 1, 2024

I think I got a fix in place: d80eece. Haha, I will once again ask you to rebuild your dev environment.


So you are correct about what is happening.

In the bactopia main pipeline, amrfinder gets the assembly, proteins, and GFF from either Prokka or Bakta.

In the bactopia-tools, amrfinder gets the assembly from Shovill or Dragonflye, and the proteins and GFF from Prokka or Bakta, so the contig names can differ between the assembly and the GFF when the annotator renames them.

In the above commit, I make sure amrfinder gets the assembly from Prokka/Bakta. With a quick test on my end, the error went away.

@andreaniml
Author

Oh nice! Will rebuild here!

@andreaniml
Author

Success...?
Looking nice!

image

@rpetit3
Member

rpetit3 commented May 1, 2024

I think we got that working again. Haha, we'll wait and see on bactopia-summary!

@andreaniml
Author

It worked!!!

(BactopiadevAgrvaiamr) malu@srv-Renato-CBMEG:~/Projetos/Brazilian_Flagelin$ bactopia summary --bactopia-path Results_Processed/ --outdir Results_Processed/
2024-05-01 21:48:28 INFO     2024-05-01 21:48:28:root:INFO - Found 323 samples in Results_Processed/ to   summary.py:351
                             process
2024-05-01 21:48:56 INFO     2024-05-01 21:48:56:root:INFO - Writing report:                              summary.py:442
                             Results_Processed/bactopia-report.tsv
                    INFO     2024-05-01 21:48:56:root:INFO - Writing exclusion report:                    summary.py:446
                             Results_Processed/bactopia-exclude.tsv
                    INFO     2024-05-01 21:48:56:root:INFO - Writing summary report:                      summary.py:462
                             Results_Processed/bactopia-summary.txt

Thanks a lot for all the help!

@rpetit3
Member

rpetit3 commented May 2, 2024

Awesome! Thank you so much for your patience and help in getting this all sorted out!
