GUNZIP module is confused by repeated patterns in sample name and fasta path. #367

Open
m3hdad opened this issue Apr 26, 2024 · 1 comment
Labels
bug Something isn't working


m3hdad commented Apr 26, 2024

Description of the bug

A couple of months ago we had a Slack thread about how NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA gets confused between meta.id and the gz file path when the sample name is repeated in the full path to the FASTA file.

The topic was discussed on Slack.
Fix: changing the sample names solves the problem.

ERROR ~ Error executing process > 'NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA ([GCA_001438805.1.fna.gz, GCA_001438805.1_ASM143880v1_genomic.fna.gz])'

Caused by:
  Missing output file(s) `GCA_001438805.1.fna GCA_001438805.1_ASM143880v1_genomic.fna.gz` expected by process `NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA ([GCA_001438805.1.fna.gz, GCA_001438805.1_ASM143880v1_genomic.fna.gz])`

Command executed:

  # Not calling gunzip itself because it creates files
  # with the original group ownership rather than the
  # default one for that user / the work directory
  gzip \
      -cd \
       \
      GCA_001438805.1.fna.gz GCA_001438805.1_ASM143880v1_genomic.fna.gz \
      > GCA_001438805.1.fna GCA_001438805.1_ASM143880v1_genomic.fna.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA":
      gunzip: $(echo $(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*$//')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/test/.work/e1/6e71e01659781bea8f6deb48144838

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ Failed to invoke `workflow.onComplete` event handler

 -- Check script '/home/.nextflow/assets/nf-core/funcscan/./workflows/funcscan.nf' at line: 314 or see '.nextflow.log' file for more details
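One possible reading of the log above, offered as a sketch rather than a confirmed diagnosis: the process received two files instead of one, and the output name was derived from the whole input string. The Python model below assumes (it is not taken from the module source) that a list of inputs is rendered space-separated and that the output name is that string with its first `.gz` removed. Both function names are hypothetical. Under those assumptions it reproduces the command in the log, and shows that the shell treats only the first word after `>` as the redirect target, so `gzip` exits with status 0 while no file with the full expected name is created.

```python
import shlex


def render_gunzip_command(archives):
    """Model of how the command line in the log could have been built."""
    # Assumption: a list of staged inputs is rendered as one space-separated string.
    archive_str = " ".join(archives)
    # Assumption: the output name is the input string minus its FIRST ".gz".
    output_name = archive_str.replace(".gz", "", 1)
    return f"gzip -cd {archive_str} > {output_name}"


def redirect_target_and_extras(command):
    """Split the command like a POSIX shell would and inspect the redirect."""
    tokens = shlex.split(command)
    i = tokens.index(">")
    # Only the first word after ">" is the redirect target;
    # any further words are passed to gzip as extra arguments.
    return tokens[i + 1], tokens[i + 2:]


cmd = render_gunzip_command(
    ["GCA_001438805.1.fna.gz", "GCA_001438805.1_ASM143880v1_genomic.fna.gz"]
)
target, extras = redirect_target_and_extras(cmd)
print(cmd)
print("redirect target:", target)
print("extra gzip arguments:", extras)
```

With a single input file the same model yields an ordinary `gzip -cd x.fna.gz > x.fna`, which suggests the real question is why two files reach the process.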

The input file which resulted in this error was:

sample,fasta
GCA_000184535.1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
GCA_000260455.1,/home/test/genomes/ncbi_dataset/data/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
GCA_000615725.1,/home/test/genomes/ncbi_dataset/data/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna

Changing the input file to the following fixed the issue:

sample,fasta
sample-1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
sample-2,/home/test/genomes/ncbi_dataset/data/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
sample-3,/home/test/genomes/ncbi_dataset/data/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna
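Until the root cause is found, a samplesheet can be checked for the condition described here before launching a run. The snippet below is a minimal sketch, not part of the pipeline: `rows_with_id_in_path` is a hypothetical helper that flags rows whose sample name also occurs inside its own FASTA path, which is the pattern the two samplesheets above differ in.

```python
import csv
import io


def rows_with_id_in_path(samplesheet_text):
    """Return sample names that also occur inside their own fasta path."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    return [row["sample"] for row in reader if row["sample"] in row["fasta"]]


# First row of each samplesheet from this issue.
failing = (
    "sample,fasta\n"
    "GCA_000184535.1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/"
    "GCA_000184535.1_ASM18453v1_genomic.fna\n"
)
working = (
    "sample,fasta\n"
    "sample-1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/"
    "GCA_000184535.1_ASM18453v1_genomic.fna\n"
)

print(rows_with_id_in_path(failing))
print(rows_with_id_in_path(working))
```

Any sample name the helper returns is a candidate for renaming, as in the working samplesheet.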

Command used and terminal output

No response

Relevant files

No response

System information

No response

m3hdad added the bug label Apr 26, 2024
jfy133 (Member) commented May 15, 2024

Looking at this again, I'm even more confused. I think the issue is coming from further upstream, but I'm not sure where at the moment.

I have a suspicion there is a faulty join that is somehow merging the original FASTAs with some processed ones at some point. All downstream modules then, for some reason, receive both FASTAs rather than one.

That said, I have no idea why changing the name would affect that.
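To make the suspicion concrete, here is a toy Python model of how merging two channels keyed on the same sample name could hand a downstream process two files instead of one. It only illustrates the hypothesis and is not the pipeline's actual channel logic. `group_by_id` is a hypothetical stand-in for a keyed join or grouping step, the file names come from the error log, and which file is the "original" and which the "processed" one is a guess.

```python
from collections import defaultdict


def group_by_id(*channels):
    """Collect every path seen for each sample name across all channels."""
    grouped = defaultdict(list)
    for channel in channels:
        for sample_id, path in channel:
            grouped[sample_id].append(path)
    return dict(grouped)


# Guess: one channel carries a file renamed after the sample,
# the other carries the file under its original name.
processed = [("GCA_001438805.1", "GCA_001438805.1.fna.gz")]
original = [("GCA_001438805.1", "GCA_001438805.1_ASM143880v1_genomic.fna.gz")]

merged = group_by_id(processed, original)
print(merged)
```

In this model the sample ends up with the same two-element file list that appears in the failing process name. The model does not explain why renaming the sample avoids the problem.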
