Unexpected behaviour of cat/cat #5516

Open
2 tasks done
k1sauce opened this issue Apr 22, 2024 · 0 comments
Labels
bug Something isn't working

k1sauce commented Apr 22, 2024

Have you checked the docs?

Description of the bug

-[ClearNote-Health/Overlord] Pipeline completed with errors-
ERROR ~ Error executing process > 'OVERLORD:CAT_CAT (1)'

Caused by:
  The name of the input file can't be the same as for the output prefix in the module CAT_CAT (currently `xxxxx.1.aaaaa.003.bed.gz`). Please choose a different one. -- Check script 'workflows/../modules/nf-core/cat/cat/main.nf' at line: 43

I am attempting to merge several gzipped BED files with cat/cat. For reference, here is what my input channel looks like:

[DUMP: beds] [['id':'xxxxx.2.aaaaa', 'data_type':'fastq', 'paired_end':true, 'filter':'pass', 'genome':'R64-1-1'], [/Users/kyle/Projects/cnh-overlord/work/0d/4c52846b6116972a997695b3030925/xxxxx.2.aaaaa.004.bed.gz, /Users/kyle/Projects/cnh-overlord/work/ca/d64b77b025a6f9bb672a0341e5ccb4/xxxxx.2.aaaaa.001.bed.gz, /Users/kyle/Projects/cnh-overlord/work/75/029ecfb361c820744709ca420ef9ba/xxxxx.2.aaaaa.002.bed.gz, /Users/kyle/Projects/cnh-overlord/work/c9/00d16b1f8434d476eb2e3461e6146b/xxxxx.2.aaaaa.003.bed.gz, /Users/kyle/Projects/cnh-overlord/work/af/2863c6c4b2d7d6e45dce8a92263ce3/xxxxx.2.aaaaa.000.bed.gz]]

As you can see, I would expect the output file to be named like xxxxx.2.aaaaa.bed.gz. However, the CAT_CAT process fails because it attempts to create an output file called xxxxx.1.aaaaa.003.bed.gz, which has the same name as an input file (thankful that check exists!).

The cause of this error is the function named getFileSuffix, which is defined in the module. It derives the suffix from the first input file and incorrectly returns .004.bed.gz (or similar, depending on the file). I would expect it to return .bed.gz.

The stated intention of the function is "for .gz files also include the second to last extension if it is present. E.g., .fasta.gz"

The regex matcher in this function does not look right to me. It contains 3 capture groups (1 outer and 2 inner) and always grabs the outer one. The regex also only looks for suffix patterns that are 1 to 5 characters long. This does not match the function's stated intention, and there should not be an artificial length limit.
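To make the behaviour concrete, here is a minimal Python sketch. The pattern is reconstructed from the description above (one outer group, two inner groups, 1 to 5 characters per extension); it is an assumption and is not copied from the module, which is written in Groovy. `described_suffix` and `alternative_suffix` are hypothetical names, and the alternative is only one possible way to get .bed.gz, not a proposed patch.

```python
import re

# Assumption: a pattern shaped like the one described in this issue
# (one outer capture group wrapping two inner ones, each extension
# limited to 1 to 5 word characters). Not copied from the module.
DESCRIBED_PATTERN = re.compile(r"^.*?((\.\w{1,5})?(\.\w{1,5}\.gz$))")

# Hypothetical alternative: at most one extension before ".gz",
# with no length limit on that extension.
ALTERNATIVE_PATTERN = re.compile(r"(\.[^.]+)?\.gz$")


def last_extension(filename: str) -> str:
    """Fallback: everything from the last dot onwards, or '' if no dot."""
    dot = filename.rfind(".")
    return filename[dot:] if dot != -1 else ""


def described_suffix(filename: str) -> str:
    """Return the outer capture group, as the issue says the function does."""
    match = DESCRIBED_PATTERN.match(filename)
    return match.group(1) if match else last_extension(filename)


def alternative_suffix(filename: str) -> str:
    """Return only the final extension plus '.gz' for gzipped files."""
    match = ALTERNATIVE_PATTERN.search(filename)
    return match.group(0) if match else last_extension(filename)


name = "xxxxx.2.aaaaa.004.bed.gz"
print(described_suffix(name))    # .004.bed.gz
print(alternative_suffix(name))  # .bed.gz
```

With the reconstructed pattern, the optional inner group picks up the chunk index (.004) because it is also a short run of word characters, which is how the index ends up in the suffix. The alternative has its own limitation: a name such as xxxxx.2.aaaaa.gz would yield .aaaaa.gz, since a suffix cannot be told apart from a dotted sample id by pattern alone.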

I would also like to point out that the use of $prefix is not ideal in this module. The output of the cat command is redirected to $prefix, although prefix should default to meta.id. I do not think it is good practice to manipulate it outside the context of task.ext.prefix. It may be better to redirect the output to a file whose name concatenates the $prefix and $suffix strings.
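The naming idea can be sketched as follows, again in Python rather than the module's Groovy. `output_name` is a hypothetical helper: it keeps the prefix untouched, appends the suffix only when building the file name, and keeps a collision check similar in spirit to the one that produced the error above.

```python
from pathlib import PurePath


def output_name(prefix: str, suffix: str, input_files: list[str]) -> str:
    """Build the output file name from an unmodified prefix plus a suffix.

    Raises ValueError if the result would overwrite one of the inputs.
    """
    candidate = f"{prefix}{suffix}"
    # Compare against base names only, since inputs are staged by name.
    input_names = {PurePath(path).name for path in input_files}
    if candidate in input_names:
        raise ValueError(
            f"Output name '{candidate}' is the same as an input file name."
        )
    return candidate


inputs = [
    "xxxxx.2.aaaaa.000.bed.gz",
    "xxxxx.2.aaaaa.001.bed.gz",
    "xxxxx.2.aaaaa.002.bed.gz",
]
print(output_name("xxxxx.2.aaaaa", ".bed.gz", inputs))  # xxxxx.2.aaaaa.bed.gz
```

Keeping the prefix and suffix separate means a user-supplied task.ext.prefix would only ever control the stem, and the extension would always be derived from the inputs.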

Command used and terminal output

No response

Relevant files

No response

System information

No response

@k1sauce k1sauce added the bug Something isn't working label Apr 22, 2024