New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump star version #5529
base: master
Are you sure you want to change the base?
Bump star version #5529
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objections against the current implementation, but if you anyway implement a separate input for the index, you could make it optional and have a fallback solution for determining NUM_BASES
with bash in the genomegenerate module?
tuple val(meta), path(fasta) | ||
tuple val(meta2), path(gtf) | ||
tuple val(meta) , path(fasta) | ||
tuple val(meta2), path(fai) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you not make this a tuple val(meta) , path(fasta), path(fai)
input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following the example of a bunch of other modules, AFAIK the guidelines aren't adamant on this. (Can't find where it's specified though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to suggest that your approach is against the rules.
To me, it just seemed easier to keep the Fasta and its associated index in one channel rather than in two separate (which might get out of sync and become disordered).
But if other modules handle it with separate channels, I am fine with that, too. It does not require some possibly confusing .map()
operation to pass it to the tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I agree about that. TBH, when this gets merged, the first thing I do is patch the module and make one tuple of all the args, but that's just for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we have a proper way to deal with it in an automatic way, I'd rather we keep each single reference file separated, otherwise it might get complex and messy.
But I agree that I don't like this too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, an index is just the table of contents for the reference, so conceptually not a separate entity. I think, there is simply too little use for it on its own.
Personally, I find it convenient, if it is automatically attached to the source file like 42basepairs does, but I see why this opinionated approach is discouraged for a module. Thus, I think you were correct separating the channels.
END_VERSIONS | ||
""" | ||
} else { | ||
""" | ||
samtools faidx $fasta | ||
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai` | ||
NUM_BASES=`awk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fai}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there is no need for a specific awk
implementation, since nothing in this code would require gawk
specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to replace it with your bash script, awk
doesn't publish any version info and it breaks the conda tests
@@ -61,8 +59,6 @@ process STAR_GENOMEGENERATE { | |||
cat <<-END_VERSIONS > versions.yml | |||
"${task.process}": | |||
star: \$(STAR --version | sed -e "s/STAR_//g") | |||
samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') | |||
gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if you keep using awk
, you also need to publish its version information.
@@ -28,6 +28,14 @@ input: | |||
description: | | |||
Groovy Map containing reference information | |||
e.g. [ id:'test' ] | |||
- fai: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I would not define a separate channel for the index input, but rather extend the tuple.
Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be two separate PRs (one for Arriba, one for STAR), but hey ho.
Code looks fine, I'm a little concerned that the Docker volume mounts stuff isn't very clear on why it's needed, what it achieves, how it works etc. Future developers will need to know why that exists.
@@ -11,4 +11,4 @@ process { | |||
} | |||
|
|||
// Fix chown issue for the output star folder | |||
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)' | |||
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g) -e "HOME=${HOME}" -v /etc/passwd:/etc/passwd:ro -v /etc/shadow:/etc/shadow:ro -v /etc/group:/etc/group:ro -v $HOME:$HOME' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof. I'm not a fan of injecting extra config into a test, it makes it deviate from a real use case.
At the very least, can we get some comments on why this is here and what it achieves?
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda