Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump star version #5529

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Bump star version #5529

wants to merge 8 commits into from

Conversation

matthdsm
Copy link
Contributor

@matthdsm matthdsm commented Apr 26, 2024

  • bump star version
  • drop mulled container
  • fix tests
  • arriba: pytest > nf-test

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Copy link
Member

@MatthiasZepper MatthiasZepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objections against the current implementation, but if you anyway implement a separate input for the index, you could make it optional and have a fallback solution for determining NUM_BASES with bash in the genomegenerate module?

tuple val(meta), path(fasta)
tuple val(meta2), path(gtf)
tuple val(meta) , path(fasta)
tuple val(meta2), path(fai)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you not make this a tuple val(meta) , path(fasta), path(fai) input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the example of a bunch of other modules, AFAIK the guidelines aren't adamant on this. (Can't find where it's specified though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to suggest that your approach is against the rules.

To me, it just seemed easier to keep the Fasta and its associated index in one channel rather than in two separate (which might get out of sync and become disordered).

But if other modules handle it with separate channels, I am fine with that, too. It does not require some possibly confusing .map() operation to pass it to the tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I agree about that. TBH, when this gets merged, the first thing I do is patch the module and make one tuple of all the args, but that's just for me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we have a proper way to deal with it in an automatic way, I'd rather we keep each single reference file separated, otherwise it might get complex and messy.
But I agree that I don't like this too much

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, an index is just the table of contents for the reference, so conceptually not a separate entity. I think, there is simply too little use for it on its own.

Personally, I find it convenient, if it is automatically attached to the source file like 42basepairs does, but I see why this opinionated approach is discouraged for a module. Thus, I think you were correct separating the channels.

END_VERSIONS
"""
} else {
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`
NUM_BASES=`awk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fai}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, there is no need for a specific awk implementation, since nothing in this code would require gawk specifically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to replace it with your bash script, awk doesn't publish any version info and it breaks the conda tests

@@ -61,8 +59,6 @@ process STAR_GENOMEGENERATE {
cat <<-END_VERSIONS > versions.yml
"${task.process}":
star: \$(STAR --version | sed -e "s/STAR_//g")
samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if you keep using awk, you also need to publish its version information.

@@ -28,6 +28,14 @@ input:
description: |
Groovy Map containing reference information
e.g. [ id:'test' ]
- fai:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, I would not define a separate channel for the index input, but rather extend the tuple.

Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com>
Copy link
Contributor

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be two separate PRs (one for Arriba, one for STAR), but hey ho.

Code looks fine, I'm a little concerned that the Docker volume mounts stuff isn't very clear on why it's needed, what it achieves, how it works etc. Future developers will need to know why that exists.

@@ -11,4 +11,4 @@ process {
}

// Fix chown issue for the output star folder
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)'
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g) -e "HOME=${HOME}" -v /etc/passwd:/etc/passwd:ro -v /etc/shadow:/etc/shadow:ro -v /etc/group:/etc/group:ro -v $HOME:$HOME'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. I'm not a fan of injecting extra config into a test, it makes it deviate from a real use case.

At the very least, can we get some comments on why this is here and what it achieves?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants