Reduce the number of published files from alevin-fry output #178

Open
tomsing1 opened this issue Nov 6, 2022 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

tomsing1 commented Nov 6, 2022

Description of feature

I am using the alevin-fry quantification method. For downstream analysis, I am mainly interested in the final count matrix, i.e.

  1. The content of the af_quant/alevin directory and
  2. The af_quant/quant.json file

Right now, the workflow also publishes many other files, some of them very large, e.g. the af_map output directory or af_quant/alevin/map.collated.rad, which can be tens of gigabytes in size for large experiments.

It would be great to be able to whittle down the published files to reduce the size of the pipeline's output. (After all, the intermediate files are still available in the working directory.)

For example, I am running Nextflow on AWS Batch with an S3 bucket as the publish directory. Copying the output files to the bucket takes many times longer than running the actual workflow (because the publishing is not parallelized*).

*If there is a way to speed this up, I would love to learn!

tomsing1 added the enhancement label Nov 6, 2022

Member

grst commented Nov 17, 2022

I agree we don't need all these intermediate files. Happy to accept a PR!

We could add a --save_align_intermediates option, as in the rnaseq workflow.
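
A minimal sketch of how such an option could be wired up in a Nextflow config, assuming the intermediate files come from a process whose outputs are all intermediate. The process name HYPOTHETICAL_MAP_STEP and the published subdirectory are placeholders, and save_align_intermediates is the proposed parameter, not an existing option of this pipeline. It relies on the publishDir behaviour that a saveAs closure returning null skips publishing that output.

```groovy
// Sketch only: names marked "hypothetical" are not taken from the pipeline.
params.save_align_intermediates = false   // proposed flag, defaults to not publishing

process {
    // Hypothetical process name; look up the real one in conf/modules.config.
    withName: 'HYPOTHETICAL_MAP_STEP' {
        publishDir = [
            path: { "${params.outdir}/alevin" },   // hypothetical target directory
            mode: 'copy',
            // Returning null from saveAs tells Nextflow not to publish this output.
            saveAs: { filename -> params.save_align_intermediates ? filename : null }
        ]
    }
}
```

Note that saveAs is called once per declared output, so if a process emits a whole directory (for example one containing both af_map and af_quant), files inside it cannot be filtered this way. In that case the module would probably need to declare the count matrix directory and quant.json as separate outputs first.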
