-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: update datavzrd to wrapper v3.10.2
datavzrd.smk
#92
base: main
Are you sure you want to change the base?
Conversation
this then uses `datavzrd=2.36.10`
@@ -93,6 +93,9 @@ enrichment: | |||
# the species specified by resources -> ref -> species above | |||
pathway_database: "panther" | |||
|
|||
report: | |||
offer_excel: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offer_excel: true | |
offer_excel: false |
Depending on the size of the input data this might cause extremely long runtimes as the excel writer in rust requires to read all of the input data into memory before one can write it out:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for checking the template and for the clarification!
Did you already check for better options for xlsx file writing? Have you for example seen this one:
https://docs.rs/simple-xlsx-writer/latest/simple_xlsx_writer/
At least it doesn't read everything into memory. But not sure, if it will be any quicker and if the interface works for how you parse stuff in datavzrd...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is literally the one we are using right now 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Maybe switch to something else, then... 🤔
But yeah, I think this switch of the default here is good enough for now. I'll add a comment warning of the performance hit when activating this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Maybe switch to something else, then... 🤔
I am not sure there is something that would allow something like flushing the writer every thousand records for example. But it'd great to have something like that for sure. Might even have to do with excel itself maybe?
But yeah, I think this switch of the default here is good enough for now. I'll add a comment warning of the performance hit when activating this...
Yeah a warning when manually activated sounds good!
based on this very approximate statement, but started with a lower estimate here: pachterlab/sleuth#139 (comment) so we should increase this if we see this failing with out of memory on any real datasets
this then uses
datavzrd=2.36.10