Reduce Sequali and (slightly) FastQC memory footprint #2516

Merged
merged 3 commits into from Apr 30, 2024

rhpvorderman
Contributor

  • This comment contains a description of changes (with reason)

All I have done for FastQC is delete the large report data that it keeps in memory for aggregation once that data is no longer needed. Any modules and MultiQC steps that run after the FastQC module can then reuse this memory.
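As a rough illustration of the FastQC change, the sketch below drops the only long-lived reference to a large payload after aggregation. The class, attribute, and sample names are hypothetical and are not taken from the MultiQC code base; the weak reference exists only to show that the payload becomes collectable.

```python
import gc
import weakref


class ReportData(dict):
    """Dict subclass, used only so a weak reference can observe the payload."""


class HypotheticalModule:
    """Stand-in for a module that parses large reports (not MultiQC's class)."""

    def __init__(self, parsed):
        self.report_data = parsed  # large per-sample payload
        self.summary = {}

    def aggregate(self):
        # Keep only the small summary that later steps need.
        self.summary = {name: len(values) for name, values in self.report_data.items()}
        # Drop the attribute so the payload no longer lives as long as the module.
        del self.report_data


parsed = ReportData(sample_a=list(range(1000)), sample_b=list(range(500)))
probe = weakref.ref(parsed)  # lets us check whether the payload is still alive
module = HypotheticalModule(parsed)
del parsed                   # the module now holds the only strong reference
module.aggregate()
gc.collect()
```

After `aggregate()` returns, the module object still exists with its summary, but nothing references the payload any more, so the interpreter is free to reuse that memory.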

For Sequali the changes are much more substantial. I thought hard about using clever methods, but these would always have complicated the code considerably. Simply loading everything into memory and aggregating afterwards is much simpler.

So I did the same as for FastQC: the data is no longer stored as a class variable but as a normal variable that is passed to the class functions. The effect is the same as for FastQC: once the variable goes out of scope, the memory can be reused. On top of that, I added a pruning function that removes all data not used by MultiQC from each JSON sample dictionary immediately after loading. This pruning saves a large amount of memory. According to memray, it reduces the memory used by Sequali from 600+ MiB to just 150 MiB for 800 reports!
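A minimal sketch of the two ideas described above: pruning each sample dictionary right after it is parsed, and passing the data around as a local variable instead of storing it on the class. The key names, function names, and the `total_reads` field are assumptions made for illustration; the actual keys kept by the pruning function are in the PR diff.

```python
import json

# Hypothetical top-level keys that the report uses; everything else is dropped.
KEYS_USED = {"summary", "per_position_mean_quality"}


def prune(sample):
    """Return a new dict containing only the keys listed in KEYS_USED."""
    return {key: value for key, value in sample.items() if key in KEYS_USED}


def load_samples(raw_reports):
    """Parse each JSON document and prune it before parsing the next one."""
    data = {}
    for name, text in raw_reports.items():
        # The full parsed dict is only alive for this one expression.
        data[name] = prune(json.loads(text))
    return data


def build_general_stats(data):
    """Example consumer that receives the data as an argument."""
    return {name: sample["summary"]["total_reads"] for name, sample in data.items()}


def run(raw_reports):
    data = load_samples(raw_reports)  # local variable, not an attribute
    return build_general_stats(data)  # data becomes unreachable after return
```

Because the pruned dictionaries are the only ones that stay referenced while the remaining files are parsed, peak memory scales with the data that is actually used rather than with the full size of each report.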

I did not add "Sequali:" in front of the title, because v1.22, the release that contains Sequali, has not been released yet.

Member

@vladsavelyev vladsavelyev left a comment

Perfect!

That's something I started doing for some modules, e.g. Picard:

data_by_sample = dict()
histogram_by_sample = dict()

but didn't get around to completing it for all of them.

@vladsavelyev vladsavelyev added this to the MultiQC v1.22: Pydantic milestone Apr 30, 2024
@vladsavelyev vladsavelyev merged commit e534f51 into MultiQC:main Apr 30, 2024
6 checks passed
@rhpvorderman rhpvorderman deleted the sequalirefactor branch April 30, 2024 14:13
@rhpvorderman
Contributor Author

Thanks for merging!
