Reduce Sequali and (slightly) FastQC memory footprint #2516

rhpvorderman · 2024-04-30T09:36:36Z

This comment contains a description of changes (with reason)

All I have done for FastQC is free the large amount of memory it uses to store the report data for aggregation once that data is no longer needed. Any modules and other MultiQC code that run after the FastQC module can then reuse this memory.
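As a rough illustration of why this helps (not the actual FastQC module code), here is a minimal sketch. The class names `KeepsData` and `ReleasesData`, the `ReportData` dict subclass and the `build` helper are all hypothetical, and the immediate release relies on CPython's reference counting.

```python
import weakref


class ReportData(dict):
    """Plain dict subclass, used only so a weak reference can track it."""


class KeepsData:
    """Hypothetical module that stores the parsed data on the instance."""

    def __init__(self, parsed):
        self.data = parsed  # stays alive as long as the module object does
        self.n_samples = len(parsed)


class ReleasesData:
    """Hypothetical module that keeps only the small summary it needs."""

    def __init__(self, parsed):
        # parsed is passed along as an argument and never stored on self
        self.n_samples = self.count_samples(parsed)

    @staticmethod
    def count_samples(parsed):
        return len(parsed)


def build(module_cls):
    """Create a module and return it with a weak reference to its input."""
    parsed = ReportData(sample_a=[0] * 1000, sample_b=[1] * 1000)
    ref = weakref.ref(parsed)
    module = module_cls(parsed)
    # When this function returns, the local name 'parsed' goes out of scope.
    return module, ref
```

With `KeepsData` the weak reference still resolves after `build` returns, because the instance holds on to the data. With `ReleasesData` it resolves to `None` under CPython, meaning the memory is available to whatever runs next.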

For sequali the changes are much more substantial. I thought hard about using clever methods, but these were always going to complicate the code a lot. Simply loading everything into memory and aggregating later is just so much simpler.

So I did the same as for FastQC: the data is no longer stored as a class variable but as a normal variable that is passed to the class functions. This accomplishes the same thing as in FastQC: once the variable goes out of scope, the memory can be used again. On top of that, I added a pruning function that removes all data not used by MultiQC from each sample's JSON dictionary immediately after loading. This pruning saves a massive amount of memory. According to memray, it reduces the memory used by sequali from 600+ MiB to just 150 MiB for 800 reports!
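The pruning idea can be sketched roughly as follows. This is not the code from the PR: the key names in `WANTED_KEYS` and the helpers `prune` and `load_pruned` are made up for illustration, and the real set of keys the sequali module keeps is not listed in this thread.

```python
import json

# Hypothetical top-level keys that the report actually uses.
WANTED_KEYS = frozenset({"summary", "per_position_quality"})


def prune(sample: dict, wanted=WANTED_KEYS) -> dict:
    """Return a new dict holding only the wanted top-level keys."""
    return {key: value for key, value in sample.items() if key in wanted}


def load_pruned(json_text: str) -> dict:
    """Parse one report and prune it before anything else references it."""
    # The full parsed dict is only a temporary here, so the unused entries
    # can be freed as soon as this function returns.
    return prune(json.loads(json_text))
```

Because each sample is pruned right after loading, only the small pruned dictionaries accumulate while the reports are read, rather than the full JSON content of every report.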

I did not add "Sequali:" in front of the title, as v1.22, the release that contains sequali, has not been released yet.

vladsavelyev

Perfect!

That's something I started doing for some modules, e.g. Picard

MultiQC/multiqc/modules/picard/WgsMetrics.py

Lines 17 to 18 in 931de5d

    data_by_sample = dict()
    histogram_by_sample = dict()

but didn't get around to completing it for all of them.

rhpvorderman · 2024-04-30T14:15:34Z

Thanks for merging!

rhpvorderman added 3 commits April 30, 2024 09:12

Do not store data in module object

b83a485

Remove FastQC permanent data storage

8cd37d3

Prune sequali dictionaries for much reduced memory usage

52cde3d

rhpvorderman mentioned this pull request Apr 30, 2024

Reduce MultiQC memory consumption #2517

Open

7 tasks

vladsavelyev approved these changes Apr 30, 2024


vladsavelyev added the module: enhancement label Apr 30, 2024

vladsavelyev added this to the MultiQC v1.22: Pydantic milestone Apr 30, 2024

vladsavelyev merged commit e534f51 into MultiQC:main Apr 30, 2024
6 checks passed

rhpvorderman deleted the sequalirefactor branch April 30, 2024 14:13
