
Faster JSON dumping. #2518

Open
rhpvorderman opened this issue Apr 30, 2024 · 1 comment

Comments

@rhpvorderman
Contributor

Description of feature

As suggested in #1920, there might be possibilities to speed up the JSON serialization. Unfortunately, many of the "fast" JSON libraries do not support streaming to a file, so using Python's native json module seems the best option.

I profiled the current MultiQC and it spends a fair bit of time (roughly 30%) on dumping the JSON. Essentially it is done three times:

  • Once to generate the JSON that is embedded in the HTML.
  • Once for each report section, to check whether it is serializable.
  • Once to write all the report sections to the data file.
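Profiling like this can be reproduced with the standard library. Below is a minimal sketch: the `payload` dict is a hypothetical stand-in for MultiQC's report data, and the loop mimics the three dump passes listed above.

```python
import cProfile
import io
import json
import pstats

# Hypothetical payload standing in for the real MultiQC report data.
payload = {"sections": [{"id": i, "values": list(range(100))} for i in range(100)]}

profiler = cProfile.Profile()
profiler.enable()
for _ in range(3):  # mimic the three separate JSON dumps
    json.dumps(payload)
profiler.disable()

# Capture the profiler output to see where the time goes.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Running this against the real report object (rather than the toy `payload` above) is how the ~30% figure can be checked.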

Technically it could be possible to create the JSON dump only once, as an in-memory gzip blob. That blob can be base64-encoded for the embedded HTML, and written out decompressed to the data file. However, that loses the ability to selectively truncate misbehaving reports for the data file. Using the --no-data-dir option already makes sure the dump is only done once.
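The single-dump idea above can be sketched as follows. This is not MultiQC's implementation; `report_data` and the file path are hypothetical, and the point is only that both outputs derive from one `json.dumps` call.

```python
import base64
import gzip
import json
import tempfile

# Hypothetical report payload standing in for MultiQC's report sections.
report_data = {"general_stats": {"sample_1": {"reads": 1_000_000}}}

# Serialize once, keep only the compressed blob in memory.
raw = json.dumps(report_data).encode("utf-8")
blob = gzip.compress(raw)

# Variant 1: base64-encode the gzip blob for embedding in the HTML.
embedded = base64.b64encode(blob).decode("ascii")

# Variant 2: decompress the same blob and write it to the data file.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as fh:
    fh.write(gzip.decompress(blob))
    data_path = fh.name
```

The trade-off mentioned above applies: because the blob is opaque once compressed, a misbehaving section can no longer be dropped from the data file after the fact.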

In general I think it is not worth the effort, as it is a simple CPU-bound problem and not much of an issue in the context of very extensive workflows. With the --no-data-dir option the speed is already close to optimal. I just want to report my findings here. If I find a JSON library that can actually dump JSON faster while streaming to a file, I will report it here.

@vladsavelyev
Member

vladsavelyev commented Apr 30, 2024

Thanks for putting up an issue, it helps to have it written up in a structured way!

We actually have on the roadmap replacing JSON as an intermediate format for data with something like Parquet: #1256

So we will be looking into this more when we start working on that. Agreed that writing JSON three times is suboptimal and can be improved.
