As suggested in #1920, there may be opportunities to speed up the JSON serialization. Unfortunately, many of the "fast" JSON libraries do not support streaming to a file, so using Python's native `json` module seems the best option.
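To illustrate the streaming point: the native `json` module can serialize directly to an open file handle with `json.dump`, which avoids building the whole output as one large intermediate string the way string-only serializers require. A minimal sketch (the data is made up for illustration):

```python
import json
import os
import tempfile

# Hypothetical report-like payload, just for demonstration.
data = {"samples": {f"s{i}": list(range(5)) for i in range(3)}}

# json.dump streams the serialized output straight to the file handle,
# so no full JSON string is ever held in memory at once.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(data, fh)
    path = fh.name

# Round-trip check: the file contains the same structure.
with open(path) as fh:
    restored = json.load(fh)

os.unlink(path)
```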
I profiled the current MultiQC and it spends a fair bit of time (roughly 30%) on dumping the JSON. Essentially this is done three times:
- Once to generate the JSON that is embedded in the HTML.
- Once for each report section, to check whether it is serializable.
- Once to write all the report sections to the data file.
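The serializability check above is one of the three full dumps. A minimal sketch of what such a check might look like (`is_serializable` is a hypothetical helper, not MultiQC's actual code): each section is dumped once just to see whether it raises, and the result is thrown away.

```python
import json

def is_serializable(section_data):
    # Hypothetical check: attempt a full dump and discard the result.
    # This is why the check costs a complete serialization pass.
    try:
        json.dumps(section_data)
        return True
    except (TypeError, ValueError):
        return False
```

For example, a plain dict passes, while a dict containing a `set` (which `json` cannot encode) fails.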
Technically, it could be possible to create the JSON dump only once, as an in-memory gzip blob. That blob could be base64-encoded for the embedded HTML, and decompressed when writing the data file. However, that loses the ability to selectively truncate misbehaving report sections for the data file. Using the --no-data-dir option already ensures the dump happens only once.
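The single-dump idea can be sketched as follows: serialize once, gzip the result in memory, then reuse that one blob for both outputs. This is only an illustration of the approach described above, not MultiQC code; the payload and function names are made up.

```python
import base64
import gzip
import json

def dump_once(data):
    """Hypothetical helper: serialize once and gzip the result in memory."""
    raw = json.dumps(data).encode("utf-8")
    return gzip.compress(raw)

# Example payload standing in for the report data.
data = {"report": {"samples": [1, 2, 3]}}
blob = dump_once(data)

# Reuse 1: base64-encode the gzip blob for embedding in the HTML.
embedded = base64.b64encode(blob).decode("ascii")

# Reuse 2: decompress the same blob when writing the plain data file.
restored = json.loads(gzip.decompress(blob))
```

Note that because the blob is produced in one shot, there is no point at which an individual misbehaving section could be dropped, which is the trade-off mentioned above.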
In general I don't think it is worth the effort: this is a simple CPU-bound problem, and it is not much of an issue in the context of very extensive workflows. With the --no-data-dir option the speed is already pretty much optimal. I just want to report my findings here. If I find a JSON library that can actually dump JSON faster while streaming to a file, I will report it here.