Release Increase efficiency of CAG construction · Golob-Minot/geneshot

This release refactors the CAG construction process to make it much more efficient. The changes made toward that goal are:

Saving all gene abundances in Zarr format to speed up the process of reading subsets in each shard
Grouping genes into CAGs before constructing each DataFrame to reduce the total memory burden
Increasing the number of shards used to initiate the CAG construction by 10X
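
As a rough illustration of the sharding change, the sketch below splits a set of gene indices into contiguous, near-equal shards. It is not the geneshot implementation: the function name `make_shards` and the gene and shard counts are hypothetical, chosen only to show that using 10X more shards leaves each shard with roughly one tenth as many genes to read and cluster.

```python
def make_shards(n_genes, n_shards):
    """Split gene indices 0..n_genes-1 into n_shards contiguous, near-equal ranges.

    Hypothetical helper for illustration only; not part of geneshot.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    # Every shard gets `base` genes; the first `extra` shards get one more.
    base, extra = divmod(n_genes, n_shards)
    shards = []
    start = 0
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


# Illustrative numbers only (not taken from the release notes).
before = make_shards(1000, 10)    # 10 shards of 100 genes each
after = make_shards(1000, 100)    # 100 shards of 10 genes each
```

Each shard only needs the abundances for its own range of genes, which is the access pattern that a chunked format such as Zarr is meant to serve well.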

We tested this release on a real-world dataset and found that the CAGs it generated closely matched those from the previous release. In addition, the initial step of CAG construction could be run in parallel across ~10X more nodes, which reduced the total time to answer while keeping the total compute time constant. In other words, we expect the cost of analysis to be unchanged, but the results should be available more quickly.
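
The relationship between time to answer and total compute time can be sketched with simple arithmetic. The example below assumes the work divides evenly across nodes with no scheduling or I/O overhead. The figure of 100 node-hours is made up for illustration and is not a measurement from this release.

```python
def wall_clock_hours(total_node_hours, n_nodes):
    """Time to answer if the work is split evenly across n_nodes (idealized)."""
    return total_node_hours / n_nodes


def billed_node_hours(total_node_hours, n_nodes):
    """Total compute consumed: every node runs for the full wall-clock time."""
    return wall_clock_hours(total_node_hours, n_nodes) * n_nodes


# Hypothetical workload of 100 node-hours for the initial step.
hours_on_10_nodes = wall_clock_hours(100, 10)    # 10.0 hours to answer
hours_on_100_nodes = wall_clock_hours(100, 100)  # 1.0 hour to answer
cost_on_10_nodes = billed_node_hours(100, 10)    # 100.0 node-hours
cost_on_100_nodes = billed_node_hours(100, 100)  # 100.0 node-hours
```

Under these idealized assumptions, spreading the same work over 10X more nodes shortens the wait by 10X while the node-hours consumed stay the same.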
