Increase memory efficiency of `refineCAGs`

github-actions released this 21 Sep 16:33

The updates in this minor release focus mostly on memory optimizations in the `refineCAGs` step.

The changes I found most helpful for reducing the memory burden of this step were:

- Updating the `cag_membership` table (which links genes to CAGs) with new CAG IDs is best done by initializing an entirely new DataFrame.
- Building a single dict uses much less memory than concatenating a set of two-column DataFrames.
- The index can be dropped from the input CAG membership table immediately after reading in each shard.

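A minimal pandas sketch of the three techniques above, assuming a membership table with hypothetical `gene` and `CAG` columns. The function names, column names, and shard contents are illustrative only and are not the pipeline's actual implementation; shards are passed in as DataFrames rather than read from disk.

```python
import pandas as pd


def read_membership_shards(shards):
    """Combine shards of gene-to-CAG membership into one table.

    Each shard's index is dropped as soon as the shard is "read",
    so no per-shard index is kept alive while the shards are combined.
    """
    cleaned = []
    for shard in shards:
        # In a real pipeline this would be a read from disk; here each
        # shard is already a DataFrame (hypothetical stand-in).
        cleaned.append(shard.reset_index(drop=True))
    return pd.concat(cleaned, ignore_index=True)


def build_gene_to_cag(groups):
    """Build one flat dict of gene -> new CAG ID.

    `groups` is an iterable of (new_cag_id, genes) pairs (hypothetical
    input shape). Filling a single dict avoids creating, and then
    concatenating, one small two-column DataFrame per CAG.
    """
    gene_to_cag = {}
    for new_cag_id, genes in groups:
        for gene in genes:
            gene_to_cag[gene] = new_cag_id
    return gene_to_cag


def relabel_membership(membership, gene_to_cag):
    """Return a brand-new membership table with the new CAG IDs.

    Rather than overwriting values in the existing table, a fresh
    DataFrame is initialized from the gene column and the mapped IDs.
    """
    genes = membership["gene"]
    return pd.DataFrame({
        "gene": genes.values,
        "CAG": genes.map(gene_to_cag).values,
    })


# Example with made-up gene names and CAG IDs
shards = [
    pd.DataFrame({"gene": ["g1", "g2"], "CAG": [0, 0]}, index=[10, 11]),
    pd.DataFrame({"gene": ["g3"], "CAG": [1]}, index=[5]),
]
membership = read_membership_shards(shards)
gene_to_cag = build_gene_to_cag([(0, ["g1", "g3"]), (1, ["g2"])])
new_membership = relabel_membership(membership, gene_to_cag)
print(new_membership)
```

Each dict insert adds only a key and a value, whereas every small DataFrame carries its own index and column metadata until the final concatenation, which is roughly why the single dict is lighter. Once the new table exists, the old one can be released (for example with `del membership`).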

This release contains only performance improvements over v0.8.2; the results are not expected to differ.
