Increase memory efficiency of `refineCAGs`

@github-actions github-actions released this 21 Sep 16:33

The updates in this minor release focus mostly on memory optimizations in the `refineCAGs` step.

The changes that I found most helpful for reducing the memory burden of this step were:

  • Updating the cag_membership table (linking genes and CAGs) with new CAG IDs is best done by initializing an entirely new DataFrame.
    Building a single dict is much better than concatenating a set of two-column DataFrames.
  • The index of the input CAG membership table can be dropped immediately after reading in each shard.

This release contains only performance improvements over v0.8.2; no differences in the results are expected.
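
The two memory optimizations listed above can be illustrated with a minimal pandas sketch. This is not the actual `refineCAGs` code: the function name `update_cag_membership`, the column names `gene` and `CAG`, and the `new_cag_ids` mapping are all hypothetical, chosen only to show the pattern of dropping each shard's index on read and building one dict before creating a single new DataFrame.

```python
import pandas as pd


def update_cag_membership(shards, new_cag_ids):
    """Build a fresh gene-to-CAG table from shards of CAG membership.

    shards      -- iterable of DataFrames with (assumed) columns "gene" and "CAG"
    new_cag_ids -- dict mapping old CAG IDs to new CAG IDs (assumed input)
    """
    gene_to_cag = {}

    for shard in shards:
        # Drop the index right after reading the shard so it is not carried along
        shard = shard.reset_index(drop=True)

        # Accumulate into one dict instead of collecting small
        # two-column DataFrames to concatenate later
        for gene, old_cag in zip(shard["gene"], shard["CAG"]):
            gene_to_cag[gene] = new_cag_ids.get(old_cag, old_cag)

    # Initialize an entirely new DataFrame from the dict in one step
    return pd.DataFrame(
        {
            "gene": list(gene_to_cag.keys()),
            "CAG": list(gene_to_cag.values()),
        }
    )
```

The dict holds only one Python object per gene, whereas a list of intermediate DataFrames each carries its own index and column overhead before `pd.concat` copies everything again.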