Break up results to save memory #49

ajenhl · 2016-07-04T04:50:23Z

Some operations are memory hogs when operating on large results. For example, diff-reduce starts with one pandas DataFrame holding all of the results, and gradually builds up another DataFrame with a subset of those results. When the CSV file for the full results is several gigabytes in size, this ends up using a lot of RAM.

It is probably worth breaking the results into chunks where possible, and writing each chunk out to disk as it is processed. So, for example, diff-reduce could append each processed group of results to a CSV file rather than keeping them all in memory.
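
A minimal sketch of that idea, not the actual diff-reduce code: it streams the results CSV through pandas in fixed-size chunks and appends each processed chunk to an output file. The names `reduce_in_chunks` and `process_chunk` and the default chunk size are assumptions made for illustration.

```python
import pandas as pd


def reduce_in_chunks(input_path, output_path, process_chunk, chunksize=100_000):
    """Stream a large results CSV through process_chunk, appending to output_path.

    process_chunk is a hypothetical callable that takes a DataFrame and
    returns the (usually smaller) DataFrame of rows to keep.
    Returns the total number of rows written.
    """
    rows_written = 0
    # With chunksize set, read_csv returns an iterator of DataFrames, so only
    # one chunk of the input is held in memory at a time.
    reader = pd.read_csv(input_path, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        processed = process_chunk(chunk)
        # The first chunk creates the file and writes the header row;
        # later chunks are appended without repeating the header.
        processed.to_csv(
            output_path,
            mode="w" if i == 0 else "a",
            header=(i == 0),
            index=False,
        )
        rows_written += len(processed)
    return rows_written
```

One caveat with this approach: fixed-size chunks can split a group of related rows across two chunks, so an operation that must see a whole group at once would need the input sorted by the grouping key and the chunk boundaries aligned to it.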

ajenhl added the enhancement label Jul 4, 2016

ajenhl self-assigned this Jul 4, 2016
