
WIP: Create individual calculators in Compute Studio runs #95

Open · wants to merge 1 commit into master
Conversation

andersonfrailey (Collaborator)

This PR implements the idea @hdoupe proposed in issue #94. I've modified the `run` function of the `TaxBrain` object to accept a new argument, `cs_run`, that changes when calculator objects are created in Compute Studio runs. Right now the tests are failing, but I'm hoping to get that fixed soon.

I'm not sure what the best way is to profile memory usage/speed in order to compare performance. Any ideas, @hdoupe?
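One stdlib-only way to compare the two code paths is to wrap each call with `time.perf_counter` and `tracemalloc` (a sketch; `workload` below is a stand-in for an actual `tb.run()` call, not part of Tax-Brain):

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Return (result, elapsed_seconds, peak_bytes) for a single call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in workload; in practice this would compare
# tb.run(cs_run=True) against a plain tb.run()
def workload(n):
    return sum(range(n))

res, secs, peak = profile(workload, 100_000)
print(f"result={res} time={secs:.3f}s peak={peak / 1e6:.2f}MB")
```

Note that `tracemalloc` only tracks Python-level allocations, so NumPy buffers allocated inside taxcalc may need a process-level tool instead.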

hdoupe (Collaborator) commented Nov 4, 2019

@andersonfrailey This looks like the right idea. I'm inclined to offer something like a generic run_parallel argument instead of cs_run that users (including c/s) can take advantage of if they have the compute resources.
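A sketch of what a generic opt-in flag could look like (class and method names here are illustrative stand-ins, not Tax-Brain's actual API):

```python
# Hypothetical sketch: a generic run_parallel flag on TaxBrain.run that
# any user with sufficient compute resources (including c/s) can opt into.
class TaxBrain:
    def run(self, run_parallel=False):
        # Dispatch to a dask-backed path only when the caller opts in;
        # the default stays serial so behavior is unchanged for most users.
        if run_parallel:
            return self._run_parallel()  # e.g. submit per-year tasks to dask
        return self._run_serial()

    def _run_serial(self):
        return "serial"

    def _run_parallel(self):
        return "parallel"

tb = TaxBrain()
print(tb.run())                   # serial by default
print(tb.run(run_parallel=True))  # opt-in parallel path
```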

I'll have time to test #95 out this afternoon/early tomorrow and will report back.

hdoupe (Collaborator) commented Nov 6, 2019

@andersonfrailey I refactored some of the changes you made in this PR so that functions, rather than methods on `TaxBrain`, are passed to dask. For some reason, distributed doesn't work very well when large Python objects like `TaxBrain` or `taxcalc.Calculator` are passed to it. To get around this, I moved the methods `_cs_run`, `_cs_static_run`, and `_cs_dynamic_run` to functions in `utils.py`, and I passed the new `utils._cs_run` function and its arguments to the dask workers. With 4 workers, each with 6 GB RAM, `tb.run(csparams=True)` finished in about 30 seconds. Here's the branch with the changes: https://github.com/hdoupe/Tax-Brain/tree/csmemory

(I made these changes mostly to figure out what the bottleneck was, but if you think they are helpful, I'm happy to help clean them up/re-think how they could better fit with Tax-Brain's API)
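The serialization cost of shipping a large object with every task can be illustrated with the stdlib `pickle` module (a toy sketch; the dict below stands in for a `TaxBrain` or `taxcalc.Calculator` instance):

```python
import pickle

# Stand-in for a large object like TaxBrain / taxcalc.Calculator
big_object = {"records": list(range(200_000))}

# dask pickles each task's function reference and arguments before sending
# them to workers. A task that carries the big object pays the full
# serialization cost on every dispatch...
task_with_object = pickle.dumps(("run_year", 2021, big_object))

# ...while a task built from a module-level function name and small
# arguments stays tiny, which is why moving the methods to utils.py helped.
task_small = pickle.dumps(("run_year", 2021, {"offset": 1}))

print(len(task_with_object), len(task_small))
```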

I ran into the initial issue with passing large Python objects to the dask workers again when this code was run in `cs_config/functions.py`:

```python
from dask import delayed, compute

delayed_list = []
for year in range(start_year, end_year + 1):
    print('delaying for', year)
    delay = delayed(nth_year_results)(tb, year, user_mods, fuzz)
    delayed_list.append(delay)
results = compute(*delayed_list)
```

I rewrote these lines so that they didn't use dask, since I wasn't sure how to run them without passing the TaxBrain object:

https://github.com/hdoupe/Tax-Brain/blob/15512a69ab9e357b4ba4b285280935408148cc14/cs-config/cs_config/functions.py#L187-L190
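The dask-free version amounts to a plain per-year loop. A minimal sketch, with `nth_year_results` stubbed out as a stand-in for the real cs-config helper:

```python
# Stand-in for cs_config's nth_year_results; the real helper computes and
# fuzzes one year of results from the TaxBrain object.
def nth_year_results(tb, year, user_mods, fuzz):
    return {"year": year, "fuzzed": fuzz}

tb, user_mods, fuzz = object(), {}, False
start_year, end_year = 2021, 2023

# Serial rewrite: call the helper directly instead of wrapping it in
# dask.delayed, so the large TaxBrain object is never serialized.
results = [
    nth_year_results(tb, year, user_mods, fuzz)
    for year in range(start_year, end_year + 1)
]
print([r["year"] for r in results])
```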

With all of these changes, the total run time is about 150 seconds. This is about the same as the simulation times that we get on Compute Studio now. However, if we can get the table creation code parallelized, then I think we could get the run time down significantly. Without dask, it takes about 9 seconds/year to create the outputs. If we can parallelize that, maybe we can get it down to 30-45 seconds instead of 90 seconds.
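The arithmetic: roughly ten years at ~9 seconds each is ~90 seconds serially, so two-to-three-way parallelism over years would land in the 30-45 second range. A stdlib sketch of the per-year fan-out (a thread pool is used here only to show the shape; CPU-bound table creation would need processes or dask workers to actually speed up, and `build_tables` is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def build_tables(year):
    """Stand-in for the ~9 s/year output-creation step."""
    return {"year": year, "tables": f"tables-{year}"}

years = range(2021, 2031)  # ten years, matching the ~90 s serial estimate

# Fan the per-year table creation out across 3 workers; with near-linear
# scaling, ~90 s of serial work drops to roughly 30 s of wall time.
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(build_tables, years))

print(len(outputs))
```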

I measured this via the cs-config test:

```bash
py.test cs-config/cs_config/tests/test_functions.py::TestFunctions1::test_run_model -s -v
```

I set up the dask workers by opening three terminal tabs:

1. Terminal to run the tests:

```bash
export DASK_SCHEDULER_ADDRESS=127.0.0.1:8786
conda activate taxbrain-dev
py.test cs-config/cs_config/tests/test_functions.py::TestFunctions1::test_run_model -s -v
```

2. Terminal for the distributed scheduler:

```bash
conda activate taxbrain-dev
dask-scheduler
```

3. Terminal for the workers (2 processes, 2 threads per process, 12 GB per process):

```bash
conda activate taxbrain-dev
dask-worker 127.0.0.1:8786 --nprocs 2 --nthreads 2 --memory-limit 12G
```

hdoupe (Collaborator) commented Nov 6, 2019

After looking at the run time for `tb.run()` in serial in #98 (comment), it looks like the parallel version of `tb.run()` is slower than the serial version by about 10 seconds. I still think we can see some speed-ups if we can create the outputs without passing the TaxBrain object through dask. I think this is the case because we'd be able to run the simulations and create the outputs for a given year without having to serialize the initial results on the workers, bring them back into the main process, and then serialize them and send them back to the workers to create the outputs.
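A sketch of that fused-task idea: run the simulation and build the outputs for a year inside a single task, so the intermediate results never leave the worker (all functions here are stand-ins, and a thread pool stands in for dask workers):

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(year):
    """Stand-in for the per-year simulation."""
    return {"year": year, "raw": year * 2}

def make_outputs(sim):
    """Stand-in for table creation on the simulation results."""
    return {"year": sim["year"], "table": sim["raw"] + 1}

def fused_task(year):
    # Simulation and output creation happen in one task, so the
    # intermediate `sim` dict never round-trips: no serialize -> main
    # process -> re-serialize -> back to a worker.
    return make_outputs(simulate(year))

with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(fused_task, range(2021, 2024)))

print(outputs)
```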
