[WIP] Create separate Dask worker numbers for tax func estimation vs. model solution #566
Conversation
@jdebacker I am currently running the new
I am going to continue working on this PR.
```python
RAM_stats = psutil.virtual_memory()
RAM_total_bytes = RAM_stats.total
RAM_total_GB = RAM_total_bytes / 1073741824  # 1024**3 bytes per GB
mem_per_wkr_txf = 3.5  # Memory per worker (GB) in tax function estimation
```
We may want to put these amounts for the memory per worker in tax function estimation and in the model solution in `constants.py`, so they only need to be adjusted in one place.
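A sketch of what moving these amounts into `constants.py` might look like; the constant names are assumptions, not the names used in the PR:

```python
# constants.py (sketch; constant names are hypothetical)
# Approximate peak memory footprint per Dask worker, in GB.
MEM_PER_WKR_TXF = 3.5   # tax function estimation (observed range 3.2-3.8 GB)
MEM_PER_WKR_MOD = 0.05  # model solution (rough guess, to be profiled)
```

Both the run script and the CS config could then import these values rather than hard-coding them in two places.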
@jdebacker Agreed. Plus, I just guessed at the `mem_per_wkr_mod=0.05` amount. But I know that it is much less than a gigabyte, so that the processor constraint rarely binds. I'll go through and profile the memory footprint before I remove the WIP designation from this PR.
@rickecon This is looking good! Thanks for doing this! A reminder to as
@rickecon Re your issue with
It might be worthwhile to put the following lines into their own function (e.g., in
And then write a test for this function to be sure things are working as expected (though I'm not sure offhand how to test with some large data to see whether the limits are hit or not, but you might just use the
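A minimal sketch of what such a helper could look like, following the suggestion above. The function name, signature, and location are assumptions; RAM is injectable so a unit test can exercise the limits without needing a large machine:

```python
# Hypothetical helper (e.g., in utils.py) wrapping the RAM-detection lines
# so they can be unit tested. Names and signature are assumptions.
import math


def get_num_workers(mem_per_wkr_gb, ram_total_bytes=None, max_workers=None):
    """Return the number of Dask workers that fit in RAM.

    ram_total_bytes can be injected for testing; when None, the machine's
    actual RAM is detected. max_workers optionally caps the result
    (e.g., at the processor count).
    """
    if ram_total_bytes is None:
        import psutil  # third-party; imported lazily so tests can inject RAM
        ram_total_bytes = psutil.virtual_memory().total
    ram_total_gb = ram_total_bytes / 1073741824
    n = math.floor(ram_total_gb / mem_per_wkr_gb)
    if max_workers is not None:
        n = min(n, max_workers)
    return max(1, n)
```

A test could then pass a synthetic `ram_total_bytes` to check both the memory-bound case and the capped case deterministically.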
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #566      +/-   ##
==========================================
+ Coverage   84.18%   84.19%   +0.01%
==========================================
  Files          47       47
  Lines        6657     6664       +7
==========================================
+ Hits         5604     5611       +7
  Misses       1053     1053
```
Continue to review full report at Codecov.
The Compute.Studio parallelization for OG-USA is currently set to 5 workers for both tax function estimation and the model run (see line 160 of `./cs-config/cs_config/functions.py`). However, @hdoupe added a TODO in lines 161-164 of `./cs-config/cs_config/functions.py` saying to set
I also need to make sure that I close the
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #566      +/-   ##
==========================================
+ Coverage   84.00%   84.19%   +0.18%
==========================================
  Files          47       47
  Lines        6848     6664     -184
==========================================
- Hits         5753     5611     -142
+ Misses       1095     1053      -42
```
Continue to review full report at Codecov.
This PR addresses the high memory requirements in the parallelization of the tax function estimation operations of OG-USA versus the lower memory requirements of the parallelization in the model solution. Some of these issues are addressed in Issue #562.

OG-USA's parallelization uses the Dask library. The `dask.distributed` scheduler has the following memory cutoffs for each worker. In the tax function estimation, the memory requirements on each worker range between 3.2 GB and 3.8 GB. This PR calculates separate numbers of workers for the tax function estimation (`num_worker_txf`) and for the model solution (`num_worker_mod`). These calculations are made in the OG-USA run start script (`run_ogusa_example.py`) and are passed as inputs to the `execute.py` `runner()` function.

The calculation for the optimal number of workers for a given process is the following.
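The formula itself is elided above. A plausible reading, based on the per-worker memory amounts and the processor constraint mentioned in the discussion (the exact expression in the PR may differ, and the RAM and core counts below are illustrative assumptions):

```python
# Sketch of the worker-count calculation: take the smaller of the
# processor limit and the memory limit for each process type.
import math

mem_per_wkr_txf = 3.5   # GB per worker, tax function estimation
mem_per_wkr_mod = 0.05  # GB per worker, model solution (guessed, per discussion)
ram_total_gb = 16.0     # assumed machine RAM for illustration
num_processors = 8      # assumed core count for illustration

num_worker_txf = min(num_processors, math.floor(ram_total_gb / mem_per_wkr_txf))
num_worker_mod = min(num_processors, math.floor(ram_total_gb / mem_per_wkr_mod))
print(num_worker_txf, num_worker_mod)  # 4 8
```

On this example machine the memory constraint binds for tax function estimation (4 workers) while the processor constraint binds for the model solution (8 workers), which is the asymmetry this PR exploits.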
cc: @jdebacker