
[WIP] Create separate Dask worker numbers for tax func estimation vs. model solution #566

Closed
rickecon wants to merge 6 commits into master from memory

Conversation

rickecon (Member) commented May 13, 2020

This PR addresses the high memory requirements of parallelized tax function estimation in OG-USA versus the much lower memory requirements of parallelization in the model solution. Some of these issues are described in Issue #562.

OG-USA's parallelization uses the Dask library. The dask.distributed scheduler imposes the following memory cutoffs on each worker.

distributed:
  worker:
    # Fractions of worker memory at which we take action to avoid memory blowup
    # Set any of the lower three values to False to turn off the behavior entirely
    memory:
      target: 0.60  # target fraction to stay below
      spill: 0.70  # fraction at which we spill to disk
      pause: 0.80  # fraction at which we pause worker threads
      terminate: 0.95  # fraction at which we terminate the worker
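
For reference, these fractions live in Dask's configuration and can be overridden; a minimal sketch using dask.config.set (the values shown are just the defaults above, not settings from this PR):

# Sketch: overriding the dask.distributed worker memory fractions in code;
# the dotted keys mirror the YAML configuration above
import dask

dask.config.set({
    'distributed.worker.memory.target': 0.60,
    'distributed.worker.memory.spill': 0.70,
    'distributed.worker.memory.pause': 0.80,
    'distributed.worker.memory.terminate': 0.95,
})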

In the tax function estimation, the memory requirements on each worker range between 3.2 GB and 3.8 GB. This PR calculates separate numbers of workers for the tax function estimation (num_workers_txf) and for the model solution (num_workers_mod). These calculations are made in the OG-USA example run script (run_ogusa_example.py) and are passed as inputs to the runner() function in execute.py.

The optimal number of workers for a given process is calculated as:

num_workers = min(N_max, int(np.floor(0.7 * RAM_tot_GB / mem_per_wkr)))
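
As a worked example (a hypothetical machine, not a measured configuration), the sketch below applies the formula to 16 GB of RAM with N_max = 7 and 3.5 GB per worker:

# Worked example of the worker-count formula; machine specs are hypothetical
import numpy as np

N_max = 7          # processor-count cap on workers
RAM_tot_GB = 16.0  # total system RAM (GB)
mem_per_wkr = 3.5  # memory needed per worker (GB)
num_workers = min(N_max, int(np.floor(0.7 * RAM_tot_GB / mem_per_wkr)))
print(num_workers)  # 0.7 * 16 / 3.5 = 3.2, so min(7, 3) = 3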

cc: @jdebacker

rickecon (Member, Author) commented May 13, 2020

@jdebacker: I am currently running the new run_ogusa_example.py. I get the same error as described in Issue #562 when it reaches the tax function estimation for the reform, despite setting num_workers_txf=2.

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.  Perhaps some other process is leaking memory?  Process memory: 3.60 GB -- Worker memory limit: 4.29 GB
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.  Perhaps some other process is leaking memory?  Process memory: 3.35 GB -- Worker memory limit: 4.29 GB
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.  Perhaps some other process is leaking memory?  Process memory: 4.06 GB -- Worker memory limit: 4.29 GB
...
distributed.scheduler.KilledWorker: ('taxcalc_advance-6ba9fc1a-b387-4911-80c6-1ab48caedec8', <Worker 'tcp://127.0.0.1:62856', name: 1, memory: 0, processing: 6>)
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker

I am going to continue working on this PR.

RAM_stats = psutil.virtual_memory()
RAM_total_bytes = RAM_stats.total
RAM_total_GB = RAM_total_bytes / 1073741824  # 1024 ** 3 bytes per GB
mem_per_wkr_txf = 3.5  # Memory per worker (GB) in tax function estimation
jdebacker (Member) commented on the diff:

We may want to put these amounts for the tax function estimation memory and the model solution memory in constants.py so they only need to be adjusted in one place.

rickecon (Member, Author) replied:

@jdebacker Agreed. Plus, I just guessed at the mem_per_wkr_mod=0.05 amount. But I know that it is much less than a gigabyte, so the processor constraint rarely binds. I'll go through and profile the memory footprint before I remove the WIP designation from this PR.
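
(One way to do that profiling, sketched here as an assumption rather than anything this PR implements: Dask's Client.run() can sample resident memory on every worker.)

# Sketch (not from this PR): sample each Dask worker's resident memory
from dask.distributed import Client

import psutil

def worker_rss_gb():
    # psutil.Process() with no argument is the calling (worker) process
    return psutil.Process().memory_info().rss / (1024 ** 3)

client = Client()  # local cluster, for illustration only
print(client.run(worker_rss_gb))  # {worker address: resident memory in GB}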

jdebacker (Member) commented:

@rickecon This is looking good! Thanks for doing this!

A reminder to add psutil to the environment file.

jdebacker (Member) commented:

@rickecon Re your issue with run_ogusa_example.py: it does surprise me. I'd proceed by making the changes you think are necessary, then running the unit tests. Pay particular attention to the modules/functions that have changed, especially test_get_micro_data.py. For that one, you might use the verbose option and save the log file so that you can review it for warnings like those above (which you wouldn't see if the tests pass). Once the unit tests check out, go back to the full model run and see how it works with different numbers of workers.

It might be worthwhile to put the following lines into their own function (e.g., in utils.py):

import multiprocessing

import numpy as np
import psutil
from dask.distributed import Client

RAM_stats = psutil.virtual_memory()
RAM_total_bytes = RAM_stats.total
RAM_total_GB = RAM_total_bytes / 1073741824  # 1024 ** 3 bytes per GB
mem_per_wkr_txf = 3.5  # Memory per worker (GB) in tax function estimation
mem_per_wkr_mod = 0.05  # Memory per worker (GB) in model solution
client = Client()
num_workers_max = int(min(multiprocessing.cpu_count(), 7))
print('Maximum number of workers = ', num_workers_max)
num_workers_txf = min(
    num_workers_max,
    int(np.floor((0.7 * RAM_total_GB) / mem_per_wkr_txf)))
num_workers_mod = min(
    num_workers_max,
    int(np.floor((0.7 * RAM_total_GB) / mem_per_wkr_mod)))

And then write a test for this function to be sure things are working as expected (though I'm not sure offhand how to test with some large data to see whether the limits are hit or not, but you might just use the Calculator object across a few workers).
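
Something along these lines, perhaps (a sketch under assumed names: get_num_workers and its defaults are hypothetical, not code from this PR):

# Sketch of the suggested utils.py helper; name and defaults are hypothetical
import multiprocessing

import numpy as np
import psutil

def get_num_workers(mem_per_wkr_gb, max_workers=7, ram_frac=0.7):
    '''Largest worker count whose combined memory stays below ram_frac of
    total system RAM, capped by the CPU count and max_workers.'''
    ram_total_gb = psutil.virtual_memory().total / (1024 ** 3)
    fit_in_ram = int(np.floor(ram_frac * ram_total_gb / mem_per_wkr_gb))
    return max(1, min(multiprocessing.cpu_count(), max_workers, fit_in_ram))

# A minimal unit test: the result always lies between 1 and max_workers
def test_get_num_workers():
    num = get_num_workers(mem_per_wkr_gb=3.5)
    assert 1 <= num <= 7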

codecov-io commented May 13, 2020

Codecov Report

Merging #566 into master will increase coverage by 0.01%.
The diff coverage is 81.25%.


@@            Coverage Diff             @@
##           master     #566      +/-   ##
==========================================
+ Coverage   84.18%   84.19%   +0.01%     
==========================================
  Files          47       47              
  Lines        6657     6664       +7     
==========================================
+ Hits         5604     5611       +7     
  Misses       1053     1053              
Impacted Files                    Coverage Δ
ogusa/SS.py                       88.15% <0.00%> (ø)
ogusa/TPI.py                      76.34% <ø> (ø)
ogusa/execute.py                  14.58% <ø> (ø)
ogusa/tests/test_basic.py         33.33% <ø> (ø)
ogusa/tests/test_execute.py       68.18% <ø> (ø)
ogusa/tests/test_run_ogusa.py     32.65% <ø> (ø)
ogusa/tests/test_startyears.py    50.00% <ø> (ø)
ogusa/tests/test_txfunc.py        65.07% <ø> (ø)
ogusa/tests/test_user_inputs.py   60.71% <ø> (ø)
ogusa/tests/test_TPI.py           59.81% <50.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd9b897...99857fa. Read the comment docs.

rickecon (Member, Author) commented Jul 7, 2020

The Compute.Studio parallelization for OG-USA is currently set to 5 workers for both tax function estimation and the model run (see line 160 of ./cs-config/cs_config/functions.py). However, @hdoupe added a TODO in lines 161-164 of ./cs-config/cs_config/functions.py saying to set num_workers_txf = 5 and num_workers_mod = 6. So I will make that change and see whether it passes the Compute.Studio tests.

rickecon (Member, Author) commented Jul 7, 2020

I also need to make sure that I close the Dask Client after each tax function estimation before I start a new Client for the baseline or reform simulation.
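
A minimal sketch of that pattern (the structure is assumed, not the PR's exact code):

# Sketch: one Client per stage, closed before the next stage begins
from dask.distributed import Client

num_workers_txf, num_workers_mod = 2, 6  # computed as above; illustrative

client = Client(n_workers=num_workers_txf)  # tax function estimation stage
# ... estimate tax functions ...
client.close()

client = Client(n_workers=num_workers_mod)  # model solution stage
# ... solve baseline or reform ...
client.close()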

codecov-commenter commented Aug 6, 2020

Codecov Report

Merging #566 into master will increase coverage by 0.18%.
The diff coverage is 81.25%.


@@            Coverage Diff             @@
##           master     #566      +/-   ##
==========================================
+ Coverage   84.00%   84.19%   +0.18%     
==========================================
  Files          47       47              
  Lines        6848     6664     -184     
==========================================
- Hits         5753     5611     -142     
+ Misses       1095     1053      -42     
Impacted Files                    Coverage Δ
ogusa/SS.py                       88.15% <0.00%> (ø)
ogusa/TPI.py                      76.34% <ø> (ø)
ogusa/execute.py                  14.58% <ø> (ø)
ogusa/tests/test_basic.py         33.33% <ø> (ø)
ogusa/tests/test_execute.py       68.18% <ø> (ø)
ogusa/tests/test_run_ogusa.py     32.65% <ø> (ø)
ogusa/tests/test_startyears.py    50.00% <ø> (ø)
ogusa/tests/test_txfunc.py        65.07% <ø> (-0.53%) ⬇️
ogusa/tests/test_user_inputs.py   60.71% <ø> (ø)
ogusa/tests/test_TPI.py           59.81% <50.00%> (-0.73%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7af1c19...0551707. Read the comment docs.

rickecon closed this Aug 7, 2020
rickecon deleted the memory branch Aug 7, 2020, 21:08