Idea for improving speed and memory usage #94

Open
hdoupe opened this issue Oct 31, 2019 · 4 comments

@hdoupe
Collaborator

hdoupe commented Oct 31, 2019

Tax-Brain has been somewhat limited on Compute Studio because it runs into memory problems when the calculations for each year are run in parallel. Now that C/S supports dask clusters, we should see how much of a speedup we can get for Tax-Brain. In OG-USA, @jdebacker found that passing a Calculator object from one process to another through the distributed client causes memory problems, but things work fine if you create the Calculator object in the process where the calculations will run and advance it to the correct year there (https://github.com/PSLmodels/OG-USA/pull/496#issuecomment-542953090). So, my question is: can this approach work for Tax-Brain, too?
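
To make the contrast concrete, here is a minimal Python sketch. `ToyCalculator` is a hypothetical stand-in, not Tax-Calculator's `Calculator`, and the record count and growth rate are made up. It compares how much data would have to be serialized when a task receives a prebuilt object versus only a year. It illustrates the two task shapes and does not diagnose the OG-USA memory issue.

```python
import pickle


class ToyCalculator:
    """Hypothetical stand-in for a large calculator object (not the real API)."""

    def __init__(self, start_year, n_records=100_000):
        self.current_year = start_year
        # Pretend microdata: one float per record.
        self.records = [1.0] * n_records

    def advance_to_year(self, year):
        # Grow every record by 2% per year (made-up dynamics).
        while self.current_year < year:
            self.records = [r * 1.02 for r in self.records]
            self.current_year += 1

    def total(self):
        return sum(self.records)


def run_with_object(calc, year):
    """Task shape 1: the caller builds the object and ships it to the worker."""
    calc.advance_to_year(year)
    return calc.total()


def run_from_year(year, start_year=2019):
    """Task shape 2: the worker builds the object itself from small inputs."""
    calc = ToyCalculator(start_year)
    calc.advance_to_year(year)
    return calc.total()


# Size of the arguments that would cross the process boundary in each case.
object_payload = len(pickle.dumps((ToyCalculator(2019), 2021)))
year_payload = len(pickle.dumps((2021,)))
```

Both task shapes return the same number; the difference is that the second one only needs a year (plus whatever small reform parameters apply) to be sent to the worker.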

@hdoupe changed the title from "Idea for fixing memory problems" to "Idea for improving speed and memory usage" on Oct 31, 2019
@andersonfrailey
Collaborator

@hdoupe, I'm definitely down to try this approach. If I'm understanding the process you're describing correctly, what we'd need to do is create a new function in calculator that will create each calculator object, advance/run that calculator, then pass all the results back for aggregation/presentation. Does that sound about right?

@hdoupe
Collaborator Author

hdoupe commented Nov 1, 2019

Yep, you got it.
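
The flow agreed on above could be sketched roughly as follows. All names here (`calc_year`, `run_years`, the `rate_change` key, the toy arithmetic) are hypothetical and not part of Tax-Brain or Tax-Calculator. A `concurrent.futures` thread pool stands in for the cluster, on the assumption that whatever is used in practice exposes the same `submit(...)` / `.result()` shape, as a dask.distributed `Client` does.

```python
from concurrent.futures import ThreadPoolExecutor


def calc_year(year, reform, start_year=2019):
    """Build both "calculators" inside the worker, advance, return small results.

    The calculators here are plain numbers; a real version would construct
    and advance actual calculator objects at this point.
    """
    base_liability = 100.0
    reform_liability = 100.0 * (1.0 + reform.get("rate_change", 0.0))
    for _ in range(start_year, year):
        # Advance one year at a time (made-up 3% growth).
        base_liability *= 1.03
        reform_liability *= 1.03
    return {"year": year, "base": base_liability, "reform": reform_liability}


def run_years(executor, years, reform):
    """Submit one task per year, then aggregate the small results locally."""
    futures = [executor.submit(calc_year, y, reform) for y in years]
    results = [f.result() for f in futures]
    return {r["year"]: r["reform"] - r["base"] for r in results}


# Thread pool used only as a local stand-in for a cluster of workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    changes = run_years(pool, [2019, 2020, 2021], {"rate_change": 0.1})
```

Only the year and the reform dictionary go out to each worker, and only a small dictionary of results comes back.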

@andersonfrailey
Collaborator

Sweet. Definitely down to give it a shot. Do you think this would cause any issues for users running Tax-Brain locally? That would be a lot of calculator creation for a personal computer to handle. Maybe we could add an argument to the run method of TaxBrain that would either run Tax-Brain as it currently runs (only two calculators created) or use this new approach, depending on the argument's value. This might make maintenance a tad bit tougher, but I don't think it'd be a significant challenge.

@jdebacker
Member

@andersonfrailey You should be able to have this work well locally and on C/S. You can have an argument for the Dask client and have it default to None. Tax-Brain users running on their own machines may never touch it, but you can set it to what you want for Compute Studio runs.

e.g. in OG-USA's execute.runner() function:

def runner(output_base, baseline_dir, test=False, time_path=True,
           baseline=True, iit_reform={}, og_spec={}, guid='',
           run_micro=True, tax_func_path=None, data=None, client=None,
           num_workers=1):

When calling functions like this for Compute Studio, we create a client in functions.py.
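
A minimal sketch of that optional-client pattern, under stated assumptions: `run` and `_one_year` are hypothetical names, not Tax-Brain's API, and the per-year work is a placeholder. The only thing assumed about `client` is that it has `submit(fn, *args)` returning a future with `.result()`, which holds for both `dask.distributed.Client` and `concurrent.futures` executors; a thread pool is used here so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def _one_year(year):
    # Placeholder for "create calculator, advance to `year`, compute".
    return year * 2


def run(years, client=None):
    """Run serially when `client` is None, otherwise fan out through it."""
    if client is None:
        # Default path: local users never have to think about a client.
        return [_one_year(y) for y in years]
    futures = [client.submit(_one_year, y) for y in years]
    return [f.result() for f in futures]


# Local user: no client argument at all.
local = run([2019, 2020])

# Hosted run: the caller creates the client and passes it in.
with ThreadPoolExecutor(max_workers=2) as pool:
    remote = run([2019, 2020], client=pool)
```

With this shape the existing behavior stays the default, and the parallel path is opt-in for whoever constructs the client.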
