Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename dropq directory/files as tbi and refactor run_nth_year_*_model functions #1577

Merged
merged 12 commits into from
Oct 10, 2017
Merged

Rename dropq directory/files as tbi and refactor run_nth_year_*_model functions #1577

merged 12 commits into from
Oct 10, 2017

Conversation

martinholmer
Copy link
Collaborator

@martinholmer martinholmer commented Oct 2, 2017

The goal of this pull request is to provide TaxBrain with a simple interface to Tax-Calculator that can handle cps.csv input as well as puf.csv input. So, this pull request is an attempt to resolve PolicyBrain issue 668. While this pull request does not attempt to deal with CPS benefits information, it should provide a foundation for doing that. So, that means that pending pull request #1500 needs to be coordinated with the changes in this pull request.

Note that this pull request changes the public API of Tax-Calculator (from the point of view of TaxBrain only).
In addition to the changes in the two run_nth_year_*_model functions, the create_json_table function has been renamed create_dict_table because the function converts a dataframe table into a dictionary. It has never created a JSON table, so the old name was highly misleading.

Despite the changes in the public API, there are no changes in tax-calculating logic or in tax results.

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @hdoupe @GoFroggyRun @brittainhard

@codecov-io
Copy link

codecov-io commented Oct 2, 2017

Codecov Report

Merging #1577 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1577   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          37      37           
  Lines        2731    2731           
======================================
  Hits         2731    2731
Impacted Files Coverage Δ
taxcalc/taxcalcio.py 100% <ø> (ø) ⬆️
taxcalc/records.py 100% <ø> (ø) ⬆️
taxcalc/macro_elasticity.py 100% <100%> (ø) ⬆️
taxcalc/__init__.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 408f82d...8561084. Read the comment docs.

@martinholmer martinholmer changed the title Rename dropq directory/files as tbi and refactor run_nth_year*model functions Rename dropq directory/files as tbi and refactor run_nth_year_*_model functions Oct 2, 2017
@hdoupe
Copy link
Collaborator

hdoupe commented Oct 3, 2017

@martinholmer I just briefly looked through this PR. This looks great. Thanks for knocking this out quickly. I'll take a closer look tomorrow.

@martinholmer
Copy link
Collaborator Author

martinholmer commented Oct 3, 2017

The refactored run_nth_year_tax_calc_model function handles the reading of the specified input file, but does not maintain a cache of input file contents to speed execution. On a middle-aged iMac, the puf-read-time is about 2.5 seconds, which is a relatively small part of the overall run_nth_year_tax_calc_model function run time of 66 to 67 seconds.

So, it seems as if the speed-up benefits of maintaining a cache might not exceed the complexity costs of maintaining a cache. This issue can be reconsidered after pull request #1577 has been tested with TaxBrain.

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @hdoupe @GoFroggyRun @brittainhard

P.S. The timing statistics above were generated for the second call to the run_nth_year_tax_calc_model function where the puf input data was in the puf.csv.gz file located in the current working directory, which is my understanding of where it would be located when running computational servers on multiple AWS instances.

input_path = 'puf.csv.gz'
if not os.path.isfile(input_path):
# otherwise try local Tax-Calculator deployment path
input_path = os.path.join(tbi_path, '..', '..', 'puf.csv')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinholmer Currently, the PUF is sent to taxcalc as a pandas dataframe. I'm not sure how to generally specify the PUF path relative to this file. It may be easier for use_puf_not_cps to be a keyword argument where it is the PUF data frame if the user chooses the PUF file option and None or False otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hdoupe said:

Currently, the PUF is sent to taxcalc as a pandas dataframe. I'm not sure how to generally specify the PUF path relative to this file.

The newest tbi_utils.py code tries to anticipate where the input files will be located.
The following code fragment from PolicyBrain/webapp/apps/taxbrain/tasks.py suggests to me that puf.csv.gz will be in the current working directory:

def get_tax_results_async(mods, inputs_pk):
    print "mods is ", mods
    user_mods = package_up_vars(mods)
    print "user_mods is ", user_mods
    print "begin work"
    cur_path = os.path.abspath(os.path.dirname(__file__))
    tax_dta = pd.read_csv("puf.csv.gz", compression='gzip')
    mY_dec, mX_dec, df_dec, mY_bin, mX_bin, df_bin, fiscal_tots = dropq.run_models(tax_dta,
        num_years=NUM_BUDGET_YEARS, user_mods={START_YEAR:user_mods})

And that is where the tbi_utils.py code looks for it.
Does this make sense?

@martinholmer
Copy link
Collaborator Author

The philosophy guiding the refactoring in pull request #1577 is that TaxBrain should be able to just tell Tax-Calculator what it wants to do and then have Tax-Calculator do it and return the results. Expecting TaxBrain to read input files and draw quick-calculation subsamples (which TaxBrain has been doing incorrectly for many months --- see unresolved Policy Brain issue 574, which has been open since June 28th) does not seem like a sensible strategy.

@MattHJensen @hdoupe @GoFroggyRun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update TaxBrain and Tax-Calculator to handle UBI and CPS specific inputs
3 participants