CBO baseline update #412

Merged
merged 18 commits into from Aug 6, 2022

Conversation

bodiyang
Collaborator

@bodiyang bodiyang commented Jun 23, 2022

This PR updates Tax Data to the May 2022 CBO economic projections.

The update process follows Tax Data CBO Baseline Updating Instructions

However, the automated update script is out of date and has to be changed each year to match the format of CBO's projection files. Future CBO baseline updates can either follow the Updating Instructions, with a fair amount of code changes, or be made manually in CBO_baseline.csv.

@bodiyang bodiyang changed the title CBO baseline update WIP_CBO baseline update Jun 24, 2022
@bodiyang
Collaborator Author

@andersonfrailey Hi Anderson, the May 2022 CBO baseline update has been completed in this PR. Could you check whether everything looks OK to merge?

@bodiyang bodiyang changed the title WIP_CBO baseline update CBO baseline update Jun 27, 2022
@bodiyang bodiyang closed this Jun 27, 2022
@bodiyang bodiyang reopened this Jun 27, 2022
@andersonfrailey
Collaborator

Thanks for working on this, @bodiyang! Can you see how this would affect our projections and generate a report? You can do so following the instructions here.

Also, you're right about the auto-updating scripts being out of date. I tried running them the other day and I think a few things have changed on CBO's side that broke the code. I'll see if there's a way to make them work again or if it's better to go back to doing things by hand.

@bodiyang
Collaborator Author

I generated the 2022 report in the last commit.

@andersonfrailey
Collaborator

Thanks! One more thing that I forgot about yesterday. Did you run the make all command to recalculate all of the growth rates and weights? We should see some changes there with the new projections. Since a new year is being added to the projections, you'll also need to update the stage 1, 2, and 3 scripts. Instructions for that are at the bottom of the CBO updating instructions doc.

@bodiyang
Collaborator Author

bodiyang commented Jun 30, 2022

I'm trying to get it to run now. Running the make all command fails with an error; it looks like the puf2011.csv file is missing. Has it been deleted, or do changes need to be made in createpuf.py?

  File "/Users/bodiyang/Desktop/taxdata/taxdata/createpuf.py", line 97, in <module>
    puf2011 = pd.read_csv(Path(DATA_PATH, "puf2011.csv"))

@andersonfrailey
Collaborator

andersonfrailey commented Jul 1, 2022

Do you have access to the raw PUF? That's the file it's looking for here.

If not, just run the make command for the CPS file. I can merge this and then do the PUF next.

@bodiyang bodiyang closed this Jul 6, 2022
@bodiyang bodiyang reopened this Jul 6, 2022
@bodiyang
Collaborator Author

bodiyang commented Jul 7, 2022

Issue: running cps_stage2/stage2.py raises the following error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/bodiyang/Desktop/taxdata/taxdata/cps_stage2/stage2.py", line 106, in <module>
    main()
  File "/Users/bodiyang/Desktop/taxdata/taxdata/cps_stage2/stage2.py", line 65, in main
    factor_match = _factors[year].equals(CUR_FACTORS[year])
  File "/opt/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 2032

CUR_FACTORS is read at line 37:

CUR_FACTORS = pd.read_csv(
    "https://raw.githubusercontent.com/PSLmodels/taxdata/master/puf_stage1/Stage_I_factors.csv",
    index_col=0,
)

The likely cause is that the Stage_I_factors.csv at this link has not been updated, so it does not include the year 2032. Should we try merging the PR to see whether the linked file is then updated with 2032 values, so that stage2.py passes?
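One way to confirm a diagnosis like this is to compare the year labels of the locally generated factors with those of the published file. The sketch below uses small in-memory frames in place of the real CSVs. The helper name `missing_years`, the row labels, and the layout with one column per year are illustrative assumptions, not code from taxdata.

```python
import pandas as pd


def missing_years(new_factors: pd.DataFrame, cur_factors: pd.DataFrame) -> list:
    """Return year columns present in new_factors but absent from cur_factors."""
    return sorted(set(new_factors.columns) - set(cur_factors.columns))


# Toy stand-ins for the real tables (hypothetical labels and values).
rows = ["factor_a", "factor_b"]
cur_factors = pd.DataFrame({2030: [1.0, 1.1], 2031: [1.0, 1.2]}, index=rows)
new_factors = pd.DataFrame(
    {2030: [1.0, 1.1], 2031: [1.0, 1.2], 2032: [1.0, 1.3]}, index=rows
)

# Any year listed here would raise KeyError when looked up in cur_factors.
print(missing_years(new_factors, cur_factors))
```

If the list is non-empty, a direct lookup such as `cur_factors[2032]` fails in the same way as the traceback above.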

@andersonfrailey
Collaborator

@bodiyang, I think your diagnosis is right, but let's do a quick fix rather than merging first and fixing afterward. I don't want the weights in the repo to be out of step with the projections, since that would cause issues when creating weights. Please update lines 65 and 66 to read as follows:

try:
    factor_match = _factors[year].equals(CUR_FACTORS[year])
except KeyError:
    factor_match = False
try:
    target_match = stage_2_targets[f"{year}"].equals(CUR_TARGETS[f"{year}"])
except KeyError:
    target_match = False

Similarly, lines 62 and 63 in puf_stage2/stage2.py should be changed to

try:
    factor_match = Stage_I_factors[i].equals(CUR_FACTORS[i])
except KeyError:
    factor_match = False
try:
    target_match = Stage_II_targets[f"{i}"].equals(CUR_TARGETS[f"{i}"])
except KeyError:
    target_match = False

factor_match and target_match are just used to see if we can skip creating weights for a given year. By setting both to False when a year doesn't appear in our current projections, we tell the program that weights do need to be created for that year.

This is a good catch too because it's a problem that'll pop up for every CBO update. I probably should've thought of it when I first wrote those lines so sorry about that! But if you add those lines in it shouldn't be an issue any more.
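The same guard could be factored into a small helper so that both stage 2 scripts share it. This is only a sketch under assumptions: the name `column_matches` and the toy frames are hypothetical and do not appear in taxdata.

```python
import pandas as pd


def column_matches(new: pd.DataFrame, cur: pd.DataFrame, key) -> bool:
    """Return True only if `key` exists in both frames and the columns are equal."""
    try:
        return bool(new[key].equals(cur[key]))
    except KeyError:
        # A year missing from either table means weights must be regenerated.
        return False


# Toy data: the current table stops at 2031, the new one adds 2032.
cur = pd.DataFrame({2031: [1.0, 2.0]})
new = pd.DataFrame({2031: [1.0, 2.0], 2032: [1.1, 2.1]})

for year in (2031, 2032):
    can_skip = column_matches(new, cur, year)
    print(year, "skip" if can_skip else "recompute")
```

A year is skipped only when the lookup succeeds and the values are identical, which matches the intent described above.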

@bodiyang
Collaborator Author

Thanks, Anderson. I solved this in another PR.

@bodiyang
Collaborator Author

bodiyang commented Jul 15, 2022

@andersonfrailey I have fixed the bugs and everything looks good to me now. As a test, I was able to create the CPS files by running make all. Do you think we are ready to merge?

@andersonfrailey
Collaborator

Except for the test failures this is looking pretty good. But I don't really understand why the CPS projections don't change at all. I wouldn't expect much of a difference, but for there to be no difference feels fishy. I don't know if we can get to a definitive answer as to why that's happening, but do you have any guesses?

@bodiyang
Collaborator Author

bodiyang commented Jul 18, 2022

Thanks, Anderson. I fixed the test failures in the last commit.

Can you expand on this:

But I don't really understand why the CPS projections don't change at all.

What makes you think the CPS projections don't change? (I'm not sure whether taxdata automatically generates a CPS projection file that should reflect some changes.)

@andersonfrailey
Collaborator

@bodiyang, there's a table in the PDF report that shows the year-by-year projections for the CPS and those are the same for both the old and new file. I also ran them through taxcalc myself just to verify and got the same results.

It seems pretty unlikely to me that the exact same weights would be generated for each year after the CBO updates, but if after checking we can't find anything wrong, I guess we'll just have to accept it.

@bodiyang
Collaborator Author

bodiyang commented Jul 19, 2022

Got it. I will check this again to see whether I can figure out what went wrong.

@bodiyang
Collaborator Author

bodiyang commented Jul 20, 2022

Follow-up note:
The comparison between the base CPS values and the new CPS values is built in report.py, lines 184 to 204, from cps.csv.gz and cps_weights.csv.gz.

In cps.csv.gz, nothing is updated. In cps_weights.csv.gz, values for 2032 are added and the values for all previous years are unchanged.

This is why the CPS projections don't change at all.

So the issue narrows down to checking why cps.csv.gz and cps_weights.csv.gz remained unchanged, or whether they should be expected to remain unchanged.

I checked the previous PR, and there these two files had been changed. In this update, however, the two files remained the same (for the years through 2031).

cc @andersonfrailey @MattHJensen @jdebacker
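A check like the one described in this note can be scripted rather than done by eye. The sketch below classifies each column of a new weights table as added, changed, or unchanged relative to an old one. The helper name `compare_columns`, the `WT<year>` column names, and the toy values are assumptions for illustration only.

```python
import pandas as pd


def compare_columns(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Classify the columns of `new` as added, changed, or unchanged relative to `old`."""
    result = {"added": [], "changed": [], "unchanged": []}
    for col in new.columns:
        if col not in old.columns:
            result["added"].append(col)
        elif new[col].equals(old[col]):
            result["unchanged"].append(col)
        else:
            result["changed"].append(col)
    return result


# Toy weights tables (hypothetical values): one column per year.
old_weights = pd.DataFrame({"WT2030": [100, 200], "WT2031": [110, 210]})
new_weights = pd.DataFrame(
    {"WT2030": [100, 200], "WT2031": [110, 211], "WT2032": [120, 220]}
)

print(compare_columns(old_weights, new_weights))
```

With the real files, the two frames would come from `pd.read_csv` on the old and new copies of cps_weights.csv.gz; pandas infers gzip compression from the `.gz` extension.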

@bodiyang
Collaborator Author

bodiyang commented Jul 26, 2022

@andersonfrailey
I made a mistake in my previous comment, so let me restate the issue here:
cps.csv.gz (https://github.com/PSLmodels/taxdata/blob/master/data/cps.csv.gz) and cps_weights.csv.gz have been updated. Lines 184 to 204 of report.py compare the old CPS and the new CPS, using the old cps.csv.gz and cps_weights.csv.gz against the new cps.csv.gz and cps_weights.csv.gz.

In other words, the problem is that the base files cps.csv.gz and cps_weights.csv.gz have changed while the resulting CPS projections remain unchanged.

I discussed this with @jdebacker and @MattHJensen in the PSL meeting, and we think this is probably a problem with how taxdata generates the report.

I will investigate this further in report.py.

@bodiyang
Collaborator Author

bodiyang commented Jul 27, 2022

@andersonfrailey @jdebacker @MattHJensen

I ran the code in report.py by hand. The reason the CPS projections show no difference in the report is the number of decimal places: the calculated values do differ, but the differences are too small to show up.

For example, the Current Payroll and New Payroll values for 2023 in the baseline CPS and the new CPS are 1375.1514275498105 and 1375.1514277266267; in the report, both appear as 1375.2.

Full detailed results are in values.docx.

In short, the CPS projections have indeed changed because of the CBO update, but the differences from this year's update are very small.

We can merge this PR if you think it's all right.
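The rounding effect is easy to reproduce with the two payroll values quoted above. This is a minimal sketch; the variable names are illustrative and the one-decimal format is taken from the 1375.2 shown in the report.

```python
current_payroll = 1375.1514275498105
new_payroll = 1375.1514277266267

# Both values print identically at one decimal place...
print(f"{current_payroll:.1f} vs {new_payroll:.1f}")

# ...but the underlying difference is nonzero.
diff = new_payroll - current_payroll
print(f"difference: {diff:.3e}")
```

Printing the difference in scientific notation, or comparing unrounded values, avoids mistaking a tiny change for no change.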

@bodiyang
Collaborator Author

bodiyang commented Aug 2, 2022

Generated a new report with PUF

@bodiyang
Collaborator Author

bodiyang commented Aug 4, 2022

Details of which files are used to construct the Records objects used in the report's tax liability analysis:

CPS comparison:
Records(data=cps.csv.gz, weights=cps_weights.csv.gz, adjust_ratios=None, start_year=2014, gfactors=GrowFactors())
cps.csv.gz: old/base uses the file from Tax-Calculator, new uses the new file in taxdata; the old and new files are the same.
cps_weights.csv.gz: old/base uses the file from Tax-Calculator, new uses the new file in taxdata; the old and new files differ in every year.
gfactors: both old and new use the file from Tax-Calculator, so they are the same.

PUF comparison:
Records(data=puf.csv, weights=puf_weights.csv, adjust_ratios=puf_ratios)
puf.csv: both old and new use the file in taxdata, so they are the same.
puf_weights.csv: old/base uses the file from Tax-Calculator, new uses the new file in taxdata; the new file only adds the year 2032, and the values for previous years are the same.
puf_ratios: old/base uses the file from Tax-Calculator, new uses the new file in taxdata; the old and new files differ in every year.

@andersonfrailey
Collaborator

For posterity:

We had a discussion today regarding the reports being generated by this PR. We believe that there is an issue with the reports script that causes the tables showing the difference in tax liabilities to be incorrect. We're going to generate those tables without using the reports function and post them here before merging.

@bodiyang
Collaborator Author

bodiyang commented Aug 5, 2022

CPS tax liability values (in billions) generated without using the reports function:

    Tax Liability  Tax               Year
 0    1375.151428  Current Payroll   2023
 1    1375.151428  New Payroll       2023
 2    1522.562076  Current Income    2023
 3    1522.562078  New Income        2023
 4    2897.713504  Current Combined  2023
 5    2897.713506  New Combined      2023
 6    1436.817409  Current Payroll   2024
 7    1436.817408  New Payroll       2024
 8    1605.965920  Current Income    2024
 9    1605.965924  New Income        2024
10    3042.783330  Current Combined  2024
11    3042.783332  New Combined      2024
12    1501.066231  Current Payroll   2025
13    1501.066231  New Payroll       2025
14    1695.325091  Current Income    2025
15    1695.325091  New Income        2025
16    3196.391321  Current Combined  2025
17    3196.391322  New Combined      2025
18    1564.427289  Current Payroll   2026
19    1564.427289  New Payroll       2026
20    2011.714781  Current Income    2026
21    2011.714781  New Income        2026
22    3576.142070  Current Combined  2026
23    3576.142070  New Combined      2026
24    1624.190463  Current Payroll   2027
25    1624.190464  New Payroll       2027
26    2100.235837  Current Income    2027
27    2100.235837  New Income        2027
28    3724.426300  Current Combined  2027
29    3724.426301  New Combined      2027
30    1684.738718  Current Payroll   2028
31    1684.738718  New Payroll       2028
32    2186.922203  Current Income    2028
33    2186.922203  New Income        2028
34    3871.660922  Current Combined  2028
35    3871.660922  New Combined      2028
36    1746.246235  Current Payroll   2029
37    1746.246236  New Payroll       2029
38    2279.761544  Current Income    2029
39    2279.761535  New Income        2029
40    4026.007778  Current Combined  2029
41    4026.007771  New Combined      2029
42    1808.787103  Current Payroll   2030
43    1808.787102  New Payroll       2030
44    2374.260655  Current Income    2030
45    2374.260657  New Income        2030
46    4183.047758  Current Combined  2030
47    4183.047759  New Combined      2030
48    1875.956833  Current Payroll   2031
49    1875.956832  New Payroll       2031
50    2474.772797  Current Income    2031
51    2474.772800  New Income        2031
52    4350.729630  Current Combined  2031
53    4350.729632  New Combined      2031
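To read the Current and New values side by side, the long table can be reshaped so that each year and tax type has one row with a difference column. The sketch below uses four rows copied from the table above; the column names `Version`, `Type`, and `Diff` are introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the CPS table, in the same long format.
rows = [
    (1522.562076, "Current Income", 2023),
    (1522.562078, "New Income", 2023),
    (2279.761544, "Current Income", 2029),
    (2279.761535, "New Income", 2029),
]
df = pd.DataFrame(rows, columns=["Tax Liability", "Tax", "Year"])

# Split "Current Income" into a version ("Current") and a tax type ("Income").
df[["Version", "Type"]] = df["Tax"].str.split(" ", n=1, expand=True)

# One row per (Year, Type), one column per version, then the change.
wide = df.set_index(["Year", "Type", "Version"])["Tax Liability"].unstack("Version")
wide["Diff"] = wide["New"] - wide["Current"]
print(wide)
```

In these rows the differences are on the order of a few millionths of a billion dollars, which is why they vanish at one decimal place.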

@bodiyang
Collaborator Author

bodiyang commented Aug 5, 2022

Current PUF:

    Tax Liability  Tax               Year
 0    1396.225761  Current Payroll   2023
 1    4893.428384  Current Income    2023
 2    6289.654144  Current Combined  2023
 3    1458.204694  Current Payroll   2024
 4    5085.776681  Current Income    2024
 5    6543.981375  Current Combined  2024
 6    1523.474740  Current Payroll   2025
 7    5308.387928  Current Income    2025
 8    6831.862667  Current Combined  2025
 9    1588.177347  Current Payroll   2026
10    5814.751007  Current Income    2026
11    7402.928354  Current Combined  2026
12    1650.251097  Current Payroll   2027
13    5902.757669  Current Income    2027
14    7553.008766  Current Combined  2027
15    1712.969742  Current Payroll   2028
16    6115.146495  Current Income    2028
17    7828.116237  Current Combined  2028
18    1776.676028  Current Payroll   2029
19    6309.859750  Current Income    2029
20    8086.535778  Current Combined  2029
21    1841.491841  Current Payroll   2030
22    6525.260429  Current Income    2030
23    8366.752270  Current Combined  2030
24    1911.803796  Current Payroll   2031
25    6756.977524  Current Income    2031
26    8668.781320  Current Combined  2031

@bodiyang
Collaborator Author

bodiyang commented Aug 5, 2022

@andersonfrailey I have generated the CPS and PUF tax liabilities without calling report.py, as documented in the previous comments. I think the PR is ready to merge; let me know if you have any other questions.

@andersonfrailey
Collaborator

I still think it's weird that the CPS projections aren't changing more. However, I can't see anything wrong in this PR that would cause that to happen so I'm going to merge it so that we can get to the newer updates and I can work on #411 again.

@andersonfrailey andersonfrailey merged commit b76d900 into PSLmodels:master Aug 6, 2022
bodiyang added a commit to bodiyang/taxdata that referenced this pull request Aug 16, 2022
Merge pull request PSLmodels#412 from bodiyang/master
@bodiyang bodiyang mentioned this pull request Aug 16, 2022