
Do we have a sense of where Tax Calculator's 2011 PUF outputs are most likely to be off? #410

Open · Thirdhuman opened this issue Dec 9, 2021 · 2 comments


@Thirdhuman

[Reposting from Tax Calculator's issues board]

Do users or developers have a sense of what the likely biases are in the 2011 PUF-based outputs when they are extrapolated to more recent years via Tax Calculator? For instance, is aggregate AGI understated or overstated? Are there areas where the income distribution or anticipated filing behavior might be off?

With these sorts of things, I know that methodological consistency and faithfulness to the underlying data can take precedence over strong hunches about what's actually true. I'm trying to figure out where those hunches lie for the 2011 PUF.

For context, I'm asking because I'm in the process of validating my adaptation of TaxData's puf.csv file to the 2014 PUF, currently by cross-referencing TaxData's puf_agg_exp.csv and TaxBrain's PUF calculations. So I'm trying to diagnose whether any disparities are likely due to calculation or coding errors, or to defensible differences stemming from the dataset.

@martinholmer (Contributor)

@Thirdhuman, the person who has spent the most time looking into the quality of the PUF data projections beyond 2011 is @donboyd5. Over the past couple of years he has raised a number of data-quality issues in both the Tax-Calculator and taxdata repositories. You can find a complete list of his (open and closed) issues in each repository by entering the search phrase `is:issue author:donboyd5` on the repo's Issues page. After raising several issues, with little or no progress on many of them, he finally wrote a plea for a better data-preparation process in issue #400, which also links to most of his specific issues. Issue #400 is the place to start answering your question.

@donboyd5

Thanks much. In addition to the issues mentioned in #400, another item arose over the last year, related to the SALT deduction. I don't think I ever opened an issue, although I did document it and may have mentioned it in a PSL blog post. I don't have time to track it down, but the essence of it is this:

  • If you start with the 2011 puf.csv
  • Grow it to 2017
  • Calculate tax under 2017 law, so as to get AGI, potential SALT deduction, and other values for each record
  • Determine (estimate) which of the records would be tax filers under 2017 law. This is art and science, but I think the method I described in one of the issues works well: basically, my method kept anyone who had to file because of income rules or other rules (such as qualifying for certain credits), plus anyone who ought to want to file because they could claim a refundable credit or were owed a refund for over-withholding
  • Filter so that you keep only tax filers in 2017, because you can only compare results to IRS totals for filers (IRS data do not cover nonfilers)
  • Summarize # returns, # SALT claimants, $ SALT amount by AGI ranges defined to be consistent with the AGI ranges the IRS uses for the summary spreadsheets they put on their site
  • Compare the total available (claimed) SALT deduction by income range from the advanced and filtered puf.csv to the same in the published IRS totals
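The filter-and-summarize steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Tax-Calculator code: the `Record` fields, the simplified `is_filer` rule, and the AGI range edges and labels are all hypothetical stand-ins. The real IRS tables use many more ranges, and a realistic filer rule would test more conditions than these.

```python
from bisect import bisect_right
from dataclasses import dataclass


# Hypothetical per-record fields; real Tax-Calculator variables are named differently.
@dataclass
class Record:
    weight: float             # sampling weight (number of returns represented)
    agi: float                # AGI computed under 2017 law
    salt: float               # potential SALT deduction
    gross_income: float       # income used for the filing-threshold test
    filing_threshold: float   # hypothetical threshold for this record's filing status
    refundable_credit: float  # refundable credits the record could claim
    refund_due: float         # withholding in excess of tax liability


def is_filer(r: Record) -> bool:
    """Simplified filer rule: required to file by income, or would want to
    file to collect a refundable credit or a refund of over-withholding."""
    must_file = r.gross_income >= r.filing_threshold
    wants_to_file = r.refundable_credit > 0 or r.refund_due > 0
    return must_file or wants_to_file


# Illustrative lower edges of AGI ranges (NOT the full IRS list).
AGI_EDGES = [float("-inf"), 1.0, 50_000.0, 200_000.0, 1_000_000.0, 10_000_000.0]
AGI_LABELS = ["No AGI", "$1 under $50K", "$50K under $200K",
              "$200K under $1M", "$1M under $10M", "$10M or more"]


def summarize(records):
    """Weighted # returns, # SALT claimants, and $ SALT by AGI range, filers only."""
    out = {label: {"returns": 0.0, "salt_claimants": 0.0, "salt_amount": 0.0}
           for label in AGI_LABELS}
    for r in records:
        if not is_filer(r):
            continue  # IRS totals cover filers only, so drop nonfilers
        # bisect_right finds the range whose lower edge is <= agi
        label = AGI_LABELS[bisect_right(AGI_EDGES, r.agi) - 1]
        cell = out[label]
        cell["returns"] += r.weight
        if r.salt > 0:
            cell["salt_claimants"] += r.weight
            cell["salt_amount"] += r.weight * r.salt
    return out
```

Each record is dropped or assigned to exactly one range, so the per-range counts add up to the weighted number of estimated filers, which is what makes them comparable to IRS summary rows.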

You will find (or I found, anyway) that while the total SALT deduction is not dramatically different from what the IRS shows, it is far too low in the highest income ranges. I've copied below a table from the last time I did this. You should be able to match the target column to published IRS tables. The puf column is what I calculated from the puf.csv after advancing it to 2017 and filtering out nonfilers. (Someone who repeats this won't get numbers identical to my puf column because of some small differences between what I did and the standard Tax-Calculator approach, but the numbers should be similar.) As you can see, this leaves the SALT deduction for $10-millionaires $30 billion short of what the IRS reports. The shortfall would grow in later years as the file is grown beyond 2017; the comparison is for 2017 because that is the last year of pre-SALT-cap IRS summary data. I think this leads to substantial underestimates of the cost of restoring the full SALT deduction, unless adjusted for.
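The comparison described above boils down to a per-range gap between the puf total and the IRS target. A minimal sketch, assuming both totals are already keyed by the same (hypothetical) AGI-range labels, shows how a modest bottom-line difference can hide a large shortfall in a single range:

```python
import math


def compare_to_targets(puf_salt, irs_salt):
    """Return per-range puf total, target, gap (puf - target), and puf/target ratio.

    Both arguments map an AGI-range label to a SALT dollar total.
    A negative gap means the grown-and-filtered puf.csv understates SALT.
    """
    rows = {}
    for label, target in irs_salt.items():
        puf = puf_salt.get(label, 0.0)  # a missing range counts as zero
        ratio = puf / target if target else math.nan
        rows[label] = {"puf": puf, "target": target,
                       "gap": puf - target, "ratio": ratio}
    return rows


def total_gap(rows):
    """Bottom-line difference across all ranges."""
    return sum(r["gap"] for r in rows.values())
```

With toy values (not IRS or PUF figures), `compare_to_targets({"low": 100.0, "top": 20.0}, {"low": 95.0, "top": 50.0})` gives gaps of +5 and -30, so the total gap is only -25 even though the top range carries just 40% of its target.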

[Table (image not reproduced): 2017 SALT deduction by AGI range, puf column vs. IRS target column]
