Merge pull request #2745 from martinholmer/revise-data-doc
Update docs/usage/data.md documentation
martinholmer committed May 10, 2024
2 parents fe26399 + addc27d commit 48c8354
Showing 1 changed file with 67 additions and 12 deletions.
79 changes: 67 additions & 12 deletions docs/usage/data.md
@@ -1,28 +1,83 @@
Data for Tax-Calculator
=======================

A Tax-Calculator `Records` object is created by passing a Pandas DataFrame or a string that provides the path to a CSV file with data you'd like to use in Tax-Calculator.
A Tax-Calculator `Records` object is created by passing a Pandas
DataFrame or a string that provides the path to a CSV file with data
you'd like to use with Tax-Calculator.


## TaxData
## Using prepared data with Tax-Calculator

To make Tax-Calculator more useful out of the box, it ships with two data options for users, both of which are created by the [TaxData](https://github.com/PSLmodels/TaxData) project. We refer users to that project for more specific documentation of these data, but we provide a brief overview of the two data options provided by TaxData that come with a Tax-Calculator installation.
To make Tax-Calculator more useful out of the box, it includes three
data options for users, two of which (`cps.csv` and `puf.csv`) are
created in the [taxdata
repository](https://github.com/PSLmodels/taxdata), and one of which
(`tmd.csv`) is created in the [tax-microdata
repository](https://github.com/PSLmodels/tax-microdata-benchmarking).
We refer users to those repositories for more specific documentation
of these data, but we provide a brief overview of the three prepared
data options that are compatible with Tax-Calculator.

### Current Population Survey Data
### Current Population Survey data (`cps.csv`)

Using the `Records.cps_constructor()` method to create a `Records` class object, Tax-Calculator users will be loading the `taxdata` Current Population Survey (CPS) data file. This file is based on publicly available survey data, which is then weighted via `taxdata` to hit IRS/SOI targets. The data are then grown out to hit aggregate forecasts through the time horizon available in Tax-Calculator (approximately the next 10 years).
Using the `Records.cps_constructor()` method to create a `Records`
class object, Tax-Calculator users will be loading the `taxdata`
Current Population Survey (CPS) data file. This file is based on
publicly available survey data, which is then weighted in `taxdata` to
hit IRS/SOI targets. The data are then grown out to hit aggregate
forecasts through the time horizon available in Tax-Calculator
(approximately the next 10 years). All the files required to use this
prepared data option are included in the Tax-Calculator package.
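
As a minimal sketch of the workflow described above, the function below builds a `Records` object with `Records.cps_constructor()` and computes a weighted aggregate under current-law policy. The function name `cps_weighted_income_tax` is hypothetical, and the sketch assumes the `taxcalc` package is installed and that `iitax` is the output variable of interest.

```python
def cps_weighted_income_tax(year):
    """Return weighted aggregate individual income tax for `year` using cps.csv.

    Hypothetical helper for illustration; assumes taxcalc is installed.
    """
    # Imported lazily so this function can be defined without taxcalc present.
    from taxcalc import Calculator, Policy, Records

    # Load the packaged CPS input data with its weights and growth factors.
    recs = Records.cps_constructor()
    # A Policy object with no reform applied represents current law.
    pol = Policy()
    calc = Calculator(policy=pol, records=recs)
    # Extrapolate the data and policy parameters forward to the requested year.
    calc.advance_to_year(year)
    calc.calc_all()
    # Sum income tax liability across filing units using the sample weights.
    return calc.weighted_total("iitax")
```

Calling `cps_weighted_income_tax(2024)` would return a dollar total; as noted in this section, such totals exercise the tax logic and say nothing about the accuracy of the CPS-based data.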

Tax-Calculator provides unit tests to ensure that certain totals are hit with the CPS-based file. However, users should note that these tests are simply to ensure **the accuracy of Tax-Calculator's tax logic** and **not the accuracy of the CPS-based data file produced by `taxdata`**. Please see the [TaxData](https://github.com/PSLmodels/TaxData) documentation for any validation of those data.
Tax-Calculator provides unit tests to ensure that certain totals are
hit with the CPS-based file. However, users should note that these
tests are simply to ensure **the accuracy of Tax-Calculator's tax
logic** and **not the accuracy of the CPS-based data file produced by
`taxdata`**. Please see the
[taxdata](https://github.com/PSLmodels/taxdata) documentation for any
validation of those data.

### The IRS Public Use File
### 2011 IRS public use data (`puf.csv`)

The `taxdata` package also produces a weights file and growth factors file for use with the IRS-SOI Public Use File (PUF). A given `taxdata` version typically produces these files for a specific vintage of the PUF. As of `taxdata` v 0.3.1, the 2012 PUF is used.
The taxdata repository also produces a weights file and growth
factors file for use with the 2011 IRS-SOI Public Use File (PUF). For
users who have purchased their own version of the 2011 PUF, the
`puf_weights.csv.gz` and `growfactors.csv` files that are included in
Tax-Calculator can be used along with the `taxdata`-generated `puf.csv`
file.
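
A hedged sketch of loading a PUF-based file: it assumes the user has obtained `puf.csv` separately and that the `Records` constructor's default arguments select the packaged PUF weights and growth factors. The helper name `load_puf_records` and the default path are hypothetical.

```python
from pathlib import Path


def load_puf_records(puf_path="puf.csv"):
    """Return a Records object built from a user-supplied puf.csv file.

    Hypothetical helper for illustration; assumes taxcalc is installed.
    """
    path = Path(puf_path)
    # puf.csv is not distributed with Tax-Calculator, so fail early if absent.
    if not path.is_file():
        raise FileNotFoundError(f"PUF input file not found: {path}")
    # Imported lazily so the path check works without taxcalc present.
    from taxcalc import Records

    # Assumption: the default weights and growth-factor arguments of Records
    # point at the packaged puf_weights.csv.gz and growfactors.csv files.
    return Records(data=str(path))
```

Checking for the file first gives a clearer error than a failure from deep inside the data-reading code.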

For users who have purchased their own version of the PUF, the `puf_weights.csv.gz` and `growfactors.csv` files that are included in Tax-Calculator can be used to create a PUF-based dataset suitable for use in Tax-Calculator.
We refer users of the PUF to the IRS limitations on the use of those
data and their distribution. We also refer users of the PUF weights
file and grow factors to the
[taxdata](https://github.com/PSLmodels/taxdata) documentation for
details on how to use these files with the PUF and to see how well the
resulting tax calculations hit aggregate targets published by the IRS.
However, we do note that analysis with a PUF-based data file tends to
be more accurate than analysis with the `cps.csv` file and that the
validation of Tax-Calculator with other microsimulation models uses
the `puf.csv` file.

We refer users of the PUF to the IRS limitations on the use of those data and their distribution. We also refer users of the PUF weights file and grow factors to the [TaxData](https://github.com/PSLmodels/TaxData) documentation for details on how to use these files with the PUF and to see how well the resulting tax calculations hit aggregate targets published by the IRS. However, we do note that analysis with a PUF-based data file tends to be the most accurate and validation of Tax-Calculator with other microsimulation models often uses a PUF-based data file.
### 2015 IRS public use data (`tmd.csv`)

## Using other data in Tax-Calculator
The [tax-microdata
repository](https://github.com/PSLmodels/tax-microdata-benchmarking)
produces an input variables file (`tmd.csv`) and a
`tmd_weights.csv.gz` file that is included in the Tax-Calculator
package beginning with the 3.6.0 release. The `tmd.csv` file is
available only to Tax-Calculator users who have purchased their own
version of the 2015 IRS-SOI PUF. For those users, the
`Records.tmd_constructor()` method creates a `Records` class object
containing the `tmd` variables and weights.

Using other data sources in Tax-Calculator is possible. Users can pass any csv file to the `Records` class and, so long as it has the appropriate [input variables](https://taxcalc.pslmodels.org/guide/input_vars.html), one may be able to obtain results. Using Tax-Calculator with custom data takes care and significant understanding of the model and data. Those interested in using their own data in Tax-Calculator might also look to the [Tax-Cruncher](https://github.com/PSLmodels/Tax-Cruncher) project, which is built as an interface between Tax-Calculator and custom datasets.
## Using other data with Tax-Calculator

Using other data sources with Tax-Calculator is possible. Users can
pass any CSV file to the `Records` class and, so long as it has the
appropriate [input
variables](https://taxcalc.pslmodels.org/guide/input_vars.html), they
may be able to obtain results. Using Tax-Calculator with custom data
takes care and significant understanding of the model and data. Those
interested in using their own data with Tax-Calculator might also look
to the [Tax-Cruncher](https://github.com/PSLmodels/Tax-Cruncher)
project, which is built as an interface between Tax-Calculator and
custom datasets.
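
One way to approach custom data, sketched under stated assumptions: check the CSV header for required input variables before handing the file to `Records`. The helper names are hypothetical, the required set (`RECID`, `MARS`) is an assumption to verify against the input variables documentation, and the sketch assumes the file carries its own `s006` weight column, so no packaged weights, growth factors, or adjustment ratios are applied.

```python
import csv

# Assumption: these are the input variables every data file must contain.
REQUIRED_INPUT_VARS = {"RECID", "MARS"}


def missing_required_vars(csv_path, required=REQUIRED_INPUT_VARS):
    """Return a sorted list of required variables absent from the CSV header."""
    with open(csv_path, newline="") as csv_file:
        # Only the first row (the header) is read; an empty file yields [].
        header = next(csv.reader(csv_file), [])
    return sorted(set(required) - set(header))


def load_custom_records(csv_path, start_year):
    """Return a Records object for custom data (hypothetical helper)."""
    missing = missing_required_vars(csv_path)
    if missing:
        raise ValueError(f"missing required input variables: {missing}")
    # Imported lazily so the header check works without taxcalc present.
    from taxcalc import Records

    # Passing None turns off the packaged growth factors, weights, and
    # adjustment ratios, none of which apply to a custom dataset.
    return Records(data=str(csv_path), start_year=start_year,
                   gfactors=None, weights=None, adjust_ratios=None)
```

A header check like this catches only missing columns; it does not validate values, units, or weights, which is where most of the care with custom data is needed.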
