Identifying tax filers #23

Open
donboyd5 opened this issue Mar 10, 2024 · 1 comment

Comments

@donboyd5

We need a methodology for distinguishing filers from nonfilers so that we can target filers as well as target the universe.

Here are two prior discussions on this topic: taxdata#366 and Tax-Calculator #2501.

My best Python code for identifying tax filers in the PUF is in the second of these (Tax-Calculator #2501). I may have progressed beyond it, but I'll have to look; I know I was pretty satisfied with the definition given there.

Presumably this code should also work well on the PE flat file since it has puf variables. Is what I've provided sufficient? Should we start with that and open a GitHub issue to address questions that inevitably will pop up?
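
One quick way to approach the sufficiency question is to check whether the flat file carries every column the filer function reads. Below is a minimal sketch, assuming the file has been loaded into a pandas DataFrame. The helper name `missing_filer_columns` and the tiny example frame are hypothetical; the column list is simply read off the function body in the next comment.

```python
import pandas as pd

# Columns read by filers(); copied from the function body.
FILER_COLUMNS = [
    "MARS", "age_head", "age_spouse",
    "c00100", "c02900", "c23650", "c01000",
    "e01200", "e00900", "e02000", "e02100",
    "e00200", "iitax", "c07100", "refund",
]


def missing_filer_columns(df):
    """Return the columns filers() needs that df does not have."""
    return [col for col in FILER_COLUMNS if col not in df.columns]


# Hypothetical stand-in for a flat file that lacks two of the columns.
example = pd.DataFrame(
    columns=[c for c in FILER_COLUMNS if c not in ("iitax", "refund")]
)
print(missing_filer_columns(example))
```

An empty list would mean the function can at least run on the file; it says nothing about whether the columns are defined the same way as in the PUF.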

In the next comment I've copied the latest filer-identifying Python code I wrote.

@donboyd5

def filers(puf, year=2017):
    """Return boolean array identifying tax filers.

    Parameters
    ----------
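    puf : pandas.DataFrame
        Records with the columns read below (MARS, age_head, age_spouse,
        c00100, c02900, e00200, ...).
    year : int
        Tax year of the filing thresholds. Only 2017 is defined.

    Returns
    -------
    pandas.Series
        Boolean mask, True for records identified as filers.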

    # IRS rules for filers: https://www.irs.gov/pub/irs-prior/p17--2017.pdf

    Gross income. This includes all income you receive in the form of money,
    goods, property, and services that isn't exempt from tax. It also includes
    income from sources outside the United States or from the sale of your main
    home (even if you can exclude all or part of it). Include part of your
    social security benefits if: 1. You were married, filing a separate return,
    and you lived with your spouse at any time during 2017; or 2. Half of your
    social security benefits plus your other gross income and any tax-exempt
    interest is more than $25,000 ($32,000 if married filing jointly).

    Here, gross income is defined as above-the-line income, plus any losses
    deducted in arriving at it, plus any income excluded in arriving at it.
    """
    if year == 2017:
        s_inc_lt65 = 10400
        s_inc_ge65 = 11950

        mfj_inc_bothlt65 = 20800
        mfj_inc_onege65 = 22050
        mfj_inc_bothge65 = 23300

        mfs_inc = 4050

        hoh_inc_lt65 = 13400
        hoh_inc_ge65 = 14950

        qw_inc_lt65 = 16750
        qw_inc_ge65 = 18000

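        wage_threshold = 1000
    else:
        # thresholds are only defined for 2017; fail early and clearly
        raise ValueError(f"filing thresholds not defined for year {year}")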

    # above the line income is agi plus above line adjustments getting to agi
    above_line_income = puf.c00100 + puf.c02900

    # add back any losses that were used to reduce above the line income
    # these are negative so we will subtract them from above the line income
    capital_losses = puf.c23650.lt(0) * puf.c23650 \
        + puf.c01000.lt(0) * puf.c01000
    other_losses = puf.e01200.lt(0) * puf.e01200
    business_losses = puf.e00900.lt(0) * puf.e00900
    rent_losses = puf.e02000.lt(0) * puf.e02000
    farm_losses = puf.e02100.lt(0) * puf.e02100
    above_line_losses = capital_losses + other_losses + business_losses \
        + rent_losses + farm_losses

    # add back any untaxed income that was excluded in calculating
    # above the line income and that is not considered "exempt"
    # It is clear that IRS includes some Social Security in some
    # circumstances, but for now I treat it as wholly exempt

    # here is the full portion of untaxed Social Security. I think this
    # is OVERSTATED - I think IRS has a limit on amount that can be added
    # back but am not sure how to calculate it
    # socsec_untaxed = puf.e02400 - puf.c02500  # always ge zero, I checked
    socsec_untaxed = 0
    above_line_untaxed = socsec_untaxed

    # gross_income -- is anything left out?
    gross_income = above_line_income - above_line_losses + above_line_untaxed

    # to be on the safe side, don't let gross_income be negative
    gross_income = gross_income * gross_income.ge(0)

    # define filer masks; the approach is to define two groups of households:
    #   (1) households that are required to file based on marital status,
    #       age, and gross income
    #   (2) households that are likely to file whether they are required to or
    #       not, because they are likely to need a refund (e.g., people with
    #       wage income), are seeking a credit, or have a complex return
    #       (they have negative AGI)

    # single
    m_single_lt65 = puf.MARS.eq(1) \
        & puf.age_head.lt(65) \
        & gross_income.ge(s_inc_lt65)

    m_single_ge65 = puf.MARS.eq(1) \
        & puf.age_head.ge(65) \
        & gross_income.ge(s_inc_ge65)

    m_single = m_single_lt65 | m_single_ge65

    # married joint
    m_mfj_bothlt65 = puf.MARS.eq(2) \
        & puf.age_head.lt(65) \
        & puf.age_spouse.lt(65) \
        & gross_income.ge(mfj_inc_bothlt65)

    m_mfj_onege65 = puf.MARS.eq(2) \
        & (puf.age_head.ge(65) | puf.age_spouse.ge(65)) \
        & ~(puf.age_head.ge(65) & puf.age_spouse.ge(65)) \
        & gross_income.ge(mfj_inc_onege65)

    m_mfj_bothge65 = puf.MARS.eq(2) \
        & puf.age_head.ge(65) \
        & puf.age_spouse.ge(65) \
        & gross_income.ge(mfj_inc_bothge65)

    m_mfj = m_mfj_bothlt65 | m_mfj_onege65 | m_mfj_bothge65

    # married separate
    m_mfs = puf.MARS.eq(3) & gross_income.ge(mfs_inc)

    # head of household
    m_hoh_lt65 = puf.MARS.eq(4) \
        & puf.age_head.lt(65) \
        & gross_income.ge(hoh_inc_lt65)

    m_hoh_ge65 = puf.MARS.eq(4) \
        & puf.age_head.ge(65) \
        & gross_income.ge(hoh_inc_ge65)

    m_hoh = m_hoh_lt65 | m_hoh_ge65

    # qualifying widow(er)
    m_qw_lt65 = puf.MARS.eq(5) \
        & puf.age_head.lt(65) \
        & gross_income.ge(qw_inc_lt65)

    m_qw_ge65 = puf.MARS.eq(5) \
        & puf.age_head.ge(65) \
        & gross_income.ge(qw_inc_ge65)

    m_qw = m_qw_lt65 | m_qw_ge65

    m_required = m_single | m_mfj | m_mfs | m_hoh | m_qw

    # returns that surely will or must file even if
    # marital-status/age/gross_income requirement is not met
    m_negagi = puf.c00100.lt(0)  # negative AGI
    m_iitax = puf.iitax.ne(0)  # nonzero income tax
    m_credits = puf.c07100.ne(0) | puf.refund.ne(0)  # any credits claimed
    m_wages = puf.e00200.ge(wage_threshold)  # wages at or above threshold

    m_likely = m_negagi | m_iitax | m_credits | m_wages

    m_filer = m_required | m_likely

    return m_filer
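
The docstring quotes the IRS test for when part of Social Security benefits counts toward gross income, while the code sets `socsec_untaxed = 0`. Below is a minimal sketch of the second condition of that test: half of benefits plus other gross income plus tax-exempt interest above $25,000 ($32,000 if married filing jointly). The function name and its arguments are hypothetical, and which data columns would feed them is left open. The first condition (married filing separately and living with a spouse) is not modeled, because nothing in the function above identifies it. The sketch only flags records; it does not compute how much of the benefit to include.

```python
import pandas as pd


def socsec_counts_toward_gross_income(ss_benefits, other_gross_income,
                                      tax_exempt_interest, mars):
    """Flag records where part of Social Security enters gross income.

    Implements only condition 2 of the rule quoted in the filers()
    docstring; all four arguments are pandas Series aligned by record.
    """
    # $32,000 if married filing jointly (MARS == 2), else $25,000
    threshold = mars.eq(2) * 32000 + mars.ne(2) * 25000
    test_amount = 0.5 * ss_benefits + other_gross_income + tax_exempt_interest
    return test_amount.gt(threshold)


# Made-up records: single, joint, joint
flags = socsec_counts_toward_gross_income(
    ss_benefits=pd.Series([20000.0, 20000.0, 30000.0]),
    other_gross_income=pd.Series([16000.0, 16000.0, 18000.0]),
    tax_exempt_interest=pd.Series([0.0, 0.0, 0.0]),
    mars=pd.Series([1, 2, 2]),
)
print(flags.tolist())
```

A mask like this could help size how many records are affected before deciding whether the add-back is worth modeling.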

