Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update TaxBrain and Tax-Calculator to handle UBI and CPS specific inputs #668

Open
9 tasks done
hdoupe opened this issue Sep 25, 2017 · 14 comments · Fixed by PSLmodels/Tax-Calculator#1577
Open
9 tasks done
Labels

Comments

@hdoupe
Copy link
Collaborator

hdoupe commented Sep 25, 2017

The goal of this issue is to outline the requirements for putting the CPS and UBI parameters on TaxBrain. The proposed steps are:

  • Add back-end capability to use the PUF or the CPS
    • Augment dropq functions in Tax-Calculator to work with PUF and CPS (or refactor to create TaxBrain interface function)
    • Update warning/error messages to reflect that some parameters are used exclusively by either the PUF or the CPS
    • Handle the different results tables based on whether CPS or PUF is used
  • Add front-end capability to use the PUF or the CPS
    • Add toggle to switch between PUF and CPS
    • Add capability to grey-out certain TaxBrain inputs dependent on the position of the switch and the value of the “availability” parameter in the records JSON
    • Add UBI inputs to TaxBrain input page
    • Add UBI related output to results page

This is a tentative outline. I encourage all feedback, especially on whether some items should be added/removed or on ideas for implementing these items.

@MattHJensen @Amy-Xu @andersonfrailey @martinholmer @GoFroggyRun

@martinholmer
Copy link
Contributor

martinholmer commented Sep 25, 2017

@hdoupe said in issue #668 wrt adding CPS and UBI/benefits to TaxBrain:

This is a tentative outline. I encourage all feedback, especially on whether some items should be added/removed or on ideas for implementing these items.

This is a major undertaking with many parts. As with anything this large, I think you, and the others involved, would benefit from applying the well-known divide-and-conquer computer-programming strategy. I think you should give serious consideration to dividing this into at lease two pieces: first make the CPS input file work with TaxBrain, then after that is done, add the benefits/UBI capabilities to Tax-Calculator and TaxBrain. Essentially, I'm suggesting you "refactor" your work plan.

@MattHJensen @Amy-Xu @andersonfrailey

@hdoupe
Copy link
Collaborator Author

hdoupe commented Sep 25, 2017

@martinholmer said

This is a major undertaking with many parts. As with anything this large, I think you, and the others involved, would benefit from applying the well-known divide-and-conquer computer-programming strategy.

Breaking this project up into chunks is a good way to think about this project. Thank you for your advice.

I think you should give serious consideration to dividing this into at lease two pieces: first make the CPS input file work with TaxBrain, then after that is done, add the benefits/UBI capabilities to Tax-Calculator and TaxBrain. Essentially, I'm suggesting you "refactor" your work plan.

I agree with this breakdown. Further, we can start on the first phase while we are still fleshing out the details of what will be required in the second phase.

I think that the three main requirements for the first phase would be

  1. Add a switch indicating whether to use PUF or CPS
  2. Display PUF or CPS parameters depending on the position of the switch
  3. Indicate to Tax-Calculator whether to use PUF or CPS (either pass taxrec_df as None or create a use_cps flag or some other more creative solution).

Since benefits data would not be included at this point, I do not think that anything else would have to be modified.

@martinholmer
Copy link
Contributor

@hdoupe said in PB #668:

This is a major undertaking with many parts. As with anything this large, I think you, and the others involved, would benefit from applying the well-known divide-and-conquer computer-programming strategy.

Breaking this project up into chunks is a good way to think about this project. Thank you for your advice.

I think you should give serious consideration to dividing this into at lease two pieces: first make the CPS input file work with TaxBrain, then after that is done, add the benefits/UBI capabilities to Tax-Calculator and TaxBrain. Essentially, I'm suggesting you "refactor" your work plan.

I agree with this breakdown. Further, we can start on the first phase while we are still fleshing out the details of what will be required in the second phase.

I think that the three main requirements for the first phase would be

  1. Add a switch indicating whether to use PUF or CPS
  2. Display PUF or CPS parameters depending on the position of the switch
  3. Indicate to Tax-Calculator whether to use PUF or CPS (either pass taxrec_df as None or create a use_cps flag or some other more creative solution).

Since benefits data would not be included at this point, I do not think that anything else would have to be modified.

OK, items 1 and 2 on your list are TaxBrain enhancements.
I don't know enough to offer any advice on those enhancements.

Item 3 on your list involves some coordination between TaxBrain and Tax-Calculator.
I know you've thought about modifying dropq to work with cps.csv input data, but I'm wary about that approach for at least two reasons. First, we aren't dropping any records in each table cell so the semantics of calling dropq on CPS data is confusing. And second, the dropq code is complex, and hence, error prone to modify. Perhaps, in the end, this will be judged the best approach, but have you considered other approaches? For example, what about using the capabilities (which don't drop records) in the taxcalcio.py file? They may need some refactoring in order to be efficiently called from TaxBrain, but that might be easier than revising dropq. Or perhaps you've considered this approach and found it wanting. If so, please share your thinking with the rest of us.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Sep 25, 2017

@martinholmer these are valid concerns.

First, we aren't dropping any records in each table cell so the semantics of calling dropq on CPS data is confusing.

Yes, even though by setting mask_computed to False I am not (or at least hope I am not) dropping any records. However, using the dropq functions without dropping any records does not make much sense.

And second, the dropq code is complex, and hence, error prone to modify.

Another valid concern. It is difficult to ensure that setting mask_computed to False is actually preventing records from being dropped in all cases.

For example, what about using the capabilities (which don't drop records) in the taxcalcio.py file? They may need some refactoring in order to be efficiently called from TaxBrain, but that might be easier than revising dropq.

This is a good idea. I am not familiar with taxcalcio.py but it looks like it could do the job.

My main motivation for using the dropq functions was that it appeared easy to modify and it already formats the results into the structure that TaxBrain expects. However, as you point out, it is not as easy to modify as I thought.

My main concern with using taxcalcio.py when the CPS is used is that TaxBrain would now have two functions that it could call from Tax-Calculator. I like the idea of having one function that is called in Tax-Calculator and sending all of the relevant information to that function. This gives the Tax-Calculator developers more control over the modeling process and I think that in general they will be much more particular about this process and quicker to respond to bugs than PolicyBrain developers.

Furthermore, a longer term goal for PolicyBrain is to be able to quickly set up models on the PolicyBrain platform. To ease the burden on PolicyBrain developers and give more control to the model developers, a requirement could be for new models to have one function that serves as an interface to PolicyBrain. This function would receive all of the relevant modeling information and return the results. This strategy is up for debate and can be hashed out further in the future.

For these reasons, I think my initial approach of hacking the dropq functions was wrong, and I think a better approach would be to develop a TaxBrain interface function in Tax-Calculator that contains the logic of running the model and returning the results. Part of this logic would be sending data to taxcalcio.py (or some other function for running the model) or to dropq.py if records need to be dropped.

@martinholmer
Copy link
Contributor

@hdoupe concluded a thoughtful comment in the discussion of issue #668 as follows:

For these reasons, I think my initial approach of hacking the dropq functions was wrong, and I think a better approach would be to develop a TaxBrain interface function in Tax-Calculator that contains the logic of running the model and returning the results [expected by TaxBrain]. Part of this logic would be sending data to taxcalcio.py (or some other function for running the model) or to dropq.py if records need to be dropped.

I think I understand your suggestion and it makes a lot of sense to me.

I'll begin thinking about such a "universal" function (that is, one that would work with puf.csv or cps.csv or cps.csv+benefits.csv` input data) and return the results expected by TaxBrain (as currently done only in dropq).

I'd appreciate any more detailed thoughts you have about such a function. What would be the function arguments? And what exactly should be returned in order to meet TaxBrain needs?

@hdoupe
Copy link
Collaborator Author

hdoupe commented Sep 26, 2017

@martinholmer said

I'll begin thinking about such a "universal" function (that is, one that would work with puf.csv or cps.csv or cps.csv+benefits.csv` input data) and return the results expected by TaxBrain (as currently done only in dropq).

I'd appreciate any more detailed thoughts you have about such a function. What would be the function arguments? And what exactly should be returned in order to meet TaxBrain needs?

Great, that would be very helpful. I have a couple ideas on the arguments.

  1. I'm still not exactly sure how the benefits data issue will be worked out. However, I think it will need to be passed as an argument from TB to TC.
  2. Right now, there is a separate function run_nth_year_gdp_elast_model for dealing with the GDP elasticity model. It would be nice if we could wrap that into the single interface function.
  3. Many of the user reform arguments are passed in a single dictionary user_mods, with keys growdiff_response, consumption, growdiff_baseline, behavior, policy, and gdp_elasticity. I think the code would be easier to read and understand if we passed some of these nested dictionaries explicitly. However, this is my opinion and others may disagree.
  4. Since Records already has a convenient cps_constructor function I think we could default to using that if taxrec_df is None. If we are only using taxrec_df when we are using the PUF, then we could rename the argument puf_df. We could also add a flag argument use_cps that would more clearly indicate whether we are using the PUF or the CPS. But, this would not be necessary if we use the logic above.

So, my proposed function would be something like:

results = taxcalc_interface(year_n, 
                            start_year, 
                            reform, 
                            assumptions=None,
                            gdp_elasticity=None,
                            use_cps = False,
                            puf_df = None,
                            benefits_df = None)

The results that are returned now are fine with me. @andersonfrailey and @Amy-Xu do you know what the new outputs will be?

@martinholmer
Copy link
Contributor

@hdoupe said in issue #668:

So, my proposed function would be something like this:

results = taxcalc_interface(year_n, 
                            start_year, 
                            reform, 
                            assumptions=None,
                            gdp_elasticity=None,
                            use_cps=False,
                            puf_df=None,
                            benefits_df=None)

The results that are returned now are fine with me.

Thanks for thinking about this in more detail. Your thoughts are very helpful.
The two-stage approach to this work (that we seemed to agree on the other day) implies we don't have to worry (yet) about benefits input data and benefit results (because we're discussion just the first stage).
Just a couple questions for now:

  1. How would the subsample option work? Don't we need a parameter to say we want to use a subsample for the "quick calculations" in TaxBrain?
  2. The gdp_elasticity results are completely different from the results in all other cases. How do you think we should handle that?

@MattHJensen

@hdoupe
Copy link
Collaborator Author

hdoupe commented Sep 28, 2017

@martinholmer had two questions:

  1. How would the subsample option work? Don't we need a parameter to say we want to use a subsample for the "quick calculations" in TaxBrain?

Currently, if the user chooses the quick-calc option, only a sample of the PUF data is sent to the Tax-Calculator. If we keep it this way, then nothing would change for simulations using the PUF. However, an extra parameter indicating that we are using the quick-calc option would be needed if the simulation uses the CPS. Alternatively, we could send the entire PUF data set over for all runs and a flag indicating whether this is a quick-calculation or not. Then, the sub-sample would be chosen in Tax-Calculator.

  1. The gdp_elasticity results are completely different from the results in all other cases. How do you think we should handle that?

Ah, right. I think this causes a problem with the idea of having one function to serve as an interface between TaxBrain and Tax-Calculator. I think we could make it work if we just have two different types of output. However, in the future, if we have 3 or 4 or more different types of output, then I can see this implementation getting more and more complicated.

Now that I'm thinking more about this idea, as we call this universal function in more and more different ways, it will become pretty complicated for this function to figure out what TaxBrain wants it to do. I guess what I'm getting at is that this implementation wouldn't scale very well if we add more apps on TaxBrain that use Tax-Calculator.

@martinholmer what do you think about this?

@martinholmer
Copy link
Contributor

@hdoupe said:

Now that I'm thinking more about this idea, as we call this universal function in more and more different ways, it will become pretty complicated for this function to figure out what TaxBrain wants it to do. I guess what I'm getting at is that this implementation wouldn't scale very well if we add more apps on TaxBrain that use Tax-Calculator.

Agreed.

I've started working on a Tax-Calculator pull request that will allow TaxBrain to use either puf.csv or cps.csv input. Hope to have that pull request ready for review soon.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Oct 2, 2017

Sounds good. Thanks @martinholmer

@hdoupe
Copy link
Collaborator Author

hdoupe commented Oct 10, 2017

@martinholmer I'm going to keep this issue open and check off the corresponding item on the list above.

@hdoupe hdoupe reopened this Oct 10, 2017
@martinholmer
Copy link
Contributor

@hdoupe, Sorry about issue #668 being closed prematurely.
Somehow GitHub did that automatically when I merged Tax-Calculator pull request 1577.

@MattHJensen
Copy link
Contributor

@martinholmer said:

Somehow GitHub did that automatically when I merged Tax-Calculator pull request 1577.

It's because of the bolded part of this sentence in 1577: "So, this pull request is an attempt to resolve PolicyBrain issue 668."

@martinholmer
Copy link
Contributor

@MattHJensen, Thanks for the tip in #668 on how GitHub takes action based on what you write.

@hdoupe hdoupe mentioned this issue Nov 8, 2017
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants