Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need for more variables returned from response function #38

Closed
martinholmer opened this issue Jan 29, 2019 · 18 comments
Closed

Need for more variables returned from response function #38

martinholmer opened this issue Jan 29, 2019 · 18 comments
Labels
enhancement New feature or request request

Comments

@martinholmer
Copy link
Collaborator

In the original design of the Behavioral-Responses response function, it returned two DataFrame objects: the 22 DISTRIBUTION_VARIABLES from the pre-reform Calculator object passed to the function and the 22 DISTRIBUTION_VARIABLES from the post-reform Calculator object passed to the function including the effects of the behavioral responses. This design returned all the variables needed to construct diagnostic, distribution, and difference tables, and hence, met all the high-level data demands of the TaxBrain webapp and Python users constructing the standard tables.

But it has recently come to light that some users have an interest in seeing additional variables that are currently internal to the response function. This is a legitimate enhancement request but nobody had raised an issue describing their extra-data needs and asking for those needs to be met. Because nobody else has done that, I'm doing that here. This is the place to describe your extra-response-data needs. When we have a clear description of the needs, we can discuss how to meet these needs in a safe and efficient manner.

The first evidence of an extra-response-data need was in this PR #37 comment by @MattHJensen:

  • In order to conduct the original exploratory analysis to determine that [issue] Should CG income be included in the base for calculating ETI substitution effect? #36 was a problem, I needed to look at how behresp was acting on individual records, and so I was working with a modified fork of Behavioral-Responses that returned the calculator.

  • It was taking me a long time to figure out how to write a failing test for this problem w/o being able to look at the microdata, but it is easy with the microdata and I was able to do so quickly.

  • We have seen many users take advantage of the tc --dump capabilities. Without returning the calculator, I don't know how users could do similar work with Behavioral-Responses.

  • I have seen another user modify a fork of Behavioral-Responses to return the Calculator object in order to try to figure out what is going on in the microdata.

And then @andersonfrailey said he was the user referred to in the fourth bullet:

I was working with Behavioral-Response a few weeks ago and I wanted to look at some of the data that isn't included in the distribution table DataFrame that gets returned by the response function.

In order to figure out how to meet these extra-response-data needs in a safe and efficient manner, I need a description of the problems users are experiencing. It is not appropriate to discuss solution options until we have a description of the problem. Which extra response variables (other the the 22 variables already returned) were needed ?

@MattHJensen
Copy link
Contributor

@martinholmer, thanks for opening this issue. Could you please describe what you mean by "safe"? It would be helpful if you did the same for "efficient."

@MattHJensen
Copy link
Contributor

MattHJensen commented Jan 29, 2019

Which extra response variables (other the the 22 variables already returned) were needed ?

I would like to access the record-level data for all variables that behresp modifies, including both the tax-calculation input variables directly modified by behresp and the tax-calculation output variables influenced by those modifications.

@martinholmer
Copy link
Collaborator Author

martinholmer commented Jan 29, 2019

@MattHJensen said:

thanks for opening this issue. Could you please describe what you mean by "safe"? It would be helpful if you did the same for "efficient."

We'll get to that after we've defined the problem and begin considering solutions. Meanwhile, I hope you can describe your experiences. What extra response data were useful in your work?

@martinholmer martinholmer added enhancement New feature or request request labels Jan 29, 2019
@andersonfrailey
Copy link

@martinholmer the biggest thing I wanted to do was use the calculator's MTR methods. I'm not sure how that would be possible without returning the calculator itself. I suppose additional functions/arguments could be added to Behavior Response, but that seems like it would be inefficient to do.

@martinholmer
Copy link
Collaborator Author

@andersonfrailey, what was the purpose of calculating MTRs on the post response calculator object? Which MTRs did you want to calculate?

@martinholmer
Copy link
Collaborator Author

@MattHJensen said in issue #38:

I would like to access the record-level data for all variables that behresp modifies, including both the tax-calculation input variables directly modified by behresp and the tax-calculation output variables influenced by those modifications.

Thanks for the clear request.

@martinholmer
Copy link
Collaborator Author

@andersonfrailey first said in issue #38:

I was working with Behavioral-Response a few weeks ago and I wanted to look at some of the data that isn't included in the distribution table DataFrame that gets returned by the response function.

Then he said:

The biggest thing I wanted to do was use the Calculator object's MTR methods.

So which is it? To me "data" means variables computed in the response function that are not returned by the response function. Using the Calculator object's "methods" is a completely different thing. Can you explain why you needed to do that sort of thing? Which MTRs did you want to compute?

@andersonfrailey
Copy link

@martinholmer this was just to compare MTRs between the baseline policy and the reform. I wanted to calculate the combined MTRs.

@martinholmer
Copy link
Collaborator Author

@andersonfrailey said:

this was just to compare MTRs between the baseline policy and the reform. I wanted to calculate the combined MTRs.

OK that's good to know. But I still have two questions.

  1. By the "reform" do you mean the MTR before or after the responses?
  2. Good to know you were interested in "combined MTRs", but MTRs with respect to a small change in what income variable?

@andersonfrailey
Copy link

@martinholmer asked:

By the "reform" do you mean the MTR before or after the responses?

After the responses

And:

Good to know you were interested in "combined MTRs", but MTRs with respect to a small change in what income variable?

Wages and salary income, e00200.

@martinholmer
Copy link
Collaborator Author

martinholmer commented Jan 30, 2019

Given the needs for extra response data expressed by @MattHJensen and @andersonfrailey in Behavioral-Responses issue #38, I think best solution is to add an optional dump argument to the response function. The default value of that argument would be False, in which case the response function would work exactly as it does now. If the new dump argument is set to True, then the two returned DataFrames would include not only the 22 DISTRIBUTION_VARIABLES but all the input and calculated variables included in the tc --dump option, which includes combined MTR on earnings.

This seems like it will meet users extra-data needs in an efficient and safe way. This proposed change does not involve an API change (which would be expensive in developer time) and does not slow down the most common use of the response function by the TaxBrain webapp (which would be expensive in computer time). And this approach is far safer than returning a Calculator object that includes responses because that approach (in addition to causing an expensive API change) is dangerous because it invites the misuse of the returned Calculator object. If you don't think that would be a problem, then think again after reading Tax-Calculator pull request 1848 and the enormously time-consuming discussion that preceded that pull request. Removing the Behavior class from Tax-Calculator eliminated the risk of users misusing a Calculator object that incorporated behavioral responses. I don't think it would be wise to reintroduce that risk of misuse.

@MattHJensen
Copy link
Contributor

The --dump feature satisfies my needs. If anything new comes up in the future, I will open an issue (rather than go crazy in a PR).

@MattHJensen
Copy link
Contributor

MattHJensen commented Jan 31, 2019

I would be happy to implement this feature sometime in February, unless you would prefer to do it or do it sooner, @martinholmer or @andersonfrailey.

@martinholmer
Copy link
Collaborator Author

@MattHJensen said:

I would be happy to implement this feature sometime in February, unless you would prefer to do it or do it sooner.

Thanks for the offer, but in making the proposal I had to outline the implementation. Given that I'll be able to do it quite quickly, and I think it would be good to include in the next release, which I imagine will be in the next few days (after T-C PR 2212 is merged).

@MattHJensen
Copy link
Contributor

Thanks for the offer, but in making the proposal I had to outline the implementation. Given that I'll be able to do it quite quickly, and I think it would be good to include in the next release, which I imagine will be in the next few days (after T-C PR 2212 is merged).

Ok, that sounds great. Thanks, Martin.

@martinholmer
Copy link
Collaborator Author

@andersonfrailey, Are you OK with the enhancement proposed in Behavioral-Responses issue #38?
Note that with the for the post-reform-with-responses dump output, a user could create a Calculator object using those data to calculate other MTRs if that was of interest.

@andersonfrailey
Copy link

@martinholmer yes, this is good with me. Thanks!

@martinholmer
Copy link
Collaborator Author

Pull request #39 resolves issue #38.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request request
Projects
None yet
Development

No branches or pull requests

3 participants