Need for more variables returned from response function #38

martinholmer · 2019-01-29T13:26:36Z

In the original design of the Behavioral-Responses response function, it returned two DataFrame objects: the 22 DISTRIBUTION_VARIABLES from the pre-reform Calculator object passed to the function and the 22 DISTRIBUTION_VARIABLES from the post-reform Calculator object passed to the function including the effects of the behavioral responses. This design returned all the variables needed to construct diagnostic, distribution, and difference tables, and hence, met all the high-level data demands of the TaxBrain webapp and Python users constructing the standard tables.

But it has recently come to light that some users have an interest in seeing additional variables that are currently internal to the response function. This is a legitimate enhancement request but nobody had raised an issue describing their extra-data needs and asking for those needs to be met. Because nobody else has done that, I'm doing that here. This is the place to describe your extra-response-data needs. When we have a clear description of the needs, we can discuss how to meet these needs in a safe and efficient manner.

The first evidence of an extra-response-data need was in this PR #37 comment by @MattHJensen:

In order to conduct the original exploratory analysis to determine that [issue] Should CG income be included in the base for calculating ETI substitution effect? #36 was a problem, I needed to look at how behresp was acting on individual records, and so I was working with a modified fork of Behavioral-Responses that returned the calculator.

It was taking me a long time to figure out how to write a failing test for this problem w/o being able to look at the microdata, but it is easy with the microdata and I was able to do so quickly.

We have seen many users take advantage of the tc --dump capabilities. Without returning the calculator, I don't know how users could do similar work with Behavioral-Responses.

I have seen another user modify a fork of Behavioral-Responses to return the Calculator object in order to try to figure out what is going on in the microdata.

And then @andersonfrailey said he was the user referred to in the fourth bullet:

I was working with Behavioral-Response a few weeks ago and I wanted to look at some of the data that isn't included in the distribution table DataFrame that gets returned by the response function.

In order to figure out how to meet these extra-response-data needs in a safe and efficient manner, I need a description of the problems users are experiencing. It is not appropriate to discuss solution options until we have a description of the problem. Which extra response variables (other the the 22 variables already returned) were needed ?

The text was updated successfully, but these errors were encountered:

MattHJensen · 2019-01-29T13:45:06Z

@martinholmer, thanks for opening this issue. Could you please describe what you mean by "safe"? It would be helpful if you did the same for "efficient."

MattHJensen · 2019-01-29T13:57:33Z

Which extra response variables (other the the 22 variables already returned) were needed ?

I would like to access the record-level data for all variables that behresp modifies, including both the tax-calculation input variables directly modified by behresp and the tax-calculation output variables influenced by those modifications.

martinholmer · 2019-01-29T13:58:09Z

@MattHJensen said:

thanks for opening this issue. Could you please describe what you mean by "safe"? It would be helpful if you did the same for "efficient."

We'll get to that after we've defined the problem and begin considering solutions. Meanwhile, I hope you can describe your experiences. What extra response data were useful in your work?

andersonfrailey · 2019-01-29T14:45:05Z

@martinholmer the biggest thing I wanted to do was use the calculator's MTR methods. I'm not sure how that would be possible without returning the calculator itself. I suppose additional functions/arguments could be added to Behavior Response, but that seems like it would be inefficient to do.

martinholmer · 2019-01-29T14:57:07Z

@andersonfrailey, what was the purpose of calculating MTRs on the post response calculator object? Which MTRs did you want to calculate?

martinholmer · 2019-01-29T17:48:25Z

@MattHJensen said in issue #38:

I would like to access the record-level data for all variables that behresp modifies, including both the tax-calculation input variables directly modified by behresp and the tax-calculation output variables influenced by those modifications.

Thanks for the clear request.

martinholmer · 2019-01-29T17:57:28Z

@andersonfrailey first said in issue #38:

I was working with Behavioral-Response a few weeks ago and I wanted to look at some of the data that isn't included in the distribution table DataFrame that gets returned by the response function.

Then he said:

The biggest thing I wanted to do was use the Calculator object's MTR methods.

So which is it? To me "data" means variables computed in the response function that are not returned by the response function. Using the Calculator object's "methods" is a completely different thing. Can you explain why you needed to do that sort of thing? Which MTRs did you want to compute?

andersonfrailey · 2019-01-29T19:21:20Z

@martinholmer this was just to compare MTRs between the baseline policy and the reform. I wanted to calculate the combined MTRs.

martinholmer · 2019-01-29T21:36:01Z

@andersonfrailey said:

this was just to compare MTRs between the baseline policy and the reform. I wanted to calculate the combined MTRs.

OK that's good to know. But I still have two questions.

By the "reform" do you mean the MTR before or after the responses?
Good to know you were interested in "combined MTRs", but MTRs with respect to a small change in what income variable?

andersonfrailey · 2019-01-30T14:35:12Z

@martinholmer asked:

By the "reform" do you mean the MTR before or after the responses?

After the responses

And:

Good to know you were interested in "combined MTRs", but MTRs with respect to a small change in what income variable?

Wages and salary income, e00200.

martinholmer · 2019-01-30T20:53:06Z

Given the needs for extra response data expressed by @MattHJensen and @andersonfrailey in Behavioral-Responses issue #38, I think best solution is to add an optional dump argument to the response function. The default value of that argument would be False, in which case the response function would work exactly as it does now. If the new dump argument is set to True, then the two returned DataFrames would include not only the 22 DISTRIBUTION_VARIABLES but all the input and calculated variables included in the tc --dump option, which includes combined MTR on earnings.

This seems like it will meet users extra-data needs in an efficient and safe way. This proposed change does not involve an API change (which would be expensive in developer time) and does not slow down the most common use of the response function by the TaxBrain webapp (which would be expensive in computer time). And this approach is far safer than returning a Calculator object that includes responses because that approach (in addition to causing an expensive API change) is dangerous because it invites the misuse of the returned Calculator object. If you don't think that would be a problem, then think again after reading Tax-Calculator pull request 1848 and the enormously time-consuming discussion that preceded that pull request. Removing the Behavior class from Tax-Calculator eliminated the risk of users misusing a Calculator object that incorporated behavioral responses. I don't think it would be wise to reintroduce that risk of misuse.

MattHJensen · 2019-01-31T21:22:55Z

The --dump feature satisfies my needs. If anything new comes up in the future, I will open an issue (rather than go crazy in a PR).

MattHJensen · 2019-01-31T21:26:51Z

I would be happy to implement this feature sometime in February, unless you would prefer to do it or do it sooner, @martinholmer or @andersonfrailey.

martinholmer · 2019-01-31T21:49:25Z

@MattHJensen said:

I would be happy to implement this feature sometime in February, unless you would prefer to do it or do it sooner.

Thanks for the offer, but in making the proposal I had to outline the implementation. Given that I'll be able to do it quite quickly, and I think it would be good to include in the next release, which I imagine will be in the next few days (after T-C PR 2212 is merged).

MattHJensen · 2019-01-31T21:52:24Z

Thanks for the offer, but in making the proposal I had to outline the implementation. Given that I'll be able to do it quite quickly, and I think it would be good to include in the next release, which I imagine will be in the next few days (after T-C PR 2212 is merged).

Ok, that sounds great. Thanks, Martin.

martinholmer · 2019-01-31T21:53:14Z

@andersonfrailey, Are you OK with the enhancement proposed in Behavioral-Responses issue #38?
Note that with the for the post-reform-with-responses dump output, a user could create a Calculator object using those data to calculate other MTRs if that was of interest.

andersonfrailey · 2019-01-31T21:56:40Z

@martinholmer yes, this is good with me. Thanks!

martinholmer · 2019-02-01T16:48:36Z

Pull request #39 resolves issue #38.

martinholmer mentioned this issue Jan 29, 2019

Remove ltcg from ETI substitution effect base #37

Merged

1 task

martinholmer added enhancement New feature or request request labels Jan 29, 2019

martinholmer mentioned this issue Feb 1, 2019

Add dump option to response function and streamline tests #39

Merged

martinholmer closed this as completed Feb 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need for more variables returned from response function #38

Need for more variables returned from response function #38

martinholmer commented Jan 29, 2019

MattHJensen commented Jan 29, 2019

MattHJensen commented Jan 29, 2019 •

edited

martinholmer commented Jan 29, 2019 •

edited

andersonfrailey commented Jan 29, 2019

martinholmer commented Jan 29, 2019

martinholmer commented Jan 29, 2019

martinholmer commented Jan 29, 2019

andersonfrailey commented Jan 29, 2019

martinholmer commented Jan 29, 2019

andersonfrailey commented Jan 30, 2019

martinholmer commented Jan 30, 2019 •

edited

MattHJensen commented Jan 31, 2019

MattHJensen commented Jan 31, 2019 •

edited

martinholmer commented Jan 31, 2019

MattHJensen commented Jan 31, 2019

martinholmer commented Jan 31, 2019

andersonfrailey commented Jan 31, 2019

martinholmer commented Feb 1, 2019

Need for more variables returned from response function #38

Need for more variables returned from response function #38

Comments

martinholmer commented Jan 29, 2019

MattHJensen commented Jan 29, 2019

MattHJensen commented Jan 29, 2019 • edited

martinholmer commented Jan 29, 2019 • edited

andersonfrailey commented Jan 29, 2019

martinholmer commented Jan 29, 2019

martinholmer commented Jan 29, 2019

martinholmer commented Jan 29, 2019

andersonfrailey commented Jan 29, 2019

martinholmer commented Jan 29, 2019

andersonfrailey commented Jan 30, 2019

martinholmer commented Jan 30, 2019 • edited

MattHJensen commented Jan 31, 2019

MattHJensen commented Jan 31, 2019 • edited

martinholmer commented Jan 31, 2019

MattHJensen commented Jan 31, 2019

martinholmer commented Jan 31, 2019

andersonfrailey commented Jan 31, 2019

martinholmer commented Feb 1, 2019

MattHJensen commented Jan 29, 2019 •

edited

martinholmer commented Jan 29, 2019 •

edited

martinholmer commented Jan 30, 2019 •

edited

MattHJensen commented Jan 31, 2019 •

edited