
Ensure imputed capital gains CDF is valid (monotonic) #816

Open
MaxGhenis opened this issue Feb 20, 2024 · 2 comments
Labels
calibration Calibration measures to improve statistical accuracy

Comments

@MaxGhenis
Collaborator

MaxGhenis commented Feb 20, 2024

`impute_capital_gains` currently interpolates and extrapolates the provided quantiles into a CDF by fitting splines. Splines don't enforce monotonicity, so the resulting CDFs can decrease in places, which makes them invalid.
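To illustrate the failure mode, here's a minimal sketch on made-up, step-like quantile data (not the real inputs). It assumes a SciPy `CubicSpline` with default boundary conditions, which may not match the spline `impute_capital_gains` actually fits; `is_monotonic` is a hypothetical helper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def is_monotonic(values, tol=1e-12):
    """Hypothetical helper: True if values never decrease (up to a small tolerance)."""
    return bool(np.all(np.diff(values) >= -tol))

# Made-up (quantile, gain) pairs: non-decreasing, with a sharp jump in the middle.
quantiles = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
gains = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

spline = CubicSpline(quantiles, gains)  # default "not-a-knot" boundary conditions
grid = np.linspace(quantiles[0], quantiles[-1], 501)
spline_gains = spline(grid)

print(is_monotonic(gains))         # True: the inputs have a valid, monotonic shape
print(is_monotonic(spline_gains))  # False: the spline wiggles between the inputs
print(spline_gains.min() < 0)      # True: it even undershoots the smallest input
```

Because the fitted curve maps quantiles to gains, any downward wiggle means a higher quantile gets a lower gain, which no valid distribution allows.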

After asking ChatGPT for some ideas, I think a promising approach could be to first synthesize a PDF from the quantiles, smooth it with a kernel density estimator, and then integrate it to get a CDF. Here's an example of how that might look:

[Image: example of the KDE-smoothed PDF and resulting CDF]
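A minimal sketch of that idea, assuming one particular way of "synthesizing a PDF": draw a large sample through a piecewise-linear inverse CDF built from the quantiles, smooth it with SciPy's `gaussian_kde`, and integrate with `cumulative_trapezoid`. `kde_cdf` and its parameters are hypothetical, and the bandwidth (SciPy's default, Scott's rule) would likely need tuning.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import gaussian_kde

def kde_cdf(quantiles, gains, n_samples=10_000, grid_size=512, seed=0):
    """Hypothetical helper: turn (quantile, gain) pairs into a smooth, monotonic CDF.

    quantiles: increasing probabilities in (0, 1); gains: the matching gain values.
    """
    rng = np.random.default_rng(seed)
    # 1. Synthesize a sample via a piecewise-linear inverse CDF. np.interp clamps
    #    draws outside [quantiles[0], quantiles[-1]] to the end gains, which puts
    #    point masses there before smoothing.
    u = rng.uniform(0.0, 1.0, n_samples)
    samples = np.interp(u, quantiles, gains)
    # 2. Smooth the implied density with a Gaussian kernel density estimator.
    kde = gaussian_kde(samples)
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    pdf = kde(grid)
    # 3. Integrate the non-negative density; a running integral cannot decrease.
    cdf = cumulative_trapezoid(pdf, grid, initial=0.0)
    return grid, cdf / cdf[-1]  # rescale so the CDF ends at exactly 1

grid, cdf = kde_cdf(np.array([0.1, 0.5, 0.9]), np.array([0.0, 10.0, 100.0]))
```

Because the density is non-negative, monotonicity holds by construction; the trade-off is that the smoothed CDF won't generally pass exactly through the original quantiles.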

Other options, such as isotonic regression or transformations, could also work. We may want something more complex if we want to consider all the data together rather than each income group independently.
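For comparison, a minimal sketch of the isotonic-regression option as a post-processing step: evaluate the existing fit on a grid, then replace those values with the closest non-decreasing sequence in the least-squares sense. This is a plain pool-adjacent-violators implementation with equal weights; `isotonic_fit` is a hypothetical name, and `sklearn.isotonic.IsotonicRegression` provides a tested version.

```python
import numpy as np

def isotonic_fit(values):
    """Hypothetical helper: least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, count]; its fitted value is the mean
    for v in np.asarray(values, dtype=float):
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per input point.
    return np.concatenate([np.full(count, total / count) for total, count in blocks])

# A slightly non-monotonic sequence, e.g. spline output evaluated on a grid:
# the 0.3, 0.2 pair violates monotonicity and is pooled to its mean, 0.25.
print(isotonic_fit([0.1, 0.3, 0.2, 0.4]))
```

Values away from any violation are left unchanged; only the offending spans are pooled, which turns them into flat steps.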

MaxGhenis added the calibration label on Feb 20, 2024
@MaxGhenis
Collaborator Author

MaxGhenis commented Feb 25, 2024

FWIW, I only saw one income level where the spline was obviously nonmonotonic, so this might not be a high priority:
[Image]

https://policyengine-uk-documentation.nw.r.appspot.com/Capital_Gains_Tax

@MaxGhenis
Collaborator Author

The PCHIP interpolator (SciPy's `PchipInterpolator`) seems ideally suited to this: it preserves the monotonicity of the data it interpolates and supports extrapolation.
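As a quick sanity check of that property, here's a minimal sketch on made-up, step-like data (not the real inputs), evaluating SciPy's `PchipInterpolator` only inside the range of the given quantiles.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Made-up, step-like (quantile, gain) pairs; not the actual capital gains data.
quantiles = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
gains = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

grid = np.linspace(quantiles[0], quantiles[-1], 501)  # stay within the data range
pchip_gains = PchipInterpolator(quantiles, gains)(grid)

# Within the data range, PCHIP neither decreases nor overshoots the inputs.
never_decreases = bool(np.all(np.diff(pchip_gains) >= -1e-12))
within_bounds = bool(
    pchip_gains.min() >= gains.min() - 1e-12
    and pchip_gains.max() <= gains.max() + 1e-12
)
print(never_decreases, within_bounds)  # True True
```

An ordinary cubic spline through the same points dips below 0 and overshoots above 1 around the jump.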

Here's an example for the 99th income centile, for which the current spline produces a nonmonotonic interpolation.

[Image: PCHIP interpolation for the 99th income centile]

Relevant code (notebook):

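```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# quantiles, gains: the input quantiles and matching gains, defined in the notebook
pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=True)
extended_quantiles = np.linspace(0.01, 0.99, 99)  # 0.01, 0.02, ..., 0.99
extended_gains = pchip_interpolator(extended_quantiles)
```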
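One caveat on `extrapolate=True`: PCHIP extrapolates by extending the cubic from the first or last interval, and that cubic can turn back down beyond the data, so the tails (e.g. 0.01 and 0.99 when they fall outside the given quantiles) aren't guaranteed to stay monotonic. A minimal sketch of one alternative, assuming sorted 1-D inputs with non-decreasing `gains`: keep PCHIP inside the data range and continue each end as a straight line with PCHIP's end slope. `pchip_with_linear_tails` is a hypothetical helper, not existing code.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def pchip_with_linear_tails(quantiles, gains, extended_quantiles):
    """Hypothetical helper: PCHIP inside the data range, linear extrapolation outside.

    Assumes `quantiles` is strictly increasing and `gains` is non-decreasing.
    """
    quantiles = np.asarray(quantiles, dtype=float)
    gains = np.asarray(gains, dtype=float)
    x = np.asarray(extended_quantiles, dtype=float)
    # extrapolate=False makes the interpolator return NaN outside the data range.
    pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=False)
    slope = pchip_interpolator.derivative()
    q_min, q_max = quantiles[0], quantiles[-1]
    extended_gains = pchip_interpolator(x)
    # Continue each end as a straight line using PCHIP's end slope, which is
    # non-negative for non-decreasing data, so the tails cannot turn downward.
    left_tail = gains[0] + slope(q_min) * (x - q_min)
    right_tail = gains[-1] + slope(q_max) * (x - q_max)
    extended_gains = np.where(x < q_min, left_tail, extended_gains)
    extended_gains = np.where(x > q_max, right_tail, extended_gains)
    return extended_gains
```

A cheap `np.all(np.diff(extended_gains) >= 0)` check on the result is still worth keeping as a guard.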
