ENH: handle dummy and categorical variables in robust methods, e.g. cov, mahalanobis, subsampling #9223

Open
josef-pkt opened this issue Apr 19, 2024 · 0 comments

(there is no specific issue for this yet)

The current robust methods and those in planning do not handle dummy and categorical variables (in exog) differently from continuous variables.

Our methods will run into problems if there are many dummy variables in exog, especially if some have small cell counts.
For example, subsampling will run into empty cell problems.
Small cell counts might also create influential points that we nevertheless need to include.
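
A minimal sketch of both problems, using only numpy on made-up data. The helper names (`dummy_design`, `frac_singular_subsamples`, `leverage`) and the cell sizes are assumptions for illustration, not existing statsmodels functions. It draws random subsamples of size `p` (the number of exog columns) and counts how often the subsample design is rank deficient, then computes the hat-matrix diagonal to show the leverage of observations in a small cell.

```python
import numpy as np


def dummy_design(codes, n_levels):
    """Constant plus indicator columns for levels 1..n_levels-1 (level 0 is reference)."""
    n = codes.shape[0]
    dummies = (codes[:, None] == np.arange(1, n_levels)).astype(float)
    return np.column_stack([np.ones(n), dummies])


def frac_singular_subsamples(exog, k_sub, n_rep=500, seed=0):
    """Fraction of random subsamples of size k_sub whose design is rank deficient."""
    rng = np.random.default_rng(seed)
    n, p = exog.shape
    n_bad = 0
    for _ in range(n_rep):
        idx = rng.choice(n, size=k_sub, replace=False)
        if np.linalg.matrix_rank(exog[idx]) < p:
            n_bad += 1
    return n_bad / n_rep


def leverage(exog):
    """Diagonal of the hat matrix, computed from a reduced QR decomposition."""
    q, _ = np.linalg.qr(exog)
    return (q ** 2).sum(axis=1)


# Hypothetical data: 100 observations, one factor with 4 levels,
# where the last level has only 3 observations, plus one continuous column.
codes = np.repeat([0, 1, 2, 3], [50, 30, 17, 3])
x_cont = np.random.default_rng(12345).normal(size=codes.shape[0])
exog = np.column_stack([dummy_design(codes, 4), x_cont])

# Subsamples of size p need every level to be present to have full rank,
# so most of them fail when one cell is small.
frac = frac_singular_subsamples(exog, k_sub=exog.shape[1])
print("fraction of rank-deficient subsamples:", frac)

# Observations in the small cell have leverage of at least 1 / cell size,
# far above the average leverage p / n.
hat_diag = leverage(exog)
print("mean leverage:", hat_diag.mean())
print("leverage in small cell:", hat_diag[codes == 3])
```

In this setup a subsample that misses the small cell has an all-zero dummy column, so the subsample design is singular even though the full-sample design has full rank.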

I briefly looked at part of the literature that includes specific categorical handling, but I did not look at the details.

Categorical exog are not a problem in current RLM, AFAIU, or we never ran into them, because there is no subsampling and similar methods.
