Add support for dropping collinear variables #16376

divyaprabha123 · 2020-02-03T15:46:40Z

Describe the workflow you want to enable

Can we add a feature to LinearRegression that removes collinearity (exact collinearity) from the data?

Describe your proposed solution

My proposal is to add an extra argument such as remove_collinearity. If the user sets it, we can remove exactly collinear variables using the rank of the matrix, or approximately collinear variables using the variance inflation factor (VIF). This could save some time compared with switching to Ridge regression.
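To make the two ideas concrete, here is a rough NumPy-only sketch. The helper names `independent_columns` and `variance_inflation_factors` are hypothetical and are not part of scikit-learn. The rank-based helper greedily keeps a column only if it increases the rank of the columns kept so far, and the VIF helper regresses each column on the others (with an intercept) and uses VIF = 1 / (1 - R^2).

```python
import numpy as np


def independent_columns(X, tol=None):
    """Return indices of columns that are not exact linear combinations
    of the columns kept before them (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    kept = []
    for j in range(X.shape[1]):
        candidate = kept + [j]
        # Keep column j only if adding it raises the rank.
        if np.linalg.matrix_rank(X[:, candidate], tol=tol) == len(candidate):
            kept.append(j)
    return kept


def variance_inflation_factors(X):
    """Return the VIF of every column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Least-squares fit of column j on the other columns plus an intercept.
        A = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        # VIF = 1 / (1 - R^2) = ss_tot / ss_res; very large or infinite
        # when column j is (almost) a linear combination of the others.
        with np.errstate(divide="ignore", invalid="ignore"):
            vifs[j] = ss_tot / ss_res
    return vifs
```

The greedy rank check depends on column order (earlier columns win) and recomputes the rank for every column, so it is only meant for illustration. A VIF cutoff would still have to be chosen by the user.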

rth · 2020-02-03T16:16:15Z

It might be better to have this as a preprocessor in sklearn.feature_selection; that way it could be applied to multiple estimators. I'm not sure that exact collinearity is a frequent issue, though. Maybe an estimator with a user-defined feature correlation threshold?

I'm not sure if it's something that is often done, as opposed to, say, feature clustering. The latter can be done in scikit-learn with cluster.FeatureAgglomeration, though maybe the interface with a required n_clusters is not ideal.
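As a rough illustration of the correlation-threshold idea, something along these lines could work. The function name `select_uncorrelated` and the default threshold of 0.95 are assumptions made for this sketch, not an existing scikit-learn API. It keeps a feature only if its absolute Pearson correlation with every previously kept feature stays below the threshold.

```python
import numpy as np


def select_uncorrelated(X, threshold=0.95):
    """Return indices of features to keep (hypothetical helper).

    A feature is dropped when its absolute Pearson correlation with any
    already-kept feature is at or above `threshold`.
    """
    X = np.asarray(X, dtype=float)
    # Columns are variables, rows are samples.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept
```

This only looks at pairwise correlations, so it would miss a feature that is a combination of several others, and constant columns (which give NaN correlations) are not handled.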

cc @glemaitre

thomasjpfan · 2020-02-03T16:16:39Z

This is being worked on as a feature selection transformer here: #14698

rth · 2020-02-03T16:20:58Z

Indeed, thanks. Closing this issue as a duplicate of #13405 then. If you have other comments or suggestions, @divyaprabha123, please comment there.

divyaprabha123 added the New Feature label Feb 3, 2020

rth closed this as completed Feb 3, 2020
