Is categorical feature currently supported by causalnex with label encoding? #170

tonyabracadabra · 2022-09-01T10:49:38Z

I know that label-encoding categorical variables lets the algorithm run on them, but is it mathematically valid to infer causal relationships once that label encoding has been applied?

tonyabracadabra · 2022-09-27T05:23:14Z

Hey folks, are there any updates on this question? @oentaryorj @GabrielAzevedoFerreiraQB Any insights would be helpful. I think we might need to handle the independence test for categorical variables separately, and I am not sure whether that is implemented in the system now.

GabrielAzevedoFerreiraQB · 2022-09-27T06:14:14Z

Hey Tony,

Hope you are well! Thanks for the great question!

You're absolutely right.

For NOTEARS, we do need continuous variables as you correctly mentioned.
A simple label encoding doesn't always make sense. For example, encoding a "countries" variable with arbitrary ("random") integer codes would not give NOTEARS any signal from which to learn a relationship.
However, in certain situations such an encoding is still possible:
- when the variables are binary
- when the categories have a natural order, e.g. days of the week (to a certain extent)
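
As a rough illustration of the two cases above, here is a minimal sketch in plain Python. The helper names (`encode_binary`, `encode_ordinal`) and the `DAY_ORDER` list are hypothetical and not part of causalnex; the point is only that the integer codes carry meaning when there are two categories or a natural order.

```python
# Hypothetical category order, assumed for illustration only.
DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def encode_binary(values, positive):
    """Map a two-valued variable to 1.0 (positive class) or 0.0 (anything else)."""
    return [1.0 if value == positive else 0.0 for value in values]


def encode_ordinal(values, order):
    """Map each category to its position in `order`, so the codes respect the ranking."""
    index = {category: position for position, category in enumerate(order)}
    return [float(index[value]) for value in values]


smoker = encode_binary(["yes", "no", "yes"], positive="yes")
weekday = encode_ordinal(["Mon", "Wed", "Sun"], DAY_ORDER)
```

A nominal variable such as "countries" has no such order, so any list passed as `order` would be arbitrary, which is the problem described above.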

One thing to note, though, is that NOTEARS is not "scale invariant", meaning that if we multiply a variable by a constant, the NOTEARS results change. There are ongoing discussions on the best way to handle this, but I'd (personally!) recommend thinking more carefully about how to normalize the variables when dealing with encoded discrete variables.
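
To make the scale issue concrete, here is a small sketch using a plain least-squares slope as a stand-in. This is not NOTEARS itself, and the data and helper names (`ols_slope`, `standardize`) are made up for illustration; it only shows that a fitted linear weight depends on the units of the input, and that standardizing removes that dependence.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (single predictor)."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(value - mu) / sigma for value in column]


# Made-up data: x could be an ordinal code, y a continuous measurement.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.9, 4.1]
x_rescaled = [10.0 * value for value in x]  # same information, different units

slope_raw = ols_slope(x, y)
slope_rescaled = ols_slope(x_rescaled, y)  # ten times smaller than slope_raw

# After standardizing, the unit change no longer affects the fitted weight.
slope_std = ols_slope(standardize(x), standardize(y))
slope_std_rescaled = ols_slope(standardize(x_rescaled), standardize(y))
```

Because label codes for discrete variables are chosen by whoever does the encoding, their scale is arbitrary in the same way, which is why normalization deserves extra care there.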

tonyabracadabra · 2022-09-27T08:19:55Z

Thanks Gabriel for answering my question!

I saw that the release notes say "Added categorical distributed data support for pytorch NOTEARS." What does that mean?

Are there any plans to support causal discovery on mixed-type data, based on newly published papers?

jinowork · 2023-06-23T04:26:46Z

In that case, can I use one-hot encoding for categorical variables?
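
For reference, a minimal sketch of what one-hot encoding produces, in plain Python. The `one_hot` helper and the `is_` column prefix are hypothetical, and nothing in this thread confirms whether this encoding is appropriate for NOTEARS. Note that each category becomes its own column, so a structure learner fed this output would see several columns where there was one variable.

```python
def one_hot(values):
    """Return one 0/1 indicator column per distinct category in `values`."""
    categories = sorted(set(values))
    return {
        f"is_{category}": [1.0 if value == category else 0.0 for value in values]
        for category in categories
    }


# Made-up example data.
country_columns = one_hot(["FR", "DE", "FR", "JP"])
```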

GabrielAzevedoFerreiraQB self-assigned this Sep 6, 2022

oentaryorj added the question Further information is requested label Sep 6, 2022
