Replies: 1 comment
Hi @benTC74, just as short comments to your points:
I hope this can help!
-
Hi All,
I have a couple of questions about using the DML package; any help is very much appreciated!
When evaluating the model, how do I know that it is performing well and is reliable (e.g. that it provides trustworthy treatment effect estimates)? There is no ground truth to compare against, and there is no metric such as R² in linear regression that shows how much the model explains. Without this, how can I explain to people that the model is reliable?
This brings me to my second question: can a treatment effect estimate always be trusted as long as it is significant (p-value < 0.05)? And how large do the standard error and confidence interval have to be before they are considered too large? For example, my dependent variable ranges from around -2000 to +8000. One treatment effect estimate is around -200, with a standard error of around 60 and a confidence interval from around -300 to -90. Another is around -4500, with a standard error of around 1400 and a confidence interval from around -7000 to -1700. Both are significant.
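To make the two examples easier to compare, here is a minimal sketch that recomputes the z-statistic, a normal-approximation 95% confidence interval, and the standard error relative to the estimate from the rounded numbers quoted in this question. The function name `summarize` and the normal approximation are assumptions for illustration, not output of the DML package.

```python
import math

def summarize(estimate, std_error, z_crit=1.96):
    """Summarize an estimate under a normal approximation (an assumption)."""
    z = estimate / std_error
    return {
        "z": z,
        # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        "p_value": math.erfc(abs(z) / math.sqrt(2)),
        "ci_lower": estimate - z_crit * std_error,
        "ci_upper": estimate + z_crit * std_error,
        # Standard error as a fraction of the estimate's magnitude
        "rel_se": std_error / abs(estimate),
    }

# Rounded values quoted in the question
small_effect = summarize(-200, 60)
large_effect = summarize(-4500, 1400)

for name, res in [("small", small_effect), ("large", large_effect)]:
    print(name, {k: round(v, 4) for k, v in res.items()})
```

On these rounded inputs the relative standard error is about 0.30 for the first estimate and about 0.31 for the second, so the two estimates are similarly precise relative to their size even though the absolute intervals differ a lot.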
DML runs for quite a long time even on a small dataset of only a few hundred observations and around 40 features: it takes around 40 minutes to 1.5 hours, depending on the treatment variable. I am using RandomForest with a grid search over 3 different parameters. Is that normal?
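As a rough way to see where the time may go, the sketch below counts how many random forests a grid search would have to fit. Every number in it is a hypothetical assumption (3 candidate values per parameter, 5-fold cross-validation, two nuisance models, one refit on the best parameters per search); the actual setup is not stated above, and any additional fits from cross-fitting after tuning are not counted.

```python
import math

def count_model_fits(grid_sizes, cv_folds, n_nuisance_models, n_treatments):
    """Count model fits for an exhaustive grid search (hypothetical setup)."""
    n_combinations = math.prod(grid_sizes)
    # Each combination is fit once per CV fold, plus one refit on the best one
    fits_per_search = n_combinations * cv_folds + 1
    return fits_per_search * n_nuisance_models * n_treatments

# Assumed: 3 parameters with 3 candidate values each, 5-fold CV,
# 2 nuisance models (outcome and treatment), 1 treatment variable
n_fits = count_model_fits(grid_sizes=(3, 3, 3), cv_folds=5,
                          n_nuisance_models=2, n_treatments=1)
print(n_fits)
```

Under these assumptions each treatment variable already requires a few hundred forest fits, and each forest consists of many trees, so the tuning step rather than the dataset size can dominate the runtime.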
Can I include all the different treatment variables in one model at once instead of iterating over them one by one? And can I pass a categorical treatment variable directly to the model without one-hot encoding it?
Is there a way in DML to check whether my data violates the positivity (overlap) assumption in the propensity score models, for both binary and continuous treatment variables? If not, are there any pointers on how I can validate it? It would be really good to know.
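For the binary case, one kind of check in question here can be sketched as follows: given estimated propensity scores from any classifier, compare their range in the treated and control groups and count how many fall near 0 or 1. The function `overlap_report`, the toy data, and the 0.05/0.95 thresholds are illustrative assumptions (the thresholds are a common rule of thumb, not something prescribed by the DML package), and this does not carry over directly to continuous treatments.

```python
def overlap_report(propensity, treatment, low=0.05, high=0.95):
    """Summarize propensity score overlap for a binary treatment (0/1)."""
    treated = [p for p, d in zip(propensity, treatment) if d == 1]
    control = [p for p, d in zip(propensity, treatment) if d == 0]
    # Scores close to 0 or 1 suggest units with no comparable counterpart
    n_outside = sum(1 for p in propensity if p < low or p > high)
    return {
        "treated_range": (min(treated), max(treated)),
        "control_range": (min(control), max(control)),
        "share_outside": n_outside / len(propensity),
    }

# Made-up propensity scores and treatment indicators, for illustration only
scores = [0.02, 0.30, 0.45, 0.60, 0.97, 0.55]
treat = [0, 0, 1, 1, 1, 0]

report = overlap_report(scores, treat)
print(report)
```

If the two ranges barely intersect, or a large share of scores lies outside the thresholds, that would point to an overlap problem worth investigating before trusting the estimates.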
A side question that is not DML related: I have some categorical variables (e.g. number of products: "Many" and "Few") that could be either treatments or controls and that are static within each group (there are 8 groups in the observations), meaning they always take the same value within a group. Is that a problem for modelling, especially given that I have a very small dataset?
Sorry for all of these long questions, but I am very new to this area and really want to understand it. I really appreciate your help!