Self Normalized Estimator _estimate_round_rewards is wrong? #187

szsb26 · 2022-11-28T22:51:38Z

In SelfNormalizedInverseProbabilityWeighting._estimate_round_rewards, the denominator of the returned value is iw.mean(), when I think it should be iw.sum(). I think this affects the computation of the confidence intervals for this class.

I found this issue when I noticed that the SNIPS estimator had unusually high variance compared with the IPW estimator.

This means that _estimate_policy_value in InverseProbabilityWeighting (the base class) may need to be changed as well, since it returns the .mean() of the round rewards, and there is no such normalizing constant in the definition of the SNIPS estimator.
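
For reference, here is a minimal NumPy sketch of how the two normalizations relate. It is not obp code: the data is synthetic, and the helper names round_rewards_mean_norm and round_rewards_sum_norm are made up for illustration. It assumes the textbook SNIPS definition, sum(iw * reward) / sum(iw).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical logged data (synthetic, not taken from obp).
reward = rng.binomial(1, 0.3, size=n).astype(float)
iw = rng.exponential(1.0, size=n)  # stand-in importance weights


def round_rewards_mean_norm(reward, iw):
    """Per-round terms divided by the mean weight (the behavior described above)."""
    return reward * iw / iw.mean()


def round_rewards_sum_norm(reward, iw):
    """Per-round terms divided by the sum of weights (the proposed change)."""
    return reward * iw / iw.sum()


# Textbook SNIPS point estimate.
snips_definition = (reward * iw).sum() / iw.sum()

# Mean-normalized terms aggregated with .mean().
via_mean = round_rewards_mean_norm(reward, iw).mean()

# Sum-normalized terms aggregated with .sum().
via_sum = round_rewards_sum_norm(reward, iw).sum()

# Sum-normalized terms aggregated with .mean() pick up an extra 1/n factor.
sum_norm_then_mean = round_rewards_sum_norm(reward, iw).mean()

print(snips_definition, via_mean, via_sum, sum_norm_then_mean)
```

In this sketch, dividing by iw.mean() and then taking .mean() gives the same point estimate as dividing by iw.sum() and then taking .sum(), because the 1/n factors cancel. Dividing by iw.sum() and still aggregating with .mean() would scale the estimate by 1/n. This only compares point estimates and says nothing about how the confidence intervals are computed.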

szsb26 changed the title ~~Self Normalized Estimator _estimate_round_rewards is wrong.~~ Self Normalized Estimator _estimate_round_rewards is wrong? Nov 28, 2022
