Inconsistent and unclear loss calculation for contextual bandits #4495

Open
jackgerrits opened this issue Feb 13, 2023 · 1 comment
Labels: Deprecation, Unintuitive Behavior (Not a bug precisely, but a behavior that is surprising.)

Comments

@jackgerrits
Member

The loss calculation for the CB reductions is inconsistent and poorly documented. The current situation is:

  • cb_adf records loss using an IPS estimator, except when the CB type is DR or DM, in which case it uses the DR estimator.
  • cb_explore_adf always uses an IPS estimate (including for DR and DM).
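
To make the difference concrete, here is a minimal sketch of the textbook per-example IPS and DR cost estimates. It is not VW's actual implementation; `LoggedExample`, `ips_loss`, `dr_loss` and `predicted_costs` are hypothetical names chosen for illustration, and the cost predictions are assumed to come from some regression model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoggedExample:
    """One logged contextual bandit interaction (hypothetical container)."""
    action: int         # index of the action the logging policy took
    cost: float         # observed cost for that action
    probability: float  # probability with which the logging policy chose it


def ips_loss(example: LoggedExample, chosen_action: int) -> float:
    """Inverse propensity score estimate of the cost of `chosen_action`."""
    if chosen_action != example.action:
        return 0.0
    return example.cost / example.probability


def dr_loss(example: LoggedExample, chosen_action: int,
            predicted_costs: Sequence[float]) -> float:
    """Doubly robust estimate: model prediction plus an IPS-weighted correction."""
    estimate = predicted_costs[chosen_action]
    if chosen_action == example.action:
        residual = example.cost - predicted_costs[example.action]
        estimate += residual / example.probability
    return estimate


# Assumed toy data: action 1 was logged with probability 0.5 and cost 1.0.
example = LoggedExample(action=1, cost=1.0, probability=0.5)
predicted_costs = [0.25, 0.75, 0.5]  # hypothetical regressor output per action

print(ips_loss(example, 1), dr_loss(example, 1, predicted_costs))
print(ips_loss(example, 0), dr_loss(example, 0, predicted_costs))
```

The two estimators give different numbers for the same logged example, which is why the reported loss depends on which reduction and CB type are in use.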

The proposed solution is to unify these implementations so that both use IPS, for clarity. Longer term, we want to be able to use the various estimator implementations that have been added.

Since this may be a surprising change for anyone measuring model performance based on the DR estimate, we will introduce it as a deprecation. First we will add a flag that forces IPS in the cases where DR was used before, and then in VW 10 we will swap the default to IPS.

The estimators can already be used from Python through the vw-estimators library. We need to add documentation about how these packages integrate, since this is an important workflow.

@jackgerrits jackgerrits added Unintuitive Behavior Not a bug precisely, but a behavior that is surprising. Deprecation labels Feb 13, 2023
@lalo
Collaborator

lalo commented Feb 14, 2023

we also have these other estimator impls (not part of cb_adf): https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/vowpalwabbit/core/include/vw/core/estimators
