
This is a living document that will be updated as questions are asked and answered.

How would I handle a multiclass dataset where the number of labels increases over time?

The --csoaa_ldf and --wap_ldf modes use label dependent features, which allow you to specify a dynamic set of labels on each example.
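For illustration, here is a minimal sketch of the multiline LDF input format (the labels, costs, and feature names are made up). Each blank-line-separated block is one multiclass example, each line within a block is one candidate label with its own cost and features, and the set of candidate labels is free to differ from example to example:

```
1:1.0 | color_red size_small
2:0.0 | color_blue size_small
3:1.0 | color_green size_large

1:0.0 | color_red size_large
4:1.0 | color_yellow size_small
```

A training invocation along these lines (file names are hypothetical) would be:

```
vw --csoaa_ldf multiline -d train_ldf.txt -f model.vw
```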

How can I see what model weights are assigned to my features?

See here
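The linked page has the details; as a rough sketch (file names here are hypothetical), two flags that expose the learned weights are --readable_model, which writes the weights keyed by hash index, and --invert_hash, which also maps the hashed indices back to the original feature names:

```
# weights keyed by hash index (hypothetical file names)
vw -d train.txt -f model.vw --readable_model weights.txt

# weights keyed by the original feature names
vw -d train.txt -f model.vw --invert_hash readable_model.txt
```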

Where can I find examples of the most popular reductions in use?

See the tutorials page here

What does MTR stand for and why is it the default update rule for many exploration algorithms?

MTR stands for Multi Task Regression; more information can be found in the CB Bakeoff paper.

The reason it is the default is that, in an action-dependent-features setting, the MTR update rule is usually more efficient than IPS/DR:

  • In MTR, only the weights of the chosen action are updated (rather than, as in IPS, updating the weights of all actions while assuming zero reward for the non-chosen ones)
  • In MTR, the propensity (i.e., the probability of the chosen action) is used directly as a weight in the regression cost formula, rather than only appearing in the loss estimator (in the CB Bakeoff paper, compare eq. 6 for MTR with eq. 5, whose loss estimator is eq. 3 for IPS and eq. 4 for DR); see the sketch after this list.
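As a rough sketch of that difference (our notation, which may not match the paper's exactly): given logged tuples of context, chosen action, observed cost, and propensity, IPS forms a cost estimate for every action and regresses against it, while MTR regresses only on the observed cost of the chosen action, importance-weighted by the inverse propensity:

```latex
% IPS cost estimate: non-zero only for the chosen action a_t, but all actions are updated
\hat{c}_t(a) = \frac{c_t}{p_t}\,\mathbf{1}\{a = a_t\}

% MTR objective: importance-weighted regression on the chosen action only
\hat{f} = \arg\min_f \sum_t \frac{1}{p_t}\bigl(f(x_t, a_t) - c_t\bigr)^2
```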

I have historical data without probabilities, can I estimate performance of contextual bandits offline?

When the propensities are not available, there are two cases to consider:

  1. No randomization is performed online, so each action is taken with probability 1. In this case, offline estimation of the performance of CB (or any other offline algorithm) cannot be done reliably and may yield badly biased, incorrect estimates (e.g., the No Unknown Confounders assumption may not hold). One option is to start by implementing an A/B test in which you randomize your campaigns. There is a relevant discussion here
  2. Randomization is performed but the propensity is not known. In this case, one could use offline experimentation estimators that do not use the propensity. One such estimator is the Direct Method (DM) estimator, sketched after this list. In our experience this option is usually not very data-efficient, since the DM estimator still needs a large offline dataset to produce small confidence intervals, and it is also prone to estimation errors, caused by bugs in the data collection pipeline, that are very difficult to spot. When possible, we suggest collecting data with pipelines that log the propensity, as is done in Azure Personalizer.
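As a rough sketch of the Direct Method (our notation): fit a cost/reward model to the logged data by regression, then score a candidate policy by averaging the model's predictions on the actions that policy would have chosen; no propensities are required, but the estimate inherits any bias in the fitted model:

```latex
% Direct Method value estimate for policy \pi over n logged contexts,
% using a reward model \hat{r}(x, a) fit on the logged data
\hat{V}_{\mathrm{DM}}(\pi) = \frac{1}{n}\sum_{t=1}^{n} \hat{r}\bigl(x_t, \pi(x_t)\bigr)
```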

How do I save a model and continue learning at a later point?

See the documentation here
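The linked documentation covers the details; as a rough sketch (file names are hypothetical), training with --save_resume stores the extra state needed to keep learning, and -i loads that model so learning can continue on new data:

```
# initial training; save a model that can be resumed
vw -d day1.txt --save_resume -f model.vw

# later: load the saved model and continue learning on new data
vw -d day2.txt --save_resume -i model.vw -f model_updated.vw
```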
