
Save predicted reward for chosen arm (feature request) #10

Open
pstansell opened this issue Dec 25, 2019 · 2 comments
@pstansell commented Dec 25, 2019

Hello Robin,

This is a feature request, not a bug report.

I'd like the output from history$get_data_table() to include a column for the predicted values of the chosen arms at each step.

For example, for EpsilonGreedyPolicy it would just be self$theta$mean[[chosen_arm]], which I realise is available by setting save_theta = TRUE in Simulator$new. If I also set save_context = TRUE, the predicted value of the chosen action can be recovered. (Although I have to account for the fact that the saved theta values are one time step ahead of the current context-arm pair: by the time they are saved they have already been updated with the reward from the current step, so they hold the values computed after that reward is known rather than the prediction made before the arm was chosen.)
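To make the offset concrete, here is roughly how I recover the prediction after the fact. This is a sketch only: I'm assuming get_data_table() exposes the saved theta as a list-column named theta and the chosen arm as choice, and I'm ignoring the grouping by sim and agent for brevity:

```r
dt <- history$get_data_table()

# theta saved at step t already includes the update from step t's reward,
# so the prediction that was made at step t is the theta saved at t - 1
pred <- rep(NA_real_, nrow(dt))
for (i in 2:nrow(dt)) {
  theta_before <- dt$theta[[i - 1]]             # theta as it stood before step i
  pred[i] <- theta_before$mean[[dt$choice[i]]]  # EpsilonGreedy's prediction
}
dt$predicted <- pred
```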

With other policies, such as ContextualEpsilonGreedyPolicy, using the output from history$get_data_table() to compute the expected reward for the current action before it is taken is not so straightforward. I see in policy_cmab_lin_epsilon_greedy.R that you compute expected_rewards[arm], but you don't seem to save these values for later output. It is exactly expected_rewards[arm] that I would like history$get_data_table() to include. Having it for just the chosen arm would be enough for my current needs, but having it for all arms might be useful in the future.
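For illustration, the kind of change I have in mind is something like this (predicted_reward is a field name I have made up, and history.R would also need a matching change to actually save it):

```r
# sketch only: inside ContextualEpsilonGreedyPolicy$get_action(), just
# before the action list is returned (action$choice has already been set
# by the usual epsilon-greedy logic)
action$predicted_reward <- expected_rewards[action$choice]  # hypothetical new field
# history.R would then need to copy this field into its data table for it
# to show up in history$get_data_table()
action
```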

I had a look at history.R to see if I could work out how to save the values of expected_rewards, but it looks rather complicated to me and my R is nowhere near as good as yours :-).

Thanks,

Paul

@robinvanemden (Member) commented

Hi @pstansell - first of all, my apologies for the late reply! Is this issue still relevant to you?

@pstansell (Author) commented

Hello Robin,

Yes, this issue is still relevant to me. The reason I'd like the predicted value of each arm is so that I can rank the arms by predicted value, and I need that ranking before a particular arm is chosen.
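In other words, if the per-arm predictions were exposed, the ranking itself would be trivial, something like:

```r
# illustrative values only -- this stands in for the expected_rewards
# vector computed inside the policy for the current context
expected_rewards <- c(0.12, 0.47, 0.31)

# arms ordered from highest to lowest predicted reward
ranking <- order(expected_rewards, decreasing = TRUE)
ranking  # 2 3 1
```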

Thanks,

Paul
