
Save predicted reward for chosen arm (feature request) #10

Open
pstansell opened this issue Dec 25, 2019 · 2 comments
@pstansell commented Dec 25, 2019

Hello Robin,

This is a feature request, not a bug report.

I'd like the output from history$get_data_table() to include a column for the predicted values of the chosen arms at each step.

For example, for EpsilonGreedyPolicy it would just be self$theta$mean[[chosen_arm]], which I realise is available by setting save_theta = TRUE in Simulator$new. If I also set save_context = TRUE, the predicted value of the chosen action can be recovered. (Although I have to account for the fact that the saved theta values are one time step ahead of the current context-arm pair: by the time they are saved they have already been updated with the reward from the current step, so they hold the values computed after that reward is known rather than the prediction made before the arm was chosen.)
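To make the offset concrete, here is roughly how I recover the prediction after the fact. This is a sketch only: I'm assuming get_data_table() exposes the saved theta as a list-column named theta and the chosen arm as choice, and I'm ignoring the grouping by sim and agent for brevity:

```r
dt <- history$get_data_table()

# theta saved at step t already includes the update from step t's reward,
# so the prediction that was made at step t is the theta saved at t - 1
pred <- rep(NA_real_, nrow(dt))
for (i in 2:nrow(dt)) {
  theta_before <- dt$theta[[i - 1]]             # theta as it stood before step i
  pred[i] <- theta_before$mean[[dt$choice[i]]]  # EpsilonGreedy's prediction
}
dt$predicted <- pred
```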

With other policies, such as ContextualEpsilonGreedyPolicy, using the output from history$get_data_table() to compute the expected reward for the current action before it is taken is not so straightforward. I see in policy_cmab_lin_epsilon_greedy.R that you compute expected_rewards[arm], but you don't seem to save these values for later output. It is exactly expected_rewards[arm] that I would like history$get_data_table() to include. Having it for just the chosen arm would be enough for my current needs, but having it for all arms might be useful in the future.
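For illustration, the kind of change I have in mind is something like this (predicted_reward is a field name I have made up, and history.R would also need a matching change to actually save it):

```r
# sketch only: inside ContextualEpsilonGreedyPolicy$get_action(), just
# before the action list is returned (action$choice has already been set
# by the usual epsilon-greedy logic)
action$predicted_reward <- expected_rewards[action$choice]  # hypothetical new field
# history.R would then need to copy this field into its data table for it
# to show up in history$get_data_table()
action
```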

I had a look at history.R to see if I could work out how to save the values of expected_rewards, but it looks rather complicated to me and my R is nowhere near as good as yours :-).

Thanks,

Paul

@robinvanemden (Member) commented

Hi @pstansell - first of all, my apologies for the late reply! Is this issue still relevant to you?

@pstansell (Author) commented

Hello Robin,

Yes, this issue is still relevant to me. The reason I'd like the predicted value of each arm is so that I can rank the arms by predicted value, and I need that ranking before a particular arm is chosen.
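In other words, if the per-arm predictions were exposed, the ranking itself would be trivial, something like:

```r
# illustrative values only -- this stands in for the expected_rewards
# vector computed inside the policy for the current context
expected_rewards <- c(0.12, 0.47, 0.31)

# arms ordered from highest to lowest predicted reward
ranking <- order(expected_rewards, decreasing = TRUE)
ranking  # 2 3 1
```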

Thanks,

Paul
