-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement (environment and) algorithm(s) from "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes" #146
Comments
It's a really complicated algorithm… |
- Not sure at all how to store rewards to be able to compute empirical means like they do in their algorithm!
Example of regret for stationary bandit (K=9 arms, µ=[0.1,…,0.9]), T=1000 and N=10 repetitions (basic simulation):Example of regret for piecewise stationary bandit (K=2 arms, T=10000, N=16 repetitions): |
- It's not good, and doesn't work well efficiently
With this implementation, I'm sure my DoublingTrickWrapper can be used with AdSwitch. I'm not even going to try, because I don't have time to check if my implementation is correct or explore why AdSwitch seems to be inefficient! |
This comment has been minimized.
This comment has been minimized.
They improved their paper, it was published in COLT 2019, see here in PLMR, volume 99. |
I am not surprised to see that the inefficiency of AdSwitch in your experiments. Their main contribution is theoretically showing that AdSwitch is optimal and parameter free. In your experiment, the regret curve is almost straight in each section, and I guess that the time horizon in each section is not enough. I guess that in each section, the time should satisify T = 10,0000 and maybe you can observe the sqrt(t) curve. |
Hi @haoyuzhao123, |
I guess I'm done? I'm trying on a few experiments… I hope the code is not too bugged! |
I think I'm done for this… I should work more, but I don't know what to do:
|
DoublingTrickWrapper
algorithm.The text was updated successfully, but these errors were encountered: