
Implement (environment and) algorithm(s) from "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes" #146

Closed
5 of 6 tasks
Naereen opened this issue Oct 2, 2018 · 13 comments
Assignee: Naereen
Labels:
  • enhancement: I have to improve something which already works not too badly
  • new algo: I have to implement a new algorithm! Yay!
  • non-stationary: For non-stationary bandits simulations
  • question: Things I'm not sure how to solve
  • single-player: For single-player bandits simulations

Comments

Naereen (Member) commented Oct 2, 2018

This recent article, "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes" by Peter Auer, Pratik Gajane and Ronald Ortner, is really interesting. They cite our work on the doubling trick, but I'm mostly interested in the piecewise-stationary bandit model they study.
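
For context, the model: each arm's mean is constant on a segment of time and jumps at unknown breakpoints. A minimal, self-contained sketch (hypothetical class, not the SMPyBandits API):

```python
import numpy as np

class PiecewiseStationaryBernoulli:
    """Piecewise-stationary Bernoulli bandit: each arm's mean is constant
    on a segment and jumps at unknown breakpoints (illustrative sketch)."""
    def __init__(self, breakpoints, means, rng=None):
        self.breakpoints = list(breakpoints)          # e.g. [0, 3000, 6000]
        self.means = [np.asarray(m) for m in means]   # one mean vector per segment
        self.rng = rng if rng is not None else np.random.default_rng()

    def segment(self, t):
        # Index of the last breakpoint <= t.
        return int(np.searchsorted(self.breakpoints, t, side="right")) - 1

    def draw(self, arm, t):
        # Bernoulli reward with the current mean of `arm` at time t.
        return float(self.rng.random() < self.means[self.segment(t)][arm])

# Example: K=2 arms whose means swap at t=3000 and t=6000.
env = PiecewiseStationaryBernoulli(
    breakpoints=[0, 3000, 6000],
    means=[[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]],
)
```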

Naereen added the enhancement, question and new algo labels on Oct 2, 2018
Naereen self-assigned this on Oct 2, 2018
Naereen changed the title on Oct 2, 2018 from "Implement (environment and) algorithm(s) from" to "Implement (environment and) algorithm(s) from "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes""
Naereen added the single-player and non-stationary labels on Oct 7, 2018
Naereen (Member, Author) commented Nov 26, 2018

It's a really complicated algorithm…

Naereen added a commit that referenced this issue Nov 26, 2018
- Not sure at all how to store rewards to be able to compute empirical means like they do in their algorithm!
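
One standard answer to that question: keep per-arm prefix sums of rewards and pull counts, so the empirical mean over any interval [s, t] is an O(1) query. A sketch (hypothetical helper, not the repository's code):

```python
import numpy as np

class IntervalMeans:
    """Per-arm prefix sums of rewards and pull counts, so the empirical
    mean over any interval [s, t] is an O(1) query (hypothetical helper,
    not the repository's code).  Memory is O(K * T)."""
    def __init__(self, nb_arms, horizon):
        self.reward_sum = np.zeros((nb_arms, horizon + 1))
        self.pull_count = np.zeros((nb_arms, horizon + 1), dtype=int)

    def update(self, t, arm, reward):
        # Carry all prefix sums forward, then add this step's observation.
        self.reward_sum[:, t + 1] = self.reward_sum[:, t]
        self.pull_count[:, t + 1] = self.pull_count[:, t]
        self.reward_sum[arm, t + 1] += reward
        self.pull_count[arm, t + 1] += 1

    def mean(self, arm, s, t):
        """Empirical mean of `arm` over steps s..t (inclusive, 0-indexed)."""
        n = self.pull_count[arm, t + 1] - self.pull_count[arm, s]
        if n == 0:
            return None  # the arm was never pulled on [s, t]
        return (self.reward_sum[arm, t + 1] - self.reward_sum[arm, s]) / n
```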
Naereen (Member, Author) commented Nov 26, 2018

Naereen (Member, Author) commented Nov 28, 2018

Example of regret for a stationary bandit (K=9 arms, µ=[0.1,…,0.9], T=1000, N=10 repetitions; basic simulation):

  • Final regret of the different policies: [figure]
  • Evolution of regret: [figure]

Example of regret for a piecewise-stationary bandit (K=2 arms, T=10000, N=16 repetitions):

  • Problem (history of the arm means): [figure]
  • Evolution of regret: [figure]
  • AdSwitch seems to be not too costly in comparison with other policies: [figure]
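
For reproducibility, such an experiment is specified in SMPyBandits through a configuration dictionary. A minimal sketch for the stationary problem above (keys as in the SMPyBandits docs; the AdSwitch parameters are an assumption, check Policies/AdSwitch.py):

```python
from SMPyBandits.Arms import Bernoulli
from SMPyBandits.Policies import UCB, AdSwitch

configuration = {
    "horizon": 1000,       # T
    "repetitions": 10,     # N
    "n_jobs": 4,           # number of parallel jobs
    "verbosity": 6,
    "environment": [
        {"arm_type": Bernoulli, "params": [k / 10 for k in range(1, 10)]},
    ],
    "policies": [
        {"archtype": UCB, "params": {}},
        {"archtype": AdSwitch, "params": {"horizon": 1000}},
    ],
}
```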

Naereen added a commit that referenced this issue Nov 28, 2018
- It's not good, and doesn't work efficiently
Naereen (Member, Author) commented Nov 28, 2018

With this implementation, I'm sure my DoublingTrickWrapper can be used with AdSwitch. I'm not even going to try, though, because I don't have time to check whether my implementation is correct, or to explore why AdSwitch seems to be so inefficient!
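
For the record, wrapping it would presumably look like the following policies entry (the keyword names are my guess, not the verified signature; check Policies/DoublingTrickWrapper.py):

```python
from SMPyBandits.Policies import AdSwitch, DoublingTrickWrapper

# Hypothetical policies entry: run AdSwitch without knowing T, restarting
# it with doubling horizon guesses.  The keyword names ("policy",
# "first_horizon") are assumptions, not the verified signature.
policy = {
    "archtype": DoublingTrickWrapper,
    "params": {"policy": AdSwitch, "first_horizon": 100},
}
```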

Naereen reopened this on Dec 12, 2018
(A comment by Naereen has been minimized.)

Naereen pinned this issue on Dec 20, 2018
Naereen (Member, Author) commented Jul 8, 2019

They improved their paper; it was published at COLT 2019, see PMLR volume 99.

haoyuzhao123 commented
I am not surprised to see the inefficiency of AdSwitch in your experiments. Their main contribution is to show theoretically that AdSwitch is optimal and parameter-free. In your experiment the regret curve is almost straight within each section, which suggests the time horizon of each section is too short. I guess that if each section satisfied T = 100,000 or so, you would observe the sqrt(T) curve.
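
A quick way to test this claim on longer runs is to fit c·sqrt(t) to the regret accumulated on each stationary segment (a hypothetical check, not repository code):

```python
import numpy as np

def fits_sqrt(regret, t0, t1):
    """Least-squares fit of regret ~ c * sqrt(t - t0) on one stationary
    segment [t0, t1); returns the constant c and the relative fit error."""
    t = np.arange(t0, t1)
    seg = regret[t0:t1] - regret[t0]   # regret accumulated in the segment
    x = np.sqrt(t - t0 + 1.0)
    c = float(x @ seg) / float(x @ x)  # closed-form 1-D least squares
    err = np.linalg.norm(seg - c * x) / max(np.linalg.norm(seg), 1e-12)
    return c, err
```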

Naereen (Member, Author) commented Aug 17, 2019

Hi @haoyuzhao123,
Thanks for your remark, you are right.
Regards.

Naereen (Member, Author) commented Oct 3, 2019

I have to implement the newest version, which seems simpler to code.
[screenshot of the updated algorithm, 2019-10-03]

Naereen (Member, Author) commented Oct 4, 2019

I guess I'm done? I'm trying it on a few experiments… I hope the code is not too buggy!

Naereen (Member, Author) commented Oct 4, 2019

I guess the code works, but it's EXTREMELY slow. I have to try to implement their suggestion for speeding up the for loops:
[screenshot of the speed-up suggestion from the paper, 2019-10-04]
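
As I read it, the suggestion amounts to not examining every pair (s, t) but only intervals whose length is a power of two. A sketch of that pruning (my interpretation, not a verified transcription of their scheme):

```python
def candidate_intervals(t):
    """Intervals ending at time t whose length is a power of two: O(log t)
    candidates instead of O(t).  (My reading of the paper's speed-up remark,
    not a verified transcription of their scheme.)"""
    intervals, length = [], 1
    while length <= t:
        intervals.append((t - length + 1, t))  # the interval [s, t]
        length *= 2
    return intervals

# A change-detection sweep at step t then tests O(K log t) (arm, interval)
# pairs instead of O(K t):
assert candidate_intervals(8) == [(8, 8), (7, 8), (5, 8), (1, 8)]
```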

Naereen (Member, Author) commented Oct 7, 2019

I think I'm done with this… I should work on it more, but I don't know what to do:

  • I don't understand their optimization idea: it cannot reduce the complexity from $O(K t^3)$ to $O(K (\log(T))^2)$, that just seems impossible to me. I get the idea, but I don't really know how to implement it!
  • I ran some tests where the AdSwitch algorithm performs much more efficiently than naive algorithms (e.g., uniform sampling $U(\{1,\dots,K\})$), similarly to algorithms unaware of the non-stationarity, and almost as well as OracleRestart.
  • 💥 But the algorithm is extremely slow to run: for N=16 repetitions and T=1000 it takes 12 seconds, so for just T=1500 (1.5× larger, so with a total cost in $O(T^4)$ the runtime should be multiplied by $1.5^4 \simeq 5$) it indeed takes up to 72 seconds. That scaling is expected, but it's way too slow! (See the quick check below.)
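
A back-of-the-envelope check of that $O(T^4)$ extrapolation (plain arithmetic, not repository code):

```python
# If the total runtime scales as O(T^4), going from T=1000 (12 s) to
# T=1500 should multiply it by 1.5**4 ≈ 5.06, i.e. about 61 s; the
# observed 72 s is in the same ballpark.
base_T, base_seconds = 1000, 12.0
for T in (1500, 2000, 10000):
    print(T, base_seconds * (T / base_T) ** 4)
# 1500 -> ~61 s, 2000 -> 192 s, 10000 -> 120000 s (~33 hours): unusable.
```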

Naereen (Member, Author) commented Oct 9, 2019

Some results of (tiny-horizon) numerical simulations:

Problem 1 (T=2000, N=128 repetitions):
  • [box plot of final regret]
  • [regret curves]

Problem 2 (T=2000, N=128 repetitions):
  • [box plot of final regret]
  • [regret curves]

Problem 3 (T=2000, N=128 repetitions), problem 4 in our paper:
  • [box plot of final regret]
  • [regret curves]

Problem 4 (T=2000, N=128 repetitions), not in our paper:
  • [box plot of final regret]
  • [regret curves]

Naereen closed this as completed on Oct 9, 2019
Naereen unpinned this issue on Oct 24, 2019