
Implement (environment and) algorithm(s) from "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes" #146

Closed
5 of 6 tasks
Naereen opened this issue Oct 2, 2018 · 13 comments
Assignee: Naereen
Labels:
  • enhancement: I have to improve something which already works not too badly
  • new algo: I have to implement a new algorithm! Yay!
  • non-stationary: For non-stationary bandits simulations
  • question: Things I'm not sure how to solve
  • single-player: For single-player bandits simulations

Comments

Naereen (Member) commented Oct 2, 2018

This recent article, "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes" by Peter Auer, Pratik Gajane and Ronald Ortner, is really interesting. They cite our work on the doubling trick, but I'm mostly interested in the piecewise-stationary bandit model they study.
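
For context, the model: each arm's mean is constant on a segment of time and jumps at unknown breakpoints. A minimal, self-contained sketch (hypothetical class, not the SMPyBandits API):

```python
import numpy as np

class PiecewiseStationaryBernoulli:
    """Piecewise-stationary Bernoulli bandit: each arm's mean is constant
    on a segment and jumps at unknown breakpoints (illustrative sketch)."""
    def __init__(self, breakpoints, means, rng=None):
        self.breakpoints = list(breakpoints)          # e.g. [0, 3000, 6000]
        self.means = [np.asarray(m) for m in means]   # one mean vector per segment
        self.rng = rng if rng is not None else np.random.default_rng()

    def segment(self, t):
        # Index of the last breakpoint <= t.
        return int(np.searchsorted(self.breakpoints, t, side="right")) - 1

    def draw(self, arm, t):
        # Bernoulli reward with the current mean of `arm` at time t.
        return float(self.rng.random() < self.means[self.segment(t)][arm])

# Example: K=2 arms whose means swap at t=3000 and t=6000.
env = PiecewiseStationaryBernoulli(
    breakpoints=[0, 3000, 6000],
    means=[[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]],
)
```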

Naereen added the enhancement, question and new algo labels on Oct 2, 2018
Naereen self-assigned this on Oct 2, 2018
Naereen changed the title on Oct 2, 2018 from "Implement (environment and) algorithm(s) from" to "Implement (environment and) algorithm(s) from "Adaptively Tracking the Best Arm with an Unknown Number of Distribution Changes""
Naereen added the single-player and non-stationary labels on Oct 7, 2018
Naereen (Member, Author) commented Nov 26, 2018

It's a really complicated algorithm…

Naereen added a commit that referenced this issue Nov 26, 2018
- Not sure at all how to store rewards to be able to compute empirical means like they do in their algorithm!
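
One standard answer to that question: keep per-arm prefix sums of rewards and pull counts, so the empirical mean over any interval [s, t] is an O(1) query. A sketch (hypothetical helper, not the repository's code):

```python
import numpy as np

class IntervalMeans:
    """Per-arm prefix sums of rewards and pull counts, so the empirical
    mean over any interval [s, t] is an O(1) query (hypothetical helper,
    not the repository's code).  Memory is O(K * T)."""
    def __init__(self, nb_arms, horizon):
        self.reward_sum = np.zeros((nb_arms, horizon + 1))
        self.pull_count = np.zeros((nb_arms, horizon + 1), dtype=int)

    def update(self, t, arm, reward):
        # Carry all prefix sums forward, then add this step's observation.
        self.reward_sum[:, t + 1] = self.reward_sum[:, t]
        self.pull_count[:, t + 1] = self.pull_count[:, t]
        self.reward_sum[arm, t + 1] += reward
        self.pull_count[arm, t + 1] += 1

    def mean(self, arm, s, t):
        """Empirical mean of `arm` over steps s..t (inclusive, 0-indexed)."""
        n = self.pull_count[arm, t + 1] - self.pull_count[arm, s]
        if n == 0:
            return None  # the arm was never pulled on [s, t]
        return (self.reward_sum[arm, t + 1] - self.reward_sum[arm, s]) / n
```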
Naereen (Member, Author) commented Nov 26, 2018

Naereen (Member, Author) commented Nov 28, 2018

Example of regret for a stationary bandit (K=9 arms, µ=[0.1,…,0.9], T=1000, N=10 repetitions; basic simulation):

  • Final regret of the different policies: [figure]
  • Evolution of regret: [figure]

Example of regret for a piecewise-stationary bandit (K=2 arms, T=10000, N=16 repetitions):

  • Problem (history of the arm means): [figure]
  • Evolution of regret: [figure]
  • AdSwitch seems to be not too costly in comparison with other policies: [figure]
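
For reproducibility, such an experiment is specified in SMPyBandits through a configuration dictionary. A minimal sketch for the stationary problem above (keys as in the SMPyBandits docs; the AdSwitch parameters are an assumption, check Policies/AdSwitch.py):

```python
from SMPyBandits.Arms import Bernoulli
from SMPyBandits.Policies import UCB, AdSwitch

configuration = {
    "horizon": 1000,       # T
    "repetitions": 10,     # N
    "n_jobs": 4,           # number of parallel jobs
    "verbosity": 6,
    "environment": [
        {"arm_type": Bernoulli, "params": [k / 10 for k in range(1, 10)]},
    ],
    "policies": [
        {"archtype": UCB, "params": {}},
        {"archtype": AdSwitch, "params": {"horizon": 1000}},
    ],
}
```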

Naereen added a commit that referenced this issue Nov 28, 2018
- It's not good, and doesn't work efficiently
Naereen (Member, Author) commented Nov 28, 2018

With this implementation, I'm sure my DoublingTrickWrapper can be used with AdSwitch. I'm not even going to try, though, because I don't have time to check whether my implementation is correct, or to explore why AdSwitch seems to be so inefficient!
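
For the record, wrapping it would presumably look like the following policies entry (the keyword names are my guess, not the verified signature; check Policies/DoublingTrickWrapper.py):

```python
from SMPyBandits.Policies import AdSwitch, DoublingTrickWrapper

# Hypothetical policies entry: run AdSwitch without knowing T, restarting
# it with doubling horizon guesses.  The keyword names ("policy",
# "first_horizon") are assumptions, not the verified signature.
policy = {
    "archtype": DoublingTrickWrapper,
    "params": {"policy": AdSwitch, "first_horizon": 100},
}
```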

Naereen reopened this on Dec 12, 2018
(A comment by Naereen has been minimized.)

Naereen pinned this issue on Dec 20, 2018
Naereen (Member, Author) commented Jul 8, 2019

They improved their paper; it was published at COLT 2019, see PMLR volume 99.

haoyuzhao123 commented
I am not surprised to see the inefficiency of AdSwitch in your experiments. Their main contribution is to show theoretically that AdSwitch is optimal and parameter-free. In your experiment the regret curve is almost straight within each section, which suggests the time horizon of each section is too short. I guess that if each section satisfied T = 100,000 or so, you would observe the sqrt(T) curve.
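
A quick way to test this claim on longer runs is to fit c·sqrt(t) to the regret accumulated on each stationary segment (a hypothetical check, not repository code):

```python
import numpy as np

def fits_sqrt(regret, t0, t1):
    """Least-squares fit of regret ~ c * sqrt(t - t0) on one stationary
    segment [t0, t1); returns the constant c and the relative fit error."""
    t = np.arange(t0, t1)
    seg = regret[t0:t1] - regret[t0]   # regret accumulated in the segment
    x = np.sqrt(t - t0 + 1.0)
    c = float(x @ seg) / float(x @ x)  # closed-form 1-D least squares
    err = np.linalg.norm(seg - c * x) / max(np.linalg.norm(seg), 1e-12)
    return c, err
```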

Naereen (Member, Author) commented Aug 17, 2019

Hi @haoyuzhao123,
Thanks for your remark, you are right.
Regards.

Naereen (Member, Author) commented Oct 3, 2019

I have to implement the newest version, which seems simpler to code.
[screenshot of the updated algorithm, 2019-10-03]

Naereen (Member, Author) commented Oct 4, 2019

I guess I'm done? I'm trying it on a few experiments… I hope the code is not too buggy!

Naereen (Member, Author) commented Oct 4, 2019

I guess the code works, but it's EXTREMELY slow. I have to try to implement their suggestion for speeding up the for loops:
[screenshot of the speed-up suggestion from the paper, 2019-10-04]
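
As I read it, the suggestion amounts to not examining every pair (s, t) but only intervals whose length is a power of two. A sketch of that pruning (my interpretation, not a verified transcription of their scheme):

```python
def candidate_intervals(t):
    """Intervals ending at time t whose length is a power of two: O(log t)
    candidates instead of O(t).  (My reading of the paper's speed-up remark,
    not a verified transcription of their scheme.)"""
    intervals, length = [], 1
    while length <= t:
        intervals.append((t - length + 1, t))  # the interval [s, t]
        length *= 2
    return intervals

# A change-detection sweep at step t then tests O(K log t) (arm, interval)
# pairs instead of O(K t):
assert candidate_intervals(8) == [(8, 8), (7, 8), (5, 8), (1, 8)]
```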

Naereen (Member, Author) commented Oct 7, 2019

I think I'm done with this… I should work on it more, but I don't know what to do:

  • I don't understand their optimization idea: it cannot reduce the complexity from $O(K t^3)$ to $O(K (\log(T))^2)$, that just seems impossible to me. I get the idea, but I don't really know how to implement it!
  • I ran some tests where the AdSwitch algorithm performs much more efficiently than naive algorithms (e.g., uniform sampling $U(\{1,\dots,K\})$), similarly to algorithms unaware of the non-stationarity, and almost as well as OracleRestart.
  • 💥 But the algorithm is extremely slow to run: for N=16 repetitions and T=1000 it takes 12 seconds, so for just T=1500 (1.5× larger, so with a total cost in $O(T^4)$ the runtime should be multiplied by $1.5^4 \simeq 5$) it indeed takes up to 72 seconds. That scaling is expected, but it's way too slow! (See the quick check below.)
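
A back-of-the-envelope check of that $O(T^4)$ extrapolation (plain arithmetic, not repository code):

```python
# If the total runtime scales as O(T^4), going from T=1000 (12 s) to
# T=1500 should multiply it by 1.5**4 ≈ 5.06, i.e. about 61 s; the
# observed 72 s is in the same ballpark.
base_T, base_seconds = 1000, 12.0
for T in (1500, 2000, 10000):
    print(T, base_seconds * (T / base_T) ** 4)
# 1500 -> ~61 s, 2000 -> 192 s, 10000 -> 120000 s (~33 hours): unusable.
```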

Naereen (Member, Author) commented Oct 9, 2019

Some results of (tiny-horizon) numerical simulations:

Problem 1 (T=2000, N=128 repetitions):
  • [box plot of final regret]
  • [regret curves]

Problem 2 (T=2000, N=128 repetitions):
  • [box plot of final regret]
  • [regret curves]

Problem 3 (T=2000, N=128 repetitions), problem 4 in our paper:
  • [box plot of final regret]
  • [regret curves]

Problem 4 (T=2000, N=128 repetitions), not in our paper:
  • [box plot of final regret]
  • [regret curves]

Naereen closed this as completed on Oct 9, 2019
Naereen unpinned this issue on Oct 24, 2019