
Varying number of arm for single-player problems? #123

Open
pratayaa opened this issue Apr 25, 2018 · 2 comments
Labels: enhancement (I have to improve something which already works not too badly) · non-stationary (For non-stationary bandits simulations) · question (Things I'm not sure how to solve)

Comments


pratayaa commented Apr 25, 2018

I want to know the initialization steps for a varying number of arms in MEGA.

@Naereen changed the title from "Want to know the initializing steps for varying number of arm in MEGA, and apologies from my side of asking it in other repository." to "Varying number of arm for single-player problems?" Apr 25, 2018
@Naereen self-assigned this Apr 25, 2018
@Naereen added the "enhancement" and "question" labels Apr 25, 2018

Naereen commented Apr 25, 2018

First, let K(t) denote the number of arms at time t. Classic problems are for K(t)=K for all t, and generic problems concern a K(t) that varies and can possibly be different at each time step t.

First, a few specifications and restrictions regarding the model:

  • It will be almost impossible to handle the case where K(t) is unbounded or has an unknown bound. I suggest restricting to problems where there are bounds K_min (= 1?) and K_max such that, for all t, K(t) is in [K_min, …, K_max]. This way, you can store the history of rewards, pulls, etc. in one big np.array, with no need for anything fancy.
  • SMPyBandits only supports problems with static distributions (nu_1, …, nu_K) from which the reward r(t) ~ nu_{A(t)} is sampled; it does not support dynamic distributions nu_k(t), whether they change occasionally or at every step. So it would be easier to start by adding support for problems with an underlying (static) list of arm distributions nu_{K_min}, …, nu_{K_max}, and simply a list of "activated" arms act(t), a subset of [K_min, …, K_max], possibly changing at each time step. This way, at time t, there is a list act(t) of arms and a list [ nu_k for k in act(t) ] of distributions from which to sample the rewards r(t) (see the sketch below).
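
To make the second point concrete, here is a minimal sketch of what such a model could look like. This is not the SMPyBandits API: the names activated_arms and draw_reward, the Bernoulli means, and the activation rule are all illustrative assumptions of mine.

```python
import numpy as np

# A minimal sketch (not the SMPyBandits API): K_max static Bernoulli arms
# nu_1, ..., nu_{K_max}, plus a rule act(t) giving the arms activated at time t.
K_max = 9
means = np.linspace(0.1, 0.9, K_max)   # static means, never changing over time
rng = np.random.default_rng(42)

def activated_arms(t):
    """Hypothetical activation rule: one new arm appears every 100 steps, up to K_max."""
    return list(range(min(K_max, 1 + t // 100)))

def draw_reward(k):
    """Sample r(t) ~ nu_k from the static Bernoulli distribution of arm k."""
    return float(rng.random() < means[k])

# At time t, the learner may only pull an arm from act(t):
t = 250
act_t = activated_arms(t)     # here act_t == [0, 1, 2]
chosen_arm = act_t[-1]        # a real policy would use choiceFromSubSet(availableArms=act_t)
reward = draw_reward(chosen_arm)
```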

Directions towards implementing this:

  1. I would write a new MAB class, child of the MAB.DynamicMAB class (maybe called VaryingNBArmsMAB or something like this), that adds a method activated_arms(t) giving the list of "activated" arms at each time t. Note that DynamicMAB keeps the size of the problem fixed but changes the means of arms 1, …, K at each call of newRandomArms(), so it's similar but different.
  2. Then you should write a new Evaluator class, child of Environment.Evaluator (maybe called EvaluatorVaryingNBArms), that uses act_t = env.activated_arms(t) at each time step to get this list of activated arms, and passes it to the policy.choiceFromSubSet(availableArms=act_t) method of each policy, instead of policy.choice() (yes, this is already implemented, and it's available for most policies, including all index-based policies, see here in IndexPolicy.py). A rough sketch of steps 1–3 and 7 follows this list.
  3. You probably need to write a new Result class, ResultVaryingNBArms, which keeps in memory the set act_t at each time step (hint: use a np.array of bools, of size (horizon, K_max)).
  4. Write another main_varyingnbarms simulation script, inspired by main.py, that imports everything and uses an EvaluatorVaryingNBArms instead of Evaluator.
  5. Then in Policies/YourPolicyOfchoice.py, make the necessary changes to add a choiceFromSubSet() method that makes sense.
  6. First try the simulations without plotting anything or computing any regret; if it works, continue.
  7. You need to think about what mathematical definition to give the regret in these non-static problems, and implement the getCumulatedRegret method in EvaluatorVaryingNBArms in order to get regret plots.
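
Here is the rough, self-contained sketch announced above. The class and function names other than choiceFromSubSet(availableArms=...) are placeholders of mine, startGame() and getReward() are assumed to follow the usual policy interface, and the regret definition used here (best mean among the *activated* arms at each step) is only one possibility.

```python
import numpy as np

class VaryingNBArmsMAB:
    """Rough sketch of the proposed MAB class (name and internals are placeholders):
    static Bernoulli means, plus a rule giving the activated arms at each time t."""

    def __init__(self, means, activation_rule):
        self.means = np.asarray(means)
        self.nbArms = len(self.means)          # this plays the role of K_max
        self._activation_rule = activation_rule
        self._rng = np.random.default_rng()

    def activated_arms(self, t):
        """Return the list act(t) of arms available at time t."""
        return self._activation_rule(t)

    def draw(self, k):
        """Bernoulli reward r(t) ~ nu_k, from the static distribution of arm k."""
        return float(self._rng.random() < self.means[k])


def evaluate_one_policy(env, policy, horizon):
    """Rough sketch of the inner loop of an EvaluatorVaryingNBArms: restrict the
    policy to act(t) via choiceFromSubSet, store act(t) as a boolean array (as a
    ResultVaryingNBArms could do), and compare, at each step, the mean of the
    pulled arm to the best mean among the activated arms (one possible regret
    definition for this non-static problem)."""
    activated = np.zeros((horizon, env.nbArms), dtype=bool)
    instant_regret = np.zeros(horizon)
    policy.startGame()
    for t in range(horizon):
        act_t = env.activated_arms(t)
        activated[t, act_t] = True
        k = policy.choiceFromSubSet(availableArms=act_t)   # instead of policy.choice()
        reward = env.draw(k)
        policy.getReward(k, reward)
        instant_regret[t] = env.means[act_t].max() - env.means[k]
    return np.cumsum(instant_regret), activated
```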

These steps cover adding support for single-player simulations. I would suggest starting by trying this.

For multi-player

Then, if you want to add support for multi-player simulations as well, more work is needed, but it will be similar:

  1. Write a ResultMultiPlayersVaryingNBArms class and an EvaluatorMultiPlayersVaryingNBArms class, which implement the changes explained above.
  2. Then in PoliciesMultiPlayers/YourPolicyOfchoice.py, make the necessary changes to add a choiceFromSubSet() method that makes sense.
  3. Do the same work for the regret (see the sketch after this list for one possible per-step definition).
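
For the multi-player regret, one possible per-step definition, by analogy with the classical centralized regret, compares the sum of the M best activated means to the means actually collected. The following sketch is only an illustration under my own assumptions (in particular, that colliding players receive no reward, which is one common collision model); the function name is hypothetical.

```python
import numpy as np

def multiplayer_instant_regret(means, act_t, chosen_arms):
    """One possible per-step regret for M players restricted to the activated
    arms act(t): sum of the M best activated means, minus the means of the arms
    pulled without collision (assuming colliding players receive no reward)."""
    means = np.asarray(means)
    M = len(chosen_arms)
    best = np.sort(means[act_t])[-M:].sum()                  # M best activated arms
    counts = {k: chosen_arms.count(k) for k in set(chosen_arms)}
    collected = sum(means[k] for k, c in counts.items() if c == 1)
    return best - collected
```

For instance, with means = [0.1, 0.5, 0.9], act_t = [0, 1, 2] and chosen_arms = [2, 2], both players collide on arm 2, nothing is collected, and the instant regret is 0.9 + 0.5 = 1.4.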

Final remarks

It shouldn't be impossible, but all this is not trivial either.

I would say I'm interested in doing it… but I don't have time right now (and won't for a few weeks or maybe months!).


Naereen commented Apr 25, 2018

Needless to say, please do your best in trying all this, and I will consider pull requests only when working simulations can be run.
(I don't and won't have time to debug or write code with you, sorry.)
