
Bug: noisy-net layer #97

Open
mysl opened this issue Jan 31, 2019 · 8 comments

mysl commented Jan 31, 2019

hi @Kismuz
I was reading the paper "Noisy Networks for Exploration" and have a question about its usage in btgym. The paper says: "As A3C is an on-policy algorithm the gradients are unbiased when noise of the network is consistent for the whole roll-out. Consistency among action value functions is ensured by letting the noise be the same throughout each rollout."

It looks to me that the current implementation in btgym can't ensure "the noise is the same throughout each rollout", because the training steps and environment steps are executed in different threads and can be interleaved. Or am I missing anything? Thanks!
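For reference, a minimal sketch (not btgym's actual layer; all names are illustrative) of an independent-Gaussian noisy linear layer in TF1-style code. Because the noise comes from tf.random_normal inside the graph, it is resampled on every sess.run() call, i.e. on every environment step as well as on every train pass, which is exactly the consistency problem described above:

import tensorflow as tf

def noisy_linear(x, size, name='noisy_linear'):
    # y = x (w_mu + w_sigma * eps_w) + (b_mu + b_sigma * eps_b),
    # with eps ~ N(0, 1) drawn in-graph.
    in_size = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w_mu = tf.get_variable('w_mu', [in_size, size])
        w_sigma = tf.get_variable('w_sigma', [in_size, size],
                                  initializer=tf.constant_initializer(0.017))
        b_mu = tf.get_variable('b_mu', [size])
        b_sigma = tf.get_variable('b_sigma', [size],
                                  initializer=tf.constant_initializer(0.017))
        # In-graph sampling: every sess.run() draws fresh noise, so two
        # consecutive environment steps act under two different policies.
        eps_w = tf.random_normal([in_size, size])
        eps_b = tf.random_normal([size])
        return tf.matmul(x, w_mu + w_sigma * eps_w) + b_mu + b_sigma * eps_b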

Kismuz (Owner) commented Jan 31, 2019

@mysl, it seems you are right. Appendix A of the paper clearly states the noise should be fixed for the entire rollout. The layer was adapted from a DQN implementation without sufficient expertise, sorry for that.
Do I understand it correctly: noise is fixed for a train batch pass and gets resampled with every step when collecting experience?

Kismuz (Owner) commented Jan 31, 2019

As a quick fix, the noisy_net layer can be disabled by passing the policy kwarg explicitly; mind tuning the entropy regularisation:

from btgym.algorithms.nn.layers import linear
# (GuidedPolicy_0_0 and conv_1d_casual_encoder are assumed to be imported elsewhere)

# Policy architecture setup:
policy_config = dict(
    class_ref=GuidedPolicy_0_0,
    kwargs={
        'lstm_layers': (256, 256),
        'state_encoder_class_ref': conv_1d_casual_encoder,
        'dropout_keep_prob': 0.5,
        'linear_layer_ref': linear,  # <-- plain linear layer instead of noisy_net
    }
)

# Algorithm config:
trainer_config = dict(
    # ...,
    kwargs=dict(
        # ...,
        model_beta=0.05,  # <-- entropy regularisation
        # ...,
    )
)
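(Note: noisy-net layers drive exploration through parameter-space noise; with plain linear layers the entropy bonus set by model_beta becomes the main source of exploration again, which is presumably why it needs retuning.)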

@Kismuz Kismuz changed the title Question about noisynet usage Bug: noisy-net layer Jan 31, 2019
@Kismuz Kismuz pinned this issue Jan 31, 2019
mysl (Author) commented Jan 31, 2019

> Do I understand it correctly: noise is fixed for a train batch pass and gets resampled with every step when collecting experience?

My understanding is that in the paper's algorithm for NoisyNet-DQN (appendix C.1), noise is sampled on every environment step, while for NoisyNet-A3C (appendix C.2), noise is sampled once per rollout. So in this implementation, maybe we should use a placeholder for the noise and sample it outside of the network?
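A hedged sketch of that placeholder variant, continuing the illustrative names from the layer sketch above (sess, pi_op and state_pl stand in for the session, policy output op and state input; none of this is existing btgym API):

import numpy as np
import tensorflow as tf

# Inside the layer: noise enters via placeholders instead of tf.random_normal.
eps_w_pl = tf.placeholder(tf.float32, [in_size, size], name='eps_w')
eps_b_pl = tf.placeholder(tf.float32, [size], name='eps_b')
y = tf.matmul(x, w_mu + w_sigma * eps_w_pl) + b_mu + b_sigma * eps_b_pl

# Outside the network: sample once per rollout and reuse the same feed
# for every environment step and for the train pass over that rollout.
noise_feed = {
    eps_w_pl: np.random.randn(in_size, size),
    eps_b_pl: np.random.randn(size),
}
action = sess.run(pi_op, feed_dict={state_pl: obs, **noise_feed})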

Kismuz (Owner) commented Jan 31, 2019

> noise is sampled once per rollout

Yes, but isn't that solely in the context of gradient estimation (the train pass)?

> maybe we should use a placeholder for the noise and sample it outside of the network?

Yes, if we need to fix the noise at the time of data acquisition (see above); no, if the noise is to be fixed for the train batch only (we can infer the size and sample in-graph).

mysl (Author) commented Jan 31, 2019

> Yes, if we need to fix the noise at the time of data acquisition (see above); no, if the noise is to be fixed for the train batch only (we can infer the size and sample in-graph).

I think the noise should be fixed when collecting experience as well, since A3C is an on-policy algorithm. This also seems to agree with the pseudocode (line 7) in the paper:

[image: NoisyNet-A3C pseudocode from the paper (appendix C.2); line 7 samples the noise once per rollout]

Kismuz (Owner) commented Jan 31, 2019

Yes, indeed.
As the pseudocode shows, it is the same noise for collecting and for training (of the same rollout);
that means it should be a placeholder input, but it is also essential to keep the noise as part of the experience.
Currently all rollout information packing/unpacking is handled by the btgym.algorithms.rollout.Rollout class, which is essentially a nested dictionary of lists; it may be optimal to extend it with a new key holding one noise tensor per rollout. The noise-emitting method could belong to the policy instance (it knows the required shape and properties), or could even be .get_initial_features() with dummy output when no noisy-net layers are present.
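A sketch of that idea under stated assumptions (sample_noise, noise_shapes and the 'rollout_noise' key are hypothetical names, not existing btgym API):

import numpy as np

def sample_noise(self):
    # Hypothetical policy method: draws one noise sample per noisy layer,
    # with shapes inferred from the layer specs; returns an empty dict
    # (dummy output) when no noisy-net layers are present.
    return {name: np.random.randn(*shape)
            for name, shape in self.noise_shapes.items()}

# Runner side: sample once per rollout, act with the same noise throughout,
# and keep the noise as part of the experience for the train pass.
noise = policy.sample_noise()
for _ in range(rollout_length):
    action = policy.act(observation, noise)
    # ... step the environment, append the transition to the rollout ...
rollout['rollout_noise'] = noise  # one noise sample per rollout, not per step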

Kismuz (Owner) commented Feb 2, 2019

Due to time limitations, the expected time to fix the issue is four to five days.
Until then, it is best to use a linear layer (as mentioned above).
If anyone wants to contribute, it is highly appreciated.

@Kismuz Kismuz added this to High priority in BTGym Feb 8, 2019
@Kismuz Kismuz moved this from High priority to Needs triage in BTGym Feb 8, 2019
@Kismuz Kismuz moved this from Needs triage to In progress in BTGym Feb 8, 2019
Kismuz (Owner) commented Feb 16, 2019

TODO checklist:

btgym.algorithms.rollout.Rollout:

  • define an episode-wide configurable [nested dict] field

btgym.algorithms.policy.base.BaseAacPolicy:

  • infer noise sample shapes from the specs of the added layers
  • add policy noise input placeholders of same shape
  • make specific noise-generating method (same shape)
  • redefine policy step callbacks and rollout callbacks.

btgym.algorithms.runner:
add the above field processing to runners via policy callback functions,
with separate policy step callbacks and rollout callbacks

  • modify btgym.algorithms.runner.base.BaseThreadRunner_fn
  • modify btgym.algorithms.runner.synchro.BaseSynchroRunner

btgym.algorithms.aac.BaseAac:

  • add noise placeholder handling to _get_main_feeder() (see the sketch after this list)
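As an illustration of that last item, a hedged sketch of what the noise placeholder handling might look like (_get_base_feeder, noise_placeholders and 'rollout_noise' are assumed names carried over from the sketches above, not actual btgym attributes):

def _get_main_feeder(self, sess, rollout):
    # Existing behaviour: build the state/action/reward/advantage feeds
    # (elided here behind a hypothetical helper).
    feeder = self._get_base_feeder(sess, rollout)
    # New: map the single per-rollout noise sample stored in the rollout
    # onto the policy's noise input placeholders, so the train pass runs
    # under exactly the noise that generated the experience.
    for name, pl in self.local_network.noise_placeholders.items():
        feeder[pl] = rollout['rollout_noise'][name]
    return feeder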

@Kismuz Kismuz unpinned this issue Feb 24, 2019