
Bug: noisy-net layer #97

Open
mysl opened this issue Jan 31, 2019 · 8 comments

mysl commented Jan 31, 2019

hi @Kismuz
I was reading the paper "Noisy Networks for Exploration" and have a question about its usage in btgym. The paper says: "As A3C is an on-policy algorithm the gradients are unbiased when noise of the network is consistent for the whole roll-out. Consistency among action value functions is ensured by letting the noise be the same throughout each rollout."

It looks to me that the current implementation in btgym can't ensure "the noise is the same throughout each rollout", because the training steps and environment steps are executed in different threads and can be interleaved. Or am I missing anything? Thanks!
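For reference, a minimal sketch (not btgym's actual layer; all names are illustrative) of an independent-Gaussian noisy linear layer in TF1-style code. Because the noise comes from tf.random_normal inside the graph, it is resampled on every sess.run() call, i.e. on every environment step as well as on every train pass, which is exactly the consistency problem described above:

import tensorflow as tf

def noisy_linear(x, size, name='noisy_linear'):
    # y = x (w_mu + w_sigma * eps_w) + (b_mu + b_sigma * eps_b),
    # with eps ~ N(0, 1) drawn in-graph.
    in_size = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w_mu = tf.get_variable('w_mu', [in_size, size])
        w_sigma = tf.get_variable('w_sigma', [in_size, size],
                                  initializer=tf.constant_initializer(0.017))
        b_mu = tf.get_variable('b_mu', [size])
        b_sigma = tf.get_variable('b_sigma', [size],
                                  initializer=tf.constant_initializer(0.017))
        # In-graph sampling: every sess.run() draws fresh noise, so two
        # consecutive environment steps act under two different policies.
        eps_w = tf.random_normal([in_size, size])
        eps_b = tf.random_normal([size])
        return tf.matmul(x, w_mu + w_sigma * eps_w) + b_mu + b_sigma * eps_b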

Kismuz (Owner) commented Jan 31, 2019

@mysl, it seems you are right. Appendix A of the paper clearly states the noise should be fixed for the entire rollout. The layer was adapted from a DQN implementation without sufficient expertise, sorry for that.
Do I understand it correctly: noise is fixed for a train batch pass and gets resampled with every step when collecting experience?

Kismuz (Owner) commented Jan 31, 2019

As a quick fix, the noisy_net layer can be disabled by passing the policy kwarg explicitly; mind tuning the entropy regularisation:

from btgym.algorithms.nn.layers import linear
# (GuidedPolicy_0_0 and conv_1d_casual_encoder are assumed to be imported elsewhere)

# Policy architecture setup:
policy_config = dict(
    class_ref=GuidedPolicy_0_0,
    kwargs={
        'lstm_layers': (256, 256),
        'state_encoder_class_ref': conv_1d_casual_encoder,
        'dropout_keep_prob': 0.5,
        'linear_layer_ref': linear,  # <-- plain linear layer instead of noisy_net
    }
)

# Algorithm config:
trainer_config = dict(
    # ...,
    kwargs=dict(
        # ...,
        model_beta=0.05,  # <-- entropy regularisation
        # ...,
    )
)
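(Note: noisy-net layers drive exploration through parameter-space noise; with plain linear layers the entropy bonus set by model_beta becomes the main source of exploration again, which is presumably why it needs retuning.)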

@Kismuz Kismuz changed the title Question about noisynet usage Bug: noisy-net layer Jan 31, 2019
@Kismuz Kismuz pinned this issue Jan 31, 2019
mysl (Author) commented Jan 31, 2019

> Do I understand it correctly: noise is fixed for a train batch pass and gets resampled with every step when collecting experience?

My understanding is that in the paper's algorithm for NoisyNet-DQN (appendix C.1), noise is sampled on every environment step, while for NoisyNet-A3C (appendix C.2), noise is sampled once per rollout. So in this implementation, maybe we should use a placeholder for the noise and sample it outside of the network?
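A hedged sketch of that placeholder variant, continuing the illustrative names from the layer sketch above (sess, pi_op and state_pl stand in for the session, policy output op and state input; none of this is existing btgym API):

import numpy as np
import tensorflow as tf

# Inside the layer: noise enters via placeholders instead of tf.random_normal.
eps_w_pl = tf.placeholder(tf.float32, [in_size, size], name='eps_w')
eps_b_pl = tf.placeholder(tf.float32, [size], name='eps_b')
y = tf.matmul(x, w_mu + w_sigma * eps_w_pl) + b_mu + b_sigma * eps_b_pl

# Outside the network: sample once per rollout and reuse the same feed
# for every environment step and for the train pass over that rollout.
noise_feed = {
    eps_w_pl: np.random.randn(in_size, size),
    eps_b_pl: np.random.randn(size),
}
action = sess.run(pi_op, feed_dict={state_pl: obs, **noise_feed})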

Kismuz (Owner) commented Jan 31, 2019

> noise is sampled once per rollout

Yes, but isn't that solely in the context of gradient estimation (the train pass)?

> maybe we should use a placeholder for the noise and sample it outside of the network?

Yes, if we need to fix the noise at the time of data acquisition (see above); no, if the noise is to be fixed for the train batch only (we can infer the size and sample in-graph).

mysl (Author) commented Jan 31, 2019

> Yes, if we need to fix the noise at the time of data acquisition (see above); no, if the noise is to be fixed for the train batch only (we can infer the size and sample in-graph).

I think the noise should be fixed when collecting experience as well, since A3C is an on-policy algorithm. This also seems to agree with the pseudocode (line 7) in the paper:

[image: NoisyNet-A3C pseudocode from the paper (appendix C.2); line 7 samples the noise once per rollout]

Kismuz (Owner) commented Jan 31, 2019

Yes, indeed.
As the pseudocode shows, it is the same noise for collecting and for training (of the same rollout);
that means it should be a placeholder input, but it is also essential to keep the noise as part of the experience.
Currently all rollout information packing/unpacking is handled by the btgym.algorithms.rollout.Rollout class, which is essentially a nested dictionary of lists; it may be optimal to extend it with a new key holding one noise tensor per rollout. The noise-emitting method could belong to the policy instance (it knows the required shape and properties), or could even be .get_initial_features() with dummy output when no noisy-net layers are present.
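A sketch of that idea under stated assumptions (sample_noise, noise_shapes and the 'rollout_noise' key are hypothetical names, not existing btgym API):

import numpy as np

def sample_noise(self):
    # Hypothetical policy method: draws one noise sample per noisy layer,
    # with shapes inferred from the layer specs; returns an empty dict
    # (dummy output) when no noisy-net layers are present.
    return {name: np.random.randn(*shape)
            for name, shape in self.noise_shapes.items()}

# Runner side: sample once per rollout, act with the same noise throughout,
# and keep the noise as part of the experience for the train pass.
noise = policy.sample_noise()
for _ in range(rollout_length):
    action = policy.act(observation, noise)
    # ... step the environment, append the transition to the rollout ...
rollout['rollout_noise'] = noise  # one noise sample per rollout, not per step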

Kismuz (Owner) commented Feb 2, 2019

Due to time limitations, the expected time to fix the issue is four to five days.
Until then, it is best to use a linear layer (as mentioned above).
If anyone wants to contribute, it is highly appreciated.

@Kismuz Kismuz added this to High priority in BTGym Feb 8, 2019
@Kismuz Kismuz moved this from High priority to Needs triage in BTGym Feb 8, 2019
@Kismuz Kismuz moved this from Needs triage to In progress in BTGym Feb 8, 2019
Kismuz (Owner) commented Feb 16, 2019

TODO checklist:

btgym.algorithms.rollout.Rollout:

  • define an episode-wide configurable [nested dict] field

btgym.algorithms.policy.base.BaseAacPolicy:

  • infer noise sample shapes from the specs of the added layers
  • add policy noise input placeholders of same shape
  • make specific noise-generating method (same shape)
  • redefine policy step callbacks and rollout callbacks.

btgym.algorithms.runner:
add the above field processing to runners via policy callback functions,
with separate policy step callbacks and rollout callbacks

  • modify btgym.algorithms.runner.base.BaseThreadRunner_fn
  • modify btgym.algorithms.runner.synchro.BaseSynchroRunner

btgym.algorithms.aac.BaseAac:

  • add noise placeholder handling to _get_main_feeder() (see the sketch after this list)
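As an illustration of that last item, a hedged sketch of what the noise placeholder handling might look like (_get_base_feeder, noise_placeholders and 'rollout_noise' are assumed names carried over from the sketches above, not actual btgym attributes):

def _get_main_feeder(self, sess, rollout):
    # Existing behaviour: build the state/action/reward/advantage feeds
    # (elided here behind a hypothetical helper).
    feeder = self._get_base_feeder(sess, rollout)
    # New: map the single per-rollout noise sample stored in the rollout
    # onto the policy's noise input placeholders, so the train pass runs
    # under exactly the noise that generated the experience.
    for name, pl in self.local_network.noise_placeholders.items():
        feeder[pl] = rollout['rollout_noise'][name]
    return feeder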

@Kismuz Kismuz unpinned this issue Feb 24, 2019