Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing seed / option args in dummy and subproc vec_env resets during stepping #1805

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

npit
Copy link
Contributor

@npit npit commented Jan 11, 2024

Description

  1. Passed the appropriate elements from self._seeds and self._options to a "done" env that calls its reset function, in DummyVecEnv.step_wait
  2. Added missing options argument in the reset functions of TimeDelayEnv and CustomSubClassedSpaceEnv
  3. Modified SubprocVecEnv to send options and seeds arguments as in (1) in step_async.
  4. Modified the _worker function's "step" case, to receive and pass these arguments into the env's reset, as in (1).

Motivation and Context

Closes #1790

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@araffin araffin self-requested a review January 16, 2024 09:07
@araffin
Copy link
Member

araffin commented Jan 18, 2024

Hello @npit thanks for the PR and sorry for the delay.

By looking at it, I think it would be better to document the fact that we don't allow to pass options if reset is not called right after.
I'm afraid that dealing with partially reset env is more tricky.

To explain what I mean:
Imagine that you have two envs and you passed options=[{"a": 1}, {"a": 2}] (so you want to pass a=1 for 1st env and a=2 for second env.

Let's say the first env reset, then you can pass a=1, but what shall you discard it from the options list?
Also, what should happen if afterward the user options=[{"a": 3}, {"a": 3}] before the second env is reset?
In that case, the first env would have received the correct a=1 but the a=2 would have been silently discarded by the user passing the new options.

Enforcing this limitation would not limit the user though.
The user can still access setter using the VevEnv api (see #1789)

What do you think @qgallouedec ?

PS: options should not be required to work with SB3, so you should in theory make it opt-in (see what I did for the reset())

@npit
Copy link
Contributor Author

npit commented Jan 18, 2024

By looking at it, I think it would be better to document the fact that we don't allow to pass options if reset is not called right after.
I'm afraid that dealing with partially reset env is more tricky.

As I see it, providing functionality of parameterizing resets but having it apply on a subset of resets is quite confusing.
What's more, the implicit resets are currently outside the user's control: the vec_env.set_options() utility is to be called before the user loses control by invoking vec_env.step, evaluate_policy or whatever: at that point, the user may as well modify the environment directly by a bunch of vec_env.set_attr, since they currently have access to the vec_env. On the other hand, the implicit resets on episode terminations are out of bounds to the user once vec_env starts stepping.

What is the alternative if you want to control implicit resets? Should we make custom functionality instead (e.g. constructor args and dedicated logic)? Doesn't that route kind of defeat the purpose of having arguments to reset and the set_options util?

Let's say the first env reset, then you can pass a=1, but what shall you discard it from the options list?
Also, what should happen if afterward the user options=[{"a": 3}, {"a": 3}] before the second env is reset?
In that case, the first env would have received the correct a=1 but the a=2 would have been silently discarded by the user passing the new options.

Currently, options persist until the user explicitly invokes vec_env.reset as per the docs. If #1790 is indeed an issue to be fixed (i.e. if we do want implicit resets to consider passed option/seed parameters), then it would be up to the user to build the reset logic and arguments to set_options to fit whatever they want to do, if their use case is complicated enough for the examples listed above to be undesirable.

E.g. if reset options are a function of what previous options were consumed in the previous step, users can perhaps track / count the number of episodes per env and adjust. If they want different options per implicit reset, maybe pass an options list and cycle elements from it in reset.
Etc.

Does that make sense, or am I missing something?

@araffin
Copy link
Member

araffin commented Jan 20, 2024

the user may as well modify the environment directly by a bunch of vec_env.set_attr, since they currently have access to the vec_env.

yes, this is the recommended way (having setters, see the example I linked which is in the doc now: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#modifying-vectorized-environments-attributes).

Doesn't that route kind of defeat the purpose of having arguments to reset and the set_options util?

this API (passing options at reset) was not a SB3 decision but a gymnasium :/ (I was actually against it) and it was designed with single env in mind. We only provided some helper for people that wanted to use it.

What I would do in any way is update the doc, add a warning and link to the current discussion.

I will wait for the opinion of other maintainers to decide if this is something to be fixed or if it is a behavior that is ok but must be documented.

@qgallouedec
Copy link
Collaborator

I have the impression that passing an option to reset is a very little-used feature. I think it's important to support it though.

However, supporting very precise usage such as a list of options seems to me very complicated, both in terms of implementation and maintenance. And it seems like a bottomless pit:
What if the user wants to cycle through a common list of options for all envs? If we support both an option in the form of an object and an utterable of objects, what do we do if the option type expected by the environment is an iterable (do we need to fetch the signature, or do a try;expect?)?
In short, too complicated for nothing.

I approve of this PR as it stands, and I think we should limit the support to this.
For more advanced uses of option in reset, I think the user will have to customize their environment.

@qgallouedec
Copy link
Collaborator

qgallouedec commented Apr 17, 2024

If the user absolutely need option to be a cycle, they can still do this:

import gymnasium as gym

class MyWrapper(gym.Wrapper):
    def __init__(self, env, option_cycle):
        super().__init__(env)
        self.option_cycle = option_cycle

    def reset(self, seed=None, options=None):
        return self.env.reset(seed=seed, options=next(self.option_cycle))

# reset the environment, supplying seed and options
seed = self._seeds[env_idx]
options = self._options[env_idx]
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=seed, options=options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I wrote in my first comment: "PS: options should not be required to work with SB3, so you should in theory make it opt-in (see what I did for the reset())"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Reset options ignored when resetting due to termination / truncation from within wrapper's step
3 participants