The Great Flattening (#219)
* Prevent more than one call to wandb.login
* Fix bug: `decay_factor` hparam in avalanche/ewc.py
* Use wandb when wandb.project is enabled
* wip: Adding task-inference to Avalanche models
* Use task-inference mechanism only at test-time
* Improved is_classic_control_env and is_atari_env
* Add is_monsterkong_env to utils.py
* Add passing env per task for IncrementalRL
* [WIP]: Adding modified MuJoCo Envs
* modified_size xml tweaking is better
* Finish adding ModifiedSizeEnv (not on-the-fly yet)
* [wip] Fixing bugs in size scaling for hopper
* [wip] Writing an algo for updating the sizes
* Wrapping up work on the mujoco envs
* Rework the task creation in RL
* Upgrade metaworld version
* Minor tweak in formatting
* Fix import bugs with mujoco envs
* Add ContinualHalfCheetah-v0 to ContinualRLSetting
* Debugging docker setup
* Remove 'observe_state_directly' field
* Add deprecation warning for observe_state_directly
* Fix tests for the current RL side of the tree
* Fix uncaught error when trying to import mujoco
* [cluster debugging] fix bug in envs/__init__.py
* Fix issues with Avalanche logging on Cluster
* [cluster debugging] fix stupid typo
* Added missing tasks.py file in incremental rl
* [The Great Renaming] Flatten and rename modules
* [The Great Renaming] SL setting tests pass
* Fix mujoco dependency during tests
* [The Great Renaming] Add ContinualSLSetting
* Upgrade the 'smooth shuffling' code and tests
* Prevent iterating on closed ContinualSLEnvironment
* Fix bugs with BaselineMethod in continual_sl
* Fixed more failing tests, task-incremental bugs
* Fix formatting issue in assumption file
* Fix all tests on SL side
* Fix imports in Avalanche methods
* Add tests for Continuous/Discrete SL to Avalanche
* Add missing RL settings, fix tests for SB3
* Fix mujoco import bug and all Avalanche tests
* Add entry for mjkey.txt to gitignore
* Skip SAC tests unless --slow is passed
* [docker debugging] Fix import error
* [docker debugging] Adding mujoco-py as extras-req.
* Save results to a dir and upload to wandb
* Add presets for the RL sweep
* Set video_callable to True if using wandb
* Fix bugs with benchmark schedules, gym Monitor
* Fix sweeps crashing after first run
* [Cluster debugging] Adding util scripts for eai
* Make all Avalanche methods target ContinualSL
* [Docker debugging] Removing some scripts
* Monitor the training performance by default
* Fix hpo_sweep not correctly using wandb
* Fix task labels not given at test time in T-IL
* Fix bugs with PNN
* Fix wandb bug with saving the results to a dir
* Ignore exceptions when trying to save results
* Fix bug with PNN in Cifar100 with 10 tasks
* Avoid bug with Avalanche's default_logger
* Rebuild docker container before sweep
* Before RL Sweep
* Fix bug in ExperienceReplay, some SB3 Methods
* Fix sl_sweep.sh
* Add the 'objective_name' property
* Added the InteractiveLogger to Avalanche methods
* Fixing bug with HOME directory in eai jobs
* Fix bug in ContinualRLSetting with half_cheetah
* [WIP] Rework ContinualRL to be based on Continual
* [wip] Fix bugs in sequoia/common
* Rework of ContinualRLSetting
* [temp] starting to fix Discrete RL
* Adding pixel variants for classic-control envs
* Improving tests, Fixing Discrete RL
* [wip] Fixing Discrete results
* Fix all tests for DiscreteTaskAgnosticRLSetting
* Fix tests for IncrementalRLSetting
* [wip] refactoring of the 'RL' settings
* Fix the children() bug in readme.py
* Fix mapping from mujoco env names to env specs
* TraditionalRLSetting under IncrementalRLSetting
* Ugly commit: All tests in RL 'pass'
* Update README table and puml diagrams
* Fix tests for SL Settings
* Fix error in tests for EWC
* Add TypedDictSpace, to replace NamedTupleSpace
* Fix BaseMethod bug w/ max_epochs >= 1 in Continual
* Replace NamedTupleSpace -> TypedDictSpace
* Fix bugs with obs_space[0] -> obs_space["x"]
* Import the make_task functions in settings.rl
* Improving the tests for SettingProxy
* Removing the /tests folder
* Fix formatting and imports in examples
* Fix pytest.ini config, test everything by default
* Support dicts with extra keys in TypedDictSpace
* Reduce debug logging verbosity in RL
* Fix little bugs left in the RL settings
* Fix issues with the EpisodeLimit wrapper
* Add RoundRobin, Concat and RandomMulti wrappers
* Adding MultiEnv wrappers
* Fix issues with IterableWrapper
* Fix bug in MeasureSLPerformanceWrapper
* [temp] MeasureRlPerformance wrapper / EnvDataset
* Fix bug in ConvertToFromTensors wrapper
* Fix pretty much every test in RL
* Fix bugs with MeasureRLPerformanceWrapper
* Add on_task_switch_callback to ConcatEnvsWrapper
* Add test for equality in TypedDict test
* Adding support for metaworld's MT10 benchmark
* Fix bugs caused by obs[0] instead of obs["x"]
* Update README in sequoia/settings, methods
* Fix tests for SB3 methods
* Fix more bugs with obs_space[0] -> obs_space["x"]
* Fix bug with multi-headed BaseModel in MultiTaskRL
* Fix bug with avalanche.LwF with Multi-Task Model
* Use task_inference_forward pass when taskid=None
* "fix" AssertionError in avalanche.Replay
* Fix bugs in tests for avalanche.EWC
* "Fix" bugs in avalanche.SynapticIntelligence

All commits signed off by: Fabrice Normandin <fabrice.normandin@gmail.com>
lebrice committed Jun 18, 2021
1 parent 9108ad5 commit 84c35e6
Showing 323 changed files with 19,495 additions and 7,434 deletions.
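Several commits in this PR ("Replace NamedTupleSpace -> TypedDictSpace", "Fix bugs with obs_space[0] -> obs_space[\"x\"]") switch observations from positional to keyed access. A rough, hypothetical sketch of that idea — these classes are illustrative only, not Sequoia's actual `TypedDictSpace` API:

```python
from collections.abc import Mapping


class TypedDictSample(Mapping):
    """Illustrative dict-like observation: fields are accessed by key
    (obs["x"]) rather than by position (obs[0])."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


obs = TypedDictSample(x=[0.1, 0.2, 0.3], task_labels=1)
print(obs["x"])            # keyed access replaces obs[0]
print(obs["task_labels"])  # replaces obs[1]
```

Keyed access keeps call sites readable and lets extra keys be carried along without breaking positional indexing, which seems to be the motivation behind the "Support dicts with extra keys in TypedDictSpace" commit.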
2 changes: 2 additions & 0 deletions .gitignore
@@ -28,3 +28,5 @@ build
dist
*.egg-info
sequoia/results

mjkey.txt
71 changes: 39 additions & 32 deletions README.md
@@ -27,23 +27,23 @@ Requires python >= 3.7

1. Clone the repo:

```console
$ git clone https://www.github.com/lebrice/Sequoia.git
$ cd Sequoia
```

2. Optional: Create the conda environment (only once):

```console
$ conda env create -f environment.yaml
$ conda activate sequoia
```

3. Install the dependencies:

```console
$ pip install -e .
```

### Additional Installation Steps for Mac

@@ -66,21 +66,28 @@

### Current Settings & Assumptions:

| Setting | Active/Passive? | clear task boundaries? | task labels at train time | task labels at test time | # of tasks ? |
| ----- | -------------- | ---------------------- | ------------------------- | ------------------------ | ------------ |
| [ContinualRLSetting](sequoia/settings/active/continual/continual_rl_setting.py) | Active | no | no | no | 1* |
| [IncrementalRLSetting](sequoia/settings/active/continual/incremental/incremental_rl_setting.py) | Active | **yes** | **yes** | no | ≥1 |
| [TaskIncrementalRLSetting](sequoia/settings/active/continual/incremental/task_incremental/task_incremental_rl_setting.py) | Active | **yes** | **yes** | **yes** | ≥1 |
| [RLSetting](sequoia/settings/active/continual/incremental/task_incremental/stationary/iid_rl_setting.py) | Active | **yes** | **yes** | **yes** | **1** |
| [ClassIncrementalSetting](sequoia/settings/passive/cl/class_incremental_setting.py) | Passive | **yes** | **yes** | no | ≥1 |
| [TaskIncrementalSetting](sequoia/settings/passive/cl/task_incremental/task_incremental_setting.py) | Passive | **yes** | **yes** | **yes** | ≥1 |
| [IIDSetting](sequoia/settings/passive/cl/task_incremental/iid/iid_setting.py) | Passive | **yes** | **yes** | **yes** | **1** |
| Setting | RL vs SL | Clear task boundaries? | Task boundaries given? | Task labels at training time? | Task labels at test time? | Stationary context? | Fixed action space? |
| ------------------------------------------------------------------------ | -- | ---------------------- | ---------------------- | ----------------------------- | ------------------------ | ------------------- | ------------------- |
| [Continual RL](sequoia/settings/rl/continual/setting.py) | RL | no | no | no | no | no | no(?) |
| [Discrete Task-Agnostic RL](sequoia/settings/rl/discrete/setting.py) | RL | **yes** | **yes** | no | no | no | no(?) |
| [Incremental RL](sequoia/settings/rl/incremental/setting.py) | RL | **yes** | **yes** | **yes** | no | no | no(?) |
| [Task-Incremental RL](sequoia/settings/rl/task_incremental/setting.py) | RL | **yes** | **yes** | **yes** | **yes** | no | no(?) |
| [Traditional RL](sequoia/settings/rl/task_incremental/setting.py) | RL | **yes** | **yes** | **yes** | no | **yes** | no(?) |
| [Multi-Task RL](sequoia/settings/rl/task_incremental/setting.py) | RL | **yes** | **yes** | **yes** | **yes** | **yes** | no(?) |
| [Continual SL](sequoia/settings/sl/continual/setting.py) | SL | no | no | no | no | no | no |
| [Discrete Task-Agnostic SL](sequoia/settings/sl/discrete/setting.py) | SL | **yes** | no | no | no | no | no |
| [(Class) Incremental SL](sequoia/settings/sl/incremental/setting.py) | SL | **yes** | **yes** | no | no | no | no |
| [Domain-Incremental SL](sequoia/settings/sl/domain_incremental/setting.py) | SL | **yes** | **yes** | **yes** | no | no | **yes** |
| [Task-Incremental SL](sequoia/settings/sl/task_incremental/setting.py) | SL | **yes** | **yes** | **yes** | **yes** | no | no |
| [Traditional SL](sequoia/settings/sl/traditional/setting.py) | SL | **yes** | **yes** | **yes** | no | **yes** | no |
| [Multi-Task SL](sequoia/settings/sl/multi_task/setting.py) | SL | **yes** | **yes** | **yes** | **yes** | **yes** | no |
<!-- | [Class-Incremental SL](sequoia/settings/sl/class_incremental/setting.py) | SL | **yes** | **yes** | no | no | no | -->

#### Notes

- **Active / Passive**:
Active settings are Settings where the next observation depends on the current action, i.e. where actions influence future observations, e.g. Reinforcement Learning.
Passive settings are Settings where the current actions don't influence the next observations (e.g. Supervised Learning.)

- **Bold entries** in the table mark constant attributes which cannot be
changed from their default value.
@@ -97,9 +104,9 @@ sudo chown root /tmp/.X11-unix/
#### Directly in code:

```python
from sequoia.settings import TaskIncrementalSetting
from sequoia.settings import TaskIncrementalSLSetting
from sequoia.methods import BaselineMethod
setting = TaskIncrementalSetting(dataset="mnist")
setting = TaskIncrementalSLSetting(dataset="mnist")
method = BaselineMethod(max_epochs=1)

results = setting.apply(method)
@@ -112,17 +119,17 @@ sequoia --setting <some_setting> --method <some_method> (arguments)
```
For example:
- Run the BaselineMethod on task-incremental MNIST, with one epoch per task, and without wandb:
```console
sequoia --setting task_incremental --dataset mnist --method baseline --max_epochs 1 --no_wandb
```
- Run the PPO Method from stable-baselines3 on an incremental RL setting, with the default dataset (CartPole) and 5 tasks:
```console
sequoia --setting incremental_rl --nb_tasks 5 --method ppo --steps_per_task 10_000
```

- Running multiple experiments (wip):

If you leave out the `--method` argument above, the experiment will compare the results of all the methods applicable to the chosen Setting.

Likewise, if you leave the `--setting` option unset, the experiment will evaluate the performance of the selected method on all its applicable settings (WIP: and a table will be shown).

2 changes: 1 addition & 1 deletion dockers/base/Dockerfile
@@ -55,7 +55,6 @@ SHELL [ "conda", "run", "-n", "base", "/bin/bash", "-c"]
# RUN sed -i 's/robbyrussell/clean/' ~/.zshrc
# RUN sed -i 's/plugins=(git)/plugins=(git debian history-substring-search)/' ~/.zshrc

RUN mkdir /workspace/tools

# MuJoCo-related stuff:
# RUN curl -o ~/mujoco200_linux.zip -L -C - https://www.roboti.us/download/mujoco200_linux.zip
@@ -70,6 +69,7 @@ RUN mkdir /workspace/tools
# COPY mjkey.txt /home/toolkit/.mujoco/
# ENV LD_LIBRARY_PATH /home/toolkit/.mujoco/mujoco200/bin:${LD_LIBRARY_PATH}
# ENV LD_LIBRARY_PATH /home/toolkit/.mujoco/mjpro150/bin:${LD_LIBRARY_PATH}
# RUN mkdir /workspace/tools
# RUN cd /workspace/tools && git clone https://github.com/openai/mujoco-py.git && pip install -e mujoco-py

# For Wandb (TODO: Doesn't appear to work, using env variable with WANDB_API_KEY
17 changes: 8 additions & 9 deletions docs/diagrams/src/gym.puml
@@ -3,10 +3,9 @@
package gym {
package spaces as gym.spaces {
abstract class Space<T> {
+ bool contains(T sample)
+ T sample()
+ contains(T sample) -> bool
+ sample() -> T
}

class Box extends Space {
+ low: np.ndarray
+ high: np.ndarray
@@ -39,16 +38,16 @@ package gym {
Dict *-- Space
}

abstract class Env<Obs, Act, Rew>{
abstract class gym.Env<Obs, Act, Rew> {
+ observation_space: Space<Obs>
+ action_space: Space<Act>
+ step(Actions): Tuple[Obs, Rew, bool, dict]
+ reset(): Obs
+ step(Actions) -> Tuple[Obs, Rew, bool, dict]
+ reset() -> Obs
}
Env .. Space
gym.Env .. Space

abstract class Wrapper extends Env{
+ env: Env
abstract class Wrapper extends gym.Env{
+ env: gym.Env
}
}

13 changes: 6 additions & 7 deletions examples/advanced/RL_and_SL_demo.py
@@ -23,7 +23,7 @@
from sequoia.methods import BaselineMethod
from sequoia.methods.aux_tasks import AuxiliaryTask, SimCLRTask
from sequoia.methods.models import BaselineModel, ForwardPass
from sequoia.settings import Setting, Environment, ActiveSetting
from sequoia.settings import Setting, Environment, RLSetting
from sequoia.utils import camel_case, dict_intersection, get_logger

logger = get_logger(__file__)
@@ -192,7 +192,7 @@ def configure(self, setting: Setting):

# For example, change the value of the coefficient of our
# regularization loss when in RL vs SL:
if isinstance(setting, ActiveSetting):
if isinstance(setting, RLSetting):
self.hparams.simple_reg.coefficient = 0.01
else:
self.hparams.simple_reg.coefficient = 1.0
@@ -236,16 +236,15 @@ def from_argparse_args(cls, args: Namespace, dest: str = ""):
def demo_manual():
""" Apply the custom method to a Setting, creating both manually in code. """
# Create any Setting from the tree:
from sequoia.settings import TaskIncrementalRLSetting, TaskIncrementalSetting
from sequoia.settings import TaskIncrementalRLSetting, TaskIncrementalSLSetting

# setting = TaskIncrementalSetting(dataset="mnist", nb_tasks=5) # SL
# setting = TaskIncrementalSLSetting(dataset="mnist", nb_tasks=5) # SL
setting = TaskIncrementalRLSetting( # RL
dataset="cartpole",
train_task_schedule={
0: {"gravity": 10, "length": 0.5},
5000: {"gravity": 10, "length": 1.0},
},
observe_state_directly=True, # state input, rather than pixel input.
max_steps=10_000,
)

@@ -291,9 +290,9 @@ def demo_command_line():
parser = ArgumentParser(description=__doc__)

## Add command-line arguments for any Setting in the tree:
from sequoia.settings import TaskIncrementalRLSetting, TaskIncrementalSetting
from sequoia.settings import TaskIncrementalRLSetting, TaskIncrementalSLSetting

# parser.add_arguments(TaskIncrementalSetting, dest="setting")
# parser.add_arguments(TaskIncrementalSLSetting, dest="setting")
parser.add_arguments(TaskIncrementalRLSetting, dest="setting")
parser.add_arguments(Config, dest="config")

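The `train_task_schedule` dicts in the example diffs above map environment steps to task parameters (e.g. CartPole's `gravity` and `length`). In the smooth/continual settings, such schedules are interpolated between the listed steps; a rough, hypothetical sketch of linear interpolation — not Sequoia's actual implementation — might look like:

```python
def interpolate_task(schedule, step):
    """Linearly interpolate task parameters between the keyed steps of a
    train_task_schedule-style dict, e.g. {0: {"length": 0.5}, 5000: {"length": 1.0}}.
    Illustrative sketch only -- not Sequoia's actual code.
    """
    steps = sorted(schedule)
    if step <= steps[0]:
        return dict(schedule[steps[0]])
    if step >= steps[-1]:
        return dict(schedule[steps[-1]])
    # Find the surrounding schedule boundaries and mix their values linearly.
    for lo, hi in zip(steps, steps[1:]):
        if lo <= step <= hi:
            frac = (step - lo) / (hi - lo)
            return {
                key: schedule[lo][key] + frac * (schedule[hi][key] - schedule[lo][key])
                for key in schedule[lo]
            }


task = interpolate_task(
    {0: {"gravity": 10, "length": 0.5}, 5000: {"gravity": 10, "length": 1.0}}, 2500
)
print(task)  # {'gravity': 10.0, 'length': 0.75}
```

In the incremental settings, by contrast, the schedule would presumably be applied as discrete switches at the listed steps rather than interpolated.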
3 changes: 1 addition & 2 deletions examples/advanced/continual_rl_demo.py
@@ -17,8 +17,7 @@
# setting = TaskIncrementalRLSetting(
# setting = RLSetting(
dataset="CartPole-v1",
observe_state_directly=True,
max_steps=2000,
train_max_steps=2000,
train_task_schedule=task_schedule,
)
# Create the method to use here:
3 changes: 1 addition & 2 deletions examples/advanced/ewc_in_rl.py
@@ -325,13 +325,12 @@ def on_task_switch(self, task_id: Optional[int]) -> None:
if __name__ == "__main__":
setting = TaskIncrementalRLSetting(
dataset="cartpole",
observe_state_directly=True,
nb_tasks=2,
train_task_schedule={
0: {"gravity": 10, "length": 0.3},
1000: {"gravity": 10, "length": 0.5}, # second task is 'easier' than the first one.
},
max_steps=2000,
train_max_steps=2000,
)
method = EWCExampleMethod(reg_coefficient=0.)
results_without_reg = setting.apply(method)
Expand Down
