Feature: Control number of vmapped envs in evaluator using `arch.num_envs` #1071

OmaymaMahjoub · 2024-03-26T07:26:25Z

What?

Modify the evaluator to cap the number of vmapped envs at arch.num_envs when the total number of evaluation episodes, arch.num_eval_episodes, exceeds this limit (instead of parallelising all arch.num_eval_episodes at once). In such cases, evaluation runs in sequential batches, with each batch containing arch.num_envs parallel envs.

Why?

Limiting parallel evaluations to num_envs prevents out-of-memory issues by avoiding vmap over all episodes at once.
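To make the batching arithmetic concrete, here is a minimal sketch in plain Python. The three formulas mirror the lines shown in the mava/evaluator.py diff in this PR; the function name `eval_batching` and the example numbers are hypothetical and only for illustration.

```python
def eval_batching(num_eval_episodes, eval_multiplier, n_devices, num_envs):
    """Hypothetical helper reproducing the batching arithmetic from the diff."""
    # Episodes each device is responsible for.
    episodes_per_device = num_eval_episodes * eval_multiplier // n_devices
    # Never vmap over more envs than num_envs.
    parallel_eval_batch_size = min(num_envs, episodes_per_device)
    # Number of sequential batches run on each device (floor division).
    sequential_eval_batches = episodes_per_device // parallel_eval_batch_size
    return episodes_per_device, parallel_eval_batch_size, sequential_eval_batches


# 64 episodes over 2 devices with num_envs=8: 32 per device, 4 batches of 8.
print(eval_batching(64, 1, 2, 8))  # (32, 8, 4)

# Fewer episodes than num_envs: everything runs in a single vmapped batch.
print(eval_batching(4, 1, 1, 8))  # (4, 4, 1)
```

Note that because the last step uses floor division, the batches only cover every episode when episodes_per_device is a multiple of the batch size; for example, 10 episodes per device with num_envs=4 gives 2 batches of 4, i.e. 8 episodes.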

WiemKhlifi

LGTM, thanks @OmaymaMahjoub for making things more flexible 🙏

WiemKhlifi · 2024-03-26T11:09:34Z

mava/evaluator.py

        n_devices = len(jax.devices())
+        episodes_per_device = config.arch.num_eval_episodes * eval_multiplier // n_devices


Sadly jax doesn't allow us to carry these fixed values (parallel_eval_batch_size) 😢

I am quite sure there is a better way to implement this, but we can keep it as it is for now and create an issue to clean up the evaluator and improve its readability.

sash-a

Two small things. One of them would take a while, though, so if there's no time we can wait until we refactor the evaluator.

mava/evaluator.py

sash-a · 2024-03-26T11:13:13Z

mava/evaluator.py

+        parallel_eval_batch_size = min(config.arch.num_envs, episodes_per_device)
+        # Compute the number of sequential evaluation batches required per device
+        # to cover all episodes.
+        sequential_eval_batches = episodes_per_device // parallel_eval_batch_size


If you make keys of shape (sequential_eval_batches, num_vmapped_episodes), then you don't need the extra method; you can just call:

jax.lax.scan(jax.vmap(eval_one_episode), None, eval_states)

I like this for two reasons: you don't have to calculate parallel_eval_batch_size twice, and it is also clear that you're scanning and then vmapping over episodes. But this is a big change, so if you don't have time it's OK. I think the evaluator is due for a big overhaul anyway.
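A minimal, self-contained sketch of the scan-over-vmap idea might look like the following. It is not the Mava evaluator: `eval_one_episode` here is a hypothetical stand-in that returns a fake episode return from a PRNG key, and `evaluate` is an illustrative name. Since `jax.lax.scan` expects a function of `(carry, x)` returning `(carry, y)`, the sketch wraps the vmapped function in a small step function instead of passing it to scan directly.

```python
import jax
import jax.numpy as jnp


def eval_one_episode(key):
    # Hypothetical stand-in for a full episode rollout:
    # returns a fake scalar "episode return" derived from the key.
    return jnp.sum(jax.random.uniform(key, (5,)))


def evaluate(key, sequential_eval_batches, parallel_eval_batch_size):
    # One key per episode, arranged as (sequential batches, vmapped episodes, key data).
    n_episodes = sequential_eval_batches * parallel_eval_batch_size
    keys = jax.random.split(key, n_episodes)
    keys = keys.reshape(sequential_eval_batches, parallel_eval_batch_size, -1)

    def scan_step(carry, batch_keys):
        # Only parallel_eval_batch_size episodes are vmapped at any one time.
        returns = jax.vmap(eval_one_episode)(batch_keys)
        return carry, returns

    # Scan runs the batches one after another; there is no carried state.
    _, returns = jax.lax.scan(scan_step, None, keys)
    # Flatten (batches, episodes per batch) into one vector of returns.
    return returns.reshape(-1)


episode_returns = evaluate(jax.random.PRNGKey(0), 3, 4)
print(episode_returns.shape)  # (12,)
```

With this layout the batch size is computed once, when the keys are shaped, and the nesting of scan and vmap makes the sequential-then-parallel structure explicit.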

I back this idea also 🔥

I see, so even the eval_init will be created inside the one-episode function. I will give it a try.

Co-authored-by: Sasha <reallysasha@gmail.com>

feat: make parallel envs in evaluator can't exceed num_envs

ada9090

OmaymaMahjoub added enhancement New feature or request priority/high labels Mar 26, 2024

OmaymaMahjoub self-assigned this Mar 26, 2024

OmaymaMahjoub requested review from arnupretorius, DriesSmit, RuanJohn, jcformanek, siddarthsingh1, sash-a, ulricharmel, callumtilbury and WiemKhlifi as code owners March 26, 2024 07:26

pull-request-size bot added the size/L label Mar 26, 2024

WiemKhlifi previously approved these changes Mar 26, 2024

sash-a requested changes Mar 26, 2024

Update mava/evaluator.py

2c1b36b

Co-authored-by: Sasha <reallysasha@gmail.com>

OmaymaMahjoub dismissed WiemKhlifi’s stale review via 2c1b36b March 27, 2024 07:49

fix: var name fixes

d2dce0f
