
DQN: Action mask is not compatible in vectorized environments #186

Open
nargizsentience opened this issue Jan 29, 2024 · 3 comments
Labels: bug (Something isn't working)

@nargizsentience (Contributor)

What version of AgileRL are you using?
v0.1.19

What operating system and processor architecture are you using?
Windows, 64-bit operating system, x64-based processor

What did you do?
I attempted to add vectorization to the self-play script to train a DQN agent in a PettingZoo AEC env. However, DQN's getAction appears to assume a single action mask shared across all environments, which results in a mismatch between the shapes of the mask and the data fed into np.ma.array.

Steps to reproduce the behaviour:

  1. Run the following reproduction script:
import numpy as np
from agilerl.algorithms.dqn import DQN

state_dim = [4]
action_dim = 2
one_hot = True

dqn = DQN(state_dim, action_dim, one_hot)
state = np.array([[1], [1]])
action_mask = np.array([[0, 1], [1, 0]])
epsilon = 1
action = dqn.getAction(state, epsilon, action_mask)
print(action)
  2. See error:
  File "C:\....\AgileRL\.venv\lib\site-packages\numpy\ma\core.py", line 2900, in __new__
    raise MaskError(msg % (nd, nm))
numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.

What did you expect to see?
A list of actions, [1, 0], where each action corresponds to its respective action mask and state.

What did you see instead? Describe the bug.
numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.

Additional context
The current getAction() seems to assume that action_mask is a 1D array whose size corresponds to action_dim. It then samples n actions, where n is the number of observations (state.size()[0]). However, when action_mask is not a 1D array, the mask does not have the same shape as np.arange(0, self.action_dim).
I fixed this issue locally by modifying getAction() (see the sketch below):

  1. Expand the dimensions if action_mask.ndim == 1.
  2. Then randomly sample one action for each action mask.
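For illustration, a minimal standalone sketch of the idea (the helper name sample_masked_actions is made up here, and a mask value of 1 marks a legal action, as in the reproduction script above; this is not the exact getAction() code):

import numpy as np

def sample_masked_actions(action_mask, action_dim, num_samples):
    """Sample one random legal action per action mask (sketch only)."""
    action_mask = np.asarray(action_mask)
    # 1. Expand a single shared mask so every sample gets its own row.
    if action_mask.ndim == 1:
        action_mask = np.tile(action_mask, (num_samples, 1))
    # 2. Randomly sample one legal action per row of the mask.
    all_actions = np.arange(action_dim)
    return np.array(
        [np.random.choice(all_actions[row.astype(bool)]) for row in action_mask]
    )

# With the masks from the reproduction script this prints [1 0].
print(sample_masked_actions(np.array([[0, 1], [1, 0]]), action_dim=2, num_samples=2))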
@nargizsentience added the bug label on Jan 29, 2024
@nicku-a (Contributor) commented Feb 12, 2024

Hey, how did you vectorize the environment? PZ doesn't offer any wrapper for vectorizing envs; we have one for parallel-API PZ envs. This could make a good contribution to the framework!

As for the action mask not working for vectorized envs, that's because it wasn't originally designed to support them, but with more clarity on how your vectorization works I'm sure we can easily implement it.

@nargizsentience (Contributor, Author)

Hi! I referenced your wrapper for the PZ Parallel Env and implemented a wrapper for the AEC API. It might not be fully vectorized, because it waits until the episodes in all environments are terminated or truncated (done) before starting a new set of episodes. The main changes are (see the sketch after this list):

  • I keep track of which environments are done in an array.
  • SubprocVecEnv's step() takes a list of actions and the corresponding environment indices, instead of sending actions to all environments, to handle the case where some episodes are already done.
  • If worker() receives "last", it separates action_mask from the observation and sends obs, action_mask, reward, terminated, truncated, info for all environments back. Since I have a custom training loop, having separate obs and action_mask is not a problem, but they could be put into a dict as well.
  • I update the done array based on which envs are done.
  • I extract the observations and action masks for the environments that are not done and call the agent's getAction to compute actions, so action_mask[i] corresponds to state[i].
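Roughly, my training loop then looks like the sketch below (simplified; the wrapper methods last() and step(actions, env_indices) only mirror the description above and are not part of AgileRL):

import numpy as np

def run_episode_set(vec_env, agent, epsilon, num_envs):
    """Simplified sketch of the custom training loop described above."""
    dones = np.zeros(num_envs, dtype=bool)  # which environments have finished
    obs, action_mask, reward, terminated, truncated, info = vec_env.last()

    while not dones.all():
        active = np.where(~dones)[0]  # environments still running
        # getAction pairs action_mask[i] with obs[i] for every active env
        actions = agent.getAction(obs[active], epsilon, action_mask[active])
        # step() only receives actions for the environments that are not done
        vec_env.step(actions, active)
        obs, action_mask, reward, terminated, truncated, info = vec_env.last()
        dones |= np.logical_or(terminated, truncated)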

@gonultasbu (Contributor)

Hey @nargizsentience, is there a particular reason the environments don't auto-reset themselves? I remember writing the parallelization with auto-resetting in mind, so this is a personal question.

Given your description, I don't think amending getAction to make it vectorization-compatible would be too hard.
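For context, the auto-reset pattern would be roughly the following sketch (worker_last is a hypothetical helper; it relies on PettingZoo AEC envs emptying env.agents once every agent is done):

def worker_last(env):
    """Auto-resetting "last" call inside a hypothetical worker (sketch only)."""
    # Reset immediately once every agent is done, instead of waiting for the
    # other vectorized environments to finish their episodes.
    if not env.agents:
        env.reset()
    obs, reward, terminated, truncated, info = env.last()
    return obs, reward, terminated, truncated, info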
