What version of AgileRL are you using?
v0.1.19
What operating system and processor architecture are you using?
Windows, 64-bit operating system, x64-based processor
What did you do?
I attempted to add vectorization to the self-play script to train a DQN agent in a PettingZoo AEC env. However, DQN's getAction appears to assume a single action mask for all environments, which results in a mismatch between the shapes of the mask and the data fed into np.ma.array:
File "C:\....\AgileRL\.venv\lib\site-packages\numpy\ma\core.py", line 2900, in __new__
raise MaskError(msg % (nd, nm))
numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.
What did you expect to see?
A list of actions, [1, 0], where each action corresponds to its respective action mask and state.
What did you see instead? Describe the bug.
numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.
Additional context
The current getAction() seems to assume that action_mask is a 1D array whose size corresponds to action_dim. It then samples n actions, where n is the number of observations (state.size()[0]). However, when action_mask is not a 1D array, the mask does not have the same shape as np.arange(0, self.action_dim).
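For illustration, here is a minimal standalone reproduction of the numpy error; the sizes mirror the traceback above, but this is not the exact AgileRL call:

```python
import numpy as np

action_dim = 2
# Per-environment masks for two vectorized envs (shape (2, 2), size 4).
action_mask = np.array([[0, 1],
                        [1, 0]])
# Masking the 1D action range with the stacked 2D mask fails, because the
# data (size 2) and the mask (size 4) are no longer compatible:
np.ma.array(np.arange(0, action_dim), mask=action_mask)
# numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.
```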
I fixed this issue locally by modifying getAction():
I expand the mask's dimensions if action_mask.ndim == 1.
Then I randomly sample one action for each action mask, roughly as in the sketch below.
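A minimal sketch of such a fix; the helper name and getAction internals are assumptions, and the mask is assumed to use 1 for legal actions (the PettingZoo convention):

```python
import numpy as np

def sample_masked_actions(action_mask, num_envs, action_dim):
    """Sample one valid action per environment, given per-env action masks."""
    action_mask = np.asarray(action_mask)
    # A single 1D mask is shared across all environments: broadcast it.
    if action_mask.ndim == 1:
        action_mask = np.tile(action_mask, (num_envs, 1))
    actions = []
    for mask in action_mask:
        # np.ma masks out True entries, so invert: assume 1 means "legal".
        legal = np.ma.array(np.arange(0, action_dim),
                            mask=np.logical_not(mask)).compressed()
        actions.append(np.random.choice(legal))
    return actions
```

With two environments and masks [[0, 1], [1, 0]], this returns [1, 0], matching the expected output above.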
Hey, how did you vectorize the environment? PZ doesn't offer any wrapper for vectorizing envs. We have one for parallel-API PZ envs. Could make a good contribution to the framework!
In terms of the action mask not working for vectorized envs, this is because it wasn't originally designed to support them, but with more clarity on how your vectorization works I'm sure we can easily implement it.
Hi! I referenced your wrapper for the PZ Parallel Env and implemented a wrapper for the AEC API. It might not be fully vectorized, because it waits until the episodes in all environments are terminated or truncated (done) before starting a new set of episodes. The main changes are:
I keep track of which environments are done in an array.
SubprocVecEnv's step() takes a list of actions and the corresponding environment indices, instead of sending actions to all environments, to handle the case when some episodes are done.
If worker() receives "last", it separates action_mask from the observation and sends back obs, action_mask, reward, terminated, truncated, and info for all environments. Since I have a custom training loop, having separate obs and action_mask is not a problem, but they could be put into a dict as well.
I update the done-tracking array based on which envs are done.
I extract the observations and action masks for the environments that are not done and call the agent's getAction to compute actions; action_mask[i] corresponds to state[i].
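A rough sketch of the worker/step protocol described above. The class name, command strings, and the dict observation layout ("observation" / "action_mask", as in PettingZoo's classic envs) are assumptions, and AEC agent turn-taking is glossed over:

```python
import multiprocessing as mp
import numpy as np

def worker(remote, env_fn):
    """Run one AEC environment, answering commands from the main process."""
    env = env_fn()
    env.reset()
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            env.step(data)  # data is the action for this env
        elif cmd == "last":
            obs, reward, terminated, truncated, info = env.last()
            # Split the action mask out of the observation dict.
            remote.send((obs["observation"], obs["action_mask"],
                         reward, terminated, truncated, info))
        elif cmd == "reset":
            env.reset()
        elif cmd == "close":
            remote.close()
            break

class AECSubprocVecEnv:
    def __init__(self, env_fns):
        # env_fns must be picklable factories (required on Windows).
        self.remotes, work_remotes = zip(*[mp.Pipe() for _ in env_fns])
        self.processes = [
            mp.Process(target=worker, args=(wr, fn), daemon=True)
            for wr, fn in zip(work_remotes, env_fns)]
        for p in self.processes:
            p.start()
        # Track which environments have finished their episode.
        self.dones = np.zeros(len(env_fns), dtype=bool)

    def step(self, actions, env_indices):
        # Only send actions to the environments that are still running.
        for action, idx in zip(actions, env_indices):
            self.remotes[idx].send(("step", action))

    def last(self):
        for remote in self.remotes:
            remote.send(("last", None))
        obs, masks, rewards, terms, truncs, infos = zip(
            *[remote.recv() for remote in self.remotes])
        self.dones |= np.logical_or(terms, truncs)
        return obs, masks, rewards, terms, truncs, infos
```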
Hey @nargizsentience, is there a particular reason for the environments not autoresetting themselves? I remember writing the parallelization with autoresetting in mind, so this is a personal question.
Based on your description, I don't think amending getAction to make it vectorization-compatible would be too hard.