Neighboring states are being affected by partial observability #6

Closed
marimeireles opened this issue Mar 24, 2024 · 3 comments

@marimeireles
Collaborator

I suspect there's something wrong with the way we're stepping through the different iterations to generate the flow graphs. I believe that the probabilities of observing states in one tensor are affecting the probabilities of seeing other states in other tensors.

Here's an h=(2,2,2) history plot.
Agent 1 has partial observability: its c,c.|c,c.| state is completely obscured:

memo1pd.O[1] = np.array(
        [[0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
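For reference, the same matrix can be built more compactly. This is only a sketch: it assumes rows index states and columns index observations, and the name `O_agent1` is hypothetical (it is not part of the library). The row-sum check confirms every state still emits a valid probability distribution over observations.

```python
import numpy as np

n_states = 16

# Start from full observability: each state emits its own observation.
O_agent1 = np.eye(n_states)

# Obscure the first state (c,c.|c,c.): it emits every observation
# with equal probability 1/16 = 0.0625.
O_agent1[0, :] = 1.0 / n_states

# Each row must be a probability distribution over observations.
assert np.allclose(O_agent1.sum(axis=1), 1.0)
```

The result could then be assigned in the same way as above, e.g. `memo1pd.O[1] = O_agent1`.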

Agent 0 has complete observability.

Here's the plot of the first c,c.|c,c.| state and the next 3 states.

Screenshot 2024-03-24 at 12 03 43 PM

My understanding is that obscuring state c,c.|c,c.| shouldn't influence the neighboring states, but it does. To check, I plotted the same h=(2,2,2) history with both agents fully observing the environment (homogeneous observability). I expected the three states preceding c,c.|c,c.| to be exactly the same in both plots, but they differ:

Screenshot 2024-03-24 at 12 08 24 PM

We observe similar results for heterogeneous agents (as described previously) in an h=(1,2,2) scenario:

Screenshot 2024-03-24 at 12 09 31 PM

And for comparison here's the plot of homogeneous agents for a scenario with h=(1,2,2):

Screenshot 2024-03-24 at 12 10 31 PM

As you can see, the changes are very slight! But they're more pronounced with 4x4 matrices, because the uniform entries are 0.25 instead of 0.0625 and so have a stronger influence on the neighboring graphs. My understanding is that this behaviour shouldn't be happening.
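One possible way to see why the other states are affected at all is to look at what an observation says about the underlying state. The sketch below is plain Bayes' rule, not the library's code. It assumes `O[s, o] = P(o | s)` and a uniform prior over states (a simplification; the real state distribution depends on the dynamics). The helper names `posterior_over_states` and `blurred_identity` are hypothetical.

```python
import numpy as np

def posterior_over_states(O, prior):
    """Return B[o, s] = P(s | o) for an observation matrix O[s, o] = P(o | s)."""
    joint = O * prior[:, None]      # joint[s, o] = P(s) * P(o | s)
    evidence = joint.sum(axis=0)    # evidence[o] = P(o)
    return (joint / evidence).T     # transpose so rows index observations

def blurred_identity(n):
    """Identity observation matrix whose first state emits all observations uniformly."""
    O = np.eye(n)
    O[0, :] = 1.0 / n
    return O

for n in (4, 16):
    prior = np.full(n, 1.0 / n)     # assumed uniform prior over states
    B = posterior_over_states(blurred_identity(n), prior)
    # Probability that observation 1 came from state 0 vs. state 1.
    print(n, B[1, 0], B[1, 1])
```

Under these assumptions, observation 1 no longer identifies state 1 with certainty: the obscured state 0 accounts for 1/5 of the probability in the 4x4 case and 1/17 in the 16x16 case, which is consistent with the 4x4 plots showing the stronger effect.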

@marimeireles
Collaborator Author

Or maybe that is expected behavior after all, and I'm just confused.
Looking at how the history class builds observation tensors we have the following parameters:

Number of agents, which in this case is 2.
Total number of observation-action histories, 4:
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
And total number of state-action histories, also 4:
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]

I think I'm mostly confused about how to correctly represent states like the ones @wbarfuss and I discussed:

p(s=DD, o=D)=1
p(s=CD, o=D)=1
p(s=DC, o=C)=1
p(s=CC, o=C)=1

This gives an observation matrix like the following:

[[[1,0,0,0],
  [1,0,0,0],
  [0,1,0,0],
  [0,1,0,0]]]

The question is: how can one know for sure whether one is observing D or C when the matrix is indexed by observation pairs (CC, CD, DC, DD)?
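A minimal sketch of how a matrix like the one above could be generated from a state-to-observation mapping. It rests on assumptions that this thread does not confirm: rows follow the order in which the probabilities are listed (DD, CD, DC, CC), the first two columns stand for the observations D and C, the remaining two columns are simply unused, and the last mapping is read as s=CC emitting observation C. All names are hypothetical.

```python
import numpy as np

states = ["DD", "CD", "DC", "CC"]      # assumed row order
observations = ["D", "C"]              # assumed meaning of columns 0 and 1
state_to_obs = {"DD": "D", "CD": "D", "DC": "C", "CC": "C"}

n = len(states)
# Shape (1, n, n): one agent, rows are states, columns are observations.
# Columns 2 and 3 stay zero because only two observations are ever emitted.
O = np.zeros((1, n, n))
for s_idx, state in enumerate(states):
    o_idx = observations.index(state_to_obs[state])
    O[0, s_idx, o_idx] = 1.0

print(O)
```

Keeping the `states` and `observations` lists next to the matrix makes the meaning of each row and column explicit, which is one way to avoid guessing whether a column stands for a single action or an action pair.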

@marimeireles
Collaborator Author

Take a simpler example, an environment with heterogeneous observability like the following:

mae1.env.O

array([[[1.  , 0.  , 0.  , 0.  ],
        [0.  , 1.  , 0.  , 0.  ],
        [0.  , 0.  , 1.  , 0.  ],
        [0.  , 0.  , 0.  , 1.  ]],

       [[0.5 , 0.  , 0.5 , 0.  ],
        [0.5 , 0.  , 0.5 , 0.  ],
        [0.5 , 0.  , 0.5 , 0.  ],
        [0.5 , 0.  , 0.5 , 0.  ]]])

The results are pretty obviously influenced by how the states are being observed.

Screenshot 2024-03-24 at 5 45 29 PM
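To illustrate why the second agent's behaviour cannot depend on the state here, the sketch below assumes that an agent's policy is defined per observation and that its effective per-state policy is the observation-weighted average `O @ X_obs`. This is the standard construction for memoryless policies under partial observability, but whether the library computes it exactly this way is an assumption. `X_obs` is a hypothetical random policy over two actions.

```python
import numpy as np

# Observation matrices from the example above: O[agent, state, observation].
O = np.array([[[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]],
              [[0.5, 0.0, 0.5, 0.0],
               [0.5, 0.0, 0.5, 0.0],
               [0.5, 0.0, 0.5, 0.0],
               [0.5, 0.0, 0.5, 0.0]]])

# Hypothetical observation-based policy: X_obs[o, a] = P(a | o), 2 actions.
rng = np.random.default_rng(0)
X_obs = rng.dirichlet(np.ones(2), size=4)

# Effective state-based policies: X_state[s, a] = sum_o O[s, o] * X_obs[o, a].
X_state_agent0 = O[0] @ X_obs   # full observability: identical to X_obs
X_state_agent1 = O[1] @ X_obs   # blind agent: the same row for every state

print(X_state_agent0)
print(X_state_agent1)
```

Under this assumption, agent 1 acts identically in all four states (an even mix of its policies for observations 0 and 2), so any update driven by one state necessarily changes its behaviour in every other state.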

To check whether the completely blind agent influences the other states, compare with the graph of an agent that has the following observability matrix:

       [[0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]]
Screenshot 2024-03-24 at 5 49 44 PM

@marimeireles
Collaborator Author

Closed, as this is what makes RL special.
