Neighboring states are being affected by partial observability #6

marimeireles · 2024-03-24T15:12:21Z

I suspect there's something wrong with the way we're stepping through the different iterations to generate the flow graphs. I believe that the probabilities of observing states in one tensor are affecting the probabilities of seeing other states in other tensors.

Here's an h=(2,2,2) history plot.
Agent 1 has partial observability, with the c,c.|c,c.| state completely obscured:

memo1pd.O[1] = np.array(
        [[0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

Agent 0 has complete observability.
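
For anyone reproducing this, the two observation matrices can be built without typing out all 16 rows. This is a minimal sketch, assuming row 0 is the c,c.|c,c.| state; the names O_full and O_partial are mine and not part of the library:

```python
import numpy as np

n_states = 16  # number of rows in the matrix above

# Agent 0: full observability, each state maps to its own observation.
O_full = np.eye(n_states)

# Agent 1: same as agent 0, except that state 0 (assumed to be c,c.|c,c.|)
# produces every observation with equal probability 1/16 = 0.0625.
O_partial = np.eye(n_states)
O_partial[0, :] = 1.0 / n_states

# Each row is a distribution over observations, so it must sum to 1.
assert np.allclose(O_full.sum(axis=1), 1.0)
assert np.allclose(O_partial.sum(axis=1), 1.0)
```

O_partial should match the array assigned to memo1pd.O[1] above.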

Here's the plot of the first c,c.|c,c.| state and the next 3 states.

My understanding is that obscuring state c,c.|c,c.| shouldn't influence the neighboring states, but it does. Compare with a plot of history h=(2,2,2) where both agents are homogeneous and fully observe the environment. I would expect the three states preceding c,c.|c,c.| to be exactly the same in both plots, but they are different:

We observe similar results for heterogeneous agents (as described previously) in a h=(1,2,2) scenario:

And for comparison here's the plot of homogeneous agents for a scenario with h=(1,2,2):

As you can see, the changes are very slight! But they're more pronounced with 4x4 matrices, because instead of 0.0625 we have 0.25, which has a stronger influence on the neighboring graphs. My understanding is that this behaviour shouldn't be happening.
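
One possible way to see where the coupling could come from is plain Bayes' rule. This is a sketch under two assumptions that I have not checked against the library's code: that states are effectively inferred from observations, and that the prior over states is uniform. The helper names are hypothetical. Because the obscured state spreads its probability over every observation, each other observation could also have come from the obscured state:

```python
import numpy as np

def posterior_over_states(O, obs, prior=None):
    """Bayes' rule: P(s | o) is proportional to P(o | s) * P(s)."""
    n_states = O.shape[0]
    if prior is None:
        prior = np.full(n_states, 1.0 / n_states)  # assumed uniform prior
    joint = O[:, obs] * prior
    return joint / joint.sum()

def obscured_first_state(n):
    """Identity observation matrix whose row 0 is uniform (hypothetical helper)."""
    O = np.eye(n)
    O[0, :] = 1.0 / n
    return O

# Belief over states 0 and 1 after seeing observation 1.
post16 = posterior_over_states(obscured_first_state(16), obs=1)
post4 = posterior_over_states(obscured_first_state(4), obs=1)
print(post16[:2])
print(post4[:2])
```

Under these assumptions, observation 1 leaves a 1/17 chance of being in the obscured state in the 16-state case and a 1/5 chance in the 4-state case, which would be consistent with the effect being stronger for 4x4 matrices.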

marimeireles · 2024-03-24T20:42:48Z

Or maybe that is expected behavior after all, and I'm just confused.
Looking at how the history class builds observation tensors, we have the following parameters:

Number of agents, which in this case is 2.
Total number of observation-action histories, 4:
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
And total number of state-action histories, also 4:
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]

I think I'm mostly confused about how to correctly represent states like the ones @wbarfuss and I discussed.

p(s=DD, o=D)=1
p(s=CD, o=D)=1
p(s=DC, o=C)=1
p(s=CC, o=C)=1

This leads to an observation matrix like the following:

[[[1,0,0,0],
  [1,0,0,0],
  [0,1,0,0],
  [0,1,0,0]]]

The question is: how can one know for sure whether one is observing D or C when the matrix represents observation pairs (CC, CD, DC, DD)?
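
One way to read the matrix above is that rows index states and columns index observation labels, with only the first two columns in use. The sketch below rebuilds it from the mapping under that reading. The row order DD, CD, DC, CC, the column labels D=0 and C=1, and treating the last mapping as o=C (which is what the matrix rows imply) are all assumptions on my part:

```python
import numpy as np

states = ["DD", "CD", "DC", "CC"]   # assumed row order
observations = ["D", "C"]           # assumed labels for columns 0 and 1
obs_of_state = {"DD": "D", "CD": "D", "DC": "C", "CC": "C"}

n_obs_slots = 4  # the matrix above has 4 columns; only the first 2 are used

O = np.zeros((len(states), n_obs_slots))
for row, s in enumerate(states):
    col = observations.index(obs_of_state[s])
    O[row, col] = 1.0  # state s deterministically produces this observation

O = O[np.newaxis, ...]  # leading axis of size 1, matching the brackets above
print(O)
```

Under this reading the columns are not state pairs at all: column 0 means "observed D" and column 1 means "observed C", and columns 2 and 3 are never produced.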

marimeireles · 2024-03-24T20:50:02Z

When I look at simpler examples, such as the following heterogeneous observability environment:

mae1.env.O

array([[[1. , 0. , 0. , 0. ],
        [0. , 1. , 0. , 0. ],
        [0. , 0. , 1. , 0. ],
        [0. , 0. , 0. , 1. ]],

       [[0.5, 0. , 0.5, 0. ],
        [0.5, 0. , 0.5, 0. ],
        [0.5, 0. , 0.5, 0. ],
        [0.5, 0. , 0.5, 0. ]]])

The results are pretty obviously influenced by how the states are being observed.

If the completely blind actor were not influencing the other states, the graph of an agent that has the following observability matrix:

       [[0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]]
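
For what it's worth, agent 1's matrix in mae1.env.O and the uniform matrix above share one property: every row is identical, so the observation distribution does not depend on the state. A rough sketch of that check follows. The helper names and the prior are made up for illustration, and the Bayes update is my assumption about how to interpret the matrices, not the library's code:

```python
import numpy as np

O_half = np.tile([0.5, 0.0, 0.5, 0.0], (4, 1))  # agent 1 in mae1.env.O
O_uniform = np.full((4, 4), 0.25)               # the uniform matrix above

def is_uninformative(O):
    """True if every state produces the same observation distribution."""
    return bool(np.allclose(O, O[0]))

def posterior(O, obs, prior):
    """Bayes' rule: P(s | o) is proportional to P(o | s) * P(s)."""
    joint = O[:, obs] * prior
    return joint / joint.sum()

prior = np.array([0.1, 0.2, 0.3, 0.4])  # arbitrary illustrative prior

print(is_uninformative(O_half), is_uninformative(O_uniform))
print(posterior(O_half, 0, prior))     # same as the prior
print(posterior(O_uniform, 2, prior))  # same as the prior
```

In both cases an observation leaves the belief over states unchanged, so in the Bayesian sense both agents are equally blind even though the matrices look different.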

marimeireles · 2024-04-22T11:24:55Z

Closed, as this is what makes RL special.

marimeireles closed this as completed Apr 22, 2024
