Inconsistency between code and pseudocode in agent input #115

Ynjxsjmh · 2021-08-02T01:31:24Z

I am reading the pseudocode in the paper "Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning".

The inputs of the agent network are τᵃₜ and uᵃₜ. According to the pseudocode, τ is a list of (oₜ, uₜ₋₁) pairs. The paper introduces τᵃ and uᵃ as follows:

At each time step, each agent a ∈ A ≡ {1,...,n} chooses an action uᵃ ∈ U.
Each agent has an action-observation history τᵃ ∈ T ≡ (Z×U)*.

However, in the pymarl code, the inputs of the agent network seem to be o and u rather than τ and u:

https://github.com/oxwhirl/pymarl/blob/73960e11c5a72e7f9c492d36dbfde02016fde05a/src/controllers/basic_controller.py#L77-92

    def _build_inputs(self, batch, t):
        # Assumes homogenous agents with flat observations.
        # Other MACs might want to e.g. delegate building inputs to each agent
        bs = batch.batch_size
        inputs = []
        inputs.append(batch["obs"][:, t])  # b1av
        if self.args.obs_last_action:
            if t == 0:
                inputs.append(th.zeros_like(batch["actions_onehot"][:, t]))
            else:
                inputs.append(batch["actions_onehot"][:, t-1])
        if self.args.obs_agent_id:
            inputs.append(th.eye(self.n_agents, device=batch.device).unsqueeze(0).expand(bs, -1, -1))

        inputs = th.cat([x.reshape(bs*self.n_agents, -1) for x in inputs], dim=1)
        return inputs

In your implementation, inputs is constructed from batch["obs"][:, t] and batch["actions_onehot"][:, t-1] rather than from the action-observation history and the action.

hijkzzz · 2021-08-06T22:40:00Z

The action-observation history is encoded by the RNN.
We also recommend our fine-tuned QMIX: https://github.com/hijkzzz/pymarl2.

Ynjxsjmh · 2021-08-07T01:51:53Z

As far as I know, _build_inputs() is only used in the forward() method, which is where the DRQN model is called.

pymarl/src/controllers/basic_controller.py

Lines 26 to 29 in 73960e1

    def forward(self, ep_batch, t, test_mode=False):
        agent_inputs = self._build_inputs(ep_batch, t)
        avail_actions = ep_batch["avail_actions"][:, t]
        agent_outs, self.hidden_states = self.agent(agent_inputs, self.hidden_states)

You say "the action-observation history is encoded by RNN", but I didn't see anything related to that in the agent.forward() method.

pymarl/src/modules/agents/rnn_agent.py

Lines 18 to 23 in 73960e1

    def forward(self, inputs, hidden_state):
        x = F.relu(self.fc1(inputs))
        h_in = hidden_state.reshape(-1, self.args.rnn_hidden_dim)
        h = self.rnn(x, h_in)
        q = self.fc2(h)
        return q, h
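
For readers following this thread: one way to read the two snippets above is that the history is not passed in explicitly. The controller's forward() feeds the returned hidden state back in on the next step, so h can summarise earlier (oₜ, uₜ₋₁) pairs. Below is a minimal sketch of that idea in plain Python. The scalar `toy_cell`, its weights, and `run_episode` are hypothetical stand-ins for illustration, not pymarl's recurrent cell or its API.

```python
import math


def toy_cell(x, h, w_x=0.8, w_h=0.5):
    """Hypothetical scalar recurrent cell (assumed weights), standing in for self.rnn."""
    return math.tanh(w_x * x + w_h * h)


def run_episode(observations):
    """Feed one observation per step and carry the hidden state forward."""
    h = 0.0  # hidden state is reset once, at the start of the episode
    for o_t in observations:
        # Only the current observation is passed in; earlier steps reach
        # this call solely through h, mirroring
        # agent_outs, self.hidden_states = self.agent(agent_inputs, self.hidden_states)
        h = toy_cell(o_t, h)
    return h


# Two episodes that end with the same observation but differ at the first step.
h_a = run_episode([1.0, 0.0, 0.5])
h_b = run_episode([-1.0, 0.0, 0.5])

# Same final observation, but with the hidden state reset just before it.
h_reset = toy_cell(0.5, 0.0)

print(h_a != h_b)  # the final hidden states differ because the histories differ
```

In this sketch the last call sees the same observation in both episodes, yet the resulting hidden states differ, because the earlier observations are folded into h. If h were reset before every step, the result would depend on the current observation alone, which is the behaviour the question assumes.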
