About the DDPG algorithm #208

zhenbin-li · 2022-10-26T13:36:39Z

def choose_action(self, s):
    s = s[np.newaxis, :]    # add a batch axis: (state_dim,) -> (1, state_dim)
    return self.sess.run(self.a, feed_dict={S: s})[0]    # take row 0 of the batched output: the single action

Hi, I'd like to ask what the trailing [0] on this function's return value actually returns.

yin1999 · 2022-11-20T09:06:55Z

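A neural network reads its input one batch at a time, but here the input is a single state sample. Before the state s is fed to the model, an extra dimension is added (s = s[np.newaxis, :]), which turns the input into a batch of size 1. Likewise, the prediction the model returns is also batched. To get the prediction for the single input state, we use [0] to take the first element of the returned array, which is the prediction for that one state.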
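To make the shapes concrete, here is a minimal NumPy-only sketch. It does not use the repository's TensorFlow session. `fake_actor`, its weights, and the state and action sizes (3 and 2) are hypothetical stand-ins chosen for illustration; the only point is that a batched model returns one row per input state.

```python
import numpy as np

def fake_actor(batch_states):
    """Hypothetical stand-in for the actor network.

    Maps a batch of states with shape (batch, state_dim) to a batch of
    actions with shape (batch, action_dim). The weights are arbitrary.
    """
    weights = np.ones((batch_states.shape[1], 2))  # assumed action_dim = 2
    return np.tanh(batch_states @ weights)

s = np.array([0.1, -0.2, 0.3])       # one state, shape (3,)
batched = s[np.newaxis, :]           # batch of size 1, shape (1, 3)
batch_actions = fake_actor(batched)  # batched output, shape (1, 2)
action = batch_actions[0]            # first (and only) row, shape (2,)

print(s.shape, batched.shape, batch_actions.shape, action.shape)
```

Without the [0], the caller would receive an array of shape (1, 2) rather than a single action of shape (2,), which an environment expecting a flat action vector would typically not accept.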
