import tensorflow as tf
print(tf.__version__)

# import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# tf.test.is_gpu_available( cuda_only=False, min_cuda_compute_capability=None )

2.0.0

2.3 自动求梯度

在深度学习中，我们经常需要对函数求梯度（gradient）。本节将介绍如何使用tensorflow2.0提供的GradientTape来自动求梯度。

GradientTape 可以理解为“梯度流记录磁带”：

在记录阶段：记录被 GradientTape 包裹的运算过程中，依赖于 source node （被 watch “监视”的变量）的关系图。

在求导阶段：通过搜索 source node 到 target node 的路径，进而计算出偏微分。

source node 在记录运算过程之前进行指定：

自动“监控”所有可训练变量：GradientTape 默认（watch_accessed_variables=True）将所有可训练变量（created by tf.Variable, where trainable=True）视为需要“监控”的 source node 。

对于不可训练的变量（比如tf.constant）可以使用tape.watch()对其进行“监控”。

此外，还可以设定watch_accessed_variables=False，然后使用tf.watch()精确控制需要“监控”哪些变量。

2.3.1 简单示例

我们先看一个简单例子：对函数 $y = 2\boldsymbol{x}^{\top}\boldsymbol{x}$ 求关于列向量 $\boldsymbol{x}$ 的梯度。我们先创建变量x，并赋初值。

x = tf.reshape(tf.Variable(range(4), dtype=tf.float32),(4,1))
x

<tf.Tensor: id=10, shape=(4, 1), dtype=float32, numpy=
array([[0.],
       [1.],
       [2.],
       [3.]], dtype=float32)>

函数 $y = 2\boldsymbol{x}^{\top}\boldsymbol{x}$ 关于$\boldsymbol{x}$ 的梯度应为$4\boldsymbol{x}$。现在我们来验证一下求出来的梯度是正确的。

with tf.GradientTape() as t:
    t.watch(x) #对于Variable类型的变量，一般不用加此监控
    y = 2 * tf.matmul(tf.transpose(x), x)
    
dy_dx = t.gradient(y, x)
dy_dx

<tf.Tensor: id=30, shape=(4, 1), dtype=float32, numpy=
array([[ 0.],
       [ 4.],
       [ 8.],
       [12.]], dtype=float32)>

2.3.2 训练模式和预测模式

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
    dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
    dy_dx = g.gradient(y, x)  # 6.0
dz_dx,dy_dx

WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.
WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.

(<tf.Tensor: id=41, shape=(4, 1), dtype=float32, numpy=
 array([[  0.],
        [  4.],
        [ 32.],
        [108.]], dtype=float32)>,
 <tf.Tensor: id=47, shape=(4, 1), dtype=float32, numpy=
 array([[0.],
        [2.],
        [4.],
        [6.]], dtype=float32)>)

2.3.3 对Python控制流求梯度

即使函数的计算图包含了Python的控制流（如条件和循环控制），我们也有可能对变量求梯度。

考虑下面程序，其中包含Python的条件和循环控制。需要强调的是，这里循环（while循环）迭代的次数和条件判断（if语句）的执行都取决于输入a的值。

def f(a):
    b = a * 2
    while tf.norm(b) < 1000:
        b = b * 2
    if tf.reduce_sum(b) > 0:
        c = b
    else:
        c = 100 * b
    return c

我们来分析一下上面定义的f函数。事实上，给定任意输入a，其输出必然是 f(a) = x * a的形式，其中标量系数x的值取决于输入a。由于c = f(a)有关a的梯度为x，且值为c / a，我们可以像下面这样验证对本例中控制流求梯度的结果的正确性。

a = tf.random.normal((1,1),dtype=tf.float32)
with tf.GradientTape() as t:
    t.watch(a)
    c = f(a)
t.gradient(c,a) == c/a

<tf.Tensor: id=201, shape=(1, 1), dtype=bool, numpy=array([[ True]])>

注：本节除了代码之外与原书基本相同，原书传送门参考资料

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2.3_autograd.md

2.3_autograd.md

2.3 自动求梯度

2.3.1 简单示例

2.3.2 训练模式和预测模式

2.3.3 对Python控制流求梯度

Files

2.3_autograd.md

Latest commit

History

2.3_autograd.md

File metadata and controls

2.3 自动求梯度

2.3.1 简单示例

2.3.2 训练模式和预测模式

2.3.3 对Python控制流求梯度