Automatic Differentiation of Models

CPU GPU Linux Beginner

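Neural network training commonly relies on the backpropagation algorithm, which adjusts the parameters (model weights) according to the gradient of the loss function with respect to those parameters.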

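LuoJiaNET computes gradients with luojianet_ms.ops.GradOperation(get_all=False, get_by_list=False, sens_param=False). When get_all is False, only the derivative with respect to the first input is computed; when it is True, derivatives with respect to all inputs are computed. When get_by_list is False, no derivatives with respect to the weights are computed; when it is True, they are. sens_param scales the output value of the network, which changes the final gradient. The sections below examine these options in detail using the derivative of the MatMul operator.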
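To make the effect of sens_param concrete, the NumPy sketch below works out the gradient of a MatMul network by hand instead of calling LuoJiaNET. It assumes that the sensitivity acts as the upstream gradient of the output (all ones when nothing is supplied); `grad_wrt_x` and `sens` are illustrative names, not part of the library.

```python
import numpy as np

def grad_wrt_x(y, z, sens):
    """Gradient of sum(sens * ((x * z) @ y)) with respect to x.

    sens has the shape of the network output and plays the role of the
    upstream gradient (assumption: this mirrors what sens_param=True feeds in).
    """
    return (sens @ y.T) * z

y = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]])
z = 1.0

default_sens = np.ones((2, 3))      # equivalent to leaving the output unscaled
scaled_sens = 0.1 * default_sens    # scale every output element by 0.1

g_default = grad_wrt_x(y, z, default_sens)
g_scaled = grad_wrt_x(y, z, scaled_sens)
print(g_default)
print(g_scaled)
```

Under this assumption, scaling the sensitivity by a constant scales the whole gradient by the same constant.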

First, import the modules and interfaces needed in this document:

[1]:

import numpy as np
import luojianet_ms.nn as nn
import luojianet_ms.ops as ops
from luojianet_ms import Tensor
from luojianet_ms import ParameterTuple, Parameter
from luojianet_ms import dtype as mstype

First-Order Derivative with Respect to the Inputs

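To differentiate with respect to the inputs, first define the network to be differentiated. Take a network built from the MatMul operator, \(f(x,y)=z*x*y\), as an example.

The network is defined as follows: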

[2]:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')  # z is a weight (Parameter) of the network

    def call(self, x, y):
        x = x * self.z
        out = self.matmul(x, y)
        return out

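Next, define the network that differentiates with respect to the input. The __init__ function defines the network to be differentiated, self.net, and the ops.GradOperation operation; the call function differentiates self.net.

The gradient network is defined as follows: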

[3]:

class GradNetWrtX(nn.Module):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()

    def call(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

Define the inputs and print the output:

[4]:

x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)
output = GradNetWrtX(Net())(x, y)
print(output)

[[4.5099998 2.7       3.6000001]
 [4.5099998 2.7       3.6000001]]
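The printed values can be cross-checked without the framework. Assuming the default sensitivity is all ones, the gradient equals that of the sum of the output entries, so every row of it is z times the row sums of y. The NumPy sketch below compares that closed form with central finite differences; `f` and `numeric_grad` are hypothetical helpers written for this check.

```python
import numpy as np

def f(x, y, z=1.0):
    """Scalar objective: sum of all entries of (x * z) @ y."""
    return np.sum((x * z) @ y)

def numeric_grad(func, x, eps=1e-6):
    """Central finite-difference gradient of func with respect to x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

x = np.array([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]])
y = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]])

analytic = np.tile(y.sum(axis=1), (x.shape[0], 1))   # z * row sums of y, with z = 1
numeric = numeric_grad(lambda v: f(v, y), x)
print(analytic)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The sketch runs in float64, so its digits differ slightly from the float32 output printed by the framework.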

To differentiate with respect to both inputs x and y, simply set self.grad_op = ops.GradOperation(get_all=True) in GradNetWrtX.
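When all inputs are differentiated, the result is expected to hold one gradient per input. The NumPy sketch below derives both by hand for the same x and y, with z fixed at 1; the tuple layout and the name `grads_wrt_inputs` are assumptions for illustration, not LuoJiaNET output.

```python
import numpy as np

def grads_wrt_inputs(x, y, z=1.0):
    """Hand-derived gradients of sum((x * z) @ y) with respect to x and y."""
    ones = np.ones((x.shape[0], y.shape[1]))   # default sensitivity of the output
    grad_x = (ones @ y.T) * z                  # same shape as x
    grad_y = (x * z).T @ ones                  # same shape as y
    return grad_x, grad_y

x = np.array([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]])
y = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]])

grad_x, grad_y = grads_wrt_inputs(x, y)
print(grad_x)
print(grad_y)
```

Each row of grad_x holds the row sums of y, and each row of grad_y repeats one column sum of x.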

First-Order Derivative with Respect to the Weights

To differentiate with respect to the weights, set get_by_list to True in ops.GradOperation.

GradNetWrtX then becomes:

[5]:

class GradNetWrtX(nn.Module):
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        # Collect the trainable weights of the wrapped network
        self.params = ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)

    def call(self, x, y):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x, y)

Run it and print the output:

[6]:

output = GradNetWrtX(Net())(x, y)
print(output)

(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)
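The weight gradient can also be derived by hand. Since the network computes z * (x @ y), and assuming the default all-ones sensitivity, the derivative with respect to z is the sum of all entries of x @ y. The NumPy sketch below evaluates that expression for the inputs used above; `grad_z` is an illustrative name.

```python
import numpy as np

x = np.array([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]])
y = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]])

# d/dz of sum(z * (x @ y)) is sum(x @ y); wrap it to match the shape [1] of z
grad_z = np.array([np.sum(x @ y)])
print(grad_z)
```

The value agrees with the 2.15359993e+01 reported above, up to float32 rounding.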

To exclude certain weights from differentiation, set requires_grad to False on the corresponding Parameter when defining the network:

self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False)

Stopping Gradient Computation

stop_gradient can be used to block the effect of operators in the network on the gradient, for example:

[ ]:

import numpy as np
import luojianet_ms.nn as nn
import luojianet_ms.ops as ops
from luojianet_ms import Tensor
from luojianet_ms import ParameterTuple, Parameter
from luojianet_ms import dtype as mstype
from luojianet_ms.ops import stop_gradient

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.matmul = ops.MatMul()

    def call(self, x, y):
        out1 = self.matmul(x, y)
        out2 = self.matmul(x, y)
        out2 = stop_gradient(out2)
        out = out1 + out2
        return out

class GradMyNetWrtX(nn.Module):
    def __init__(self, net):
        super(GradMyNetWrtX, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()

    def call(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)
output = GradMyNetWrtX(MyNet())(x, y)
print(output)

[[4.5 2.7 3.6]
 [4.5 2.7 3.6]]

Here stop_gradient is applied to out2, so out2 contributes nothing to the gradient. If the line out2 = stop_gradient(out2) is removed, the output becomes:

[ ]:

output = GradMyNetWrtX(MyNet())(x, y)
print(output)

[[9.0 5.4 7.2]
 [9.0 5.4 7.2]]

Without stop_gradient on out2, out2 and out1 contribute equally to the gradient, so every value in the result is twice what it was before.
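The doubling can be reproduced by hand. The NumPy sketch below treats a stopped branch as contributing a zero gradient and an unstopped branch as contributing the ordinary MatMul gradient; `grad_sum_matmul` is a hypothetical helper, and the sketch models the effect of stop_gradient rather than calling it.

```python
import numpy as np

def grad_sum_matmul(x, y):
    """Gradient of sum(x @ y) with respect to x: each row holds the row sums of y."""
    ones = np.ones((x.shape[0], y.shape[1]))
    return ones @ y.T

x = np.array([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]])
y = np.array([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]])

branch = grad_sum_matmul(x, y)

# out = out1 + stop_gradient(out2): the stopped branch adds nothing
grad_with_stop = branch + np.zeros_like(x)
# out = out1 + out2: both branches add the same gradient
grad_without_stop = branch + branch

print(grad_with_stop)
print(grad_without_stop)
```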