Port sensitivity definitions to ChainRules #177

ararslan · 2019-06-18T16:15:52Z

In order to make Nabla use ChainRules for sensitivities with full feature parity with the current implementation, we'll need to port the sensitivity definitions here (i.e. the ∇ methods) and turn them into rrule methods in ChainRules.

How to port

Porting them over is pretty straightforward. Each of Nabla's ∇ methods takes an Arg{i} argument that indicates which argument of the function the method differentiates with respect to. For example,

f(a::Int, b::Int) = a + 2b + 1

# Derivative for the first argument, `a`
∇(::typeof(f), ::Type{Arg{1}}, p, y, ȳ, a::Int, b::Int) = ȳ

# Derivative for the second, `b`
∇(::typeof(f), ::Type{Arg{2}}, p, y, ȳ, a::Int, b::Int) = 2ȳ

ChainRules rrule methods include both derivatives in a single method. So the above translates to

function rrule(::typeof(f), a::Int, b::Int)
    y = f(a, b)
    ∂a = Rule(ȳ -> ȳ)
    ∂b = Rule(ȳ -> 2ȳ)
    return y, (∂a, ∂b)
end

There are cases where a ∇ method is purposefully not defined for a given Arg{i}, which denotes that there is no derivative with respect to that argument. In ChainRules, we express that by returning a DNERule() in place of the Rule. So if f in the above example were differentiable only with respect to b, the rrule would instead look like

function rrule(::typeof(f), a::Int, b::Int)
    y = f(a, b)
    ∂b = Rule(ȳ -> 2ȳ)
    return y, (DNERule(), ∂b)
end

Also note that the derivatives for the various arguments can share intermediate computation. That can go into the body of the rrule method itself, with the defined variables captured in the closures in the Rules.
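
A minimal sketch of that sharing, assuming a hypothetical function g(a, b) = a * exp(b) that is not part of Nabla or ChainRules. The Rule struct here is only a stand-in so the snippet runs without ChainRules loaded; it is not the library's definition.

```julia
# Stand-in for the ChainRules `Rule` wrapper (hypothetical, for illustration only).
struct Rule{F,U}
    f::F   # computes the sensitivity from ȳ
    u::U   # optional custom update function; unused in this sketch
end
Rule(f) = Rule(f, nothing)
(r::Rule)(args...) = r.f(args...)

# Hypothetical example function.
g(a::Real, b::Real) = a * exp(b)

function rrule(::typeof(g), a::Real, b::Real)
    eb = exp(b)               # shared intermediate, computed once
    y = a * eb
    ∂a = Rule(ȳ -> ȳ * eb)    # ∂g/∂a = exp(b); the closure captures eb
    ∂b = Rule(ȳ -> ȳ * y)     # ∂g/∂b = a * exp(b) = y; the closure captures y
    return y, (∂a, ∂b)
end

y, (∂a, ∂b) = rrule(g, 2.0, 0.0)
```

Both closures reuse eb and y, so exp(b) is evaluated once in the forward pass rather than once per argument.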

There are some cases where Nabla defines custom methods for updating the tape with a given sensitivity. Those are expressed as methods of ∇ with the tape value x̄ as the first argument. ChainRules does this differently: if you have a special way in which you'd like to accumulate a sensitivity to a given value, you provide a second argument to Rule that's another function that takes arguments (x̄, ȳ). This is used by the ChainRules.accumulate!(value, rule, args...) method. Just like Nabla, if no such special method exists for updating, a generic fallback is used.
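
A rough sketch of how the two-argument form could behave, under the assumption that the second function mutates x̄ in place. Rule and accumulate! are stand-ins written for this example so that it runs on its own; the actual ChainRules definitions may differ.

```julia
# Stand-ins for illustration only; not the ChainRules definitions.
struct Rule{F,U}
    f::F   # computes the sensitivity from ȳ
    u::U   # optional update taking (x̄, ȳ); `nothing` means "use the fallback"
end
Rule(f) = Rule(f, nothing)
(r::Rule)(args...) = r.f(args...)

# Generic fallback: evaluate the rule, then add the result into x̄.
accumulate!(x̄, r::Rule{<:Any,Nothing}, args...) = (x̄ .+= r(args...); x̄)
# Custom path: defer to the rule's own update function.
accumulate!(x̄, r::Rule, args...) = r.u(x̄, args...)

# Hypothetical rule for h(x) = 2 .* x. The fused broadcast in the update
# avoids allocating a temporary array for 2 .* ȳ.
∂x_custom = Rule(ȳ -> 2 .* ȳ, (x̄, ȳ) -> (x̄ .+= 2 .* ȳ; x̄))
∂x_plain = Rule(ȳ -> 2 .* ȳ)

x̄ = zeros(3)
accumulate!(x̄, ∂x_custom, ones(3))   # uses the custom update
accumulate!(x̄, ∂x_plain, ones(3))    # uses the generic fallback
```

Both calls add the same sensitivity to x̄; only the route taken differs.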

Progress

Below is a list of all of the basic ∇ methods. More items are finished than are currently checked as of this writing; as you find that ChainRules does indeed include a corresponding rrule method and the sensitivity definition it uses looks correct, please check these off.

Note that this list does not include methods which update the tape directly!

oxinabox mentioned this issue Aug 4, 2019

Writie initial Getting Started documentation JuliaDiff/ChainRules.jl#72

Closed
