Remove redundant sum() rules #1453

ToucheSir · 2023-09-01T19:48:28Z

The pullback is non-differentiable, which messes with nested AD (#1450). It's also not clear to me why this rule still exists when ChainRules has a seemingly GPU-compatible one. Let's see what CI says.

PR Checklist

~~Tests are added~~
~~Documentation, if applicable~~

mcabbott · 2023-09-02T19:39:46Z

I thought this existed in order to opt out of the Zygote rule for sum, which makes a FillArray.

```julia
julia> gradient(sum, [2.0, 3.0])
(Fill(1.0, 2),)
```

We could delete that too: it saves one copy sometimes, but that rarely matters in real code, and it causes problems.

ToucheSir · 2023-09-05T03:32:25Z

Deleting that rule fixes all but one test suite. The remaining failure is in Zygote.jl/test/lib/array.jl, line 53 in 6129613:

```julia
@test g isa Dict{Int, Int}
```

Not sure how best to fix it. Perhaps we could generalize Zygote.jl/src/lib/array.jl, lines 340 to 342 in 6129613:

```julia
@adjoint function sum(xs::AbstractArray{Bool}; dims = :)
  sum(xs, dims = dims), Δ -> (nothing,)
end
```

to work on all Integers and convert it to an rrule(::ZygoteRuleConfig, ...) for future-proofing at the same time?
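The proposed generalization can be sketched in plain Julia, without Zygote or ChainRules, to show the dispatch it implies. `int_sum_pullback` is a hypothetical name used only for illustration; the real change would be an `rrule(::ZygoteRuleConfig, ...)` method, which is not reproduced here. Because `Bool <: Integer` in Julia, a single method on `AbstractArray{<:Integer}` would also cover the existing Bool case.

```julia
# Hypothetical sketch of the proposed dispatch, not Zygote's actual adjoint.
# Bool <: Integer, so this method covers the old Bool-only rule as well as Int, UInt8, etc.
int_sum_pullback(xs::AbstractArray{<:Integer}, Δ) = (nothing,)

# Any other array: ∂sum(xs)/∂xs[i] == 1, so every entry receives the cotangent Δ.
int_sum_pullback(xs::AbstractArray, Δ) = (fill(Δ, size(xs)),)

int_sum_pullback([true, false, true], 1.0)  # (nothing,)
int_sum_pullback([2.0, 3.0], 1.0)           # ([1.0, 1.0],)
```

Treating integer arrays as non-differentiable this way would change the current behaviour for non-Bool integer inputs, which Zygote does differentiate today.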

mcabbott · 2023-09-05T17:15:39Z

We could certainly delete the rule for bool arrays, as there's one here:

https://github.com/JuliaDiff/ChainRules.jl/blob/ba52ec89ddd97a07e79cc35a9fa39019915d203b/src/rulesets/Base/nondiff.jl#L80

IDK what the issue with that Dict test is.

(Considering integers to be differentiable was a mistake, IMO, but fixing that would be a breaking change, here or in CR.)

ToucheSir · 2023-09-05T22:13:30Z

> IDK what the issue with that Dict test is.

The old rule was arguably wrong, because it was passing through the gradient for the summed value without doing any form of projection. If this were a scalar function, asking to differentiate wrt an integer argument would return a float gradient. So in my mind the test is actually capturing incorrect and inconsistent behaviour of the current rule. If we agree on that, I'll just tweak the test and we'll be back on green CI (minus known AbstractFFT failures).
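To illustrate the projection argument, here is a plain-Julia sketch of a sum pullback that converts the cotangent to the input's floating-point element type instead of passing it through unchanged. `projected_sum_pullback` is hypothetical and is not ChainRules' `ProjectTo`; it only handles `dims = :` (a scalar Δ) and does not special-case Bool.

```julia
# Hypothetical sketch: project the cotangent onto float(eltype(xs)) rather than
# passing Δ straight through, so integer inputs get float gradients.
function projected_sum_pullback(xs::AbstractArray{<:Real}, Δ::Real)
    T = float(eltype(xs))   # Int -> Float64, Float32 -> Float32
    return (fill(convert(T, Δ), size(xs)),)
end

projected_sum_pullback([1, 2, 3], 1)      # ([1.0, 1.0, 1.0],)
projected_sum_pullback(Float32[1, 2], 1)  # (Float32[1.0, 1.0],)
```

Under this behaviour an integer-valued container would receive float gradients, which is the kind of change the Dict test would need to accept.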

mcabbott · 2023-09-05T23:00:07Z

Sorry, I didn't look closely, but if the change is just that you now get a Dict of Floats rather than Ints, then that seems totally fine; we just adjust the test.

ToucheSir · 2023-09-08T03:38:32Z

The one remaining failure:

```
sum, prod, cumsum: Test Failed at /var/lib/buildkite-agent/builds/gpuci-1/julialang/zygote-dot-jl/test/gradcheck.jl:117
  Expression: gradient(sum, [true, false, true]) == (nothing,)
 Evaluated: nothing == (nothing,)
```

Which comes from the isnothing ternary in Zygote.jl/src/compiler/interface.jl, line 98 in e0d3d8b:

```julia
isnothing(grad) ? nothing : map(_project, args, grad)
```

@mcabbott do you recall why we're collapsing to nothing here? I can't recall how we're supposed to handle nothing vs (nothing,) vs (nothing, ..., nothing) when returned from the pullback.

mcabbott · 2023-09-08T03:44:28Z

My memory is that Zygote is eager to collapse any tuple of nothings to just nothing, but doesn't always manage to do so. I think at least withgradient and perhaps gradient try to restore them & always make a tuple. But I may have forgotten things.

ToucheSir · 2023-09-08T03:54:11Z

It looks like gradient is not trying to make a tuple when it gets a single nothing. Should we make it do so? A version of this problem (more aggressive collapsing of zeros after moving to CR rules) is also causing the last two (non-unbreaking) test failures in #1328, ref. https://github.com/FluxML/Zygote.jl/actions/runs/6117262926/job/16603631586?pr=1328#step:6:747.
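One possible behaviour, sketched in plain Julia: when the pullback returns a bare `nothing`, rebuild one `nothing` per argument, so that `gradient(sum, [true, false, true])` would yield `(nothing,)`. `uncollapse` is a hypothetical helper for illustration only, not Zygote's implementation; the non-`nothing` branch is a plain pass-through here, whereas Zygote maps `_project` over the arguments.

```julia
# Hypothetical helper, not Zygote's code: if every gradient collapsed to a bare
# `nothing`, return a tuple with one `nothing` per argument instead.
uncollapse(grad, args::Tuple) = grad === nothing ? map(_ -> nothing, args) : grad

uncollapse(nothing, ([true, false, true],))  # (nothing,)
uncollapse(nothing, (1.0, 2.0))              # (nothing, nothing)
```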

FerreolS · 2023-11-30T08:26:53Z

Hi,
Is there any hope to merge this PR soon? Is there anything I can do in that direction?

ToucheSir · 2023-12-01T15:59:59Z

Maybe, if we can get some consensus on the behaviour of gradient around collapsing zeros. See #1466 (comment). Once that's been established, the failing test here will either automatically pass or just requires a one-line tweak to start passing.

ToucheSir added the CUDA (All things GPU) and ChainRules (adjoint -> rrule, and further integration) labels Sep 1, 2023

ToucheSir changed the base branch from master to bc/ci-noise September 1, 2023 20:58

ToucheSir closed this Sep 1, 2023

ToucheSir reopened this Sep 1, 2023

ToucheSir changed the base branch from bc/ci-noise to master September 4, 2023 23:44

ToucheSir changed the title ~~Remove GPU sum() rule~~ Remove redundant sum() rules Sep 6, 2023

ToucheSir added 3 commits September 7, 2023 20:01

Remove GPU sum() rule

4c470eb

Try removing Fill sum rule too

33946f3

Remove bool rule too and correct test

a32f039

ToucheSir force-pushed the bc/rm-gpu-sum-adj branch from 1037852 to a32f039 September 8, 2023 03:01

ToucheSir mentioned this pull request Jan 12, 2024

Un-collapse nothings in gradient #1495

Merged

2 tasks

lkdvos mentioned this pull request Apr 20, 2024

Freed reference problem when combining cuTENSOR and Zygote Jutho/TensorOperations.jl#169

Closed
