
Multireduce Kernels: fixing loop scope w/ local buffers #4453

Draft · wants to merge 6 commits into master
Conversation

@0xtimmy (Contributor) commented May 6, 2024

uops.py has a function fix_loop_scope that will not lift uops out of a loop if they have a DEFINE_LOCAL as a parent. With multiple reduceops, uoptimize needs to be able to lift operations on the result of one reduceop out of the reductions that follow it.

For example, in standard deviation the calculation of the mean gets placed inside the second loop, because the next call to ast_parse happens while the second reduction is being rendered:

... [ a previous reduce]
35 UOps.DEFINE_ACC     : dtypes.float              []                               0.0
36 UOps.LOOP           : dtypes.int                [6, 7]                           None
37 UOps.ALU            : dtypes.int                [12, 36]                         BinaryOps.ADD
38 UOps.LOAD           : dtypes.float              [1, 37]                          None
39 UOps.ALU            : dtypes.float              [34, 8]                          BinaryOps.MUL     <- (34 is the result of the previous reduce)
40 UOps.ALU            : dtypes.float              [38, 39]                         BinaryOps.SUB
41 UOps.ALU            : dtypes.float              [40, 35]                         BinaryOps.ADD
42 UOps.PHI            : dtypes.float              [35, 41, 36]                     None
43 UOps.ENDLOOP        :                           [36]                             None

should instead look like:

... [ a previous reduce]
35 UOps.DEFINE_ACC     : dtypes.float              []                               0.0
36 UOps.ALU            : dtypes.float              [34, 8]                          BinaryOps.MUL     <-
37 UOps.LOOP           : dtypes.int                [6, 7]                           None
38 UOps.ALU            : dtypes.int                [12, 37]                         BinaryOps.ADD
39 UOps.LOAD           : dtypes.float              [1, 38]                          None
40 UOps.ALU            : dtypes.float              [39, 36]                         BinaryOps.SUB
41 UOps.ALU            : dtypes.float              [40, 35]                         BinaryOps.ADD
42 UOps.PHI            : dtypes.float              [35, 41, 37]                     None
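The lifting rule the two dumps above illustrate is essentially loop-invariant code motion. Below is a minimal sketch over a toy SSA-style op list; this is a hypothetical simplification, not tinygrad's actual UOp machinery: each op is a (name, opcode, source_names) tuple, and an op in the loop body is hoisted when every one of its sources is already defined before the LOOP.

```python
def hoist_invariants(prog):
    # prog: list of (name, opcode, source_names) with one LOOP/ENDLOOP pair
    loop_at = next(i for i, (_, op, _) in enumerate(prog) if op == "LOOP")
    end_at = next(i for i, (_, op, _) in enumerate(prog) if op == "ENDLOOP")
    defined_outside = {name for name, _, _ in prog[:loop_at]}
    hoisted, body = [], []
    for name, op, srcs in prog[loop_at + 1:end_at]:
        # PHI carries the accumulator across iterations and must stay inside
        if op != "PHI" and all(s in defined_outside for s in srcs):
            hoisted.append((name, op, srcs))
            defined_outside.add(name)  # later ops may build on a hoisted value
        else:
            body.append((name, op, srcs))
    return prog[:loop_at] + hoisted + [prog[loop_at]] + body + prog[end_at:]

# Toy version of the dump above: "m2" (mean scaled by a constant) depends only
# on values defined before the loop, so it should move in front of LOOP.
prog = [
    ("acc", "DEFINE_ACC", []),
    ("mean", "PREV_REDUCE", []),   # stands in for uop 34, the earlier reduce
    ("coeff", "CONST", []),
    ("i", "LOOP", []),
    ("x", "LOAD", ["i"]),
    ("m2", "MUL", ["mean", "coeff"]),  # invariant: gets hoisted
    ("d", "SUB", ["x", "m2"]),
    ("s", "ADD", ["d", "acc"]),
    ("phi", "PHI", ["acc", "s", "i"]),
    ("end", "ENDLOOP", ["i"]),
]
```

Running `hoist_invariants(prog)` leaves the LOAD/SUB/ADD/PHI chain inside the loop but moves the MUL in front of it, matching the corrected dump.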

I included a test that covers all the cases I ran into with sequential reduces. The change didn't break any existing tests, so I'm not sure what else to test.

This is NOT the case for calculations in the full shape, and the new test covers that as well.
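As a plain-Python illustration of the standard-deviation pattern (hypothetical scalar code, not tinygrad API): the mean is derived from the first reduce's result, so the MUL that scales the sum is invariant with respect to the second loop and belongs before it, not inside it.

```python
def variance(xs):
    n = len(xs)
    total = 0.0
    for x in xs:                 # first reduce: sum
        total += x
    mean = total * (1.0 / n)     # the BinaryOps.MUL on the reduce result:
                                 # loop-invariant w.r.t. the second loop
    acc = 0.0
    for x in xs:                 # second reduce: sum of squared deviations
        acc += (x - mean) ** 2
    return acc / n
```

The buggy linearization corresponds to recomputing `mean` on every iteration of the second loop; the fix corresponds to the placement shown here.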

github-actions bot commented May 6, 2024

Changes

Name                        Lines    Diff    Tokens/Line    Diff
------------------------  -------  ------  -------------  ------
tinygrad/codegen/uops.py      320      -2           17.3    +0.1


total line changes: -2

@@ -46,5 +46,145 @@ def test_const_cast(self):
    self.assertEqual(out.uop, UOps.CONST)
    self.assertEqual(out.arg, 0)

  def test_loop_scope_load_local(self):
Collaborator:
Could you make this test more end-to-end?
Ideally, we'd want an AST + a list of optimizations that produces this uop graph, instead of handcoding it.

@0xtimmy (Contributor, Author):
yeah bet. should I add them to #4220 somehow as well? master doesn't have the scheduler or linearizer changes to generate the patterns I want to test

@Qazalin (Collaborator) commented May 6, 2024:

handcoding the AST is fine.
see test/test_linearizer.py TestLinearizer.test_multioutput
https://github.com/tinygrad/tinygrad/blob/master/test/test_linearizer.py#L82
oh, you'd also need to remove the assert and add a bunch of logic, so the current test is fine. I'll review the changes as-is.

Feel free to add a test to test_linearizer with @unittest.skip for now, in addition to this one.
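The suggested skipped placeholder could look something like this minimal sketch (class name, method name, and skip reason are hypothetical; the body is stubbed until master has the scheduler/linearizer changes):

```python
import unittest

@unittest.skip("multireduce scheduler/linearizer changes not on master yet")
class TestMultireduceLoopScope(unittest.TestCase):
    def test_std_hoists_mean_out_of_second_reduce(self):
        # TODO: build the two-reduce AST + list of optimizations and assert
        # that the MUL on the first reduce's result is linearized before the
        # second LOOP uop
        pass
```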
