
Multireduce Kernels: fixing loop scope w/ local buffers #4453

Draft · wants to merge 6 commits into master
Conversation

@0xtimmy (Contributor) commented May 6, 2024

uops.py has a function fix_loop_scope that will not lift uops out of a loop if they have a DEFINE_LOCAL as a parent. With multiple reduceops, uoptimize needs to be able to lift operations on the result of one reduceop out of the reductions that follow it.

For example, in standard deviation the calculation of the mean gets placed inside the second loop, because the next call to ast_parse happens while the second reduction is being rendered:

... [ a previous reduce]
35 UOps.DEFINE_ACC     : dtypes.float              []                               0.0
36 UOps.LOOP           : dtypes.int                [6, 7]                           None
37 UOps.ALU            : dtypes.int                [12, 36]                         BinaryOps.ADD
38 UOps.LOAD           : dtypes.float              [1, 37]                          None
39 UOps.ALU            : dtypes.float              [34, 8]                          BinaryOps.MUL     <- (34 is the result of the previous reduce)
40 UOps.ALU            : dtypes.float              [38, 39]                         BinaryOps.SUB
41 UOps.ALU            : dtypes.float              [40, 35]                         BinaryOps.ADD
42 UOps.PHI            : dtypes.float              [35, 41, 36]                     None
43 UOps.ENDLOOP        :                           [36]                             None

should instead look like:

... [ a previous reduce]
35 UOps.DEFINE_ACC     : dtypes.float              []                               0.0
36 UOps.ALU            : dtypes.float              [34, 8]                          BinaryOps.MUL     <-
37 UOps.LOOP           : dtypes.int                [6, 7]                           None
38 UOps.ALU            : dtypes.int                [12, 37]                         BinaryOps.ADD
39 UOps.LOAD           : dtypes.float              [1, 38]                          None
40 UOps.ALU            : dtypes.float              [39, 36]                         BinaryOps.SUB
41 UOps.ALU            : dtypes.float              [40, 35]                         BinaryOps.ADD
42 UOps.PHI            : dtypes.float              [35, 41, 37]                     None
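The lifting rule the two dumps above illustrate is essentially loop-invariant code motion. Below is a minimal sketch over a toy SSA-style op list; this is a hypothetical simplification, not tinygrad's actual UOp machinery: each op is a (name, opcode, source_names) tuple, and an op in the loop body is hoisted when every one of its sources is already defined before the LOOP.

```python
def hoist_invariants(prog):
    # prog: list of (name, opcode, source_names) with one LOOP/ENDLOOP pair
    loop_at = next(i for i, (_, op, _) in enumerate(prog) if op == "LOOP")
    end_at = next(i for i, (_, op, _) in enumerate(prog) if op == "ENDLOOP")
    defined_outside = {name for name, _, _ in prog[:loop_at]}
    hoisted, body = [], []
    for name, op, srcs in prog[loop_at + 1:end_at]:
        # PHI carries the accumulator across iterations and must stay inside
        if op != "PHI" and all(s in defined_outside for s in srcs):
            hoisted.append((name, op, srcs))
            defined_outside.add(name)  # later ops may build on a hoisted value
        else:
            body.append((name, op, srcs))
    return prog[:loop_at] + hoisted + [prog[loop_at]] + body + prog[end_at:]

# Toy version of the dump above: "m2" (mean scaled by a constant) depends only
# on values defined before the loop, so it should move in front of LOOP.
prog = [
    ("acc", "DEFINE_ACC", []),
    ("mean", "PREV_REDUCE", []),   # stands in for uop 34, the earlier reduce
    ("coeff", "CONST", []),
    ("i", "LOOP", []),
    ("x", "LOAD", ["i"]),
    ("m2", "MUL", ["mean", "coeff"]),  # invariant: gets hoisted
    ("d", "SUB", ["x", "m2"]),
    ("s", "ADD", ["d", "acc"]),
    ("phi", "PHI", ["acc", "s", "i"]),
    ("end", "ENDLOOP", ["i"]),
]
```

Running `hoist_invariants(prog)` leaves the LOAD/SUB/ADD/PHI chain inside the loop but moves the MUL in front of it, matching the corrected dump.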

I included a test that covers all the cases I ran into with sequential reduces. The change didn't break any existing tests, so I'm not sure what else to test.

This is NOT the case for calculations in the full shape, and the new test covers that as well.
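As a plain-Python illustration of the standard-deviation pattern (hypothetical scalar code, not tinygrad API): the mean is derived from the first reduce's result, so the MUL that scales the sum is invariant with respect to the second loop and belongs before it, not inside it.

```python
def variance(xs):
    n = len(xs)
    total = 0.0
    for x in xs:                 # first reduce: sum
        total += x
    mean = total * (1.0 / n)     # the BinaryOps.MUL on the reduce result:
                                 # loop-invariant w.r.t. the second loop
    acc = 0.0
    for x in xs:                 # second reduce: sum of squared deviations
        acc += (x - mean) ** 2
    return acc / n
```

The buggy linearization corresponds to recomputing `mean` on every iteration of the second loop; the fix corresponds to the placement shown here.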

github-actions bot commented May 6, 2024

Changes

Name                        Lines    Diff    Tokens/Line    Diff
------------------------  -------  ------  -------------  ------
tinygrad/codegen/uops.py      320      -2           17.3    +0.1


total line changes: -2

@@ -46,5 +46,145 @@ def test_const_cast(self):
    self.assertEqual(out.uop, UOps.CONST)
    self.assertEqual(out.arg, 0)

  def test_loop_scope_load_local(self):
Collaborator:
Could you make this test more end-to-end?
Ideally, we'd want an AST + a list of optimizations that produces this uop graph, instead of handcoding it.

@0xtimmy (Contributor, Author):
yeah bet. should I add them to #4220 somehow as well? master doesn't have the scheduler or linearizer changes to generate the patterns I want to test

@Qazalin (Collaborator) commented May 6, 2024:

handcoding the AST is fine.
see test/test_linearizer.py TestLinearizer.test_multioutput
https://github.com/tinygrad/tinygrad/blob/master/test/test_linearizer.py#L82
oh, you'd also need to remove the assert and add a bunch of logic, so the current test is fine. I'll review the changes as-is.

Feel free to add a test to test_linearizer with @unittest.skip for now, in addition to this one.
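The suggested skipped placeholder could look something like this minimal sketch (class name, method name, and skip reason are hypothetical; the body is stubbed until master has the scheduler/linearizer changes):

```python
import unittest

@unittest.skip("multireduce scheduler/linearizer changes not on master yet")
class TestMultireduceLoopScope(unittest.TestCase):
    def test_std_hoists_mean_out_of_second_reduce(self):
        # TODO: build the two-reduce AST + list of optimizations and assert
        # that the MUL on the first reduce's result is linearized before the
        # second LOOP uop
        pass
```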
