New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multireduce Kernels: fixing loop scope w/ local buffers #4453
base: master
Are you sure you want to change the base?
Conversation
Changes
|
@@ -46,5 +46,145 @@ def test_const_cast(self): | |||
self.assertEqual(out.uop, UOps.CONST) | |||
self.assertEqual(out.arg, 0) | |||
|
|||
def test_loop_scope_load_local(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this test more end-to-end?
Ideally, we'd want AST + List of optimizations that create this uop graph instead of handcoding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah bet, should i add them to #4220 somehow as well? master doesn't have the scheduler changes or linearizer changes to generate the patterns i want to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handcoding the AST is fine.
see test/test_linearizer.py TestLinearizer.test_multioutput
https://github.com/tinygrad/tinygrad/blob/master/test/test_linearizer.py#L82
oh you also need to remove the assert and add a bunch of logic,
the current test is fine. I'll review the changes as-is.
Feel free to add a test to test_linearizer with @unittest.skip for now in addition to this.
uops.py
has a functionfix_loop_scope
which will not lift uops out of a loop if they have aDEFINE_LOCAL
as a parentwith multiple reducops,
uoptimize
needs to be able to lift operations on the result of a reduceop out of following reductionsex: in standard deviation, the calculation for mean will get placed within the second loop, because the next time
ast_parse
is called is while rendering the second reductionshould instead look like:
I included a test that covers all the cases I ran into with sequential reduces. The change didn't break any existing tests so I'm not really sure what else to test
this is NOT the case for calculations in the full shape, and that's considered in the new test as well