Clang jit #4492
base: master
It mostly works on aarch64, but everything fails on x86. I'll make the last few tests pass on aarch64 and then move on to x86.
1396.63s => 23.53s on M1 Max, a ~60x improvement:

```text
(.venv) [user@macbook ~/src/tinygrad]% git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
(.venv) [user@macbook ~/src/tinygrad]% CLANG=1 python -m pytest -n=auto test/ --ignore=test/external --ignore=test/models --durations=20
================================================================================= test session starts =================================================================================
platform darwin -- Python 3.11.9, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/user/src/tinygrad
plugins: hypothesis-6.92.0, xdist-3.5.0
10 workers [1546 items]
...sssssss........................x............sssssssss........sss..........................xss.................................s...sss.....s................sssssssss.....sss [ 11%]
s.................sss.....s.........sss......ssssssssssss..............ssssss..sssssssss...............s....s............s..................................................... [ 22%]
............................s.......s...........................................s......ss..............................................s.............................s......... [ 33%]
..........................F..........s...............................................................................................................s......................... [ 45%]
....................................s.................................s......................................sssss............s.x..............s.....s...s...ss.....s....s..... [ 56%]
....s...........................s.......................................................................s.....s....................s.........ssss.s............................ [ 67%]
......................s.....ss...............s.ssssssss...s....s.............................s........s...s...................................................sss....sss....... [ 79%]
........................................................................................................s...................................................................... [ 90%]
......................................................s.......................s......ss............................................ssss..sss...s.. [100%]
====================================================================================== FAILURES =======================================================================================
___________________________________________________________________________ TestFusionOp.test_recursive_add ___________________________________________________________________________
[gw6] darwin -- Python 3.11.9 /Users/user/src/tinygrad/.venv/bin/python3
self = <test.test_fusion_op.TestFusionOp testMethod=test_recursive_add>
E AssertionError: 2.480366833000062 not less than 1.0
test/test_fusion_op.py:31: AssertionError
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py: 19 warnings
test/test_dtype_alu.py: 18 warnings
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py: 13 warnings
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py: 14 warnings
test/test_dtype_alu.py: 78 warnings
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

self = <test.test_sample.TestSample testMethod=test_sample>
test/test_sample.py:17:
args = (, array([[3.28800082e-01, 9.07372057e-01, 2.37981558e-01, 2.16014087e-01, 1.49323285e-01, 4.698...43615544e-01,
E AssertionError:
/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:81: AssertionError
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py: 21 warnings
test/test_dtype_alu.py: 12 warnings
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_gc.py::TestGC::test_gc
test/test_dtype_alu.py: 63 warnings
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```
lol, I'd seen that dlopen is slow, but I didn't think my impl would be 60 times faster...
Lol, I was working on this, but your code looks better, so:
Cool, bounty locked. There's a lot to clean up here, but yea, dlopen is mad slow.
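For context, one general way to skip dlopen is to copy already-compiled machine code into executable memory and call it through ctypes. Below is a minimal sketch of that idea, not this PR's implementation. It assumes x86-64 Linux (System V calling convention) and an OS that permits read-write-execute anonymous mappings; `load_function` and `ADD_X86_64` are hypothetical names, and the bytes are a hand-assembled `lea eax, [rdi+rsi]; ret` rather than real clang output. On macOS/aarch64 you would additionally need `MAP_JIT`, W^X toggling and an instruction-cache flush, none of which this sketch does.

```python
import ctypes
import mmap
import platform

# Hand-assembled x86-64 (System V ABI) for: int add(int a, int b) { return a + b; }
#   8d 04 37   lea eax, [rdi+rsi]
#   c3         ret
ADD_X86_64 = b"\x8d\x04\x37\xc3"


def load_function(code: bytes, restype, *argtypes):
    """Hypothetical loader: copy `code` into an anonymous RWX mapping and wrap it with ctypes.

    Assumes the OS allows PROT_READ|PROT_WRITE|PROT_EXEC on an anonymous mapping.
    Returns (fn, mapping); keep `mapping` alive for as long as `fn` is called.
    """
    mapping = mmap.mmap(
        -1,
        len(code),
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
    )
    mapping.write(code)
    # Address of the first byte of the mapping.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(mapping))
    fn = ctypes.CFUNCTYPE(restype, *argtypes)(addr)
    return fn, mapping


if platform.system() == "Linux" and platform.machine() == "x86_64":
    add, _mapping = load_function(ADD_X86_64, ctypes.c_int, ctypes.c_int, ctypes.c_int)
    print(add(2, 3))  # 5
```

The mapping is returned alongside the function pointer because ctypes does not keep the memory alive on its own; closing or dropping the mapping while the pointer is still used would crash.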
I don't want to have to compile other functions or deal with relocations. How can we get the compiler to emit asm for the math functions? All the processors should support it. On Mac:
See the Taylor series bounty. I think the solution here is to not support relocations or (out-of-file) function calls.
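To illustrate the Taylor-series idea (math functions built only from arithmetic, so generated kernels make no calls into libm and need no call relocations), here is a minimal Python sketch of `sin`. `taylor_sin` is a hypothetical name, not the bounty's API; the 8-term default is an arbitrary choice for illustration, and the naive range reduction loses accuracy for very large |x|.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = 0.5 * math.pi


def taylor_sin(x: float, terms: int = 8) -> float:
    """Approximate sin(x) using only +, -, *, / and floor (hypothetical helper)."""
    # Range-reduce to [-pi, pi). Naive: accuracy degrades for very large |x|.
    x = x - TWO_PI * math.floor(x / TWO_PI + 0.5)
    # Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x); the series converges fastest there.
    if x > HALF_PI:
        x = math.pi - x
    elif x < -HALF_PI:
        x = -math.pi - x
    # Sum x - x^3/3! + x^5/5! - ..., deriving each term from the previous one.
    x2 = x * x
    term = x
    total = x
    for k in range(1, terms):
        term *= -x2 / ((2 * k) * (2 * k + 1))
        total += term
    return total


print(taylor_sin(1.0), math.sin(1.0))
```

With 8 terms the series runs up to x^15, and on [-pi/2, pi/2] the first omitted term is below about 1e-11, so the result agrees with `math.sin` to roughly double precision for moderate inputs.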
I'm making a Taylor-series-powered tinymath.h right now (I don't think putting it in function.py is a good idea). Relocation support is required for any function call, even within the same object file (e.g. in your Clang graph impl), and for constants (they're loaded from memory, at least on aarch64).
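As a concrete example of that relocation work: a `bl` to another function is left in the object file as a placeholder plus a relocation entry (`R_AARCH64_CALL26` in ELF; the Mach-O counterpart is `ARM64_RELOC_BRANCH26`), and the loader must patch in the real PC-relative offset. A minimal sketch, assuming the instruction offset, symbol address and addend are already known relative to the start of the loaded code; `patch_call26` is a hypothetical helper, and parsing the actual relocation tables is not shown.

```python
import struct

BL_OPCODE_MASK = 0xFC000000  # bits [31:26] select B/BL; bits [25:0] hold imm26


def patch_call26(code: bytearray, offset: int, sym: int, addend: int = 0) -> None:
    """Apply an R_AARCH64_CALL26-style fixup: imm26 = (S + A - P) >> 2.

    `offset` (P) and `sym` (S) are byte offsets from the start of `code` (hypothetical convention).
    """
    delta = sym + addend - offset
    # B/BL encode a signed 26-bit word offset, i.e. +/-128 MiB in 4-byte steps.
    if delta % 4 != 0 or not -(1 << 27) <= delta < (1 << 27):
        raise ValueError("branch target not encodable in a signed 26-bit word offset")
    (insn,) = struct.unpack_from("<I", code, offset)
    insn = (insn & BL_OPCODE_MASK) | ((delta >> 2) & 0x03FFFFFF)
    struct.pack_into("<I", code, offset, insn)


# Usage: a placeholder `bl` at offset 4 calling a helper placed at offset 0x100.
code = bytearray(0x104)
struct.pack_into("<I", code, 4, 0x94000000)  # bl <unresolved>
patch_call26(code, offset=4, sym=0x100)
print(hex(struct.unpack_from("<I", code, 4)[0]))  # 0x9400003f
```

A call to a `static` function in the same section can usually be resolved by the assembler itself, which is why declaring helpers static sidesteps this particular fixup.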
Even if you don't use constants, the compiler can generate them for vectorization, and I couldn't find a way to turn that off. The function-call limitation can be worked around by declaring everything except the main function static, though.
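The constant-pool case needs two fixups, because on aarch64 a load such as `ldr q0, [x8, :lo12:<const>]` is typically paired with an `adrp x8, <const>` page-address instruction. Below is a minimal sketch of the two ELF-style fixups (`R_AARCH64_ADR_PREL_PG_HI21` for `adrp`, and the `*_ABS_LO12_NC` family for the low 12 bits; on macOS the Mach-O counterparts are `ARM64_RELOC_PAGE21` and `ARM64_RELOC_PAGEOFF12`). It assumes the load base address is known, since the page arithmetic depends on it; `patch_adrp` and `patch_lo12` are hypothetical helpers.

```python
import struct


def _rw(code: bytearray, offset: int, fn) -> None:
    # Read a little-endian 32-bit instruction, transform it, write it back.
    (insn,) = struct.unpack_from("<I", code, offset)
    struct.pack_into("<I", code, offset, fn(insn) & 0xFFFFFFFF)


def patch_adrp(code: bytearray, base: int, offset: int, sym: int, addend: int = 0) -> None:
    """ADR_PREL_PG_HI21-style fixup: page(S + A) - page(P), counted in 4 KiB pages."""
    pages = ((base + sym + addend) >> 12) - ((base + offset) >> 12)
    if not -(1 << 20) <= pages < (1 << 20):
        raise ValueError("target page out of ADRP range (+/-4 GiB)")
    immlo = pages & 0x3             # goes to bits [30:29]
    immhi = (pages >> 2) & 0x7FFFF  # goes to bits [23:5]
    _rw(code, offset, lambda i: (i & 0x9F00001F) | (immlo << 29) | (immhi << 5))


def patch_lo12(code: bytearray, base: int, offset: int, sym: int,
               addend: int = 0, scale: int = 0) -> None:
    """*_ABS_LO12_NC-style fixup into the imm12 field (bits [21:10]).

    scale=0 for ADD (unscaled); scale=4 for a 128-bit (q register) LDR, whose imm12 is in 16-byte units.
    """
    lo12 = (base + sym + addend) & 0xFFF
    if lo12 % (1 << scale):
        raise ValueError("constant is misaligned for this access size")
    _rw(code, offset, lambda i: (i & 0xFFC003FF) | ((lo12 >> scale) << 10))


# Usage: `adrp x8, <const>; ldr q0, [x8, :lo12:<const>]`, constant 0x1010 bytes into a blob loaded at 0x10000.
code = bytearray(8)
struct.pack_into("<II", code, 0, 0x90000008, 0x3DC00100)
patch_adrp(code, base=0x10000, offset=0, sym=0x1010)
patch_lo12(code, base=0x10000, offset=4, sym=0x1010, scale=4)
print([hex(w) for w in struct.unpack_from("<II", code, 0)])  # ['0xb0000008', '0x3dc00500']
```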
This branch is currently behind tinygrad/master. The line-count difference bot is disabled.