
Clang jit #4492 (Draft)

uuuvn wants to merge 31 commits into master from clang_jit
Conversation

@uuuvn (Contributor) commented May 9, 2024

No description provided.

@uuuvn (Contributor, Author) commented May 9, 2024

It mostly works on aarch64; everything fails on x86. I'll make the last few tests pass on aarch64 and then move on to x86.

@uuuvn (Contributor, Author) commented May 9, 2024

1396.63s => 23.53s on m1 max, ~60x improvement

(.venv) [user@macbook ~/src/tinygrad]% git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
(.venv) [user@macbook ~/src/tinygrad]% CLANG=1 python -m pytest -n=auto test/ --ignore=test/external --ignore=test/models --durations=20
================================================================================= test session starts =================================================================================
platform darwin -- Python 3.11.9, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/user/src/tinygrad
plugins: hypothesis-6.92.0, xdist-3.5.0
10 workers [1546 items]
...sssssss........................x............sssssssss........sss..........................xss.................................s...sss.....s................sssssssss.....sss [ 11%]
s.................sss.....s.........sss......ssssssssssss..............ssssss..sssssssss...............s....s............s..................................................... [ 22%]
............................s.......s...........................................s......ss..............................................s.............................s......... [ 33%]
..........................F..........s...............................................................................................................s......................... [ 45%]
....................................s.................................s......................................sssss............s.x..............s.....s...s...ss.....s....s..... [ 56%]
....s...........................s.......................................................................s.....s....................s.........ssss.s............................ [ 67%]
......................s.....ss...............s.ssssssss...s....s.............................s........s...s...................................................sss....sss....... [ 79%]
........................................................................................................s...................................................................... [ 90%]
......................................................s.......................s......ss............................................ssss..sss...s.. [100%]
====================================================================================== FAILURES =======================================================================================
___________________________________________________________________________ TestFusionOp.test_recursive_add ___________________________________________________________________________
[gw6] darwin -- Python 3.11.9 /Users/user/src/tinygrad/.venv/bin/python3

self = <test.test_fusion_op.TestFusionOp testMethod=test_recursive_add>

def test_recursive_add(self):
  st = time.perf_counter()
  a = Tensor([1,2,3,4])
  for _ in range(24): a = a + a
  sched = create_schedule([a.lazydata], None)
  ji = Device[Device.DEFAULT].get_runner(*sched[-1].ast)
  self.assertLess(time.perf_counter()-st, 1.0)

E AssertionError: 2.480366833000062 not less than 1.0

test/test_fusion_op.py:31: AssertionError
================================================================================== warnings summary ===================================================================================
test/test_gc.py::TestGC::test_gc
test/test_gc.py::TestGC::test_gc_complex
/Users/user/src/tinygrad/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:347: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
/Users/user/src/tinygrad/test/test_dtype.py:51: RuntimeWarning: overflow encountered in cast
_test_op(lambda: a.cast(target_dtype), target_dtype, list(a.numpy().astype(target_dtype.np)))

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
/Users/user/src/tinygrad/tinygrad/tensor.py:120: RuntimeWarning: invalid value encountered in cast
else: data = _fromcpu(data.astype(dtype.np) if dtype is not None and dtype.np is not None else data)

test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in subtract
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float16
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in add
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py: 19 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sqrt
numpy_value = op1

test/test_dtype_alu.py: 18 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in log
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in log
numpy_value = op1

test/test_dtype_alu.py: 13 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sin
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in reciprocal
numpy_value = op1

test/test_dtype_alu.py: 14 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: overflow encountered in exp
numpy_value = op1

test/test_dtype_alu.py: 78 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in cast
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in subtract
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in add
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================ slowest 20 durations =================================================================================
340.18s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet_train_step
267.86s call test/imported/test_indexing.py::TestIndexing::test_advancedindex
243.90s call test/test_multitensor.py::TestMultiTensor::test_simple_reduce
228.07s call test/test_multitensor.py::TestShrinkMultiTensorShardedAxis::test_ops
195.03s call test/test_dtype.py::TestAutoCastType::test_int_to_float_unary_func
185.23s call test/test_ops.py::TestOps::test_broadcast_partial
178.40s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet
155.75s call test/test_ops.py::TestOps::test_einsum
149.40s call test/test_ops.py::TestOps::test_conv2d
140.84s call test/test_ops.py::TestOps::test_conv2d_bs_4_cin_1
136.36s call test/test_dtype.py::TestInt8Dtype::test_upcast_ops
133.97s call test/test_ops.py::TestOps::test_conv1d
127.57s call test/unit/test_disk_tensor.py::TestSafetensors::test_efficientnet_safetensors
123.90s call test/test_ops.py::TestOps::test_conv2d_bs_4_cin_3
118.71s call test/test_multitensor.py::TestMultiTensor::test_uneven_shard
118.35s call test/imported/test_indexing.py::TestIndexing::test_index
117.26s call test/test_ops.py::TestOps::test_conv2d_bs_1_cin_1
116.80s call test/test_dtype.py::TestUint8Dtype::test_upcast_ops
114.29s call test/test_ops.py::TestOps::test_pad_slice
112.93s call test/test_multitensor.py::TestShrinkMultiTensorShardedAxis::test_unsynced_backprop_conv_bn
=============================================================================== short test summary info ===============================================================================
FAILED test/test_fusion_op.py::TestFusionOp::test_recursive_add - AssertionError: 2.480366833000062 not less than 1.0
================================================== 1 failed, 1396 passed, 146 skipped, 3 xfailed, 178 warnings in 1396.63s (0:23:16) ==================================================
(.venv) [user@macbook ~/src/tinygrad]% git checkout clang_jit
Switched to branch 'clang_jit'
Your branch is up to date with 'fork/clang_jit'.
(.venv) [user@macbook ~/src/tinygrad]% CLANG=1 python -m pytest -n=auto test/ --ignore=test/external --ignore=test/models --durations=20
================================================================================= test session starts =================================================================================
platform darwin -- Python 3.11.9, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/user/src/tinygrad
plugins: hypothesis-6.92.0, xdist-3.5.0
10 workers [1546 items]
.sssssss...........................x.............sss.........................................s.....................................xss.......s...sss.s......................... [ 11%]
.....s.............sssssssss.......................ssssss.....................................................................s...................................s............ [ 22%]
.......................................s......s........................ssss..sss..........................s.......................sssssssss............s.s..................... [ 33%]
...........sssssssss.............................ss.......................................s.......ss...ss....s..............................s..........................s....... [ 45%]
................................s.....s...............................x................sssssssss.................................sss........s...............................s.s [ 56%]
sssssssssss..................................................s.s.................F................s....................................s.......................s..........s.sss [ 67%]
...sssss....s....s........................s.......ss................sss...s.....s...s................s.......................................................sss............... [ 79%]
.........sssss..ss...........................................................................................................s....................sss.......................... [ 90%]
............................................................................................s............................................s........ [100%]
====================================================================================== FAILURES =======================================================================================
_______________________________________________________________________________ TestSample.test_sample ________________________________________________________________________________
[gw0] darwin -- Python 3.11.9 /Users/user/src/tinygrad/.venv/bin/python3

self = <test.test_sample.TestSample testMethod=test_sample>

def test_sample(self):
  X = Tensor.rand(10000, 50).realize()
  BS = 16
  idxs = np.random.randint(0, X.shape[0], size=(BS))
  # this uncovered a bug with arg sort order
  batch = [Variable(f'idx{i}', 0, X.shape[0]-1).bind(s) for i,s in enumerate(idxs.tolist())]
  x = Tensor.cat(*[X.shrink(((batch[i], batch[i]+1), None)) for i in range(BS)])
  print(idxs)
  ret = x.numpy()
  base = X.numpy()[idxs]
  np.testing.assert_equal(ret, base)

test/test_sample.py:17:


args = (, array([[3.28800082e-01, 9.07372057e-01, 2.37981558e-01, 2.16014087e-01, 1.49323285e-01, 4.698...43615544e-01,
7.18499601e-01, 9.41257715e-01, 2.58335233e-01, 9.06049430e-01, 6.31195784e-01]], dtype=float32))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}

@wraps(func)
def inner(*args, **kwds):
    with self._recreate_cm():
      return func(*args, **kwds)

E AssertionError:
E Arrays are not equal
E
E Mismatched elements: 450 / 800 (56.2%)
E Max absolute difference: 0.9638492
E Max relative difference: 271.94272
E x: array([[3.288001e-01, 9.073721e-01, 2.379816e-01, 2.160141e-01, 1.493233e-01, 4.698067e-01, 8.117076e-01, 7.092213e-01, 3.878354e-01, 2.677359e-01,
E 1.999879e-01, 8.964562e-01, 4.113811e-02, 5.459162e-01, 4.983971e-01, 5.667402e-01, 3.959671e-01, 6.422870e-01, 8.493959e-01, 2.250167e-01,
E 6.323931e-01, 5.950993e-01, 8.642179e-01, 5.737190e-01, 1.366822e-01, 6.082999e-01, 1.459509e-02, 1.543481e-01, 4.873741e-01, 1.395370e-01,...
E y: array([[3.288001e-01, 9.073721e-01, 2.379816e-01, 2.160141e-01, 1.493233e-01, 4.698067e-01, 8.117076e-01, 7.092213e-01, 3.878354e-01, 2.677359e-01,
E 1.999879e-01, 8.964562e-01, 4.113811e-02, 5.459162e-01, 4.983971e-01, 5.667402e-01, 3.959671e-01, 6.422870e-01, 8.493959e-01, 2.250167e-01,
E 6.323931e-01, 5.950993e-01, 8.642179e-01, 5.737190e-01, 1.366822e-01, 6.082999e-01, 1.459509e-02, 1.543481e-01, 4.873741e-01, 1.395370e-01,...

/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:81: AssertionError
-------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
[ 86 5894 7979 3852 9064 3467 2933 1694 9164 5181 9148 9944 3641 8154 3293 5925]
================================================================================== warnings summary ===================================================================================
test/test_dtype.py::TestHalfDtype::test_casts_to
/Users/user/src/tinygrad/test/test_dtype.py:51: RuntimeWarning: overflow encountered in cast
_test_op(lambda: a.cast(target_dtype), target_dtype, list(a.numpy().astype(target_dtype.np)))

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
/Users/user/src/tinygrad/tinygrad/tensor.py:120: RuntimeWarning: invalid value encountered in cast
else: data = _fromcpu(data.astype(dtype.np) if dtype is not None and dtype.np is not None else data)

test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in subtract
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py: 21 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in log
numpy_value = op1

test/test_dtype_alu.py: 12 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sqrt
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in log
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in reciprocal
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sin
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: overflow encountered in add
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: overflow encountered in multiply
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: overflow encountered in exp
numpy_value = op1

test/test_gc.py::TestGC::test_gc
test/test_gc.py::TestGC::test_gc_complex
/Users/user/src/tinygrad/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:347: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(

test/test_dtype_alu.py: 63 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in cast
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in subtract
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in add
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================ slowest 20 durations =================================================================================
6.26s call test/testextra/test_lr_scheduler.py::TestLrScheduler::test_onecyclelr
5.58s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet
5.24s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet_train_step
3.89s call test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_7
3.55s call test/unit/test_shapetracker.py::TestIndexExpressions2d::test_reshape_combining_4
3.32s call test/test_copy_speed.py::TestCopySpeed::testCopyCPUtoDefaultFresh
3.02s call test/test_linearizer.py::TestHandCodedOpts::test_masked_upcast_wino_full
2.08s call test/test_speed_v_torch.py::TestSpeed::test_permute
1.95s call test/unit/test_shm_tensor.py::TestRawShmBuffer::test_e2e_big
1.79s call test/test_nn.py::TestNN::test_conv_transpose2d
1.67s call test/imported/test_indexing.py::TestIndexing::test_advancedindex
1.49s call test/test_net_speed.py::TestConvSpeed::test_mnist
1.39s call test/test_ops.py::TestOps::test_einsum
1.39s call test/test_speed_v_torch.py::TestSpeed::test_add
1.30s call test/test_fuzz_shape_ops.py::TestShapeOps::test_split
1.30s call test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
1.26s call test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
1.25s call test/test_dtype.py::TestAutoCastType::test_int_to_float_unary_func
1.24s call test/test_speed_v_torch.py::TestSpeed::test_pow
1.23s call test/test_copy_speed.py::TestCopySpeed::testCopyCPUtoDefault
=============================================================================== short test summary info ===============================================================================
FAILED test/test_sample.py::TestSample::test_sample - AssertionError:
======================================================== 1 failed, 1396 passed, 146 skipped, 3 xfailed, 140 warnings in 23.53s ========================================================
(.venv) [user@macbook ~/src/tinygrad]%


@uuuvn (Contributor, Author) commented May 9, 2024

lol, I've seen that dlopen is slow, but I didn't think it would be 60 times faster with my impl...

@ym1234 (Contributor) commented May 9, 2024

Lol, I was working on this too, but your code looks better, so a couple of notes:

  • I used clang2py to autogenerate the ELF structs/defines and then Struct.from_buffer to read them, which might be a little cleaner.
  • I used -fno-plt and went through the GOT instead, so there's no need to generate stub instructions (sadly, -fno-plt is broken for aarch64 in the current clang release: llvm/llvm-project@201572e). With -fno-plt you can just do away with tinymath.h and EXTERNAL_SYMBOLS and do lookups on ctypes.libc directly. It also makes the clang/llvm graph runner way simpler, since you can pass the ELF objects as a dict and link against them, and it helps with the bug "test_dtype fails with a segfault on LLVM on linux" (#1367), since we can link against our own smaller compiler runtime for both clang and llvm.
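For illustration, the clang2py / Struct.from_buffer approach from the first bullet looks roughly like this. This is a hand-written sketch, not the generated code: the struct layout follows the ELF64 file header spec, and `parse_ehdr` is a hypothetical helper name, not anything in this PR.

```python
import ctypes

# Hand-written equivalent of what clang2py would generate from <elf.h>.
# Field names and layout follow the ELF64 file header specification.
class Elf64_Ehdr(ctypes.Structure):
    _fields_ = [
        ("e_ident", ctypes.c_ubyte * 16),   # magic + class/endianness info
        ("e_type", ctypes.c_uint16),        # ET_REL for a .o file
        ("e_machine", ctypes.c_uint16),     # e.g. EM_AARCH64 = 183
        ("e_version", ctypes.c_uint32),
        ("e_entry", ctypes.c_uint64),
        ("e_phoff", ctypes.c_uint64),
        ("e_shoff", ctypes.c_uint64),       # offset of the section header table
        ("e_flags", ctypes.c_uint32),
        ("e_ehsize", ctypes.c_uint16),
        ("e_phentsize", ctypes.c_uint16),
        ("e_phnum", ctypes.c_uint16),
        ("e_shentsize", ctypes.c_uint16),
        ("e_shnum", ctypes.c_uint16),       # number of sections
        ("e_shstrndx", ctypes.c_uint16),
    ]

def parse_ehdr(obj: bytes) -> Elf64_Ehdr:
    # Read the header straight out of the object file bytes, no parsing code.
    hdr = Elf64_Ehdr.from_buffer_copy(obj)
    assert bytes(hdr.e_ident[:4]) == b"\x7fELF", "not an ELF object"
    return hdr
```

Section headers and relocation entries read the same way, since `from_buffer_copy` takes an optional offset (e.g. `Elf64_Shdr.from_buffer_copy(obj, ehdr.e_shoff + i * ehdr.e_shentsize)`).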

@geohot added the `bounty locked` (Bounty is locked to someone) label May 9, 2024
@geohot (Collaborator) commented May 9, 2024

Cool, bounty locked. There's a lot to clean up here, but yea, dlopen is mad slow.
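For context on the dlopen cost: the old route writes a shared object to disk and pays dynamic-linker bookkeeping on every load, while a jit loader just copies the already-relocated machine code into an executable mapping and casts the address to a function pointer. A minimal sketch of that idea (not the PR's actual loader; note that on Apple Silicon an RWX mapping additionally needs MAP_JIT and pthread_jit_write_protect_np):

```python
import ctypes, mmap

def load_code(machine_code: bytes):
    # Anonymous read/write/execute mapping: no .so file, no dlopen call.
    buf = mmap.mmap(-1, len(machine_code),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(machine_code)
    # Take the mapping's base address and cast it to a C function pointer.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    fxn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return buf, fxn  # keep buf referenced for as long as fxn may be called
```

Calling `fxn()` of course only makes sense once `machine_code` holds valid instructions for the host architecture.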

@geohot (Collaborator) commented May 11, 2024

I don't want to have to compile other functions or deal with relocations.

How can we get the compiler to emit asm for the math functions? All the processors should support it.

On Mac:

  • sqrt is instruction
  • _log2f is call
  • _exp2f is call
  • _sinf is call
  • mul/div are instructions

@geohot (Collaborator) commented May 12, 2024

See the Taylor series bounty, I think the solution here is not to support relocations and (out of file) function calling.

@uuuvn (Contributor, Author) commented May 12, 2024

> See the Taylor series bounty, I think the solution here is not to support relocations and (out of file) function calling.

I'm making a Taylor-series-powered tinymath.h right now (I don't think having it in function.py is a good idea).

Relocation support is required for any function call, even within the same object file (e.g. in your clang graph impl), and for constants (they're loaded from memory, at least on aarch64).
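The Taylor (Maclaurin) series behind such a tinymath.h can be sketched numerically; Python here for brevity, and note that a real header needs argument range reduction first, which this toy version omits, so it is only accurate for small |x|:

```python
def tiny_sin(x: float) -> float:
    # sin x = x - x^3/3! + x^5/5! - x^7/7! + ...
    # Factored Horner-style so each term is built from the previous one.
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))
```

Four terms give an error on the order of x^9/9! (about 5e-9 at x = 0.5), well under float32 precision; reducing the argument into [-π/4, π/4] keeps that bound everywhere.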

@ym1234 (Contributor) commented May 13, 2024

> Relocation support is required for any function call even from the same object file (eg in your clang graph impl) and for constants (they're loaded from memory at least on aarch64)

Even if you don't use constants yourself, the compiler can generate them for vectorization, and I couldn't really find a way to turn that off. The function-call limitation can be worked around by declaring everything except the main func static, though.


This branch currently is behind tinygrad/master. The line count difference bot is disabled.
