
Clang jit #4492 (Draft)

uuuvn wants to merge 31 commits into master from clang_jit
Conversation

@uuuvn (Contributor) commented May 9, 2024

No description provided.

@uuuvn (Contributor, Author) commented May 9, 2024

It mostly works on aarch64; everything fails on x86. I'll make the last few tests pass on aarch64 and then move on to x86.

@uuuvn (Contributor, Author) commented May 9, 2024

1396.63s => 23.53s on m1 max, ~60x improvement

(.venv) [user@macbook ~/src/tinygrad]% git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
(.venv) [user@macbook ~/src/tinygrad]% CLANG=1 python -m pytest -n=auto test/ --ignore=test/external --ignore=test/models --durations=20
================================================================================= test session starts =================================================================================
platform darwin -- Python 3.11.9, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/user/src/tinygrad
plugins: hypothesis-6.92.0, xdist-3.5.0
10 workers [1546 items]
...sssssss........................x............sssssssss........sss..........................xss.................................s...sss.....s................sssssssss.....sss [ 11%]
s.................sss.....s.........sss......ssssssssssss..............ssssss..sssssssss...............s....s............s..................................................... [ 22%]
............................s.......s...........................................s......ss..............................................s.............................s......... [ 33%]
..........................F..........s...............................................................................................................s......................... [ 45%]
....................................s.................................s......................................sssss............s.x..............s.....s...s...ss.....s....s..... [ 56%]
....s...........................s.......................................................................s.....s....................s.........ssss.s............................ [ 67%]
......................s.....ss...............s.ssssssss...s....s.............................s........s...s...................................................sss....sss....... [ 79%]
........................................................................................................s...................................................................... [ 90%]
......................................................s.......................s......ss............................................ssss..sss...s.. [100%]
====================================================================================== FAILURES =======================================================================================
___________________________________________________________________________ TestFusionOp.test_recursive_add ___________________________________________________________________________
[gw6] darwin -- Python 3.11.9 /Users/user/src/tinygrad/.venv/bin/python3

self = <test.test_fusion_op.TestFusionOp testMethod=test_recursive_add>

def test_recursive_add(self):
  st = time.perf_counter()
  a = Tensor([1,2,3,4])
  for _ in range(24): a = a + a
  sched = create_schedule([a.lazydata], None)
  ji = Device[Device.DEFAULT].get_runner(*sched[-1].ast)
  self.assertLess(time.perf_counter()-st, 1.0)

E AssertionError: 2.480366833000062 not less than 1.0

test/test_fusion_op.py:31: AssertionError
================================================================================== warnings summary ===================================================================================
test/test_gc.py::TestGC::test_gc
test/test_gc.py::TestGC::test_gc_complex
/Users/user/src/tinygrad/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:347: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
/Users/user/src/tinygrad/test/test_dtype.py:51: RuntimeWarning: overflow encountered in cast
_test_op(lambda: a.cast(target_dtype), target_dtype, list(a.numpy().astype(target_dtype.np)))

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
/Users/user/src/tinygrad/tinygrad/tensor.py:120: RuntimeWarning: invalid value encountered in cast
else: data = _fromcpu(data.astype(dtype.np) if dtype is not None and dtype.np is not None else data)

test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in subtract
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float16
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in add
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py: 19 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sqrt
numpy_value = op1

test/test_dtype_alu.py: 18 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in log
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in log
numpy_value = op1

test/test_dtype_alu.py: 13 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sin
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in reciprocal
numpy_value = op1

test/test_dtype_alu.py: 14 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: overflow encountered in exp
numpy_value = op1

test/test_dtype_alu.py: 78 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in cast
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in subtract
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in add
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================ slowest 20 durations =================================================================================
340.18s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet_train_step
267.86s call test/imported/test_indexing.py::TestIndexing::test_advancedindex
243.90s call test/test_multitensor.py::TestMultiTensor::test_simple_reduce
228.07s call test/test_multitensor.py::TestShrinkMultiTensorShardedAxis::test_ops
195.03s call test/test_dtype.py::TestAutoCastType::test_int_to_float_unary_func
185.23s call test/test_ops.py::TestOps::test_broadcast_partial
178.40s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet
155.75s call test/test_ops.py::TestOps::test_einsum
149.40s call test/test_ops.py::TestOps::test_conv2d
140.84s call test/test_ops.py::TestOps::test_conv2d_bs_4_cin_1
136.36s call test/test_dtype.py::TestInt8Dtype::test_upcast_ops
133.97s call test/test_ops.py::TestOps::test_conv1d
127.57s call test/unit/test_disk_tensor.py::TestSafetensors::test_efficientnet_safetensors
123.90s call test/test_ops.py::TestOps::test_conv2d_bs_4_cin_3
118.71s call test/test_multitensor.py::TestMultiTensor::test_uneven_shard
118.35s call test/imported/test_indexing.py::TestIndexing::test_index
117.26s call test/test_ops.py::TestOps::test_conv2d_bs_1_cin_1
116.80s call test/test_dtype.py::TestUint8Dtype::test_upcast_ops
114.29s call test/test_ops.py::TestOps::test_pad_slice
112.93s call test/test_multitensor.py::TestShrinkMultiTensorShardedAxis::test_unsynced_backprop_conv_bn
=============================================================================== short test summary info ===============================================================================
FAILED test/test_fusion_op.py::TestFusionOp::test_recursive_add - AssertionError: 2.480366833000062 not less than 1.0
================================================== 1 failed, 1396 passed, 146 skipped, 3 xfailed, 178 warnings in 1396.63s (0:23:16) ==================================================
(.venv) [user@macbook ~/src/tinygrad]% git checkout clang_jit
Switched to branch 'clang_jit'
Your branch is up to date with 'fork/clang_jit'.
(.venv) [user@macbook ~/src/tinygrad]% CLANG=1 python -m pytest -n=auto test/ --ignore=test/external --ignore=test/models --durations=20
================================================================================= test session starts =================================================================================
platform darwin -- Python 3.11.9, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/user/src/tinygrad
plugins: hypothesis-6.92.0, xdist-3.5.0
10 workers [1546 items]
.sssssss...........................x.............sss.........................................s.....................................xss.......s...sss.s......................... [ 11%]
.....s.............sssssssss.......................ssssss.....................................................................s...................................s............ [ 22%]
.......................................s......s........................ssss..sss..........................s.......................sssssssss............s.s..................... [ 33%]
...........sssssssss.............................ss.......................................s.......ss...ss....s..............................s..........................s....... [ 45%]
................................s.....s...............................x................sssssssss.................................sss........s...............................s.s [ 56%]
sssssssssss..................................................s.s.................F................s....................................s.......................s..........s.sss [ 67%]
...sssss....s....s........................s.......ss................sss...s.....s...s................s.......................................................sss............... [ 79%]
.........sssss..ss...........................................................................................................s....................sss.......................... [ 90%]
............................................................................................s............................................s........ [100%]
====================================================================================== FAILURES =======================================================================================
_______________________________________________________________________________ TestSample.test_sample ________________________________________________________________________________
[gw0] darwin -- Python 3.11.9 /Users/user/src/tinygrad/.venv/bin/python3

self = <test.test_sample.TestSample testMethod=test_sample>

def test_sample(self):
  X = Tensor.rand(10000, 50).realize()
  BS = 16
  idxs = np.random.randint(0, X.shape[0], size=(BS))
  # this uncovered a bug with arg sort order
  batch = [Variable(f'idx{i}', 0, X.shape[0]-1).bind(s) for i,s in enumerate(idxs.tolist())]
  x = Tensor.cat(*[X.shrink(((batch[i], batch[i]+1), None)) for i in range(BS)])
  print(idxs)
  ret = x.numpy()
  base = X.numpy()[idxs]
  np.testing.assert_equal(ret, base)

test/test_sample.py:17:


args = (, array([[3.28800082e-01, 9.07372057e-01, 2.37981558e-01, 2.16014087e-01, 1.49323285e-01, 4.698...43615544e-01,
7.18499601e-01, 9.41257715e-01, 2.58335233e-01, 9.06049430e-01, 6.31195784e-01]], dtype=float32))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}

@wraps(func)
def inner(*args, **kwds):
    with self._recreate_cm():
      return func(*args, **kwds)

E AssertionError:
E Arrays are not equal
E
E Mismatched elements: 450 / 800 (56.2%)
E Max absolute difference: 0.9638492
E Max relative difference: 271.94272
E x: array([[3.288001e-01, 9.073721e-01, 2.379816e-01, 2.160141e-01, 1.493233e-01, 4.698067e-01, 8.117076e-01, 7.092213e-01, 3.878354e-01, 2.677359e-01,
E 1.999879e-01, 8.964562e-01, 4.113811e-02, 5.459162e-01, 4.983971e-01, 5.667402e-01, 3.959671e-01, 6.422870e-01, 8.493959e-01, 2.250167e-01,
E 6.323931e-01, 5.950993e-01, 8.642179e-01, 5.737190e-01, 1.366822e-01, 6.082999e-01, 1.459509e-02, 1.543481e-01, 4.873741e-01, 1.395370e-01,...
E y: array([[3.288001e-01, 9.073721e-01, 2.379816e-01, 2.160141e-01, 1.493233e-01, 4.698067e-01, 8.117076e-01, 7.092213e-01, 3.878354e-01, 2.677359e-01,
E 1.999879e-01, 8.964562e-01, 4.113811e-02, 5.459162e-01, 4.983971e-01, 5.667402e-01, 3.959671e-01, 6.422870e-01, 8.493959e-01, 2.250167e-01,
E 6.323931e-01, 5.950993e-01, 8.642179e-01, 5.737190e-01, 1.366822e-01, 6.082999e-01, 1.459509e-02, 1.543481e-01, 4.873741e-01, 1.395370e-01,...

/opt/homebrew/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:81: AssertionError
-------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
[ 86 5894 7979 3852 9064 3467 2933 1694 9164 5181 9148 9944 3641 8154 3293 5925]
================================================================================== warnings summary ===================================================================================
test/test_dtype.py::TestHalfDtype::test_casts_to
/Users/user/src/tinygrad/test/test_dtype.py:51: RuntimeWarning: overflow encountered in cast
_test_op(lambda: a.cast(target_dtype), target_dtype, list(a.numpy().astype(target_dtype.np)))

test/test_dtype.py::TestHalfDtype::test_casts_to
test/test_dtype.py::TestFloatDType::test_casts_to
test/test_dtype.py::TestDoubleDtype::test_casts_to
/Users/user/src/tinygrad/tinygrad/tensor.py:120: RuntimeWarning: invalid value encountered in cast
else: data = _fromcpu(data.astype(dtype.np) if dtype is not None and dtype.np is not None else data)

test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float16
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: invalid value encountered in subtract
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py: 21 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in log
numpy_value = op1

test/test_dtype_alu.py: 12 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sqrt
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in log
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: divide by zero encountered in reciprocal
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
test/test_dtype_alu.py::TestDTypeALU::test_float16_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: invalid value encountered in sin
numpy_value = op1

test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: overflow encountered in add
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float32
test/test_dtype_alu.py::TestDTypeALU::test_float64
/Users/user/src/tinygrad/test/test_dtype_alu.py:62: RuntimeWarning: overflow encountered in multiply
numpy_value = op[1](np.array([a]).astype(dtype.np), np.array([b]).astype(dtype.np))

test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
test/test_dtype_alu.py::TestDTypeALU::test_float32_unary
/Users/user/src/tinygrad/test/test_dtype_alu.py:73: RuntimeWarning: overflow encountered in exp
numpy_value = op1

test/test_gc.py::TestGC::test_gc
test/test_gc.py::TestGC::test_gc_complex
/Users/user/src/tinygrad/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:347: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn(

test/test_dtype_alu.py: 63 warnings
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in cast
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in subtract
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: overflow encountered in add
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
/Users/user/src/tinygrad/test/test_dtype_alu.py:92: RuntimeWarning: invalid value encountered in multiply
numpy_value = op2[1](op1[1](an, bn).astype(d2.np), cn)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================ slowest 20 durations =================================================================================
6.26s call test/testextra/test_lr_scheduler.py::TestLrScheduler::test_onecyclelr
5.58s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet
5.24s call test/test_multitensor.py::TestMultiTensor::test_data_parallel_resnet_train_step
3.89s call test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_7
3.55s call test/unit/test_shapetracker.py::TestIndexExpressions2d::test_reshape_combining_4
3.32s call test/test_copy_speed.py::TestCopySpeed::testCopyCPUtoDefaultFresh
3.02s call test/test_linearizer.py::TestHandCodedOpts::test_masked_upcast_wino_full
2.08s call test/test_speed_v_torch.py::TestSpeed::test_permute
1.95s call test/unit/test_shm_tensor.py::TestRawShmBuffer::test_e2e_big
1.79s call test/test_nn.py::TestNN::test_conv_transpose2d
1.67s call test/imported/test_indexing.py::TestIndexing::test_advancedindex
1.49s call test/test_net_speed.py::TestConvSpeed::test_mnist
1.39s call test/test_ops.py::TestOps::test_einsum
1.39s call test/test_speed_v_torch.py::TestSpeed::test_add
1.30s call test/test_fuzz_shape_ops.py::TestShapeOps::test_split
1.30s call test/test_dtype_alu.py::TestDTypeALU::test_int32_midcast_float
1.26s call test/test_dtype_alu.py::TestDTypeALU::test_float_midcast_int32
1.25s call test/test_dtype.py::TestAutoCastType::test_int_to_float_unary_func
1.24s call test/test_speed_v_torch.py::TestSpeed::test_pow
1.23s call test/test_copy_speed.py::TestCopySpeed::testCopyCPUtoDefault
=============================================================================== short test summary info ===============================================================================
FAILED test/test_sample.py::TestSample::test_sample - AssertionError:
======================================================== 1 failed, 1396 passed, 146 skipped, 3 xfailed, 140 warnings in 23.53s ========================================================
(.venv) [user@macbook ~/src/tinygrad]%


@uuuvn (Contributor, Author) commented May 9, 2024

lol, I've seen that dlopen is slow, but I didn't think it would be 60 times faster with my impl...

@ym1234 (Contributor) commented May 9, 2024

Lol, I was working on this too, but your code looks better, so a couple of notes:

  • I used clang2py to autogenerate the ELF structs/defines and then Struct.from_buffer to read them, which might be a little cleaner.
  • I used -fno-plt and went through the GOT instead, so there's no need to generate stub instructions (sadly, -fno-plt is broken for aarch64 in the current clang release: llvm/llvm-project@201572e). With -fno-plt you can just do away with tinymath.h and EXTERNAL_SYMBOLS and do lookups on ctypes.libc directly. It also makes the clang/llvm graph runner way simpler, since you can pass the ELF objects as a dict and link against them, and it helps with the bug "test_dtype fails with a segfault on LLVM on linux" (#1367), since we can link against our own smaller compiler runtime for both clang and llvm.
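For illustration, the clang2py / Struct.from_buffer approach from the first bullet looks roughly like this. This is a hand-written sketch, not the generated code: the struct layout follows the ELF64 file header spec, and `parse_ehdr` is a hypothetical helper name, not anything in this PR.

```python
import ctypes

# Hand-written equivalent of what clang2py would generate from <elf.h>.
# Field names and layout follow the ELF64 file header specification.
class Elf64_Ehdr(ctypes.Structure):
    _fields_ = [
        ("e_ident", ctypes.c_ubyte * 16),   # magic + class/endianness info
        ("e_type", ctypes.c_uint16),        # ET_REL for a .o file
        ("e_machine", ctypes.c_uint16),     # e.g. EM_AARCH64 = 183
        ("e_version", ctypes.c_uint32),
        ("e_entry", ctypes.c_uint64),
        ("e_phoff", ctypes.c_uint64),
        ("e_shoff", ctypes.c_uint64),       # offset of the section header table
        ("e_flags", ctypes.c_uint32),
        ("e_ehsize", ctypes.c_uint16),
        ("e_phentsize", ctypes.c_uint16),
        ("e_phnum", ctypes.c_uint16),
        ("e_shentsize", ctypes.c_uint16),
        ("e_shnum", ctypes.c_uint16),       # number of sections
        ("e_shstrndx", ctypes.c_uint16),
    ]

def parse_ehdr(obj: bytes) -> Elf64_Ehdr:
    # Read the header straight out of the object file bytes, no parsing code.
    hdr = Elf64_Ehdr.from_buffer_copy(obj)
    assert bytes(hdr.e_ident[:4]) == b"\x7fELF", "not an ELF object"
    return hdr
```

Section headers and relocation entries read the same way, since `from_buffer_copy` takes an optional offset (e.g. `Elf64_Shdr.from_buffer_copy(obj, ehdr.e_shoff + i * ehdr.e_shentsize)`).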

@geohot added the `bounty locked` (Bounty is locked to someone) label May 9, 2024
@geohot (Collaborator) commented May 9, 2024

Cool, bounty locked. There's a lot to clean up here, but yea, dlopen is mad slow.
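For context on the dlopen cost: the old route writes a shared object to disk and pays dynamic-linker bookkeeping on every load, while a jit loader just copies the already-relocated machine code into an executable mapping and casts the address to a function pointer. A minimal sketch of that idea (not the PR's actual loader; note that on Apple Silicon an RWX mapping additionally needs MAP_JIT and pthread_jit_write_protect_np):

```python
import ctypes, mmap

def load_code(machine_code: bytes):
    # Anonymous read/write/execute mapping: no .so file, no dlopen call.
    buf = mmap.mmap(-1, len(machine_code),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(machine_code)
    # Take the mapping's base address and cast it to a C function pointer.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    fxn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return buf, fxn  # keep buf referenced for as long as fxn may be called
```

Calling `fxn()` of course only makes sense once `machine_code` holds valid instructions for the host architecture.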

@geohot (Collaborator) commented May 11, 2024

I don't want to have to compile other functions or deal with relocations.

How can we get the compiler to emit asm for the math functions? All the processors should support it.

On Mac:

  • sqrt is instruction
  • _log2f is call
  • _exp2f is call
  • _sinf is call
  • mul/div are instructions

@geohot (Collaborator) commented May 12, 2024

See the Taylor series bounty, I think the solution here is not to support relocations and (out of file) function calling.

@uuuvn (Contributor, Author) commented May 12, 2024

> See the Taylor series bounty, I think the solution here is not to support relocations and (out of file) function calling.

I'm making a Taylor-series-powered tinymath.h right now (I don't think having it in function.py is a good idea).

Relocation support is required for any function call, even within the same object file (e.g. in your clang graph impl), and for constants (they're loaded from memory, at least on aarch64).
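The Taylor (Maclaurin) series behind such a tinymath.h can be sketched numerically; Python here for brevity, and note that a real header needs argument range reduction first, which this toy version omits, so it is only accurate for small |x|:

```python
def tiny_sin(x: float) -> float:
    # sin x = x - x^3/3! + x^5/5! - x^7/7! + ...
    # Factored Horner-style so each term is built from the previous one.
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))
```

Four terms give an error on the order of x^9/9! (about 5e-9 at x = 0.5), well under float32 precision; reducing the argument into [-π/4, π/4] keeps that bound everywhere.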

@ym1234 (Contributor) commented May 13, 2024

> Relocation support is required for any function call even from the same object file (eg in your clang graph impl) and for constants (they're loaded from memory at least on aarch64)

Even if you don't use constants yourself, the compiler can generate them for vectorization, and I couldn't really find a way to turn that off. The function-call limitation can be worked around by declaring everything except the main func static, though.


This branch currently is behind tinygrad/master. The line count difference bot is disabled.
