this is me being a bit paranoid, but in test_gpt2.cu we check that our code agrees with the pytorch reference. we're currently using a single global threshold of 1e-2 for all comparisons. instead, we could compare the parameter gradients parameter by parameter, and tune the threshold per parameter, making it as low as we can, maybe eyeballing a ~10% buffer on top. otherwise my concern is that one global 1e-2 could be too loose for some of these parameter gradients in absolute terms, and we could be making silent errors with new kernels. when a new kernel "trips the wire", we should inspect manually and carefully that things are ok despite tripping the check; if they are, it's okay to increase the bound.
the code for checking all parameters is already there, but commented out.
would welcome a PR that digs into this on a per-parameter basis and looks at what thresholds we can get away with in this comparison.
adding a todo