
test_gpt2.cu correctness bounds tune per-parameter #223

Open
karpathy opened this issue Apr 22, 2024 · 1 comment

@karpathy (Owner)
adding a todo

This is me being a bit paranoid, but in test_gpt2.cu we check that our code agrees with the PyTorch reference, and we're using a single global threshold of 1e-2 for all comparisons. Instead, we could compare the gradients on the parameters parameter by parameter, and tune the threshold per parameter, as low as we can make it, maybe eyeballing a ~10% buffer on top. My concern is that one global 1e-2 could be too large for some of these parameter gradients in absolute terms, and we could be making silent errors with new kernels. When a new kernel "trips the wire", we should inspect manually and carefully that things are ok despite tripping the check; if they are, it's okay to increase the bound.

The code for checking all parameters is already there, but commented out.
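For concreteness, a minimal standalone sketch of what the per-parameter comparison could look like. The check_tensor helper here is written from scratch for illustration (the repo's own commented-out code may differ), and the per-tensor tolerances are made-up placeholders that a PR would replace with measured maxdiffs plus the ~10% buffer:

```c
#include <stdio.h>
#include <math.h>

// returns 1 if every element of `calculated` is within `tol` of `reference`;
// also prints the observed maxdiff so tolerances can be tuned per tensor
int check_tensor(const float* calculated, const float* reference,
                 size_t n, const char* label, float tol) {
    float maxdiff = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float diff = fabsf(calculated[i] - reference[i]);
        if (diff > maxdiff) maxdiff = diff;
    }
    int ok = (maxdiff <= tol);
    printf("%-6s maxdiff %e vs tol %e: %s\n", label, maxdiff, tol, ok ? "OK" : "NOT OK");
    return ok;
}

int main(void) {
    // stand-in data; in test_gpt2.cu these would be the gradients copied
    // off the GPU and the corresponding PyTorch reference gradients
    float calc[] = {1.000f, 2.001f, -0.500f, 0.250f};
    float ref[]  = {1.000f, 2.000f, -0.500f, 0.250f};
    size_t n = sizeof(calc) / sizeof(calc[0]);

    // one tuned bound per parameter tensor instead of a single global 1e-2;
    // the numbers below are invented for the demo, not measured values
    int allok = 1;
    allok &= check_tensor(calc, ref, n, "wte",  1e-2f);  // token embeddings
    allok &= check_tensor(calc, ref, n, "lnfw", 5e-3f);  // final layernorm weight
    // ... one call per parameter tensor, each with its own tuned tolerance
    printf("overall: %s\n", allok ? "OK" : "NOT OK");
    return allok ? 0 : 1;
}
```

Logging the maxdiff next to the tolerance makes the tuning loop cheap: run the test once, read off the per-tensor maxdiffs, and set each bound to that value plus the buffer.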

Would welcome a PR that digs into this on a per-parameter basis and looks at what thresholds we can get away with in this comparison.

@swayam0322

Still a beginner at writing kernels; I would love to work on this issue and dig deep into experimenting with the weights this summer.
