
Add support for AMD devices #413

Closed · wants to merge 5 commits

Conversation

anthonix (Contributor)

This unobtrusively adds support for AMD devices, in a way that minimizes both changes to existing code and the amount of new code added.
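As context for what an unobtrusive port like this usually looks like (the PR diff itself isn't reproduced here), below is a minimal sketch of the common CUDA-to-HIP compatibility-shim pattern. The `BUILD_AMD` macro name and the exact set of aliases are illustrative assumptions, not names taken from this PR.

```c
/* Minimal sketch of a CUDA-to-HIP compatibility shim (illustrative only;
 * the BUILD_AMD macro and the exact set of aliases are assumptions, not
 * taken from this PR's diff). The idea: alias the handful of CUDA runtime
 * symbols the codebase uses to their HIP equivalents, so the existing
 * sources compile for AMD GPUs with essentially no other changes. */
#ifdef BUILD_AMD
#include <hip/hip_runtime.h>
#define cudaError_t              hipError_t
#define cudaSuccess              hipSuccess
#define cudaMalloc               hipMalloc
#define cudaFree                 hipFree
#define cudaMemcpy               hipMemcpy
#define cudaMemcpyHostToDevice   hipMemcpyHostToDevice
#define cudaMemcpyDeviceToHost   hipMemcpyDeviceToHost
#define cudaDeviceSynchronize    hipDeviceSynchronize
#define cudaGetErrorString       hipGetErrorString
#else
#include <cuda_runtime.h>
#endif
```

AMD's hipify-perl and hipify-clang tools automate exactly this kind of translation, which is why ports in this style tend to touch very little code.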

Performance with bfloat16 on a 7900 XTX is ~50,000 toks/sec on a single GPU and ~210,000 toks/sec on 4x GPUs (as a frame of reference, the latest PyTorch 2.4.0.dev20240513 runs at ~42,000 toks/sec on a single device).

Should this be merged here, or maintained as a separate fork?

anthonix (Contributor, Author)

FWIW, I've been experimenting with more aggressive AMD-specific optimizations that yield good gains over this baseline, but I thought it was worth getting feedback on how baseline AMD support could be integrated first.

karpathy (Owner)

This is very interesting to browse through and see! But yes, I think a separate fork makes a lot more sense for AMD, and I'd be super happy to link to it in the notable forks section. It's cool that you seem to be getting some very nice throughputs!

anthonix (Contributor, Author)

Sounds good, will open another pull request with a link to the fork in the README.

anthonix closed this on May 16, 2024.