llama 2 70B not usable on tinybox green #4367

Open

wozeparrot opened this issue May 1, 2024 · 3 comments
@wozeparrot
Collaborator

```
CUDA=1 python3 examples/llama.py --gen 2 --size 70B --shard 6 --prompt "Hello." --count 10 --temperature 0 --timing
```

Runs out of host memory at around 90% of the way through loading the weights.
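
For context, the usual way to avoid holding the whole checkpoint in host RAM is to mmap it, so the kernel pages weights in on demand and can drop clean pages under pressure. A minimal sketch of the idea (the checkpoint path is hypothetical; this is not the actual llama.py loader):

```python
import mmap

# Hypothetical checkpoint path; the real llama.py loader differs.
CKPT = "consolidated.00.pth"

with open(CKPT, "rb") as f:
    # The mapping is backed by the page cache, so reading through it
    # doesn't commit ~140GB of process RAM the way f.read() would.
    buf = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    # Slicing copies only the bytes you touch; copying each tensor's
    # slice straight to the GPU avoids a second host-side copy.
    print(f"mapped {len(buf)} bytes, first 8: {buf[:8].hex()}")
    buf.close()
```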

@nimlgen
Copy link
Collaborator

nimlgen commented May 1, 2024

Yeah, NV should be a bit better, but it also OOMs. I wrote a custom allocator to fit it better, but it still OOMs when creating graphs. Is there any way to remove this reserved memory? It fails when the device still has >500MB free and is requesting only ~80MB.
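
One way to see how much memory the driver is actually holding back is to query cuMemGetInfo directly. A minimal ctypes sketch, assuming Linux with libcuda installed and device 0:

```python
import ctypes

# Assumes Linux with the NVIDIA driver installed.
cuda = ctypes.CDLL("libcuda.so.1")

def check(res):
    # Any non-zero CUresult is an error.
    if res != 0:
        raise RuntimeError(f"CUDA driver call failed: {res}")

check(cuda.cuInit(0))
dev = ctypes.c_int()
check(cuda.cuDeviceGet(ctypes.byref(dev), 0))
ctx = ctypes.c_void_p()
# Use the primary context so we share state with other CUDA users.
check(cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev))
check(cuda.cuCtxSetCurrent(ctx))

free, total = ctypes.c_size_t(), ctypes.c_size_t()
check(cuda.cuMemGetInfo_v2(ctypes.byref(free), ctypes.byref(total)))
# (total - free) includes the driver's reserved/overhead memory,
# not just this process's allocations.
print(f"free={free.value >> 20} MiB / total={total.value >> 20} MiB")
```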

@geohot
Collaborator

geohot commented May 3, 2024

It's not the GPU memory that's the problem, AFAIK; it's the host memory not being freed.

@nimlgen
Collaborator

nimlgen commented May 3, 2024

Hmm, I need to retest. I recall trying cuMemcpyHtoD_v2 in copyin, and it still OOMed during GPU buffer allocation.
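
For the retest, a standalone ctypes snippet exercising that path (a pageable host buffer copied to the device via cuMemcpyHtoD_v2; the size and device index are arbitrary) could look like:

```python
import ctypes

# Assumes Linux with the NVIDIA driver installed.
cuda = ctypes.CDLL("libcuda.so.1")

def check(res):
    # Any non-zero CUresult is an error.
    if res != 0:
        raise RuntimeError(f"CUDA driver call failed: {res}")

check(cuda.cuInit(0))
dev = ctypes.c_int()
check(cuda.cuDeviceGet(ctypes.byref(dev), 0))
ctx = ctypes.c_void_p()
check(cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev))
check(cuda.cuCtxSetCurrent(ctx))

# Pageable (non-pinned) host buffer: the driver stages the copy
# through its own pinned buffer instead of pinning this allocation.
src = (ctypes.c_ubyte * (16 << 20))()  # 16 MiB, arbitrary size
dptr = ctypes.c_uint64()               # CUdeviceptr
check(cuda.cuMemAlloc_v2(ctypes.byref(dptr), ctypes.c_size_t(len(src))))
check(cuda.cuMemcpyHtoD_v2(dptr, src, ctypes.c_size_t(len(src))))
check(cuda.cuMemFree_v2(dptr))
print("copyin via cuMemcpyHtoD_v2 ok")
```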
