llama 2 70B not usable on tinybox green #4367

Open

wozeparrot opened this issue May 1, 2024 · 3 comments
@wozeparrot
Collaborator

```
CUDA=1 python3 examples/llama.py --gen 2 --size 70B --shard 6 --prompt "Hello." --count 10 --temperature 0 --timing
```

Runs out of host memory at around 90% of the way through loading the weights.
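
For context, the usual way to avoid holding the whole checkpoint in host RAM is to mmap it, so the kernel pages weights in on demand and can drop clean pages under pressure. A minimal sketch of the idea (the checkpoint path is hypothetical; this is not the actual llama.py loader):

```python
import mmap

# Hypothetical checkpoint path; the real llama.py loader differs.
CKPT = "consolidated.00.pth"

with open(CKPT, "rb") as f:
    # The mapping is backed by the page cache, so reading through it
    # doesn't commit ~140GB of process RAM the way f.read() would.
    buf = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    # Slicing copies only the bytes you touch; copying each tensor's
    # slice straight to the GPU avoids a second host-side copy.
    print(f"mapped {len(buf)} bytes, first 8: {buf[:8].hex()}")
    buf.close()
```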

@nimlgen
Copy link
Collaborator

nimlgen commented May 1, 2024

Yeah, NV should be a bit better, but it also OOMs. I wrote a custom allocator to fit it better, but it still OOMs when creating graphs. Is there any way to remove this reserved memory? It fails when the device still has >500MB free and is requesting only ~80MB.
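
One way to see how much memory the driver is actually holding back is to query cuMemGetInfo directly. A minimal ctypes sketch, assuming Linux with libcuda installed and device 0:

```python
import ctypes

# Assumes Linux with the NVIDIA driver installed.
cuda = ctypes.CDLL("libcuda.so.1")

def check(res):
    # Any non-zero CUresult is an error.
    if res != 0:
        raise RuntimeError(f"CUDA driver call failed: {res}")

check(cuda.cuInit(0))
dev = ctypes.c_int()
check(cuda.cuDeviceGet(ctypes.byref(dev), 0))
ctx = ctypes.c_void_p()
# Use the primary context so we share state with other CUDA users.
check(cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev))
check(cuda.cuCtxSetCurrent(ctx))

free, total = ctypes.c_size_t(), ctypes.c_size_t()
check(cuda.cuMemGetInfo_v2(ctypes.byref(free), ctypes.byref(total)))
# (total - free) includes the driver's reserved/overhead memory,
# not just this process's allocations.
print(f"free={free.value >> 20} MiB / total={total.value >> 20} MiB")
```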

@geohot
Collaborator

geohot commented May 3, 2024

It's not the GPU memory that's the problem, AFAIK; it's the host memory not being freed.

@nimlgen
Collaborator

nimlgen commented May 3, 2024

Hmm, I need to retest. I recall trying cuMemcpyHtoD_v2 in copyin, and it still OOMed during GPU buffer allocation.
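
For the retest, a standalone ctypes snippet exercising that path (a pageable host buffer copied to the device via cuMemcpyHtoD_v2; the size and device index are arbitrary) could look like:

```python
import ctypes

# Assumes Linux with the NVIDIA driver installed.
cuda = ctypes.CDLL("libcuda.so.1")

def check(res):
    # Any non-zero CUresult is an error.
    if res != 0:
        raise RuntimeError(f"CUDA driver call failed: {res}")

check(cuda.cuInit(0))
dev = ctypes.c_int()
check(cuda.cuDeviceGet(ctypes.byref(dev), 0))
ctx = ctypes.c_void_p()
check(cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev))
check(cuda.cuCtxSetCurrent(ctx))

# Pageable (non-pinned) host buffer: the driver stages the copy
# through its own pinned buffer instead of pinning this allocation.
src = (ctypes.c_ubyte * (16 << 20))()  # 16 MiB, arbitrary size
dptr = ctypes.c_uint64()               # CUdeviceptr
check(cuda.cuMemAlloc_v2(ctypes.byref(dptr), ctypes.c_size_t(len(src))))
check(cuda.cuMemcpyHtoD_v2(dptr, src, ctypes.c_size_t(len(src))))
check(cuda.cuMemFree_v2(dptr))
print("copyin via cuMemcpyHtoD_v2 ok")
```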
