Yeah, NV should be a bit better, but it also OOMs. I wrote a custom allocator to fit the model better, but it still OOMs when creating graphs. Is there any way to remove this reserved memory? (For example, it stops while it still has >500 MB and is requesting only ~80 MB.)
CUDA=1 python3 examples/llama.py --gen 2 --size 70B --shard 6 --prompt "Hello." --count 10 --temperature 0 --timing
It runs out of host memory after loading about 90% of the weights.
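For a rough sense of scale, here is a back-of-the-envelope sketch of the weight footprint for this run. It assumes 2 bytes per parameter (fp16) and a nominal 70e9 parameters split evenly across the 6 shards. Neither figure was measured here, and the helper name `weight_footprint_gib` is made up for illustration. It counts weights only and ignores activations, the KV cache, graph buffers, and any temporary host copies made during loading.

```python
def weight_footprint_gib(n_params: float, bytes_per_param: int, n_shards: int) -> tuple[float, float]:
    """Return (total GiB, GiB per shard) for the raw weights alone."""
    total_gib = n_params * bytes_per_param / 2**30
    return total_gib, total_gib / n_shards


# Assumptions: nominal 70e9 parameters, fp16 (2 bytes each), 6 equal shards.
total_gib, per_shard_gib = weight_footprint_gib(70e9, 2, 6)
print(f"total ~{total_gib:.1f} GiB, per shard ~{per_shard_gib:.1f} GiB")
```

If the loader holds the full set of weights in host RAM before moving them to the devices, the total figure is the relevant one for host memory, not the per-shard one.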