RuntimeError #16

BlackHandsomeLee · 2024-03-17T13:29:22Z

When I run the training script for the unified model (GRIT), I get an error:
RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlDeviceGetHandleByPciBusId_v2_( pci_id, &nvml_device) INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":1139, please report a bug to PyTorch.

This error involves NVML (NVIDIA Management Library) and is likely related to how PyTorch interacts with CUDA.

Could you please provide the versions of the various packages you were running at that time?
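
For anyone comparing environments, here is a minimal sketch of how the installed versions could be collected. The package list (`torch`, `transformers`, `accelerate`) and the helper name `collect_versions` are assumptions for illustration, not taken from the repository; adjust them to whatever the training script actually depends on.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "accelerate")):
    """Return a dict of package name -> installed version string.

    The default package names are an assumption; pass your own tuple.
    Packages that are not installed are reported as "not installed".
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one "name: version" line per entry for pasting into an issue.
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

PyTorch also ships its own report, `python -m torch.utils.collect_env`, which additionally lists the CUDA and driver versions that this sketch does not cover.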

Muennighoff · 2024-03-17T15:49:12Z

I've added our torch version here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#run
Let me know if it's still not clear!
