Error: make: *** [Makefile:203: train_gpt2cu] Error 255 #359

Open · yushengsu-thu opened this issue May 5, 2024 · 5 comments

@yushengsu-thu (Contributor) commented May 5, 2024

Environment:

  • System: Ubuntu 22.04.2 LTS
  • CUDA Version: cuda_12.1.r12.1/compiler.32688072_0
  • nvcc: 12.1

I encounter an error when I execute:

make train_gpt2cu

Warning and error messages:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/lustre/apps/apps/cuda/cuda-12.1/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_BF16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(201): error: identifier "__ushort_as_bfloat16" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                                    ^

train_gpt2.cu(201): error: identifier "__halves2bfloat162" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                 ^

train_gpt2.cu(203): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__nv_bfloat162 *, __nv_bfloat162)
      atomicAdd(ptr_bf16, add_val);
      ^

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += float
          val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
              ^

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(608): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(mean + idx, (floatX)m);
          ^

train_gpt2.cu(620): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(rstd + idx, (floatX)s);
          ^

train_gpt2.cu(629): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float n = s * ((float)__ldcs(x+c) - m);
                                ^

train_gpt2.cu(630): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(o+c, (floatX)(n * (float)weight[c] + (float)bias[c]));
          ^

train_gpt2.cu(650): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          q[idx] = __ldcs(&inp[inp_idx]);
                   ^

train_gpt2.cu(651): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          k[idx] = __ldcs(&inp[inp_idx + NH * d]);
                   ^

train_gpt2.cu(652): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          v[idx] = __ldcs(&inp[inp_idx + 2 * (NH * d)]);
                   ^

train_gpt2.cu(688): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (floatX *)
          out[other_idx] = __ldcs(&inp[idx]);
                           ^

train_gpt2.cu(769): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float ev = expf(inv_temperature * ((float)__ldcs(x + i) - global_maxval));
                                                    ^

train_gpt2.cu(770): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(out + idx * T + i, (floatX)(ev * norm));
          ^

train_gpt2.cu(924): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float dout_i = (float)__ldcs(&dout_bt[i]);
                                    ^

train_gpt2.cu(925): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float norm_bti = ((float)__ldcs(&inp_bt[i]) - mean_bt) * rstd_bt;
                                       ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                 ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                                                ^

train_gpt2.cu(997): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
              __stcs(dpreatt_bth + t3, (floatX)(scale * acc));
              ^

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX
      if (i < n) { dst[i] += (floatX)src[i]; }
                          ^

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
  const ncclDataType_t ncclFloatN = ncclFloat;
                       ^

20 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255

This question might seem basic, since I'm a beginner in CUDA and C. I would appreciate it if anyone could offer solutions or suggestions.

@ngc92 (Contributor) commented May 5, 2024

You need to either disable BF16 (i.e., build without -DENABLE_BF16) or instruct the compiler to target a more recent GPU architecture (Ampere or newer) that actually has hardware support for bf16.
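
For example, using the Makefile's PRECISION override (quoted in full later in this thread), the bf16 path can be avoided entirely:

PRECISION=FP32 make train_gpt2cu

Note that PRECISION=FP16 can trigger the same class of errors when nvcc still targets an old default architecture, as the next comment shows. Alternatively, keep BF16 but compile for Ampere, e.g. by adding -arch=sm_80 to the nvcc invocation (an illustrative manual edit; -arch is a standard nvcc flag, so check how the Makefile assembles its nvcc flags before changing it).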

@ifromeast

A similar error:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_FP16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/  -I/usr/lib/x86_64-linux-gnu/openmpi/include  -lmpi -lnccl -o train_gpt2cu 
train_gpt2.cu(215): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (half2 *, half2)

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += __half

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced

3 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255

@rosslwheeler (Contributor)

Try upgrading your CUDA version to 12.4.1?
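
Before upgrading, it can help to confirm which toolkit the build is actually picking up:

nvcc --version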

@lancerts (Contributor) commented May 8, 2024

  1. Upgrade nvcc to 12.4.
  2. Check the compute capability of your GPU card against the guards in the source of include/cuda_bf16.h (or .hpp); one way to check it is sketched after this comment. In the header you might see

#if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))

This basically means these functions are not available for compute capability < 8.0.

Note that the header source depends on the CUDA toolkit version: code that fails to compile under 12.1 may compile under 12.4 (this was the case for me).
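
A minimal sketch (not from this repo; the file name check_cc.cu is hypothetical) of querying the compute capability via the CUDA runtime API. Build with nvcc check_cc.cu -o check_cc and run it:

// check_cc.cu: print the compute capability of device 0.
// Native bf16 arithmetic requires major >= 8 (Ampere or newer).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}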

@yanqd0 commented May 20, 2024

By default, PRECISION=BF16.

make
# It is the same as:
PRECISION=BF16 make

Compiling with other options can also resolve this issue.

PRECISION=FP16 make
# or
PRECISION=FP32 make

Related code in Makefile:

# Precision settings, default to bf16 but ability to override
PRECISION ?= BF16
VALID_PRECISIONS := FP32 FP16 BF16
ifeq ($(filter $(PRECISION),$(VALID_PRECISIONS)),)
  $(error Invalid precision $(PRECISION), valid precisions are $(VALID_PRECISIONS))
endif
ifeq ($(PRECISION), FP32)
  PFLAGS = -DENABLE_FP32
else ifeq ($(PRECISION), FP16)
  PFLAGS = -DENABLE_FP16
else
  PFLAGS = -DENABLE_BF16
endif
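
For context, these -DENABLE_* flags select the floatX type seen in the error messages above, roughly along these lines (an illustrative sketch, not the repo's exact code):

#if defined(ENABLE_FP32)
typedef float floatX;
#elif defined(ENABLE_FP16)
typedef half floatX;            // from cuda_fp16.h
#else  // ENABLE_BF16 (the default)
typedef __nv_bfloat16 floatX;   // from cuda_bf16.h
#endif

This is why the missing-intrinsic and operator errors only appear for the FP16/BF16 builds: with PRECISION=FP32, floatX is plain float and no half/bf16 intrinsics are involved.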
