Error: make: *** [Makefile:203: train_gpt2cu] Error 255 #359

Open · yushengsu-thu opened this issue May 5, 2024 · 5 comments

@yushengsu-thu (Contributor) commented May 5, 2024

Environment:

  • System: Ubuntu 22.04.2 LTS
  • CUDA Version: cuda_12.1.r12.1/compiler.32688072_0
  • nvcc: 12.1

I encounter an error when I execute:

make train_gpt2cu

Warning and error messages:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/lustre/apps/apps/cuda/cuda-12.1/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_BF16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(201): error: identifier "__ushort_as_bfloat16" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                                    ^

train_gpt2.cu(201): error: identifier "__halves2bfloat162" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                 ^

train_gpt2.cu(203): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__nv_bfloat162 *, __nv_bfloat162)
      atomicAdd(ptr_bf16, add_val);
      ^

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += float
          val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
              ^

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(608): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(mean + idx, (floatX)m);
          ^

train_gpt2.cu(620): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(rstd + idx, (floatX)s);
          ^

train_gpt2.cu(629): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float n = s * ((float)__ldcs(x+c) - m);
                                ^

train_gpt2.cu(630): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(o+c, (floatX)(n * (float)weight[c] + (float)bias[c]));
          ^

train_gpt2.cu(650): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          q[idx] = __ldcs(&inp[inp_idx]);
                   ^

train_gpt2.cu(651): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          k[idx] = __ldcs(&inp[inp_idx + NH * d]);
                   ^

train_gpt2.cu(652): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          v[idx] = __ldcs(&inp[inp_idx + 2 * (NH * d)]);
                   ^

train_gpt2.cu(688): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (floatX *)
          out[other_idx] = __ldcs(&inp[idx]);
                           ^

train_gpt2.cu(769): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float ev = expf(inv_temperature * ((float)__ldcs(x + i) - global_maxval));
                                                    ^

train_gpt2.cu(770): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(out + idx * T + i, (floatX)(ev * norm));
          ^

train_gpt2.cu(924): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float dout_i = (float)__ldcs(&dout_bt[i]);
                                    ^

train_gpt2.cu(925): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float norm_bti = ((float)__ldcs(&inp_bt[i]) - mean_bt) * rstd_bt;
                                       ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                 ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                                                ^

train_gpt2.cu(997): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
              __stcs(dpreatt_bth + t3, (floatX)(scale * acc));
              ^

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX
      if (i < n) { dst[i] += (floatX)src[i]; }
                          ^

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
  const ncclDataType_t ncclFloatN = ncclFloat;
                       ^

20 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255

This question might seem basic, since I'm a beginner in CUDA and C. I would appreciate it if anyone could offer solutions or suggestions.

@ngc92 (Contributor) commented May 5, 2024

You need to either disable BF16 (i.e., build without -DENABLE_BF16) or instruct the compiler to target a more recent GPU architecture (Ampere or newer) that actually has hardware support for bf16.
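
For example, using the Makefile's PRECISION override (quoted in full later in this thread), the bf16 path can be avoided entirely:

PRECISION=FP32 make train_gpt2cu

Note that PRECISION=FP16 can trigger the same class of errors when nvcc still targets an old default architecture, as the next comment shows. Alternatively, keep BF16 but compile for Ampere, e.g. by adding -arch=sm_80 to the nvcc invocation (an illustrative manual edit; -arch is a standard nvcc flag, so check how the Makefile assembles its nvcc flags before changing it).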

@ifromeast

A similar error:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_FP16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/  -I/usr/lib/x86_64-linux-gnu/openmpi/include  -lmpi -lnccl -o train_gpt2cu 
train_gpt2.cu(215): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (half2 *, half2)

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += __half

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced

3 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255

@rosslwheeler (Contributor)

Try upgrading your CUDA version to 12.4.1?
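
Before upgrading, it can help to confirm which toolkit the build is actually picking up:

nvcc --version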

@lancerts (Contributor) commented May 8, 2024

  1. Upgrade nvcc to 12.4.
  2. Check the compute capability of your GPU card against the guards in the source of include/cuda_bf16.h (or .hpp); one way to check it is sketched after this comment. In the header you might see

#if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))

This basically means these functions are not available for compute capability < 8.0.

Note that the header source depends on the CUDA toolkit version: code that fails to compile under 12.1 may compile under 12.4 (this was the case for me).
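
A minimal sketch (not from this repo; the file name check_cc.cu is hypothetical) of querying the compute capability via the CUDA runtime API. Build with nvcc check_cc.cu -o check_cc and run it:

// check_cc.cu: print the compute capability of device 0.
// Native bf16 arithmetic requires major >= 8 (Ampere or newer).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}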

@yanqd0 commented May 20, 2024

By default, PRECISION=BF16.

make
# It is the same as:
PRECISION=BF16 make

Compiling with other options can also resolve this issue.

PRECISION=FP16 make
# or
PRECISION=FP32 make

Related code in Makefile:

# Precision settings, default to bf16 but ability to override
PRECISION ?= BF16
VALID_PRECISIONS := FP32 FP16 BF16
ifeq ($(filter $(PRECISION),$(VALID_PRECISIONS)),)
  $(error Invalid precision $(PRECISION), valid precisions are $(VALID_PRECISIONS))
endif
ifeq ($(PRECISION), FP32)
  PFLAGS = -DENABLE_FP32
else ifeq ($(PRECISION), FP16)
  PFLAGS = -DENABLE_FP16
else
  PFLAGS = -DENABLE_BF16
endif
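
For context, these -DENABLE_* flags select the floatX type seen in the error messages above, roughly along these lines (an illustrative sketch, not the repo's exact code):

#if defined(ENABLE_FP32)
typedef float floatX;
#elif defined(ENABLE_FP16)
typedef half floatX;            // from cuda_fp16.h
#else  // ENABLE_BF16 (the default)
typedef __nv_bfloat16 floatX;   // from cuda_bf16.h
#endif

This is why the missing-intrinsic and operator errors only appear for the FP16/BF16 builds: with PRECISION=FP32, floatX is plain float and no half/bf16 intrinsics are involved.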
