
Error in triton while running unsloth/mistral instruct v0.2 #440

Open
xlar-sanjeet opened this issue May 8, 2024 · 7 comments
Labels: currently fixing (Am fixing now!)

Comments

@xlar-sanjeet

I used the following commands to create the environment:


conda create --name unsloth_env python=3.10
conda activate unsloth_env

conda install pytorch-cuda=<12.1/11.8> pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

pip install --no-deps trl peft accelerate bitsandbytes
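
As a side note, the traceback below bottoms out in Triton compiling its small cuda_utils C helper with the system gcc, and gcc releases older than 5 default to a pre-C99 standard that rejects for-loop initial declarations. A minimal sketch for checking the toolchain the Triton JIT will pick up (the inline test program is purely illustrative, not part of the original report):

# Which gcc will Triton's JIT use, and how old is it?
which gcc
gcc --version

# Illustrative check: gcc < 5 rejects this C99-style loop declaration
# unless -std=c99/-std=gnu99 is passed, matching the error in the traceback.
echo 'int main(void){ for (int i = 0; i < 3; i++); return 0; }' | gcc -x c - -o /dev/null && echo OK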


Here is the full traceback of the error:



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 156,533 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 4
\        /    Total batch size = 4 | Total steps = 117,399
 "-____-"     Number of trainable parameters = 41,943,040
/tmp/tmpl4se9f_s/main.c: In function ‘list_to_cuuint64_array’:
/tmp/tmpl4se9f_s/main.c:354:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^
/tmp/tmpl4se9f_s/main.c:354:3: note: use option -std=c99 or -std=gnu99 to compile your code
/tmp/tmpl4se9f_s/main.c: In function ‘list_to_cuuint32_array’:
/tmp/tmpl4se9f_s/main.c:365:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (Py_ssize_t i = 0; i < len; i++) {
^

CalledProcessError Traceback (most recent call last)
Cell In[26], line 1
----> 1 trainer.train()

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:361, in SFTTrainer.train(self, *args, **kwargs)
358 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
359 self.model = self._trl_activate_neftune(self.model)
--> 361 output = super().train(*args, **kwargs)
363 # After training we make sure to retrieve back the original forward pass method
364 # for the embedding layer by removing the forward post hook.
365 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py:1859, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1857 hf_hub_utils.enable_progress_bars()
1858 else:
-> 1859 return inner_training_loop(
1860 args=args,
1861 resume_from_checkpoint=resume_from_checkpoint,
1862 trial=trial,
1863 ignore_keys_for_eval=ignore_keys_for_eval,
1864 )

File <string>:361, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py:3138, in Trainer.training_step(self, model, inputs)
3135 return loss_mb.reduce_mean().detach().to(self.args.device)
3137 with self.compute_loss_context_manager():
-> 3138 loss = self.compute_loss(model, inputs)
3140 if self.args.n_gpu > 1:
3141 loss = loss.mean() # mean() to average on multi-gpu parallel training

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py:3161, in Trainer.compute_loss(self, model, inputs, return_outputs)
3159 else:
3160 labels = None
-> 3161 outputs = model(**inputs)
3162 # Save past state if it exists
3163 # TODO: this needs to be fixed and made cleaner later.
3164 if self.args.past_index >= 0:

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py:822, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
821 def forward(*args, **kwargs):
--> 822 return model_forward(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py:810, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
809 def __call__(self, *args, **kwargs):
--> 810 return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
13 @functools.wraps(func)
14 def decorate_autocast(*args, **kwargs):
15 with autocast_instance:
---> 16 return func(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py:882, in PeftModelForCausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)
869 def PeftModelForCausalLM_fast_forward(
870 self,
871 input_ids=None,
(...)
880 **kwargs,
881 ):
--> 882 return self.base_model(
883 input_ids=input_ids,
884 causal_mask=causal_mask,
885 attention_mask=attention_mask,
886 inputs_embeds=inputs_embeds,
887 labels=labels,
888 output_attentions=output_attentions,
889 output_hidden_states=output_hidden_states,
890 return_dict=return_dict,
891 **kwargs,
892 )

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:161, in BaseTuner.forward(self, *args, **kwargs)
160 def forward(self, *args: Any, **kwargs: Any):
--> 161 return self.model.forward(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
164 output = module._old_forward(*args, **kwargs)
165 else:
--> 166 output = module._old_forward(*args, **kwargs)
167 return module._hf_hook.post_forward(module, output)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/mistral.py:213, in MistralForCausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)
205 outputs = LlamaModel_fast_forward_inference(
206 self,
207 input_ids,
(...)
210 attention_mask = attention_mask,
211 )
212 else:
--> 213 outputs = self.model(
214 input_ids=input_ids,
215 causal_mask=causal_mask,
216 attention_mask=attention_mask,
217 position_ids=position_ids,
218 past_key_values=past_key_values,
219 inputs_embeds=inputs_embeds,
220 use_cache=use_cache,
221 output_attentions=output_attentions,
222 output_hidden_states=output_hidden_states,
223 return_dict=return_dict,
224 )
225 pass
227 hidden_states = outputs[0]

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
164 output = module._old_forward(*args, **kwargs)
165 else:
--> 166 output = module._old_forward(*args, **kwargs)
167 return module._hf_hook.post_forward(module, output)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py:650, in LlamaModel_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)
647 past_key_value = past_key_values[idx] if past_key_values is not None else None
649 if offloaded_gradient_checkpointing:
--> 650 hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
651 decoder_layer,
652 hidden_states,
653 causal_mask,
654 attention_mask,
655 position_ids,
656 past_key_values,
657 output_attentions,
658 use_cache,
659 )
661 elif gradient_checkpointing:
662 def create_custom_forward(module):

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py:598, in Function.apply(cls, *args, **kwargs)
595 if not torch._C._are_functorch_transforms_active():
596 # See NOTE: [functorch vjp and autograd interaction]
597 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 598 return super().apply(*args, **kwargs) # type: ignore[misc]
600 if not is_setup_ctx_defined:
601 raise RuntimeError(
602 "In order to use an autograd.Function with functorch transforms "
603 "(vmap, grad, jvp, jacrev, ...), it must override the setup_context "
604 "staticmethod. For more details, please see "
605 "https://pytorch.org/docs/master/notes/extending.func.html"
606 )

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py:115, in custom_fwd..decorate_fwd(*args, **kwargs)
113 if cast_inputs is None:
114 args[0]._fwd_used_autocast = torch.is_autocast_enabled()
--> 115 return fwd(*args, **kwargs)
116 else:
117 autocast_context = torch.is_autocast_enabled()

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/_utils.py:369, in Unsloth_Offloaded_Gradient_Checkpointer.forward(ctx, forward_function, hidden_states, *args)
367 saved_hidden_states = hidden_states.to("cpu", non_blocking = True)
368 with torch.no_grad():
--> 369 (output,) = forward_function(hidden_states, *args)
370 ctx.save_for_backward(saved_hidden_states)
371 ctx.forward_function = forward_function

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
164 output = module._old_forward(*args, **kwargs)
165 else:
--> 166 output = module._old_forward(*args, **kwargs)
167 return module._hf_hook.post_forward(module, output)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py:432, in LlamaDecoderLayer_fast_forward(self, hidden_states, causal_mask, attention_mask, position_ids, past_key_value, output_attentions, use_cache, padding_mask, *args, **kwargs)
430 else:
431 residual = hidden_states
--> 432 hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
433 hidden_states, self_attn_weights, present_key_value = self.self_attn(
434 hidden_states=hidden_states,
435 causal_mask=causal_mask,
(...)
441 padding_mask=padding_mask,
442 )
443 hidden_states = residual + hidden_states

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py:190, in fast_rms_layernorm(layernorm, X, gemma)
188 W = layernorm.weight
189 eps = layernorm.variance_epsilon
--> 190 out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
191 return out

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py:598, in Function.apply(cls, *args, **kwargs)
595 if not torch._C._are_functorch_transforms_active():
596 # See NOTE: [functorch vjp and autograd interaction]
597 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 598 return super().apply(*args, **kwargs) # type: ignore[misc]
600 if not is_setup_ctx_defined:
601 raise RuntimeError(
602 "In order to use an autograd.Function with functorch transforms "
603 "(vmap, grad, jvp, jacrev, ...), it must override the setup_context "
604 "staticmethod. For more details, please see "
605 "https://pytorch.org/docs/master/notes/extending.func.html"
606 )

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py:144, in Fast_RMS_Layernorm.forward(ctx, X, W, eps, gemma)
141 r = torch.empty(n_rows, dtype = torch.float32, device = "cuda")
143 fx = _gemma_rms_layernorm_forward if gemma else _rms_layernorm_forward
--> 144 fx[(n_rows,)](
145 Y, Y.stride(0),
146 X, X.stride(0),
147 W, W.stride(0),
148 r, r.stride(0),
149 n_cols, eps,
150 BLOCK_SIZE = BLOCK_SIZE,
151 num_warps = num_warps,
152 )
153 ctx.eps = eps
154 ctx.BLOCK_SIZE = BLOCK_SIZE

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py:167, in KernelInterface.__getitem__.<locals>.<lambda>(*args, **kwargs)
161 def __getitem__(self, grid) -> T:
162 """
163 A JIT function is launched with: fn[grid](*args, **kwargs).
164 Hence JITFunction.__getitem__ returns a callable proxy that
165 memorizes the grid.
166 """
--> 167 return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py:363, in JITFunction.run(self, grid, warmup, *args, **kwargs)
361 assert "stream" not in kwargs, "stream option is deprecated; current stream will be used"
362 # parse options
--> 363 device = driver.get_current_device()
364 stream = driver.get_current_stream(device)
365 target = driver.get_current_target()

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/driver.py:209, in LazyProxy.__getattr__(self, name)
208 def __getattr__(self, name):
--> 209 self._initialize_obj()
210 return getattr(self._obj, name)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/driver.py:206, in LazyProxy._initialize_obj(self)
204 def _initialize_obj(self):
205 if self._obj is None:
--> 206 self._obj = self._init_fn()

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/driver.py:239, in initialize_driver()
237 return HIPDriver()
238 elif torch.cuda.is_available():
--> 239 return CudaDriver()
240 else:
241 return UnsupportedDriver()

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/driver.py:102, in CudaDriver.__init__(self)
101 def __init__(self):
--> 102 self.utils = CudaUtils()
103 self.backend = self.CUDA
104 self.binary_ext = "cubin"

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/driver.py:49, in CudaUtils.__init__(self)
47 with open(src_path, "w") as f:
48 f.write(src)
---> 49 so = _build("cuda_utils", src_path, tmpdir)
50 with open(so, "rb") as f:
51 cache_path = cache.put(f.read(), fname, binary=True)

File ~/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/common/build.py:106, in _build(name, src, srcdir)
101 cc_cmd = [
102 cc, src, "-O3", f"-I{cu_include_dir}", f"-I{py_include_dir}", f"-I{srcdir}", "-shared", "-fPIC", "-lcuda",
103 "-o", so
104 ]
105 cc_cmd += [f"-L{dir}" for dir in cuda_lib_dirs]
--> 106 ret = subprocess.check_call(cc_cmd)
108 if ret == 0:
109 return so

File ~/.conda/envs/unsloth_env/lib/python3.10/subprocess.py:369, in check_call(*popenargs, **kwargs)
367 if cmd is None:
368 cmd = popenargs[0]
--> 369 raise CalledProcessError(retcode, cmd)
370 return 0

CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpl4se9f_s/main.c', '-O3', '-I/home/chemical/phd/chz208394/.conda/envs/unsloth_env/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/home/chemical/phd/chz208394/.conda/envs/unsloth_env/include/python3.10', '-I/tmp/tmpl4se9f_s', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpl4se9f_s/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-L/lib64', '-L/lib', '-L/lib64', '-L/lib']' returned non-zero exit status 1.



Any help will be greatly appreciated.

Thank you

@danielhanchen added the "currently fixing (Am fixing now!)" label on May 9, 2024
@danielhanchen
Contributor

Oh, that's a weird error - I will try Conda installs and get back to you.

@xlar-sanjeet
Author

Thank you for the reply.
A few hours ago the issue was resolved by loading a gcc compiler.
No need to worry about it. Thanks
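
For anyone else hitting this, "loading a gcc compiler" generally means putting a newer gcc (version 5 or later, which defaults to a C99/C11 standard) in front of Triton before training starts. A hedged sketch of two common options; the module name and the exported compiler path are examples, not the exact commands used here:

# Option 1: on clusters with environment modules, load a newer gcc.
module load gcc                 # e.g. a gcc 9/11 module on many HPC systems

# Option 2: install a recent toolchain into the conda environment.
conda install -c conda-forge gcc_linux-64 gxx_linux-64

# Recent Triton releases check the CC environment variable when building
# their cuda_utils helper, so the newer compiler can be selected explicitly.
export CC=/path/to/newer/gcc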

@vincent775

Hello, I also encountered the same problem. Can you explain in detail how to solve it? Thank you so much

@xlar-sanjeet
Author

xlar-sanjeet commented May 17, 2024 via email

@vincent775

On the AWS server.
[image]

@vincent775

The environment I just created

@xlar-sanjeet
Author

xlar-sanjeet commented May 17, 2024 via email
