RuntimeError: mat1 and mat2 shapes cannot be multiplied #181

lcw99 · 2023-04-24T06:12:03Z

When I send multiple streaming completion requests at the same time, I get the error below.

start listening on 127.0.0.1:8888
ERROR:waitress:Exception while serving /v1/completions
Traceback (most recent call last):
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/channel.py", line 428, in service
    task.service()
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/task.py", line 168, in service
    self.execute()
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/task.py", line 456, in execute
    for chunk in app_iter:
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/werkzeug/wsgi.py", line 500, in __next__
    return self._next()
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/werkzeug/wrappers/response.py", line 50, in _iter_encoded
    for item in iterable:
  File "/home/chang/AI/llm/basaran/basaran/__main__.py", line 168, in stream
    for choice in stream_model(**options):
  File "/home/chang/AI/llm/basaran/basaran/model.py", line 73, in __call__
    for (
  File "/home/chang/AI/llm/basaran/basaran/model.py", line 237, in generate
    outputs = self.model(
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 662, in forward
    outputs = self.gpt_neox(
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 553, in forward
    outputs = layer(
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 335, in forward
    mlp_output = self.mlp(self.post_attention_layernorm(hidden_states))
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 297, in forward
    hidden_states = self.dense_4h_to_h(hidden_states)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 320, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 500, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 417, in forward
    output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (238x13 and 29x5120)
ERROR:waitress:Exception while serving /v1/completions


fardeon · 2023-04-24T08:37:31Z

We've run into the exact same error before: #5. The error is caused by TimDettmers/bitsandbytes#162 and seems to occur at random.

Currently, the only workaround is to stop using INT8 quantization and use half precision instead.
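
The shapes in the error hint at one plausible mechanism. In `output += torch.matmul(subA, state.subB)`, `subA` is a local value with 13 columns, while `state.subB` lives on the layer's `state` object and has 29 rows, as if the two operands came from different calls. The pure-Python sketch below assumes that this state object is shared by concurrent requests and that one request can overwrite it between caching and using it. It is not bitsandbytes code: `SharedState`, `prepare`, `finish` and the toy matrices are hypothetical, and the two requests are interleaved by hand so the outcome is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def shape(m: Matrix) -> Tuple[int, int]:
    return len(m), (len(m[0]) if m else 0)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product that rejects mismatched inner dimensions."""
    (n, k), (k2, p) = shape(a), shape(b)
    if k != k2:
        # Same wording as the PyTorch error in the traceback above.
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied ({n}x{k} and {k2}x{p})"
        )
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]


@dataclass
class SharedState:
    """Hypothetical per-layer state object shared by every request."""
    sub_b: Optional[Matrix] = None  # weight rows for the current outlier columns


def prepare(state: SharedState, weight: Matrix, cols: List[int]) -> None:
    # Step 1: cache the weight rows matching this call's outlier columns
    # on the *shared* state object.
    state.sub_b = [weight[c] for c in cols]


def finish(state: SharedState, x: Matrix, cols: List[int]) -> Matrix:
    # Step 2: slice the same columns out of this call's activations (a local
    # value) and multiply them by whatever rows the shared state now holds.
    sub_a = [[row[c] for c in cols] for row in x]
    return matmul(sub_a, state.sub_b)


def make_inputs() -> Tuple[Matrix, Matrix]:
    weight = [[float(i + j) for j in range(4)] for i in range(6)]  # 6 in -> 4 out
    x = [[1.0] * 6 for _ in range(5)]  # 5 tokens, 6 features each
    return weight, x


def run_sequential() -> Matrix:
    weight, x = make_inputs()
    state = SharedState()
    prepare(state, weight, [0, 1])  # request A runs both steps back to back
    return finish(state, x, [0, 1])


def run_interleaved() -> str:
    weight, x = make_inputs()
    state = SharedState()
    prepare(state, weight, [0, 1])     # request A caches 2 weight rows
    prepare(state, weight, [0, 2, 4])  # request B overwrites them with 3 rows
    try:
        finish(state, x, [0, 1])       # request A pairs 2 columns with 3 rows
    except RuntimeError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(run_sequential()[0])  # [1.0, 3.0, 5.0, 7.0]
    print(run_interleaved())    # mat1 and mat2 shapes cannot be multiplied (5x2 and 3x4)
```

Run back to back, request A multiplies matching shapes. Interleaved, it pairs its own 2 columns with request B's 3 cached rows and fails with the same kind of error as in the traceback. This fits the error showing up only with concurrent streaming requests and looking random, but it is a sketch of a hypothesis, not a confirmed diagnosis of the bitsandbytes bug.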

fardeon added the duplicate and bug labels on Apr 24, 2023.
