
Falcon 40B: too slow and random answers #204

Open · ArnaudHureaux opened this issue Jun 6, 2023 · 7 comments
Labels: question (Further information is requested)

@ArnaudHureaux

Hi,
When I deployed the Falcon 40B model on the Basaran WebUI, I got:
- random answers; for example, when I said "hi", I got: "był AbramsPlayEvent磨}$,ocempreferred LaceKUZOOOoodlesWCHawaiiVEsecured cardvue ..."
- very slow inference, even though I was using a RunPod server costing $10 per hour with 4 A100 80GB GPUs

I tried to customize the settings like this:

kwargs = {
    "local_files_only": local_files_only,
    "trust_remote_code": trust_remote_code,
    "torch_dtype": torch.bfloat16,
    "device_map": "auto",
}

  • I used half precision, but nothing changed.

Any idea how I could handle this issue?

Thanks (and congrats on this beautiful WebUI!)

@peakji peakji added the bug Something isn't working label Jun 7, 2023
@peakji peakji added question Further information is requested and removed bug Something isn't working labels Jun 7, 2023
@peakji
Member

peakji commented Jun 7, 2023

Hi @ArnaudHureaux! I haven't used RunPod before, and there could be multiple reasons for this issue:

  1. Falcon models seem to require PyTorch 2.0, while Basaran's images use version 1.13.1.

  2. The custom settings you mentioned are not in the format accepted by Basaran. Options supported by Basaran can be found in the Dockerfile.

We will attempt to reproduce the issue using tiiuae/falcon-40b on our local machine later.
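
For anyone who wants to try reproducing this outside of Basaran, here is a minimal sketch using plain transformers (this is an editorial illustration, not Basaran's code path; it assumes PyTorch 2.x, accelerate installed, and enough GPU memory for the bfloat16 weights):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # Falcon ships custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",        # let accelerate spread layers across GPUs
)

inputs = tokenizer("hi", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))

If this standalone run produces coherent text while Basaran does not, the problem is likely in the serving layer rather than the model itself.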

@jgcb00

jgcb00 commented Jun 8, 2023

Hi,
The Falcon model is pretty bad with very short prompts like "hi", "hello", etc.; you often get exactly that kind of output. If you ask a longer question, you will get a proper answer. It's not related to the Basaran implementation.

@ArnaudHureaux
Author

In my case, the answer was totally random, with messages like "był AbramsPlayEvent磨}$,ocempreferred LaceKUZOOOoodlesWCHawaiiVEsecured cardvue ..."?

I didn't see this behavior with other implementations, so I think the problem comes from this implementation?

@jgcb00

jgcb00 commented Jun 8, 2023

Using only Hugging Face, I got the same result with load_in_8bit=True:

Question: hi
Answer:  (4).

'I don't think I'll ever be able to forget you.'

or:

Question: hi
Answer:  
It seems that the error is caused by a problem with your `onRequestSuccess` function. Specifically, the error message mentions that the function is returning an undefined value, and it seems like the `onRequestSuccess` is trying to return before the response from the server has been read.

To fix this error, you can try modifying the `onRequestSuccess` function to use Promises instead of callbacks. Instead of using `callback` to pass data to the next function, you can use `return` statements to return Promises.

Here's an example:


function onRequestSuccess(response) {
   return new Promise(function(resolve, reject) {
      console.log(response);

      // Parse JSON
      if (response.data && response.data.hasOwnProperty('success')) {
         resolve(response);
      } else {
         reject(response);
      }
   });
}

function onError(error) {
   console.log('Error:', error);
}

function sendRequest() {
  var requestData = { "username": "myusername", "password": "
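
(The generated answer above appears to be cut off mid-generation.) For reference, a loading sketch along the lines described here; it differs from the bfloat16 load shown earlier only in the quantization flag (an assumption: bitsandbytes is installed, and load_in_8bit is accepted by transformers releases of that era):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    load_in_8bit=True,    # bitsandbytes 8-bit quantization
    device_map="auto",
)

inputs = tokenizer("hi", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))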

@0xDigest

0xDigest commented Jun 9, 2023

If it helps:
I updated the Dockerfile to use nvcr.io/nvidia/pytorch:23.05-py3 and was able to load the model referenced above and run inference. I can confirm that it runs slowly for me, but I attribute that to the model not loading onto the GPUs, even in 8-bit mode, which should be able to run with just 45GB of RAM per https://huggingface.co/blog/falcon#fine-tuning-with-peft. I don't see the same quality issues as @ArnaudHureaux; to me that looks like a tokenizer problem, maybe?

Inference with a short prompt:

~/basaran$ curl -w 'Total: %{time_total}s\n' http://127.0.0.1/v1/completions -H 'Content-Type: application/json' -d '{ "prompt": ["once upon a time,"], "echo": true }'

{"id":"cmpl-8ba3deeed1b838469f2a0d6e","object":"text_completion","created":1686333906,"model":"/models/falcon-40b","choices":[{"text":"once upon a time, spring 2011 was going to be the beginning of the bandeau bikini.","index":0,"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":21,"total_tokens":26}}
Total: 274.909453s

GPUs when loaded:

| 22%   25C    P8               14W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   25C    P8               15W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   26C    P8               15W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   26C    P8               15W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   24C    P8               13W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   25C    P8               15W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   23C    P8               14W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   23C    P8               15W / 250W|      6MiB / 12288MiB |      0%      Default |
| 22%   24C    P8               14W / 250W|      6MiB / 12288MiB |      0%      Default |
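
The ~6MiB per GPU in the nvidia-smi output above suggests the weights never left the CPU, which would explain the slow inference. One quick way to check where accelerate placed the model (a sketch, assuming you can load the model the same way from a Python shell inside the container):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    trust_remote_code=True,
    load_in_8bit=True,
    device_map="auto",
)

# hf_device_map records the device chosen for each module; if every entry
# says "cpu", accelerate did not pick up the GPUs.
print(model.hf_device_map)
print(torch.cuda.memory_allocated() / 1e9, "GB allocated on cuda:0")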

@Louanes1

Am I the only one who encountered an error saying I need to install the "einops" library when trying to deploy the Falcon 40B model? This library is not part of the requirements.txt of version 0.19.0.

@jgcb00

jgcb00 commented Jun 20, 2023

einops is only used by the Falcon model; it should not be a requirement for the package.
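
Until it is handled as an optional dependency, installing it manually (pip install einops) before loading Falcon should work. A small guard one could add is sketched below; the check itself is hypothetical and not part of Basaran:

try:
    import einops  # noqa: F401  # required by Falcon's remote modeling code
except ImportError as e:
    raise SystemExit(
        "Falcon's custom modeling code requires einops; "
        "run `pip install einops` first."
    ) from e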
