
Size of Tensor A must match size of Tensor B #1540

Open
rohitnanda1443 opened this issue Apr 9, 2024 · 6 comments

@rohitnanda1443
Hi,

I am trying to run a RAG query on a large PDF file and get the error below:

Error: The size of tensor a (3351) must match the size of tensor b (4096) at non-singleton dimension 3.

The run script:

python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --auth=auth.json --system_prompt="My name is H2O-GPT and I am an intelligent AI" --attention_sinks=True --max_new_tokens=100000 --max_max_new_tokens=100000 --top_k_docs=-1 --use_gpu_id=False --max_seq_len=4096 --sink_dict="{'num_sink_tokens': 4, 'window_length': 4096}"

@pseudotensor
Collaborator

Can you provide more of the stack trace? My guess is that attention-sink support in transformers is not bug-free.

Separately, I recommend using Mixtral through vLLM in general. It will likely be hard to make Mixtral run for long sequences, and it already supports 32k total input+output.
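
(For reference, a minimal sketch of that setup, assuming vLLM's OpenAI-compatible server entrypoint and the vllm:host:port inference-server syntax from the h2oGPT docs; the port is arbitrary, and multi-GPU flags such as tensor parallelism are omitted:)

# start vLLM's OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --port 5000

# point h2oGPT at it
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --inference_server=vllm:127.0.0.1:5000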

@pseudotensor
Collaborator

I tried the same with Mistral and didn't find any issues.

[screenshot of the query and response attached]

But the output and input aren't very long. The input isn't very long because you set --max_seq_len=4096, so it tries to take 29 docs but that count gets reduced.

@rohitnanda1443
Author

Noted.

I will try again after removing --max_seq_len=4096.

Also, where are the error log files saved in the h2oGPT folder, so that I can send you the stack trace? Or does one capture the CLI output to a file with "> /home/user/dump" after the CLI startup script?
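
(On capturing the trace: a minimal sketch assuming a POSIX shell, with the dump path taken from the question above. The "2>&1" matters because Python tracebacks go to stderr rather than stdout:)

python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 ... > /home/user/dump 2>&1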

@rohitnanda1443
Author

I tried using Mixtral with vLLM and did the following:

  1. Locally installed the inference server using this guide: https://github.com/h2oai/h2ogpt/blob/main/docs/README_InferenceServers.md

  2. Ran the inference server: NCCL_SHM_DISABLE=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id h2oai/h2ogpt-oig-oasst1-512-6_9b --port 8080 --sharded false --trust-remote-code --max-stop-sequences=6

  3. Ran the model: python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --prompt_type=zephyr --max_seq_len=4096 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --auth=auth.json --inference_server="http://127.0.0.1:8080" &

[screenshot attached]

@rohitnanda1443
Author

rohitnanda1443 commented Apr 12, 2024

Issue:

Unable to connect to the inference server: after starting the inference server, if I run the curl test against it I get a connection-refused error on the port.
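
(For reference, a minimal sketch of such a curl test against a text-generation-inference server, assuming its standard /generate endpoint on the port from step 2:)

curl 127.0.0.1:8080/generate -X POST -H 'Content-Type: application/json' -d '{"inputs": "What?", "parameters": {"max_new_tokens": 1}}'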

The Gradio Dump:

Using Model mistralai/mixtral-8x7b-instruct-v0.1
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: mistralai/Mixtral-8x7B-Instruct-v0.1 http:://127.0.0.1:8080
GR Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
GR Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7357c07bb0>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))
HF Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
HF Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1: Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 496, in _make_request
conn.request(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 400, in request
self.endheaders()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 238, in connect
self.sock = self._new_conn()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/h2ogpt/src/gen.py", line 2498, in get_client_from_inference_server
res = hf_client.generate('What?', max_new_tokens=1)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/text_generation/client.py", line 275, in generate
resp = requests.post(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

HF Client End: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1 : None
Begin auto-detect HF cache text generation models
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL: http://0.0.0.0:7863

To create a public link, set share=True in launch().
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7863/

@pseudotensor
Collaborator

If you look at the trace, you have an odd "Begin: http:://127.0.0.1:8080" with an extra ":". As in the docs, with vLLM one would use something like vllm:127.0.0.1:8080, or with the HF client http://127.0.0.1:8080, but with no extra colons.
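
(A minimal sketch of the corrected flag, with the rest of the command taken from the earlier comment; only the --inference_server value changes:)

# HF/TGI client form, single colon after http
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --prompt_type=zephyr --max_seq_len=4096 --inference_server="http://127.0.0.1:8080"

# or, for a vLLM server
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --inference_server=vllm:127.0.0.1:8080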
