
Size of Tensor A must match size of Tensor B #1540

Open
rohitnanda1443 opened this issue Apr 9, 2024 · 6 comments

@rohitnanda1443
Hi,

I am trying to run a RAG query on a large PDF file and get the error below:

Error: The size of tensor a (3351) must match the size of tensor b (4096) at non-singleton dimension 3.

The run script:

python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --auth=auth.json --system_prompt="My name is H2O-GPT and I am an intelligent AI" --attention_sinks=True --max_new_tokens=100000 --max_max_new_tokens=100000 --top_k_docs=-1 --use_gpu_id=False --max_seq_len=4096 --sink_dict="{'num_sink_tokens': 4, 'window_length': 4096}"

@pseudotensor
Collaborator

Can you provide more of the stack trace? My guess is that attention-sink support in transformers is not bug-free.

Separately, I recommend using Mixtral through vLLM in general. It will likely be hard to make Mixtral run for long sequences, and it already supports 32k total input+output.
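
(For reference, a minimal sketch of that setup, assuming vLLM's OpenAI-compatible server entrypoint and the vllm:host:port inference-server syntax from the h2oGPT docs; the port is arbitrary, and multi-GPU flags such as tensor parallelism are omitted:)

# start vLLM's OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --port 5000

# point h2oGPT at it
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --inference_server=vllm:127.0.0.1:5000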

@pseudotensor
Collaborator

I tried the same with Mistral and didn't find any issues.

[screenshot of the query and response attached]

But the output and input aren't very long. The input isn't very long because you set --max_seq_len=4096, so it tries to take 29 docs but that count gets reduced.

@rohitnanda1443
Author

Noted.

I will try again after removing --max_seq_len=4096.

Also, where are the error log files saved in the h2oGPT folder, so that I can send you the stack trace? Or does one capture the CLI output to a file with "> /home/user/dump" after the CLI startup script?
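
(On capturing the trace: a minimal sketch assuming a POSIX shell, with the dump path taken from the question above. The "2>&1" matters because Python tracebacks go to stderr rather than stdout:)

python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 ... > /home/user/dump 2>&1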

@rohitnanda1443
Author

I tried using Mixtral with vLLM and did the following:

  1. Locally installed the inference server using this guide: https://github.com/h2oai/h2ogpt/blob/main/docs/README_InferenceServers.md

  2. Ran the inference server: NCCL_SHM_DISABLE=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id h2oai/h2ogpt-oig-oasst1-512-6_9b --port 8080 --sharded false --trust-remote-code --max-stop-sequences=6

  3. Ran the model: python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --prompt_type=zephyr --max_seq_len=4096 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --auth=auth.json --inference_server="http://127.0.0.1:8080" &

[screenshot attached]

@rohitnanda1443
Author

rohitnanda1443 commented Apr 12, 2024

Issue:

Unable to connect to the inference server: after starting the inference server, if I run the curl test against it I get a connection-refused error on the port.
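
(For reference, a minimal sketch of such a curl test against a text-generation-inference server, assuming its standard /generate endpoint on the port from step 2:)

curl 127.0.0.1:8080/generate -X POST -H 'Content-Type: application/json' -d '{"inputs": "What?", "parameters": {"max_new_tokens": 1}}'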

The Gradio Dump:

Using Model mistralai/mixtral-8x7b-instruct-v0.1
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: mistralai/Mixtral-8x7B-Instruct-v0.1 http:://127.0.0.1:8080
GR Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
GR Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7357c07bb0>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))
HF Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
HF Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1: Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 496, in _make_request
conn.request(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 400, in request
self.endheaders()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 238, in connect
self.sock = self._new_conn()
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/h2ogpt/src/gen.py", line 2498, in get_client_from_inference_server
res = hf_client.generate('What?', max_new_tokens=1)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/text_generation/client.py", line 275, in generate
resp = requests.post(
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

HF Client End: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1 : None
Begin auto-detect HF cache text generation models
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL: http://0.0.0.0:7863

To create a public link, set share=True in launch().
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7863/

@pseudotensor
Collaborator

If you look at the trace, you have an odd "Begin: http:://127.0.0.1:8080" with an extra ":". As in the docs, with vLLM one would use something like vllm:127.0.0.1:8080, or with the HF client http://127.0.0.1:8080, but with no extra colons.
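
(A minimal sketch of the corrected flag, with the rest of the command taken from the earlier comment; only the --inference_server value changes:)

# HF/TGI client form, single colon after http
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --prompt_type=zephyr --max_seq_len=4096 --inference_server="http://127.0.0.1:8080"

# or, for a vLLM server
python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --inference_server=vllm:127.0.0.1:8080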
