Retry query on another node if execution status != 200 #115

wlp7s0 · 2021-04-23T12:32:21Z

Hello, I'm currently testing chproxy in a test environment and I have a question about query execution.
Let's say I have 2 nodes with replication and 4 ZooKeeper nodes, with 1 chproxy balancing read/write queries between the two nodes.
I also have a stream of data from dozens of servers to chproxy.
I have configured a health check that selects a specific path in the replicated table to make sure that both nodes have the tables and the database itself.
However, in my test environment I removed access to ZooKeeper from one of the nodes, which rendered the database on that node read-only, and the health check SELECT didn't mark the node as faulty. At the same time, all INSERT requests to the read-only node failed with error code 500, and all failed INSERT requests were lost.
Using /metrics I can see that chproxy can check the query execution status, but I can't see any way to execute the failed query on another node if the response status from the node was not 200, or perhaps to store the failed queries for manual recovery.
Am I missing something?
Thanks!
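
The failure described above (a node that still answers SELECTs but rejects INSERTs) can be detected by probing for read-only replicas rather than for table existence. The following is a minimal sketch, not chproxy's own health check: it assumes the node exposes the ClickHouse HTTP interface and that the `is_readonly` column of `system.replicas` reflects the lost ZooKeeper connection. The names `readonlyQuery`, `isWritable`, and `probeNode` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// readonlyQuery counts replicated tables on the node that are in read-only mode.
// Assumption: a replica that lost ZooKeeper reports is_readonly = 1.
const readonlyQuery = "SELECT count() FROM system.replicas WHERE is_readonly"

// isWritable interprets the plain-text body returned for readonlyQuery.
// It returns true only when the count of read-only replicas is zero.
func isWritable(body string) (bool, error) {
	n, err := strconv.Atoi(strings.TrimSpace(body))
	if err != nil {
		return false, fmt.Errorf("unexpected probe response %q: %w", body, err)
	}
	return n == 0, nil
}

// probeNode sends readonlyQuery to one node over HTTP and reports
// whether the node can currently accept INSERTs into replicated tables.
// nodeURL is a hypothetical base URL such as "http://node1:8123".
func probeNode(client *http.Client, nodeURL string) (bool, error) {
	resp, err := client.Get(nodeURL + "/?query=" + url.QueryEscape(readonlyQuery))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("probe returned status %d", resp.StatusCode)
	}
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return isWritable(string(raw))
}

func main() {
	// Offline demonstration of the parsing step only.
	ok, err := isWritable("0\n")
	fmt.Println(ok, err)
}
```

A probe like this only covers the read-only case; it does not replace a check that the expected tables exist.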

gontarzpawel · 2022-01-20T17:02:40Z

Hello @wlp7s0, I'll try to reproduce it.
I'd advise you to add a retry strategy on the client side and to rely on a message bus in front of your insertion services, to be resilient to ClickHouse downtime.

gontarzpawel · 2022-01-28T15:53:33Z

Hi @wlp7s0,

I performed the following test scenario:

1. Set up a ClickHouse cluster consisting of 4 nodes.
2. Point chproxy at that cluster; all 4 nodes are marked as healthy.
3. Manually kill one node.
4. chproxy correctly marks the killed node as unhealthy.
5. chproxy excludes it from the list of available nodes.

I failed to reproduce the scenario you described. Could you please provide steps to reproduce it?

ranjbaryshahab · 2023-01-05T14:50:42Z

Hello @gontarzpawel,
What about other scenarios, such as status code 404?
For example, I have 3 nodes and 2 tables, A and B.
Table A is replicated and exists on all nodes; table B is not replicated and exists on only one node.
When I execute "select * from B", I sometimes get the exception: Table B doesn't exist. (UNKNOWN_TABLE)
Is there any way to make chproxy retry on other nodes when a table doesn't exist?
Also, I changed this line:

chproxy/proxy.go

Line 215 in aeca5b7

if rw.StatusCode() == http.StatusBadGateway {

to "if rw.StatusCode() != http.StatusOK",
but it still doesn't work.

mga-chka · 2023-01-07T09:30:01Z

IMHO, in this situation you should fix your ClickHouse config or rewrite your query to specify the server that contains table B using the remote syntax: https://clickhouse.com/docs/en/sql-reference/table-functions/remote/

Regarding retry-ability, we looked at the error codes returned by ClickHouse and decided to retry only when it makes sense (i.e. when a retry can make the failed query work). If we allowed a retry on 404, then every time someone made a mistake, the query would be retried even though it can't succeed, which would slow down the query response time.
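
The trade-off can be illustrated with a small sketch. It is not chproxy's implementation: `shouldRetry` and `runWithFailover` are hypothetical names, and the only detail taken from this thread is that the quoted line in proxy.go compares the status against `http.StatusBadGateway`.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldRetry reports whether trying another node could plausibly help.
// Only 502 is treated as retryable here, mirroring the check quoted above;
// a 404 (for example UNKNOWN_TABLE) is assumed to be a query mistake.
func shouldRetry(status int) bool {
	return status == http.StatusBadGateway
}

// runWithFailover executes the query on each node in order and moves to the
// next node only while shouldRetry allows it. It returns the last node tried
// and the status that node produced.
func runWithFailover(nodes []string, exec func(node string) int) (string, int) {
	lastNode, status := "", 0
	for _, n := range nodes {
		lastNode, status = n, exec(n)
		if !shouldRetry(status) {
			break
		}
	}
	return lastNode, status
}

func main() {
	nodes := []string{"node1", "node2", "node3"}

	// node1 is unreachable (502), so the query moves on to node2.
	fmt.Println(runWithFailover(nodes, func(n string) int {
		if n == "node1" {
			return http.StatusBadGateway
		}
		return http.StatusOK
	}))

	// A 404 is returned to the client immediately, without visiting other nodes.
	fmt.Println(runWithFailover(nodes, func(n string) int {
		return http.StatusNotFound
	}))
}
```

Widening `shouldRetry` to every status other than 200 would make each mistyped table name cost one request per node before the error reaches the client, which is the slowdown described above.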

gontarzpawel added enhancement unexpected behaviour labels Jan 20, 2022

gontarzpawel added need more info and removed enhancement unexpected behaviour labels Jan 28, 2022

gontarzpawel mentioned this issue Nov 14, 2022

update changelog for version 1.20.0 #262

Merged

13 tasks