
zeebe cluster when node 0 fail cluster fail #18583

Closed
ahmedbeledy opened this issue May 16, 2024 · 11 comments
Assignees
Labels
component/zeebe Related to the Zeebe component/team kind/bug Categorizes an issue or PR as a bug

Comments

@ahmedbeledy

ahmedbeledy commented May 16, 2024

Problem:
I have a basic problem with my setup. I have 4 servers: one gateway and three nodes acting as brokers. The nodes are named as follows:

Node 0
Node 1
Node 2
Here is the issue:

If either Node 1 or Node 2 fails, the system continues to work as expected.
However, if Node 0 fails, the entire system stops working.
This issue does not occur when using Docker. The problem only happens on the local machine setup. Additionally, I've attached my configuration files at the bottom of this message.

Can you help me understand why this behavior is occurring and how to fix it?

Environment:

OS: Windows 10
Zeebe Version: 8.5.1
Installed: locally on 4 machines, one gateway and three broker nodes
Configuration:

Gateway config:

zeebe:
  gateway:
    network:
      host: 192.168.8.115
      port: 26500
    cluster:
      host: 192.168.8.115
      port: 26502
      initialContactPoints: [192.168.8.114:26502, 192.168.8.110:26502, 192.168.8.105:26502]
      # initialContactPoints: [192.168.8.114:26502]
    security:
      enabled: false
    multiTenancy:
      enabled: false

Node configs

First node:

zeebe:
  broker:
    gateway:
      enable: false
    network:
      host: 192.168.8.114
      port: 26500
      security:
        enabled: false
    data:
      directory: data
    cluster:
      nodeId: 0
      partitionsCount: 2
      replicationFactor: 3
      clusterSize: 3
      initialContactPoints: [192.168.8.110:26502, 192.168.8.114:26502, 192.168.8.105:26502]
Second node:

zeebe:
  broker:
    gateway:
      enable: false
    network:
      host: 192.168.8.110
      port: 26500
      security:
        enabled: false
    data:
      directory: data
    cluster:
      nodeId: 1
      partitionsCount: 2
      replicationFactor: 3
      clusterSize: 3
      initialContactPoints: [192.168.8.110:26502, 192.168.8.114:26502, 192.168.8.105:26502]

Third node:

zeebe:
  broker:
    gateway:
      enable: false
    network:
      host: 192.168.8.105
      port: 26500
      security:
        enabled: false
    data:
      directory: data
    cluster:
      nodeId: 2
      partitionsCount: 2
      replicationFactor: 3
      clusterSize: 3
      initialContactPoints: [192.168.8.110:26502, 192.168.8.114:26502, 192.168.8.105:26502]
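(Editor's aside, not part of the original thread.) With clusterSize 3 and replicationFactor 3, every partition should have a replica on all three brokers, so losing any single broker, including node 0, still leaves a two-of-three Raft quorum. One way to see which broker leads each partition is to query the topology through the standalone gateway. This sketch assumes `zbctl` is installed and uses the gateway address 192.168.8.115:26500 from the config above:

```shell
# Print cluster topology: brokers, partitions, and leader/follower roles.
# --insecure matches security.enabled: false in the gateway config.
zbctl status --insecure --address 192.168.8.115:26500
```

Running this before and after stopping node 0 would show whether leadership fails over to the surviving brokers.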

I tried it in Docker and it works fine. My docker-compose file:

version: "2"

networks:
  zeebe_network:
    driver: bridge

services:
  gateway:
    restart: always
    container_name: gateway
    image: camunda/zeebe:${ZEEBE_VERSION}
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_STANDALONE_GATEWAY=true
      - ZEEBE_GATEWAY_NETWORK_HOST=0.0.0.0
      - ZEEBE_GATEWAY_NETWORK_PORT=26500
      - ZEEBE_GATEWAY_CLUSTER_CONTACTPOINT=node0:26502
      - ZEEBE_GATEWAY_CLUSTER_PORT=26502
      - ZEEBE_GATEWAY_CLUSTER_HOST=gateway
    ports:
      - "26500:26500"
    networks:
      - zeebe_network
  node0:
    container_name: zeebe_broker_1
    image: camunda/zeebe:${ZEEBE_VERSION}
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_BROKER_CLUSTER_NODEID=0
      - ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT=2
      - ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR=3
      - ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
      - ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=node0:26502,node1:26502,node2:26502
    networks:
      - zeebe_network
  node1:
    container_name: zeebe_broker_2
    image: camunda/zeebe:${ZEEBE_VERSION}
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_BROKER_CLUSTER_NODEID=1
      - ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT=2
      - ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR=3
      - ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
      - ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=node0:26502,node1:26502,node2:26502
    networks:
      - zeebe_network
    depends_on:
      - node0
  node2:
    container_name: zeebe_broker_3
    image: camunda/zeebe:${ZEEBE_VERSION}
    environment:
      - ZEEBE_LOG_LEVEL=debug
      - ZEEBE_BROKER_CLUSTER_NODEID=2
      - ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT=2
      - ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR=3
      - ZEEBE_BROKER_CLUSTER_CLUSTERSIZE=3
      - ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS=node0:26502,node1:26502,node2:26502
    networks:
      - zeebe_network
    depends_on:
      - node0
I tried setting the contact points the same way as in Docker, and it did not work.

@ahmedbeledy ahmedbeledy added component/zeebe Related to the Zeebe component/team kind/bug Categorizes an issue or PR as a bug labels May 16, 2024
@Zelldon Zelldon self-assigned this May 17, 2024
@Zelldon
Member

Zelldon commented May 21, 2024

Hey @ahmedbeledy

sorry, but it is not really clear what your issue is. Could you please reformat your issue description and elaborate on what you are observing vs. expecting, and on what you have tried so far?

@ahmedbeledy
Author

hello @Zelldon
I have a basic problem with my setup. I have 4 servers: one gateway and three nodes acting as brokers. The nodes are named as follows:

Node 0
Node 1
Node 2
Here is the issue:

If either Node 1 or Node 2 fails, the system continues to work as expected.
However, if Node 0 fails, the entire system stops working.
This issue does not occur when using Docker. The problem only happens on the local machine setup. Additionally, I've attached my configuration files at the top of this message.

Can you help me understand why this behavior is occurring and how to fix it?

Thank you.

@Zelldon
Member

Zelldon commented May 22, 2024

Thanks @ahmedbeledy

Could you answer the following questions:

  1. What do you mean by local setup? Using KIND? Using docker compose?
  2. Is the configuration the same locally as with the other setup?
  3. "Entire system stops working." What does this mean and imply?

In general, I would recommend having this kind of discussion in the forum https://forum.camunda.io/ and, if you find a clear bug/issue, opening an issue here.

@ahmedbeledy
Author

1. For the local setup I am not using KIND or Docker; I downloaded the release and run it from bin.
2. Yes.
3. I mean my requests to the gateway fail.

@Zelldon
Member

Zelldon commented May 22, 2024

Could you post the logs from the different brokers and the gateway, please, and also tell me which gateway you connect to. What are the port and IP? Maybe you are connecting to an embedded gateway on broker 0? 🤔

@ahmedbeledy
Author

ahmedbeledy commented May 22, 2024

Logs before the node fails:
[screenshot]

@ahmedbeledy
Author

Logs of the gateway after node 0 stops:
[screenshot]

@ahmedbeledy
Author

Gateway config:

gateway:
  cluster:
    # contactPoint: 127.0.0.1:26502
    host: 192.168.1.3
    initialContactPoints: [192.168.1.16:26502, 192.168.1.13:26502, 192.168.1.14:26502]
    port: 26502
  network:
    host: 0.0.0.0
    port: 26500
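(Editor's aside, not from the thread.) Note that this gateway config starts at a top-level `gateway:` key, while the first gateway config posted above nests everything under `zeebe:`. If both files use the same application.yaml format, the nesting equivalent to the earlier one would be the following sketch; the host and contact-point values are copied from the config above:

```yaml
# Hypothetical restructuring of the config above under a zeebe: root key,
# matching the nesting used in the first gateway config in this issue.
zeebe:
  gateway:
    cluster:
      host: 192.168.1.3
      port: 26502
      initialContactPoints: [192.168.1.16:26502, 192.168.1.13:26502, 192.168.1.14:26502]
    network:
      host: 0.0.0.0
      port: 26500
```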

@ahmedbeledy
Author

ahmedbeledy commented May 22, 2024

@Zelldon

@korthout
Member

@Zelldon This issue seems ZDP specific. I'm removing it from our board, please re-add if you think we should take care of something here.

@Zelldon
Member

Zelldon commented May 27, 2024

Hey @ahmedbeledy

sorry, but the provided screenshots are not really useful; I would need to see more than that. Furthermore, see my questions above in #18583 (comment), which haven't been answered yet.

I will close this issue, as this is mostly investigation and troubleshooting and not an actual issue to work on. Please raise your problem in the forum https://forum.camunda.io/

Make sure to provide as much information as possible, like configurations and logs (upload them directly, not as screenshots), and explain what your expectation and your observation are, etc.

@Zelldon Zelldon closed this as not planned Won't fix, can't repro, duplicate, stale May 27, 2024