Service is shutting down after reconnecting node to cluster #3689
Labels
kind/bug
Issues which are a software defect
quality of life
Issues that are fairly minor, but which improve the usage of EventStoreDB
Describe the bug
I have a cluster with three nodes. The Cluster is configured using IP-Gossip. When I disconnect the network of the active node, another node is active. But When I reconnect the network of the disconnected node, the node is shutting down instead of reconnecting to the cluster. The issue is happening consistently but not every time, sometimes everything works as expected.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The node should reconnect to the cluster. (This works sometimes)
Actual behavior
The node is sending a gossip to the other nodes and then shutting down. (See log below)
Config/Logs/Screenshots
[ 7116,18,10:42:34.874,INF] ES "TcpConnection" closed [09:42:34.874: N"10.11.0.4:1112", L"10.10.0.3:54707", {b05c5910-e6b5-44c4-b2a3-b3a714f96da8}]:Received bytes: 5769, Sent bytes: 659
[ 7116,18,10:42:34.874,INF] ES "TcpConnection" closed [09:42:34.874: N"10.11.0.4:1112", L"10.10.0.3:54707", {b05c5910-e6b5-44c4-b2a3-b3a714f96da8}]:Send calls: 1, callbacks: 1
[ 7116,18,10:42:34.874,INF] ES "TcpConnection" closed [09:42:34.874: N"10.11.0.4:1112", L"10.10.0.3:54707", {b05c5910-e6b5-44c4-b2a3-b3a714f96da8}]:Receive calls: 14, callbacks: 14
[ 7116,18,10:42:34.874,INF] ES "TcpConnection" closed [09:42:34.874: N"10.11.0.4:1112", L"10.10.0.3:54707", {b05c5910-e6b5-44c4-b2a3-b3a714f96da8}]:Close reason: [Success] "Node state changed to ShuttingDown. Closing replication connection."
[ 7116,18,10:42:34.874,INF] Connection '"leader-normal"""' ["10.11.0.4:1112", {b05c5910-e6b5-44c4-b2a3-b3a714f96da8}] closed: Success.
[ 7116,69,10:42:34.874,DBG] Persistent Subscriptions have been stopped.
[ 7116,69,10:42:34.874,DBG] Persistent subscriptions received state change to ShuttingDown. Stopping listening
[ 7116,69,10:42:34.874,DBG] Persistent Subscriptions have been stopped.
[ 7116,18,10:42:34.875,DBG] SLOW QUEUE MSG ["MainQueue"]: "RequestShutdown" - 488ms. Q: 0/25.
[ 7116,69,10:42:34.876,INF] Live subscription 79fc7be1-8260-4d92-b3f1-2dc87e668cb8 to "GlobalCommands" disposed.
[ 7116,18,10:42:34.876,INF] ========== ["10.10.0.3:2113"] Service '"ReplicationTrackingService"' has shut down.
[ 7116,18,10:42:34.876,INF] Looks like node ["Unspecified/10.10.0.2:2113"] is DEAD (Gossip send failed).
[ 7116,44,10:42:34.876,DBG] Live subscription b8d39015-9c95-43a1-a875-2b228a174717 to "WdpConfig" dropped: Unsubscribed
[ 7116,44,10:42:34.876,DBG] Live subscription 2042b5a9-a024-4281-b91a-fc77fed67e89 to "WdpInfo" dropped: Unsubscribed
[ 7116,44,10:42:34.876,DBG] Live subscription f64780d9-aed5-4be7-996b-5b4e2abbcf64 to "WdpConfig" dropped: Unsubscribed
[ 7116,44,10:42:34.876,DBG] Live subscription 5fce7bfb-a1e8-4ab2-88d5-a581f190783d to "WdpInfo" dropped: Unsubscribed
[ 7116,18,10:42:34.876,INF] CLUSTER HAS CHANGED "gossip send failed to [Unspecified/10.10.0.2:2113]"
Old:
["Priority: 0 VND {5e565c72-fb3d-4d46-9e55-3fe2c8a0e3e0} [Leader, Unspecified/10.11.0.4:1112, n/a, Unspecified/10.11.0.4:1113, n/a, Unspecified/10.11.0.4:2113, (ADVERTISED: HTTP::0, TCP::0)] 266888975/266889129/266889129/E390@266887833:{1d36aa56-c23f-4d93-8b2f-ec5bff2a3dff} | 2023-01-03 09:42:34.371", "Priority: 0 VND {a21af6f5-ba4f-4320-ae79-c0307b65dd19} [ShuttingDown, Unspecified/10.10.0.3:1112, n/a, Unspecified/10.10.0.3:1113, n/a, Unspecified/10.10.0.3:2113, (ADVERTISED: HTTP::0, TCP::0)] 266885324/266885790/266885790/E387@266817565:{6ae6a7df-297d-4490-8e81-c92e1c08ed12} | 2023-01-03 09:42:34.874", "Priority: 0 VND {2806dd5c-33cf-4eb0-bf73-4684118e1a8f} [Unknown, Unspecified/10.10.0.2:1112, n/a, Unspecified/10.10.0.2:1113, n/a, Unspecified/10.10.0.2:2113, (ADVERTISED: HTTP::0, TCP::0)] 266888667/266888821/266888821/E390@266887833:{1d36aa56-c23f-4d93-8b2f-ec5bff2a3dff} | 2023-01-03 09:42:34.370"]
New:
["Priority: 0 VND {5e565c72-fb3d-4d46-9e55-3fe2c8a0e3e0} [Leader, Unspecified/10.11.0.4:1112, n/a, Unspecified/10.11.0.4:1113, n/a, Unspecified/10.11.0.4:2113, (ADVERTISED: HTTP::0, TCP::0)] 266888975/266889129/266889129/E390@266887833:{1d36aa56-c23f-4d93-8b2f-ec5bff2a3dff} | 2023-01-03 09:42:34.371", "Priority: 0 VND {a21af6f5-ba4f-4320-ae79-c0307b65dd19} [ShuttingDown, Unspecified/10.10.0.3:1112, n/a, Unspecified/10.10.0.3:1113, n/a, Unspecified/10.10.0.3:2113, (ADVERTISED: HTTP::0, TCP::0)] 266885324/266885790/266885790/E387@266817565:{6ae6a7df-297d-4490-8e81-c92e1c08ed12} | 2023-01-03 09:42:34.874", "Priority: 0 VND {2806dd5c-33cf-4eb0-bf73-4684118e1a8f} [Unknown, Unspecified/10.10.0.2:1112, n/a, Unspecified/10.10.0.2:1113, n/a, Unspecified/10.10.0.2:2113, (ADVERTISED: HTTP::0, TCP::0)] 266888667/266888821/266888821/E390@266887833:{1d36aa56-c23f-4d93-8b2f-ec5bff2a3dff} | 2023-01-03 09:42:34.876"]
[ 7116,18,10:42:34.876,INF] ========== ["10.10.0.3:2113"] Service '"StorageWriter"' has shut down.
[ 7116,18,10:42:34.876,INF] ========== ["10.10.0.3:2113"] Service '"StorageReader"' has shut down.
[ 7116,18,10:42:34.876,INF] ========== ["10.10.0.3:2113"] Service '"HttpServer [10.10.0.3:2113]"' has shut down.
[ 7116,18,10:42:34.877,INF] Looks like node ["Unspecified/10.11.0.4:1112"] is DEAD (TCP connection lost). Issuing a gossip to confirm.
[ 7116,18,10:42:34.883,INF] Gossip Received, The node ["Unspecified/10.11.0.4:2113"] is not DEAD.
[ 7116,18,10:42:34.885,INF] ========== ["10.10.0.3:2113"] Service '"Storage Chaser"' has shut down.
[ 7116,18,10:42:34.947,INF] ========== ["10.10.0.3:2113"] Service '"Index Committer"' has shut down.
[ 7116,18,10:42:34.947,INF] ========== ["10.10.0.3:2113"] All Services Shutdown.
[ 7116,18,10:42:34.982,INF] ========== ["10.10.0.3:2113"] IS SHUT DOWN.
[ 7116,18,10:42:35.017,INF] Exiting with exit code: 0.
Exit reason: "Shutdown and exit from process was requested."
[ 7116,18,10:42:35.017,DBG] SLOW QUEUE MSG ["MainQueue"]: "ServiceShutdown" - 70ms. Q: 0/3.
[ 7116,59,10:42:40.026,INF] Live subscription 2042b5a9-a024-4281-b91a-fc77fed67e89 to "WdpInfo" disposed.
[ 7116,38,10:42:40.026,INF] Live subscription b8d39015-9c95-43a1-a875-2b228a174717 to "WdpConfig" disposed.
[ 7116,69,10:42:40.026,INF] Live subscription f64780d9-aed5-4be7-996b-5b4e2abbcf64 to "WdpConfig" disposed.
[ 7116,71,10:42:40.026,INF] Live subscription 5fce7bfb-a1e8-4ab2-88d5-a581f190783d to "WdpInfo" disposed.
EventStore details
EventStore server version: 21.10.8
Operating system: Windows Server 2022
EventStore client version (if applicable):
Additional context
The text was updated successfully, but these errors were encountered: