HDDS-10749. Shutdown datanode when RatisServer is down #6587

Draft
wants to merge 2 commits into
base: master

Conversation

ChenSammi
Contributor

What changes were proposed in this pull request?

Currently, when RatisServer is down (mainly due to a long GC pause that exceeds the Ratis close threshold), the Datanode keeps running and stays in the HEALTHY and IN_SERVICE state, which is confusing.

This task shuts down the Datanode after RatisServer goes down.
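
For context, the close threshold mentioned above can be pictured with a minimal sketch. It is not Ratis code: the class `PauseCheck`, its method names, and the monitor interval are assumptions for illustration, and the 60s and 93.265s values are taken from the log further down in this conversation. The idea is that a monitor thread sleeps for a fixed interval, treats any extra elapsed time as a pause, and the server closes itself when the pause exceeds the threshold.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of Ratis or Ozone. */
final class PauseCheck {
  private PauseCheck() { }

  /** Time beyond the intended sleep is treated as a JVM or host pause. */
  static Duration pause(Duration intendedSleep, Duration actualElapsed) {
    Duration extra = actualElapsed.minus(intendedSleep);
    return extra.isNegative() ? Duration.ZERO : extra;
  }

  /** True when the pause is strictly longer than the close threshold. */
  static boolean exceedsCloseThreshold(Duration pause, Duration closeThreshold) {
    return pause.compareTo(closeThreshold) > 0;
  }

  public static void main(String[] args) {
    Duration threshold = Duration.ofSeconds(60);   // value seen in the log
    Duration sleep = Duration.ofMillis(500);       // assumed monitor interval
    Duration elapsed = sleep.plus(Duration.ofMillis(93_265));
    Duration p = pause(sleep, elapsed);
    System.out.println("pause=" + p.toMillis() + "ms, close="
        + exceedsCloseThreshold(p, threshold));
  }
}
```

Once such a check fires, only the Ratis server is closed; the Datanode process itself is unaffected, which is the gap this PR tries to address.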

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10749

How was this patch tested?

Manual test

@ChenSammi
Contributor Author

ChenSammi commented Apr 25, 2024

Log of a normal DN shutdown: first XceiverServerRatis is stopped ("Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5"), then ContainerStateMachine is stopped ("Stopping ContainerStateMachine for group-5EA60976374E").

2024-04-24 17:53:21,589 ERROR ozone.HddsDatanodeService (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 2: SIGINT
2024-04-24 17:53:21,590 INFO  ozone.HddsDatanodeService (StringUtils.java:lambda$startupShutdownMessage$0(144)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at SAMMICHEN-MB0/0.0.0.0
************************************************************/
2024-04-24 17:53:21,595 INFO  ozoneimpl.OzoneContainer (OzoneContainer.java:stop(482)) - Attempting to stop container services.
2024-04-24 17:53:21,595 WARN  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 17:53:21,595 INFO  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - Thread[ContainerMetadataScanner,5,main] exiting.
2024-04-24 17:53:21,595 INFO  ozoneimpl.BackgroundContainerDataScanner (BackgroundContainerDataScanner.java:shutdown(141)) - ContainerDataScanner(/tmp/datanode1/storage/hdds) is shutting down. 
2024-04-24 17:53:21,595 WARN  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 17:53:21,596 INFO  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - ContainerDataScanner(/tmp/datanode1/storage/hdds, DS-af727dc0-66f9-4db9-8f1f-8ce487a40766) exiting.
2024-04-24 17:53:21,596 INFO  ozoneimpl.OnDemandContainerDataScanner (OnDemandContainerDataScanner.java:shutdownScanner(206)) - On-demand container scanner is shutting down.
2024-04-24 17:53:21,606 INFO  ratis.XceiverServerRatis (XceiverServerRatis.java:stop(604)) - Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 17:53:21,606 INFO  server.RaftServer (RaftServerProxy.java:lambda$close$9(416)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: close
2024-04-24 17:53:21,607 INFO  server.RaftServer$Division (RaftServerImpl.java:lambda$close$3(526)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: shutdown
2024-04-24 17:53:21,607 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService now
2024-04-24 17:53:21,607 INFO  util.JmxRegister (JmxRegister.java:unregister(73)) - Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-5EA60976374E,id=01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 17:53:21,607 INFO  impl.RoleInfo (RoleInfo.java:shutdownLeaderState(94)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-LeaderStateImpl
2024-04-24 17:53:21,610 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService successfully
2024-04-24 17:53:21,610 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService now
2024-04-24 17:53:21,611 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService successfully
2024-04-24 17:53:21,611 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService now
2024-04-24 17:53:21,614 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService successfully
2024-04-24 17:53:21,614 INFO  impl.PendingRequests (PendingRequests.java:sendNotLeaderResponses(289)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-PendingRequests: sendNotLeaderResponses
2024-04-24 17:53:21,620 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(157)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: set stopIndex = 2
2024-04-24 17:53:21,620 INFO  ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(359)) - group-5EA60976374E: Taking a snapshot at:(t:2, i:2) file /tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.2_2
2024-04-24 17:53:21,621 INFO  ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(370)) - group-5EA60976374E: Finished taking a snapshot at:(t:2, i:2) file:/tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.2_2 took: 1 ms
2024-04-24 17:53:21,622 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(295)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: Took a snapshot at index 2
2024-04-24 17:53:21,622 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(98)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: snapshotIndex: updateIncreasingly 0 -> 2
2024-04-24 17:53:21,623 INFO  ratis.ContainerStateMachine (ContainerStateMachine.java:close(1150)) - Stopping ContainerStateMachine for group-5EA60976374E.
2024-04-24 17:53:21,623 INFO  server.RaftServer$Division (ServerState.java:close(427)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: applyIndex: 2
2024-04-24 17:53:21,623 INFO  util.AwaitToRun (AwaitToRun.java:run(49)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-cacheEviction-AwaitToRun-AwaitForSignal is interrupted
2024-04-24 17:53:21,695 INFO  segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(245)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-SegmentedRaftLogWorker close()
2024-04-24 17:53:21,697 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(152)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Stopped
2024-04-24 17:53:23,783 INFO  volume.HddsVolume (HddsVolume.java:closeDbStore(470)) - SchemaV3 db is stopped at /tmp/datanode1/storage/hdds/CID-9ba4109c-68b1-4311-9623-42f82149fb80/DS-af727dc0-66f9-4db9-8f1f-8ce487a40766/container.db for volume DS-af727dc0-66f9-4db9-8f1f-8ce487a40766
2024-04-24 17:53:23,783 INFO  utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service BlockDeletingService
2024-04-24 17:53:23,784 INFO  utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service StaleRecoveringContainerScrubbingService
2024-04-24 17:53:23,785 INFO  statemachine.DatanodeStateMachine (DatanodeStateMachine.java:stopDaemon(640)) - Ozone container server stopped.
2024-04-24 17:53:23,790 INFO  handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.w.WebAppContext@3baf6936{hddsDatanode,/,null,STOPPED}{file:/Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/hddsDatanode}
2024-04-24 17:53:23,794 INFO  server.AbstractConnector (AbstractConnector.java:doStop(383)) - Stopped ServerConnector@4f453e63{HTTP/1.1, (http/1.1)}{SAMMICHEN-MB0:9882}
2024-04-24 17:53:23,794 INFO  server.session (HouseKeeper.java:stopScavenging(149)) - node0 Stopped scavenging
2024-04-24 17:53:23,794 INFO  handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.s.ServletContextHandler@1816e24a{static,/static,file:///Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/static,STOPPED}
2024-04-24 17:53:23,795 INFO  ozone.HddsDatanodeClientProtocolServer (HddsDatanodeClientProtocolServer.java:stop(83)) - Stopping the RPC server for Client Protocol
2024-04-24 17:53:23,795 INFO  ipc.Server (Server.java:stop(3523)) - Stopping server on 19864
2024-04-24 17:53:23,796 INFO  ipc.Server (Server.java:run(1434)) - Stopping IPC Server listener on 19864
2024-04-24 17:53:23,796 INFO  ipc.Server (Server.java:run(1567)) - Stopping IPC Server Responder

@ChenSammi
Contributor Author

Log of a DN shutdown caused by the Ratis server shutting down: first ContainerStateMachine is closed ("Container statemachine is closed by ratis, terminating HddsDatanodeService"), then XceiverServerRatis is stopped ("Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5").

2024-04-24 18:06:16,666 WARN  util.JvmPauseMonitor (JvmPauseMonitor.java:detectPause(168)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Detected pause in JVM or host machine approximately 93.265s without any GCs.
2024-04-24 18:06:16,666 ERROR server.RaftServer (RaftServerProxy.java:handleJvmPause(237)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: JVM pause detected 93.265s longer than the close-threshold 60s, shutting down ...
2024-04-24 18:06:16,678 INFO  server.RaftServer (RaftServerProxy.java:lambda$close$9(416)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: close
2024-04-24 18:06:16,684 INFO  server.RaftServer$Division (RaftServerImpl.java:lambda$close$3(526)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E: shutdown
2024-04-24 18:06:16,685 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService now
2024-04-24 18:06:16,690 INFO  util.JmxRegister (JmxRegister.java:unregister(73)) - Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-5EA60976374E,id=01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 18:06:16,691 INFO  impl.RoleInfo (RoleInfo.java:shutdownLeaderState(94)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-LeaderStateImpl
2024-04-24 18:06:16,724 INFO  impl.PendingRequests (PendingRequests.java:sendNotLeaderResponses(289)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-PendingRequests: sendNotLeaderResponses
2024-04-24 18:06:16,727 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcClientProtocolService successfully
2024-04-24 18:06:16,727 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService now
2024-04-24 18:06:16,728 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:stopAndJoin(157)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: set stopIndex = 4
2024-04-24 18:06:16,729 INFO  ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(359)) - group-5EA60976374E: Taking a snapshot at:(t:3, i:4) file /tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.3_4
2024-04-24 18:06:16,729 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server GrpcServerProtocolService successfully
2024-04-24 18:06:16,729 INFO  server.GrpcService (GrpcService.java:closeImpl(311)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService now
2024-04-24 18:06:16,732 INFO  ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(370)) - group-5EA60976374E: Finished taking a snapshot at:(t:3, i:4) file:/tmp/datanode1/ratis/e9e7ba3c-7686-4b3a-96fd-5ea60976374e/sm/snapshot.3_4 took: 4 ms
2024-04-24 18:06:16,733 INFO  server.GrpcService (GrpcService.java:closeImpl(320)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5: shutdown server org.apache.ratis.grpc.server.GrpcAdminProtocolService successfully
2024-04-24 18:06:16,734 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(295)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: Took a snapshot at index 4
2024-04-24 18:06:16,734 INFO  impl.StateMachineUpdater (StateMachineUpdater.java:lambda$new$0(98)) - 01effdc6-dad1-4bf3-916a-749d9aa7e5e5@group-5EA60976374E-StateMachineUpdater: snapshotIndex: updateIncreasingly 2 -> 4
2024-04-24 18:06:16,740 ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:close(1142)) - Container statemachine is closed by ratis, terminating HddsDatanodeService
2024-04-24 18:06:26,754 INFO  ozoneimpl.OzoneContainer (OzoneContainer.java:stop(482)) - Attempting to stop container services.
2024-04-24 18:06:26,754 WARN  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 18:06:26,754 INFO  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - Thread[ContainerMetadataScanner,5,main] exiting.
2024-04-24 18:06:26,755 INFO  ozoneimpl.BackgroundContainerDataScanner (BackgroundContainerDataScanner.java:shutdown(141)) - ContainerDataScanner(/tmp/datanode1/storage/hdds) is shutting down. 
2024-04-24 18:06:26,755 WARN  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:handleRemainingSleep(134)) - Background container scan was interrupted.
2024-04-24 18:06:26,755 INFO  ozoneimpl.AbstractBackgroundContainerScanner (AbstractBackgroundContainerScanner.java:run(61)) - ContainerDataScanner(/tmp/datanode1/storage/hdds, DS-af727dc0-66f9-4db9-8f1f-8ce487a40766) exiting.
2024-04-24 18:06:26,755 INFO  ozoneimpl.OnDemandContainerDataScanner (OnDemandContainerDataScanner.java:shutdownScanner(206)) - On-demand container scanner is shutting down.
2024-04-24 18:06:26,756 INFO  ratis.XceiverServerRatis (XceiverServerRatis.java:stop(604)) - Stopping XceiverServerRatis 01effdc6-dad1-4bf3-916a-749d9aa7e5e5
2024-04-24 18:06:26,757 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(152)) - JvmPauseMonitor-01effdc6-dad1-4bf3-916a-749d9aa7e5e5: Stopped
2024-04-24 18:06:28,892 INFO  volume.HddsVolume (HddsVolume.java:closeDbStore(470)) - SchemaV3 db is stopped at /tmp/datanode1/storage/hdds/CID-9ba4109c-68b1-4311-9623-42f82149fb80/DS-af727dc0-66f9-4db9-8f1f-8ce487a40766/container.db for volume DS-af727dc0-66f9-4db9-8f1f-8ce487a40766
2024-04-24 18:06:28,893 INFO  utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service BlockDeletingService
2024-04-24 18:06:28,893 INFO  utils.BackgroundService (BackgroundService.java:shutdown(160)) - Shutting down service StaleRecoveringContainerScrubbingService
2024-04-24 18:06:28,894 INFO  statemachine.DatanodeStateMachine (DatanodeStateMachine.java:stopDaemon(640)) - Ozone container server stopped.
2024-04-24 18:06:28,899 INFO  handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.w.WebAppContext@5fbdc49b{hddsDatanode,/,null,STOPPED}{file:/Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/hddsDatanode}
2024-04-24 18:06:28,903 INFO  server.AbstractConnector (AbstractConnector.java:doStop(383)) - Stopped ServerConnector@7fc7c4a{HTTP/1.1, (http/1.1)}{SAMMICHEN-MB0:9882}
2024-04-24 18:06:28,903 INFO  server.session (HouseKeeper.java:stopScavenging(149)) - node0 Stopped scavenging
2024-04-24 18:06:28,903 INFO  handler.ContextHandler (ContextHandler.java:doStop(1159)) - Stopped o.e.j.s.ServletContextHandler@76c387f9{static,/static,file:///Users/sammi/workspace/hadoop-ozone/hadoop-hdds/container-service/target/classes/webapps/static,STOPPED}
2024-04-24 18:06:28,904 INFO  ozone.HddsDatanodeClientProtocolServer (HddsDatanodeClientProtocolServer.java:stop(83)) - Stopping the RPC server for Client Protocol
2024-04-24 18:06:28,905 INFO  ipc.Server (Server.java:stop(3523)) - Stopping server on 19864
2024-04-24 18:06:28,905 INFO  ipc.Server (Server.java:run(1434)) - Stopping IPC Server listener on 19864
2024-04-24 18:06:28,905 INFO  ipc.Server (Server.java:run(1567)) - Stopping IPC Server Responder
2024-04-24 18:06:28,908 INFO  util.ExitUtil (ExitUtil.java:terminate(241)) - Exiting with status 1: ExitException
2024-04-24 18:06:28,909 INFO  ozone.HddsDatanodeService (StringUtils.java:lambda$startupShutdownMessage$0(144)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at SAMMICHEN-MB0/0.0.0.0
************************************************************/

Process finished with exit code 1

Comment on lines +1143 to +1145
// wait a while for other pipeline's ContainerStateMachine.close() called.
try {
  Thread.sleep(10000);
Contributor

Is there a more reliable way to wait for the other pipelines to close than sleeping here?

And what happens if there are still unclosed pipelines after the 10-second wait?

Contributor Author

@ChenSammi ChenSammi Apr 29, 2024

ContainerStateMachine.java is invoked per pipeline and has no knowledge of the other pipelines. The 10s wait is meant to give the other pipelines time to close. ContainerStateMachine.close() only performs in-memory operations, so missing a call to it is not a big issue. Shutting down immediately, or after waiting 5s or 10s, therefore makes little difference. I just think waiting a while is better, as is done when an executor pool is shut down.
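
One alternative to a fixed sleep, sketched under assumptions: if some component that knows all pipelines (ContainerStateMachine itself does not, as noted above) counted the open state machines, the shutdown path could wait on a `CountDownLatch` with a timeout. The wait returns as soon as every state machine has closed and still gives up after the bound, so shutdown proceeds even if some pipelines remain unclosed. `CloseTracker` and its methods are hypothetical names, not Ozone or Ratis APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical tracker for illustration; not an Ozone or Ratis class. */
final class CloseTracker {
  private final CountDownLatch remaining;

  CloseTracker(int openStateMachines) {
    this.remaining = new CountDownLatch(openStateMachines);
  }

  /** Called once from each state machine's close(). */
  void onStateMachineClosed() {
    remaining.countDown();
  }

  /**
   * Waits until all state machines have closed or the timeout elapses.
   * Returns true if all closed in time, false on timeout or interruption.
   */
  boolean awaitAllClosed(long timeout, TimeUnit unit) {
    try {
      return remaining.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    CloseTracker tracker = new CloseTracker(2);
    new Thread(tracker::onStateMachineClosed).start();
    new Thread(tracker::onStateMachineClosed).start();
    boolean clean = tracker.awaitAllClosed(10, TimeUnit.SECONDS);
    System.out.println(clean ? "all closed" : "timed out, shutting down anyway");
  }
}
```

This mirrors the executor-pool pattern of `shutdown()` followed by `awaitTermination(timeout, unit)`: a bounded wait that ends early when the work is done.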

Contributor

@adoroszlai adoroszlai left a comment

Thanks @ChenSammi for working on this.

  • Each Raft group, which corresponds to a pipeline membership in Datanode, has its own ContainerStateMachine.
  • When SCM detects a dead datanode, it closes all pipelines associated with it, which triggers close of the state machine in the other two nodes.

So with this patch, stopping a datanode or closing a pipeline kills other datanodes.

Note that with multi-Raft the effect can cascade, since datanodes may be associated with different sets of other nodes for each Raft group.
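
The cascade can be illustrated with a toy model, under strong assumptions: every state-machine close terminates its datanode (the behavior of this patch), and SCM then closes every pipeline of the terminated node. Under those assumptions, the set of terminated nodes is everything reachable from the first stopped node through shared pipelines. The class name and the pipeline layout below are invented for illustration and are not the layout from the repro; real outcomes also depend on timing and on which pipelines exist at that moment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the cascade; names and layout are hypothetical. */
final class CascadeModel {
  private CascadeModel() { }

  /** Returns all nodes terminated after firstStopped goes down. */
  static Set<String> terminated(List<Set<String>> pipelines, String firstStopped) {
    Set<String> dead = new TreeSet<>();
    Deque<String> queue = new ArrayDeque<>();
    dead.add(firstStopped);
    queue.add(firstStopped);
    while (!queue.isEmpty()) {
      String node = queue.poll();
      for (Set<String> members : pipelines) {
        if (!members.contains(node)) {
          continue;
        }
        // SCM closes this pipeline; the close terminates every other member.
        for (String peer : members) {
          if (dead.add(peer)) {
            queue.add(peer);
          }
        }
      }
    }
    return dead;
  }

  public static void main(String[] args) {
    List<Set<String>> pipelines = List.of(
        Set.of("dn1", "dn3", "dn4"),
        Set.of("dn2", "dn5", "dn6"),
        Set.of("dn2", "dn4", "dn5"));
    System.out.println(terminated(pipelines, "dn6"));
  }
}
```

In this invented layout, stopping `dn6` eventually reaches every node, because each terminated node shares at least one pipeline with a node that is still running.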

Repro:

cd hadoop-ozone/dist/target/ozone-1.5.0-SNAPSHOT/compose/ozone
OZONE_DATANODES=6 ./run.sh -d
docker-compose exec scm ozone admin safemode wait -t 60
docker-compose ps
docker-compose up -d --no-recreate --scale datanode=5
docker-compose ps
sleep 120
docker-compose ps

Datanodes at the last step:

      Name                    Command               State                                             Ports                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------
ozone_datanode_1   /usr/local/bin/dumb-init - ...   Up       0.0.0.0:33008->19864/tcp,:::33008->19864/tcp, 0.0.0.0:33011->9882/tcp,:::33011->9882/tcp
ozone_datanode_2   /usr/local/bin/dumb-init - ...   Exit 1                                                                                           
ozone_datanode_3   /usr/local/bin/dumb-init - ...   Up       0.0.0.0:33014->19864/tcp,:::33014->19864/tcp, 0.0.0.0:33015->9882/tcp,:::33015->9882/tcp
ozone_datanode_4   /usr/local/bin/dumb-init - ...   Up       0.0.0.0:33006->19864/tcp,:::33006->19864/tcp, 0.0.0.0:33007->9882/tcp,:::33007->9882/tcp
ozone_datanode_5   /usr/local/bin/dumb-init - ...   Exit 1                                                                                           

@ChenSammi
Contributor Author

@adoroszlai, I noticed the impact on the integration tests too. It looks like terminating the DN from ContainerStateMachine is not a good idea. Let me think about whether there are other solutions.

@adoroszlai adoroszlai marked this pull request as draft May 3, 2024 09:46
@ChenSammi
Contributor Author

ChenSammi commented May 16, 2024

Waiting for a Ratis release that includes https://issues.apache.org/jira/browse/RATIS-2066.

3 participants