Replies: 1 comment 1 reply

Thanks @Kawon1 for the detailed write up, the |
Hello, could you please provide more information about how a drifted stream replica is handled by the NATS JetStream cluster while it is also managing new requests? For context, I will be talking about NATS JetStream deployed in a Kubernetes environment (AKS), where I performed tests on the self-healing/sync of a Stream cluster by purposefully purging all of the filestores on one of the NATS JetStream nodes - for the purpose of this experiment it is called `nats-0`.

When it comes to the unexpected OOMKill caused by a replica that is lagging behind the others, NATS JetStream does not appear to have any rate limiter and tries to catch up with the leader as fast as it can. I've observed that NATS JetStream is highly optimized for managing extremely high traffic volumes! Nevertheless, the process of restoring and catching up with streams doesn't appear to be equally optimized. Here is the graph representing the difference in messages between the replicas of a specific Stream, let's call it `StreamOne`. The query used for this Prometheus graph: `sum(nats_stream_total_messages{stream_name=~"$stream"}) by (pod)`
I intentionally erased the storage on `nats-0` (the green one) at 13:37 to simulate a breakdown scenario and test the self-healing capabilities of NATS JetStream. Such disasters might happen in a production environment. At this point the replica was behind the leader and was attempting to catch up with it. For clarification, the problem did not occur when the traffic was stopped and the only task within the Stream cluster was for `nats-0` to catch up with the leader; it seems to be completely different when both things happen at once - catching up with streams and receiving new requests. The gaps between the green dots/lines are the result of OOMKills, which were caused by catching up with the leader while the other members of the RAFT cluster were receiving requests. The fact that the same thing happened again at 16:58 and at 18:00 shows the deterministic nature of the situation given our configuration. Here are the logs which depict the catching-up situation for the `StreamOne` stream:
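Since there seems to be no rate limiter on catch-up, one mitigation I am considering is capping the Go heap below the container memory limit, so the runtime collects garbage aggressively during catch-up instead of tripping the cgroup OOM killer. A sketch only - `GOMEMLIMIT` is a standard Go runtime variable, but the statefulset/container names and the 6GiB value are assumptions based on a default install of the official Helm chart:

```bash
# Quick experiment: cap the Go heap of the nats container during catch-up.
# For a real deployment this belongs in the Helm values, not in kubectl.
kubectl set env statefulset/nats --containers=nats GOMEMLIMIT=6GiB
```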
What's more interesting is that a second stream, let's call it `StreamTwo`, drifted away from the leader completely, as depicted on the graph below (it's the same situation/time described above, just with a different stream; the purging of the whole storage was done at the same time, which can be seen by the gaps between the green dots/lines). Here are the logs regarding the `StreamTwo` stream:

Now, at timestamp 15:40, we can observe that NATS JetStream claims to have caught up: it reports that all streams are current, there are no more OOMs (no gaps between the green dots/lines), and the pod was ready because it passed the startupProbe provided by default in the official Helm chart of NATS. But is it really caught up? Absolutely not! On this graph, we can observe that the number of messages on `nats-0` runs parallel to the number of messages on `nats-1` and `nats-2`, indicating that the synchronization process is considered complete even though the replica of the stream on `nats-0` is completely different! Yet according to the logs, the `streamtwo` stream is current (as seen in the later case, which is also shown on these graphs). Here is the graph which depicts all of the messages without specifying any stream (in this case, only two streams were storing horrendously many messages):
The replicas are simply drifted apart, and self-healing doesn't seem to be working very well. In this case the raw bytes of messages are at the level of 231 GiB (`nats-1`, `nats-2`), whereas `nats-0` has less than half of those 231 GiB, as shown in these graphs. Even though OOMKills were happening, the Stream replicas converged in the end... but as you can see, that is not necessarily true for every stream.
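To double-check this outside of Prometheus, the raw filestore usage can also be compared directly on the pods - again a sketch, assuming the default `/data` store directory and the `nats` container name from the official Helm chart:

```bash
# Compare on-disk JetStream storage across replicas; a drifted replica shows up
# as a much smaller filestore even while it reports itself as current.
for pod in nats-0 nats-1 nats-2; do
  kubectl exec "$pod" -c nats -- du -sh /data/jetstream
done
```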
Regarding this issue, my questions are:

1. If the `streamtwo` leader was moved to `nats-0`, might we experience data loss as a result of `nats-1` and `nats-2` attempting to catch up to that leader? Because, as I showed you, in spite of the huge lag/drift of messages the `streamtwo` stream appears to be current. So in this case, would we get these kinds of messages, or something similar? OR
2. Could it be connected to the `XXX.blk.tmp` blocks in the filestore of the stream (a quick way to count them per pod is sketched after this list)? I've observed that some of them are being fixed/transformed into `XXX.blk`, but some of them are not. Maybe that is the reason for such disruptions: the Stream may be losing track of its indexes, state, etc., so we may end up with an unsynchronized state of the Stream on a few replicas across the Stream cluster.
3. One of the streams (`StreamTwo`) has a completely different number of messages on one of the JetStreams, but according to JetStream it is current... So assuming that this stream becomes the leader in the future, it seems that data loss may occur.
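The check mentioned in point 2 - the paths are my assumption of the filestore layout (`<store_dir>/jetstream/$G/streams/<stream>/msgs`, with `$G` being the default account directory), so adjust them to your setup:

```bash
# Count leftover .blk.tmp files in StreamTwo's message directory on each pod.
# The glob skips over the account directory (named '$G' for the default account).
for pod in nats-0 nats-1 nats-2; do
  kubectl exec "$pod" -c nats -- \
    sh -c 'ls /data/jetstream/*/streams/StreamTwo/msgs/*.blk.tmp 2>/dev/null | wc -l'
done
```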