Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abnormal NATS write load associated with a specific jetstream #5423

Open
alekseipalshin opened this issue May 15, 2024 · 2 comments
Open

Abnormal NATS write load associated with a specific jetstream #5423

alekseipalshin opened this issue May 15, 2024 · 2 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@alekseipalshin
Copy link

Observed behavior

Hello.

We have detected high disk write load on some servers of the NATS cluster. (up to 100 MB/s)
nats_disk_usage

Restarting the NATS server helps reduce the disk load.
We had no ideas what could have caused it, so we conducted an analysis using Strace:

407112 08:02:01 write(90</data/jetstream/MYSTREAM/streams/my-stream_prod/obs/my-stream_user_rating_fetch_did_request_v1_prod/o.dat>, "\26\2\345\255\241 \244\260\206\234\1\360\213\363\177\344\323\253\226\4\242\211\303\1\364\351\230\344\f\335\300\326"..., 58764653 <unfinished ...>
407112 08:02:01 <... write resumed> )   = 58764653
407112 08:02:02 write(43</data/jetstream/MYSTREAM/streams/my-stream_prod/obs/my-stream_user_rating_fetch_did_request_v1_prod/o.dat>, "\26\2\345\255\241 \244\260\206\234\1\363\213\363\177\361\323\253\226\4\243\211\303\1\364\351\230\344\f\253\264\275"..., 58764663 <unfinished ...>
407112 08:02:02 <... write resumed> )   = 58764663
407112 08:02:03 write(43</data/jetstream/MYSTREAM/streams/my-stream_prod/obs/my-stream_user_rating_fetch_did_request_v1_prod/o.dat>, "\26\2\345\255\241 \244\260\206\234\1\364\213\363\177\363\323\253\226\4\241\211\303\1\366\351\230\344\f\316\307\242"..., 58764643 <unfinished ...>
407112 08:02:03 <... write resumed> )   = 58764643

nats stream info

? Select a Stream my-stream_prod
Information for Stream my-stream_prod created 2023-04-27 14:25:50

             Subjects: my-stream.*.*.prod
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
           Direct Get: true
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: 15m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats
               Leader: nats-cluster-4
              Replica: nats-cluster-1, current, seen 0.00s ago
              Replica: nats-cluster-2, outdated, seen 0.00s ago, 1 operation behind

State:

             Messages: 111,685
                Bytes: 21 MiB
             FirstSeq: 4,136,545,200 @ 2024-05-15T10:51:20 UTC
              LastSeq: 4,136,656,884 @ 2024-05-15T11:06:20 UTC
     Active Consumers: 10
   Number of Subjects: 9

nats consumer info

? Select a Stream my-stream_prod
? Select a Consumer  [Use arrows to move, type to filter]
> 
  ...
  my-stream_user_rating_fetch_did_request_v1_prod
  ...

Expected behavior

Disk write throughput is ~1 MB/s like after restart

Server and client version

nats-server:2.9.16

github.com/nats-io/nats.go v1.31.0

Host environment

Kubernetes v1.26.12-eks

Installed from helm chart:

apiVersion: v2
appVersion: 2.8.4
description: A Helm chart for the NATS.io High Speed Cloud Native Distributed Communications
  Technology.
home: http://github.com/nats-io/k8s
icon: https://nats.io/img/nats-icon-color.png
keywords:
- nats
- messaging
- cncf
maintainers:
- email: wally@nats.io
  name: Waldemar Quevedo
  url: https://github.com/wallyqs
- email: colin@nats.io
  name: Colin Sullivan
  url: https://github.com/ColinSullivan1
- email: jaime@nats.io
  name: Jaime Piña
  url: https://github.com/nsurfer
- email: caleb@nats.io
  name: Caleb Lloyd
  url: https://github.com/caleblloyd
name: nats
version: 0.17.5

Steps to reproduce

No response

@alekseipalshin alekseipalshin added the defect Suspected defect such as a bug or regression label May 15, 2024
@Jarema
Copy link
Member

Jarema commented May 15, 2024

This is a pretty old NATS server version.
I would start with upgrading to latest 2.10.14 to see it that solves the problem.
We did put a lot of effort into JetStream between those two versions.

@alekseipalshin
Copy link
Author

alekseipalshin commented May 15, 2024

Thank you, I'll try.
When we updated the version from 2.8.4 to 2.9.16, the performance became much better.
Although the behavior seems strange, we have many other streams and there are no problems with them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

No branches or pull requests

2 participants