1M topics #4341

Closed
antlad opened this issue Jul 27, 2023 · 12 comments · Fixed by #4359 · May be fixed by #4451

Comments
antlad commented Jul 27, 2023

Recently, while doing performance research, we found that once the NATS server reaches a large number of topics, its message throughput degrades. Even though the hardware has spare capacity, the NATS server does not scale. Is there any configuration or patch we can apply to improve performance with a large number of topics? Details below:

Hardware: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (32 cores), 126GB RAM.
Publishing is done in a synthetic test: 1M msg/sec, spread across 1M topics with the subject pattern X.X.X.
Expected: 1M msg/sec
Actual: around 700k msg/sec.
NATS server version: 2.9.19
nats-top: [screenshot]
htop: [screenshot]

Test code (compiled with Go 1.20). Start it with:

for ((i=0; i<=40; i++))
do
  nohup ./publish  --app_num $i &
done

package main

import (
	"context"
	"fmt"
	"github.com/nats-io/nats.go"
	"github.com/urfave/cli/v2"
	"os"
	"os/signal"
	"sync/atomic"
	"time"
)

var payload = []byte(`J534V53qJr4zs756XKEjlZ5dXpCbSN8LdaLtujtgBOWJ4bgFuVwJLQWMAa6yjrIbxcnYbsodBSXXLr6LvA4YbYcOtlgWnJ52mhAyf8Px9D5t4OQUTUHcaTCjMcX8f0UYfOrvBnEwY8oKyeVFTLQLXpSs4rUPc5Gt4xU3oKbBZF4WKjgikxn3tPLoIkHZhPSG58RyAJD6U7A9DF4COuemClBq6WIe68ZeI41OiOQyV0ChhEUiyXz5PgI3oKJWlW30glZg06DGk024rJVCMDG9nRZt3hIEgFHANoLyJd9lqWyQ`)

// worker publishes batchSize messages per tick, each to a distinct subject "<topic>.<n>".
func worker(ctx context.Context, interval time.Duration, nc *nats.Conn, topic string, batchSize int, publishedCount *uint32) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			for n := 0; n < batchSize; n++ {

				if err := nc.Publish(fmt.Sprintf("%s.%d", topic, n), payload); err != nil {
					fmt.Println("error publish: ", err)
				} else {
					atomic.AddUint32(publishedCount, 1)
				}
			}
		case <-ctx.Done():
			return
		}
	}
}

func work(c *cli.Context) error {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
	defer cancel()
	nc, err := nats.Connect(c.String("url"))
	if err != nil {
		return err
	}
	defer nc.Close()

	appNum := c.Int("app_num")
	parallel := c.Int("parallel")
	interval := c.Duration("interval")
	totalMessages := c.Int("tags")

	batchSize := totalMessages / parallel

	var publishedCount uint32

	// Print this app instance's publish rate once per second.
	go func() {
		ticker := time.NewTicker(1 * time.Second)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				count := atomic.SwapUint32(&publishedCount, 0)
				fmt.Printf("app %d published %d\n", appNum, count)
			case <-ctx.Done():
				return
			}
		}
	}()

	for i := 0; i < parallel; i++ {
		go worker(ctx, interval, nc, fmt.Sprintf("%d.%d", appNum, i), batchSize, &publishedCount)
	}
	<-ctx.Done()
	fmt.Println("Benchmarking completed.")

	return nil
}

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			&cli.IntFlag{
				Name:  "app_num",
				Value: 0,
			},
			&cli.StringFlag{
				Name:  "url",
				Value: nats.DefaultURL,
			},
			&cli.IntFlag{
				Name:  "parallel",
				Value: 100,
			},
			&cli.IntFlag{
				Name:  "tags",
				Value: 25000,
			},
			&cli.DurationFlag{
				Name:  "interval",
				Value: time.Second,
			},
		},
		Action: work,
	}

	if err := app.Run(os.Args); err != nil {
		fmt.Println("Error:", err)
	}
}
derekcollison (Member) commented:

Why publish batches at a 1s interval? Why not just call nc.Publish() in a for loop as fast as you can?
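
For reference, a minimal sketch of that approach, assuming the same nats.go client as the test above: no ticker, just a tight publish loop. The subject pattern and message count are illustrative only.

package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	payload := []byte("x")
	// No pacing: publish to 1M distinct subjects as fast as the client allows.
	for n := 0; n < 1000000; n++ {
		if err := nc.Publish(fmt.Sprintf("0.0.%d", n), payload); err != nil {
			fmt.Println("publish error:", err)
		}
	}
	// Make sure buffered messages reach the server before exiting.
	if err := nc.Flush(); err != nil {
		fmt.Println("flush error:", err)
	}
}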

antlad (Author) commented Jul 29, 2023

I'm simulating a real workflow here, where we have a predefined polling interval. Software grabs data at this polling interval and needs to publish it ASAP.

derekcollison (Member) commented:

OK, so the server does show 1M msgs/sec, which is in line with your expectations, but only until the number of subjects you have published to becomes very high, meaning the number of distinct subjects.

kozlovic (Member) commented:

Have you also tried to use more connections on the client? If I understand your test correctly, each instance of publish runs 100 concurrent goroutines that together publish 25,000 messages per interval. All 100 goroutines share the same connection, so there is quite a bit of contention (locking-wise) in the application. Furthermore, every goroutine wakes up on the same interval (1 sec), so they all try to send at the same time. You could also add a bit of random delay so that there is less contention.
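
A rough sketch of those two suggestions, assuming the same nats.go client as the test above: one dedicated connection per worker plus a random start offset so the tickers do not all fire at once. The URL, subject pattern, batch size, and worker count are illustrative, not taken from the original test.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"os"
	"os/signal"
	"time"

	"github.com/nats-io/nats.go"
)

var payload = []byte("x")

// worker owns its own connection and staggers its ticker by a random offset.
func worker(ctx context.Context, id int, url string, interval time.Duration, batchSize int) error {
	nc, err := nats.Connect(url) // dedicated connection: no lock contention between workers
	if err != nil {
		return err
	}
	defer nc.Close()

	// Random start delay spreads the workers across the interval instead of
	// having them all wake up and publish at the same instant.
	time.Sleep(time.Duration(rand.Int63n(int64(interval))))

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for n := 0; n < batchSize; n++ {
				if err := nc.Publish(fmt.Sprintf("0.%d.%d", id, n), payload); err != nil {
					fmt.Println("publish error:", err)
				}
			}
		case <-ctx.Done():
			return nil
		}
	}
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
	defer cancel()
	for i := 0; i < 100; i++ {
		go func(id int) {
			if err := worker(ctx, id, nats.DefaultURL, time.Second, 250); err != nil {
				fmt.Println("worker error:", err)
			}
		}(i)
	}
	<-ctx.Done()
}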

derekcollison added a commit that referenced this issue Aug 2, 2023
Do not hold onto no interest subjects from a client in the unlocked cache.
If sending lots of different subjects, all with no interest, performance could be affected.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4341
antlad (Author) commented Aug 7, 2023

I see performance improvements in the latest version, but not yet the target number. I will try more connections per client and come back.

antlad (Author) commented Aug 7, 2023

Still present, even with a higher number of connections. Will continue in Slack.

jnmoyne (Contributor) commented Aug 7, 2023

What performance numbers do you get using nats bench?

e.g. try nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 100000. What kind of network do you have between the clients and the server(s)?

antlad (Author) commented Aug 8, 2023

@jnmoyne thanks, it's over localhost. Just tried this command:
[screenshot of nats bench results]

jnmoyne (Contributor) commented Aug 8, 2023

Going over localhost, those are really low numbers...
For comparison, on my M1 Mac Studio:

...
 [95] 27,612 msgs/sec ~ 3.37 MB/sec (10000 msgs)
 [96] 27,580 msgs/sec ~ 3.37 MB/sec (10000 msgs)
 [97] 27,549 msgs/sec ~ 3.36 MB/sec (10000 msgs)
 [98] 27,479 msgs/sec ~ 3.35 MB/sec (10000 msgs)
 [99] 27,452 msgs/sec ~ 3.35 MB/sec (10000 msgs)
 [100] 27,362 msgs/sec ~ 3.34 MB/sec (10000 msgs)
 min 27,362 | avg 29,265 | max 33,743 | stddev 1,412 msgs

antlad (Author) commented Aug 9, 2023

I got similar numbers on a MacBook M1. Will try to explore more hardware.
[screenshot of nats bench results]

HeavyHorst commented:

@antlad FWIW: if you constantly publish to so many different topics, it may be good (performance-wise) to disable the sublist cache.

On an i5-10210U I get (with the cache enabled):

 [93] 17,069 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [94] 17,056 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [95] 17,030 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [96] 17,025 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [97] 17,023 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [98] 17,013 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [99] 17,006 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [100] 16,994 msgs/sec ~ 2.07 MB/sec (10000 msgs)
 min 16,994 | avg 18,519 | max 22,831 | stddev 1,452 msgs

With the cache disabled:

cat config 
disable_sublist_cache: true

nats-server -c config
 [90] 26,665 msgs/sec ~ 3.26 MB/sec (10000 msgs)
 [91] 26,655 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [92] 26,611 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [93] 26,620 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [94] 26,606 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [95] 26,544 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [96] 26,572 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [97] 26,566 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [98] 26,535 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [99] 26,533 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [100] 26,539 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 min 26,533 | avg 30,923 | max 166,424 | stddev 14,025 msgs

Another thing I tried is increasing the default sublist cache size from 1024 to 100000 (the slCacheMax constant in the sublist.go file). I don't know the implications, but this gives me basically the same performance as when I publish to only 10 subjects (a rough sketch of the change follows the numbers below):

 [86] 64,116 msgs/sec ~ 7.83 MB/sec (10000 msgs)
 [87] 63,770 msgs/sec ~ 7.78 MB/sec (10000 msgs)
 [88] 63,759 msgs/sec ~ 7.78 MB/sec (10000 msgs)
 [89] 63,434 msgs/sec ~ 7.74 MB/sec (10000 msgs)
 [90] 63,400 msgs/sec ~ 7.74 MB/sec (10000 msgs)
 [91] 62,001 msgs/sec ~ 7.57 MB/sec (10000 msgs)
 [92] 62,025 msgs/sec ~ 7.57 MB/sec (10000 msgs)
 [93] 61,951 msgs/sec ~ 7.56 MB/sec (10000 msgs)
 [94] 61,681 msgs/sec ~ 7.53 MB/sec (10000 msgs)
 [95] 61,233 msgs/sec ~ 7.47 MB/sec (10000 msgs)
 [96] 60,950 msgs/sec ~ 7.44 MB/sec (10000 msgs)
 [97] 60,419 msgs/sec ~ 7.38 MB/sec (10000 msgs)
 [98] 60,316 msgs/sec ~ 7.36 MB/sec (10000 msgs)
 [99] 59,991 msgs/sec ~ 7.32 MB/sec (10000 msgs)
 [100] 59,796 msgs/sec ~ 7.30 MB/sec (10000 msgs)
 min 59,796 | avg 109,752 | max 1,066,742 | stddev 115,126 msgs
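
For anyone wanting to reproduce that experiment: it amounts to editing one constant in the nats-server source and rebuilding. A rough sketch, assuming the constant still lives in server/sublist.go as described above; the surrounding const block is abbreviated here and may differ between server versions.

// server/sublist.go in the nats-server source tree (abbreviated sketch)
const (
	slCacheMax = 100000 // default is 1024; raised as described in the comment above
)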

antlad (Author) commented Aug 30, 2023

@HeavyHorst Thank you! It works! Ideally this constant would be exposed in the config.
