1M topics #4341

Closed
antlad opened this issue Jul 27, 2023 · 12 comments · Fixed by #4359 · May be fixed by #4451

Comments
antlad commented Jul 27, 2023

Recently, while doing performance research, we found that once the NATS server reaches a large number of topics, its message throughput degrades. Even though the hardware has spare capacity, the NATS server does not scale. Is there any configuration or patch we can apply to improve performance with a large number of topics? Details below:

Hardware: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (32 cores), 126GB RAM.
Publishing is done in a synthetic test: 1M msg/sec, spread across 1M topics with the subject pattern X.X.X.
Expected: 1M msg/sec
Actual: around 700k msg/sec.
NATS server version: 2.9.19
nats-top: [screenshot]
htop: [screenshot]

Test code (compiled with Go 1.20). Start it with:

for ((i=0; i<=40; i++))
do
  nohup ./publish  --app_num $i &
done

package main

import (
	"context"
	"fmt"
	"github.com/nats-io/nats.go"
	"github.com/urfave/cli/v2"
	"os"
	"os/signal"
	"sync/atomic"
	"time"
)

var payload = []byte(`J534V53qJr4zs756XKEjlZ5dXpCbSN8LdaLtujtgBOWJ4bgFuVwJLQWMAa6yjrIbxcnYbsodBSXXLr6LvA4YbYcOtlgWnJ52mhAyf8Px9D5t4OQUTUHcaTCjMcX8f0UYfOrvBnEwY8oKyeVFTLQLXpSs4rUPc5Gt4xU3oKbBZF4WKjgikxn3tPLoIkHZhPSG58RyAJD6U7A9DF4COuemClBq6WIe68ZeI41OiOQyV0ChhEUiyXz5PgI3oKJWlW30glZg06DGk024rJVCMDG9nRZt3hIEgFHANoLyJd9lqWyQ`)

// worker publishes batchSize messages per tick, each to a distinct subject "<topic>.<n>".
func worker(ctx context.Context, interval time.Duration, nc *nats.Conn, topic string, batchSize int, publishedCount *uint32) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			for n := 0; n < batchSize; n++ {

				if err := nc.Publish(fmt.Sprintf("%s.%d", topic, n), payload); err != nil {
					fmt.Println("error publish: ", err)
				} else {
					atomic.AddUint32(publishedCount, 1)
				}
			}
		case <-ctx.Done():
			return
		}
	}
}

func work(c *cli.Context) error {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
	defer cancel()
	nc, err := nats.Connect(c.String("url"))
	if err != nil {
		return err
	}
	defer nc.Close()

	appNum := c.Int("app_num")
	parallel := c.Int("parallel")
	interval := c.Duration("interval")
	totalMessages := c.Int("tags")

	batchSize := totalMessages / parallel

	var publishedCount uint32

	// Print this app instance's publish rate once per second.
	go func() {
		ticker := time.NewTicker(1 * time.Second)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				count := atomic.SwapUint32(&publishedCount, 0)
				fmt.Printf("app %d published %d\n", appNum, count)
			case <-ctx.Done():
				return
			}
		}
	}()

	for i := 0; i < parallel; i++ {
		go worker(ctx, interval, nc, fmt.Sprintf("%d.%d", appNum, i), batchSize, &publishedCount)
	}
	<-ctx.Done()
	fmt.Println("Benchmarking completed.")

	return nil
}

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			&cli.IntFlag{
				Name:  "app_num",
				Value: 0,
			},
			&cli.StringFlag{
				Name:  "url",
				Value: nats.DefaultURL,
			},
			&cli.IntFlag{
				Name:  "parallel",
				Value: 100,
			},
			&cli.IntFlag{
				Name:  "tags",
				Value: 25000,
			},
			&cli.DurationFlag{
				Name:  "interval",
				Value: time.Second,
			},
		},
		Action: work,
	}

	if err := app.Run(os.Args); err != nil {
		fmt.Println("Error:", err)
	}
}
derekcollison (Member) commented:

Why publish batches at a 1s interval? Why not just call nc.Publish() in a for loop as fast as you can?
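
For reference, a minimal sketch of that approach, assuming the same nats.go client as the test above: no ticker, just a tight publish loop. The subject pattern and message count are illustrative only.

package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	payload := []byte("x")
	// No pacing: publish to 1M distinct subjects as fast as the client allows.
	for n := 0; n < 1000000; n++ {
		if err := nc.Publish(fmt.Sprintf("0.0.%d", n), payload); err != nil {
			fmt.Println("publish error:", err)
		}
	}
	// Make sure buffered messages reach the server before exiting.
	if err := nc.Flush(); err != nil {
		fmt.Println("flush error:", err)
	}
}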

antlad (Author) commented Jul 29, 2023

I'm simulating a real workflow here, where we have a predefined polling interval. Software grabs data at this polling interval and needs to publish it ASAP.

derekcollison (Member) commented:

OK, so the server does show 1M msgs/sec, which is in line with your expectations, but only until the number of subjects you have published to becomes very high, meaning the number of distinct subjects.

kozlovic (Member) commented:

Have you also tried to use more connections on the client? If I understand your test correctly, each instance of publish runs 100 concurrent goroutines that together publish 25,000 messages per interval. All 100 goroutines share the same connection, so there is quite a bit of contention (locking-wise) in the application. Furthermore, every goroutine wakes up on the same interval (1 sec), so they all try to send at the same time. You could also add a bit of random delay so that there is less contention.
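
A rough sketch of those two suggestions, assuming the same nats.go client as the test above: one dedicated connection per worker plus a random start offset so the tickers do not all fire at once. The URL, subject pattern, batch size, and worker count are illustrative, not taken from the original test.

package main

import (
	"context"
	"fmt"
	"math/rand"
	"os"
	"os/signal"
	"time"

	"github.com/nats-io/nats.go"
)

var payload = []byte("x")

// worker owns its own connection and staggers its ticker by a random offset.
func worker(ctx context.Context, id int, url string, interval time.Duration, batchSize int) error {
	nc, err := nats.Connect(url) // dedicated connection: no lock contention between workers
	if err != nil {
		return err
	}
	defer nc.Close()

	// Random start delay spreads the workers across the interval instead of
	// having them all wake up and publish at the same instant.
	time.Sleep(time.Duration(rand.Int63n(int64(interval))))

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for n := 0; n < batchSize; n++ {
				if err := nc.Publish(fmt.Sprintf("0.%d.%d", id, n), payload); err != nil {
					fmt.Println("publish error:", err)
				}
			}
		case <-ctx.Done():
			return nil
		}
	}
}

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
	defer cancel()
	for i := 0; i < 100; i++ {
		go func(id int) {
			if err := worker(ctx, id, nats.DefaultURL, time.Second, 250); err != nil {
				fmt.Println("worker error:", err)
			}
		}(i)
	}
	<-ctx.Done()
}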

derekcollison added a commit that referenced this issue Aug 2, 2023
Do not hold onto no interest subjects from a client in the unlocked cache.
If sending lots of different subjects, all with no interest, performance could be affected.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4341
antlad (Author) commented Aug 7, 2023

I see performance improvements in the latest version, but not yet the target number. I will try more connections per client and come back.

antlad (Author) commented Aug 7, 2023

Still present, even with a higher number of connections. Will continue in Slack.

jnmoyne (Contributor) commented Aug 7, 2023

What performance numbers do you get using nats bench?

e.g. try nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 100000. What kind of network do you have between the clients and the server(s)?

antlad (Author) commented Aug 8, 2023

@jnmoyne thanks, it's over localhost. Just tried this command:
[screenshot of nats bench results]

jnmoyne (Contributor) commented Aug 8, 2023

Going over localhost, those are really low numbers...
For comparison, on my M1 Mac Studio:

...
 [95] 27,612 msgs/sec ~ 3.37 MB/sec (10000 msgs)
 [96] 27,580 msgs/sec ~ 3.37 MB/sec (10000 msgs)
 [97] 27,549 msgs/sec ~ 3.36 MB/sec (10000 msgs)
 [98] 27,479 msgs/sec ~ 3.35 MB/sec (10000 msgs)
 [99] 27,452 msgs/sec ~ 3.35 MB/sec (10000 msgs)
 [100] 27,362 msgs/sec ~ 3.34 MB/sec (10000 msgs)
 min 27,362 | avg 29,265 | max 33,743 | stddev 1,412 msgs

antlad (Author) commented Aug 9, 2023

I got similar numbers on a MacBook M1. Will try to explore more hardware.
[screenshot of nats bench results]

HeavyHorst commented:

@antlad FWIW: if you constantly publish to so many different topics, it may be good (performance-wise) to disable the sublist cache.

On an i5-10210U I get (with the cache enabled):

 [93] 17,069 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [94] 17,056 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [95] 17,030 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [96] 17,025 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [97] 17,023 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [98] 17,013 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [99] 17,006 msgs/sec ~ 2.08 MB/sec (10000 msgs)
 [100] 16,994 msgs/sec ~ 2.07 MB/sec (10000 msgs)
 min 16,994 | avg 18,519 | max 22,831 | stddev 1,452 msgs

With the cache disabled:

cat config 
disable_sublist_cache: true

nats-server -c config
 [90] 26,665 msgs/sec ~ 3.26 MB/sec (10000 msgs)
 [91] 26,655 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [92] 26,611 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [93] 26,620 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [94] 26,606 msgs/sec ~ 3.25 MB/sec (10000 msgs)
 [95] 26,544 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [96] 26,572 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [97] 26,566 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [98] 26,535 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [99] 26,533 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 [100] 26,539 msgs/sec ~ 3.24 MB/sec (10000 msgs)
 min 26,533 | avg 30,923 | max 166,424 | stddev 14,025 msgs

Another thing I tried is increasing the default sublist cache size from 1024 to 100000 (the slCacheMax constant in the sublist.go file). I don't know the implications, but this gives me basically the same performance as when I publish to only 10 subjects (a rough sketch of the change follows the numbers below):

 [86] 64,116 msgs/sec ~ 7.83 MB/sec (10000 msgs)
 [87] 63,770 msgs/sec ~ 7.78 MB/sec (10000 msgs)
 [88] 63,759 msgs/sec ~ 7.78 MB/sec (10000 msgs)
 [89] 63,434 msgs/sec ~ 7.74 MB/sec (10000 msgs)
 [90] 63,400 msgs/sec ~ 7.74 MB/sec (10000 msgs)
 [91] 62,001 msgs/sec ~ 7.57 MB/sec (10000 msgs)
 [92] 62,025 msgs/sec ~ 7.57 MB/sec (10000 msgs)
 [93] 61,951 msgs/sec ~ 7.56 MB/sec (10000 msgs)
 [94] 61,681 msgs/sec ~ 7.53 MB/sec (10000 msgs)
 [95] 61,233 msgs/sec ~ 7.47 MB/sec (10000 msgs)
 [96] 60,950 msgs/sec ~ 7.44 MB/sec (10000 msgs)
 [97] 60,419 msgs/sec ~ 7.38 MB/sec (10000 msgs)
 [98] 60,316 msgs/sec ~ 7.36 MB/sec (10000 msgs)
 [99] 59,991 msgs/sec ~ 7.32 MB/sec (10000 msgs)
 [100] 59,796 msgs/sec ~ 7.30 MB/sec (10000 msgs)
 min 59,796 | avg 109,752 | max 1,066,742 | stddev 115,126 msgs
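
For anyone wanting to reproduce that experiment: it amounts to editing one constant in the nats-server source and rebuilding. A rough sketch, assuming the constant still lives in server/sublist.go as described above; the surrounding const block is abbreviated here and may differ between server versions.

// server/sublist.go in the nats-server source tree (abbreviated sketch)
const (
	slCacheMax = 100000 // default is 1024; raised as described in the comment above
)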

antlad (Author) commented Aug 30, 2023

@HeavyHorst Thank you! It works! Ideally this constant would be exposed in the config.
