RFC: Introduce subject package #4886

svenfoo · 2023-12-14T08:01:06Z

This is work in progress. It could potentially be merged already, but I am opening this PR mostly to get some comments on the approach taken here.

This introduces a new package subject, which holds code that is to a large extent copied from server/sublist.go. Eventually the code this was copied from should not any longer be needed and can then be removed. The idea of using a separate package is that this allows to hide the internals from direct access. This will make it much easier to refactor these internals later.

The new package introduces the type Subject which holds the string representation of a subject together with its tokenized form. By using this type throughout the server code we can avoid tokenizing the same subject string over and over again. At least for some use cases this seems to be happening. I profiled the NATS server and I can show you profiles where tokenizeSubjectIntoSlice() is the function where most CPU cycles are spent.

The PR has two commits, the first introduces the new type, the second commit shows how it can be used. To really benefit from the new Subject type it would also have to be used in sublist.go, memstore,go and filestore,go. Basically it should be used all over the place, everywhere a subject is used. This would become a large change and I would only want to invest time into it if the NATS developers agree with the approach and embrace it.

Please feel free to also point out bugs or problematic implementation details if you spot them. But most of all I am interested in hearing your opinion on the approach in general and whether you believe it is feasible and worthwhile.

The idea is to have a simple abstraction layer for a subject. Together with the subject string it also stores the tokenized form. Having this in a seperate package allows us to hide the internals from direct access. Signed-off-by: Sven Neumann <sven.neumann@holoplot.com>

Signed-off-by: Sven Neumann <sven.neumann@holoplot.com>

svenfoo · 2023-12-15T16:51:59Z

I have described our use case in more details in #4888 and also attached a profile there that shows the problems that my proposed changes are trying to address.

derekcollison · 2023-12-15T21:42:11Z

Will take a look, but in this instance we are more concerned about memory bloat and are working on ways to alleviate memory pressure from []byte to string etc. This would add additional memory, so might not work.

We have customers pushing over 100M subjects etc.

svenfoo · 2023-12-16T11:20:20Z

Will take a look, but in this instance we are more concerned about memory bloat and are working on ways to alleviate memory pressure from []byte to string etc. This would add additional memory, so might not work.

Thanks for you input. I understand that memory is a concern. However I believe this can be implemented without memory bloat.

By keeping the Subject type in a separate package, we can refactor its inner guts. It could for example store the subject as a byte array and the tokenized form could be stored as offsets into this byte array.

I also do not suggest to actually store this Subject type in the message store. I would aim for instantiating it when a message is loaded into memory so that the upper layers (account, consumer, stream, ...) have something to work with that allows them to do efficient filter matching.

derekcollison · 2023-12-19T23:51:59Z

It is always a trade off for memory vs speed, but we are watching memory usage quite carefully these days.

There are several designs underway that could refactor the PSIM layering of the filestore (in memory) and on disk. This work could lead to a new sublist structure that is more efficient.

As that work plays out we can revisit this at that time.

svenfoo added 2 commits December 14, 2023 08:17

Use subject.Subject in consumer and start using it in stream

ba2a78c

Signed-off-by: Sven Neumann <sven.neumann@holoplot.com>

svenfoo requested a review from a team as a code owner December 14, 2023 08:01

svenfoo mentioned this pull request Dec 15, 2023

Performance issues with consumers that have multiple filter subjects #4888

Closed

github-actions bot added the stale This issue has had no activity in a while label Mar 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Introduce subject package #4886

RFC: Introduce subject package #4886

svenfoo commented Dec 14, 2023 •

edited

svenfoo commented Dec 15, 2023

derekcollison commented Dec 15, 2023

svenfoo commented Dec 16, 2023

derekcollison commented Dec 19, 2023

RFC: Introduce subject package #4886

Are you sure you want to change the base?

RFC: Introduce subject package #4886

Conversation

svenfoo commented Dec 14, 2023 • edited

svenfoo commented Dec 15, 2023

derekcollison commented Dec 15, 2023

svenfoo commented Dec 16, 2023

derekcollison commented Dec 19, 2023

svenfoo commented Dec 14, 2023 •

edited