Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Introduce subject package #4886

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

svenfoo
Copy link
Contributor

@svenfoo svenfoo commented Dec 14, 2023

This is work in progress. It could potentially be merged already, but I am opening this PR mostly to get some comments on the approach taken here.

This introduces a new package subject, which holds code that is to a large extent copied from server/sublist.go. Eventually the code this was copied from should not any longer be needed and can then be removed. The idea of using a separate package is that this allows to hide the internals from direct access. This will make it much easier to refactor these internals later.

The new package introduces the type Subject which holds the string representation of a subject together with its tokenized form. By using this type throughout the server code we can avoid tokenizing the same subject string over and over again. At least for some use cases this seems to be happening. I profiled the NATS server and I can show you profiles where tokenizeSubjectIntoSlice() is the function where most CPU cycles are spent.

The PR has two commits, the first introduces the new type, the second commit shows how it can be used. To really benefit from the new Subject type it would also have to be used in sublist.go, memstore,go and filestore,go. Basically it should be used all over the place, everywhere a subject is used. This would become a large change and I would only want to invest time into it if the NATS developers agree with the approach and embrace it.

Please feel free to also point out bugs or problematic implementation details if you spot them. But most of all I am interested in hearing your opinion on the approach in general and whether you believe it is feasible and worthwhile.

The idea is to have a simple abstraction layer for a subject.
Together with the subject string it also stores the tokenized
form. Having this in a seperate package allows us to hide the
internals from direct access.

Signed-off-by: Sven Neumann <sven.neumann@holoplot.com>
Signed-off-by: Sven Neumann <sven.neumann@holoplot.com>
@svenfoo
Copy link
Contributor Author

svenfoo commented Dec 15, 2023

I have described our use case in more details in #4888 and also attached a profile there that shows the problems that my proposed changes are trying to address.

@derekcollison
Copy link
Member

Will take a look, but in this instance we are more concerned about memory bloat and are working on ways to alleviate memory pressure from []byte to string etc. This would add additional memory, so might not work.

We have customers pushing over 100M subjects etc.

@svenfoo
Copy link
Contributor Author

svenfoo commented Dec 16, 2023

Will take a look, but in this instance we are more concerned about memory bloat and are working on ways to alleviate memory pressure from []byte to string etc. This would add additional memory, so might not work.

Thanks for you input. I understand that memory is a concern. However I believe this can be implemented without memory bloat.

By keeping the Subject type in a separate package, we can refactor its inner guts. It could for example store the subject as a byte array and the tokenized form could be stored as offsets into this byte array.

I also do not suggest to actually store this Subject type in the message store. I would aim for instantiating it when a message is loaded into memory so that the upper layers (account, consumer, stream, ...) have something to work with that allows them to do efficient filter matching.

@derekcollison
Copy link
Member

It is always a trade off for memory vs speed, but we are watching memory usage quite carefully these days.

There are several designs underway that could refactor the PSIM layering of the filestore (in memory) and on disk. This work could lead to a new sublist structure that is more efficient.

As that work plays out we can revisit this at that time.

@github-actions github-actions bot added the stale This issue has had no activity in a while label Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale This issue has had no activity in a while
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants