String optimizations for xDS #988

Closed
howardjohn opened this issue Apr 29, 2024 · 0 comments · Fixed by #1016
Ztunnel has two primary sources of memory usage:

  1. Data plane traffic, largely buffers we hold in the IO flow
  2. Data passed over xDS originally

There are essentially no other sources. While we store data for lookups, metrics, etc., all of it originates from xDS.

That xDS data is largely metadata. Strings like "default" are likely copied thousands of times. We can explore optimizing them via string interning and/or optimized string representations.

Quick numbers:

  • String: 24 bytes (ptr, len, cap)
  • Arc<str>: 16 bytes (ptr, len)
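The sizes above can be checked directly with std::mem::size_of (these numbers assume a typical 64-bit target):

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // String is {ptr, len, cap}; Arc<str> is a fat pointer {ptr, len}.
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<Arc<str>>(), 16);
    // Both have a niche, so wrapping in Option costs no extra space.
    assert_eq!(size_of::<Option<Arc<str>>>(), 16);
}
```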

Interning

Interning allows us to basically store all of our unique strings in one place, and then reference them. There are a variety of options here.

Requirements:

  1. We must have a way to bound growth (cleanup strings on last reference)
  2. We need to be able to intern all strings from xDS
  3. Interned strings should be ergonomic to use

Interning crates vary in whether they support multiple threads and whether they require a global or a local table.
In our case, string creation all happens single-threaded in XDS.
However, the strings need to be Send (because we are in async code).
Additionally, the usage and dropping of strings happens on separate threads.

For cleanup, we can:

  1. Return a type with a custom Drop implementation that removes entries. This requires a thread-safe backing map, because we now drop from a different thread than the one that created the entry. This could perhaps be hyper-optimized by queueing drops in some data structure that is periodically flushed, but it is ultimately still multithreaded. If we do not use a global store, each string also needs a reference back to the map for cleanup, which adds 8 bytes.
  2. Periodically clean up. Entries will pretty much only be dropped when an xDS removal happens, so we could do cleanup then. This probably isn't 100% robust, though, since some things will hold on slightly past the removal, so we would need a timer as well. This would be pretty solid, but scanning the entire set may be expensive and lead to issues like "every 5 minutes my CPU spikes".
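A minimal stdlib-only sketch of option 1, with hypothetical names: a global, lock-protected table plus a wrapper whose Drop removes the entry once the last handle is gone. (A production version would need to handle the race between the strong-count check and concurrent clones; this is only a sketch of the cleanup-on-drop idea.)

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Global table of interned strings. The value type is unused; the table
// only exists to deduplicate and keep one Arc alive per unique string.
static TABLE: OnceLock<Mutex<HashMap<Arc<str>, ()>>> = OnceLock::new();

fn table() -> &'static Mutex<HashMap<Arc<str>, ()>> {
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

#[derive(Clone)]
struct Interned(Arc<str>);

impl Interned {
    fn new(s: &str) -> Interned {
        let mut map = table().lock().unwrap();
        if let Some((existing, _)) = map.get_key_value(s) {
            return Interned(existing.clone());
        }
        let arc: Arc<str> = Arc::from(s);
        map.insert(arc.clone(), ());
        Interned(arc)
    }
}

impl Drop for Interned {
    fn drop(&mut self) {
        // The lock is taken from whichever thread drops the handle, which is
        // why the backing map must support multithreaded access.
        let mut map = table().lock().unwrap();
        // Count 2 == the table's copy plus the handle being dropped now.
        if Arc::strong_count(&self.0) == 2 {
            map.remove(&self.0);
        }
    }
}

fn main() {
    let a = Interned::new("default");
    let b = Interned::new("default");
    // Both handles share one allocation.
    assert!(Arc::ptr_eq(&a.0, &b.0));
    drop(a);
    drop(b);
    // The last drop removed the entry, bounding growth.
    assert!(table().lock().unwrap().is_empty());
}
```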

Some libraries return u64 (or even u32) identifiers that can be used to look up a str. This is extremely efficient, since we use only 4/8 bytes. However, it makes usage slower (each lookup needs a lock) and possibly less ergonomic (depending on the implementation). It also only really makes sense with a global table; otherwise we spend half our bytes pointing back to the table.
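The id-returning shape looks roughly like the following sketch (names are hypothetical; a real multithreaded version would wrap the table in a lock, which is exactly the lookup cost described above):

```rust
use std::collections::HashMap;

// Callers hold a 4-byte Symbol; every read goes back through the table.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    strings: Vec<String>,
    ids: HashMap<String, u32>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&id) = self.ids.get(s) {
            return Symbol(id);
        }
        let id = self.strings.len() as u32;
        self.ids.insert(s.to_string(), id);
        self.strings.push(s.to_string());
        Symbol(id)
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("default");
    let b = interner.intern("default");
    assert_eq!(a, b);
    assert_eq!(interner.resolve(a), "default");
    // The handle itself is only 4 bytes; the cost moves to resolve().
    assert_eq!(std::mem::size_of::<Symbol>(), 4);
}
```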

String type

String is a bad fit for us; we do lots of cloning and no mutation. String pays for mutability (in size) and has slow clones. Arc<str> is 8 bytes smaller (16 bytes) and has ~free clones; we could get a lot of wins there.

Another option is smol_str. This is like Arc<str> (with the same cheap cloning) but uses 24 bytes. For the extra 8 bytes, we get inline storage for strings less than 22 bytes long.
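The "~free clone" claim for Arc<str> is just a refcount bump, which is easy to see with stdlib tools:

```rust
use std::sync::Arc;

fn main() {
    let a: Arc<str> = Arc::from("default");
    let b = a.clone();
    // Both handles point at the same allocation; no bytes were copied.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(Arc::strong_count(&a), 2);
    // There is no mutation API for the shared str, which is why no
    // capacity field (the third 8 bytes of String) is needed.
    assert_eq!(&*b, "default");
}
```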

Overall

The best course of action here seems to be a global intern mapping that does cleanup on drop. By using a global map, we save 8 bytes per string. By doing cleanup on drop, we use minimal memory and avoid weird edge cases.

For string type, Arc<str> seems to be the best fit for us right now.

https://crates.io/crates/internment provides all of the above semantics out of the box, so I suggest we experiment with that.

All of this, of course, is pending benchmarks.
