Ztunnel has two primary sources of memory usage:
- Data plane traffic, largely buffers we hold in the I/O flow
- Data that originally came over xDS
There are essentially no other sources. While we store things for lookup, metrics, etc., this data all comes from xDS.
This is largely metadata. Strings like `default` are likely copied thousands of times. We can explore ways to optimize this with string interning and/or optimized string representations.
Quick numbers:
- `String`: 24 bytes (ptr, len, cap)
- `Arc<str>`: 16 bytes
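These numbers are easy to verify with `std::mem::size_of`; a small helper (name is illustrative) returning both handle sizes on a 64-bit target:

```rust
use std::mem::size_of;
use std::sync::Arc;

// On a 64-bit target: String is (ptr, len, cap) = 24 bytes, while
// Arc<str> is a fat pointer (ptr, len) = 16 bytes. Arc's two
// reference counts live in the shared heap allocation, not in the
// handle itself.
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<Arc<str>>())
}
```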
Interning
Interning lets us store each unique string once, in one place, and reference it everywhere else. There are a variety of options here.
Requirements:
- We must have a way to bound growth (clean up strings on last reference)
- We need to be able to intern all strings from xDS
- Interned strings should be ergonomic to use
Interning crates vary in whether they support multiple threads and whether they use a global or a local table.
In our case, string creation is all single-threaded, in the xDS code.
However, the handle needs to be `Send` (because we are in async code).
Additionally, the usage and dropping of strings happens on separate threads.
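The `Send` requirement can be illustrated with plain threads (the function name here is just for illustration): the handle is created on one thread but used, and eventually dropped, on another, which is exactly what `Arc<str>` permits since it is `Send + Sync`.

```rust
use std::sync::Arc;
use std::thread;

// The interned handle is created on one thread (the xDS task in our
// case) but used and dropped on others, so the type must be Send.
// Arc<str> is Send + Sync, so moving it into a spawned thread works.
pub fn use_on_other_thread(name: Arc<str>) -> usize {
    thread::spawn(move || name.len()).join().unwrap()
}
```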
For cleanup, we can:
- Return a type with a custom `Drop` implementation that removes entries. This requires the intern backing map to be thread-safe (because we now drop from a different thread than the one that created the entry); this could maybe be hyper-optimized by queuing drops in some data structure and flushing them periodically, but ultimately it is still multithreaded. If we do not use a global store, we also need a reference to the map for cleanup, which adds 8 bytes.
- Clean up periodically. Things will pretty much only be dropped when an xDS removal happens, so we could run cleanup then. This probably isn't 100% robust, though, since some things will be held slightly past the removal, so we would need a timer as well. This would be pretty solid, but scanning the entire set may be expensive and lead to issues like "every 5 minutes my CPU spikes".
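A minimal sketch of the first option (drop-based cleanup against a global table); every name here (`TABLE`, `Interned`) is illustrative, not ztunnel's actual code. The strong-count check runs while holding the table lock so it cannot race with a concurrent intern of the same string:

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical global intern table: string contents -> shared Arc<str>.
static TABLE: OnceLock<Mutex<HashMap<Box<str>, Arc<str>>>> = OnceLock::new();

fn table() -> &'static Mutex<HashMap<Box<str>, Arc<str>>> {
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

#[derive(Clone)]
pub struct Interned(Arc<str>);

impl Interned {
    pub fn new(s: &str) -> Self {
        let mut map = table().lock().unwrap();
        // Reuse the existing Arc if the string is already interned.
        let arc = map.entry(s.into()).or_insert_with(|| Arc::from(s)).clone();
        Interned(arc)
    }
}

impl Deref for Interned {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl Drop for Interned {
    fn drop(&mut self) {
        let mut map = table().lock().unwrap();
        // Strong count == 2 means: this handle plus the table's copy,
        // i.e. no other users remain, so the entry can be removed.
        // Checking under the lock keeps growth bounded without races.
        if Arc::strong_count(&self.0) == 2 {
            map.remove(&*self.0);
        }
    }
}
```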
Some libraries return u64 (or even u32) identifiers that can be used to look up a `str`. This is extremely efficient, since we use only 4/8 bytes. However, it makes usage slower (each lookup needs a lock) and possibly less ergonomic (depending on the implementation). It also only really makes sense with a global table; otherwise we spend half our bytes pointing back to the table.
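A hypothetical sketch of the identifier approach (type and method names are made up for illustration): callers hold a 4-byte index, and every read has to go back through the table, which in a multithreaded setting means taking a lock first.

```rust
use std::collections::HashMap;

// Toy id-based interner: strings live in a Vec, callers hold a u32
// index into it instead of a pointer-sized handle.
#[derive(Default)]
pub struct IdInterner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl IdInterner {
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    // Every read goes back through the table; with multiple threads
    // this access would sit behind a lock.
    pub fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}
```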
String type
`String` is a bad fit for us; we do lots of cloning and no mutation. `String` carries mutability (which we pay for in size) and has slow clones. `Arc<str>` is 8 bytes smaller (16 bytes) and has ~free clones; we could get a lot of wins there.
Another option is `smol_str`. It is like `Arc<str>` (with the same cheap cloning) but uses 24 bytes; for the extra 8 bytes, we get inline storage for strings <22 bytes long.
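A toy illustration of the small-string idea behind `smol_str` (not its real implementation, and the 22-byte inline buffer is just chosen to mirror the numbers above): short strings live inline in the handle itself, and only longer ones pay for a heap allocation.

```rust
use std::sync::Arc;

// Illustrative small-string type: short strings are stored inline in
// the enum, longer ones fall back to a shared heap allocation.
pub enum SmallStr {
    Inline { len: u8, buf: [u8; 22] },
    Heap(Arc<str>),
}

impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= 22 {
            let mut buf = [0u8; 22];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(Arc::from(s))
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            // Inline data was copied from a valid &str, so slicing to
            // the recorded length is always valid UTF-8.
            SmallStr::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).unwrap()
            }
            SmallStr::Heap(a) => a,
        }
    }
}
```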
Overall
The best course of action here seems to be a global intern map that does cleanup on drop. By using a global map, we save 8 bytes per string. By doing cleanup on drop, we use minimal memory and avoid weird edge cases.
For the string type, `Arc<str>` seems to be the best fit for us right now.
https://crates.io/crates/internment provides all of the above semantics out of the box, so I suggest we experiment with that.
All of this, of course, is pending benchmarks.