Ztunnel has two primary sources of memory usage:
- Data plane traffic, largely buffers we hold in the I/O flow
- Data that originally came over xDS
There are essentially no other sources. While we store things for lookup, metrics, etc., this data all comes from xDS.
This is largely metadata. Strings like `default` are likely copied thousands of times. We can explore ways to optimize this with string interning and/or optimized string representations.
Quick numbers:
- `String`: 24 bytes (ptr, len, cap)
- `Arc<str>`: 16 bytes
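These numbers are easy to verify with `std::mem::size_of`; a small helper (name is illustrative) returning both handle sizes on a 64-bit target:

```rust
use std::mem::size_of;
use std::sync::Arc;

// On a 64-bit target: String is (ptr, len, cap) = 24 bytes, while
// Arc<str> is a fat pointer (ptr, len) = 16 bytes. Arc's two
// reference counts live in the shared heap allocation, not in the
// handle itself.
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<Arc<str>>())
}
```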
Interning
Interning lets us store each unique string once, in one place, and reference it everywhere else. There are a variety of options here.
Requirements:
- We must have a way to bound growth (clean up strings on last reference)
- We need to be able to intern all strings from xDS
- Interned strings should be ergonomic to use
Interning crates vary in whether they support multiple threads and whether they use a global or a local table.
In our case, string creation is all single-threaded, in the xDS code.
However, the handle needs to be `Send` (because we are in async code).
Additionally, the usage and dropping of strings happens on separate threads.
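The `Send` requirement can be illustrated with plain threads (the function name here is just for illustration): the handle is created on one thread but used, and eventually dropped, on another, which is exactly what `Arc<str>` permits since it is `Send + Sync`.

```rust
use std::sync::Arc;
use std::thread;

// The interned handle is created on one thread (the xDS task in our
// case) but used and dropped on others, so the type must be Send.
// Arc<str> is Send + Sync, so moving it into a spawned thread works.
pub fn use_on_other_thread(name: Arc<str>) -> usize {
    thread::spawn(move || name.len()).join().unwrap()
}
```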
For cleanup, we can:
- Return a type with a custom `Drop` implementation that removes entries. This requires the intern backing map to be thread-safe (because we now drop from a different thread than the one that created the entry); this could maybe be hyper-optimized by queuing drops in some data structure and flushing them periodically, but ultimately it is still multithreaded. If we do not use a global store, we also need a reference to the map for cleanup, which adds 8 bytes.
- Clean up periodically. Things will pretty much only be dropped when an xDS removal happens, so we could run cleanup then. This probably isn't 100% robust, though, since some things will be held slightly past the removal, so we would need a timer as well. This would be pretty solid, but scanning the entire set may be expensive and lead to issues like "every 5 minutes my CPU spikes".
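A minimal sketch of the first option (drop-based cleanup against a global table); every name here (`TABLE`, `Interned`) is illustrative, not ztunnel's actual code. The strong-count check runs while holding the table lock so it cannot race with a concurrent intern of the same string:

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical global intern table: string contents -> shared Arc<str>.
static TABLE: OnceLock<Mutex<HashMap<Box<str>, Arc<str>>>> = OnceLock::new();

fn table() -> &'static Mutex<HashMap<Box<str>, Arc<str>>> {
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

#[derive(Clone)]
pub struct Interned(Arc<str>);

impl Interned {
    pub fn new(s: &str) -> Self {
        let mut map = table().lock().unwrap();
        // Reuse the existing Arc if the string is already interned.
        let arc = map.entry(s.into()).or_insert_with(|| Arc::from(s)).clone();
        Interned(arc)
    }
}

impl Deref for Interned {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl Drop for Interned {
    fn drop(&mut self) {
        let mut map = table().lock().unwrap();
        // Strong count == 2 means: this handle plus the table's copy,
        // i.e. no other users remain, so the entry can be removed.
        // Checking under the lock keeps growth bounded without races.
        if Arc::strong_count(&self.0) == 2 {
            map.remove(&*self.0);
        }
    }
}
```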
Some libraries return u64 (or even u32) identifiers that can be used to look up a `str`. This is extremely efficient, since we use only 4/8 bytes. However, it makes usage slower (each lookup needs a lock) and possibly less ergonomic (depending on the implementation). It also only really makes sense with a global table; otherwise we spend half our bytes pointing back to the table.
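A hypothetical sketch of the identifier approach (type and method names are made up for illustration): callers hold a 4-byte index, and every read has to go back through the table, which in a multithreaded setting means taking a lock first.

```rust
use std::collections::HashMap;

// Toy id-based interner: strings live in a Vec, callers hold a u32
// index into it instead of a pointer-sized handle.
#[derive(Default)]
pub struct IdInterner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl IdInterner {
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    // Every read goes back through the table; with multiple threads
    // this access would sit behind a lock.
    pub fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}
```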
String type
`String` is a bad fit for us; we do lots of cloning and no mutation. `String` carries mutability (which we pay for in size) and has slow clones. `Arc<str>` is 8 bytes smaller (16 bytes) and has ~free clones; we could get a lot of wins there.
Another option is `smol_str`. It is like `Arc<str>` (with the same cheap cloning) but uses 24 bytes; for the extra 8 bytes, we get inline storage for strings <22 bytes long.
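A toy illustration of the small-string idea behind `smol_str` (not its real implementation, and the 22-byte inline buffer is just chosen to mirror the numbers above): short strings live inline in the handle itself, and only longer ones pay for a heap allocation.

```rust
use std::sync::Arc;

// Illustrative small-string type: short strings are stored inline in
// the enum, longer ones fall back to a shared heap allocation.
pub enum SmallStr {
    Inline { len: u8, buf: [u8; 22] },
    Heap(Arc<str>),
}

impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= 22 {
            let mut buf = [0u8; 22];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(Arc::from(s))
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            // Inline data was copied from a valid &str, so slicing to
            // the recorded length is always valid UTF-8.
            SmallStr::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).unwrap()
            }
            SmallStr::Heap(a) => a,
        }
    }
}
```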
Overall
The best course of action here seems to be a global intern map that does cleanup on drop. By using a global map, we save 8 bytes per string. By doing cleanup on drop, we use minimal memory and avoid weird edge cases.
For the string type, `Arc<str>` seems to be the best fit for us right now.
https://crates.io/crates/internment provides all of the above semantics out of the box, so I suggest we experiment with that.
All of this, of course, is pending benchmarks.