
Improved Symbolizer #4500

Open
brancz opened this issue Apr 9, 2024 · 1 comment
Comments

brancz (Member) commented Apr 9, 2024

Parca's memory usage is very unpredictable, mainly because of symbolization.

A memory profile from https://demo.parca.dev/: https://pprof.me/4e9382bfefe0002d4b786fe7524eed4e

Around 80% of the total memory usage can be attributed to symbolization-related things:

  1. metastore: stores all the stack traces and tracks the ones that haven't been symbolized yet (using badgerdb)
  2. symbolizer: the different symbolizer implementations (DWARF, gopclntab, symtab) all load entire sections into memory, and additionally cache parts of the processing

The metastore additionally causes headaches because, once wiped, it can't simply be rebuilt from scratch.

I think both of these can be significantly improved by changing our symbolization strategy a bit:

  1. store stack traces in the columnar database (unsymbolized if they are not symbolized at write time)
  2. perform read-time symbolization: instead of asynchronously symbolizing absolutely everything ahead of time, only symbolize and cache the stack traces that are actually queried, at the time they are queried
  3. improve the symbolizer implementations to not require loading whole sections into memory (this will require forking the stdlib DWARF, gopclntab, and symtab implementations, but shouldn't be too difficult, since direct byte-slice accesses just need to be replaced with reads through an io.ReaderAt)

This should remove most of the excessive memory usage.

@zdyj3170101136
Contributor

Yeah. My ingester is a 32-core, 128 GB machine that does symbolization for thousands of machines in real time.
