
Idea: Fiber and worker prod profiling #3834

Open

djspiewak opened this issue Sep 17, 2023 · 0 comments

One of the hardest problems in building a coroutine application is figuring out how to balance your task granularity, not to mention sorting out which part of your application may or may not be causing load issues. This isn't helped by the fact that the userspace scheduler is incredibly opaque.

Conceptually, all work-stealing schedulers converge to a self-balanced set of tuning parameters implicitly encoded in their rate of task advance, frequency of theft, frequency of external polling, and general queue size. The concept of fairness is implicit within this self-tuning and the way in which it relates to the construction of the fibers themselves (e.g. how many flatMaps there are, how long your delays are, etc.). Because these tuning parameters are never expressed numerically, it can be difficult to even talk about what they are or what they mean. To make matters worse, they are also very environment-dependent. Despite the negative feedback loops, the butterfly effect is still quite real: small changes in clock speed, I/O latency, memory bus, etc. can have massive impacts on how the runtime ends up "settling" when it converges on its optimal steady state.

This is all very unfortunate because, in theory, the tuning configuration associated with this steady state captures everything you might want to know about how your application is performing and, even more importantly, what isn't performing well (e.g. a fiber that has poor task granularity) and roughly how to fix it. Again, the problem here is not that we don't infer this information, but rather that the information is encoded in a form which is neither discretely representable nor comprehensible to our puny human brains.

What we really want to see is stuff like task time (in microseconds), relative granularity, which fibers are taking up extra time, stealing rate, etc. A lot of this tends to be conceptualized in terms of wall-clock time, which… we now have.
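
To make this concrete, here is a rough sketch of the kind of snapshot such profiling might surface. Every name here is hypothetical; none of it exists in Cats Effect today.

```scala
// Hypothetical shapes only, for illustration; not an existing API.
final case class WorkerSnapshot(
    workerIndex: Int,
    avgTaskMicros: Double,   // mean wall-clock time per task slice
    tasksExecuted: Long,     // total slices run on this worker
    successfulSteals: Long,  // how often this worker stole work
    externalPolls: Long      // polls of the external queue
)

final case class FiberSnapshot(
    fiberId: Long,
    avgTaskMicros: Double,   // per-fiber task granularity
    totalCpuMicros: Long     // cumulative time spent on a worker thread
)
```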

Since we started handling timers intrinsically, we added a nanoTime syscall to the worker thread loop. In theory, this could allow the worker threads to "profile" themselves passively, giving them an idea of what their task granularity looks like, how quickly they advance through their state machine, rough scheduling overhead, etc. This in and of itself is really great information and we should expose it (probably via MBeans), but we can go even further.
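
As a sketch of what that passive self-profiling and MBean exposure could look like, assuming a hypothetical recordTask hook called from the worker loop with the two clock readings it already takes (nothing here mirrors the real WorkerThread internals):

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// MXBean interface a JMX client (JConsole, VisualVM, etc.) would read.
trait WorkerProfilingMXBean {
  def getTasksExecuted: Long
  def getAvgTaskNanos: Double
}

// One instance per worker thread. Only the owning worker writes the
// fields, so plain volatile writes are enough on the hot path.
final class WorkerProfiling extends WorkerProfilingMXBean {
  @volatile private[this] var taskCount: Long = 0L
  @volatile private[this] var taskNanosTotal: Long = 0L

  // Hypothetical hook: called with the two nanoTime readings that
  // bracket a task slice (readings the loop already takes for timers).
  def recordTask(startNanos: Long, endNanos: Long): Unit = {
    taskCount += 1
    taskNanosTotal += (endNanos - startNanos)
  }

  def getTasksExecuted: Long = taskCount
  def getAvgTaskNanos: Double =
    if (taskCount == 0L) 0.0 else taskNanosTotal.toDouble / taskCount
}

object WorkerProfiling {
  // Registers one MBean per worker so the stats show up in any JMX tool.
  def register(poolName: String, workerIndex: Int): WorkerProfiling = {
    val bean = new WorkerProfiling
    val name = new ObjectName(
      s"cats.effect.unsafe:type=WorkerProfiling,pool=$poolName,worker=$workerIndex")
    ManagementFactory.getPlatformMBeanServer.registerMBean(bean, name)
    bean
  }
}
```

Since the loop already reads the clock, the marginal cost of tracking this should be little more than a subtraction and two field writes per task slice.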

With a bit of math, we can probably build on this thread-specific information and derive fiber-specific profiling information (e.g. task granularity). This data could be stored on the fibers themselves (at the cost of one more pointer in the object header, though maybe we could cheat and put it on the tracing object). We already have a way of getting a reference to all fibers within the system, and just as we can enumerate them for state and trace information, we could also enumerate them for task granularity, total runtime, etc. This kind of information could be invaluable for a tool like @mpilquist's cats-effect-console.
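
Purely as an illustration of that fiber-side piece, here is one way the per-fiber data could be shaped: an exponential moving average of slice length updated by the worker after each run, plus an enumeration hook standing in for whatever mechanism the fiber dump already uses (the extra field, the smoothing constant, and enumerateFibers are all assumptions, not existing runtime APIs):

```scala
// Hypothetical per-fiber profiling data; in practice this might live on
// the fiber object itself or be piggybacked on the tracing object.
final class FiberProfile {
  // Single writer (the worker that just ran the fiber), many readers.
  @volatile private[this] var emaSliceNanos: Double = 0.0
  @volatile private[this] var totalRunNanos: Long = 0L

  private[this] final val Alpha = 0.1 // smoothing factor, tunable

  // Called by the worker with the measured length of the slice it just ran.
  def record(sliceNanos: Long): Unit = {
    totalRunNanos += sliceNanos
    emaSliceNanos =
      if (emaSliceNanos == 0.0) sliceNanos.toDouble
      else Alpha * sliceNanos + (1.0 - Alpha) * emaSliceNanos
  }

  def granularityNanos: Double = emaSliceNanos
  def totalRuntimeNanos: Long = totalRunNanos
}

object ProfileDump {
  // enumerateFibers stands in for the runtime's existing ability to walk
  // live fibers (as used for fiber dumps); it is hypothetical here.
  def print(enumerateFibers: () => Iterable[(Long, FiberProfile)]): Unit =
    enumerateFibers().foreach { case (fiberId, profile) =>
      println(
        f"fiber $fiberId%d: granularity ${profile.granularityNanos / 1000.0}%.1f µs, " +
          f"total runtime ${profile.totalRuntimeNanos / 1000.0}%.1f µs")
    }
}
```

Keeping a moving average rather than a full histogram holds the per-fiber footprint to a couple of fields, which matters if the data really does end up hanging off every fiber object.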
