
Event streaming #404

Open
apangin opened this issue Mar 24, 2021 · 9 comments

Comments

@apangin
Collaborator

apangin commented Mar 24, 2021

Allow processing events while profiling is active.

@PaulBGD

PaulBGD commented Sep 23, 2021

Does this exist now with jfrsync?

@apangin
Collaborator Author

apangin commented Sep 23, 2021

@PaulBGD No, jfrsync combines events from async-profiler and Flight Recorder in a file.

@jhalliday

Is this being worked on? As OpenTelemetry starts to look at adding continuous profiling support (issue, Slack), I'm considering how it may be implemented in their Java SDK. My current prototype uses JFR's new(ish) streaming API, but I'd like to be able to abstract over JFR and async-profiler as data sources. Whilst it's possible to read the log file and transform it into an event stream, skipping that step and getting events more directly through a Java API would be a welcome enhancement.

@apangin
Collaborator Author

apangin commented Jun 27, 2022

@jhalliday I'm not working on this feature right now, since there are no customers for it. This may change though, if there is enough interest.

I doubt that event streaming in async-profiler will look anything like JEP 349. Despite its name, JFR Event Streaming is not the right tool for real-time streaming of profile data, since it ruins the main advantage of production profiling: low overhead.[1]
Instead, async-profiler will likely transmit data in a compact binary representation that may (but need not) be parsed into a Java object model on the receiver side.
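As a rough illustration of what such a compact representation could look like, here is a minimal Java sketch. The record layout (time delta, thread ID, stack depth, frame IDs) and every name in it are hypothetical, not async-profiler's actual wire format; frame IDs are assumed to index a frame dictionary that is transmitted separately. Values are written as unsigned LEB128 varints, so small numbers take a single byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CompactSampleCodec {
    // Hypothetical sample layout: time delta since the previous sample, thread ID, frame IDs.
    record Sample(long timeDelta, long threadId, long[] frameIds) {}

    // Unsigned LEB128: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long readVarint(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    static byte[] encode(Sample s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, s.timeDelta());
        writeVarint(out, s.threadId());
        writeVarint(out, s.frameIds().length); // stack depth, then the frames themselves
        for (long id : s.frameIds()) {
            writeVarint(out, id);
        }
        return out.toByteArray();
    }

    // Receiver side: building objects is optional; a consumer could also scan the buffer directly.
    static Sample decode(ByteBuffer in) {
        long timeDelta = readVarint(in);
        long threadId = readVarint(in);
        long[] frames = new long[(int) readVarint(in)];
        for (int i = 0; i < frames.length; i++) {
            frames[i] = readVarint(in);
        }
        return new Sample(timeDelta, threadId, frames);
    }

    public static void main(String[] args) {
        byte[] bytes = encode(new Sample(1000, 42, new long[] {7, 300, 12}));
        Sample back = decode(ByteBuffer.wrap(bytes));
        // prints "8 bytes, thread=42, frames=[7, 300, 12]"
        System.out.println(bytes.length + " bytes, thread=" + back.threadId()
                + ", frames=" + Arrays.toString(back.frameIds()));
    }
}
```

Decoding into `Sample` objects is optional here, in line with the point above: a receiver could just as well walk the buffer and aggregate without allocating an object per sample.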

Footnotes

  1. http://hirt.se/blog/?p=1239

@jhalliday

Thanks for the update.

The OTel work is specifying an interop wire format for profiling data across many platforms, looking for a good balance between bandwidth use and data-handling overhead. Marcus' Datadog colleagues and other observability vendors are part of that discussion and bring their experience with JFR and similar tools to the table.

For the JVM OTel SDK, there is the possibility of handling the encoding in C rather than Java before passing it out through the Java network stack, but one way or another we'll need to transcode the various profilers' native recording formats into the OTel transport format, whatever that ends up looking like, either at the client or perhaps at an intermediate gateway ('collector', in OTel terms).
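One common way to keep such transcoding cheap is dictionary encoding: each distinct frame name is stored once in a string table, and stacks become arrays of indices into it. The sketch below assumes that approach and borrows the convention from pprof's `string_table` (index 0 is the empty string); it is not the OTel format, which was still being specified, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringTableTranscoder {
    // Index 0 is reserved for the empty string, mirroring pprof's string_table convention.
    private final List<String> strings = new ArrayList<>(List.of(""));
    private final Map<String, Integer> index = new HashMap<>(Map.of("", 0));

    // Returns the existing index for s, or appends s to the table and returns its new index.
    int intern(String s) {
        return index.computeIfAbsent(s, k -> {
            strings.add(k);
            return strings.size() - 1;
        });
    }

    // Rewrites one stack of frame names (root first) as indices into the shared table.
    int[] transcode(List<String> frames) {
        int[] ids = new int[frames.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = intern(frames.get(i));
        }
        return ids;
    }

    List<String> table() {
        return strings;
    }

    public static void main(String[] args) {
        StringTableTranscoder t = new StringTableTranscoder();
        int[] a = t.transcode(List.of("main", "handle", "parse"));
        int[] b = t.transcode(List.of("main", "handle", "write"));
        System.out.println(Arrays.toString(a)); // [1, 2, 3]
        System.out.println(Arrays.toString(b)); // [1, 2, 4]
        System.out.println(t.table());          // [, main, handle, parse, write]
    }
}
```

Since most stacks share their outer frames, each repeated name costs one small integer per occurrence instead of a full string.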

I'm not particularly enchanted with the JFR streaming API for this use case, but it's what we have right now if reading files back from disk doesn't appeal. Longer term, there may be a chance to align it better with emerging observability needs, but given the JDK's API change cadence, async-profiler may have the opportunity to be more agile here. The async-profiler current context ID PR, for example, would give OTel a way to correlate tracing and profiling signals by labelling a thread with the trace's spanId, which is a gap in the functionality offered by JFR.

Anyhow, we'd welcome your thoughts if you have time to participate in the OTel process.

@farmerworking

Much needed!

Continuous profiling is useless without analysis. Currently, I can only analyze profiles by reading the output files produced by async-profiler, which is not convenient.

I noticed that an issue related to "Publish Jar" is ongoing and that AsyncProfilerMXBean is already defined.

So it would be nice if I could just supply a callback function and be notified with profiling results periodically, so that I could do magic things like interrupting a thread that allocates too much memory.
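A minimal sketch of the callback idea, assuming a hypothetical `ProfileListener` interface (async-profiler offers nothing like it today): samples are buffered as collapsed stack strings, and a scheduled task drains the buffer and delivers one chunk per period. The producer side is simulated by calling `record` directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicProfileDispatcher implements AutoCloseable {
    // Hypothetical callback; not part of async-profiler's API.
    @FunctionalInterface
    interface ProfileListener {
        void onChunk(List<String> samples);
    }

    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ProfileListener listener;

    PeriodicProfileDispatcher(ProfileListener listener, long periodMillis) {
        this.listener = listener;
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Called by the producer side (simulated here) each time a sample is recorded.
    void record(String sample) {
        pending.add(sample);
    }

    // Drains everything recorded so far and hands it to the listener as one chunk.
    private void flush() {
        List<String> chunk = new ArrayList<>();
        String s;
        while ((s = pending.poll()) != null) {
            chunk.add(s);
        }
        if (!chunk.isEmpty()) {
            listener.onChunk(chunk);
        }
    }

    // Stops the periodic task, then delivers any samples that arrived after the last tick.
    @Override
    public void close() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush();
    }

    public static void main(String[] args) throws InterruptedException {
        try (PeriodicProfileDispatcher d = new PeriodicProfileDispatcher(
                chunk -> System.out.println("got " + chunk), 100)) {
            d.record("main;work;allocate");
            d.record("main;work;compute");
            Thread.sleep(250);
            d.record("main;idle");
        }
    }
}
```

Delivering chunks on a separate scheduler thread keeps the listener's work (for example, deciding which thread to act on) off the sampling path.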

By the way, async-profiler is really amazing.

@JonasKunz

We (Elastic) would also be very interested in this feature!
We recently ported the inferred spans feature from our elastic-apm-agent to a standalone OpenTelemetry extension.

This feature enables async-profiler in wall-clock profiling mode to fetch stack traces for threads that have active OpenTelemetry spans.
These stack traces are then used to generate synthetic spans for the regions where the application spends time that are not covered by instrumentation.

One of the main pain points of this implementation is that at the moment the processing needs to happen after the profiling session has ended:

  • When a span is started, a profiling session with a fixed duration is started if none is running already
  • For the duration of this session, the extension also needs to spill to disk a log of which span was active on which thread, and when
  • After the profiling session is over, both the profiling data and the span log is read back and the synthetic spans are reconstructed
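If samples arrived as a stream, the reconstruction step could in principle run incrementally instead of after the session ends. Below is a minimal sketch of the core lookup, assuming the span log records activations per thread as non-overlapping half-open intervals `[start, end)` and that each sample carries a thread ID and timestamp; the record shapes and names are hypothetical, not the extension's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class SpanSampleCorrelator {
    // One entry of the hypothetical span log: spanId was active on threadId during [startNanos, endNanos).
    record SpanActivation(long threadId, String spanId, long startNanos, long endNanos) {}

    // One wall-clock sample as it might come back from the profiler (shape assumed).
    record Sample(long threadId, long timestampNanos, List<String> stack) {}

    // Per thread, activations keyed by start time; assumes they never overlap on the same thread.
    private final Map<Long, TreeMap<Long, SpanActivation>> byThread = new HashMap<>();

    void addActivation(SpanActivation a) {
        byThread.computeIfAbsent(a.threadId(), t -> new TreeMap<>()).put(a.startNanos(), a);
    }

    // Finds the latest activation starting at or before the sample and checks it had not ended yet.
    Optional<String> spanFor(Sample s) {
        TreeMap<Long, SpanActivation> activations = byThread.get(s.threadId());
        if (activations == null) {
            return Optional.empty();
        }
        Map.Entry<Long, SpanActivation> e = activations.floorEntry(s.timestampNanos());
        if (e == null || s.timestampNanos() >= e.getValue().endNanos()) {
            return Optional.empty();
        }
        return Optional.of(e.getValue().spanId());
    }

    public static void main(String[] args) {
        SpanSampleCorrelator c = new SpanSampleCorrelator();
        c.addActivation(new SpanActivation(1, "span-a", 0, 100));
        c.addActivation(new SpanActivation(1, "span-b", 100, 200));
        System.out.println(c.spanFor(new Sample(1, 50, List.of("main", "db.query"))));  // Optional[span-a]
        System.out.println(c.spanFor(new Sample(1, 150, List.of("main", "render")))); // Optional[span-b]
        System.out.println(c.spanFor(new Sample(1, 250, List.of("main", "idle"))));   // Optional.empty
    }
}
```

Because the lookup depends only on timestamps, not on which profiling session produced a sample, this kind of join does not by itself care whether a span crosses a session boundary.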

We would like to contribute this extension to the upstream OpenTelemetry project.
However, the current approach of spilling tracing data to disk is quite complex and requires a lot of code, which makes the extension harder to contribute and maintain.
In addition, the current approach has the downside that it doesn't work properly with spans that overlap multiple profiling sessions. While this is doable in theory, it would further increase the complexity.

A way of streaming the profiling samples directly back to the application with reasonably low latency would greatly simplify and improve this feature.

I initially thought of proposing to extend the existing async-profiler Java API to:

  • Allow streaming profiling stack traces back to the Java application as arrays of jmethodIDs (ignoring native stack frames)
  • Extend the API to allow resolving class and method names from jmethodIDs
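To make the proposal concrete, here is a sketch of what the Java side might look like. `StackTraceListener` and `CachingMethodResolver` are hypothetical names, not existing async-profiler API; the native lookup is modelled as a plain `LongFunction<String>` (a real implementation might use JVMTI's `GetMethodName` and `GetMethodDeclaringClass`). Caching matters because the same methods appear in most stacks, so each jmethodID is resolved only once. The sketch ignores lifetime questions: a jmethodID whose class has been unloaded may no longer be safe to resolve.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

public class StackTraceStreamSketch {
    // Hypothetical callback: frames arrive as raw jmethodID values (native frames omitted), leaf first.
    @FunctionalInterface
    interface StackTraceListener {
        void onStackTrace(long threadId, long timestampNanos, long[] methodIds);
    }

    // Hypothetical resolver that calls the (assumed) native lookup at most once per jmethodID.
    // Not thread-safe: assumes a single consumer thread.
    static final class CachingMethodResolver {
        private final LongFunction<String> nativeResolve;
        private final Map<Long, String> cache = new HashMap<>();
        private int nativeCalls = 0;

        CachingMethodResolver(LongFunction<String> nativeResolve) {
            this.nativeResolve = nativeResolve;
        }

        String resolve(long methodId) {
            return cache.computeIfAbsent(methodId, id -> {
                nativeCalls++;
                return nativeResolve.apply(id);
            });
        }

        int nativeCalls() {
            return nativeCalls;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the native lookup; real jmethodIDs are opaque pointers, not small numbers.
        Map<Long, String> fakeJvm = Map.of(1L, "App.main", 2L, "Service.handle", 3L, "Dao.query");
        CachingMethodResolver resolver = new CachingMethodResolver(fakeJvm::get);
        List<String> out = new ArrayList<>();
        StackTraceListener listener = (tid, ts, ids) -> {
            List<String> names = new ArrayList<>();
            for (long id : ids) {
                names.add(resolver.resolve(id));
            }
            out.add(tid + "@" + ts + " " + names);
        };
        listener.onStackTrace(7, 1_000, new long[] {3, 2, 1});
        listener.onStackTrace(7, 2_000, new long[] {2, 1});
        out.forEach(System.out::println);
        System.out.println("native lookups: " + resolver.nativeCalls()); // native lookups: 3
    }
}
```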

I would also be willing to contribute here, though I'd likely need a good amount of guidance for the first part.

However, I figured that this proposal might just be a special case of what you intend with this issue, so I decided to comment here instead.

@apangin
Collaborator Author

apangin commented Mar 31, 2024

@JonasKunz Thank you for your interest in the feature.
Currently, event streaming is not in the near-term project plans because other features take priority. That said, if you have a specific, detailed proposal and are ready to contribute, we can discuss it.

@pnf

pnf commented May 4, 2024

We have similar requirements for streaming, always-on, massively distributed profiling, with stacks that include native frames, Java frames and asynchronous continuations. Our approach uses an async-profiler fork that supports injection of await/continuation frames, plus a compact ASCII format for exporting multi-event stacks (CPU, lock, heap and native allocation, plus arbitrary custom events). Stacks are aggregated in situ on every engine over chunks of (say) a minute, processed, published out of band over Kafka, and finally loaded into Pyroscope. The chunking is key to making this practical across thousands of hosts; bandwidth aside, Pyroscope can barely sustain ingesting a few dozen JFR files simultaneously, much less the magnitude we require.
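The chunking step could look roughly like the sketch below. The fork's actual ASCII format isn't described in this thread, so the sketch uses the collapsed (folded) stack format that async-profiler can already emit (`frame;frame;frame count`) as a stand-in, with one chunk per fixed time window. The in-memory `emitted` list stands in for a Kafka producer, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChunkedStackAggregator {
    // One aggregated chunk in collapsed ("folded") form: "frame;frame;frame count" per line.
    record Chunk(long startNanos, String folded) {}

    private final long chunkNanos;
    private final Map<String, Long> counts = new TreeMap<>();
    private final List<Chunk> emitted = new ArrayList<>(); // stands in for a Kafka producer
    private long currentStart = Long.MIN_VALUE;

    ChunkedStackAggregator(long chunkNanos) {
        this.chunkNanos = chunkNanos;
    }

    // Assumes samples arrive in non-decreasing timestamp order.
    void add(long timestampNanos, List<String> stackRootFirst) {
        long start = timestampNanos - Math.floorMod(timestampNanos, chunkNanos);
        if (start != currentStart) {
            flush();
            currentStart = start;
        }
        counts.merge(String.join(";", stackRootFirst), 1L, Long::sum);
    }

    // Emits the current chunk, if any, and resets the counters.
    void flush() {
        if (counts.isEmpty()) {
            return;
        }
        StringBuilder sb = new StringBuilder();
        counts.forEach((stack, n) -> sb.append(stack).append(' ').append(n).append('\n'));
        emitted.add(new Chunk(currentStart, sb.toString()));
        counts.clear();
    }

    List<Chunk> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        ChunkedStackAggregator agg = new ChunkedStackAggregator(60 * second);
        agg.add(1 * second, List.of("main", "work", "parse"));
        agg.add(2 * second, List.of("main", "work", "parse"));
        agg.add(3 * second, List.of("main", "lock"));
        agg.add(61 * second, List.of("main", "work", "parse"));
        agg.flush();
        for (Chunk c : agg.emitted()) {
            System.out.print("chunk@" + c.startNanos() / second + "s\n" + c.folded());
        }
    }
}
```

Each chunk's size grows with the number of distinct stacks rather than the number of samples, which is what keeps per-minute publishing affordable across many hosts.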

Projects
None yet
Development

No branches or pull requests

6 participants