Instrumenting Elixir with Telemetry + LiveView

Resources

  • https://blog.smartlogic.io/instrumenting-with-telemetry/
  • https://github.com/beam-telemetry/telemetry
  • https://hexdocs.pm/plug/Plug.Telemetry.html

Outline

  • Telemetry plug instruments request duration
  • Attach the request events [:my, :app, :start] and [:my, :app, :stop] to a handler
  • That handler sends messages to async reporters, in this case LiveView - could also be StatsD, Prometheus, a log statement, etc. This is where we talk about async processes.
  • LiveView handles the event and updates something in the UI
  • Add another point of instrumentation, maybe a metric for successful/failed external API requests
  • Attach a handler for that event, report to LV, etc.
  • Instrumenting Ecto query times?
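
The handler step in the outline could look roughly like the sketch below. It assumes `Plug.Telemetry` is configured with `event_prefix: [:my, :app]`; the module name `MyApp.TelemetryForwarder`, the handler id, and the `{:request_duration, ms}` message shape are made up for illustration. The handler only converts the duration and forwards it to a reporter process, so the request process is not held up by reporting.

```elixir
defmodule MyApp.TelemetryForwarder do
  @moduledoc "Hypothetical handler that forwards request timings to a reporter process."

  @handler_id "my-app-request-forwarder"

  # Attach once (e.g. at application start). `reporter` is the pid that should
  # receive the messages, such as a dashboard LiveView.
  def attach(reporter) when is_pid(reporter) do
    :telemetry.attach(
      @handler_id,
      [:my, :app, :stop],
      &__MODULE__.handle_event/4,
      %{reporter: reporter}
    )
  end

  def detach, do: :telemetry.detach(@handler_id)

  # Handlers run in the process that executed the event, so keep them cheap:
  # convert the duration and hand it off as a message.
  def handle_event([:my, :app, :stop], %{duration: duration}, _metadata, %{reporter: reporter}) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    send(reporter, {:request_duration, ms})
  end
end
```

On the receiving side, a LiveView would pick the message up in `handle_info({:request_duration, ms}, socket)` and update its assigns.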

UI with Chartkick

  • https://github.com/buren/chartkick-ex
  • https://jacobburenstam.com/chartkick-ex/
  • https://github.com/buren/chartkick-phoenix-example

Implementation Plan

  • Simple Phoenix app with endpoints:
    • Landing page
    • [] Simple auth flow (sign up/sign in)
      • User schema and module, account context
      • Log in, sign up, log out, user show
      • Set up Telemetry + handler
      • Set up LiveView to receive messages from Telemetry handler, display simple count for num logins
      • Metric increment for num logins -> chart, count
      • Metric increment success/failure for logins -> chart
      • Query duration for find and create queries -> chart
      • Telemetry plug for landing page load time (add a random sleep between 1 and 5 seconds) -> chart
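
For the last item, a minimal sketch of the random-sleep plug. The module name `MyAppWeb.Plugs.RandomSleep` and the `:range` option are assumptions for illustration; the 1-5 second default comes from the plan above.

```elixir
defmodule MyAppWeb.Plugs.RandomSleep do
  @moduledoc "Hypothetical plug that delays each request so the timings show up on a chart."
  @behaviour Plug

  # Sleep range in milliseconds; defaults to 1-5 seconds as in the plan.
  def init(opts), do: Keyword.get(opts, :range, 1_000..5_000)

  def call(conn, range) do
    Process.sleep(Enum.random(range))
    conn
  end
end

# In the endpoint or a router pipeline, Plug.Telemetry goes first so the sleep
# is included in the measured duration:
#
#   plug Plug.Telemetry, event_prefix: [:my, :app]
#   plug MyAppWeb.Plugs.RandomSleep
```

`Plug.Telemetry` emits the stop event just before the response is sent, so any plug that runs after it in the pipeline counts toward the reported duration.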

Instrumenting LiveView with Telemetry

We know how to take advantage of the telemetry plug to measure request times. But what about client/server interaction that does not occur over HTTP? As we use LV for more and more real-time features, how can we instrument WS communication duration in a sane and scalable manner?

Inspired by: https://github.com/elixir-plug/plug/blob/master/lib/plug/telemetry.ex#L76

defmeasured handle_event(event, payload, socket) do
  # should do the equivalent of:
  start_time = System.monotonic_time()
  # :telemetry event names are lists of atoms, not strings
  # (String.to_atom is only OK for a fixed, known set of event names)
  prefix     = [__MODULE__, String.to_atom(event)]
  opts       = [] # any tags?
  :telemetry.execute(prefix ++ [:start], %{time: start_time}, %{socket: socket, options: opts})
  # keep the start time around so render can compute the duration
  socket = assign(socket, %{telemetry_event_prefix: prefix, telemetry_start_time: start_time})
  # execute body with new socket
end

defmeasured render(assigns) do
  # should do the equivalent of:
  duration = System.monotonic_time() - assigns.telemetry_start_time
  prefix   = assigns.telemetry_event_prefix
  opts     = [] # any tags?
  :telemetry.execute(prefix ++ [:stop], %{duration: duration}, %{options: opts})
  # render only receives assigns, so clearing :telemetry_event_prefix has to
  # happen on the socket elsewhere (e.g. when the next message comes in)
  # execute body with assigns
end

def execute_before_render_callbacks(assigns) do
  # before_render is a function stored in assigns, so call it with .()
  assigns.before_render.()
  # |> invoke function body with updated assigns
end
  • Macro needs to switch on function type -> handle_event or render. Should just call to function if any other type.
  • Don't need to register per-event b/c the LV process will only work on one event at a time, so we are sure that render is for the event we just recorded a start time for. But we should still track the event name for reporting; otherwise how do we know which event we just measured and reported a duration for?
  • No need to clear prefix from socket assigns before rendering b/c it will update as soon as next event is received? What about handle_info tho? Assume we will measure that too. Either you're using telemetry to measure duration of all incoming messages or you're not. No way to enforce this tho :( Better to clear prefix though and not assume that every message is instrumented.
  • Have to manually attach Telemetry event handler for each telemetry event. Either we attach once for the LV module start/stop and use tags to be more granular about event type or user is on the hook for attaching handler for each event's start/stop. I'm leaning towards option 1 but have to play around with tags more first.
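
Option 1 from the last bullet could be sketched like this: one `:telemetry.attach_many/4` call for fixed start/stop event names, with the LiveView event name carried in the metadata instead of the telemetry event name. The event names `[:my_app, :live_view, :start | :stop]`, the metadata keys `:view` and `:event`, and the module name are all assumptions for illustration.

```elixir
defmodule MyApp.LiveViewTelemetry do
  @moduledoc "Hypothetical single handler for all LiveView start/stop events."
  require Logger

  @events [
    [:my_app, :live_view, :start],
    [:my_app, :live_view, :stop]
  ]

  # One attach_many call covers both events, whatever the LiveView event is.
  def attach do
    :telemetry.attach_many("my-app-live-view", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event([:my_app, :live_view, :stop], %{duration: duration}, metadata, _config) do
    Logger.info(format(metadata, duration))
  end

  def handle_event([:my_app, :live_view, :start], _measurements, _metadata, _config), do: :ok

  # Builds a log line from the metadata and a duration in native time units.
  def format(%{view: view, event: event}, duration) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    "#{inspect(view)} #{event} took #{ms}ms"
  end
end
```

The emitting side would then call something like `:telemetry.execute([:my_app, :live_view, :stop], %{duration: duration}, %{view: __MODULE__, event: event})`, so adding a new LiveView event needs no new attach call.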

Implementation Plan

  • Simple live view with three events--three buttons that you click to change color and each one has a sleep for a diff amount of time.
  • Instrument duration of "request/response" for each event type, register LV telemetry handler to receive telemetry events, that handler can send to our dashboard LV. Question: Same telemetry event handler for all events or separate for LV vs. application? I.e. separation of concerns with telemetry event handling modules or just one giant one?
  • prob. want to play around with metric label names and tags

Instrumenting Phoenix with Telemetry + StatsD

Resources

Next Steps

  • Can the reporter support Dogstatsd events? Can we hack it?
  • Which telemetry events do Phoenix/Ecto/etc. emit for us for free?

  • Run statsd to view output for each of the mapped metrics

TODO

  • Success/failure web request response instrumentation
  • LiveView metrics with channel joined and channel handled_in -> can't be done OOTB, blog post should explain, show channel source, link to LV issue
  • Three custom metrics:
    • Worker polling
    • Custom event polling
    • Telemetry plug
    • LiveView handle event duration and timer
  • VM metrics with polling
  • Visualize DD reporting by using DD formatter but running regular statsd, grab log statement from error message

Notes

  • We're instrumenting for free:
    • Database query duration and counts
    • HTTP request duration and counts
    • VM metrics
  • Telemetry event handling for free with Telemetry metrics module--can emit any event with :telemetry.execute (:telemetry is an Erlang library, hence the atom module name) and don't need to define and attach a custom handler module.
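
A rough sketch of the last bullet: emit a custom event and declare a matching metric, with no hand-written handler module. The event name `[:my_app, :login]`, the `:status` metadata key, and the module name are hypothetical; `Telemetry.Metrics.counter/2` comes from the telemetry_metrics package.

```elixir
defmodule MyApp.Accounts.LoginTelemetry do
  @moduledoc "Hypothetical helper that emits a login event and declares its metric."

  # Emit one event per login attempt; `status` is :success or :failure.
  def login_attempted(status) when status in [:success, :failure] do
    :telemetry.execute([:my_app, :login], %{count: 1}, %{status: status})
  end

  # Metric definition for a reporter to consume. The name "my_app.login.count"
  # maps to the event [:my_app, :login] and the measurement :count; the
  # :status metadata key becomes a tag.
  def metrics do
    [Telemetry.Metrics.counter("my_app.login.count", tags: [:status])]
  end
end
```

A reporter that is started with this metrics list does the attaching itself, which is why no custom handler shows up here.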

Blog Post

  • What is observability? What is instrumentation?
  • Common needs: web requests, database queries
  • Show the DIY - define an event + module, attach, custom log in handler module to report, log, etc. This might be a good place to look under the hood at ETS.
    • Reporter calls telemetry.attach
    • Look in telemetry.erl:
      • attach stores handler modules with associated events in ETS
      • execute looks up the handler for the event in ETS and invokes it
  • This is all abstracted away with Telemetry metrics!
  • OOTB instrumentation with Elixir Telemetry
    • We'll get web requests, database queries, VM monitoring
    • Implementation
      • Use Telemetry package
      • Establish module that defines which events you are listening to--this attaches them to the default handler.
        • Go through all of the OOTB events and link to source code
        • Look at source code in Phoenix that emits those telemetry events.
        • Tagging - slice up HTTP requests by controller + action; DB queries by source and command. Tags become part of metric name in standard statsd formatting. Custom tag values functions
        • Note on Datadog formatter
          • Tags translate into metric tags (show the mapping)
          • Can leverage prefix, global tags, HTTP route tag now more usefully
  • Custom instrumentation -> not necessary, any event can be handled by one Telemetry module importing Telemetry.Metrics
  • Instrumenting LiveView with Phoenix's OOTB Telemetry events - CAN'T! Worth noting and comparing to Phoenix channel OOTB telemetry events, link to issue.
    • Custom duration and count instrumentation for
  • Telemetry under the hood - trace the flow of Phoenix/Ecto/app code emitting event and telemetry looking up event handler and calling it. Look at tags, etc.
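
To make the "attach stores, execute looks up" idea from the telemetry.erl bullets concrete, here is a toy reimplementation in plain Elixir. It is not the real `:telemetry` source; the module name `MiniTelemetry`, the table name, and the tuple layout are invented for the sketch, and it skips everything the real library does around failing handlers and detaching.

```elixir
defmodule MiniTelemetry do
  @moduledoc "Toy illustration of attach/execute backed by ETS. Not the real :telemetry code."

  @table :mini_telemetry_handlers

  # Create the table that maps event names to handlers. A duplicate_bag lets
  # several handlers share the same event name as key.
  def init do
    :ets.new(@table, [:named_table, :public, :duplicate_bag])
    :ok
  end

  # attach: store {event_name, handler_id, function, config} in ETS.
  def attach(handler_id, event_name, fun, config) when is_function(fun, 4) do
    :ets.insert(@table, {event_name, handler_id, fun, config})
    :ok
  end

  # execute: look up every handler stored under the event name and invoke it
  # in the calling process.
  def execute(event_name, measurements, metadata) do
    for {^event_name, _id, fun, config} <- :ets.lookup(@table, event_name) do
      fun.(event_name, measurements, metadata, config)
    end

    :ok
  end
end
```

The point for the blog post is that execute is just a table lookup plus a function call in the caller's process, with no message passing involved.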

Questions

Ecto Telemetry Event Source Code

To Do

  • Post 1: Intro to Telemetry in Elixir (covers: intro to obs, getting started with hand-rolled approach, Telemetry under the hood)
    • What is observability/why do we need it? What's so great about getting it with Telemetry lib?
    • DIY metrics with Telemetry lib -> start with dummy Quantum app and emit event for every sign up (counter and duration)
      • Define handler with callback. That callback does some reporting to StatsD, but can dummy this up.
      • Attach handler to event
      • Execute event
    • Under the hood
      • Telemetry attach adds to ETS
      • Telemetry execute looks up handler in ETS and invokes it
  • We need abstraction! Right now, we hand-rolled:
    • Handler module definition and callback
    • Reporting code
    • Calls to attach
    • Even our call to execute seems kind of onerous--plenty of stuff that everyone would want to instrument (HTTP request counts and durations, look at success/failure responses, Ecto query times)
    • Elixir abstracts a lot of this away!
      • Lots of OOTB events emitted--baked-in telemetry events executed from Phoenix and Ecto source code
      • No need to define custom handlers, reporting logic and enact attach calls thanks to Elixir's family of Telemetry libs--metrics, polling, reporters.
    • Post 2: OOTB Instrumentation with Telemetry Metrics, Polling and Reporters (covers OOTB instrumentation, usage of reporters, adding "custom" events with little effort or custom code)
      • Up and running:
        • Define module that uses telemetry metrics
        • Declare which OOTB events you will listen to in your metrics function
        • Start supervisor with Statsd reporter, VM polling in application.ex
        • Closer look at events
          • Each event source code, map execute to metric func, view in statsd and dogstatsd
      • Under the hood to see that reporter calls attach, stores its own module name with event
        • Telemetry calls execute, which looks up handler and invokes handle_event
        • Reporter's handle_event contains all the statsd/udp logic, uses metrics struct definitions to format metrics for statsd and sends traffic
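
The "up and running" steps for Post 2 might be wired together roughly as below. This is a sketch modeled on the telemetry supervisor that newer Phoenix generators produce; `MyAppWeb.Telemetry`, the `my_app.repo` prefix, and the choice of metrics are placeholders, and it assumes the telemetry_metrics, telemetry_poller and telemetry_metrics_statsd packages are dependencies.

```elixir
defmodule MyAppWeb.Telemetry do
  @moduledoc "Sketch of a telemetry supervisor; app, repo and metric names are placeholders."
  use Supervisor
  alias Telemetry.Metrics

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg) do
    children = [
      # Extra poller for custom periodic measurements (none defined yet).
      {:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
      # The reporter attaches itself to every event named in metrics/0 and
      # sends StatsD packets over UDP.
      {TelemetryMetricsStatsd, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # HTTP requests, tagged by route
      Metrics.summary("phoenix.router_dispatch.stop.duration",
        tags: [:route],
        unit: {:native, :millisecond}
      ),
      # Ecto queries, tagged by source (table)
      Metrics.summary("my_app.repo.query.total_time",
        tags: [:source],
        unit: {:native, :millisecond}
      ),
      # VM measurements emitted by telemetry_poller
      Metrics.last_value("vm.memory.total", unit: :byte),
      Metrics.last_value("vm.total_run_queue_lengths.total")
    ]
  end

  defp periodic_measurements, do: []
end
```

Adding `MyAppWeb.Telemetry` to the children in application.ex starts the whole thing. Passing `formatter: :datadog` to `TelemetryMetricsStatsd` is the switch to try for the DogStatsD comparison.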

Where to put custom event section? How to sequence "closer look at events" vs. "metrics + reporter under the hood"? Better to see it wired all up and then closer look at events maybe? Maybe keep the hand-rolled sign-in event but get rid of the custom module and attachment call, instead move that into new telemetry module. Then show it all wired up, including looks under the hood. Then replace with OOTB metrics, link to source code, list all helpful metrics. Maybe leave out LV entirely.