paulcleary edited this page Nov 10, 2015 · 4 revisions

Money is a tracing and monitoring library that lets systems correlate related events that take place across a distributed system. It was purpose-built to support distributed systems, but any project that executes code in response to events or requests can take advantage of the features that Money provides. If you have struggled to parse log files to assemble the entries that relate to a single request, then Money will be able to help you.

The current system is inspired by Google Dapper and Twitter Zipkin.

Terms

  • Distributed Tracing - the tracking of a request from its origin throughout all of the systems that participate to satisfy the request.
  • Trace - all information associated with a single request that is tracked across systems. A single trace forms a directed graph, starting from some origin or root, and spanning out to include all systems involved in satisfying the original request.
  • Span - a single step or operation in the distributed trace. Spans appear as nodes in the directed graph of a distributed trace. Spans almost always represent an invocation of some operation. The operation can be remote, but does not have to be. We use Spans to represent any operation that takes time and that we want to know happened.
  • Edge - the edges between spans in a trace form a causal relationship between two spans. This allows us to understand the direct and indirect upstream and downstream dependencies of a Span.
  • Note - an individual data element recorded on a span.

Dapper and Zipkin call Notes "Annotations". This was confusing to a primarily Java team that works with Java Annotations on a regular basis.

What does a distributed trace look like?

How do we make distributed tracing work?

Distributed tracing works by propagating and extending the current Trace Context to all systems involved in the request. The Trace Context consists of:

  • trace-id: a UUID that holds the same value for every Span in an individual trace
  • parent-id: a random signed long that identifies the direct upstream Span of the current Span
  • span-id: a random signed long that uniquely identifies the current Span

The process of propagating and extending the Trace Context looks like:

  1. The root / origin creates a new Trace Context. For the root Span, the parent-id and span-id are the same value. The originating process propagates that Trace Context across system boundaries by passing it in an HTTP header called X-MoneyTrace. The following is an example value:

X-MoneyTrace: trace-id=de305d54-75b4-431b-adb2-eb6b9e546013;parent-id=3285573610483682037;span-id=3285573610483682037

  2. The system that receives the header extends the existing Trace Context by creating a new Span. The incoming span-id becomes the parent-id of the new Span, and a new random long is generated to become the new span-id value. The trace-id of the new Span is copied from the incoming Trace Context.

  3. When a system completes a request and sends a response, it should return the X-MoneyTrace header of the original request in the response.
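
The propagation steps above can be sketched in a few lines of Java. This is a minimal illustration, not Money's own implementation: the class name `TraceContextSketch` and its methods are hypothetical, and the only things taken from this page are the header name and the `trace-id=...;parent-id=...;span-id=...` value format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of a Trace Context; this is NOT a class from Money.
final class TraceContextSketch {
    static final String HEADER_NAME = "X-MoneyTrace";

    final UUID traceId;
    final long parentId;
    final long spanId;

    TraceContextSketch(UUID traceId, long parentId, long spanId) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.spanId = spanId;
    }

    // Step 1: the root creates a new context; parent-id and span-id share one value.
    static TraceContextSketch newRoot() {
        long id = ThreadLocalRandom.current().nextLong();
        return new TraceContextSketch(UUID.randomUUID(), id, id);
    }

    // Step 2: a receiving system extends the context. The incoming span-id
    // becomes the parent-id, and the trace-id is copied unchanged.
    TraceContextSketch newChild() {
        return new TraceContextSketch(traceId, spanId, ThreadLocalRandom.current().nextLong());
    }

    // Formats the value that follows "X-MoneyTrace: " in the header.
    String toHeaderValue() {
        return "trace-id=" + traceId + ";parent-id=" + parentId + ";span-id=" + spanId;
    }

    // Parses a header value in the format shown in the example above.
    static TraceContextSketch fromHeaderValue(String value) {
        Map<String, String> fields = new HashMap<>();
        for (String part : value.split(";")) {
            String[] keyValue = part.trim().split("=", 2);
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("Malformed field: " + part);
            }
            fields.put(keyValue[0].trim(), keyValue[1].trim());
        }
        String traceId = fields.get("trace-id");
        String parentId = fields.get("parent-id");
        String spanId = fields.get("span-id");
        if (traceId == null || parentId == null || spanId == null) {
            throw new IllegalArgumentException("Missing trace-id, parent-id or span-id");
        }
        return new TraceContextSketch(
                UUID.fromString(traceId), Long.parseLong(parentId), Long.parseLong(spanId));
    }
}
```

A receiving system would call `TraceContextSketch.fromHeaderValue(...)` on the inbound header, then `newChild()` to obtain the context for its own Span, and would echo the original header value back in the response as described in step 3.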

What does money buy me?

Everything! The following features highlight what Money gets you:

  • Managing Trace Contexts across Threads - modern applications use multi-threading to fulfill requests more efficiently and take advantage of multiple CPU cores. Money supports propagating traces by providing trace-friendly versions of Thread Pools.
  • Managing Trace Contexts through Scala Futures - Scala Futures are a way to implement parallelism in a system. Money supports propagation of the Trace Context across Scala Futures, including across different Execution Contexts.
  • Automatically Extending incoming HTTP Requests - Money provides a standard Java Servlet filter that will inspect inbound HTTP requests for the X-MoneyTrace header, automatically creating a new Span if the header is found. The Servlet Filter will also return the original header in the response.
  • Automatically sending the X-MoneyTrace header to downstream systems - Money wraps the common Apache HttpComponents classes to detect a Trace Context and automatically create the appropriate X-MoneyTrace header for you on outbound HTTP requests.
  • Timers - Money supports the ability to start and stop timers in your code, so you don't have to manage timers explicitly yourself.
  • Spring 3 and 4 modules - these use Spring AOP to make it simpler to support tracing in Spring-enabled applications.
  • AspectJ module - if you are not using Spring, you can still take advantage of simple, unobtrusive tracing by using AspectJ.
  • JMX monitoring - you can monitor the health of the Money system through JMX beans.
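
To make the thread-pool bullet more concrete, here is one common way to carry a context from the submitting thread to a worker thread: capture a thread-local value when the task is submitted and restore it around the task's execution. This is a sketch of the general technique only. The names `TracePropagationSketch`, `CURRENT_SPAN_ID`, `propagating` and `traceFriendly` are hypothetical and are not Money's API, and a single span id stands in for a full Trace Context.

```java
import java.util.concurrent.Executor;

// Hypothetical illustration of thread-local context propagation; NOT Money's API.
final class TracePropagationSketch {
    // Stand-in for the current Trace Context of the running thread.
    static final ThreadLocal<Long> CURRENT_SPAN_ID = new ThreadLocal<>();

    // Wraps a task so that it runs with the context of the thread that created the wrapper.
    static Runnable propagating(Runnable task) {
        final Long captured = CURRENT_SPAN_ID.get(); // read on the submitting thread
        return () -> {
            Long previous = CURRENT_SPAN_ID.get();   // read on the worker thread
            CURRENT_SPAN_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore the worker's own state so pooled threads do not leak contexts.
                if (previous == null) {
                    CURRENT_SPAN_ID.remove();
                } else {
                    CURRENT_SPAN_ID.set(previous);
                }
            }
        };
    }

    // Decorates any Executor so that every submitted task is wrapped automatically.
    static Executor traceFriendly(Executor delegate) {
        return task -> delegate.execute(propagating(task));
    }
}
```

Wrapping at the executor level means application code keeps submitting plain `Runnable` tasks; the `finally` block matters because pooled threads are reused and would otherwise keep a stale context from an earlier request.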

What is captured?

Money will capture the following for you, without you having to record anything yourself:

  • span-name - the name you give to a span
  • span-duration - how long the span took to execute, from "start" to "stop". Money captures duration in microseconds
  • span-success - the result of the span
  • app-name - the name of the application that is using money
  • start-time - the UTC start time in microseconds
  • host - the hostname / IP address of the source system

In addition, the Money Http module will provide the following metrics:

  • http-response-code - the HTTP response code received after making a request
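
As a rough illustration of how values such as start-time and span-duration can be measured in microseconds, the sketch below takes a wall-clock start time and uses a monotonic clock for the duration. The class `SpanTimingSketch` is hypothetical and says nothing about how Money measures these values internally; treating span-success as a flag supplied by the caller is also an assumption.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of span timing; NOT Money's implementation.
final class SpanTimingSketch {
    final String spanName;
    final long startTimeMicros;      // UTC start time, microseconds since the epoch
    private final long startNanos;   // monotonic reading, used only for the duration
    private long durationMicros = -1;
    private boolean success;

    SpanTimingSketch(String spanName) {
        this.spanName = spanName;
        this.startTimeMicros = ChronoUnit.MICROS.between(Instant.EPOCH, Instant.now());
        this.startNanos = System.nanoTime();
    }

    // Stops the span and records its result (assumed here to be passed in by the caller).
    void stop(boolean success) {
        this.durationMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - startNanos);
        this.success = success;
    }

    // Returns -1 until stop() has been called.
    long durationMicros() {
        return durationMicros;
    }

    boolean success() {
        return success;
    }
}
```

The duration comes from `System.nanoTime()` rather than the wall clock so that clock adjustments during the span cannot produce a negative or inflated value.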

Where does my Money go?

Money data can be configured to go to any number of destinations:

  • Log Files - Money uses SLF4J, and we use Logback by default
  • Graphite - You can emit numeric span data and metrics to Graphite
  • Kafka - You can emit span data to Kafka. Span data is encoded using Avro.
  • JMX - You can log standard span metrics to JMX.

You can also create your own "Emitter" to handle span data however you like. Money provides an Emitter framework that allows you to pretty much do whatever you want with the Spans.

Show me the money!

Here is a sample log entry from a single trace span in a RESTful web application. This span recorded the execution of a single request against an external service. This specific request failed with a 500 error due to a timeout.

Span: [ span-id=-460900382554701468 ][ trace-id=de305d54-75b4-431b-adb2-eb6b9e546013]
[ parent-id=-460900382554701468 ]
[ span-name=accessLog ][ app-name=Pulsar ][ start-time=1412550594494 ]
[ http-response-code=500 ][ span-success=false ][ span-duration=120004.0 ]
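
If you want to pull the fields back out of a log entry like the one above, a small parser is enough. The sketch below assumes only the bracketed `[ key=value ]` layout visible in this sample; the class name `SpanLogParserSketch` is hypothetical, and other emitters or versions may format entries differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the sample log format shown above; NOT part of Money.
final class SpanLogParserSketch {
    // Matches "[ key=value ]", tolerating missing spaces inside the brackets.
    private static final Pattern FIELD =
            Pattern.compile("\\[\\s*([^=\\]\\s]+)=([^\\]]*?)\\s*\\]");

    // Returns the fields of one span entry in the order they appear.
    static Map<String, String> parse(String entry) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(entry);
        while (matcher.find()) {
            fields.put(matcher.group(1), matcher.group(2));
        }
        return fields;
    }
}
```

Grouping parsed entries by their `trace-id` value then yields every span recorded for a single request, which is the correlation that the introduction describes.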