Serverless Distributed Tracing: Trace Extractor/Propagation for batched events #317

lucashfreitas · 2022-10-21T03:50:51Z

I am working with a serverless event-driven architecture that uses EventBridge, SQS, and Lambda:

1. A Lambda function (wrapped by the datadog-cdk construct) publishes a message to EventBridge.
2. The event bus has SQS queues as targets and forwards the messages to them.
3. A Lambda function (wrapped by the datadog-cdk construct) consumes the messages from the SQS queue and sends them to the event bus again, which takes us back to step 1.

Our goal is to enable end-to-end traces for this architecture.

1. We wrapped all lambdas (publishers and consumers) with the `datadog-cdk` construct, but this produced multiple disconnected traces:

Based on the documentation at https://docs.datadoghq.com/serverless/distributed_tracing/serverless_trace_propagation/?tab=nodejs, I expected trace propagation to happen automatically, as it states:

Tracing many AWS Managed services (listed here) is supported out-of-the-box and does not require following the steps outlined on this page.

However, the traces are not being propagated, and I am seeing multiple disconnected traces. I am not sure whether this happens because EventBridge invokes the Lambda asynchronously; maybe we really do need to manually extract the trace context and pass it through the `_datadog` field in the event bus message.

2. We implemented manual trace extraction/propagation following the Datadog documentation:

We implemented manual trace propagation following the tutorial at https://docs.datadoghq.com/serverless/distributed_tracing/serverless_trace_propagation/?tab=nodejs and managed to connect the traces, but we now face another issue: how to propagate traces for batched events in Lambda functions.

All the examples and docs for trace extraction, and even the handler wrapper provided by this library, expect a single trace context to be returned per Lambda function.
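For context, the manual propagation described above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the helper names (`withTraceContext`, `buildPutEventsEntry`, `readTraceContext`) are hypothetical, and it assumes the queue receives the default EventBridge envelope (no input transformer), so the SQS record body is JSON with a `detail` field. In practice the `carrier` object would be filled with the `x-datadog-*` headers, for example by dd-trace's `tracer.inject(span, 'text_map', carrier)`.

```javascript
// Hypothetical helpers; only the `_datadog` field name and the
// `x-datadog-*` header names follow Datadog's propagation conventions.

// Publisher side: copy the trace headers into the event detail.
function withTraceContext(detail, carrier) {
  return { ...detail, _datadog: { ...carrier } };
}

// Build one PutEvents entry whose Detail carries the trace headers.
function buildPutEventsEntry(busName, source, detailType, detail, carrier) {
  return {
    EventBusName: busName,
    Source: source,
    DetailType: detailType,
    Detail: JSON.stringify(withTraceContext(detail, carrier)),
  };
}

// Consumer side: the SQS record body is assumed to hold the EventBridge
// envelope as JSON, with the original detail under `detail`.
function readTraceContext(sqsRecord) {
  try {
    const envelope = JSON.parse(sqsRecord.body);
    return envelope?.detail?._datadog ?? null;
  } catch {
    return null; // body was not valid JSON
  }
}
```

`readTraceContext` only returns the raw headers; converting them into the object that `traceExtractor` must return is left out here.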

import { datadog } from "datadog-lambda-js";

const lambdaHandler = (event, context) => {
  // my lambda handler
};

export const handler = datadog(lambdaHandler, {
  traceExtractor: (event, context) => {
    // datadog expects a single trace context to be returned here.
  },
});

If we instead export the extractor from a file in the function and set `DD_TRACE_EXTRACTOR`, we also return a single object.

The issue is that our Lambda function actually handles a batch of events coming from an SQS queue (10+), and each of those events might have a different trace context. We are not sure how to handle this with this library, or whether we should use the dd-trace library directly to create a trace for each event in the batch and send it to Datadog.

Can someone help, or confirm whether this is not possible with this library and we really need to use dd-trace to manually create and send the traces to Datadog?

Thanks


astuyve · 2022-10-24T15:29:45Z

Hi @lucashfreitas - thank you for your detailed note.

For EventBridge, as of today we only support Lambda as a direct target to automatically decode and pass trace context. I think we could explore expanding that to SQS/SNS as well as other services we traditionally support, so please feel free to reach out to your account manager to open a feature request.

What I would suggest is doing what you've already done - which brings us to your second point.

Today, Datadog APM doesn't support merging multiple upstream trace contexts. So you'll need to pick one of the SQS messages and use its context as your upstream trace context for the rest of your function execution.
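As a rough sketch of "pick one of the SQS messages", the helper below returns the headers of the first record in the batch that carries a trace context. The function names are hypothetical, and it assumes the context was stored under `detail._datadog` in the EventBridge envelope that ends up in each SQS record body, as described earlier in this thread.

```javascript
// Hypothetical helper: parse the `_datadog` headers out of one SQS record.
// Assumes the record body is the JSON EventBridge envelope.
function parseCarrier(record) {
  try {
    return JSON.parse(record.body)?.detail?._datadog ?? null;
  } catch {
    return null; // body was not valid JSON
  }
}

// Return the headers of the first record that has both a trace ID and a
// parent ID, or null when no record in the batch carries a context.
function firstTraceContext(event) {
  for (const record of event?.Records ?? []) {
    const carrier = parseCarrier(record);
    if (carrier && carrier["x-datadog-trace-id"] && carrier["x-datadog-parent-id"]) {
      return carrier;
    }
  }
  return null;
}
```

Inside `traceExtractor`, the returned headers would still need to be converted into the trace-context object the library expects; that mapping is not shown here. The remaining records in the batch stay unlinked from their own upstream traces.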

Please feel free to reach out with any additional questions.

Thank you!

lucashfreitas · 2022-10-24T19:58:15Z

Hey @astuyve, thanks for answering that quickly.

Today, Datadog APM doesn't support merging multiple upstream trace contexts. So you'll need to pick one of the SQS messages and use its context as your upstream trace context for the rest of your function execution.

We are trying to do that, but some things are still not clear, e.g. how to define a custom extractor for multiple events inside a Lambda. Currently, the extractor function has a 1-to-1 relationship with the Lambda handler function, as per the example, so how would we extract traces inside a for loop? For example, if a Lambda handler receives 10 events in its payload, we would need to get the trace context 10 times and send 10 additional traces for the Lambda function execution.
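To make the question concrete, this is roughly what we have in mind for the loop. It is an untested sketch, not something the library documents: `startRecordSpan`, `handleBatch`, the span name `sqs.record.process`, and the `processRecord` callback are all hypothetical, and `tracer` is assumed to be an initialized dd-trace tracer exposing `extract`, `startSpan`, and spans with `setTag`/`finish`. Each record gets its own span, parented to that record's upstream context when one is present.

```javascript
// Start one span per SQS record, using the record's own `_datadog` headers
// as the parent when they are present. `tracer` is passed in by the caller.
function startRecordSpan(tracer, record, name = "sqs.record.process") {
  let carrier = null;
  try {
    carrier = JSON.parse(record.body)?.detail?._datadog ?? null;
  } catch {
    carrier = null; // body was not valid JSON
  }
  const parent = carrier ? tracer.extract("text_map", carrier) : null;
  return tracer.startSpan(name, parent ? { childOf: parent } : {});
}

// Process the batch record by record, finishing each span even on failure.
async function handleBatch(tracer, records, processRecord) {
  for (const record of records) {
    const span = startRecordSpan(tracer, record);
    try {
      await processRecord(record);
    } catch (err) {
      span.setTag("error", err); // mark the span as failed
      throw err;
    } finally {
      span.finish();
    }
  }
}
```

This would produce one span per record in that record's upstream trace, but the Lambda invocation span itself would still belong to only one trace, which is the part we are unsure about.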

We are opening a ticket with datadog to track this.

Thank you

lucashfreitas · 2024-05-06T02:36:25Z

@astuyve does datadog-lambda-js have any updates for propagating traces using batched events?

duncanista added the enhancement New feature or request label Jan 10, 2024
