[Proposal] Crosslinking Synthetics with APM #265

Closed
vigneshshanmugam opened this issue Apr 29, 2021 · 13 comments · May be fixed by #595
Labels: enhancement (New feature or request)

Comments

@vigneshshanmugam
Member

vigneshshanmugam commented Apr 29, 2021

Crosslinking Synthetics with APM

Provide visibility into how the synthetic journeys are executed and what actions are happening inside every step.

Linking via trace headers (traceparent)

The previous proposal for linking Synthetics/Heartbeat with APM was to generate a unique trace ID for each run and propagate the trace context to the other APM agents, which could pick up the ID and generate the corresponding transactions and spans.

Here is how this would work from the Synthetics agent:

import { journey, step } from '@elastic/synthetics';

journey('Synthetics + APM', ({ page }) => {
  step('propagate traceparent', async () => {
    await page.route('**', (route, request) => {
      // Only navigation requests get the trace header; every other
      // request passes through untouched. Each route is continued
      // exactly once.
      if (!request.isNavigationRequest()) {
        return route.continue();
      }
      const headers = {
        traceparent:
          '00-dc739d61ca520efa486e48aae80a66c7-4de1afd949fe4b1d-01',
        ...request.headers(),
      };
      return route.continue({ headers });
    });
    await page.goto('http://localhost:8080/index');
  });
});

Note: The backend server on port 8080 runs the Node.js agent and injects the RUM agent into the index page.

The code above propagates the traceparent header to the backend server on navigation requests. Since we use both the RUM agent and the Node.js agent, this helps identify the overall trace, from serving the index page to the requests made inside that page.
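The hard-coded traceparent in the snippet above could instead be generated once per run, which is what the proposal describes. Below is a minimal sketch, assuming the W3C Trace Context layout (version, 16-byte trace ID, 8-byte parent ID, flags). The helper name `generateTraceparent` is made up for illustration and is not part of Synthetics.

```javascript
const { randomBytes } = require('node:crypto');

// Build a W3C Trace Context `traceparent` value:
//   "00" - 32 hex chars (trace id) - 16 hex chars (parent id) - 2 hex chars (flags)
// `generateTraceparent` is a hypothetical helper, not an existing API.
function generateTraceparent({ sampled = true } = {}) {
  const traceId = randomBytes(16).toString('hex');
  const parentId = randomBytes(8).toString('hex');
  const flags = sampled ? '01' : '00';
  return `00-${traceId}-${parentId}-${flags}`;
}

// One value per run, to be reused for every request made during that run.
const traceparent = generateTraceparent();
console.log(traceparent);
```

The generated value would replace the literal string passed to route.continue({ headers }).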

[Screenshot: Screen Shot 2021-04-26 at 16 43 49]

Synthetics as an APM agent

Instead of injecting the trace details for every run, Synthetics could work as an APM agent and create transactions for both journeys and steps, which could then be propagated to the other APM agents, including Node.js and RUM. This would let the user visualize the overall picture, from how the browser was launched to how each action in a journey and step affected the whole test.

I have patched both the Synthetics and RUM agents to make it work locally. Here is an example of how this would look:

[Screenshot: Screen Shot 2021-04-28 at 16 57 54]

Think about how we could leverage the Network details in Synthetics with APM and RUM.

Both approaches will cause the browser to ignore the disk/HTTP cache, because they modify the headers before the request is sent to the server. We might need to dig deeper and figure out how to do this without affecting the cache. However, since this is a test mode, it might not be that big of a deal.

Would like to thank @spalger from Kibana Operations team for brainstorming.

/cc @graphaelli @andrewvc @paulb-elastic

@vigneshshanmugam
Member Author

vigneshshanmugam commented Apr 29, 2021

This proposal would potentially replace #245 and elastic/uptime#302 if we go ahead with the mentioned approach.

@andrewvc
Contributor

In the second proposal, would we be able to keep the context and trace backend calls as well?

@vigneshshanmugam
Member Author

@andrewvc RUM API calls do inject traceparent headers, and we could potentially use that to trace backend calls.

If we need more details, we can also create a span for each Synthetics network activity and associate the details with the RUM activity and through to every backend. However, that would require some changes in both Synthetics and RUM.

@andrewvc
Contributor

I'm still a bit confused about the second approach. Does it involve running the RUM agent and sending additional transactions / spans from the synthetics node process, or some other method?

@spalger
Contributor

spalger commented Apr 29, 2021

does it involve running the RUM agent and sending additional transactions / spans from the synthetics node process

Yeah, in the second proposal we would keep all the data we currently get but add a third APM client in the synthetics process, which starts a transaction for each journey. The transaction for each journey would then be propagated to the server when a new page request is sent (the request which returns the HTML for the page). This allows all page loads and browser sessions active during a journey to be tracked by APM under a single parent transaction (without losing the child transactions) and allows us to analyze all the transactions/spans that were created within a specific journey, or a step of a journey, or any granularity below that.
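The parent/child relationship described here can be modelled roughly as follows. This is a toy, in-memory sketch and not the API of any Elastic APM agent: the `JourneyTrace` class, its fields, and the step name are all hypothetical. It only shows how a journey transaction, its step children, and the propagated traceparent value could relate to one another.

```javascript
const { randomBytes } = require('node:crypto');

const newId = (bytes) => randomBytes(bytes).toString('hex');

// Hypothetical stand-in for an APM client running in the synthetics process.
class JourneyTrace {
  constructor(name) {
    this.traceId = newId(16);
    // The journey is the root transaction of the trace, so it has no parent.
    this.journey = { id: newId(8), parentId: null, name, type: 'journey' };
    this.steps = [];
  }

  // Each step is recorded as a child of the journey transaction.
  startStep(name) {
    const step = { id: newId(8), parentId: this.journey.id, name, type: 'step' };
    this.steps.push(step);
    return step;
  }

  // Header value to attach to a page request so that the backend and RUM
  // transactions land under `node` (the journey itself or one of its steps).
  traceparentFor(node) {
    return `00-${this.traceId}-${node.id}-01`;
  }
}

const trace = new JourneyTrace('Synthetics + APM');
const step = trace.startStep('load index page');
console.log(trace.traceparentFor(step));
```

Passing the journey instead of a step to traceparentFor would attach page loads directly under the journey, which is the coarser granularity mentioned in the comment.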

Did I get that right @vigneshshanmugam? Does this explanation help @andrewvc? Happy to chat about this synchronously if you like.

@spalger
Contributor

spalger commented Apr 29, 2021

@vigneshshanmugam the second screenshot you have is showing several transactions collapsed, but all the data from the first screenshot is still available in the second screenshot right? It's just collapsed for clarity about the overarching view this strategy provides?

@vigneshshanmugam
Member Author

Did I get that right @vigneshshanmugam?

Yeah that is 💯

It's just collapsed for clarity about the overarching view this strategy provides?

Exactly, I just collapsed them to show the overall graph, but all the data still appears the same as it does now. However, we would have to change the RUM agent and Synthetics code, as pointed out in my previous comments.

@spalger
Contributor

spalger commented Apr 29, 2021

But we would have to change the RUM agent and Synthetics code, as pointed out in my previous comments.

I missed that the RUM agent required modification, why was that necessary?

Additionally, I'm noticing now that the spans for the two steps in the journey start at the same time in the waterfall, is that a bug in the way that synthetics runs steps or creates spans? The second step shouldn't start until the first step is complete right?

@vigneshshanmugam
Member Author

I missed that the RUM agent required modification, why was that necessary?

The RUM agent always assumes that the root transaction starts in the browser. In Synthetics that notion changes, so we would have to introduce a new config option, pageLoadParentId or something similar, which would create transactions whose parent ID points to the Synthetics root transaction.

Please ignore the timing and other span details for the time being. It's super hacky, and I only used them to showcase how things would pan out.
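As a rough illustration of that idea, the sketch below splits a traceparent value and maps it onto a RUM-style page-load config object. `pageLoadTraceId` and `pageLoadSampled` are, as far as I know, existing RUM agent options. `pageLoadParentId` is only the name floated in this comment and is treated here as hypothetical, as are both helper functions.

```javascript
// Split a W3C `traceparent` value into its four fields.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) throw new Error(`Invalid traceparent: ${header}`);
  const [, version, traceId, parentId, flags] = match;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Hypothetical helper: build the page-load part of a RUM agent config so the
// page-load transaction joins the Synthetics trace instead of starting a new one.
function rumPageLoadConfig(header) {
  const { traceId, parentId, sampled } = parseTraceparent(header);
  return {
    pageLoadTraceId: traceId,
    pageLoadSampled: sampled,
    // Assumption: option name proposed in this thread, not a confirmed API.
    pageLoadParentId: parentId,
  };
}

console.log(
  rumPageLoadConfig('00-dc739d61ca520efa486e48aae80a66c7-4de1afd949fe4b1d-01')
);
```

The server rendering the page would pass an object like this into the RUM agent's init call, next to the usual service settings.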

@graphaelli
Member

In the first option, can we know if the trace was initiated by a synthetic test?

Since the root transaction defines the trace, grouped by name and originating services, adding the synthetic runner's transaction as the root of these traces will move them into their own group. That seems like a neutral change.

@Mpdreamz @felixbarny @AlexanderWert please feel free to chime in here with opinions.

@AlexanderWert
Member

IIUC, the second option supersedes the first one, as it adds the value of being able to track individual journeys, which isn't possible with the first option. Plus, option 2 would inherently provide a natural, dedicated entry point in the UI (through the dedicated, new "synthetics service / agent") for investigating the Synthetics-related traces. With option 1, the root transactions / traces for synthetic requests would be mixed up with real requests to the same services (as long as there is no additional information like a label that would allow differentiation / filtering), right?

One thought for the future: Since we are connecting Synthetics with APM, I think it would be quite valuable to be able to differentiate normal requests from synthetic requests down the whole path (even in downstream services / transactions) and also in derived metrics (throughput, dependencies, breakdown, etc.). This would allow switching between showing only real requests, only synthetic requests, or both for any service. This, however, would require baggage support in the APM agents to propagate the "this is a synthetic request" information down the calls.
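To make the baggage idea concrete, here is a small sketch of marking and detecting a synthetic request through a W3C Baggage header value (comma-separated key=value members, each optionally followed by ;properties). The key name `synthetic` and both helper functions are assumptions made for illustration. Nothing in this thread defines them.

```javascript
// Hypothetical baggage key used to flag synthetic traffic.
const SYNTHETIC_KEY = 'synthetic';

// Append the marker to an existing Baggage header value (or start a new one).
function markSynthetic(existingBaggage) {
  const entry = `${SYNTHETIC_KEY}=true`;
  return existingBaggage ? `${existingBaggage},${entry}` : entry;
}

// Check whether a Baggage header value carries the marker.
function isSynthetic(baggage) {
  return (baggage || '')
    .split(',')
    .map((member) => member.split(';')[0].trim()) // drop optional properties
    .some((member) => {
      const [key, value] = member.split('=');
      return key.trim() === SYNTHETIC_KEY && (value || '').trim() === 'true';
    });
}

const outgoing = markSynthetic('userId=alice');
console.log(outgoing, isSynthetic(outgoing));
```

A downstream agent that forwards the header unchanged would let every service in the call chain run the same check and label its transactions accordingly.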

@jasonrhodes
Member

I'm just commenting because I'm very excited about the possibilities here, especially with the 2nd option as I understand it.

I basically want to set up a bunch of critical path synthetics tests on a CI box that runs outside of PR builds, let it run somewhat constantly, and keep track of the overall timings of all the various parts of those traces. If the average overall time for a synthetics journey changes drastically over a set number of runs/time period, we flag it and can investigate the trace to see which transactions increased in time. This would give us amazing insight from the UI side to know if we should focus on query performance, browser load, render performance, etc.

Also, these times-per-transaction within a given synthetics trace could be run through ML anomaly detection to find anomalous behavior, and we could use APM annotations to annotate PR merges, etc.

@paulb-elastic
Contributor

Closing in favour of elastic/apm#823
