Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use diagnostics_channel in layers #96

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Qard
Copy link

@Qard Qard commented Sep 17, 2020

This is a proof-of-concept implementation of the diagnostics_channel feature I'm working on in Node.js core. The value here is for APM products to be able to capture changes to routing state through the lifecycle of a request. This should of course not be merged until the module exists in Node.js itself. I'm just creating this now to prove the value to userland and to make sure a more complex framework can benefit from it.

APM vendors want to use events from diagnostics_channel to know when a route has changed and what the full routing path to the route is so they can bucket trace data based on the full routing path. For example, requests to /hello/world and /hello/stephen might map to a /:name route on a nested router under /hello which should be able to produce /hello/:name as the full routing path. This requires some method of tracking the active layer so the point at which the diagnostics_channel events are published there will be valid routing information present for the subscriber to gather from the request.

I'll leave this as a draft PR until diagnostics_channel lands in Node.js core.

Copy link

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the overhead of adding diagnostics_channel to express enabled and disabled?

lib/layer.js Outdated
@@ -65,9 +78,23 @@ Layer.prototype.handle_error = function handle_error(error, req, res, next) {
return next(error)
}

req.layerStack = req.layerStack || []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend against creating a new array for every request if diagnostics_channel is not enabled.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Seemed like a possibly useful feature outside of the diagnostics_channel stuff itself, but sounds like there's already another layer-tracking thing in discussion, so I'll take a closer look at that. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also need to consider what the expected behavior is when different versions of this module are mixed together.

@wesleytodd
Copy link
Member

wesleytodd commented Sep 18, 2020

I think this is actually two features added (not that I think that should pose a problem). It both tracks the layers hit on a req and also reports to the diag channel. The request for tracking the layers is long standing, and there are a few other implementations we would want to consider before we would land this. If you don't want to take that discussion on, but still want to see about landing the usage of this feature you might consider removing that.

FWIW, I think this approach is fine with that layer array, but it will still require most folks to iterate to determine the last "route" layer which is typically what folks want when they ask for this feature.

Existing PR for "layer tracking": #86

lib/layer.js Outdated Show resolved Hide resolved
@dougwilson
Copy link
Contributor

Yes @wesleytodd , I thought the same thing regarding this PR. Looking at the tracking this adds, I think it is done differently from what I've seen the others doing so far, including that linked PR. Specifically, it doesn't store the matched layers/routes on the request, instead it is trying to keep just a stack of the current hierchy of where it is at the moment, popping them back off once it leaves (vs keeping them for tracking which ones matched).

@wesleytodd
Copy link
Member

Yeah, and maybe that is not a problem, but also if we can solve the long standing ask and also deliver on the needs to reporting to the diag channel I think we would want to.

@Qard
Copy link
Author

Qard commented Sep 18, 2020

Yeah, so the layer matching bits are basically because what an APM will want from this diagnostics_channel data is to be able to lookup what is the full path to the route I'm at right now.

Ideally we would be able to attribute any async activity to the route or middleware it originated from. I had initially tried to use the Span API to also include an end event to encapsulate each route or middleware, but ran into issues as it's easy to track when next is called but harder to track when a route "completes" by sending a response.

@wesleytodd
Copy link
Member

For the "send a response" case you can just use on-finished, and then do your cleanup on either next or in that callback, whichever is hit first.

@dougwilson
Copy link
Contributor

Gotcha. I see you are just string concatening together regexps and arrays in your end result path string, so I'm not sure that is going to work well as a general feature.

@Qard
Copy link
Author

Qard commented Sep 18, 2020

Doesn't look like on-finished is currently a dependency, so I'd have to add it. It would also somewhat complicate the logic of how diagnostics_channel is used here.

As for the path string, it's a concatenation of the routing path strings before being converted to regex. For example:

const router = new Router()
const hello = new Router()

router.use('/hello', hello)

hello.get('/:name', ...)

This router structure would result in a concatenated path of /hello/:name. Basically APM products want to know how to map individual routes to the full routing path it takes to get to it. We currently do that by monkey-patching express, which is fragile and has performance implications. We'd prefer to be able to receive that information through diagnostics_channel and not have to patch at all.

@dougwilson
Copy link
Contributor

dougwilson commented Sep 19, 2020

As for the path string, it's a concatenation of the routing path strings before being converted to regex

It sounds like you are assuming the users are passing a string in the first place. Take the following router combo, though:

const router = new Router()
const hello = new Router()

router.use(['/hello', /^\/(?:good)?bye/, '/aloha'], hello)

hello.get('/:name', ...)

@Qard
Copy link
Author

Qard commented Sep 21, 2020

That's fine. It will use the stringified form of the regex. The purpose is just to have a unique-to-that-route name to match transaction data to. If that name is /hello/:name or /^\\/(?:good)?bye//:name doesn't really matter, just that it has a name that can be relatively easily read by the user and traced back to the location in their app code. Some APMs may have support for keeping the routing fragments separate, but most would just join them.

@dougwilson
Copy link
Contributor

Ah, I see. But of course this router allows you to declare the same route name multiple times, as it does not enforce that route names are unique. This is because of things like next() flow control. For example:

const router = new Router()
const hello = new Router()

router.use('/hello', hello)
router.get('/hello/:name', (req, res) => res.end(`hello, ${name}`))

hello.param('name', (req, res, next, name) => next(/^[a-z]$/.test(name) ? null : 'route'))
hello.get('/:name', (req, res, next) => res.end(`hola, ${name}`))

The above has effectively two different routes named /hello/:name that your code would identify, but they are two different unique routes. /hello/dan will invoke one while /hello/Dan will invoke the other.

lib/layer.js Outdated
} catch (err) {
req.layerStack.pop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be put into a finally block?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd run after the next(...) though, which would interfere with subsequent routes, as far as I can tell. Could do that if the next(...) on the following line was in a process.nextTick(...), but that'd have a performance hit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. But isn't the one within the try already running after the next call? Like if the middleware sync calls next then it will run after, but if the middleware async calls next it will be before.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, true. Probably a better place to put that code. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it looks like adding anything non-trivial to your tests seems to highlight the weirdness here. For example, I added the cors module to the nested routes test like the following:

  it('nested routes have multiple layers with paths', function (done) {
    var outer = new Router();
    var inner = new Router();

    inner.get('/:name', function (req, res) {
      res.end();
    });

    outer.use('/hello', cors(), inner);

    function end() {
      assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/');
      done();
    }

    outer.handle({ url: '/hello/world', method: 'GET', headers: {} }, { end, setHeader: noop }, noop);
  });

But it failed the test listing the path as /hello/hello/:name/, as I would surmise that it is performing the pop operation too late, after the middleware is getting iterated through (the more middleware added, the more times /hello appears).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a possible fix for the layer timing issue. Not sure it's the best solution, but seems to work. Still unsure if the layerStack approach is the best way. Willing to accept any suggestions on better ways to track the overall routing state of the app. 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yea. It seems seems to have issues with the latest change keeping track of the stack. For example adding inner.get('/world', (req, res, next) => next()) above inner.get produces the route as /hello/world/:name.

As far as a best solution... I think perhaps it may be better to maybe start from the top so we can understand what, exactly, we are trying to accomplish in this pull request? The PR title and description only mentions adding hooks for diagnostics_channel, but almost all the following discussion does not seem related to the actual diagnostics_channel usage, and rather adding a completely new feature, independent of the diagnostics_channel feature.

Should these two things be split apart into two PRs (first suggested at #96 (comment)) or can the description be updated to describe what the goal of this PR is? That may help better focus the conversation and code work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capturing routing information is the whole point of adding diagnostics_channel, so this PR doesn't really make sense without it. I'll update the description to elaborate on that need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I get it, but let's think iteratively here, if you are interested in getting things landed instead of waiting for the entire pie to be built. For example, folks want to construct the path even outside of diagnostics_channel, so having access to the path is an independent ask of adding diagnostics_channel support. In addition, if you didn't have that layer stack hanging off the req object, you still already have access to a whole bunch of information, including that the request is being handled, that an error occurred, what the http method is, the http path, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get all that directly from http already. The only context we're really missing from express is the routing information. I only intended the layer tracking stuff in this to be used by diagnostics_channel. Maybe there are more general-purpose uses for that information, but I lack the context and the time to do anything about those right now. This PR was created primarily as a demonstration that the diagnostics_channel API is usable in more complex scenarios. It's not intended to be landed any time soon as the diagnostics_channel feature itself hasn't even landed yet. If you want something more complex from this, I'm probably not going to get to it for awhile.

@Qard
Copy link
Author

Qard commented Sep 21, 2020

I don't think it's a big issue that the paths aren't unique. It's just a bucketing mechanism. If a bucket might contain data from two routes that do different things but share the same path, it's not ideal, but there's not really any other great way to differentiate them. Most routes internally have some degree of branching anyway, so it's expected that every request in that bucket might not look exactly the same. It doesn't break anything if data from multiple routes winds up in the same bucket, it just makes the data maybe slightly less meaningful. Also, small side note: the HTTP method is also generally included in the string used as the bucket key.

I'm not concerned about two routes winding up with the same name, so long as that name can be instructive to the user on where in their app to look for the code. If they happen to have multiple routes with the same name, it's on them to look at those routes and figure out on their own which one a given trace came from. Route-naming is very much best-effort in APM. There's lots of cases where we get stuff like /static/* or something like that which will result in wildly different performance profiles and possibly very different execution paths. We just do our best to define reasonable buckets that are somewhat intuitive to the user and will help them to narrow down trace data to somewhat specific parts of their code.


function end() {
assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/');
assert.strictEqual(handleError.error, error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this validate that the handle error layer stack is pointing to the / path for the error handler that was executed that called res.end()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should be treating error handlers as top-level routes, that's an implementation detail. It makes more sense from the APM perspective to treat them as continuations of whatever route the error handler was reached from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, interesting. I was assuming your wanted the layer stack to be the current stack of layers--you didn't note that certain layers were going to be treated differently. That is likely a very important point that was left out of your explanation.

So you are saying that even an error handler that is itself a route would not be listed? I.e. app.get('/blog/:slug', (err, req, res, next) = {}) It seems like when you have a complex app where you have a lot of error handlers all defined at a lot of different paths, it may be useful to understand which of your error handlers was the one that ran the error handling, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look, I'm not sure if this behavior is why the following will generate a layer stack showing the "route" is /hello/:name/hello/:val rather than showing it as /hello/:val when called as GET /hello/100:

    var router = new Router();

    router.get('/hello/:name', (req, res, next) => {
      if (!isNaN(Number(req.params.name))) next('route')
      else next()
    }, (req, res) => {
      res.end()
    });

    router.get('/hello/100', (req, res) => {
      res.end()
    })

done();
}

router.handle({ url: '/hello/world', method: 'GET' }, { end }, noop);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a general comment: I'm not sure if you planed to clean this up later, so apologies if so, just wanted to call out that we don't want to be passing in mock-like objects to the router handling methods; they should be the real Node.js HTTP objects like all the other tests so we validate that everything is working as new Node.js versions come out and these objects change over time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the changes are all copied and pasted from the PR I made directly to express, which apparently did tests differently. I'll clean up the tests at some point, when I get back to this. 👍

@wesleytodd
Copy link
Member

@Qard I am working on getting this repo back on track for the renewed plans for v5. Are you interested in landing this still? Let me know so I can get it on our plans if so.

@Qard
Copy link
Author

Qard commented Mar 17, 2024

It would still be of value to have diagnostics_channel in there, yes. Though it probably needs reworking at this point to get it up-to-date. At the time we opted for targeting fastify as a routing framework to test diagnostics_channel with instead as express and related projects had somewhat stagnated at that time so we didn't see the changes as likely to actually land any time soon at that point.

I'll share with the team at Datadog to see if we want to put some time into updating this in the near future. No big deal if you're pushing for release soon though--we can add support later if necessary. Thanks for the reminder though! I had entirely forgotten I made this. 😅

@wesleytodd
Copy link
Member

Yeah this would likely be a minor release so not a big deal to wait, but I was just trying to wrangle all the open things to make sure we had a clear plan in place and this still does seem valuable to me. Let me know if you folks have time to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants