feat(earliest): Handlers that execute ASAP in the req/res lifecycle #1756
Conversation
@retrohacker would it make sense to use the same middleware API that we use for others (`pre`, `use`, etc.)? So turning this into a new Chain?
I feel this deviates a little bit from existing chained middleware behavior that users have come to expect.
Also, the use-case (shedding load immediately) is very specific in nature. I feel the better approach is to allow users to add a single synchronous function returning true/false that determines whether the restify server should take any more requests for further processing or not. So basically I am proposing that we should not try to introduce a chained middleware style function list, but a synchronous hook for users to latch onto to determine if the server should process the request or not. This proposal is specific to the code example in the PR.
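The single synchronous hook being proposed could look something like the sketch below. All names here are hypothetical, not restify API; the point is that the admit/release decision never breaks the event loop:

```javascript
// Hypothetical sketch of a single synchronous admission hook (names are
// illustrative, not restify API). The hook runs once per request, returns
// a boolean, and does all accounting in the same tick as the check.
function createAdmissionHook(limit) {
    var inflight = 0;
    return {
        // Synchronous check: should the server process this request?
        admit: function (res) {
            if (inflight >= limit) {
                res.statusCode = 503;
                res.end();
                return false; // shed the request immediately
            }
            inflight++; // counted before the next request is evaluated
            return true;
        },
        // Called when a previously admitted request finishes.
        release: function () {
            inflight--;
        }
    };
}
```

Because `admit` both checks and increments synchronously, two requests arriving in the same tick can never both observe the stale count.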
lib/server.js (Outdated)

```diff
@@ -117,6 +117,7 @@ function Server(options) {
     this.onceNext = !!options.onceNext;
     this.strictNext = !!options.strictNext;
+    this.earliestChain = [];
```
Shouldn't this be a `new Chain()` instead of an array?
Or is the reason for using an array that the earliest chain middlewares are expected to be synchronous and always return a boolean, which is not the same behavior as `Chain`?
Yes, even more so now. We've discovered that (unsurprisingly, in retrospect) breaking the event loop between the load shedding evaluation and the inflight request accounting introduces a race condition. Depending on how the accounting is done, this results in either too many requests being shed or not enough.
Forcing them to be synchronous appears to be effective at maintaining the correct inflight request count under load.
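The race can be demonstrated in isolation. In the sketch below (illustrative only, not restify code), deferring the inflight increment past a tick boundary means every request arriving within that tick sees the stale count:

```javascript
// Simulates N requests arriving in one tick of the event loop. When the
// inflight increment is deferred (as with an async middleware chain),
// every arrival in that tick sees the stale count and is admitted.
function simulateOneTick(arrivals, limit, syncAccounting) {
    var inflight = 0;
    var admitted = 0;
    var deferredIncrements = [];
    for (var i = 0; i < arrivals; i++) {
        if (inflight < limit) {
            admitted++;
            if (syncAccounting) {
                inflight++; // counted before the next arrival is evaluated
            } else {
                // models "increment on a later tick"
                deferredIncrements.push(function () { inflight++; });
            }
        }
    }
    // the tick boundary: deferred accounting finally runs, too late
    deferredIncrements.forEach(function (inc) { inc(); });
    return admitted;
}
```

With a limit of 50 and 200 arrivals in one tick, synchronous accounting admits exactly 50, while deferred accounting admits all 200 and saturates the server.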
lib/server.js (Outdated)

```js
 * return true;
 * })
 */
Server.prototype.earliest = function earliest() {
```
I am unclear about the benefit of introducing another phase called `earliest`. I feel the `pre` phase should be sufficient: the order in which the `pre` middlewares are set up should solve this use case. Maybe I am misunderstanding something here. Do you mind helping me understand the reasoning behind it?
Yeah, sorry this wasn't clear. I followed up with a comment about the performance delta between the two handlers.
lib/server.js (Outdated)

```js
// this request before we do any work on it
var earliestChain = self.earliestChain;
for (var i = 0; i < earliestChain.length; i++) {
    if (earliestChain[i](req, res) === false) {
```
Taking the example you mentioned in the code comments:

```js
sever.earliest(function(req, res) {
    if (server.inflightRequests > 100) {
        res.statusCode = 503;
        res.end();
        return false;
    }
    return true;
});
```
If we call the `earliestChain` functions here and, say, the `inflightRequests` count is 101, then the subsequent requests will always be sent a 503. And there won't be a way to even track these incoming requests; no lifecycle events will be generated. Is this what we are expecting here?

I think at least the `after` event should be emitted (even if we return `false`) to help keep track of such immediate shedding.
This comes down to performance. I agree in principle, but the trade-off is pretty severe.
I'm wondering if a better approach would be to expose `handle` like Express does:

```js
const http = require('http');
const restify = require('restify');

const httpServer = http.createServer();
const server = restify.createServer();
const PORT = 3000;
const INFLIGHT_REQUESTS_LIMIT = 100;

httpServer.on('request', function onRequest(req, res) {
    if (server.inflightRequests > INFLIGHT_REQUESTS_LIMIT) {
        res.statusCode = 503;
        return res.end();
    }
    return server.handle(req, res);
});

httpServer.listen(PORT);
```
+1 to @rajatkumar's comments. I think more than anything we want consistency in the metrics and lifecycle events, even if it comes at the cost of some throughput. @retrohacker are we able to quantify the delta between the implementation in this PR vs. shedding in the first installed …
The difference between …

On my machine, a server under heavy load (200 concurrent connections and an inflight limit of 50) …

It's important to note that these overheads are calculated without …

I worry the trade-off we would make to get consistency in the metrics and life-cycle events is that we would be unable to withstand any significant increase in traffic, irrespective of how load shedding is tuned. If we want effective load shedding, I don't see a way forward that exercises the restify framework.
Assuming the load shedding implementation handles its own accounting, it is still possible to capture statistics about the behavior of load shedding. For example, you can keep a long-running counter of the number of requests that are being rejected at the front door, and you can log specific headers (since those are parsed before the Node.js …

It's also worth noting that we already capture a lot of information about these requests at our cloud gateway, and would be able to track load shedding with those metrics as well. There are alternative methods of gathering insight which are lower overhead than pushing the request through restify.
I've updated this PR to match @hekike's feedback, using the more generic …
Reverting to the original proposed API.

I still like the `handle` API, and think we should expose it, but as a more generic alternative to this API, where new codebases can opt in to that pattern before heavily investing in restify's internal HTTP server.
Force-pushed from f74a1da to 7db83a5
I've been reflecting on your feedback @rajatkumar and @DonutEspresso.

I explored … Using chain carries about a 50% impact to overall throughput, but outperforms … healthy traffic. I was trying to dig into why, and discovered that making …

Moving the inflight request logic up around the earliest chain in …

Keeping a consistent interface doesn't appear to have a massive overhead. Seems like:

- chain w/o fix
- chain w/ fix
- synchronous
No matter what I do with async stuff, I end up in a situation where the accounting for load shedding is wrong. Unless inflight accounting is handled by load shedding itself, any break in the event loop introduces race conditions.

With inflight accounting happening before the …

With inflight accounting working the way it works today, you can let too much traffic through in one tick of the event loop, saturating the server.

Starting to lean towards the original form of the PR. I've been noodling on using only a single synchronous handler, though that seems to complicate plugins. With a single handler, we will only have one place in the codebase where we can attach an …
Alright, hope this is the final version of this PR. Sorry for the churn!

After digging in, it seems any attempt at asynchronous accounting of inflight requests is going to introduce race conditions. This appears to be true for our current implementation of InflightRequestThrottle as well, which lets through more traffic than it should.

The only effective way I can find to hold the door on the inflight limit is to make the inflight request accounting and load shedding synchronous, so the evaluation of both happens on a single request before yielding the event loop. The current form of this PR has strict synchronous load shedding and inflight request accounting. Here are the benchmarks for this run: …
Solid! Barring comments, I think this is close to being ready.

Now the real discussion comes - I am not a fan of the `earliest` naming 😂 (don't hate me). I'm not sure why, but the name just doesn't resonate with me. Some other alternatives: `server.first()`, `server.firstSync()` (if we want to emphasize the sync nature of it).

Thoughts?
```js
 * })
 */
Server.prototype.earliest = function earliest() {
    var args = Array.prototype.slice.call(arguments);
```
Since we dropped Node 6, any reason not to use rest parameters here?

```js
Server.prototype.earliest = function earliest(...func) {
    assert.arrayOfFunc(func);
    this.earliestChain = this.earliestChain.concat(func);
};
```
Linting fails. I originally used `const` and `let` everywhere, but `make prepush` failed, so I changed everything back to good ol' ES5.
test/server.test.js (Outdated)

```js
    });
});

test('earliest chain should allow multiple handlers', function(t) {
```
Looks like we should add a test that confirms inflight numbers are accurate after an earliest rejection event?
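The invariant such a test needs to assert is roughly the following. This is a sketch against a plain stand-in object, not a real restify server; `runEarliestChain` is a hypothetical name for the internal logic under test:

```javascript
// Sketch of the invariant to test: after an earliest-chain rejection,
// the inflight count returns to its pre-request value. The `server`
// object is a stand-in, not a real restify server.
function runEarliestChain(server, chain, req, res) {
    server.inflightRequests++; // counted before the chain runs
    for (var i = 0; i < chain.length; i++) {
        if (chain[i](req, res) === false) {
            server.inflightRequests--; // undo the increment on rejection
            return false;
        }
    }
    return true; // admitted; decremented later when the response finishes
}
```

A rejecting handler should leave `inflightRequests` exactly where it started, while an accepting handler leaves it incremented by one until the request completes.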
Ah good point 👍
👍 on … In my head …
lib/server.js (Outdated)

```js
 * Takes one or more functions.
 * @returns {Object} returns self
 * @example
 * sever.first(function(req, res) {
```
s/sever/server
lib/server.js (Outdated)

```js
 * @returns {Object} returns self
 * @example
 * sever.first(function(req, res) {
 *     if (server._inflightRequests > 100) {
```
Should the sample code be using private vars? Any reason not to use the public `inflightRequests()` API?
lib/server.js (Outdated)

```js
 *
 * The only work restify does for a first handler is to increment the number of
 * inflightRequests prior to calling the chain, and decrement that value if the
 * handler returns false.
```
I think we can set more context around this feature and the "bare metal" nature of it. E.g., using `server.first()` has some trade-offs:

- Standard restify server lifecycle events like `after` are not triggered for these requests
- You do not have access to restify req/res APIs (because we modify the prototype they're there, but without state being set up it's unlikely to work correctly for many scenarios)
- No `req.timers` for `server.first` handlers
- Domains do not apply, even if you have set the flag

Etc., etc. There might be others, but I think it's worth calling these out (`DANGER: BREAK GLASS ONLY IN EMERGENCY`).
Usage
Performance for Above Example
Attached flamegraph leads me to believe there isn't much room for improvement over this.