Adding callback to res.write and res.end for streaming support #80

Closed
wants to merge 20 commits into from

@jpodwys jpodwys commented May 8, 2016

I want to be able to use compression to enable GZIP on streamed responses that flush multiple chunks to the browser whenever the implementer wants. (This is especially useful when implementing a BigPipe algorithm such as with my express-stream library.) As a result, I need a sure way of flushing only after a chunk of output has been zipped.

Currently, the following code encounters a race condition because res.write is not blocking. As a result, it's possible that res.flush will execute before res.write completes.

res.write(html);
if(res.flush) res.flush();

I've confirmed the above by ensuring this works as expected:

res.write(html);
setTimeout(function(){
  if(res.flush) res.flush();
}, 10);

What I'd prefer to do, and what this PR enables, is the following:

res.write(html, null, function(){
  if(res.flush) res.flush();
});

@dougwilson
Contributor

Thanks! This looks fine to me. Can you add tests and documentation as well? We cannot accept this pull request without at least tests fully testing all new code paths. From looking at this, we probably need tests for at least checking that the callback works for .write, it works for .end and maybe that they are called in the correct order.

@dougwilson
Contributor

Also thinking about the problem described, it almost seems like another (better?) solution would be to actually have the call to res.flush() internally queue up a flush for when all previous writes complete. This may have the added benefit to work more like what people would expect when calling .write() + .flush(). Thoughts?

@jpodwys
Author

jpodwys commented May 8, 2016

Thanks for the fast response! I'd be happy to write tests and documentation, although that might have to wait until tomorrow.

As for your suggestion, you're saying it might be better to make it so .flush() is blocked until all of the queued write()s finish? That would certainly make compression users' code flatter and easier to read in the event they want to write() multiple things but only flush() once. However, assuming write() calls are synchronous, couldn't this also be accomplished by simply adding a res.flush() inside the final res.write()'s callback?

If the implementation you're envisioning is simple, I say let's go for your recommendation. Otherwise, this PR works just fine.

I'm trying to think through how we would determine that all queued write()s have completed so we can execute the queued flush(). Would you increment a counter each time write() is called and decrement that counter inside of the stream.on('data', ... listener so that, when the counter reaches zero and there is a pending flush(), go ahead with the flush? Then if a flush() is called, we first check whether the counter is greater than zero? Sorry if that sounds sloppy haha haven't spent much time thinking it through but I do like your idea.
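
The counting idea above could look roughly like the sketch below. It is only an illustration, not code from this PR: `createFlushQueue`, `createFakeStream`, and `completeWrites` are hypothetical names, the wrapped stream is assumed to expose `write(chunk, callback)` and `flush()`, and the counter is decremented in the write callback rather than in a 'data' listener to keep the example small.

```javascript
// Hypothetical helper: defer flush() until every earlier write() has completed.
function createFlushQueue(stream) {
  var pending = 0          // writes issued but not yet completed
  var flushQueued = false  // a flush() arrived while writes were pending

  function onWriteDone() {
    pending--
    if (pending === 0 && flushQueued) {
      flushQueued = false
      stream.flush()
    }
  }

  return {
    write: function (chunk) {
      pending++
      return stream.write(chunk, onWriteDone)
    },
    flush: function () {
      if (pending > 0) {
        flushQueued = true   // remember the request, run it later
      } else {
        stream.flush()       // nothing outstanding, pass straight through
      }
    }
  }
}

// Hypothetical fake stream: writes complete only when completeWrites() is called.
function createFakeStream() {
  var callbacks = []
  var log = []
  return {
    log: log,
    write: function (chunk, cb) {
      log.push('write:' + chunk)
      callbacks.push(cb)
      return true
    },
    flush: function () {
      log.push('flush')
    },
    completeWrites: function () {
      while (callbacks.length) callbacks.shift()()
    }
  }
}

var fake = createFakeStream()
var queue = createFlushQueue(fake)
queue.write('a')
queue.flush()          // deferred: 'a' has not completed yet
fake.completeWrites()  // 'a' completes, then the queued flush runs
console.log(fake.log.join(', ')) // write:a, flush
```

A real implementation would also have to deal with back-pressure and with end(), which this sketch ignores.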

@jpodwys
Author

jpodwys commented May 8, 2016

I was doing some more thinking and it seems to me that providing callbacks is the most flexible solution. Let's say that, for some reason, a customer wants to execute a specific set of code only after a specific write or flush step has completed. If write does not provide a callback then this isn't possible. If flush is blocked by queued write commands but doesn't provide a callback then we've simply moved the race condition.

I still like your proposed solution, but it doesn't appear to mitigate the need for callbacks. So we could do only callbacks or we could block flushes on queued writes and allow flush to accept a callback. What do you think?

@dougwilson
Contributor

Yea, I definitely agree that having the callbacks is useful for certain workflows. I guess I was mostly thinking about your original report that calling .flush() can result in unexpected flushing. For example, no one would question that calling .write('1') + .write('2') would result in '12', since it's done in the order called. If you had to wait for the callback from the first write to make the second write, well, that would be super annoying :) That was the context I was thinking about regarding the timing of the .flush() is all.

So perhaps from your description plus your proposal here gives us two separate tasks (two different pull requests):

  1. A pull request that adds the ability to specify callbacks on the functions to do something after the operation has completed.
  2. Actually fix the .flush() bug you are reporting, whereby res.write('a'); res.flush() should not flush until after 'a' has actually been written. This is what users are expecting, and this is a "big bug" (in npm terms) if this is not what is happening.

Adding the callbacks is useful, absolutely, but not really a change that helps with the .flush() bug, only provides a stop-gap for people who just happen to know they have to call .flush() in the callback of a write, not sequentially. Our own example in the README would be encountering this bug, so I think we need to at least fix the bug as our main focus, and we can always circle back around to this new feature in addition, if that makes sense.

@jpodwys
Author

jpodwys commented May 9, 2016

Good points, let's fix it right! I'd be happy to help because I would really benefit from this in some production applications as well as in my open-source work referenced above.

Does the rough algorithm I outlined above seem reasonable, or do you have a better suggestion? I'm happy to code this if we can agree on an algorithm ahead of time.

@jpodwys
Author

jpodwys commented May 11, 2016

I apologize for being a badger on this, but I was hoping we could discuss implementation so I know how you prefer I code this. Or, if you're coding it, please let me know and I'll stop bugging you :)

@dougwilson
Contributor

Hey, sorry, just been busy catching up after a long vacation :) I'm not coding it currently, no, so there would be no duplicate effort if you were going to do it. As for the algorithm, I can't really say for sure without digging into the code, but it roughly sounds fine to me :)

@jpodwys
Author

jpodwys commented May 11, 2016

OK thanks, Doug! I'll get on it and hopefully have a proposition (without unit tests and updated docs at first) in 1-3 days.

@jpodwys
Author

jpodwys commented May 12, 2016

Doug, I've taken a first stab at making calls to res.write, res.flush, and res.end internally synchronous. The code is rough and does not work fully ATM, but I wanted to show it to you to get some kind of buy-in on this approach before putting in any more effort.

What do you think?

@dougwilson
Contributor

Hi @jpodwys, ignoring any roughness of the code, it does seem pretty different than your initial idea of just doing counting, and seems like it's probably a lot more complex than necessary. I would think you could just track when something was written to the zlib stream and then when that stream drained out. Knowing that state, you could then handle a flush by either just passing it through if the zlib stream has been drained, or queue up all future writes to the zlib stream until it drains, then flush + write all and start the loop again. This would probably make it easier to implement back-pressure semantics, which the current WIP is lacking (and seems like it would be pretty difficult to add in).

@jpodwys
Author

jpodwys commented May 18, 2016

I'm not finding the time to implement the complete race condition fix we've outlined here like I thought I would. Are you willing to merge the addition of callbacks so users can get around the async issue in the meantime?

@dougwilson
Contributor

Hi @jpodwys, I certainly can! This just brings us back to the comments in #80 (comment) to address here :)

@jpodwys
Author

jpodwys commented May 23, 2016

I've rebased and updated the unit tests and readme.

.get('/')
.set('Accept-Encoding', 'gzip')
.expect('Content-Encoding', 'gzip')
.end(function(){
Contributor

The first argument to this callback is an err and the code here is not handling the error. Please add error handling.

@dougwilson
Contributor

Awesome, @jpodwys! Is there a reason we have to change the example? I thought it was agreed to be a bug and that it needs to be fixed, just not in this PR. As such, unless we are going to put the word out there is a breaking change everyone needs to make to use the callback, I think the readme should not be changed at this time.

Also, it looks like you still have some outstanding tasks from #80 (comment), namely:

We cannot accept this pull request without at least tests fully testing all new code paths. From looking at this, we probably need tests for at least checking that the callback works for .write, it works for .end and maybe that they are called in the correct order.

I don't see any test verifying the callback order, or even all the code paths being covered. An example of an uncovered code path: there are no tests that the callbacks function when the response is being compressed (i.e. the callbacks are only tested in half of those ternaries).

res.setHeader('Content-Type', 'text/plain')
res.write('Hello', null, function(){
callbacks++
res.flush()
Contributor

Since the test is labeled "when callbacks are used should call the passed callbacks", it doesn't sound like the test should be testing .flush() functionality. Is there a reason to call .flush() in these callbacks? If it is testing something, should that be a different test, or can it be reflected in the name?

Author

@jpodwys jpodwys May 23, 2016

Good point, I'll remove it. I was simply including .flush() for completeness since nothing makes it to the client without that call.

Contributor

Gotcha. So perhaps that would call for two different tests? Testing the callback when not flushed vs testing with flushed? Do you think that would be a meaningful test?

@jpodwys
Author

jpodwys commented May 24, 2016

I prefer the first option.

Doing the below, I'm able to determine whether the available _write and _end functions accept 3 parameters. Of course, this assumes that the person who may have overwritten an API or added a callback where there wasn't one before has also preserved the encoding parameter. Is it too strict to assume there are 3 parameters?

cb = (res._write.length === 3) ? cb : noop

You'll notice that in my latest commit, I attached _write and _end to res. I did this so that I can determine from the unit tests whether the pre-existing .write() and .end() calls accept a callback parameter.

This is getting a little strange--I've rarely seen assumptions based on argument list lengths. Any feedback is welcome. But at least there's progress--the tests now pass in all configured node versions.

index.js Outdated

var _end = res.end
res._end = res.end
Contributor

You cannot store the references as res._end and res._write. There is no code that checks that nothing is being overwritten, for example. Using this module twice will no longer function, etc.

Doing this causes too many bugs to be able to accept. Why did you have to change this?

Author

You're right that doing this isn't going to work, but I documented why I did it in the comment just above this one.

Contributor

You'll notice that in my latest commit, I attached _write and _end to res . I did this so that I can determine from the unit tests whether the pre-existing .write() and .end() calls accept a callback parameter.

Ah, sorry, I missed that. Unfortunately that is not an acceptable solution, due to the bugs it is causing in actual real-world apps I dropped your PR into in the last few hours. There is other code from npm that has written to those variables and responses are hanging or causing a stack overflow now.

You will need to use a different method for that check in the tests, not compromise the self-contained nature of this module. For example, add a middleware prior to this one in the tests and check against that.

In fact, the necessity to do this seems like a critical flaw in this functionality. For example, another module overwrites the methods with the three-argument form and just passes them upstream. This will now erroneously think callbacks are supported and we end up in the same buggy situation we set out to solve. It seems the method of using function length is just too fragile and I encourage you to come up with a different solution.
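
To make the fragility concrete, here is a small sketch of how a length-based check can be fooled in both directions. All function names are hypothetical stand-ins for illustration; none of them come from this module or from any real middleware.

```javascript
// The check under discussion: treat a function as callback-capable
// when it declares exactly three parameters.
function declaresCallback(fn) {
  return fn.length === 3
}

// Stand-in for an upstream write that really does invoke its callback.
function upstreamWrite(chunk, encoding, callback) {
  if (typeof callback === 'function') callback()
  return true
}

// Stand-in for an old-style write that ignores any third argument.
function legacyWrite(chunk, encoding) {
  return true
}

// Hypothetical middleware patch: declares three parameters but forwards only two,
// so the callback is silently dropped.
function droppingPatch(chunk, encoding, callback) {
  return legacyWrite(chunk, encoding)
}

// Hypothetical middleware patch: declares no parameters but forwards everything.
function forwardingPatch() {
  return upstreamWrite.apply(this, arguments)
}

// Calls a write function and reports whether the callback actually ran.
function callbackFires(write) {
  var fired = false
  write('hello', 'utf8', function () { fired = true })
  return fired
}

console.log(declaresCallback(droppingPatch), callbackFires(droppingPatch))     // true false
console.log(declaresCallback(forwardingPatch), callbackFires(forwardingPatch)) // false true
```

The first line is the false positive described above; the second shows the check can also reject a wrapper that does support callbacks.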

@jpodwys
Author

jpodwys commented May 24, 2016

It seems that the solution you proposed where we detect whether the upstream stream accepts callbacks is not reasonable. Essentially, I'd have to accept a black box function that could be any number of layers abstracted from the original res object and somehow determine whether it accepts and executes callbacks.

Another solution you proposed is to implement support for calling the callbacks at the correct time even if the upstream stream does not support callbacks. I would prefer to only implement it when it's not already natively supported, but because I can't tell whether it's supported, I would have to implement it 100% of the time thereby bypassing native support for it in newer versions of node. In order to accomplish this, I would need to either prototype into res, which I've already shown I can't assume I have access to, or listen for events, but the necessary events don't appear to exist in node 0.10 when not compressing.

This brings us back to the idea of forcing res.write and res.end to be synchronous. In order to make things synchronous, I need to know when an action completes so I can proceed to the next action. You recommended I listen for when a stream has drained, but this is not possible in flowing mode. I can switch the stream to paused mode so that I can listen to the readable event as shown in this example, but I don't know if you're open to the idea of implementing flowing mode manually in order to have more events available.

@dougwilson
Contributor

So all the proposals so far were just my initial thoughts, off the top of my head kinds of thoughts. Without having the time to actually sit down to look into this new feature, that's pretty much the best I can do. I'm leaving it up to you to bring a solution to the table until I can get time to work on this (since this is an enhancement, not a bug fix, it falls at least in my enhancement queue; you can publicly view this queue at https://github.com/pulls?q=is%3Aopen+assignee%3Adougwilson+label%3Aenhancement+sort%3Acreated-asc). I can take some time out to consult on your feature, and comment on the implementation, issues, and provide possible solutions, but I just don't know the solution off the top of my head.

This may bring us back around to #80 (comment) on if you are actually trying to fix a bug, perhaps we should focus our efforts on trying to fix the bug vs trying to add a new feature that would enable you to work-around the bug? I'm not sure, it's up to you on the best approach you want to take.

@jpodwys
Author

jpodwys commented May 24, 2016

In my latest attempt, I'm looking at http.OutgoingMessage.prototype.write.length to determine whether callbacks are supported. This does not run the same risk of being overwritten as res.write as it is independent of anything prior middleware do to the res object. I think the likelihood that someone is patching so deeply within node is slim-to-none and, if they are, they should accept that things are likely to break due to their changes.

Some changes you requested earlier to the unit test error handling still need to happen, but please let me know what you think of this approach so I know how to proceed.

@jpodwys
Author

jpodwys commented May 24, 2016

Perhaps this latest commit still fails this test though:

For number 1, sniffing the version of Node.js would not cut it, because that does not indicate if the upstream stream actually supports callbacks or not (because the upstream res.write or res.end may have been overwritten by another module, for example, just like this module does :)

@dougwilson
Contributor

Yea, I mentioned not testing for Node.js versions, because I have been burned multiple times trying to do this (and the io.js team stressed only doing feature detection, and they are now the Node.js team), especially with people these days running these modules on non-Node.js runtimes, for better or worse.

Currently there is a big push to get Express to work better with non-Node.js-core HTTP servers, while this PR is now in direct conflict with a core Express.js directive (this module falls under the expressjs organization, under the jurisdiction of the Express.js TC, within the Node.js foundation), as it ties it directly to the Node.js core HTTP server implementation. I am aware (from issues coming up and questions) that people are using this module without issues using spdy and http2 servers, among others. At least http2 does not have a prototype that leads to the one being checked here.

Another issue from jumping up and checking the "root write" is that it also glosses over people trying to use other middleware in their Express.js application. There are many popular middleware that overwrite res.write/res.end, for example express-session (https://github.com/expressjs/session/blob/master/index.js#L213), connect-jsx (https://github.com/jut-io/connect-jsx/blob/master/connect-jsx.js#L60), connect-livereload (https://github.com/intesso/connect-livereload/blob/master/index.js#L114), and pretty much every other middleware patching those functions I could find. This module already gets a lot of issues about people thinking that compression is not working, and back when connect-livereload had a bug where it incorrectly tried to restore res.write/res.end, destroying our pipe, the issues were still flowing in, over a year after it being fixed, so bugs from using this module with others causing almost impossible to debug hangs is not something I am looking forward to answering for a long time to come :)

@jpodwys
Author

jpodwys commented May 26, 2016

Noticed the link to the other issue here. As a note, I'm still trying to get synchronous execution working. I believe I've simplified my original algorithm quite a bit. Hopefully I'll have a new approach within a few days.

@jpodwys
Author

jpodwys commented May 28, 2016

@dougwilson I've finally come up with an insanely simple way to make .flush() execute synchronously. It currently fails a flush test in node 0.8, but I think that can be retooled. It passes all 38 tests in all other supported node versions and maintains 100% code coverage, although I understand if some additional tests need to be added.

I'd like to know what you think of the code changes (+6 and -1 lines).

It relies on the fact that .write() already executes synchronously and that zlib.write() has always accepted a callback function.

I've tested it in the application that originally prompted me to open this PR and it works like a charm.
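
The diff itself is not shown in this thread, so the following is only a sketch of the general idea being described: issue the flush from the callback of a write to the zlib stream. `writeThenFlush` is a hypothetical helper name; `zlib.createGzip()`, the `write(chunk, encoding, callback)` form, and `flush(callback)` are Node.js core APIs.

```javascript
var zlib = require('zlib')

// Hypothetical helper: flush only after the zlib stream reports the write as done.
function writeThenFlush(stream, chunk, done) {
  stream.write(chunk, 'utf8', function () {
    stream.flush(done)
  })
}

var gzip = zlib.createGzip()
var compressed = []

// Collect whatever compressed output the stream emits.
gzip.on('data', function (chunk) {
  compressed.push(chunk)
})

writeThenFlush(gzip, 'Hello', function () {
  // Report how much compressed output has been collected so far.
  console.log('compressed bytes collected:', Buffer.concat(compressed).length)
  gzip.end()
})
```

In the middleware, the flushed output would be forwarded to the response rather than collected in an array.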

@dougwilson
Contributor

Hi @jpodwys, it seems fine. I would like to understand why the test doesn't pass on 0.8 (to determine if it's actually an issue to be resolved) and also see an issue I would like to discuss. I don't want to diverge from this pull request which attempts to add callback support for res.write/res.end, so I would love it if you created that second pull request we talked about with those changes and we can discuss over there, to keep this one on topic :)

@jpodwys jpodwys mentioned this pull request May 28, 2016
@jpodwys
Author

jpodwys commented May 28, 2016

OK I've moved it to #84 but it has all the commit history from this one with it.

@whitingj

whitingj commented Jan 5, 2017

@jpodwys I was having the same problem you were but found a workaround.

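function writeAndFlush(res, content) {
  // write() returns false when the stream's buffer is full and needs to drain
  var needsDrain = !res.write(content, 'utf-8');
  if (needsDrain) {
    // once(), not on(): avoid stacking a new 'drain' listener on every call
    res.once('drain', function() {
      res.flush();
    });
  } else {
    res.flush();
  }
}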

Basically what this does is attempt to write the content. If the stream needs to drain, it waits for the drain event and then flushes. If the write did not fill the buffer, it just flushes and moves on.

@dougwilson dougwilson modified the milestone: 2.0 Jan 6, 2017
@dougwilson dougwilson force-pushed the master branch 3 times, most recently from d7bb81b to cd957aa Compare May 30, 2018 04:09
@jpodwys
Author

jpodwys commented Jan 16, 2019

@whitingj sorry for the extremely late response, I just stumbled onto this PR again. Thanks for the note!

@jpodwys jpodwys closed this Jan 25, 2022