
overridden res.write() has altered flushing semantics #61

Closed
usernameisalreadytaken2014 opened this issue Oct 27, 2015 · 18 comments
@usernameisalreadytaken2014

When writing via the regular res.write(), under normal circumstances, data always reaches the browser.

This is also the case when the connection is never-ending, and data is pushed to write() as it becomes available. For example when data comes in small portions interspersed with pauses.

The Nagle algorithm makes sure that data reaches the browser in this case. The default setting on Linux is to wait 40 milliseconds and, if no further writes have happened, flush the data to the network.

When the compression module is included via app.use(), this no longer happens. There is a buffer in the zlib library which holds on to the data indefinitely, so it never reaches the browser.

This feature request is to implement the following:

  1. When a write() happens, set or reset a 40 millisecond timer.
  2. If the timer runs out, call zlib to flush() the data to the browser.

That way, we retain the semantics of the original res.write().

Implementation ideas below.

I imagine that one way to implement this would be a "nagle" writable stream running on top of the gzip stream.

When the "nagle" stream's pipe() method is called, it looks if the passed stream is "instanceof Gzip".

If so, set an internal property to reference the underlying stream, so that we can later call flush() on it when a timer runs out.

@dougwilson dougwilson self-assigned this Oct 27, 2015
@dougwilson
Contributor

Unfortunately we won't be making such a change. You're completely welcome to publish a version to npm that implements this, if that is your desire :) The way this module works is also the same as if you had nginx between your client and the server doing compression; it would buffer it up to compress it as well.

@aegyed91

aegyed91 commented Nov 7, 2015

@dougwilson Hey there, sorry for posting an unrelated question here.

Is it a correct assumption that if nginx is gzipping the static files, I don't need to use the compression module myself? And that I don't have to use serve-static either?

Also, if you've got 3 minutes to run over this gist, I would really appreciate it. 👍

@usernameisalreadytaken2014
Author

@dougwilson

Unfortunately we won't be making such a change. You're completely welcome to publish a version

Hi Doug

Can you substantiate why you believe that this is not worth fixing?

(I know from personal experience that it sucks when someone comes along and points out that you're doing something incorrectly according to an overall design, user expectations or what not. That said, I'd much appreciate a logical discussion of the issue at hand.)

It seems to me that the current implementation breaks applications unexpectedly when the compression module is enabled because of this issue.

@dougwilson
Contributor

Hi @usernameisalreadytaken2014, the reason we're not going to change it is because you are perfectly capable of implementing this as a user-land module.

It seems to me that the current implementation breaks applications unexpectedly when the compression module is enabled because of this issue.

This is not true, and it can happen with anything that buffers; the fact that the buffering happens in memory does not change that. If you have a reverse proxy in front of your Node.js server doing the buffering, your application would have the same issue. If you are trying to stream out, you need to disable the compression, for example by setting the Cache-Control: no-transform header.

But in the end, this was really a question, asking us to implement changes for which we don't see the need. Perhaps your case will be more compelling if you can provide a pull request implementing the changes in this module, complete with full documentation and tests with full coverage. That would make a much more compelling case for adding this feature, and would also allow us to discuss the implications of the given implementation.

@usernameisalreadytaken2014
Author

This is not true

I beg to differ. The reason that I opened this issue is that my application broke the moment I enabled the compression module.

I expect anyone else using both SSE and compression to run into the same issue.

@dougwilson
Contributor

The reason that I opened this issue is that my application broke the moment I enabled the compression module.

Then I'd argue your application was written incorrectly. Regardless, please provide the aforementioned pull request. Also, would you be interested in simply taking over this module? That would be the best way to make progress on your issue and provide the compression expertise the community needs from you.

@dougwilson
Contributor

I expect anyone else using both SSE and compression to run into the same issue.

SSE will have this same issue on intermediate servers. Why are you not setting the Cache-Control: no-transform header?

@dougwilson
Contributor

Your entire argument for fixing SSE behavior with a magical 40ms write timeout is even worse than how it is today: not only will it hurt the compression for all other applications, like those slowly reading data from disk, but it will also cause people to scratch their heads over why their event takes 40ms longer to reach the client with compression than without.

Regardless, if you are passionate about pursuing this change, please provide a pull request with the necessary implementation, complete with full documentation and tests with full coverage.

@usernameisalreadytaken2014
Author

Why are you not setting the Cache-Control: no-transform header?

Would that cause the compression module to behave any different?

If using this header gets rid of the stalls that adding the compression module causes, then this is a nice workaround.

(Will compression still work then? The first couple thousand events are sent in bulk and are highly compressible; the later ones arrive in drips and are reasonably compressible.)

@dougwilson
Contributor

Why are you not setting the Cache-Control: no-transform header?

Would that cause the compression module to behave any different?

Yes, as is documented on the README.

If using this header gets rid of the stalls that adding the compression module causes, then this is a nice workaround.

Yes, it absolutely would.

Will compression still work then?

No: this header is acknowledged by the module, which then no longer transforms the response (compressing is a transformation of the response, as defined by the HTTP RFCs). The biggest problem with not including that header is that any intermediate proxy between your server and the client can buffer the response in the same way this module does, and kill your SSE functionality. Many such servers are in use: Amazon Silk's speed boost, for example, and Opera and Google have similar public services. Not including that header is rolling the dice on your users being able to receive the events in a timely fashion.

@usernameisalreadytaken2014
Author

it will hurt the compression for all other applications, like those slowly reading data from disk

Yeah, that makes sense. The algorithm is basically Nagle's and should only be used on network streams.

will cause people to scratch their heads over why their event takes 40ms longer to reach the client with compression than without

The default Nagle timeout is 40ms, so the tail segment of an uncompressed stream (stream = no response.end() to flush) that doesn't completely fill the segment should show the exact same behaviour. As far as I can tell, no difference?

@dougwilson
Contributor

will cause people to scratch their heads over why their event takes 40ms longer to reach the client with compression than without

The default Nagle timeout is 40ms, so the tail segment of an uncompressed stream (stream = no response.end() to flush) that doesn't completely fill the segment should show the exact same behaviour. As far as I can tell, no difference?

I don't think you understand. For example, the system reads a 4kb chunk from disk, and gzip compresses it down to 600 bytes. The next 4kb buffer does not get read off the disk until 50ms later, but since that is longer than 40ms, this module would have already sent that tiny, incomplete gzip window to the client, when it could have compressed more of that second 4kb buffer into the same gzip window, resulting in very sub-optimal compression on slow file systems. In fact, this can even cause the response size to go up, due to the overhead of the gzip window framing and dictionaries, if the system's reads consistently arrive more than 40ms apart.

While we were talking here, I wondered how you can do SSE responses in other languages like Java and use compression. They all require the user to indicate when to actually flush out the event, i.e. call res.flush() in this module, or else the code generating the SSE writes handles the compression directly rather than leaving it up to a middleware-like system.

If you don't want to go and make that pull request, can you at least point me to some other compression/SSE systems like this in other languages that implement the solution you are describing?

@usernameisalreadytaken2014
Author

[... snip explanation: sub-optimal compression for slow file systems ...]

Yep, I misunderstood. Thanks for the explanation.

There's a similar effect when not using compression. For example, if the disk stalls for 50ms after 600 bytes have been read, the network stack pushes those 600 bytes out after 40ms, along with the additional overhead (TCP + IP headers, Ethernet framing).

The link between client and server is probably a lot faster than 600 bytes per 10ms (faster than ~0.5 Mbps), so that extra overhead has likely long since disappeared onto the wire by the time the disk returns with more data.

On a more massive scale, if a server with a slow disk is handling many simultaneous connections, that extra overhead could tip the total link bandwidth towards congestion. There are a couple of options to circumvent that, for example the TCP_CORK option on the server-side socket, which holds data back until buffers are filled entirely before transmission.

I guess my point is that the default on a network stream is that the kernel guarantees that what you send() will reach the other end in a timely fashion. It does not guarantee to make optimal use of bandwidth to achieve this.

When the compression module is enabled, this is reversed: optimal use of bandwidth is guaranteed, but your connection may stall (in particular when a stream happens to slow down).

While we were talking here, I wondered how you can do SSE responses in other languages like Java and use compression.

Good question. I haven't touched Java for years...

other compression/SSE systems

If you mean reverse proxies doing compression, then I would have to test different products to say anything conclusive. Best guess: simple and dumb products like Varnish would not be able to compress a slow-dripping stream, while advanced products with a hefty price tag, such as BIG-IPs, probably handle it just fine.

If you don't want to go and make that pull request

Ah, but I'm not averse to doing a pull request at all.

I'm probably a shitty programmer and don't want to insist on polluting your very nice codebase with any of my gunk.

Also I don't want to spend time writing something that you do not want to include anyway. I prefer to have the discussion first.

I submitted this issue because I thought that maybe there was an edge case here, where the quality of the compression module might be improved.

@dougwilson
Contributor

Hi @usernameisalreadytaken2014,

There's a similar effect when not using compression. For example, if the disk stalls for 50ms after 600 bytes have been read, the network stack pushes those 600 bytes out after 40ms, along with the additional overhead (TCP + IP headers, Ethernet framing).

The link between client and server is probably a lot faster than 600 bytes per 10ms (faster than ~0.5 Mbps), so that extra overhead has likely long since disappeared onto the wire by the time the disk returns with more data.

So in my example, the code was always writing 4kb chunks to res.write, so without compression there would not have been the sub-optimal case of writing out only 600 bytes. That case would have been created purely by using this module with the proposed 40ms timer solution.

When the compression module is enabled, this is reversed; optimal use of bandwidth is guaranteed but your connection may stall.

Yes, this is true in a way. This module will turn your stream into one optimized to fill up the configured gzip chunks. You can actually configure this with the chunkSize option: https://github.com/expressjs/compression#chunksize

Ah, but I'm not averse to doing a pull request at all.

I'm probably a shitty programmer and don't want to insist on polluting your very nice codebase with any of my gunk.

Also I don't want to spend time writing something that you do not want to include anyway. I prefer to have the discussion first.

I submitted this issue because I thought that maybe there was an edge case here, where the quality of the compression module might be improved.

That's OK. I really don't have any use-case for this feature, and there won't be time available for a very, very long time to get something like this added to the module, which means it's unlikely to move without a contribution, due to (1) lack of time to spend implementing it and (2) no real need seen by the current maintainers, resulting in even lower priority.

If it helps, perhaps this could be solved in some other way in the future, who knows? There is a problem right now with res.flush() in that Node.js core decided to use that method name and then deprecate it, resulting in annoying messages in recent Node.js versions. There has been some talk about what to do about that, which may very well end up altering this behavior in some fashion, because really this comes down to "why do I need to use res.flush() with SSEs?"; and if res.flush() is removed, SSE scenarios would have to be solved in some other fashion (perhaps transparently).

@usernameisalreadytaken2014
Author

Yes, this is true in a way.

Thank you!! :-)

That's OK. I really don't have any use-case for this feature, and there is no available time for a very, very long time to get something like this added to the module,

OK.

You are probably correct that res.flush() after each write is a workable solution for my particular problem. At least I can't think of any reason why not.
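
That approach can be sketched as a small helper; sendEvent is a hypothetical name, and res.flush() is the method the compression middleware adds to the response (the guard keeps the helper working when compression is not in use):

```javascript
// Write one SSE event, then drain the compression middleware's zlib buffer
// so the event reaches the client in a timely fashion.
function sendEvent(res, data) {
  res.write(`data: ${JSON.stringify(data)}\n\n`);
  if (typeof res.flush === 'function') res.flush();
}
```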

Might be worthwhile to mention in the documentation for this module that res.write() has slightly different semantics when compression is used.

Eg.:
== Long-lived streams / data trickle ==
"When compression is enabled, there is a change in res.write() semantics. Long-lived HTTP transactions that occasionally trickle small amounts of data must therefore use res.flush() after each bit of data to ensure that the data reaches the remote end (client) in a timely fashion. This is probably particularly relevant for SSE streams."

There is a problem right now with res.flush() in that Node.js core decided to use that method name and then deprecate it, resulting in annoying messages in recent Node.js versions.

Ugh.

@usernameisalreadytaken2014
Author

FWIW, it seems that stream.Writable has grown a cork() method. It might come in handy, for example to revert to complete buffer fills, should the compression module ever change the default to use heuristics.

@usernameisalreadytaken2014
Author

Any chance of getting #74 and #57 merged? ;-)

Looks like I'll be needing both...

@usernameisalreadytaken2014
Author

Missed this one:

would you be interested in simply taking over this module?

Hmm, that would be interesting. I know compression and HTTP well enough, but I don't think my NodeJS skills are of the necessary caliber.

Haven't released any open source projects since CPAN and Perl were hot stuff. I would like to get to know GitHub first by releasing some smaller projects, before taking over something that is used en masse, such as compression...

(Also: need to come up with a better username, this one is just embarrassing)
