Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chunked encoding + GZIP prevents keep-alive #367

Closed
eddavisson opened this issue Sep 1, 2017 · 6 comments · Fixed by #840 or #990
Closed

chunked encoding + GZIP prevents keep-alive #367

eddavisson opened this issue Sep 1, 2017 · 6 comments · Fixed by #840 or #990
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@eddavisson
Copy link

GZIPInputStream.read() starts returning -1 once it finishes reading the gzip trailer. Depending on the underlying InputStream, this may occur before it has finished consuming the underlying InputStream.

In particular, if the underlying InputStream is a ChunkedInputStream, the ChunkedInputStream never reads the final chunk header (which indicates a chunk of length 0).

When ChunkedInputStream.close() is called, it is in a non-STATE_DONE state, and does not call HttpClient.finished(), which is required for the HttpClient object to get into the keep alive cache.

One possible fix is here:
https://github.com/google/google-http-java-client/blob/acdaeacc3c43402d795266d2ebcb584abe918961/google-http-client/src/main/java/com/google/api/client/http/HttpResponse.java#L363

Rather than returning a GZIPInputStream, return a proxy input stream that retains references to both the gzip stream and the underlying stream. The proxy's close() method first consumes the underlying stream by calling read() and then calls close() on the gzip stream.

@eddavisson
Copy link
Author

It's possible this only applies for some HttpTransport types. For example, Apache's ChunkedInputStream.close() method indicates that it will consume the rest the stream automatically:
https://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/ChunkedInputStream.html#close()

@wsh
Copy link

wsh commented Nov 15, 2017

We've worked around this in our code for now, but it's definitely worth fixing in the underlying library. 😄

@mattwhisenhunt mattwhisenhunt added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jan 8, 2018
@JustinBeckwith JustinBeckwith added priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jun 6, 2018
@pmakani pmakani removed their assignment Mar 19, 2019
@chingor13 chingor13 self-assigned this Jul 15, 2019
@elharo
Copy link
Contributor

elharo commented Jul 29, 2019

This is really weird. Is the issue that there's a GZIP file/blob in a much larger stream that contains extra data after the end of the gzipped part?

@eddavisson
Copy link
Author

IIRC, the issue occurs when the GZIPed content is chunked, so there is a GZIPInputStream wrapping a ChunkedInputStream.

The GZIPInputStream reports that is has been exhausted as soon as it observes the GZIP trailer. However, the ChunkedInputStream is not exhausted until it observes the final chunk header (with length 0).

@eddavisson
Copy link
Author

Note that the hack we put in place for google-cloud-datastore is going to become increasingly inconvenient in future JDK releases.

codyoss added a commit to codyoss/google-http-java-client that referenced this issue Oct 14, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Checkpicked from: googleapis#749
Fixes: googleapis#367
codyoss added a commit to codyoss/google-http-java-client that referenced this issue Oct 14, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Cherry-picked: googleapis#749
Fixes: googleapis#367
codyoss added a commit that referenced this issue Oct 18, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Cherry-picked: #749
Fixes: #367
@hiranya911
Copy link
Contributor

I believe the solution implemented in #840 is inadequate. Consider the following test case, which is very similar to the code used in the NetHttpTransport:

  @Test
  public void testConnectionReuse() throws Exception {
    for (int i = 0; i < 3; i++) {
      URL url = new URL("https://jsonplaceholder.typicode.com/todos/1");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("GET");
      conn.addRequestProperty("accept-encoding", "gzip");

      InputStream original = conn.getInputStream();
      InputStream is = new GZIPInputStream(original);
      byte[] data = ByteStreams.toByteArray(is);
      System.out.println(new String(data));

      is.close();
    }
  }

I expect the above code to open only 1 connection due to keep-alive, but it actually opens 3:

$ sudo tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn) != 0' and dst host typicode.com
14:34:01.425773 IP my.host.49644 > 104.28.28.194.https: Flags [S], seq 2833089104, win 64240, options [mss 1460,sackOK,TS val 2275669545 ecr 0,nop,wscale 7], length 0
14:34:01.548967 IP my.host.49646 > 104.28.28.194.https: Flags [S], seq 538857976, win 64240, options [mss 1460,sackOK,TS val 2275669668 ecr 0,nop,wscale 7], length 0
14:34:01.573442 IP my.host.49648 > 104.28.28.194.https: Flags [S], seq 3217697064, win 64240, options [mss 1460,sackOK,TS val 2275669693 ecr 0,nop,wscale 7], length 0

The solution in #840 simply wraps the GZIPInputStream in a ConsumingInputStream:

InputStream is = new ConsumingInputStream(new GZIPInputStream(original));

But this won't change the behavior, because ConsumingInputStream only does this:

@Override
  public void close() throws IOException {
    if (!closed && in != null) {
      try {
        ByteStreams.exhaust(this);
        super.in.close();
      } finally {
        this.closed = true;
      }
    }
  }

Since ByteStreams.exhaust() is simply trying to read any remaining bytes from the GZIPInputStream, it doesn't really help clean up the underlying the ChunkedInputStream.

To get the desired keep-alive behavior we need to do something like the following in the test case:

ByteStreams.exhaust(original);

codyoss pushed a commit that referenced this issue Mar 11, 2020
Previous implementation doesn't properly cleanup the last chunk of a chunked response when the response is also gzipped. This in turns prevents proper connection reuse.

Fixes #367
gcf-merge-on-green bot pushed a commit that referenced this issue Apr 27, 2020
🤖 I have created a release \*beep\* \*boop\* 
---
## [1.35.0](https://www.github.com/googleapis/google-http-java-client/compare/v1.34.2...v1.35.0) (2020-04-27)


### Features

* add logic for verifying ES256 JsonWebSignatures ([#1033](https://www.github.com/googleapis/google-http-java-client/issues/1033)) ([bb4227f](https://www.github.com/googleapis/google-http-java-client/commit/bb4227f9daec44fc2976fa9947e2ff5ee07ed21a))


### Bug Fixes

* add linkage monitor plugin ([#1000](https://www.github.com/googleapis/google-http-java-client/issues/1000)) ([027c227](https://www.github.com/googleapis/google-http-java-client/commit/027c227e558164f77be204152fb47023850b543f))
* Correctly handling chunked response streams with gzip ([#990](https://www.github.com/googleapis/google-http-java-client/issues/990)) ([1ba2197](https://www.github.com/googleapis/google-http-java-client/commit/1ba219743e65c89bc3fdb196acc5d2042e01f542)), closes [#367](https://www.github.com/googleapis/google-http-java-client/issues/367)
* FileDataStoreFactory will throw IOException for any permissions errors ([#1012](https://www.github.com/googleapis/google-http-java-client/issues/1012)) ([fd33073](https://www.github.com/googleapis/google-http-java-client/commit/fd33073da3674997897d7a9057d1d0e9d42d7cd4))
* include request method and URL into HttpResponseException message ([#1002](https://www.github.com/googleapis/google-http-java-client/issues/1002)) ([15111a1](https://www.github.com/googleapis/google-http-java-client/commit/15111a1001d6f72cb92cd2d76aaed6f1229bc14a))
* incorrect check for Windows OS in FileDataStoreFactory ([#927](https://www.github.com/googleapis/google-http-java-client/issues/927)) ([8b4eabe](https://www.github.com/googleapis/google-http-java-client/commit/8b4eabe985794fc64ad6a4a53f8f96201cf73fb8))
* reuse reference instead of calling getter twice ([#983](https://www.github.com/googleapis/google-http-java-client/issues/983)) ([1f66222](https://www.github.com/googleapis/google-http-java-client/commit/1f662224d7bee6e27e8d66975fda39feae0c9359)), closes [#982](https://www.github.com/googleapis/google-http-java-client/issues/982)
* **android:** set minimum API level to 19 a.k.a. 4.4 Kit Kat ([#1016](https://www.github.com/googleapis/google-http-java-client/issues/1016)) ([b9a8023](https://www.github.com/googleapis/google-http-java-client/commit/b9a80232c9c8b16a3c3277458835f72e346f6b2c)), closes [#1015](https://www.github.com/googleapis/google-http-java-client/issues/1015)


### Documentation

* android 4.4 or later is required ([#1008](https://www.github.com/googleapis/google-http-java-client/issues/1008)) ([bcc41dd](https://www.github.com/googleapis/google-http-java-client/commit/bcc41dd615af41ae6fb58287931cbf9c2144a075))
* libraries-bom 4.0.1 ([#976](https://www.github.com/googleapis/google-http-java-client/issues/976)) ([fc21dc4](https://www.github.com/googleapis/google-http-java-client/commit/fc21dc412566ef60d23f1f82db5caf3cfd5d447b))
* libraries-bom 4.1.1 ([#984](https://www.github.com/googleapis/google-http-java-client/issues/984)) ([635c813](https://www.github.com/googleapis/google-http-java-client/commit/635c81352ae383b3abfe6d7c141d987a6944b3e9))
* libraries-bom 5.2.0 ([#1032](https://www.github.com/googleapis/google-http-java-client/issues/1032)) ([ca34202](https://www.github.com/googleapis/google-http-java-client/commit/ca34202bfa077adb70313b6c4562c7a5d904e064))
* require Android 4.4 ([#1007](https://www.github.com/googleapis/google-http-java-client/issues/1007)) ([f9d2bb0](https://www.github.com/googleapis/google-http-java-client/commit/f9d2bb030398fe09e3c47b84ea468603355e08e9))


### Dependencies

* httpclient 4.5.12 ([#991](https://www.github.com/googleapis/google-http-java-client/issues/991)) ([79bc1c7](https://www.github.com/googleapis/google-http-java-client/commit/79bc1c76ebd48d396a080ef715b9f07cd056b7ef))
* update to Guava 29 ([#1024](https://www.github.com/googleapis/google-http-java-client/issues/1024)) ([ca9520f](https://www.github.com/googleapis/google-http-java-client/commit/ca9520f2da4babc5bbd28c828da1deb7dbdc87e5))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
8 participants