Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: when disconnecting, close the underlying connection before the response InputStream #1315

Merged
merged 5 commits into from Mar 15, 2021

Conversation

chingor13
Copy link
Collaborator

Adds a test to the NetHttpTransport and ApacheHttpTransport to make sure we don't wait until all the content is read when disconnecting the response.

Fixes #1303

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 11, 2021
@chingor13 chingor13 changed the title test: add failing test for waiting on response when disconnecting fix: when disconnecting, close the underlying connection before the response InputStream Mar 11, 2021
@chingor13 chingor13 marked this pull request as ready for review March 11, 2021 00:49
@chingor13 chingor13 requested a review from a team as a code owner March 11, 2021 00:49
response.disconnect();
ignore();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to call this at all? I'd think disconnecting alone is enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the original intention for sure, but it looks like it's supposed to close the response's InputStream so you can't read from it anymore - a way of letting google-http-client cleaning up the connections and their resources. This class was designed before AutoClosable was available (and possibly even before Closable was available).

Copy link
Contributor

@elharo elharo Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this, the more off this seems. I think what we should do is:

      InputStream lowLevelResponseContent = this.response.getContent();
      lowLevelResponseContent.close();

ignore() actually reads and decodes the bytes remaining on the stream. That could be quite a long-running operation. The point of disconnecting is to throw those away and not waste time reading them.

If you see otherwise, let's discuss.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable, I'll try it out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified ignore() to directly close the LowLevelHttpResponse's InputStream rather than doing the extra reading logic in getContent().

I still do need to swap the disconnect() and ignore() lines as the ApacheHttpResponse will still try to read the content from its input stream unless it's disconnected first.

Comment on lines -435 to +438
ignore();
// Close the connection before trying to close the InputStream content. If you are trying to
// disconnect, we shouldn't need to try to read any further content.
response.disconnect();
ignore();
Copy link
Collaborator Author

@chingor13 chingor13 Mar 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sold this is the best solution -- the tests do pass however.

An alternative is to mark that this response is "disconnecting" and in getContent() don't read from the underlying response if we're "disconnecting".

response.disconnect();
ignore();
Copy link
Contributor

@elharo elharo Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this, the more off this seems. I think what we should do is:

      InputStream lowLevelResponseContent = this.response.getContent();
      lowLevelResponseContent.close();

ignore() actually reads and decodes the bytes remaining on the stream. That could be quite a long-running operation. The point of disconnecting is to throw those away and not waste time reading them.

If you see otherwise, let's discuss.

@chingor13 chingor13 added the automerge Merge the pull request once unit tests and other checks pass. label Mar 15, 2021
@gcf-merge-on-green gcf-merge-on-green bot merged commit f84ed59 into googleapis:master Mar 15, 2021
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 15, 2021
@chingor13 chingor13 deleted the disconnect-wait branch March 15, 2021 17:43
gcf-merge-on-green bot pushed a commit that referenced this pull request Mar 15, 2021
🤖 I have created a release \*beep\* \*boop\*
---
### [1.39.1](https://www.github.com/googleapis/google-http-java-client/compare/v1.39.0...v1.39.1) (2021-03-15)


### Bug Fixes

* default application/json charset to utf-8 ([#1305](https://www.github.com/googleapis/google-http-java-client/issues/1305)) ([c4dfb48](https://www.github.com/googleapis/google-http-java-client/commit/c4dfb48cb8248564b19efdf1a4272eb6fafe3138)), closes [#1102](https://www.github.com/googleapis/google-http-java-client/issues/1102)
* when disconnecting, close the underlying connection before the response InputStream ([#1315](https://www.github.com/googleapis/google-http-java-client/issues/1315)) ([f84ed59](https://www.github.com/googleapis/google-http-java-client/commit/f84ed5964f376ada5eb724a3d1f3ac526d31d9c5)), closes [#1303](https://www.github.com/googleapis/google-http-java-client/issues/1303)


### Documentation

* 19.0.0 libraries-bom ([#1312](https://www.github.com/googleapis/google-http-java-client/issues/1312)) ([62be21b](https://www.github.com/googleapis/google-http-java-client/commit/62be21b84a5394455d828b0f97f9e53352b8aa18))
* update version ([#1296](https://www.github.com/googleapis/google-http-java-client/issues/1296)) ([f17755c](https://www.github.com/googleapis/google-http-java-client/commit/f17755cf5e8ccbf441131ebb13fe60028fb63850))


### Dependencies

* update dependency com.fasterxml.jackson.core:jackson-core to v2.12.2 ([#1309](https://www.github.com/googleapis/google-http-java-client/issues/1309)) ([aa7d703](https://www.github.com/googleapis/google-http-java-client/commit/aa7d703d94e5e34d849bc753cfe8bd332ff80443))
* update dependency com.google.protobuf:protobuf-java to v3.15.3 ([#1301](https://www.github.com/googleapis/google-http-java-client/issues/1301)) ([1db338b](https://www.github.com/googleapis/google-http-java-client/commit/1db338b8b98465e03e93013b40fd8d821ac245c8))
* update dependency com.google.protobuf:protobuf-java to v3.15.6 ([#1310](https://www.github.com/googleapis/google-http-java-client/issues/1310)) ([9cb50e4](https://www.github.com/googleapis/google-http-java-client/commit/9cb50e49e1cfc196b915465bb6ecbd90fb6d04d7))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
suztomo added a commit to suztomo/beam that referenced this pull request Apr 14, 2021
Since google-http-client 1.39.1, it does not read content stream
when a response has error status. The related pull request is
googleapis/google-http-java-client#1315
@suztomo
Copy link
Member

suztomo commented Apr 15, 2021

@chingor13 I need help in upgrading this google-http-client dependency for Beam. With this PR 1315, google-http-client does not seem to read the content of responses with error status codes. Is this intended behavior when you made this pull request?

Background: I'm trying to upgrade the Beam's dependencies (apache/beam#14527) and was updating assertions. To check the reason of the behavior change, I found this PR 1315 in google-http-client. Most of them are trivial (changing the number of invocation of getStatusCode, getContent and, getContentType), but I think certain error handling logic (throwing exceptions and logging messages to users) cannot work without reading the content of HTTP responses. Example assertion I'm stuck: https://github.com/apache/beam/pull/14527/files/7fe131c6256774134fc0a8503628c7add97b9984#r613669141

    when(response.getStatusCode()).thenReturn(403).thenReturn(200);
    // Google-http-client 1.39.1 and later does not read content stream of failed response
    when(response.getContent())
        .thenReturn(toStream(errorWithReasonAndStatus("rateLimitExceeded", 403)))
        .thenReturn(toStream(testTable));
...

    BigQueryServicesImpl.DatasetServiceImpl services =
        new BigQueryServicesImpl.DatasetServiceImpl(
            bigquery, null, PipelineOptionsFactory.create());
    Table ret =
        services.tryCreateTable(
            testTable, new RetryBoundedBackOff(3, BackOff.ZERO_BACKOFF), Sleeper.DEFAULT);
...
    // This logging does not work after upgrading google-http-client to 1.39.2 (from 1.38.1)
    expectedLogs.verifyInfo(
        "Quota limit reached when creating table project:dataset.table, "
            + "retrying up to 5.0 minutes");

The tryCreateTable method relies on ApiErrorExtractor to log errors appropriatey.

https://github.com/apache/beam/blob/4b7b74673b647c8d964b4877a8d66d47096acce4/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L626

        } catch (IOException e) {
          ApiErrorExtractor extractor = new ApiErrorExtractor();
          if (extractor.itemAlreadyExists(e)) {
            // The table already exists, nothing to return.
            return null;
          } else if (extractor.rateLimited(e)) {

I appreciate if you can help me to update the test case for the latest google-http-client (or fix google-http-client if this is not intended behavior).

@elharo
Copy link
Contributor

elharo commented Apr 15, 2021

http client is supposed to close the stream on a deliberate client invoked disconnect. not necessarily on an http error response. Is beam calling disconnect somewhere?

@elharo
Copy link
Contributor

elharo commented Apr 15, 2021

The problem might be this code in HttpRequest, in particular the call to disconnect when throwExceptionOnExecuteError is true.

    // throw an exception if unsuccessful response
    if (throwExceptionOnExecuteError && !response.isSuccessStatusCode()) {
      try {
        throw new HttpResponseException(response);
      } finally {
        response.disconnect();
      }
    }

However I can't yet prove this with a test.

@elharo
Copy link
Contributor

elharo commented Apr 15, 2021

Is the BigQuery Beam test trying to read the content from an exception?

@elharo
Copy link
Contributor

elharo commented Apr 15, 2021

Might want to look into extractor.rateLimited(e) to see what it's doing. It might be incomplete or making invalid assumptions.

@suztomo
Copy link
Member

suztomo commented Apr 15, 2021

@chingor13 @elharo I now think the problem seems to lie in the mock setup in Beam. At least it has assumption that lowLevelHttpRequest.getStatusCode() is called only once per response. (Thank you for the idea of FakeServer!) It's not that getContent was called fewer times, but getStatusCode was called more than the mock setup assumed.

I'll update you more.

@ChristianCiach
Copy link

This PR is the cause of issue GoogleContainerTools/jib#3409. I don't know if this PR is broken by itself or if Jib somehow uses the google-http-java-client in the wrong way. I just want to bring this to your attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpResponse.disconnect() broken for ApacheHttpTransport
5 participants