This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Google API Client Library version 1.23.0 causes runtime problems with Dataflow Java SDK #607

Open
moandcompany opened this issue Oct 5, 2017 · 19 comments

Comments

@moandcompany

moandcompany commented Oct 5, 2017

The new Google API Client Library, version 1.23.0, appears to cause problems with the Dataflow Java SDK when submitting and/or running jobs.

This appears to affect Dataflow Java SDKs in both major version families (e.g. 1.9.1, 2.0.0, and 2.1.0).

In some cases, these problems manifest as HTTP 404 errors when attempting to upload staging files:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.IOException: Error executing batch GCS request :userprofile:run
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:322)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
(...)

Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:479)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:611)
at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:358)
at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:217)
at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:86)
(...)

Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241)
at org.apache.beam.sdk.util.GcsUtil$3.call(GcsUtil.java:604)
at org.apache.beam.sdk.util.GcsUtil$3.call(GcsUtil.java:602)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)

Workaround:
Pinning the Google API Client Library dependency to version 1.22.0 appears to avoid this issue:

  • com.google.api-client:google-api-client:1.22.0

Gradle Example:

compile('com.google.api-client:google-api-client:1.22.0') {
    force = true
}

Maven Example:

<dependency>
  <groupId>com.google.api-client</groupId>
  <artifactId>google-api-client</artifactId>
  <version>[1.22.0]</version>
</dependency>
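
To confirm that a pin like the ones above took effect, it can help to check which google-api-client version actually ends up on the runtime classpath. The following is a minimal sketch, not part of the original report. It assumes the library exposes its version as the public static String field GoogleUtils.VERSION; the class name ApiClientVersionCheck and the helper readStaticString are hypothetical names chosen for illustration. Reflection is used so that the sketch compiles and runs even when the library is absent.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

/** Reports which google-api-client version, if any, is visible on the runtime classpath. */
final class ApiClientVersionCheck {

    // Assumption: google-api-client publishes its version in this class as a
    // public static String field named VERSION. Verify against the jar you resolve.
    static final String GOOGLE_UTILS = "com.google.api.client.googleapis.GoogleUtils";

    /** Reads a public static String field by reflection; empty if the class or field is missing. */
    static Optional<String> readStaticString(String className, String fieldName) {
        try {
            Field field = Class.forName(className).getField(fieldName);
            if (!Modifier.isStatic(field.getModifiers())) {
                return Optional.empty();
            }
            Object value = field.get(null);
            if (value instanceof String) {
                return Optional.of((String) value);
            }
            return Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class not on the classpath, field not present, or class failed to load.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> version = readStaticString(GOOGLE_UTILS, "VERSION");
        if (version.isPresent()) {
            System.out.println("google-api-client on classpath: " + version.get());
        } else {
            System.out.println("google-api-client not found on classpath");
        }
    }
}
```

Running this with the same classpath as the pipeline launcher should print the pinned version if the pin was applied; a different version suggests another dependency is still pulling in its own copy.
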
@polleyg

polleyg commented Oct 11, 2017

We've had the same problem, except that for us it was with the BigQuery API that we were bringing into our project. Removing it fixed it (Beam has a dependency on it anyway).

@pheromonez

We're also experiencing issues during file staging. Before the attempt to upload files is made, we receive this error:
WARNING: Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://www.googleapis.com/storage/v1/b?predefinedAcl=projectPrivate&predefinedDefaultObjectAcl=projectPrivate&project=<project name omitted>

Accessing the specified HTTP resource returns JSON data containing an error with the message:
Anonymous users does not have storage.buckets.list access to project <project number omitted>.

@afcastano

afcastano commented Oct 11, 2017

We had the same issue, and we can confirm that, as @moandcompany suggests, this fixes it:

compile('com.google.api-client:google-api-client:1.22.0') {
    force = true
}

For the record, our stack trace is pretty similar. We are running a 2.2.0 snapshot version of Apache Beam:

java.io.IOException: Error executing batch GCS request
        at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:603)
        at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:342)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:217)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:86)
        at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:125)
        at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:190)
        at org.apache.beam.runners.dataflow.util.PackageUtil.alreadyStaged(PackageUtil.java:159)
        at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:188)
        at org.apache.beam.runners.dataflow.util.PackageUtil.access$000(PackageUtil.java:69)
        at org.apache.beam.runners.dataflow.util.PackageUtil$2.call(PackageUtil.java:176)
        at org.apache.beam.runners.dataflow.util.PackageUtil$2.call(PackageUtil.java:173)
        at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
        at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
        at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:459)
        at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
        at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:595)
        ... 16 more

@zinuzoid

I got a similar problem. Here's the API response:

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "(249a6f2653c550b0): The workflow was automatically rejected by the service because it may trigger an identified bug in the SDK.\nBug details: com.google.api-client:google-api-client library version 1.23.0 is not supported..\nContact dataflow-feedback@google.com for further help. Please use this identifier in your communication: 67379331.",
    "reason" : "badRequest"
  } ],
  "message" : "(249a6f2653c550b0): The workflow was automatically rejected by the service because it may trigger an identified bug in the SDK.\nBug details: com.google.api-client:google-api-client library version 1.23.0 is not supported..\nContact dataflow-feedback@google.com for further help. Please use this identifier in your communication: 67379331.",
  "status" : "INVALID_ARGUMENT"
}

@lukecwik
Contributor

Google added a check that rejects the creation of jobs affected by this issue, to prevent users from starting malformed jobs.

@frew

frew commented Nov 14, 2017

The root cause of the 404s is outlined at googleapis/google-api-java-client#1073. Hilariously, you can't get to the error rejecting the job for bad dependencies until you've cleared up the staging problem (in our case by upgrading to com.google.apis:google-api-services-storage:v1-rev115-1.23.0). Is there another problem that's causing the job rejection? We're being forced to 1.23.0 by a bug in another Google API, so this puts us between a rock and a hard place because lol @ Java versioning on Maven.

@Jdban

Jdban commented Dec 4, 2017

+1 happening to us too. Is there any suggested remedy?

@moandcompany
Author

The Cloud Dataflow team has added a page on Dataflow SDK and Worker Dependencies that identifies the google-api-client 1.22.0 version requirement (Java).

@Jdban

Jdban commented Dec 12, 2017

"The Cloud Dataflow team has added a page on Dataflow SDK and Worker Dependencies that identifies the google-api-client 1.22.0 version requirement (Java)"

That is a useful link, but it is not really a solution for those of us, like @frew, who need to use google-api-client 1.23.0 due to a bug in another library.

@sgri

sgri commented Jan 16, 2018

I also have this issue

@ghost

ghost commented Jan 29, 2018

Any updates? I'm running into this issue.

@alan-ma-umg

Same here: Apache Beam 2.3.0 with DataflowRunner hits the same 404 error. A permanent fix would be ideal.

Thanks.

@dsquier

dsquier commented Mar 9, 2018

We encountered this as well. We're on Scio 0.5.5-beta1, and attempting to force the version to 1.22.0 using overrides never worked. However, explicitly adding this library with force() did work, i.e.:

"com.google.api-client" % "google-api-client" % "1.22.0" force()

@gfengster

I have the same problem. Google forces a move off storage@v1. After adding

<dependency>
  <groupId>com.google.apis</groupId>
  <artifactId>google-api-services-storage</artifactId>
  <version>v1-rev115-1.23.0</version>
</dependency>

the runtime error becomes

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoClassDefFoundError: com/google/api/gax/rpc/HeaderProvider

It looks like there are conflicts across Google's infrastructure libraries. Horrible.

@andrewcassidy

@dsquier omg thank you. I was battling dependencyOverrides for a while and didn't think about force.

@pabloazurduy

pabloazurduy commented May 23, 2018

I was redirected here from Google because I was using the bigquery-client library and the same error appeared. Has anybody found a workaround for this issue?
I've tried the following, without success:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-bigquery</artifactId>
      <version>0.21.0-beta</version>
    </dependency>

@pievis

pievis commented Jun 6, 2018

After analyzing my dependencies and checking the error, I was able to fix this by forcing the version of google-api-services-dataflow to v1b3-rev221-1.22.0 (and, of course, setting google-api-client to version 1.22.0).

Only setting google-api-client to the old version wasn't enough for me since I had the following error thrown:


java.io.IOException: Error executing batch GCS request
        at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUt

when trying to compile my dataflow template
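
The fix described above lines up the google-api-services-* artifact with the pinned client version. A minimal sketch of that check follows. It assumes that the trailing segment of a service artifact version (the part after the last hyphen, as in v1b3-rev221-1.22.0 or v1-rev115-1.23.0) names the google-api-client release it targets; the versions quoted in this thread follow that pattern, but the thread does not state it as a rule. The class and method names are hypothetical.

```java
import java.util.Optional;

/** Checks whether a google-api-services-* version string targets a given client version. */
final class ServiceVersionCheck {

    /**
     * Returns the text after the last '-' in a service artifact version,
     * for example "1.22.0" from "v1b3-rev221-1.22.0". Empty if there is no
     * hyphen or nothing follows it.
     */
    static Optional<String> clientSuffix(String serviceVersion) {
        int dash = serviceVersion.lastIndexOf('-');
        if (dash < 0 || dash == serviceVersion.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(serviceVersion.substring(dash + 1));
    }

    /** True when the service version's suffix equals the pinned client version. */
    static boolean matchesClient(String serviceVersion, String clientVersion) {
        return clientSuffix(serviceVersion).map(clientVersion::equals).orElse(false);
    }

    public static void main(String[] args) {
        String pinnedClient = "1.22.0"; // the version pinned in this thread
        String[] serviceVersions = {"v1b3-rev221-1.22.0", "v1-rev115-1.23.0"};
        for (String serviceVersion : serviceVersions) {
            System.out.println(serviceVersion + " targets client " + pinnedClient + ": "
                    + matchesClient(serviceVersion, pinnedClient));
        }
    }
}
```

Feeding the version strings from a dependency report through a check like this makes a mismatch between the pinned client and a transitive service artifact easier to spot.
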

@vinnybod

For anyone else still seeing issues like this, check out the version numbers here and make sure you aren't importing a conflicting dependency.

@labianchin

Now Beam 2.5.0 depends on google-api-client:1.23.0, see https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies. Is this still an issue?
