Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud Storage recently stopped handling the RFC 3986 URI encoding by it self #57

Closed
montss opened this issue Jan 13, 2020 · 26 comments · Fixed by #135
Closed

Cloud Storage recently stopped handling the RFC 3986 URI encoding by it self #57

montss opened this issue Jan 13, 2020 · 26 comments · Fixed by #135
Assignees
Labels
api: storage Issues related to the googleapis/java-storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. semver: minor A new feature was added. No breaking changes. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@montss
Copy link

montss commented Jan 13, 2020

Environment details

Java version: 8
google-cloud-java version(s): libraries-bom:3.3.0

Steps to reproduce

1- Upload a file with spaces in its name
2- Read that file

Code example

StorageOptions.getDefaultInstance().getService().create(BlobInfo.newBuilder("local-files", "da8c9305-f275-4a52-affb-a3ac6e583dc9/p2p eq transit- Transit time estimator.xlsx").build(), bytes);
StorageOptions.getDefaultInstance().getService().readAllBytes("local-files", "da8c9305-f275-4a52-affb-a3ac6e583dc9/p2p eq transit- Transit time estimator.xlsx");

Stack trace

com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
No such object: local-files/da8c9305-f275-4a52-affb-a3ac6e583dc9/p2p+eq+transit-+Transit+time+estimator.xlsx  404 Not Found
No such object: local-files/da8c9305-f275-4a52-affb-a3ac6e583dc9/p2p+eq+transit-+Transit+time+estimator.xlsx
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
No such object: local-files/da8c9305-f275-4a52-affb-a3ac6e583dc9/p2p+eq+transit-+Transit+time+estimator.xlsx
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:443)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMedia(AbstractGoogleClientRequest.java:502)
	at com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:7006)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.load(HttpStorageRpc.java:630)
	at com.google.cloud.storage.StorageImpl$16.call(StorageImpl.java:589)
	at com.google.cloud.storage.StorageImpl$16.call(StorageImpl.java:586)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.StorageImpl.readAllBytes(StorageImpl.java:585)
	at com.google.cloud.storage.StorageImpl.readAllBytes(StorageImpl.java:577)

External references such as API reference guides used

https://cloud.google.com/storage/docs/request-endpoints#encoding

According to this the encoding is typically handled for you by client libraries, such as the Cloud Storage Client Libraries, so you can pass the raw object name to them.

Any additional information below

This stopped working recently After Upgrading the libraries-bom from 3.1.0 to 3.3.0
The object exists in the bucket with spaces.

Annotation 2020-01-13 165401

@chingor13 chingor13 transferred this issue from googleapis/google-cloud-java Jan 13, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 14, 2020
@dmitry-fa
Copy link
Contributor

I do confirm: it works with the libraries-bom 3.2.0, but no longer works with 3.3.0.
I will investigate the root couse.

@dmitry-fa
Copy link
Contributor

dmitry-fa commented Jan 14, 2020

This was broken by the following commit:
googleapis/google-http-java-client@7d4a048

The problem disappeared when I changed GenericUrl.java as it was before:

result.add(verbatim ? sub : CharEscapers.decodeUriPath(sub));
=>
result.add(verbatim ? sub : CharEscapers.decodeUri(sub));

@dmitry-fa
Copy link
Contributor

Workaround:

            <dependency>
                <groupId>com.google.http-client</groupId>
                <artifactId>google-http-client</artifactId>
                <version>1.33.0</version>
            </dependency>

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

Thanks for the detailed report. I think you've nailed the root cause. We're going to have to dig into this one and figure out exactly how to fix it. The old http java client behavior was a common but incorrect handling of plus signs that caused other problems. Something in the GCS stack--I'm not sure what yet--was depending on the buggy behavior. If we're lucky, we can fix it here. If not, we might have to fix this somewhere on the server.

@elharo elharo added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. semver: minor A new feature was added. No breaking changes. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels Jan 15, 2020
@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

@dmitry-fa
Copy link
Contributor

Encoding spaces as "%20" not as "+" if met in the path component of URL should probably help.

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

Yes, that's exactly what we should be doing. The question is what piece of code in which project is doing this encoding. It probably isn't in this repo.

@elharo elharo closed this as completed Jan 15, 2020
@elharo elharo reopened this Jan 15, 2020
@dmitry-fa
Copy link
Contributor

The easiest way to fix:

com.google.api.client.util.escape.PercentEscaper

private static final char[] URI_ESCAPED_SPACE = {'+'};
=>
private static final char[] URI_ESCAPED_SPACE = "%20".toCharArray();

It fixes problem with the storage. But it may have some side effects.

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

Where do you propose making that change?

@dmitry-fa
Copy link
Contributor

I guess the source of the com.google.api.client.util.escape.PercentEscaper class exists only here:
https://github.com/googleapis/google-http-java-client/blob/master/google-http-client/src/main/java/com/google/api/client/util/escape/PercentEscaper.java

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

I don't think we're going to change http-java-client. 1.33.0 had a bug here and 1.34.0 fixed it. The problem is some other code somewhere has a complementary bug.

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

We should dig into the actual HTTP calls that are happening here to understand what the server is sending us. That might narrow this down. See

https://cloud.google.com/storage/docs/xml-api/overview

and

https://cloud.google.com/storage/docs/json_api/

I'm not sure which of these two the client library is wrap[ping.

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

Per the GCS API docs %20 is correct and + is wrong. This is consistent with the HTTP spec as well. Something is not following the docs, and I now think it might actually be in this repo. let me look

@dmitry-fa
Copy link
Contributor

As I can see, the storage passes the names of bucket and object to the http client as is. The client is responsible for URL encoding/decoding. It encodes spaces as '+', but since 1.33.0 '+' is no longer decoded as space. Encoding as '%20' allows to decode it properly.

@elharo
Copy link
Contributor

elharo commented Jan 15, 2020

The client should not encode spaces as +. Just need to figure out exactly where that happens so I can fix it.

@elharo elharo self-assigned this Jan 15, 2020
@dmitry-fa
Copy link
Contributor

This works only to obtain a signed URL

@dmitry-fa
Copy link
Contributor

dmitry-fa commented Jan 15, 2020

This is stack trace from the point where the storage parameters (bucket name, object name) are starting to be encoded:

	  at com.google.api.client.http.UriTemplate$CompositeOutput.getEncodedValue(UriTemplate.java:160)
	  at com.google.api.client.http.UriTemplate.getSimpleValue(UriTemplate.java:330)
	  at com.google.api.client.http.UriTemplate.expand(UriTemplate.java:314)
	  at com.google.api.client.http.UriTemplate.expand(UriTemplate.java:229)
	  at com.google.api.services.storage.Storage$Objects$Get.buildHttpRequestUrl(Storage.java:7014)
	  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:422)
	  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541)
	  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474)
	  at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMedia(AbstractGoogleClientRequest.java:502)
	  at com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:7006)
	  at com.google.cloud.storage.spi.v1.HttpStorageRpc.load(HttpStorageRpc.java:631)
	  at com.google.cloud.storage.StorageImpl$16.call(StorageImpl.java:589)
	  at com.google.cloud.storage.StorageImpl$16.call(StorageImpl.java:586)
	  at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	  at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	  at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	  at com.google.cloud.storage.StorageImpl.readAllBytes(StorageImpl.java:585)
	  at com.google.cloud.storage.StorageImpl.readAllBytes(StorageImpl.java:577)
	  at storage.Generic.main(Generic.java:36)

This was referenced Jan 15, 2020
@jnizet
Copy link

jnizet commented Jan 17, 2020

This is quite a huge bug, preventing anyone from reading files containing spaces in their name, which is quite frequent.

Apparently, it exists in the latest version 1.103.0, but not in version 1.102.0. Wouldn't it be possible to release a new version identical to 1.102.0 for those upgrading to the latest version?

I had the luck to have caught this issue by a test before going to production, but I guess this could break a lot of applications.

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jan 22, 2020
dkocher referenced this issue in googleapis/google-http-java-client Jan 27, 2020
* feat: decode uri path components correctly

The old implementation was incorrecly treating '+' as a space. Now
the only things that get decoded in the path are uri escaped sequences.

Fixes #398

* tweak javadoc

* remove hardcoded string
@dkocher
Copy link

dkocher commented Jan 27, 2020

This might be it:

https://github.com/googleapis/java-storage/blob/master/google-cloud-storage/src/main/java/com/google/cloud/storage/StorageImpl.java#L669

This only applies to signing objects. The code path is not used when reading objects.

@dkocher
Copy link

dkocher commented Jan 27, 2020

This was broken by the following commit:
googleapis/google-http-java-client@7d4a048

The problem disappeared when I changed GenericUrl.java as it was before:

result.add(verbatim ? sub : CharEscapers.decodeUriPath(sub));
=>
result.add(verbatim ? sub : CharEscapers.decodeUri(sub));

I can confirm this finding.

@google-cloud-label-sync google-cloud-label-sync bot added the api: storage Issues related to the googleapis/java-storage API. label Jan 29, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Feb 5, 2020
@dmitry-fa
Copy link
Contributor

@elharo Have you considered update com.google.api.client.util.escape.PercentEscaper

private static final char[] URI_ESCAPED_SPACE = {'+'};
=>
private static final char[] URI_ESCAPED_SPACE = "%20".toCharArray();

@athakor
Copy link
Contributor

athakor commented Feb 12, 2020

closing this issue as its duplicate of #121.

@athakor athakor closed this as completed Feb 12, 2020
@montss
Copy link
Author

montss commented Feb 13, 2020

@athakor I don't know why you choose to close this old bug and keep the new one which it seems has less priority as well, but any way we are still waiting for a fix, we can just use the work around and make sure google-http-client stays with none problematic version, but we use many Google client java libraries, not sure we want to take the risk of updating them but keeping one of their transitive dependencies.

@athakor
Copy link
Contributor

athakor commented Feb 13, 2020

@montss this issue has been fixed. wait until next version of google-http-client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/java-storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. semver: minor A new feature was added. No breaking changes. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants