Gitlab + S3: Percent encoding 307 URL causes 403 Forbidden #1590

Closed
alkoclick opened this issue Apr 1, 2019 · 4 comments

alkoclick commented Apr 1, 2019

Description of the issue:
This issue has been troubling me for quite some time and has been an open ticket with our GitLab team for a few days.

The CERN GitLab registry is connected to S3 buckets on the backend. I have been replacing the deprecated Docker-Maven plugin in the CERN c2mon project and noticed that many of my image pushes failed while some got through. This happened only for Jib pushes and only for GitLab registries. Eventually I realized that only my first push to a specific tag got through, while the rest failed.

The problem, in bullet points:

Works as expected

  • Switching credentials-helpers or using the docker config json
  • Pushing to other registries (e.g. DockerHub)
  • Using docker push on the images created, even multiple times, to other registries
  • After turning on logging and debugging output:
    • Manually executing the curl commands to Gitlab registry that received a 307,
    • then taking the returned redirect url and using it in a new, manual curl request to S3

Fails

  • Pushing to a tag (including namespace) that has been created before
  • Manually executing the curl command to S3 that received a 403

Expected behavior:
mvn jib:build results in the image being pushed to the remote gitlab registry

Steps to reproduce:

  • Set up an enterprise GitLab instance, using image registry with S3?
    Neither minimal nor precise, I know
  • Push an image tag more than once to the same registry with the same name

Environment:

  • Linux Ubuntu 18.04
  • Maven 3.5.2
  • JiB 1.0.2
  • Gitlab Enterprise Edition 11.7.6-ee

jib-maven-plugin Configuration:

  <properties>
    <image.base></image.base>
    <image.name></image.name>
    <registry>registry.hub.docker.com</registry>
    <repository>cern/${image.name}</repository>
  </properties>

The subprojects override the image.base and image.name

      <plugin>
        <groupId>com.google.cloud.tools</groupId>
        <artifactId>jib-maven-plugin</artifactId>
        <version>1.0.2</version>
        <executions>
          <execution>
            <phase>deploy</phase>
            <id>build</id>
            <goals>
              <goal>build</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <from>
            <image>${image.base}</image>
          </from>
          <to>
            <image>${registry}/${repository}</image>
            <tags>
              <tag>${project.version}</tag>
              <tag>latest</tag>
            </tags>
          </to>
          <container>
            <useCurrentTimestamp>true</useCurrentTimestamp>
            <entrypoint>INHERIT</entrypoint>
          </container>
        </configuration>
      </plugin>

Log output:
(Truncated, a lot)

CONFIG: curl -v --compressed -X HEAD -H 'Accept: ' -H 'Accept-Encoding: gzip' -H 'Authorization: Bearer ...' -H 'User-Agent: jib 1.0.2 jib-maven-plugin Google-HTTP-Java-Client/1.27.0 (gzip)' -- 'https://gitlab-registry.cern.ch/v2/alpapage/c2mon/blobs/sha256:9234fa5d2c00b58d3f2e02e030d2a121968152b56e68eb736d630f1d7926241a'
Apr 01, 2019 5:16:09 PM com.google.api.client.http.HttpResponse <init>
CONFIG: -------------- RESPONSE --------------
HTTP/1.1 307 Temporary Redirect
Content-Type: application/octet-stream
Docker-Distribution-Api-Version: registry/2.0
Location: https://s3.cern.ch/...Amz-Credential=...%2F20190401%2Fus-east-1%2Fs3%2Faws4_request...
X-Content-Type-Options: nosniff

...

CONFIG: curl -v --compressed -X HEAD -H 'Accept: ' -H 'Accept-Encoding: gzip' -H 'User-Agent: jib 1.0.2 jib-maven-plugin Google-HTTP-Java-Client/1.27.0 (gzip)' -- 'https://s3.cern.ch/...Amz-Credential=.../20190401/us-east-1/s3/aws4_request&X-Amz-Date=...'

Apr 01, 2019 5:16:09 PM com.google.api.client.http.HttpResponse <init>
CONFIG: -------------- RESPONSE --------------
HTTP/1.1 403 Forbidden
Accept-Ranges: bytes
Content-Length: 200
Content-Type: application/xml
Date: M

...

TIMED	Pushing BLOB digest: [...], size: 56413556 : 2021.0 ms
[INFO]  TIMED	Setting up to push layers : 2376.0 ms
[INFO] Executing tasks:
[INFO] [===========================   ] 90.0% complete
[INFO] > building image to registry
Apr 01, 2019 5:16:09 PM com.google.common.util.concurrent.AggregateFuture$RunningState handleException
SEVERE: Got more than one input Future failure. Logging failures after the first
com.google.cloud.tools.jib.registry.RegistryUnauthorizedException: Unauthorized for gitlab-registry.cern.ch/cern/c2mon
	at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call(RegistryEndpointCaller.java:262)
	at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.callWithAllowInsecureRegistryHandling(RegistryEndpointCaller.java:152)
	at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call(RegistryEndpointCaller.java:142)
	at com.google.cloud.tools.jib.registry.RegistryClient.callRegistryEndpoint(RegistryClient.java:356)
[ERROR] Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.0.2:build (default-cli) on project img-server: Build image failed, perhaps you should make sure you have permissions for gitlab-registry.cern.ch/cern/c2mon: Unauthorized for gitlab-registry.cern.ch/cern/c2mon: 403 Forbidden -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.0.2:build (default-cli) on project img-server: Build image failed, perhaps you should make sure you have permissions for gitlab-registry.cern.ch/cern/c2mon
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:213)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)

Additional Information:
It all comes down to this:
JiB executes:

curl (...) 'https://s3.cern.ch/...Amz-Credential=.../20190401/us-east-1/s3/aws4_request&...'

while the location returned from the 307 is

'https://s3.cern.ch/...Amz-Credential=...%2F20190401%2Fus-east-1%2Fs3%2Faws4_request&...'

It's the percent encoding, or rather the decoding, that's causing this. I am not an expert, but my understanding is that S3 expects exactly the URL that GitLab redirected us to, and the percent decoding breaks the authorization. When I manually curl the percent-encoded address, my requests get through.
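
To make the difference between the two URLs concrete, here is a minimal sketch using only the JDK's java.net.URI. It is not Jib's code, and the URL is a hypothetical stand-in because the real presigned URL is elided in the log above. It shows that the same Location value has a raw form that keeps %2F and a decoded form that contains /.

```java
import java.net.URI;

public class RedirectQueryDemo {
    // Hypothetical stand-in for the Location header (host, path, and key are made up).
    public static final String LOCATION =
        "https://s3.example.test/blob?X-Amz-Credential=KEY%2F20190401%2Fus-east-1%2Fs3%2Faws4_request";

    // Query string exactly as it appears in the header, escapes preserved.
    public static String rawQuery(String location) {
        return URI.create(location).getRawQuery();
    }

    // Query string with percent escapes decoded (%2F becomes /).
    public static String decodedQuery(String location) {
        return URI.create(location).getQuery();
    }

    public static void main(String[] args) {
        System.out.println(rawQuery(LOCATION));     // keeps %2F
        System.out.println(decodedQuery(LOCATION)); // contains /
    }
}
```

A client that sends the raw form repeats the redirect target byte for byte. A client that rebuilds the URL from the decoded form sends a different string.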

Wow, that was a long read! Keep up the awesome work!

chanseokoh (Member) commented Apr 1, 2019

Thanks for the detailed analysis. Looks like you really got to the root of the issue. I've looked into our codebase and figured out what's happening.

GenericUrl decodes the encoded slashes in the query string (%2F into /) when it is instantiated, and Jib essentially does new GenericUrl(new URL("https://url string")). The code below

    URL url3 = new URL("http://example.com?query=string%2Fpart");
    System.out.println(url3);
    System.out.println(new GenericUrl(url3));
    System.out.println(new GenericUrl(url3).toURL().toString());

prints

http://example.com?query=string%2Fpart
http://example.com?query=string/part
http://example.com?query=string/part

Jib uses the Google HTTP Client library, and as this comment says, I don't see a way to not use GenericUrl when using the library. (BTW, the comment is talking about path segments rather than query strings, so that GitHub issue is unrelated for that part.) So, I can't think of any fix unless Jib ditches the Google HTTP Client library.

The question then remains whether this decoding is a bug in GenericUrl or the Google HTTP Client library. The answer seems to be no, according to this GitHub comment as well as the code comment in PercentEscaper, where they say / does not have to be escaped and backends should handle it correctly.

  /**
   * A string of characters that do not need to be encoded when used in URI query strings, as
   * specified in RFC 3986. Note that some of these characters do need to be escaped when used in
   * other parts of the URI.
   */
  public static final String SAFEQUERYSTRINGCHARS_URLENCODER = "-_.!~*'()@:$,;/?:";
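
As a rough illustration of why a safe-character list like this one turns %2F into /, here is a simplified stand-in written against the JDK only. It is not the library's PercentEscaper. The class and method names are made up for this sketch, and it assumes letters and digits are always left unescaped.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class SafeSlashRoundTrip {
    // Same character list as the constant quoted above.
    static final String SAFE = "-_.!~*'()@:$,;/?:";

    // Escapes every byte that is neither alphanumeric nor in SAFE.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean alnum = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9');
            if (alnum || SAFE.indexOf(v) >= 0) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    // Decode first, then re-escape: characters in SAFE do not get their escapes back.
    public static String roundTrip(String encodedValue) {
        return escape(URLDecoder.decode(encodedValue, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("string%2Fpart")); // string/part
        System.out.println(roundTrip("a%20b"));         // a%20b
    }
}
```

The slash comes back unescaped because it is in the safe list, while the space is escaped again, so only some values survive the round trip unchanged.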

So, it seems the best and fastest solution is to fix whatever is behind https://s3.cern.ch/ so that it correctly handles query strings containing /. Or, it might be easier to make GitLab not encode / in the query string when returning a redirect location.
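
One way a backend could accept both forms is to canonicalize each query value before checking the signature: decode whatever arrived, then re-encode everything except the RFC 3986 unreserved characters. The sketch below illustrates that idea under those assumptions only. It is not the code running behind s3.cern.ch, and the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class CanonicalQueryValue {
    // Percent-encodes every byte except the unreserved set: A-Z a-z 0-9 - _ . ~
    public static String strictEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                || (v >= '0' && v <= '9') || v == '-' || v == '_' || v == '.' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    // Normalizes a query value regardless of whether the client sent / or %2F.
    public static String canonicalize(String valueAsReceived) {
        // URLDecoder maps '+' to a space, so protect a literal '+' before decoding.
        String decoded = URLDecoder.decode(valueAsReceived.replace("+", "%2B"), StandardCharsets.UTF_8);
        return strictEncode(decoded);
    }

    public static void main(String[] args) {
        // Both lines print KEY%2F20190401%2Fus-east-1%2Fs3%2Faws4_request
        System.out.println(canonicalize("KEY%2F20190401%2Fus-east-1%2Fs3%2Faws4_request"));
        System.out.println(canonicalize("KEY/20190401/us-east-1/s3/aws4_request"));
    }
}
```

With this kind of normalization, the encoded and decoded variants of the credential value compare equal, so a client that decodes %2F no longer changes the outcome of the check.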

alkoclick (Author) commented

Apparently this is a thing: https://github.com/ceph/ceph/pull/23652/files

Our GitLab team is currently working to implement this fix, and I'll update here once we have some results.

alkoclick (Author) commented

Aaaaaand we got it! After implementing this fix we are able to successfully and successively push to Gitlab S3-backed container registries :)

🚀 🚀 🚀

chanseokoh (Member) commented

@alkoclick great to hear it worked! Thanks for the update. 👍
