Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACR support damaged by Gitlab registry change #1620

Open
wallyhall opened this issue Jan 11, 2023 · 4 comments
Open

ACR support damaged by Gitlab registry change #1620

wallyhall opened this issue Jan 11, 2023 · 4 comments
Labels

Comments

@wallyhall
Copy link

wallyhall commented Jan 11, 2023

Bug description

I believe ACR support has been damaged by the Gitlab registry change in commit f442f32

Specifically this line:

## binderhub/registry.py:194

                url_concat(self.token_url, {"scope": "repository:{}:pull".format(image)}),
                url_concat(self.token_url, {
                    "scope": "repository:{}:pull".format(image),
                    "service": "container_registry"   # <-- this.  For ACRs, the documentation simply says inject what the specific CR nedes via token_url config ... for Gitlab, it's been hard-coded (and now breaks the instructions for ACRs).  That's my theory.
                }),

Expected behaviour

When following a Binderhub link, if a pre-built/cached image already exists in the ACR, I expect Binderhub to pull it, not rebuild from scratch.

Actual behaviour

Binderhub runs repo2docker and builds a new image, pushing it to the ACR.

The logs contain the following errors:

[I 230111 09:52:26 launcher:257] Starting server for user ****** with image ******.azurecr.io/******/******
[I 230111 09:52:29 builder:744] Launched ****** in 3s
[I 230111 09:53:02 log:135] 200 GET /v2/git/******/2cd69cb92c9fa36877786c4672158f343e2d1a79?urlpath=******.ipynb (anonymous@******) 2.64ms
[E 230111 09:53:02 builder:394] Tornado HTTP Timeout error: Failed to get image manifest for ******.azurecr.io/******/******:2cd69cb92c9fa36877786c4672158f343e2d1a79
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/binderhub/builder.py", line 388, in get
        image_manifest = await self.registry.get_image_manifest(
      File "/usr/local/lib/python3.9/site-packages/binderhub/registry.py", line 263, in get_image_manifest
        token = await self._get_token(
      File "/usr/local/lib/python3.9/site-packages/binderhub/registry.py", line 221, in _get_token
        auth_resp = await client.fetch(auth_req)
    tornado.httpclient.HTTPClientError: HTTP 400: Bad Request

I believe this is due to the token_url configured (shown below) being partially overwritten - specifically the ?service= by the linked git commit above...

config:
  DockerRegistry:
    token_url: https://********.azurecr.io/oauth2/token?service=********.azurecr.io

(I suspect it is becoming https://********.azurecr.io/oauth2/token?service=container_registry.)

How to reproduce

Follow the instructions in Binderhub's installation docs for an ACR, and attempt to launch the same repository multiple times.

Observe the logs (and behaviour).

Your personal set up

Have installed via Helm:

config:
  BinderHub:
    hub_url: https://hub.binder.*********
    hub_url_local: http://10.100.100.122
    image_prefix: *********.azurecr.io/*********/binder-prod-
    use_registry: true
    build_image: *********.azurecr.io/repo2docker-ssh:latest
  DockerRegistry:
    token_url: https://*********.azurecr.io/oauth2/token?service=*********.azurecr.io
  RateLimiter:
    period_seconds: 5
  KubernetesBuildExecutor:
    # this is vanilla repo2docker with the openssh client installed.
    build_image: *********.azurecr.io/repo2docker-ssh:latest

extraVolumes:
  - name: pvc-ssh-config
    persistentVolumeClaim:
      claimName: pvc-ssh-config
extraVolumeMounts:
  - name: pvc-ssh-config
    mountPath: /root/.ssh

imageBuilderType: pink

jupyterhub:
  hub:
    services:
      binder:
        apiToken: *********
  imagePullSecret:
    create: true
    password: *********
    registry: *********.azurecr.io
    username: *********
  proxy:
    secretToken: *********

registry:
  password: *********
  url: https://*********.azurecr.io
  username: *********

We are using Helm chart version 1.0.0-0.dev.git.2937.h0f65f33.

@wallyhall wallyhall added the bug label Jan 11, 2023
@welcome
Copy link

welcome bot commented Jan 11, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@manics
Copy link
Member

manics commented Jan 11, 2023

Maybe we should make _get_image_basename_and_tag part of the Registry class so it can be specialised for particular registry implementations instead of trying to have it work with everything.

@manics
Copy link
Member

manics commented Jan 11, 2023

Can you suggest a fullname,basename,tag tuple we could add to this test?

@pytest.mark.parametrize(
"fullname,basename,tag",
[
(
"jupyterhub/k8s-binderhub:0.2.0-a2079a5",
"jupyterhub/k8s-binderhub",
"0.2.0-a2079a5",
),
("jupyterhub/jupyterhub", "jupyterhub/jupyterhub", "latest"),
("gcr.io/project/image:tag", "project/image", "tag"),
("weirdregistry.com/image:tag", "image", "tag"),
(
"gitlab-registry.example.com/group/project:some-tag",
"group/project",
"some-tag",
),
(
"gitlab-registry.example.com/group/project/image:latest",
"group/project/image",
"latest",
),
(
"gitlab-registry.example.com/group/project/my/image:rc1",
"group/project/my/image",
"rc1",
),
],
)
def test_image_basename_resolution(fullname, basename, tag):
result_basename, result_tag = _get_image_basename_and_tag(fullname)
assert result_basename == basename
assert result_tag == tag

You can anonymise it, but ideally make it as close as possible to a real case.

@wallyhall
Copy link
Author

wallyhall commented Jan 11, 2023

Can you suggest a fullname,basename,tag tuple we could add to this test?

I think this would be correct as an ACR example:

         ( 
             "myrepo.azurecr.io/myproject/prefix-myimage:latest", 
             "myproject/prefix-myimage", 
             "latest", 
         ), 

I'm not sure (unless the proposed _get_image_basename_and_tag will include dealing with the retrieval of authentication tokens?) this will cover what I suspect the specific issue is though.

In the logs, I can see a 400/Bad Request being returned by Azure when in the _get_token method of registry.py:

[E 230111 09:53:02 builder:394] Tornado HTTP Timeout error: Failed to get image manifest for ******.azurecr.io/******/******:2cd69cb92c9fa36877786c4672158f343e2d1a79
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/binderhub/builder.py", line 388, in get
        image_manifest = await self.registry.get_image_manifest(
      File "/usr/local/lib/python3.9/site-packages/binderhub/registry.py", line 263, in get_image_manifest
        token = await self._get_token(
      File "/usr/local/lib/python3.9/site-packages/binderhub/registry.py", line 221, in _get_token
        auth_resp = await client.fetch(auth_req)
    tornado.httpclient.HTTPClientError: HTTP 400: Bad Request

I've been able to reproduce that in Postman using the following GET request:

https://binderhubacrpublic.azurecr.io/oauth2/token?service=binderhubacrpublic.azurecr.io&service=container_registry&scope=repository:IMAGE:pull

The URL has two service= query parameters, which I'm speculating is the result of the commit I linked in the description above. My speculation of why this is happening is based on the (confirmed) behaviour of tornado.httputil.url_concat as used inside registry.py / _get_token, and Binderhub documentation for using ACRs saying you must specify ?service=<repo>:

https://binderhub.readthedocs.io/en/latest/zero-to-binderhub/setup-binderhub.html#id3 (section 3.3.3):

```
token_url: "https://<ACR_NAME>.azurecr.io/oauth2/token?service=<ACR_NAME>.azurecr.io"
```

A "fix", I think, would be to drop the offending line from registry.py as shown below and instruct users of Gitlab to specify token_url=https://.../?service=container_registry (as like we're instructed to do for ACRs in the documentation linked above):

## binderhub/registry.py:194

                url_concat(self.token_url, {"scope": "repository:{}:pull".format(image)}),
                url_concat(self.token_url, {
                    "scope": "repository:{}:pull".format(image)
   #### problematic line removed ####
                }),

I hope that makes some semblance of sense. My brain is fried today!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants