Allow dynamic repository credentials for authenticated Binderhub instances. #1169

Open: wants to merge 8 commits into base: main

rprimet
Contributor

@rprimet rprimet commented Oct 22, 2020

This PR is related to issue #1154

Proposed change

Make it possible to fill repo provider / git credentials for a forge (e.g. gitlab instance) dynamically using the currently logged-in user information. This would help set up binderhub instances for use with private repositories for organizations that run a forge (e.g. gitlab instance).

Scope

This change is meant for authenticated Binderhubs (e.g. an instance of BinderHub that is authenticated against a Gitlab instance and whose main purpose is to build repositories hosted in that instance). It does not address building private repositories from an anonymous Binderhub instance (which would probably entail UI changes and/or a complex authentication workflow?).

In that use case, an auth-aware RepoProvider could assume that auth information is available.

Issues

  • The example RepoProvider (AuthGitLabProvider) in issue #1154 ("Build/launch private repositories as the logged-in user (within an authenticated binderhub instance)") makes a blocking HTTP call in the initializer to retrieve auth info. This should be made asynchronous, but I'm not sure how.

  • It should be possible to fill default traitlets config attributes (e.g. default_git_credentials) using dynamic credentials, after retrieving those credentials asynchronously. How can that be implemented?

  • The RepoProviders should probably be made aware of the auth info in order to fetch private repositories, but are RepoProviders the right place to implement authorization / access controls? If not, would a pre_build_hook as suggested in #1117 ("User hook for the build endpoint") help?
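
On the first point, one possible direction is to keep the blocking request but run it in a worker thread, so the event loop is not stalled while the Hub answers. The sketch below uses only the standard library; `fetch_user_model_blocking`, `fetch_user_model` and the `blocking_fetch` parameter are hypothetical names, not BinderHub or JupyterHub APIs. It does not settle where BinderHub would await such a call (an `__init__` cannot await), it only shows the off-loading step.

```python
import asyncio
import json
import urllib.request


def fetch_user_model_blocking(api_url, api_token, username):
    """Blocking GET of a user model from the JupyterHub REST API."""
    req = urllib.request.Request(
        # Normalise the trailing slash so the path never contains '//'.
        api_url.rstrip("/") + "/users/" + username,
        headers={"Authorization": "token " + api_token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


async def fetch_user_model(api_url, api_token, username,
                           blocking_fetch=fetch_user_model_blocking):
    """Run the blocking fetch in the default executor and await its result.

    blocking_fetch is injectable (an assumption made for testing) so the
    coroutine can be exercised without a running Hub.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, blocking_fetch, api_url, api_token, username
    )
```

A caller inside a coroutine would then write `user_model = await fetch_user_model(api_url, api_token, name)` before constructing the provider.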

Testing this PR locally

  • Install this branch of Binderhub locally using minikube as per the CONTRIBUTING instructions

  • Start minikube (e.g. run minikube start --memory 8192) and note the minikube IP (run minikube ip)

  • Register for an account at gitlab.com (user will be called user1)

  • Create a private project

  • Create an application on gitlab.com ("settings" on your user profile) for authentication purposes (e.g. bhubtest).

    • In the redirect URL field, use http://<minikube ip>:30123/hub/oauth_callback (note that the minikube IP may change when minikube restarts, usually alternating between two values. If so, put both in the redirect_uri field to ease testing).
    • Give the application the api and read_repository scopes.
    • Note the values of Application ID and Application Secret.
    • Note: it would be nice to restrict the scope to read_api now that gitlab offers this...
  • Install the requests package (used by the example repo provider) e.g. pip install requests

  • Change the file testing/minikube/binderhub_auth_config.py so that it looks like this:

import json
from urllib.parse import urlparse
import os
import requests
from traitlets import default

from binderhub.repoproviders import GitLabRepoProvider

here = os.path.abspath(os.path.dirname(__file__))
load_subconfig(os.path.join(here, 'binderhub_config.py'))

c.BinderHub.base_url = '/'
c.BinderHub.auth_enabled = True
# configuration for authentication
hub_url = urlparse(c.BinderHub.hub_url)
c.HubOAuth.hub_host = '{}://{}'.format(hub_url.scheme, hub_url.netloc)
c.HubOAuth.api_token = c.BinderHub.hub_api_token
c.HubOAuth.api_url = c.BinderHub.hub_url + '/hub/api/'
c.HubOAuth.base_url = c.BinderHub.base_url
c.HubOAuth.hub_prefix = c.BinderHub.base_url + 'hub/'
c.HubOAuth.oauth_redirect_uri = 'http://127.0.0.1:8585/oauth_callback'
c.HubOAuth.oauth_client_id = '<client ID from the newly-created gitlab application>'

class AuthGitLabProvider(GitLabRepoProvider):
    def __init__(self, *args, handler, **kwargs):
        super().__init__(*args, **kwargs)
        self.handler = handler
        ud = self.get_user_data(self.handler.get_current_user()['name'])
        self.access_token = ud['auth_state']['access_token']

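    def get_user_data(self, username):
        # Blocking call to the JupyterHub REST API.
        # api_url already ends with '/', so the path must not start with one.
        r = requests.get(
            c.HubOAuth.api_url + f'users/{username}',
            headers={'Authorization': 'token %s' % c.HubOAuth.api_token},
        )
        r.raise_for_status()
        return r.json()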

    @default('git_credentials')
    def _default_git_credentials(self):
        if self.access_token:
            return r'username=oauth2\npassword={token}'.format(token=self.access_token)
        return ""

c.BinderHub.repo_providers = {'gl': AuthGitLabProvider}

  • Change the file testing/minikube/jupyterhub-helm-auth-config.yaml so that it looks like this:

cull:
  users: false
hub:
  services:
    binder:
      oauth_no_confirm: true
      oauth_redirect_uri: "http://127.0.0.1:8585/oauth_callback"
      # gitlab application id
      oauth_client_id: <newly-created Gitlab app ID>

custom:
  binderauth_enabled: true

singleuser:
  # to make notebook servers aware of hub
  cmd: jupyterhub-singleuser

auth:
  type: gitlab
  gitlab:
    callbackUrl: http://<minikube IP>:30123/hub/oauth_callback
    # retrieved from the GitLab UI (see above)
    clientId: "<newly-created Gitlab app ID>"
    clientSecret: "<newly-created Gitlab app secret>"
  state:
    enabled: true
    cryptoKey: "<output of `openssl rand -hex 32`>"

  • Install JupyterHub by running ./testing/minikube/install-hub --auth

  • Run binderhub: python3 -m binderhub -f testing/minikube/binderhub_auth_config.py

  • Access http://127.0.0.1:8585 using your browser. You will be prompted to login using gitlab.com

  • Authorize the application; you should end up at the familiar BinderHub UI (phew!)

  • Try building your private repository: in the provider dropdown, select gitlab.com and enter the "namespace" (user1/project) for your private project. Check that it builds and launches

  • Register a new gitlab user (user2), create a new private project for user2.

  • Check that user2 can build their own private project, but not user1's

(Note that whenever the minikube IP changes due to a restart, the file testing/minikube/jupyterhub-helm-auth-config.yaml should be updated accordingly and ./testing/minikube/install-hub --auth should be run again)
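
A note on the git_credentials default in the example provider above: the `r'...'` prefix means the value contains a literal backslash followed by `n`, not a newline. As far as I can tell this matches how BinderHub's built-in providers format the same trait, but treat that as an assumption and check it against your BinderHub version. The helper below (`format_git_credentials` is a hypothetical name) only illustrates what the string contains.

```python
def format_git_credentials(access_token):
    # Mirror the example provider: return an empty string without a token.
    if not access_token:
        return ""
    # Raw string: the two characters backslash + 'n' are kept verbatim.
    # Expanding them into a line break is left to whatever consumes the value.
    return r"username=oauth2\npassword={token}".format(token=access_token)


creds = format_git_credentials("abc123")
print(repr(creds))         # shows the escaped backslash explicitly
print("\n" in creds)       # False: no real newline in the value
```

Dropping the `r` prefix would change the value to two lines, which may or may not be what the build side expects.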

@minrk
Member

minrk commented Nov 3, 2020

Thanks for opening this! I'm not sure passing the handler to the repoprovider is the right implementation, but having a concrete use case is very helpful. Some thoughts:

  • when deploying binderhub in an authenticated context, the Handler object should have the authenticated user model available as self.current_user. This should already have the info fetched by get_user_data(self, username). If it doesn't, that's something I think we can fix in HubAuth on the jupyterhub side without needing to re-identify the user a second time
  • if we are talking about standardizing on the auth info provided by HubAuth, we can pass user=self.current_user to RepoProviders instead of the Handler itself.
  • I believe passing the user model to the repo provider solves the need to make the request async in this particular case, but if we need a new async API to load info, I think we can add it.

@rprimet
Contributor Author

rprimet commented Jan 15, 2021

Hi @minrk,

Thanks for your reply!

Regarding the Handler's current_user, at least in the configuration described above, it does not seem to contain all info fetched by get_user_data (crucially, auth_state seems to be missing).

Regarding the other points, yes, at least for this use case limiting ourselves to passing the user info to the RepoProvider would be enough.

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-question-about-binderhub-authentication-and-privacy/7758/2

@adriendelsalle
Contributor

Hi!

That would definitely be a great feature!

> when deploying binderhub in an authenticated context, the Handler object should have the authenticated user model available as self.current_user. This should already have the info fetched by get_user_data(self, username). If it doesn't, that's something I think we can fix in HubAuth on the jupyterhub side without needing to re-identify the user a second time

self.current_user only exposes basic information about the user (not including auth_state) because it relies on get_current_user_token or get_current_user_cookie, right?
Since reading a user's auth_state requires the requester to be a hub admin, we can't get this info from the user's token or cookie.

I would propose to fetch the user model once and only when auth_enabled is True, and pass a user_model kwarg to repo providers (defaulting to None when auth_enabled is False). This proposition is here!
Maybe we could/should define a trait in the base binderhub.RepoProvider class.
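
The proposal above (fetch the user model once, only when auth_enabled is True, and pass it to the provider as user_model) could look roughly like this on the handler side. This is a sketch under assumptions: `build_provider`, `DemoProvider` and the `fetch_user_model` callable are hypothetical names used for illustration, not BinderHub's actual handler code.

```python
import asyncio


class DemoProvider:
    """Stand-in for a RepoProvider subclass (not BinderHub's real class)."""

    def __init__(self, spec, user_model=None):
        self.spec = spec
        self.user_model = user_model


async def build_provider(provider_cls, spec, *, auth_enabled, username,
                         fetch_user_model):
    """Instantiate a provider, fetching the user model only when auth is on.

    fetch_user_model is any coroutine function taking a username and
    returning the Hub's user model as a dict.
    """
    user_model = None
    if auth_enabled:
        # Single fetch per request; anonymous deployments skip it entirely.
        user_model = await fetch_user_model(username)
    return provider_cls(spec=spec, user_model=user_model)
```

With auth disabled the provider simply receives `user_model=None`, so existing providers that ignore the argument keep working.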

Then, you could just write your configuration as follows:

from binderhub.repoproviders import GitLabRepoProvider
from traitlets import default

class AuthGitLabProvider(GitLabRepoProvider):
    def __init__(self, *args, user_model, **kwargs):
        self.access_token = user_model['auth_state']['access_token']
        super().__init__(*args, **kwargs)

    @default('git_credentials')
    def _default_git_credentials(self):
        if self.access_token:
            return r'username=oauth2\npassword={token}'.format(token=self.access_token)
        return ""

c.BinderHub.repo_providers = {'gl': AuthGitLabProvider}
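
One caveat with the `__init__` above: it indexes `user_model['auth_state']['access_token']` directly, which would raise if `user_model` is None (the proposed default when auth_enabled is False) or if the Hub returns no auth_state for the user. A more defensive lookup might look like the following; `extract_access_token` is a hypothetical helper, not part of BinderHub.

```python
def extract_access_token(user_model):
    """Return the OAuth access token from a Hub user model, or None.

    Handles a missing user model, a missing 'auth_state' key, and an
    'auth_state' that is present but null.
    """
    if not user_model:
        return None
    auth_state = user_model.get("auth_state") or {}
    return auth_state.get("access_token")
```

The provider could then set `self.access_token = extract_access_token(user_model)` and fall back to the empty git_credentials default when no token is available.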

@rprimet @minrk Do you think this is a good way to do it?

@rprimet
Contributor Author

rprimet commented Feb 19, 2021

@adriendelsalle that looks good to me!

@adriendelsalle
Contributor

I would be happy to open a PR on your fork if you add me as a collaborator on it!

@rprimet
Contributor Author

rprimet commented Feb 19, 2021

OK I just did, but I'm not sure if it would be valuable (what's the workflow?), maybe opening a PR on the main repo referencing this one would be better? Sorry for the naive question :-)

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/private-gitlab-access-for-binderhub/504/9

@g-braeunlich
Contributor

Any chance that we can get this merged?

@g-braeunlich
Contributor

I tested it. Worked perfectly with the update of @adriendelsalle !

@rprimet rprimet closed this Mar 11, 2021
@rprimet rprimet force-pushed the dynamic_credentials_for_repoproviders branch from 275851b to 4eca8d2 Compare March 11, 2021 10:38
@rprimet
Contributor Author

rprimet commented Mar 11, 2021

Seems that I messed something up while trying to sync with the current master. Yet master...rprimet:dynamic_credentials_for_repoproviders still seems to show the correct diff so I'm puzzled as to why...

@rprimet rprimet reopened this Mar 11, 2021
@rprimet rprimet marked this pull request as ready for review March 11, 2021 12:39
@adriendelsalle adriendelsalle force-pushed the dynamic_credentials_for_repoproviders branch from 275851b to 748e13d Compare March 18, 2021 19:14
@adriendelsalle
Contributor

> Seems that I messed something up while trying to sync with the current master. Yet master...rprimet:dynamic_credentials_for_repoproviders still seems to show the correct diff so I'm puzzled as to why...

Did you rebase or merge? Only rebase :).
This should be fixed now, I pushed on your fork.

@rprimet
Contributor Author

rprimet commented Mar 18, 2021

@adriendelsalle great thanks!

@rprimet
Contributor Author

rprimet commented Mar 25, 2021

Is there anything else (e.g. tests) to do on this PR?

@nreith

nreith commented Sep 26, 2021

When will this be merged and available? I have a use case where users would want to share only with other people authorized on their repo.

@srpgilles

What is the exact status of this PR? It seems to me that @adriendelsalle provided a helpful workaround which answered most if not all of the objections raised in @minrk's remark, so it would be nice for the feature to be merged if the solution is good enough.

Thanks!

@larsbonczek

Hey @minrk, could you please take a look at this again? :)
