Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: replace keyring dependency with generic pip-defined credential helper API #10389

Open
rittneje opened this issue Aug 22, 2021 · 20 comments
Labels
type: feature request Request for a new feature

Comments

@rittneje
Copy link
Contributor

Currently pip directly imports keyring in order to use it for authentication. Furthermore, to avoid a performance hit in the majority case, it will only consult keyring if the original request fails with 401. This has caused a few issues:

  1. Some users do not want pip to use keyring, even if it is present on the system. Keyring support should require an --enable-keyring flag #8719
  2. When interacting with a private PyPI repo, the initial unauthenticated request is undesirable. Add flag to force pip to always fetch password from keyring #10269
  3. Due to a missing keyring feature (at least on macos), the username still has to be specified in the index url. This in turn causes pip's credential cache to not function properly. fix auth cache to allow for username in index url #10288
  4. Relying on keyring leads to a chicken-or-the-egg bootstrapping issue. You need pip in order to install keyring, but if you need to auth with a private PyPI repo to do so, now what?
  5. In general, having pip directly rely on a specific external module seems undesirable. What happens if keyring falls out of vogue, or is abandoned?

I would like to propose that the direct dependency on keyring be removed, and replaced with a more generic concept, akin to docker's credential helper concept. https://docs.docker.com/engine/reference/commandline/login/#credentials-store

The basic idea is that the pip configuration be able to specify a credential helper library for a given PyPI hostname. If there is no entry for a hostname, then requests are sent without any authentication. This allows the majority case of talking to public PyPI to continue to work as it does today with no performance hit. If a hostname does have an entry, then it is expected to map to a library that exposes a get_pip_credentials(hostname) function, which takes the download hostname and returns the username and password to use. The user just needs to arrange for this library to be importable, so for example they could write a custom library and add it to their PYTHONPATH.

For keyring (excluding macos), a very simple adapter around the keyring API could be created, and ideally could even be part of keyring itself. For macos, I can make my own shim to pull the keyring username from some environment variable of my choosing. For CI/CD, a simple credential helper could be made to pull the username and password from pre-canned environment variables. (Since this logic lives entirely outside pip, the issue of multiple index urls mentioned here is not relevant.)

If a user doesn't want to use keyring, they simply would not specify it in their pip configuration. That means that by default, keyring would never be used at all. But it also means that users that do want to use it are not required to pass some flag to pip all the time.

If the credential helper API is standardized, then it can be shared with other dependency management tools such as pipenv, instead of the situation today where the two have completely different approaches. (For reference, most docker-like tools, such as skopeo and buildkit, will abide by your credential helpers, thus providing a consistent experience.)

@DiddiLeija DiddiLeija added the type: feature request Request for a new feature label Aug 22, 2021
@uranusjr
Copy link
Member

Sounds like a good thing to me. Pip historically tend to expose too much dependencies' internals to the user, and they generally become a problem down the road (the most significant is at the moment our dependence on pkg_resources and distutils). However, keyring being not an essential part of the packaging mechanism, it is more difficult to gather maintainer interest for this (keyring support was contributed by its author). Would you be interested in working on this? I would be happy to help review the design and implementation.

@pfmoore
Copy link
Member

pfmoore commented Aug 24, 2021

One constraint I would note is that we've traditionally been very explicit in not supporting people writing code that uses pip's internal API. From a brief scan of the proposal here, it seems to do a good job of limiting the interface, and I much prefer an approach like this than either the current "magical behaviour change" if keyring is installed, or the other proposals for command line options. So I'm basically in favour of this, as long as care is taken to make sure it doesn't set a precedent that people can write code against pip's internals.

@rittneje
Copy link
Contributor Author

@uranusjr @pfmoore I've raised an initial draft PR with just the changes for auth.py. I haven't wired it through to actually pull the mapping from pip.conf yet. Been trying to piece together how that actually works.

On that note, would something like this be reasonable? I'm not sure of the best way to do it given the constraints of the pip.conf format.

[global]
credential-helper = my-private-repo.com=keyring
credential-helper = another-private-repo.com=some_custom_module

@sbidoul
Copy link
Member

sbidoul commented Aug 25, 2021

Would it make sense to use Entry Points as discovery mechanism for credential helpers ? This implies delegating the configuration mechanism to the helpers themselves.

@pradyunsg
Copy link
Member

I think keeping the configuration of the credentials helper in pip is cleaner, since it allows arbitrary executables to be the helpers (i.e. allowing non-Python based implementations), doesn't require them to manage duplicated configuration stuff, and lets us the dispatch to different credential managers based on the domain.

@sbidoul
Copy link
Member

sbidoul commented Aug 25, 2021

allowing non-Python based implementations

a small python wrapper could call non-python implementations ?

doesn't require them to manage duplicated configuration stuff

fair

dispatch to different credential managers based on the domain

credential managers could accept the domain as argument, and pip could try them all until one knows about the domain ?

OTOH returning a user and password is only one possible authentication mechanism (people have been asking for ways to inject http headers etc). So the authentication and possibly in the future the http session mechanisms are bound to grow in complexity, which may or may not make them difficult to handle via pip's configuration.

@pfmoore
Copy link
Member

pfmoore commented Aug 25, 2021

... on the other hand, if we use an entry point, someone could write a helper (in Python) that did all of that config management, delegation to arbitrary executables, dispatching based on domain, etc.

That saves us having to get into the business of designing credential management schemes, and lets interested parties come up with their own mechanisms. I'd expect "standard" helpers to get developed and published, so this wouldn't require everyone to write their own.

(Basically, I'm arguing that we have the bare minimum in pip, and let others do the work 😉)

@uranusjr
Copy link
Member

The problem (I think) with supporting alternative authentication means (e.g. adding headers) is we’ll basically need to expose the entire session interface, which is another internal thing we don’t really want to rely on. If we want this to be as pluggable as possible, pip would need to provide an abstraction layer over requests.Session, which could be significant work (both initially and in maintenance).

@rittneje
Copy link
Contributor Author

I don't entirely understand what an entry point is, but it kind of feels like it would be the same situation where it gets used just because it is on the system, and not because the user actually wants it. To me the advantage of having it be explicitly part of the pip config is that the user has full control over what pip does.

I can't speak to the need for arbitrary HTTP headers since it isn't a part of my specific use case, but we could have the return value from the helper be a dict rather than just the username/password tuple directly. That would for example allow it to return a "headers" field in the future too without breaking anything.

@rittneje
Copy link
Contributor Author

@uranusjr @pfmoore Any further thoughts on this?

@pradyunsg
Copy link
Member

pradyunsg commented Aug 31, 2021

credential managers could accept the domain as argument, and pip could try them all until one knows about the domain ?

This sounds very inefficient, and a painful thing for users to debug when it's misbehaving.

(Basically, I'm arguing that we have the bare minimum in pip, and let others do the work 😉)

While I agree with this principle in general, I think we shouldn't do that here.

This has many similarities to PEP 516 vs PEP 517 IMO -- have-everyone-solve-the-same-problem for a simpler core, or a solve-the-problem-in-one-place for simplicity everywhere else. I prefer the latter, and that's also the route we took with PEP 517. :)

The problem we have at hand is exactly the same as what docker has (minus the fact that we need to move away from keyring). The exact problem at hand has already been solved by the thing we're modelling against and, honestly, I don't think we're going to be making major usability or maintainability breakthroughs over what they have.

This change would also give us a clear point to make this explicitly opt-in as well, which is nice. :)

IMO, one of the nice things that we can do here is piggy back on the ecosystem of credential helpers that have been built for that -- instead of coming up with our own protocol here, I think we can basically reuse docker's protocol for this. Allowing downstream users to do something like ln -s /usr/bin/docker-credential-foo /usr/bin/python-credential-foo (yes, this is unix-specific but equivalents exists on Windows and copy-renaming executables will likely work too) is better for everyone involved IMO -- we reduce the amount of busy-work the ecosystem has to do overall and we also make life easier for ourselves by using an existing definitely-working solution instead of needing to come up with our own and solve issues as they prop up (see keyring integration).

@pfmoore
Copy link
Member

pfmoore commented Aug 31, 2021

I think we can basically reuse docker's protocol for this

Is this the same as the git credential manager protocol? Do you have a link to a spec? This is the basis of the git one.

While I don't want to over-engineer too much here, if this is a general issue, should we go the extra step and standardise? After all, we try to make a point of saying that pip uses standards-based solutions so that we avoid the problem of having so much implementation-defined behaviour that nobody will ever stand a chance of writing a pip replacement.

It could be as simple as "Use the docker (git) protocol [add a link to the spec here], using the name pypa-credential-foo" (note the use of "pypa" rather than "pip").

(BTW, if standardising is "too complicated" for something this simple, I'd argue that means our process is bad, not that this doesn't need to be standardised!! In which case let's fix our process...)

@pradyunsg
Copy link
Member

pradyunsg commented Aug 31, 2021

Based on a quick reading, it looks like the protocols are different.

I think what we should use for the prefix is python if we’re gonna go through standardisation.

@pfmoore If I write the PEP for this, would you be willing to sponsor it?

@rittneje
Copy link
Contributor Author

Is this the same as the git credential manager protocol?

No, it seems docker and git are slightly different. git sends a bunch of key-value pairs on stdin, and expects a bunch of key-value pairs on stdout in response. docker provides only the hostname on stdin, and expects a JSON object on stdout in response. See https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol (specifically the part about the get command). For reference, the git helper protocol is ultimately described here: https://git-scm.com/docs/gitcredentials

I think what we should use for the prefix is python if we’re gonna go through standardisation.

I feel "python" might be too vague/generic for this. To me this has to do with private PyPI (or at least PyPI-like) repos specfically. In any case, if the prefix is just "python", where would the configuration file mapping hostnames to credential helpers (i.e., the equivalent of ~/.docker/config.json) go? I imagine we wouldn't use pip.conf for that.

@pfmoore
Copy link
Member

pfmoore commented Aug 31, 2021

@pfmoore If I write the PEP for this, would you be willing to sponsor it?

Yep, certainly 🙂

@pfmoore
Copy link
Member

pfmoore commented Aug 31, 2021

docker provides only the hostname on stdin, and expects a JSON object on stdout in response

Thanks for the link. I agree, this protocol looks nice and simple, and suitable for our needs. One small thought - is it stable? If we tie ourselves to the Docker protocol, could tracking changes to it become a burden?

I feel "python" might be too vague/generic for this.

I'm not sure why we even need a prefix. If we simply have an entry in pip.ini that says credential_helper=xxx, then xxx is what we run (as a standard OS command). That's it. Users have to opt in, which is what we want, but everything else is down to the helper. Why do we need a prefix, or any sort of mandated name?

In fact, if we only allow a single credential helper to be configured, and state that it follows the Docker protocol, we may not even need a PyPA standard here. We'd only need a standard if we wanted to configure the name of the helper globally, and doing that now is likely premature (unless we know of other projects that would want access to the information - poetry? pipenv?)

Edit: Of course, we might want to make a PyPA policy that this is how projects handle credential management. But the PyPA doesn't really include policy-setting in its remit, so 🤷

@rittneje
Copy link
Contributor Author

I'm not sure why we even need a prefix.

Technically you do not. Docker does it more as a convention thing. For example, if you configure it to use "foobar" as the credential helper, it will actually run "docker-credential-foobar". But there is no reason I can think of that pip could not choose to instead say that you have to specify the whole command name and there won't be any prefixing "magic".

In fact, if we only allow a single credential helper to be configured, and state that it follows the Docker protocol, we may not even need a PyPA standard here. We'd only need a standard if we wanted to configure the name of the helper globally, and doing that now is likely premature (unless we know of other projects that would want access to the information - poetry? pipenv?)

Obviously I cannot speak on behalf of those project maintainers, but as a consumer it would certainly be preferable that all the PyPI stuff work consistently. (That's actually one of the main disadvantages of keyring today - as far as I can tell pipenv doesn't use it so we end up needing various one off solutions.)

You could certainly allow only one helper, which in turn maybe outsources to other helpers based on the specific hostname. At worst, you could always change pip to allow for the user to directly configure helpers per hostname in the future, with this singleton helper serving as a catch-all. (This is basically the concept of docker's credential store vs. the credential helpers.)

@jpedrick
Copy link

I've taken a swing at a mechanism for an override credential provider: 0205e2e

@odormond
Copy link

As pointed out in the ticket description,

When interacting with a private PyPI repo, the initial unauthenticated request is undesirable.

This has lead to issue #11721 being opened which I tried to address through two different PRs: #12473 and #12496 (I was not aware of this issue at the time). In the end, the discussion on #12496 came to the conclusion that the way pip currently manages credentials is overly complicated and based on heuristics that are broken for some users and that an overhaul was needed to properly solve it.

This proposal of delegating credentials management to an external helper could in my understanding also solve #11721.

However, the discussion in this issue was focused on the technical aspect so far and didn't touch the question of the backward compatibility / deprecation cycle. The way I'm imagining it is that the current MultiDomainBasicAuth implementation would be left in place for the duration of the deprecation cycle and a new CredentialsHelperAuth added and enabled in place of the other only when configured explicitly through pip config set global.credentials-helper /path/to/helper. At the end the deprecation cycle, MultiDomainBasicAuth would be replaced by a simplified implementation what would ignore keyring and any other heuristic about username and password and just pass through whatever is stored in the userinfo part of the URL (I guess support for netrc could be kept in there as well) or just support no authentication at all.

Unfortunately I cannot really commit on any timeframe for trying to implement such a CredentialsHelperAuth class for now, so feel free to give it a shot if interested.

@dougthor42
Copy link
Contributor

Would there be any consideration for using an existing credential helper specification, rather than rolling a custom one? The one I've used most is EngFlow's as it's used by Bazel (design doc), which itself was inspired by the docker credential helpers mentioned in the OP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature request Request for a new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants