Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for providing username and password for curl downloads #195

Open
colindean opened this issue Oct 27, 2018 · 17 comments
Open

Support for providing username and password for curl downloads #195

colindean opened this issue Oct 27, 2018 · 17 comments

Comments

@colindean
Copy link
Contributor

colindean commented Oct 27, 2018

I'd like to retrieve dependencies from a private URL that requires credentials. In my current setup, I'm using curl --netrc, which enables me to put a hostname, username, and password into ~/.netrc.

Poking through the source of the curl module, I see

with urllib.request.urlopen(url) as request:
is where the request is created. I looked at the urllib.request docs and see this example for setting up for basic auth:

import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')

netrc parsing is, fortunately, a standard library feature of Python, through the aptly-named netrc library.

The implementation of this all seems pretty straightforward but the configuration within peru is unclear. It would probably make sense for peru to read this file and populate the opener by default but I could see how some users may feel that's a security risk. So, a compromise may be opt-in in the peru.yml, an auth block:

# from the example in README.md
curl module pathogen:
    url: https://codeload.github.com/tpope/vim-pathogen/tar.gz/v2.3
    auth: netrc
    # Untar the archive after fetching.
    unpack: tar
    # After the unpack, use this subdirectory as the root of the module.
    export: vim-pathogen-2.3/autoload/

curl module pathogen:
    url: https://codeload.github.com/tpope/vim-pathogen/tar.gz/v2.3
    auth:
        username: whatever
        password: whywouldyouputthathere
    # Untar the archive after fetching.
    unpack: tar
    # After the unpack, use this subdirectory as the root of the module.
    export: vim-pathogen-2.3/autoload/
@oconnor663
Copy link
Member

Interesting idea. How do you think we'd handle a project that has two or more modules requiring authentication, but their credentials are different. I'm not familiar with the netrc format, so I don't know if there's an obvious convention for this.

@colindean
Copy link
Contributor Author

netrc format is this:

machine 1.example.com login myusername password @#$%!
machine 2.example.com
  login myusername
  password @#$%!
default
login myusername
password @#$%!

All lines are valid, where machine is an identifier of some kind, generally a hostname or IP address. I'm not sure how netrc recommends handling two sets of credentials for the same machine other than to try them sequentially. There is a keyword default that can replace machine and indicates that it's the last thing to be tried with any hostname matching should previous definitions fail.

More info on curl's usage here: https://ec.haxx.se/usingcurl-netrc.html

@oconnor663
Copy link
Member

oconnor663 commented Oct 30, 2018

I think if the netrc file indicates that foo.com should have a username and password, then it's probably ok for any curl module pointed at foo.com to use it. An extra field like auth: true wouldn't really change much, I think. If the idea is that the peru.yaml file isn't trusted, well, the untrusted author could just insert the auth flag too. This isn't too different from how a git module will implicitly use your ~/.ssh credentials.

Would you be willing to put together a PR for this? I'd be happy to review it. I'm not sure how to test it though -- can Python's HTTPServer be set up locally to require this type of auth? Is there a standard environment variable we might want to use to override the ~/.netrc path for testing / in general?

@colindean
Copy link
Contributor Author

I'll see what I can drum up!

@Hologos
Copy link

Hologos commented Jan 23, 2019

If the idea is that the peru.yaml file isn't trusted, well, the untrusted author could just insert the auth flag too.

From my point of view, it's not about peru.yaml isn't trusted. I think it's not a good idea in general to store credentials in a repository. And since peru.yaml should be stored in a repository, it should not contain fields username and password.

I really like the idea of .netrc file, though.

@felipefoz
Copy link
Contributor

Hi,
Is anyone up for PR review of this feature?

I am in need of this feature in a near feature, so I might start working on it if you are willing to accept.

@oconnor663
Copy link
Member

oconnor663 commented Aug 19, 2021

Yes I'm happy to review. (Though please ping me if I miss an email and it seems like I'm not responding.) Here are the two main design issues as I understand them:

  • Since we're talking about passwords, the best method for obtaining them (particularly in modern CI testing environments like GitHub Actions) is from the environment. Hardcoding them in peru.yaml is of course a potential footgun, and outsourcing them to another file is probably the same footgun. There might be some use cases where putting passwords in files is acceptable (legacy systems with public but nonempty passwords?), but I'd want to be real sure there's a good use case before even supporting such a feature.
  • Because different modules might need different passwords, we could consider letting the module specify the name of the variable that should be read to provide its password. Alternatively, maybe we could look automatically look for an environment variable like PERU_CURL_PASSWORD_* where * is the uppercased module name. A similar approach is already in use internally. That has the advantage of not requiring any changes to peru.yaml, maybe at the cost of some flexibility? (Does any use case really need that flexibility?)

Let me know your thoughts.

@felipefoz
Copy link
Contributor

Yes, I declined the netrc approach when I started considering a CI idea, it would not be so feasible.

This environment variable approach would be probably the safest option and the quickest.

Just not to discard, I also did some investigation on some other options, and the git config url.insteaof sounds interesting. Because it could be used implicitly by the git plugin too.

But for the curl plugin, it would require parsing the git config file and loading internally. There is a git config library for python, but I am not so sure it is needed for this use case.

@felipefoz
Copy link
Contributor

Hi,

I've been thinking about this feature, and I've had two approaches:

  1. Use environment variables to get password. But, in this case, we also might need a user name. So, every module would need a user and a password environment variable, in my opinion, this does not scale very well. One or two modules, ok, but downloading from the same source could be more efficient.
    Good Point: we know it has a good chance of working.
    Bad point: it does not scale well.

  2. Second option is using git config options. In this case, you could use the command.
    git config url."https://user@password:somewebsite.com".insteadOf https://somewebsite.com
    Some tests I did (gitlab), if you put the user and password as secrets, it does not persist across jobs (sessions).
    Second thing, it can be used implicitly by the git module.
    For using in the curl module, we would have to parse the base url to extract only the "root" and check against this configuration using ConfigParser.
    Good point: it works for git module, and for multiple modules from same source.
    Bad Point: I have not tested yet, we can have surprises.

What do you think?

@colindean
Copy link
Contributor Author

colindean commented Aug 29, 2021

It's great to see traffic on this issue! I've long since moved on from the org that inspired my original submission but recently returned to using Peru and simultaneously working in Python for the first time in 15 years... but I have very little spare time nowadays.

I'd say that designing this well could enable all of these credential resolution methods. I'd recommend choosing environment variables as the first implementation. Doing something as @oconnor663 suggests with a module name in the envvar should be a first pass expanding perhaps the domain of the target system, but these both are limited in applicability mostly to curl, I think. Envvars are far more likely to be used in CI systems and if a user finds the number of envvars to be burdensome, they're going to script something to set the envvars anyway.

The VCS modules (git, hg, svn) should be configured the way their systems expect it. As such, I discourage putting credentials into .git/config. Instead, the user should configure their git repo with a credentials helper. Following this logic, because cURL provides a mechanism for credential retrieval from a file in the form of netrc support, Peru should support that convention. However, Peru's "curl" plugin doesn't actually use cURL: it just uses Python's urllib.request. I kinda wish a future version of Peru would disambiguate, renaming its internal HTTP retrieval something like "http" and enabling the curl plugin to use literally cURL.

import urllib.request

So, if Peru were to support envvars at a basic level enough to satisfy one scale level (maybe up to 10 modules requiring authentication; little effort put into deduplication: exercise left to implementor) and then choose the standard netrc format for the next scale level (beyond 10 modules) for the "curl" plugin, I think it would have a solid and secure interface for that plugin. The netrc implementation should read the file and pass the credentials into a HTTPPasswordMgr instance used by urllib. The other modules' credentials are set however their respective system specify and that's something left to the Peru user to configure safely using those systems (mostly VCS) credential storage.

To properly support password auth for rsync, Peru would have to set RSYNC_PASSWORD or pass --password-file. N.b. this is only for accessing a secured rsync server, not rsync over SSH, which uses SSH credential storage.

@felipefoz
Copy link
Contributor

The VCS modules (git, hg, svn) should be configured the way their systems expect it. As such, I discourage putting credentials into .git/config. Instead, the user should configure their git repo with a credentials helper.

Perfect, that's correct.

I kinda wish a future version of Peru would disambiguate, renaming its internal HTTP retrieval something like "http" and enabling the curl plugin to use literally cURL.

I totally agree. Maybe we could come up with an alias to not break compatibility.

So, if Peru were to support envvars at a basic level enough to satisfy one scale level (maybe up to 10 modules requiring authentication; little effort put into deduplication: exercise left to implementor) and then choose the standard netrc format for the next scale level (beyond 10 modules) for the "curl" plugin, I think it would have a solid and secure interface for that plugin.

I totally disagree. I think we should come up with a solution for all cases. No one wants to spend extra time doing this "deduplication". If it is someone new coming to the tool, this person wants a solid tool. We don't want to scare people from here, do we?

The netrc implementation should read the file and pass the credentials into a HTTPPasswordMgr instance used by urllib. The other modules' credentials are set however their respective system specify and that's something left to the Peru user to configure safely using those systems (mostly VCS) credential storage.

How we would write to this file? Where are you storing this file?

@colindean
Copy link
Contributor Author

I kinda wish a future version of Peru would disambiguate, renaming its internal HTTP retrieval something like "http" and enabling the curl plugin to use literally cURL.

I totally agree. Maybe we could come up with an alias to not break compatibility.

I could see this rolling out in a major release. In a minor release (e.g. 1.5.0), introduce the http as an alias for curl and emit a warning that curl will be renamed http in a future release. When the major release comes out (say, 2.0.0), the curl plugin should still maintain configuration compatibility even though the underlying mechanism is different. Only upon the next major release (3.0.0) should any backward-incompatible changes occur in order to maintain strict semver. I tend to abide by strict semver myself but have seen some software with other major, highly-desirable improvements delay breaking changes to the first minor release with adequate notification in the major release and warnings in the logs, etc.

So, if Peru were to support envvars at a basic level enough to satisfy one scale level (maybe up to 10 modules requiring authentication; little effort put into deduplication: exercise left to implementor) and then choose the standard netrc format for the next scale level (beyond 10 modules) for the "curl" plugin, I think it would have a solid and secure interface for that plugin.

I totally disagree. I think we should come up with a solution for all cases. No one wants to spend extra time doing this "deduplication". If it is someone new coming to the tool, this person wants a solid tool. We don't want to scare people from here, do we?

To clarify, by deduplication, I mean enabling two modules that share credentials to reference the same credential set. For example, for two modules with the names foo and bar:

PERU_PLUGIN_HTTP_CRED_FOO_USERNAME="asdf"
PERU_PLUGIN_HTTP_CRED_FOO_PASSWORD="hunter2"
PERU_PLUGIN_HTTP_CRED_BAR_USERNAME="asdf"
PERU_PLUGIN_HTTP_CRED_BAR_PASSWORD="hunter2"

This will quickly grow unwieldy, obviously, but enabling envvars in this way lends itself to this following configuration in nearly all CI configurations, I speculate:

# vela/drone object style
environment:
  MYSERVICE_USER: "asdf"
  MYSERVER_PASS: "hunter2"
  PERU_PLUGIN_HTTP_CRED_FOO_USERNAME: "${MYSERVICE_USER}"
  PERU_PLUGIN_HTTP_CRED_FOO_PASSWORD: "${MYSERVER_PASS}"
  PERU_PLUGIN_HTTP_CRED_BAR_USERNAME: "${MYSERVICE_USER}"
  PERU_PLUGIN_HTTP_CRED_BAR_PASSWORD: "${MYSERVER_PASS}"

Eventually, this is going to have to be defined somehow, even if it ends up being a step in CI to avoid the introduction of a ton of envvars:

curl module mypackage:
    url: https://myservice.example.com/packages/mypackage-1.2.4.tar.gz
    # this module _could_ automatically look at .netrc if it exists...
    auth: netrc
    # Untar the archive after fetching.
    unpack: tar
    # After the unpack, use this subdirectory as the root of the module.
    export: mypackage
# using Vela- and Drone-like config because I know it well enough to not look at docs ;-)
steps:
  - name: write .netrc
    image: alpine:latest
    commands:
    - echo "machine myservice.example.com login ${MYSERVICE_USER} password ${MYSERVICE_USER}" > .netrc
    - chmod 400 .netrc
  - name: get modules using netrc
    image: buildinspace/peru:latest
    commands:
    - peru sync
    # or specified instead of discovered in ./.netrc, ~/netrc, /etc/netrc, etc. whatever Python's netrc resolver does is fine
    - peru sync --netrc .netrc
  - name: get modules using envvars
    image: buildinspace/peru:latest
    environment:
      - PERU_PLUGIN_HTTP_CRED_MYPACKAGE_USERNAME: ${MYSERVICE_USER}
      - PERU_PLUGIN_HTTP_CRED_MYPACKAGE_PASSWORD: ${MYSERVICE_PASS}
    commands:
    - peru sync
    # or specified instead of discovered in ./.netrc, ~/netrc, /etc/netrc, etc. whatever Python's netrc resolver does is fine
    - peru sync --netrc .netrc
secrets:
- type: native
  name: myservice_user
  target: MYSERVICE_USER
- type: native
  name: myservice_pass
  target: MYSERVICE_PASS

The netrc implementation should read the file and pass the credentials into a HTTPPasswordMgr instance used by urllib. The other modules' credentials are set however their respective system specify and that's something left to the Peru user to configure safely using those systems (mostly VCS) credential storage.

How we would write to this file? Where are you storing this file?

Peru would not create it. Such feels like it could inevitably necessitate Peru allowing the definition of credentials in its configuration. That repeats a common security footgun pattern out of convenience!

Most likely, a user would create it at repo setup time, including CI preparation steps. Some CI systems, notably Travis, enable users to store entire files in secret storage and that file is written to a predictable location on the runner or in the working directory before all CI steps. Vela and Drone write declared secrets as a file to a /vela/secrets hierarchy, IIRC, and limit secret content to like 5 KiB, "which ought to be enough for anybody".

At a previous job, I accessed a number of services behind authentication and carried a .netrc file in my home directory for a few years alongside my SSH keys, etc.

@felipefoz
Copy link
Contributor

felipefoz commented Aug 30, 2021

Great,

So, the flow would be something like that:

At the developer machine the .netrc is located at default location which is usually ~/.netrc.
The user would have to include the field auth: true or auth: netrc in the "curl" module.
In this case peru will automatically fetch the data from .netrc because it is in the default location.

At the CI, usually there is not a ~/.netrc because is outside the runner scope. So the user would have to add the information in a local .netrc and then pass the file location to peru in the command like you wrote there.

peru sync --netrc .netrc

Have I understood correctly?

If you all agree I may start working on that.

@oconnor663
Copy link
Member

One small idea that could be the best of both worlds: We could have curl (or http) modules look for something like PERU_{MODULE}_USERNAME and PERU_{MODULE}_PASSWORD by default, and also support new fields like username_var and password_var to allow you to specify what env var to look for. The latter might be helpful in cases where you have many modules sharing the same password. At the risk of making things excessively complicated I guess...

@felipefoz
Copy link
Contributor

We could have curl (or http) modules look for something like PERU_{MODULE}USERNAME and PERU{MODULE}_PASSWORD

This would be more cumbersome to set in env. variables in developer machine. It would good for CI though.

I am not sure yet, but I feel that .netrc it is a more solid solution after all, for either CI or local development.
It wouldn't require so much effort I think, but there could be some challenges that I am not aware of.

@colindean
Copy link
Contributor Author

also support new fields like username_var and password_var to allow you to specify what env var to look for

This seems like a next-iteration feature of this feature. Perhaps a convention of something like PERU_PLUGIN_HTTP_CRED_MYMODULE_USERNAME or more succinctly PERU_CRED_MODULE_HTTP_USER would be good on the first pass alongside netrc, and then see how users end up feeling about setting an envvar versus creating a file before enablingg deviations from the convention.

@felipefoz
Copy link
Contributor

It seems that I lost in this one.

Let's see how it goes then. I will try to work on something in the next few days.

About the alias, I am opening another issue mentioning this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants