
Ability to push to custom endpoint URL #1629

Open
chorus12 opened this issue May 12, 2020 · 6 comments

Comments

@chorus12

Hi.
I am using https://github.com/wbingli/awscli-plugin-endpoint to set up endpoint_url in the AWS config file for MinIO storage.
Is there a way to configure quilt to work with a custom endpoint URL?

akarve (Member) commented May 12, 2020

I assume you are talking about an object gateway endpoint, so that the client would talk to MinIO instead of S3 (i.e. not what's in quilt3.config())? If so, we are working on this. Could you explain how you would like this to work? That is, would you just plug in the address of the object gateway, and then all calls that normally go to S3 go to the gateway (or endpoint_url, as you refer to it)?

chorus12 (Author) commented:

quilt3.config() gives me this output:

<QuiltConfig at '~/.local/share/Quilt/config.yml' {
    "navigator_url": "https://open.quiltdata.com",
    "default_local_registry": "file:///home/sergei/.local/share/Quilt/packages",
    "default_remote_registry": null,
    "default_install_location": null,
    "registryUrl": "https://open-registry.quiltdata.com",
    "telemetry_disabled": false,
    "s3Proxy": "https://open-s3-proxy.quiltdata.com",
    "apiGatewayEndpoint": "https://sttuv8u2u4.execute-api.us-east-1.amazonaws.com/prod",
    "binaryApiGatewayEndpoint": "https://ap8tbn363c.execute-api.us-east-1.amazonaws.com/prod"
}>

As I understand it, there are at least two mandatory entities in a quilt installation:

  • a remote registry (which processes API calls, holds all the metadata, performs user authentication/authorization, etc.)
  • data storage (the only option right now is Amazon S3)

I am looking for a way to tell quilt to use a specific S3-compatible storage, similar to the command aws --endpoint-url https://my-minio-installation.com:9000 s3 ls, which allows using the AWS CLI with a MinIO server.

akarve (Member) commented May 14, 2020

Makes sense. We are doing some planning work now on adding GCP support. Would you like to contribute to the open source project? If so, we can start with a design that solves both problems.

akarve (Member) commented May 14, 2020

I think what needs to happen here is that all of the naked S3 calls get wrapped in an abstraction, and then, depending on the physical type of the wrapped storage layer, we dispatch the appropriate call. In theory, S3, MinIO, and GCP all support the same S3 v4 API. In practice... I'm not sure yet.
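A rough sketch of that abstraction (all names here are hypothetical, not quilt code): a small interface over the storage layer plus a dispatch function, so S3, MinIO, and GCS backends could be swapped behind the same calls. An in-memory backend stands in below so the sketch runs without a server.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical wrapper interface around the raw object-storage calls."""

    @abstractmethod
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get_bytes(self, bucket: str, key: str) -> bytes: ...


class MemoryStore(ObjectStore):
    """In-memory backend so the sketch is runnable; a real S3Store or
    GCSStore would hold a boto3 or google-cloud-storage client instead."""

    def __init__(self) -> None:
        self._objects = {}

    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get_bytes(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def store_for(endpoint_url=None) -> ObjectStore:
    """Dispatch on the physical type of the wrapped storage layer.
    A real version would inspect the endpoint or URL scheme and return
    the matching backend; here it always returns the in-memory one."""
    return MemoryStore()


store = store_for()
store.put_bytes("test-bucket", "hello.txt", b"hi")
print(store.get_bytes("test-bucket", "hello.txt"))  # b'hi'
```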

cometta commented Aug 27, 2020

@akarve we are looking for the same feature: using quilt to upload data to a self-hosted S3 endpoint. We defined the endpoint in the config and credentials files, but hurdat.push('aleksey/hurdat', 's3://<self_hosted_s3_bucket>') fails with:

ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

I confirmed that we are able to write to the on-premise bucket directly: aws s3 cp ./testfile.txt s3://<self_hosted_s3_bucket> writes the file successfully.

Midnighter commented Dec 8, 2020

I tried this out but ran into the same problem as @cometta. I set up a local MinIO for testing using docker-compose:

version: "3.8"

volumes:
  bucket-data: {}

secrets:
  access_key:
    file: ./secrets/access_key.txt
  secret_key:
    file: ./secrets/secret_key.txt

services:
  minio:
    image: minio/minio:RELEASE.2020-12-03T05-49-24Z
    volumes:
      - bucket-data:/data
    secrets:
      - access_key
      - secret_key
    environment:
      - "MINIO_ACCESS_KEY_FILE=/run/secrets/access_key"
      - "MINIO_SECRET_KEY_FILE=/run/secrets/secret_key"
    ports:
      - "127.0.0.1:9000:9000"
    command: "server /data"

Then I configured credentials and a profile using the AWS CLI with the above-mentioned endpoint plugin, so now my profile looks like this:

[default]
s3api =
    endpoint_url = http://127.0.0.1:9000
s3 =
    endpoint_url = http://127.0.0.1:9000
    signature_version = s3v4
[plugins]
endpoint = awscli_plugin_endpoint

and I have the corresponding credentials:

[default]
aws_access_key_id = ****
aws_secret_access_key = ****

I can configure a default registry using quilt3 config-default-remote-registry s3://test-bucket, but when I try to push an empty package, I get the same error as @cometta:

import quilt3

quilt3.config()
<QuiltConfig at '~/.local/share/Quilt/config.yml' {
    "navigator_url": null,
    "default_local_registry": "file:///~/.local/share/Quilt/packages",
    "default_remote_registry": "s3://test-bucket",
    "default_install_location": null,
    "registryUrl": null,
    "telemetry_disabled": true,
    "s3Proxy": null,
    "apiGatewayEndpoint": null,
    "binaryApiGatewayEndpoint": null,
    "default_registry_version": 1
}>
pkg = quilt3.Package()
top_hash = pkg.push("test/pkg", "s3://test-bucket", message="Hello World")
~/.pyenv/versions/3.8.6/envs/quilt/lib/python3.8/site-packages/quilt3/data_transfer.py in put_bytes(data, dest)
    825             raise ValueError("Cannot set VersionId on destination")
    826         s3_client = S3ClientProvider().standard_client
--> 827         s3_client.put_object(
    828             Bucket=dest.bucket,
    829             Key=dest.path,

~/.pyenv/versions/3.8.6/envs/quilt/lib/python3.8/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/.pyenv/versions/3.8.6/envs/quilt/lib/python3.8/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

Do I need to supply the credentials differently? Should I make MinIO available under a domain instead of localhost for it to work? Thank you for any insights.
