Create multiple signed URLs in bulk #2077

Closed
PierBover opened this issue Sep 27, 2022 · 7 comments
Labels
  • api: storage. Issues related to the googleapis/nodejs-storage API.
  • priority: p3. Desirable enhancement or fix. May not be included in next release.
  • type: feature request. ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@PierBover

As per @shaffeeullah's instructions here, I'm opening this feature request.

Currently the Node storage client only allows creating signed URLs one at a time. This is very inefficient in situations where users need to upload files en masse.

Surprisingly, the gsutil command already supports creating URLs in bulk for multiple objects, even using a wildcard:

> Multiple gs:// URLs may be provided and may contain wildcards. A signed URL will be produced for each provided URL, authorized for the specified HTTP method and valid for the given duration.

https://cloud.google.com/storage/docs/gsutil/commands/signurl

So the feature request would be to bring this feature (which apparently already exists in the platform) to the Node client.

@PierBover PierBover added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Sep 27, 2022
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Sep 27, 2022
@borodayev

Any updates on this?

@avinash1203

Are there any updates on this?

@Hans-PeterKuehn

Is there any update on this?

@frankyn
Member

frankyn commented Aug 10, 2023

Hi everyone,

Cloud Storage recently added a matchGlob option for listing objects, which makes this easier to write with the existing API surface:

(async () => {
  const {Storage} = require('@google-cloud/storage');

  // Bulk signed URLs
  // The bucket contains objects whose names start with `random_file`
  const bucketName = "bucketname";

  const storage = new Storage();
  // Recommend reading through glob support:
  // matchGlob: https://cloud.google.com/storage/docs/json_api/v1/objects/list#list-objects-and-prefixes-using-glob
  const [files] = await storage.bucket(bucketName).getFiles({matchGlob: "random_file*"});

  // Sign all files concurrently and wait for every URL, so a failure
  // rejects the outer promise instead of becoming an unhandled rejection
  const urls = await Promise.all(files.map(async file => {
    const [url] = await file.getSignedUrl({
      version: 'v4',
      action: 'read',
      expires: Date.now() + 15 * 60 * 1000, // 15 minutes
    });
    return url;
  }));
  urls.forEach(url => console.log(url));
})().catch(console.error);

The code needed to generate a list of signed URLs is short enough that it does not require additional support in the library.

I want to add that performance varies when running this code on a GCE instance, depending on whether the application uses Application Default Credentials (ADC) or a service account private key made available to it:

  1. ADC requires a call to iamcredentials signBlob, with a separate RPC sent for each object. (There's a quota of 60,000 requests per minute.)

  2. Providing a service account private key to the application on GCE is fastest because it does not require a request to IAM, but it does require you to self-manage the service account private key.
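
On the ADC path, one way to avoid sending a burst of signBlob calls is to cap how many signatures are in flight at once. The following is a minimal sketch, not part of the library: the helper name `signWithLimit` and the default limit of 10 are illustrative assumptions, and `files` is assumed to be the array returned by `getFiles`.

```javascript
// Hypothetical helper (not a library API): signs `files` with at most
// `limit` getSignedUrl calls in flight at any time.
// `files` is assumed to be an array of File objects from @google-cloud/storage.
async function signWithLimit(files, signOptions, limit = 10) {
  const urls = new Array(files.length);
  let next = 0;

  // Each worker repeatedly claims the next unsigned index until none remain.
  async function worker() {
    while (next < files.length) {
      const i = next++;
      const [url] = await files[i].getSignedUrl(signOptions);
      urls[i] = url;
    }
  }

  const workerCount = Math.min(limit, files.length);
  await Promise.all(Array.from({length: workerCount}, () => worker()));
  return urls; // same order as `files`
}
```

It could be called as `await signWithLimit(files, {version: 'v4', action: 'read', expires: Date.now() + 15 * 60 * 1000}, 20)`. Capping concurrency only bounds the burst size; it does not by itself guarantee staying under a per-minute quota.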

@PierBover
Author

PierBover commented Aug 10, 2023

@frankyn this doesn't solve the current issue.

  1. Dealing with the complexity of creating and managing thousands of signed URLs. Your code doesn't take into account errors that could happen when creating say thousands of signed URLs.

  2. Dealing with the costs and quotas of many Node environments such as AWS Lambda or GC Functions when executing getSignedUrl for thousands of files concurrently or not.

All of this would be eliminated if we could create a single signed URL with a wildcard from the Node client. Or at the very minimum, be able to create multiple signed URLs in a single request.

Both of these features are supported by the GCS platform as already explained in my first post.

@frankyn
Member

frankyn commented Aug 10, 2023

Hi @PierBover, thanks for the response. I have a few comments and questions. Please let me know if I'm misunderstanding.

> Dealing with the complexity of creating and managing thousands of signed URLs. Your code doesn't take into account errors that could happen when creating say thousands of signed URLs.

I agree that the error handling is not fleshed out, but I think this can be helped by making existing getFiles and getSignedUrl methods more reliable. If you're seeing issues already those would be good to address separately.
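
As a rough illustration of per-file error handling with the existing methods, the sketch below collects failures instead of aborting on the first one. The helper name `signAllSettled` and the shape of its return value are assumptions made for this example; only `getSignedUrl` and the `name` property come from the library.

```javascript
// Hypothetical helper (not a library API): signs every file and reports
// failures per object instead of rejecting on the first error.
// `files` is assumed to be an array of File objects from @google-cloud/storage.
async function signAllSettled(files, signOptions) {
  const results = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    files.map(async file => file.getSignedUrl(signOptions))
  );

  const signed = [];
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      signed.push({name: files[i].name, url: result.value[0]});
    } else {
      failed.push({name: files[i].name, error: result.reason});
    }
  });
  return {signed, failed};
}
```

The caller can then retry only the entries in `failed`, or surface them to the user, while still using the URLs in `signed`.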

> Dealing with the costs and quotas of many Node environments such as AWS Lambda or GC Functions when executing getSignedUrl for thousands of files concurrently or not.

I think these would still be the same with or without an extra helper method in the Node.js storage library though. It would just be hidden.

> All of this would be eliminated if we could create a single signed URL with a wildcard from the Node client. Or at the very minimum, be able to create multiple signed URLs in a single request.

  • Did you mean a bulk IAM SignBlob request or a method call to the Node.js client?
  • When you say single signed URL with a wildcard, I'm assuming you'd like https://.../signed-url-bucket/all-objects*?...?

@PierBover
Author

PierBover commented Aug 11, 2023

> I agree that the error handling is not fleshed out, but I think this can be helped by making existing getFiles and getSignedUrl methods more reliable. If you're seeing issues already those would be good to address separately.

I meant other errors. Some examples:

  • A cloud function gets throttled or just shut down for exceeding some quota
  • Maybe some object has been deleted while generating thousands of signed URLs
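
One way to limit the damage from a function being cut off midway is to sign one page of objects at a time and record a resume point after each page. This is only a sketch under assumptions: `signInPages` and the `onPage` callback are hypothetical names, and it assumes that `getFiles` with `autoPaginate: false` resolves to `[files, nextQuery]`, where `nextQuery` carries the `pageToken` for the next page and is absent on the last one.

```javascript
// Hypothetical helper (not a library API): lists and signs one page at a time.
// `bucket` is assumed to be a Bucket from @google-cloud/storage; `onPage` is a
// caller-supplied callback that stores the URLs and the resume token.
async function signInPages(bucket, query, signOptions, onPage) {
  let nextQuery = {...query};
  while (nextQuery) {
    // Force manual pagination on every call so only one page is fetched.
    const [files, following] = await bucket.getFiles({
      ...nextQuery,
      autoPaginate: false,
    });
    const urls = await Promise.all(
      files.map(async file => (await file.getSignedUrl(signOptions))[0])
    );
    // Pass along the token needed to resume after this page (null when done).
    await onPage(urls, following ? following.pageToken : null);
    nextQuery = following;
  }
}
```

A restarted function could pass the last stored token back in as `query.pageToken` together with the same `matchGlob` and `maxResults` values. This does not address objects deleted between listing and use of the URL.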

> I think these would still be the same with or without an extra helper method in the Node.js storage library though. It would just be hidden.

I see what you mean now.

I tried gsutil with a wildcard, e.g.:

gsutil signurl -d 10 pier.json gs://my-bucket/something/*

I was expecting to get a single signed URL that would work with the wildcard. Instead, gsutil figured out which files matched the wildcard and then did multiple GET requests to receive multiple signed URLs.

So yes, you're right, it wouldn't make much of a difference.

I will just close this issue because this seems like a limitation in the core GCS service that cannot be solved from a client.


5 participants