fix: allow files in directories to be downloaded onto local machine #2199
base: main
@ddelgrosso1, I tried running Is that accurate?
Also, I would like to add a test to check that the files are truly downloaded into the
That is accurate. I wouldn't worry about running these on your own, they get run in the CI pipeline each time a commit is pushed. However, unit tests should work without issue locally.
If you feel up to it, a unit test can probably be created to test this. I can look to see if we have any similar tests elsewhere that might serve as a guide. One thing I will do is clean up the JSDoc to make it abundantly clear that not supplying a
…nloadManyFiles to File.download
  done();
} catch (e) {
  done(e);
Any assert(false) will throw an Error, meaning done() never gets called and the test times out. Adding the try/catch guarantees done() gets called. This is similar to what is described in this Stack Overflow discussion: https://stackoverflow.com/questions/66461468/mocha-test-false-assert-timeouts
I'm a bit surprised that this doesn't cause issues in the fs.readFile callback assertions in the other tests.
Interesting find, let me dig into this a little bit more.
LGTM
@danielbankhead would you mind just giving this a second set of eyes?
Looks great! I think we can make it better for File#download customers by moving the logic to TransferManager#downloadManyFiles
if (
  destination &&
  (destination.endsWith('/') || destination.endsWith('\\'))
) {
  callback?.(null, Buffer.alloc(0));
I think this could be problematic as / could be a valid ending character for an object name: https://cloud.google.com/storage/docs/objects
Customers, namely outside of the Transfer Manager flow, could face issues.
Hey @danielbankhead, thanks for the review. The issue with object names ending with / is that while they may be valid on GCS, they can't actually be written on a Linux-based filesystem. I don't have a Windows machine to try it on. Perhaps you know of an edge case where names ending with / can actually be written.
The scenario you described is valid if destination is unset/falsy: a Buffer gets created and returned for the GCS object ending with /. That behaviour remains unchanged in this PR.
For this case, I think allowing the file system to error would be clearer and easier to understand than returning a Buffer.
@vishwarajanand I think we want to address this per @danielbankhead's feedback.
// Skip directory objects as they cannot be written to local filesystem
if (
  destination &&
  (destination.endsWith('/') || destination.endsWith('\\'))
) {
  callback?.(null, Buffer.alloc(0));
} else if (destination) {
  fs.mkdirSync(path.dirname(destination), {recursive: true});
I think this logic should live outside of File#download for a few reasons:
- The mkdirSync would make this method slower for non-TM customers:
  - where the directory already exists (unnecessary I/O)
  - with slower filesystems (blocking I/O)
- The logic for determining if a directory should be created can be handled for multiple files at once (rather than each file in the same directory doing the same work).
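To illustrate the second point, here is a rough sketch of handling directory creation once per batch rather than once per file. The helper name `ensureParentDirs` is hypothetical and is not part of the library; it only assumes Node's `fs/promises` and `path` modules, and it uses non-blocking I/O instead of mkdirSync.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: create each distinct parent directory once, using
// non-blocking I/O, before any file in the batch is written.
async function ensureParentDirs(destinations) {
  // Files sharing a directory collapse into a single Set entry.
  const dirs = new Set(destinations.map(dest => path.dirname(dest)));
  for (const dir of dirs) {
    // `recursive: true` creates missing ancestors and does not fail if
    // the directory already exists.
    await fs.mkdir(dir, {recursive: true});
  }
  return dirs;
}
```

A caller downloading many files could await this once with all destination paths and then start the individual downloads, so files in the same directory do not repeat the same work.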
Coincidentally, that was my original design (creating the directory structure in TransferManager). However, I made the change in File#download because the root issue is that File#download itself cannot write to nested paths, and solving it at the lower level would solve the problem for both File and TransferManager.
That being said, it's not hard to optimize in TransferManager; I can create another branch. What do we do then about the limitation of File#download above? Do we:
- warn users not to provide nested destination paths?
- or warn users that they need to ensure that the folder hierarchy exists before providing nested destination paths?
I think a simple solution would be to add a flag to gate this functionality and have Transfer Manager use this flag.
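One possible shape for such a gate is sketched below. The function `prepareDestination` and the option name `createParentDirs` are purely illustrative assumptions; they are not existing options of File#download or Transfer Manager.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical gate: parent directories are created only when the caller
// opts in. `createParentDirs` is an illustrative option name, not an
// existing option of File#download.
function prepareDestination(destination, options = {}) {
  if (options.createParentDirs) {
    fs.mkdirSync(path.dirname(destination), {recursive: true});
    return true;
  }
  // Default path: no extra filesystem work for callers that did not opt in.
  return false;
}
```

Under this assumption, Transfer Manager would pass the flag when downloading into nested destinations, while other callers keep the current behaviour and cost.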
@vishwarajanand same comment here, I think we want to address @danielbankhead's idea about flagging / gating this.
Hey @danielbankhead @ddelgrosso1 , just wanted to provide a heads up that I'll be revisiting this in about 1-2 months due to project priorities. Thank you for all the feedback and recommendations, they are all valid points for a user like myself. I do need something like Couple things I would design for, after both your feedback:
I envision users wanting to use TransferManager to download entire buckets or "sub-directories". So, those are some of the considerations I could think of. Something else that the
Fixes #2200 🦕